Migrating from the extension to the python library
Previous versions of pgai vectorizer used an extension to provide the vectorizer functionality. We have removed the need for the extension and put the vectorizer code into the pgai python library. This change allows the vectorizer to be used on more PostgreSQL cloud providers (AWS RDS, Supabase, etc.) and simplifies the installation and upgrade process.
Versions that used the extension:
- ai extension version < 0.10.0
- pgai python library version < 0.10.0
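To tell whether a database is still on the extension-based vectorizer, a small check like the following could help. It is a sketch that assumes psycopg 3 as the database driver. The query reads the standard PostgreSQL pg_extension catalog, and the helper names are hypothetical.

```python
from __future__ import annotations

# Standard PostgreSQL catalog query: installed version of the `ai` extension, if any.
EXT_VERSION_SQL = "SELECT extversion FROM pg_extension WHERE extname = 'ai'"


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.9.2' into (0, 9, 2), ignoring suffixes such as '-dev'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # so '0.10' compares equal to '0.10.0'
    return tuple(parts)


def uses_extension_vectorizer(ext_version: str | None) -> bool:
    """True if the installed `ai` extension predates the python library (< 0.10.0)."""
    if ext_version is None:
        return False  # extension not installed at all
    return parse_version(ext_version) < (0, 10, 0)


def installed_ai_version(db_url: str) -> str | None:
    """Read the `ai` extension version from the database (hypothetical helper)."""
    import psycopg  # psycopg 3; imported lazily so the pure helpers work without it

    with psycopg.connect(db_url) as conn:
        row = conn.execute(EXT_VERSION_SQL).fetchone()
    return row[0] if row else None
```

Usage: uses_extension_vectorizer(installed_ai_version(DB_URL)). The installed python library version can be checked the same way by passing importlib.metadata.version("pgai") through parse_version.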
Migrating from the extension to the python library
We made this change in a way that lets current users of the vectorizer keep using the feature without interruption, but they will have to change how they upgrade vectorizer functionality in the future.
The upgrade process is as follows:
- Upgrade the extension: Run ALTER EXTENSION ai UPDATE TO '0.10.1' to detach the vectorizer catalog tables and functions from the extension. This leaves them in your database in the ai schema, and the vectorizer will continue to work.
- Upgrade (or install) the pgai python library: Install pgai version > 0.10.0. You can do this with pip install -U pgai or via your requirements.txt or a similar dependency file.
- Manage the vectorizer with the python library: You can then manage the vectorizer from the python library or CLI by running pgai install -d DB_URL, as described in the new python-library-based workflow.
- (Optional) Remove the extension: If you are not using Timescale Cloud and you don't use the model-calling capabilities of pgai, you can then remove the pgai extension from your database.
If you are using Timescale Cloud, you will need to keep the extension installed to use the vectorizer cloud functions.
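The upgrade steps above could be driven from Python roughly as follows. This is a sketch, not an official migration tool: it assumes psycopg 3 as the driver, uses pgai.install(db_url) as the library counterpart of pgai install -d DB_URL, and assumes a plain DROP EXTENSION ai is enough for the optional last step. The function names are hypothetical.

```python
from __future__ import annotations

UPDATE_EXTENSION_SQL = "ALTER EXTENSION ai UPDATE TO '0.10.1'"
# Assumption: a plain DROP is enough once the vectorizer objects are detached.
DROP_EXTENSION_SQL = "DROP EXTENSION ai"


def migration_plan(on_timescale_cloud: bool, uses_model_calling: bool) -> list[tuple[str, str]]:
    """Return the upgrade steps in order as (where_to_run, command) pairs."""
    plan = [
        ("sql", UPDATE_EXTENSION_SQL),       # 1. detach vectorizer tables and functions
        ("shell", "pip install -U pgai"),    # 2. upgrade or install the python library
        ("python", "pgai.install(db_url)"),  # 3. let the library manage the vectorizer
    ]
    # 4. (Optional) drop the extension only when nothing still depends on it.
    if not on_timescale_cloud and not uses_model_calling:
        plan.append(("sql", DROP_EXTENSION_SQL))
    return plan


def run_migration(db_url: str, on_timescale_cloud: bool, uses_model_calling: bool) -> None:
    """Run the database-side steps; step 2 (pip) must already be done."""
    import pgai      # requires pgai > 0.10.0 to be installed
    import psycopg   # psycopg 3

    plan = migration_plan(on_timescale_cloud, uses_model_calling)
    with psycopg.connect(db_url, autocommit=True) as conn:
        conn.execute(UPDATE_EXTENSION_SQL)  # expects the old ai extension to be present
    pgai.install(db_url)
    if ("sql", DROP_EXTENSION_SQL) in plan:
        with psycopg.connect(db_url, autocommit=True) as conn:
            conn.execute(DROP_EXTENSION_SQL)
```

Keeping the plan as plain data makes it easy to print and review before anything touches the database; the Timescale Cloud and model-calling flags only decide whether the final drop is included.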
Changes to the create_vectorizer API
During the transition to the python library, some APIs changed for the ai.create_vectorizer call. At a high level:
- The ai.create_vectorizer call now requires a top-level loading argument. This gives us more flexibility in how we load data into the vectorizer. For example, we can now load data from a file using the loading => loading_uri() function.
- The destination where embeddings are stored is now configured via the top-level destination argument. This lets us support more kinds of schema design for storing embeddings. For example, we can now store embeddings in a column of a table via the destination => ai.destination_column() function, in addition to the previous behavior of using a separate table via the destination => ai.destination_table() function.
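For a new vectorizer, a call using the top-level loading and destination arguments looks roughly like the SQL assembled below. The Python helper only builds the statement text. It assumes OpenAI embeddings via ai.embedding_openai(model, dimensions) and passes a single positional name to ai.destination_table; the table, column, and model names in the usage line are made-up examples.

```python
from __future__ import annotations


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_vectorizer_sql(source_table: str, text_column: str, destination: str,
                          model: str, dimensions: int) -> str:
    """Build an ai.create_vectorizer call with top-level loading and destination."""
    return (
        "SELECT ai.create_vectorizer(\n"
        f"    {sql_literal(source_table)}::regclass,\n"
        # New: the column the text is loaded from.
        f"    loading => ai.loading_column({sql_literal(text_column)}),\n"
        # Assumption: OpenAI embeddings; another ai.embedding_* config could go here.
        f"    embedding => ai.embedding_openai({sql_literal(model)}, {int(dimensions)}),\n"
        # New: store embeddings in a separate table, as before.
        f"    destination => ai.destination_table({sql_literal(destination)})\n"
        ")"
    )
```

Usage: create_vectorizer_sql("blog", "contents", "blog_contents_embeddings", "text-embedding-3-small", 768), then run the result with your usual client. In real code, psycopg's sql module (for example sql.Literal) is a safer way to compose the statement than hand-rolled quoting.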
These changes are applied automatically to existing vectorizers. However, when creating new vectorizers, developers should be aware of the following changes:
- ai.create_vectorizer now requires a loading => argument. The previous behavior is provided by the loading => loading_column() function.
- ai.create_vectorizer no longer takes destination, target_table, target_schema, view_schema, or view_name as arguments. Configure these options via the new destination => ai.destination_table() function instead.
- ai.chunking_character_text_splitter and ai.chunking_recursive_character_text_splitter no longer take a chunk_column argument; that column name is now provided via the loading => loading_column() function instead.
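Scripts that create new vectorizers with the old argument names need to be rewritten, since only existing vectorizers are migrated automatically. The helper below is a hypothetical sketch of that mapping. It assumes ai.destination_table() accepts the old option names (destination, target_schema, target_table, view_schema, view_name) as named arguments, as the list above implies; check this against the API reference for your pgai version.

```python
from __future__ import annotations

# Old ai.create_vectorizer options that now belong to ai.destination_table().
DESTINATION_KEYS = ("destination", "target_schema", "target_table", "view_schema", "view_name")


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def translate_legacy_args(old: dict[str, str]) -> dict[str, str]:
    """Map old-style options to new top-level argument expressions.

    `old` is a hypothetical dict of the options an old call used, e.g.
    {"chunk_column": "contents", "target_table": "blog_embeddings"}.
    """
    new: dict[str, str] = {}
    # chunk_column moved out of the chunking config into ai.loading_column().
    if "chunk_column" in old:
        new["loading"] = f"ai.loading_column({sql_literal(old['chunk_column'])})"
    # destination/target_*/view_* moved into ai.destination_table().
    named = [f"{key} => {sql_literal(old[key])}" for key in DESTINATION_KEYS if key in old]
    if named:
        new["destination"] = f"ai.destination_table({', '.join(named)})"
    return new
```

Each returned entry can be spliced into a new call as name => expression; options not listed here (embedding, chunking sizes, and so on) carry over unchanged.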
Common issues
Old extension still installed
If running pgai install fails with an error like the following, it likely means that you have an old version (< 0.10) of the extension installed:
psycopg.errors.DuplicateTable: relation "vectorizer" already exists
CONTEXT: SQL statement "create table ai.vectorizer
Make sure to run ALTER EXTENSION ai UPDATE TO '0.10.1' first!
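When scripting the install from Python, the same failure can be caught and turned into an actionable message. This sketch assumes pgai.install(db_url) is the library counterpart of the pgai install CLI and that psycopg's DuplicateTable error propagates out of it, as the traceback above suggests. The names install_with_hint and explain_install_error are hypothetical.

```python
from __future__ import annotations

REMEDY = (
    "An old (< 0.10) ai extension is still installed: "
    "run ALTER EXTENSION ai UPDATE TO '0.10.1' first, then retry."
)


def explain_install_error(message: str) -> str | None:
    """Return a remedy if `message` looks like the duplicate-catalog error, else None."""
    if 'relation "vectorizer" already exists' in message:
        return REMEDY
    return None


def install_with_hint(db_url: str) -> None:
    """Call pgai.install() and re-raise DuplicateTable with an actionable message."""
    import pgai
    from psycopg.errors import DuplicateTable  # psycopg 3

    try:
        pgai.install(db_url)
    except DuplicateTable as exc:
        raise RuntimeError(explain_install_error(str(exc)) or str(exc)) from exc
```

Matching on the error text keeps the check narrow: other DuplicateTable failures are re-raised with their original message.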