TigerData logo
TigerData logo
  • Product

    Tiger Cloud

    Robust elastic cloud platform for startups and enterprises

    Agentic Postgres

    Postgres for Agents

    TimescaleDB

    Postgres for time-series, real-time analytics and events

  • Docs
  • Pricing

    Pricing

    Enterprise Tier

  • Developer Hub

    Changelog

    Benchmarks

    Blog

    Community

    Customer Stories

    Events

    Support

    Integrations

    Launch Hub

  • Company

    Contact us

    About

    Timescale

    Partners

    Security

    Careers

Log InTry for free
TigerData logo

Products

Time-series and Analytics AI and Vector Enterprise Plan Cloud Status Support Security Cloud Terms of Service

Learn

Documentation Blog Tutorials Changelog Success Stories Time-series Database

Company

Contact Us Careers About Brand Community Code Of Conduct Events

Subscribe to the Tiger Data Newsletter

By submitting, you acknowledge Tiger Data's Privacy Policy

2026 (c) Timescale, Inc., d/b/a Tiger Data. All rights reserved.

Privacy preferences
LegalPrivacySitemap

Copy as HTML

Open in ChatGPT

Open in Claude

Open in v0

Avthar Sewrathan

By Avthar Sewrathan

6 min read

Apr 30, 2025

Announcements & ReleasesAIPostgreSQL, Blog

Table of contents

01 Why We Built Pgai Vectorizer for Postgres02 The Change: Moving From Extension-Only to Python CLI and Library03 Pgai Vectorizer Works With Any Postgres Database04 How to Use Pgai Vectorizer: A Quick Refresher05 Get Started With Pgai Vectorizer Today

Spend more time improving your AI app and less time managing a database.

Start building

We Listened: Pgai Vectorizer Now Works With Any Postgres Database

We Listened: Pgai Vectorizer Now Works With Any Postgres Database
Announcements & Releases
Avthar Sewrathan

By Avthar Sewrathan

6 min read

Apr 30, 2025

Table of contents

01 Why We Built Pgai Vectorizer for Postgres02 The Change: Moving From Extension-Only to Python CLI and Library03 Pgai Vectorizer Works With Any Postgres Database04 How to Use Pgai Vectorizer: A Quick Refresher05 Get Started With Pgai Vectorizer Today

Copy as HTML

Open in ChatGPT

Open in Claude

Open in v0

Spend more time improving your AI app and less time managing a database.

Start building

TL;DR: We're excited to announce that pgai Vectorizer—the tool for robust embedding creation and management—is now available as a Python CLI and library, making it compatible with any Postgres database, whether it be self-hosted Postgres or cloud-hosted on Timescale Cloud, Amazon RDS for PostgreSQL, or Supabase.

This expansion comes directly from developer feedback requesting broader accessibility while maintaining the Postgres integration that makes pgai Vectorizer the ideal solution for production-grade embedding creation, management, and experimentation. To get started, head over to the pgai GitHub.

Why We Built Pgai Vectorizer for Postgres

When we first launched pgai Vectorizer, we aimed to simplify vector embedding management for developers building AI systems with Postgres. We heard the horror stories of developers struggling with complex ETL (extract-transform-load) pipelines, embedding synchronization issues, and the constant battle to keep embeddings up-to-date when source data changes. Teams were spending more time maintaining infrastructure than building useful AI features.

Many developers found themselves cobbling together custom solutions involving message queues, Lambda functions, and background workers just to handle the embedding creation workflow. Others faced the frustration of stale embeddings that no longer matched their updated content, leading to degraded search quality and hallucinations in their RAG applications.

Pgai Vectorizer solved these problems with a declarative approach that automated the entire embedding lifecycle with a single SQL command, similar to how you'd create an index in Postgres. The tool resonated with developers and quickly gained traction among AI builders. However, we soon started hearing a consistent piece of feedback that would shape our next steps.

The Change: Moving From Extension-Only to Python CLI and Library

After our initial launch, we received consistent feedback from developers who wanted to use pgai Vectorizer with their existing managed Postgres databases. While our extension-based approach worked great for self-hosted Postgres and Timescale Cloud, users on platforms like Amazon RDS for PostgreSQL, Supabase, and other managed database services couldn't use pgai Vectorizer unless their cloud provider chose to make it available.

image
Source
image
Source
image
Source

Requests for pgai Vectorizer support on Supabase, Azure PostgreSQL, and Amazon RDS.

We knew we needed to make pgai Vectorizer more accessible without compromising its seamless Postgres integration. The solution? Repackaging our core functionality as a Python CLI (command-line interface) and library that can work with any Postgres database while maintaining the same robustness and "set it and forget it" simplicity.

This approach gives developers the best of both worlds: the powerful vectorization capabilities of pgai Vectorizer with the flexibility to use their existing database infrastructure, regardless of provider. The Python library handles the creation of database objects that house the pgai Vectorizer internals, and provides a SQL API that handles loading data, creating embeddings, and synchronizing changes, all while writing the results back to your Postgres database.

The library maintains all the core functionality that made pgai Vectorizer valuable:

  • Embedding creation and management: Automatically create and synchronize vector embeddings from Postgres data and S3 documents. Embeddings update automatically as data changes.
  • Production-ready out-of-the-box: Supports batch processing for efficient embedding generation, with built-in handling for model failures, rate limits, and latency spikes.
  • Experimentation and testing: Easily switch between embedding models, test different models, and compare performance without changing application code or manually reprocessing data.
  • Plays well with pgvector and pgvectorscale: Once your embeddings are created, use them to power vector and semantic search with pgvector and pgvectorscale. Embeddings are stored in the pgvector data format. 

What this means for existing users: 

  • Timescale Cloud customers: Existing vectorizers running on Timescale Cloud will continue to work as is, so no immediate action is necessary. We encourage you to use the new pgai Python library to create and manage new vectorizers. To do so, you have to upgrade to the latest version of both the pgai extension in Timescale Cloud and the pgai Python library. Upgrading the extension decouples the vectorizer-related database objects from the extension, therefore allowing them to be managed by the Python library. Pgai Vectorizer remains in Early Access on Timescale Cloud. See this guide for details and instructions on upgrading and migrating.
  • Self-hosted users: Existing self-hosting vectorizers will also continue to work as is, so no immediate action is required. If you already have the pgai extension installed, you’ll need to upgrade to version 0.10.1. Upgrading the extension decouples the vectorizer-related database objects from the extension, therefore allowing them to be created and managed by the Python library. See this guide for self-hosted upgrade and migration details and instructions.

What this means for new users: Whether you use pgai Vectorizer on Timescale Cloud or self-hosted, this change means a simplified installation process and more flexibility—you now have tighter integrations between pgai Vectorizer and your search and RAG backends in your AI applications. Self-hosted users no longer need to install the pgai extension to use pgai Vectorizer. Timescale Cloud customers will continue to get the pgai extension auto-installed for them. To try pgai Vectorizer for yourself, here’s how you can get started.

Pgai Vectorizer Works With Any Postgres Database

The new Python library implementation of pgai Vectorizer works with virtually any Postgres database, including:

  • Timescale Cloud
  • Self-hosted Postgres
  • Amazon RDS for PostgreSQL
  • Supabase
  • Google Cloud SQL for PostgreSQL
  • Azure Database for PostgreSQL
  • Neon PostgreSQL
  • Render PostgreSQL
  • DigitalOcean Managed Databases
  • Any other self-hosted or managed Postgres service running PostgreSQL 15 and later.

The new implementation addresses one of our most requested features from the community. Users were actively building AI applications with these managed services, but couldn't take advantage of pgai Vectorizer's powerful embedding management capabilities.

How to Use Pgai Vectorizer: A Quick Refresher

A standout feature of the new Python library is its enhanced support for document processing directly from cloud storage. 

With the expanded Amazon S3 integration, you can now seamlessly load documents and generate embeddings based on file URLs stored in your Postgres table. Pgai Vectorizer automatically loads and parses each into an LLM-friendly format like Markdown, then generates the required chunks for embedding creation, all according to your specification.

For document vectorization, we've included support for parsing multiple formats, including PDF, DOCX, XLSX, HTML, images, and more using IBM Docling, which provides advanced document understanding capabilities. This makes it easy to build powerful document search and retrieval systems without leaving the Postgres ecosystem.

Getting started with the pgai Vectorizer Python library is straightforward. Install pgai on your database via:

pip install pgai
pgai install -d postgresql://postgres:postgres@localhost:5432/postgres

Afterward, your database is enhanced with pgai’s capabilities. Here's a simple example of how to create a vectorizer for processing text data from a database column named ‘text’:

SELECT ai.create_vectorizer(
    'wiki'::regclass,
    if_not_exists => true,
    loading => ai.loading_column(column_name=>'text'),
    embedding => ai.embedding_openai(model=>'text-embedding-ada-002', dimensions=>'1536'),
    destination => ai.destination_table(view_name=>'wiki_embedding')
)

For document processing, you can use this configuration, which shows a document metadata table in PostgreSQL with references to data in Amazon S3:

-- Document source table
CREATE TABLE document (
    id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    uri TEXT NOT NULL,
    content_type TEXT,
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
    owner_id INTEGER,
    access_level TEXT,
    tags TEXT[]
);

-- Example with rich metadata
INSERT INTO document (title, uri, content_type, owner_id, access_level, tags) VALUES
    ('Product Manual', 's3://my-bucket/documents/product-manual.pdf', 'application/pdf', 12, 'internal', ARRAY['product', 'reference']),
    ('API Reference', 's3://my-bucket/documents/api-reference.md', 'text/markdown', 8, 'public', ARRAY['api', 'developer']);

SELECT ai.create_vectorizer(
    'document'::regclass,
    loading => ai.loading_uri(column_name => 'uri'),
    chunking => ai.chunking_recursive_character_text_splitter(
        chunk_size => 700,
        separators => array[E'\n## ', E'\n### ', E'\n#### ', E'\n- ', E'\n1. ', E'\n\n', E'\n', '.', '?', '!', ' ', '', '|']
    ),
    embedding => ai.embedding_openai('text-embedding-3-small', 768),
    destination => ai.destination_table('document_embeddings')
);

Run the worker via:

pgai vectorizer worker -d 
postgresql://postgres:postgres@localhost:5432/postgres

And watch the magic happen as pgai creates vector embeddings for your source data.

Get Started With Pgai Vectorizer Today

We're excited to see what you'll build with the new pgai Vectorizer, whether you're creating semantic search, RAG, or next-gen agentic applications.

Check out the GitHub repository to explore capabilities and getting started guides.

As you can tell by this post, we really value community feedback. If you encounter any issues or have suggestions for improvements, please open an issue on GitHub or join our community Discord. Your input will help shape the future development of pgai Vectorizer as we continue to enhance its capabilities.

Related posts

TimescaleDB 2.22 & 2.23 – 90x Faster DISTINCT Queries, Postgres 18 Support, Configurable Columnstore Indexes, and UUIDv7 for Event-Driven Analytics

TimescaleDB 2.22 & 2.23 – 90x Faster DISTINCT Queries, Postgres 18 Support, Configurable Columnstore Indexes, and UUIDv7 for Event-Driven Analytics

Announcements & ReleasesTimescaleDB

Nov 26, 2025

TimescaleDB 2.22 & 2.23: 90× faster DISTINCT queries, zero-config hypertables, UUIDv7 partitioning, Postgres 18 support, and configurable columnstore indexes.

Read more

The Big Shift in MCP: Why AI Guides Will Replace API Wrappers

The Big Shift in MCP: Why AI Guides Will Replace API Wrappers

Announcements & ReleasesAI

Nov 25, 2025

MCP servers need judgment, not just API access. AI Guides embed expert patterns into portable MCP tools, preventing bad engineering decisions at scale.

Read more

Stay updated with new posts and releases.

Receive the latest technical articles and release notes in your inbox.

Share

Get Started Free with Tiger CLI