Published at Sep 10, 2024

Using Pgvector With Python


Written by Haziqa Sajid

Have you ever wondered how artificial intelligence (AI) systems can understand our language? The key concept is embeddings, where words and phrases are converted into high-dimensional vectors that capture their meanings and relationships. These vectors allow computers to perform mathematical operations on language data. The challenge then becomes storing these high-dimensional vectors efficiently.
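To make this concrete, here is a toy sketch in plain Python with hand-made three-dimensional vectors (real embedding models produce hundreds or thousands of dimensions, learned from data): words with related meanings get vectors that point in similar directions.

```python
# Toy, hand-made 3-D "embeddings" -- real models learn these from data.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means "same direction", 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# "dog" is far more similar to "cat" than to "car"
print(cosine_similarity(embeddings["dog"], embeddings["cat"]))  # ~0.99
print(cosine_similarity(embeddings["dog"], embeddings["car"]))  # ~0.30
```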

The good news is that you can use an old friend, PostgreSQL, and transform it into a full-fledged vector database with the pgvector extension. Adding Python to the mix allows you to build Python applications with machine learning elements. 

For those who aren’t sure about how to achieve this, here’s what you’ll learn in this guide:

  • What are vector databases and pgvector

  • How to install and use pgvector in Python

  • How to leverage PostgreSQL for your AI applications

Pgvector and Python

Let’s start by explaining the concepts required to understand pgvector. Then, we will provide the technical details to get up and running with pgvector in Python.

What is pgvector (and how does it make PostgreSQL a vector database)?

A vector database stores data as high-dimensional vectors, mathematical representations of features or attributes. Vectors, with dimensions ranging from tens to thousands, are derived from raw data like text, images, and even videos through embedding algorithms. These databases are designed for fast and precise similarity search and enhanced search based on semantic meaning.

Editor’s Note: Check this article to learn the difference between vector search and semantic search.

PostgreSQL does not have native vector capabilities (as of PostgreSQL 16), and pgvector is designed to fill this gap. Pgvector is an open-source vector similarity search extension for PostgreSQL. Here's how it enhances PostgreSQL, making it a proficient vector database:

  1. Vector and traditional store combined: With pgvector, you can store both vector and traditional data, ensuring compatibility with traditional database features while providing newer capabilities.

  2. Advanced search: Pgvector stores data as vectors, enabling exact and approximate nearest-neighbor searches (the latter are used when the search space is extensive) over several distance metrics, such as L2 distance, inner product, and cosine distance. This makes it efficient to find the most similar content for a given query. The full set of supported distance operators is shown below:

[Image: Supported operators in pgvector]
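For intuition, the three most common distance operators can be reproduced in plain Python (a sketch of the math only, not how pgvector computes them internally — in SQL they appear as <->, <#>, and <=>; note that <#> returns the negative inner product):

```python
import math

def l2_distance(a, b):
    # pgvector's <-> operator: Euclidean (L2) distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def negative_inner_product(a, b):
    # pgvector's <#> operator: the *negative* inner product
    return -sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # pgvector's <=> operator: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(l2_distance(a, b))             # ~3.742
print(negative_inner_product(a, b))  # -28.0
print(cosine_distance(a, b))         # ~0.0 (vectors point the same way)
```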

3. Integration with PostgreSQL features: Pgvector seamlessly integrates with standard PostgreSQL features, enhancing its capabilities, which include the following:

  • ACID compliance: ensuring transactional integrity

  • Point-in-time recovery: enabling database restoration to specific moments

  • JOIN support: facilitating data combination from multiple tables

Enough theory; let's get practical.

Setting Up Pgvector for Python

In the previous section, we built an understanding of pgvector. This section will set the foundation for seeing it in action using Python. 

Set up a PostgreSQL database

If you haven't already, start by installing a PostgreSQL database. Then, install psql on your OS to connect to it more easily. Psql, a terminal-based interactive program from PostgreSQL, facilitates direct interaction with the PostgreSQL server. This interface allows users to execute SQL statements and manage various database objects seamlessly. 

If you have installed PostgreSQL correctly, executing the following command in the terminal will print version information:

> psql --version

[Image: PostgreSQL installation verification]

Note: On Windows, you must manually add the PostgreSQL bin directory to your PATH environment variable.

Now, we are ready to connect to the database.

Connect to database

To connect to the database using psql, we will run the following command in the terminal:

> psql -U postgres

It will prompt for the password, and upon successful authentication, you will be connected to the database.

[Image: Connecting to the database using psql]

Adding the pgvector extension

Let’s add the pgvector extension and see the relevant requirements. Later, we will walk you through how to use pgvector with Python.

Installing pgvector

On Windows, ensure that C++ support in Visual Studio is installed. If not, install it via the Visual Studio Installer and select the Desktop development with C++ workload.

[Image: C++ support in Visual Studio]

After this, run the following command in the Command Prompt, making sure it is opened in administrator mode.

> call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvars64.bat"

Note: The specific path will vary based on the version and edition of Visual Studio.

In the next steps, we will set the PGROOT environment variable, clone the pgvector repository, and build it with nmake:

> set "PGROOT=C:\Program Files\PostgreSQL\16"
> cd %TEMP%
> git clone --branch v0.7.0 https://github.com/pgvector/pgvector.git
> cd pgvector
> nmake /F Makefile.win
> nmake /F Makefile.win install

Note: nmake is Microsoft's implementation of the make tool. It comes packaged with the C++ development kit. If the path is not recognized in the command prompt, ensure it is added.

After a successful build, run the following query using psql:

postgres=# CREATE EXTENSION vector;

To ensure pgvector is installed, run this query:

postgres=# \dx

Importing pgvector to Python

To use the pgvector extension in your Python projects, follow these steps:

1. Install the pgvector package in your Python environment. This can be done using pip:

%pip install pgvector

2. Once installed, import the pgvector package in your Python script:

import pgvector

3. Additional packages are required to connect to a PostgreSQL database. A commonly used package is psycopg2.

import psycopg2

4. Depending on your specific requirements, you may need to install and import other packages. For example:

  •    For data manipulation and analysis, you might use pandas:

%pip install pandas

import pandas as pd

  •    For object-relational mapping (ORM), you might use SQLAlchemy:

%pip install SQLAlchemy

from sqlalchemy import create_engine

These steps will give you the tools and packages to work with pgvector and PostgreSQL in your Python environment.

Using Pgvector in Python

This section will explore using the pgvector extension in PostgreSQL with Python, from connecting to the database to querying. We set up all the installations and imports, so let’s get started. 

Setting up the connection 

First, we must connect to our PostgreSQL database using psycopg2. Here's how you can do it:

import psycopg2

conn = psycopg2.connect(
    host="localhost",
    database="your_database_name",
    user="your_username",
    password="your_password"
)

In this snippet, we import the psycopg2 library and use it to connect to the PostgreSQL server. Replace your_database_name, your_username, and your_password with your actual database credentials.

Creating a cursor 

Next, create a cursor to execute SQL commands. 

cur = conn.cursor()

The cursor is used to execute SQL commands and fetch results. We can also verify the connection with the following “hello world” example.

cur = conn.cursor()
cur.execute("SELECT 'hello world'")
print(cur.fetchone())

Creating a pgvector table 

  • Basic example: Start by creating a simple table with an embedding vector. In the following snippet, we define and execute an SQL command to create a table named vectors with an embedding column of dimension three (3).

# Define a SQL command to create a table
create_table_command = """
CREATE TABLE vectors (
    id bigserial primary key,
    embedding vector(3)  -- specify the dimension of the vector
);
"""

# Execute the SQL command
cur.execute(create_table_command)

# Commit the transaction
conn.commit()

We can confirm the creation of the table by running \dt in psql:

[Image: Confirming the table creation in psql]

  • Example with more features: Let's create a table with more features for language model training. This command creates a table named embeddings with additional columns such as label, url, content, tokens, and an embedding vector of dimension three (3).

# Define a SQL command to create a more complex table
create_table_command = """
CREATE TABLE embeddings (
    id bigserial primary key,
    label text,
    url text,
    content text,
    tokens integer,
    embedding vector(3)  -- specify the dimension of the vector
);
"""

# Execute the SQL command
cur.execute(create_table_command)

# Commit the transaction
conn.commit()

Data insertion 

Let's insert some data in the newly created table. In the real world, the embeddings are generated using a model trained on a vast corpus of data. These models understand the semantics behind the words. 

For this example, we resort to a random array. We use the NumPy library to generate and insert random vectors into the embeddings table. Each entry includes a label, URL, content, token count, and a vector.

import numpy as np

# Define the SQL command for inserting data
insert_command = """
INSERT INTO embeddings (label, url, content, tokens, embedding)
VALUES (%s, %s, %s, %s, %s);
"""

# Create sample data
data = [
    ("label1", "http://example.com/1", "The article on dogs", 100, np.random.rand(3).tolist()),
    ("label2", "http://example.com/2", "The article on cats", 150, np.random.rand(3).tolist()),
    ("label3", "http://example.com/3", "The article on cars", 200, np.random.rand(3).tolist()),
    ("label4", "http://example.com/4", "The article on books", 250, np.random.rand(3).tolist()),
    ("label5", "http://example.com/5", "The article on embeddings", 300, np.random.rand(3).tolist())
]

# Insert data into the table
for record in data:
    cur.execute(insert_command, record)

# Commit the transaction
conn.commit()

Let's check the data stored in the embeddings table. We will use fetchall() to retrieve the data from the table.

fetch_contents = """
SELECT * FROM embeddings;
"""

# Execute the SQL command
cur.execute(fetch_contents)

# Fetch the results
print(cur.fetchall())

Querying similar objects 

Retrieve the top five similar objects from the database using nearest neighbors. In this example, we generate a random query vector and find the five most similar entries in the embeddings table based on vector similarity. Note the explicit type cast ::vector, which ensures PostgreSQL interprets the parameter as a vector.

# Generate a random query vector
query_vector = np.random.rand(3).tolist()  # Example query vector
print(query_vector)

# Define the SQL command to retrieve similar objects
retrieve_command = """
SELECT content
FROM embeddings
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""

# Execute the command with the query vector
cur.execute(retrieve_command, (query_vector,))

# Fetch the results
similar_objects = cur.fetchall()

# Print the similar objects
for obj in similar_objects:
    print(obj[0])
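Conceptually, ORDER BY embedding <=> query LIMIT 5 is a nearest-neighbor sort by cosine distance. The same logic can be sketched in plain Python with made-up rows (illustration only; the database does this far more efficiently, especially with an index):

```python
import math

def cosine_distance(a, b):
    # Same metric as pgvector's <=> operator: 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

# Made-up (content, embedding) rows standing in for the embeddings table
rows = [
    ("The article on dogs", [0.9, 0.1, 0.0]),
    ("The article on cats", [0.8, 0.2, 0.1]),
    ("The article on cars", [0.1, 0.9, 0.3]),
]

query = [1.0, 0.0, 0.0]

# Equivalent of: SELECT content FROM embeddings ORDER BY embedding <=> %s::vector LIMIT 2
top_2 = sorted(rows, key=lambda row: cosine_distance(row[1], query))[:2]
print([content for content, _ in top_2])  # ['The article on dogs', 'The article on cats']
```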

Closing the connection 

Finally, close the cursor and connection. Closing the cursor and connection ensures that all resources are properly released.

cur.close()
conn.close()

Supercharged Pgvector With Python

So far, we've explored pgvector for vector storage and similarity search in PostgreSQL. But you can experience a supercharged version of pgvector: with Timescale Cloud, developers can use pgvector alongside pgvectorscale and pgai, two open-source extensions that turn PostgreSQL into an easy-to-use and high-performance vector database, plus a fully managed cloud database experience.

  • Timescale Cloud enhances PostgreSQL for AI applications by packaging pgvector as part of its offering. It retains all the capabilities of pgvector, such as the vector data type and indexing methods like HNSW and IVFFlat, making it a complement rather than a replacement for pgvector. 

  • Timescale Cloud introduces new features on top of pgvector, such as time-based vector search. This makes it easy to migrate your existing pgvector deployment and take advantage of additional features for scale. You also have the flexibility to create different index types suited to your needs.

  • With pgvectorscale, Timescale Cloud accelerates approximate nearest-neighbor (ANN) searches on large-scale vector datasets by incorporating a cutting-edge ANN index inspired by the DiskANN algorithm (StreamingDiskANN). Also included is pgai, which brings more AI workflows to PostgreSQL, making it easier for developers to build search and retrieval-augmented generation (RAG) applications. This complete stack will make the development of AI applications faster, more efficient, and scalable. 

Check out this blog post to learn more about pgai and how it brings embedding and generation models closer to the database. We also explained how pgvectorscale makes PostgreSQL faster and cheaper than other specialized vector databases, like Pinecone.

Let’s set up Timescale Cloud with Python and see it in action.

Set up Timescale Cloud with Python

  1. Installations: Ensure you have installed the required Python packages, timescale_vector and python-dotenv, using the following command:

%pip install timescale_vector python-dotenv

2. Import required libraries: Import the necessary libraries in your Python script as shown below:

from dotenv import load_dotenv, find_dotenv
import os
from timescale_vector import client
import uuid
from datetime import datetime, timedelta

3. Setting up Timescale: To start, sign up, create a new database, and follow the provided instructions. For more information, refer to the Getting Started with Timescale guide.

After signing up, connect to the Timescale database by providing the service URI, which can be found under the service section on the dashboard. The URI will look something like this:

postgres://tsdbadmin:@.tsdb.cloud.timescale.com:/tsdb?sslmode=require

[Image: Configuration dashboard for connecting to the service]

The password can be created by going to project settings and clicking Create credentials. 

[Image: Project settings page for creating credentials]

4. Set Up .env File: Create a .env file in your project directory and include your PostgreSQL credentials in the following format:

TIMESCALE_SERVICE_URL=your_service_url_here 

5. Load PostgreSQL Credentials: Load your PostgreSQL credentials from the .env file:

_ = load_dotenv(find_dotenv(), override=True)
service_url = os.environ['TIMESCALE_SERVICE_URL']

Using Timescale Cloud With Python

A vector store client can be created with the following arguments:

  • service_url: Timescale service URL / connection string

  • table_name: name of the table to use for storing the embeddings

  • num_dimensions: number of dimensions in the vector

Initialize the vector client with your service URL, table name, and the number of dimensions in the vector:

vec = client.Sync(service_url, "embeddings", 3)

Create tables and insert data 

Now, we will create tables for the collection and insert data. The data is represented as follows: 

  • UUID serves as a unique identifier for the embedding. 

  • Metadata about the embedding is stored in JSON format.

  • Text representing the embedding.

  • Embedding itself is also included.

vec.create_tables()

vec.upsert([
    (uuid.uuid1(), {"service": "pgvectors"}, "Pgvector is an open-source vector similarity search extension of PostgreSQL.", [1.0, 1.3, 2.0]),
    (uuid.uuid1(), {"service": "timescale_vector"}, "Timescale Vector amplifies Pgvector's capabilities by integrating it into PostgreSQL for AI applications, adding new functionalities like the Timescale Vector index and time-based vector search, while also enhancing scalability with advanced ANN indexing inspired by the DiskANN algorithm.", [1.0, 10.8, 3.0]),
])

Let's ensure that the data is updated in the table as well. We can see it in the Explorer tab in the Timescale console:

[Image: Verifying data insertion in the Timescale console]

Query similar objects 

Timescale Cloud provides a very simple API for searching similar vectors. As mentioned before, these vectors would normally be generated by specialized embedding models; in this example, we query with a hand-picked array.

vec.search([2.0, 9.4, 3.0])

Here’s the result:

>> [[UUID('5773dc0f-20d6-11ef-8777-dae2664cc367'),
    {'service': 'timescale_vector'},
    "Timescale Vector amplifies Pgvector's capabilities by integrating it into PostgreSQL for AI applications, adding new functionalities like the Timescale Vector index and time-based vector search, while also enhancing scalability with advanced ANN indexing inspired by the DiskANN algorithm.",
    array([ 1. , 10.8,  3. ], dtype=float32),
    0.006858884829482381],
   [UUID('5773dc0e-20d6-11ef-a334-dae2664cc367'),
    {'service': 'pgvectors'},
    'Pgvector is an open-source vector similarity search extension of PostgreSQL.',
    array([1. , 1.3, 2. ], dtype=float32),
    0.2235118982678087]]
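The last element of each result row is the cosine distance between the query and the stored embedding. We can sanity-check the top match by hand in plain Python (no database required):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity, the metric behind the scores above
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)

query = [2.0, 9.4, 3.0]
top_match = [1.0, 10.8, 3.0]  # embedding of the closest row in the result above

print(cosine_distance(query, top_match))  # ~0.00686, matching the first result's score
```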

Conclusion

In this article, we introduced the concept of vector databases and took a closer look at pgvector. We covered installing the extension and used Python to work with it. Finally, we used Timescale Cloud—which includes pgvector, pgvectorscale, and pgai—to leverage its faster search capabilities.

Utilizing pgvector with Python empowers the development of robust machine-learning applications. With pgai and pgvectorscale, Timescale Cloud elevates PostgreSQL vector databases to new heights, offering enhanced capabilities and performance at scale. 

Pgai and pgvectorscale are both open source under the PostgreSQL License and available for you to use in your AI projects today. You can find installation instructions on the pgai and pgvectorscale GitHub repositories (Git ⭐s welcome!).  You can also access them on any database service on Timescale’s cloud PostgreSQL platform.
