---
title: "The Postgres Developer's Guide to Vector Index Tradeoffs"
published: 2026-05-26T10:23:55.000-04:00
updated: 2026-05-26T10:23:55.000-04:00
excerpt: "Vector search becomes an index design problem as your data grows. Here's how to make the right call without leaving Postgres."
tags: pg_textsearch, PostgreSQL, PostgreSQL Extensions
authors: Hien Phan
---

> **TimescaleDB is now Tiger Data.**

Vector search in Postgres usually starts simply. You add an embedding column, run a nearest-neighbor query, and order by distance.

```sql
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'
LIMIT 10;
```

For a while, that is enough.

That simplicity breaks down as the workload becomes real. The table grows, filters become part of the query path, and recall starts affecting user experience. The index still has to stay fast while new data keeps arriving.

That is when vector search stops being a query pattern and becomes an index design problem.

Most vector search advice starts with algorithms: HNSW, IVFFlat, DiskANN, recall, latency. That is useful, but incomplete once vector search lives inside Postgres. Postgres developers do not choose algorithms in the abstract. They choose indexes under constraints: memory, recall, write volume, filter selectivity, and the operational cost of adding another system.

The right index is not the best ANN algorithm in isolation. It is the index that fits the constraint your workload hits first: memory, recall, writes, or filters.

This article maps those constraints to real Postgres index choices: what each one costs, when it becomes the binding variable, and which index type it points to.

## When exact search stops being enough

Exact k-nearest neighbor search compares the query vector against every vector in the table. It gives perfect recall because it does not approximate the result set. It also scales linearly with the number of rows.

That tradeoff is fine early on. Exact search is the right starting point when the dataset is small, the query rate is low or you are still validating whether embeddings work for your application. It also gives you a useful baseline because the results are not affected by index tuning.

The problem shows up when the table grows into millions or tens of millions of vectors, or when users expect low latency. At that point, scanning every vector for every query becomes too expensive.

Approximate nearest neighbor search, or ANN search, exists for this moment. ANN indexes organize vectors ahead of time so the database can search only the most promising candidates instead of scanning the full table. The index gives up a small, controlled amount of accuracy in exchange for much lower query latency.

That is the first tradeoff: ANN is not magic. You are deciding how much recall you can afford to exchange for speed, memory efficiency, and lower infrastructure cost.

## The four constraints behind every vector index

The right vector index is usually decided by four constraints: whether the working set fits in memory, how much recall the application needs, how often the data changes and how selective the surrounding filters are.

### Memory

Memory is fast and low-latency, but expensive. SSDs are cheaper and can still work well for many workloads. Object storage is cheaper still, but its higher latency makes it a poor fit for index designs that require many small random reads.

Vector indexes do not all touch storage the same way. Graph-based indexes follow connections between vectors through the index. That access pattern works very well when the graph is in memory and becomes more expensive when each hop risks a disk read. Partitioning-based indexes group vectors into regions and scan the most promising ones, which can be more memory efficient but usually requires more tuning.

In Postgres, the practical question is whether the index working set fits comfortably in `shared_buffers` and the operating system page cache. If it does, an in-memory graph index can perform very well. If it does not, the storage access pattern starts to dominate the design.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2026/05/digram-A.png)

__Storage changes the index tradeoff. Graph-based indexes perform best when traversal stays hot in memory. Disk-aware and partition-based designs become increasingly important as the working set migrates to SSD or object storage.__

### Recall

Recall measures how close approximate search gets to exact search. Higher recall usually costs more because the index has to inspect more candidates, traverse more of a graph or scan more partitions.

For some applications, slightly lower recall is acceptable if latency improves dramatically. For others, especially RAG systems where missing the right document leads to a bad answer, recall is part of product quality.

The honest way to set this tradeoff is to measure against your own data. Embedding model, dimensionality, filters, and query distribution all affect the result.

### Writes

Some vector workloads are mostly read-heavy. You build the index, query it many times, and update it occasionally. Other workloads change constantly. New documents arrive, old ones are deleted, embeddings are regenerated.

A structure optimized for high-recall reads may have higher write or maintenance costs. A lighter-weight index may be easier to update but require more tuning to reach the same recall.

### Filters

Real Postgres queries rarely search vectors alone. A query might ask for the nearest vectors, but only within a specific customer, time range, tenant or category.

Those predicates change the shape of the search problem. If a filter is highly selective, it may be cheaper to narrow the rows first and then search. If the filter is broad, it may be better to use the vector index first and apply the filter after. The right plan depends on the data distribution, the selectivity of the filter, and the index available to the planner.

That is one reason vector benchmarks can vary so much. Vector search without filters is not the same workload as vector search inside a real application query.

That is why there is no universal best vector index. There is only the index that best matches the shape of your workload.

## The ANN algorithms behind Postgres index choices

The point of understanding ANN algorithms is not to memorize every paper. It is to understand why each index behaves differently as your workload changes. Most of the indexes discussed below fall into two broad patterns.

Graph-based indexes, such as HNSW and DiskANN-style designs, search by moving through connections between nearby vectors. Spatial partitioning indexes, such as IVFFlat and SPANN-style designs, divide the vector space into regions and search the most promising ones.

That distinction matters because graph-based indexes tend to optimize for high recall when the working set is hot, while partitioning-based indexes often trade more tuning for lower memory and maintenance overhead.

Each algorithm below is best understood as a response to a specific pressure: memory, write cost, disk access, or update churn.

### HNSW: When the index fits in memory

Your dataset fits in memory and you need high recall at high query throughput. HNSW is built for this.

[Hierarchical Navigable Small Worlds](https://arxiv.org/abs/1603.09320) organizes vectors as a layered graph where each node connects to nearby vectors across multiple levels of granularity. A query enters at the top layer, moves toward the target neighborhood, then descends to finer layers until it converges on the best candidates.

The layered structure is what gives HNSW its speed-recall profile. The upper layers help the search move quickly across the vector space. The lower layers refine the candidate set around the target neighborhood. When the graph is in memory, that traversal can be fast and accurate.

The tradeoffs show up on the write side and at scale. Each node stores multiple edge pointers, so the index carries a higher memory footprint than simpler partitioning-based alternatives. Inserts and deletes require maintaining graph structure, which makes writes more expensive. And when the index grows beyond available memory, latency can climb.

In `pgvector`, HNSW is often the first ANN index Postgres developers try when query latency and recall matter most. For a practical look at how it performs, see [Vector Database Basics: HNSW](https://www.tigerdata.com/blog/vector-database-basics-hnsw).

### IVFFlat: When memory and writes matter more

Your write throughput matters, or your index cannot comfortably fit in memory. IVFFlat is worth considering.

IVF stands for inverted file. The basic idea is to partition the vector space into lists, then search only the most promising lists at query time. In `pgvector`, this index type is exposed as ivfflat.

Compared with HNSW, IVFFlat is usually lighter to build and maintain. Inserts are simpler because adding a vector means assigning it to a list rather than updating a graph of neighboring nodes.

The tradeoff is that recall is more sensitive to tuning. If you create 1,000 lists and set `probes = 10`, the query searches a small fraction of the partitioned index. Increasing probes gives the query more chances to find the true nearest neighbors, but it also pushes the query closer to a broader scan. IVFFlat tuning is about finding the lowest probes value that still meets your recall target.

That is the core IVFFlat tradeoff: lower memory and maintenance overhead, but more responsibility for tuning lists and probes against your workload.

### DiskANN: When the index needs to live partly on disk

HNSW assumes the graph fits comfortably in memory. At tens of millions of high-dimensional vectors, that often stops being practical.

[DiskANN](https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node/), developed at Microsoft Research, was built for this case. It is a graph-based algorithm designed for datasets too large to fit entirely in RAM. At a high level, it keeps enough compressed information in memory to guide the search while storing more of the full index and vector data on SSD.

The lesson for Postgres developers is the storage pattern. A vector index that works well in RAM may behave very differently when the query path depends on repeated disk reads. Disk-aware indexes are designed around that constraint instead of treating it as an afterthought.

DiskANN still carries higher update costs than many partitioning-based approaches. But for read-heavy workloads on large datasets, it explains the shape of the problem that disk-aware Postgres vector indexing is trying to solve. See [Understanding DiskANN](https://www.tigerdata.com/blog/understanding-diskann) for a deeper look.

### SPFresh: The update problem at scale

Large vector indexes create another problem: updates.

Many ANN systems handle inserts and deletes by buffering changes, maintaining secondary structures, or periodically rebuilding parts of the index. Those approaches can work, but at very large scale they require either accepting stale index state or paying an increasingly expensive maintenance cost to keep the index current.

SPFresh, from Microsoft Research, is one such direction. It builds on partitioning-oriented ideas to reduce the need for global rebuilds, incrementally rebalancing partitions as vectors are inserted, deleted, or updated. Partition assignments are not fixed. They can drift and be corrected over time.

SPFresh is not implemented in Postgres today. But it is not purely academic either. The ideas behind it have already shaped how production vector systems outside Postgres are being designed. Turbopuffer is one example: an object-storage-first vector search service whose architecture is built around centroid-based indexing and minimizing storage round trips. Turbopuffer is not a Postgres system. But the tradeoffs it navigates (high-update workloads, disk-based search, incremental index maintenance without global rebuilds) are real problems the Postgres ecosystem will need to address as vector workloads become more dynamic.

This is worth tracking because the maintenance cost of a vector index is not static. It grows with update frequency and dataset size. For read-heavy workloads on stable datasets, this is not a near-term concern. For teams with high insert and delete rates (documents being added, embeddings regenerated, records retired), it is worth understanding now, before the index becomes the bottleneck.

## The Postgres vector search stack

The algorithms above map to real problems Postgres developers run into. HNSW is useful for in-memory performance, IVFFlat for lighter-weight indexing and write-sensitive workloads, and DiskANN-style designs for larger datasets where memory becomes the constraint.

Here is how the Postgres ecosystem addresses those problems today.

### pgvector

[pgvector](https://github.com/pgvector/pgvector) is the starting point. It adds a native vector column type to Postgres and supports both HNSW and IVFFlat indexes directly.

An HNSW index looks like this:

```sql
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);
```

For IVFFlat, you define the number of lists and tune the number of probes:

```sql
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1000);
SET ivfflat.probes = 10;
```

The query planner can use these indexes for nearest-neighbor queries, and you can combine vector search with standard SQL filters, joins and CTEs in the same query. For many teams already running Postgres, this can remove the need to operate a separate vector database.

`pgvector` can start to show limits at larger scale, especially with high-dimensional embeddings at tens of millions of rows and indexes that no longer fit comfortably in memory. That is the problem `pgvectorscale` was built to address.

### pgvectorscale

The DiskANN section above describes a specific problem: vector workloads that have grown too large to keep the working index in memory. For Postgres, [`pgvectorscale`](https://github.com/timescale/pgvectorscale) addresses that directly. It introduces a StreamingDiskANN index type that keeps a compressed representation in memory to guide search while storing the full index on disk.

On a [Tiger Data benchmark](https://www.tigerdata.com/blog/pgvector-is-now-as-fast-as-pinecone-at-75-less-cost) of 50 million Cohere embeddings at 768 dimensions, Postgres with `pgvector` and `pgvectorscale` achieved 28x lower p95 latency and 16x higher query throughput compared to Pinecone's storage-optimized index at 99% recall. This was a vendor-run benchmark. Treat it as directionally useful, not universally predictive. Results will vary with embedding model, dimensionality, filters, recall target, and hardware.

The relevant point is that `pgvectorscale` stays inside the Postgres operational model. It remains composable with `pgvector` data types and standard SQL patterns. If your index has outgrown memory, you do not need a different system. You need a different index type.

### pg\_textsearch and ParadeDB

Vector similarity handles the semantic side of search well, but it is not the whole retrieval problem. Keyword-based retrieval still matters. It catches exact matches that embeddings miss, and for many queries, users know precisely what they are looking for.

This is where `pg_textsearch` and ParadeDB come in.

[pg\_textsearch](https://github.com/timescale/pg_textsearch), also from Tiger Data, brings BM25-based search into Postgres. BM25 accounts for term frequency saturation and document length normalization, which is why it is often a stronger ranking model for keyword search than simple term matching.

ParadeDB takes a related position as a Postgres distribution, bundling [pg\_search](https://github.com/paradedb/paradedb/tree/main/pg_search) for BM25-based full-text search and [pg\_analytics](https://github.com/paradedb/pg_analytics) for analytical query execution. If you want Elasticsearch-style search quality and are open to running a Postgres distribution rather than adding individual extensions, ParadeDB belongs on your evaluation list. When you are operating a small dataset, BM25 relevance ranking may not be a key requirement and `pg_search` will suffice. However, `pg_textsearch` is a better option when you need true BM25 relevance ranking with term saturation (how many times a term appears) or document length normalization to match the experience of Lucene (that powers Elasticsearch) or the algorithms that power Google.

The real payoff of having both vector search and BM25 inside Postgres is hybrid search: combining vector similarity and keyword scoring in a single query. For many RAG applications, this is often a stronger retrieval pattern than vector search alone because each approach covers the other's blind spots. Vector search captures semantic meaning. BM25 catches exact matches.

### A simple hybrid search pattern in SQL

One common way to merge vector and keyword results is Reciprocal Rank Fusion, or RRF.

RRF avoids averaging scores across different scales. Instead, it combines rank positions. A result that appears near the top of either list gets a boost.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2026/05/digram-B.png)

__Hybrid search combines semantic and lexical retrieval. Vector search finds meaning. BM25 catches exact matches. RRF merges the ranked lists without comparing raw scores directly.__

The exact syntax depends on which BM25 extension you use, but the query shape looks like this:

```sql
WITH keyword_results AS (
  SELECT
    id,
    content,
    paradedb.score(id) AS bm25_score,
    ROW_NUMBER() OVER (ORDER BY paradedb.score(id) DESC) AS keyword_rank
  FROM documents
  WHERE content @@@ 'vector search'
  LIMIT 60
),
vector_results AS (
  SELECT
    id,
    content,
    1 - (embedding <=> '[0.1, 0.2, ...]') AS similarity_score,
    ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, ...]') AS vector_rank
  FROM documents
  LIMIT 60
),
combined AS (
  SELECT
    COALESCE(k.id, v.id) AS id,
    COALESCE(k.content, v.content) AS content,
    COALESCE(1.0 / (60 + k.keyword_rank), 0) +
    COALESCE(1.0 / (60 + v.vector_rank), 0) AS rrf_score
  FROM keyword_results k
  FULL OUTER JOIN vector_results v ON k.id = v.id
)
SELECT id, content
FROM combined
ORDER BY rrf_score DESC
LIMIT 10;
```

This retrieves candidates from both systems, ranks them separately, and merges the ranked lists.

This is one of the strongest reasons to keep search in Postgres. Your embeddings, documents, metadata filters, joins, keyword search, and application data can live in one query model.

Learn more: [how to build Hybrid Search in Postgres using pg\_textsearch and pgvectorscale](https://www.tigerdata.com/docs/build/examples/hybrid-search), and [why hybrid search outperforms vector-only retrieval](https://www.tigerdata.com/blog/hybrid-search-postgres-you-probably-should).

## What this guide does not decide for you

No article can tell you the right vector index without your data.

Embedding model, dimensionality, filter selectivity, recall target, update rate, hardware, concurrency, and query distribution all change the answer. Even two datasets with the same number of rows can behave differently if their vectors cluster differently or their filters have different selectivity.

The point of this guide is not to replace benchmarking. It is to help you know what to benchmark first. Start with the simplest index that matches the shape of your workload. Measure it against exact search where possible. Tune recall and latency together. Then move to a more specialized index only when the workload gives you a reason.

## Which Postgres vector index should you use?

| Workload pattern | Start with | Why |
| --- | --- | --- |
| Small dataset or still validating the application | Exact search | Simple, accurate and useful as a recall baseline |
| Starting a serious Postgres vector search workload | pgvector with HNSW | Strong speed-recall tradeoff for read-heavy workloads |
| Lighter index or higher write throughput matters | pgvector with IVFFlat | Lower memory and maintenance overhead, with more tuning required |
| Index no longer fits comfortably in memory | pgvectorscale with StreamingDiskANN | Disk-aware vector indexing while staying inside Postgres |
| Retrieval quality is the bottleneck | Hybrid search with vector plus BM25 | Combines semantic similarity with exact keyword matching |

The path usually looks like this: start with exact search while the dataset is small, move to HNSW when latency requires ANN, consider IVFFlat when memory or write cost matters more, evaluate disk-aware indexing when the working set outgrows memory, and add BM25 when retrieval quality needs more than semantic similarity alone.

## Where things stand and where they are going

The practical rule is simple: benchmark the workload you actually run, not the cleanest version of vector search.

Start with exact search while the dataset is small. Move to HNSW when latency requires ANN. Consider IVFFlat when memory or write cost matters more. Evaluate StreamingDiskANN when the working set outgrows memory. Add BM25 when retrieval quality needs more than semantic similarity.

The one gap that remains is what SPFresh points toward: high-update workloads at scale without global index rebuilds. That capability is not yet in Postgres, but it is already showing up in production vector systems outside the Postgres ecosystem.

Whether it eventually appears as an extension, a fork or something nobody has named yet, the pattern is familiar: a hard problem gets real and someone in this community builds the thing.

Want to dig in further? Look at Tiger Data docs for [pgvectorscale](https://github.com/timescale/pgvectorscale) and [pg\_textsearch](https://github.com/timescale/pg_textsearch).