Posted by Raja Rao DV
Dec 23, 2025
Postgres is everywhere. Millions of developers use it and love it. It's the database behind Stripe, Instagram, Spotify, and countless startups.
Search is everywhere too. Every app needs it. Product catalogs. Documentation. User content. Support tickets. And now: AI agents and RAG pipelines that need to find the right documents before generating answers.
So naturally, people try to use Postgres for search. And quickly hit its limits.
The usual next step? Add Elasticsearch. Or Algolia. Or Typesense.
Suddenly you're running a second system, syncing data into it, and debugging why the two disagree.
Yes, the 1% might need this. Elasticsearch for petabyte-scale log aggregation. ClickHouse for real-time analytics. Google and OpenAI build bespoke infrastructure.
But the 99% don't need that complexity. They need better search in the database they already have.
What if Postgres search was just... better?
It can be now. But first, let's look at what's actually wrong with native Postgres search.
BTW, you can try the demo app here: https://pgtextsearchdemo.vercel.app/. It's a playground that compares native search vs. BM25 vs. vector search (using pgvectorscale) vs. hybrid.
The main goal of search is to return the most relevant and useful results for a given query. Sounds simple, right? In practice, it’s anything but.
Depending on the search query and the corpus of documents you’re searching against, a variety of subtle (and not-so-subtle) problems start to show up. Some documents may mention the query too often, others too little. Some may be long and rambling, while others are short and focused. And not every match is equally useful.
Let’s walk through these challenges one by one. To keep things concrete, assume you have the following documents:
📄 Database Connection Pooling Guide
"Database connection pooling improves application performance. A pool maintains reusable connections. Configure pool size based on workload."
📄 PostgreSQL Authentication Setup
"Set up PostgreSQL database authentication methods. Configure pg_hba.conf for password, certificate, and LDAP authentication."
📄 Generic Blog Post
"Database database database. Learn about database. Database is important. Database database database. More database info."
📄 EXPLAIN ANALYZE Quick Tip (15 words)
"Use EXPLAIN ANALYZE to find slow PostgreSQL queries. Shows execution plan and actual timing."
📄 Complete PostgreSQL Query Tuning Guide (80 words)
"This comprehensive PostgreSQL guide covers query tuning. PostgreSQL query performance depends on proper use of EXPLAIN and EXPLAIN ANALYZE.
Run EXPLAIN ANALYZE on slow queries. The EXPLAIN output shows decisions..."
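If you want to follow along in psql, here's a minimal sketch that loads the excerpts above into a docs table (the table name and schema are just for illustration; the full demo corpus has 15 documents):

CREATE TABLE docs (id serial PRIMARY KEY, title text, content text);

INSERT INTO docs (title, content) VALUES
  ('Database Connection Pooling Guide',
   'Database connection pooling improves application performance. A pool maintains reusable connections. Configure pool size based on workload.'),
  ('PostgreSQL Authentication Setup',
   'Set up PostgreSQL database authentication methods. Configure pg_hba.conf for password, certificate, and LDAP authentication.'),
  ('Generic Blog Post',
   'Database database database. Learn about database. Database is important. Database database database. More database info.'),
  ('EXPLAIN ANALYZE Quick Tip',
   'Use EXPLAIN ANALYZE to find slow PostgreSQL queries. Shows execution plan and actual timing.'),
  ('Complete PostgreSQL Query Tuning Guide',
   'This comprehensive PostgreSQL guide covers query tuning. PostgreSQL query performance depends on proper use of EXPLAIN and EXPLAIN ANALYZE. Run EXPLAIN ANALYZE on slow queries. The EXPLAIN output shows decisions...');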
To illustrate the problems, let's run a few searches against these documents: "database", "database authentication", "EXPLAIN ANALYZE", and "database connection pooling".
When you search for “database”, Native Postgres ranks by keyword count. The spam doc with "database" repeated 12 times ranks #1. The actual useful guides rank lower.

When you search for “database authentication”, "database" appears in 10+ docs. "Authentication" appears in 1. Which word actually identifies what you're looking for?
Native Postgres treats them equally. BM25 knows "authentication" is the signal.

When you search for “EXPLAIN ANALYZE”, the 80-word guide mentions it 8 times. The 15-word tip mentions it 2 times. Native ranks the long doc higher.
But the short tip is entirely about EXPLAIN ANALYZE. It's the best result.

When you search for “database connection pooling”, Native uses Boolean AND. Only docs with ALL three terms match. You get 2 results out of 15.
Switch to OR? You get 13 results, but many have identical scores. No way to tell which is actually relevant.
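For reference, this is roughly what the native approach looks like, as a sketch against the docs table from earlier using Postgres's built-in full-text functions. plainto_tsquery joins terms with AND, which is where the all-or-nothing matching comes from; websearch_to_tsquery accepts an explicit OR.

SELECT id, title,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', 'database connection pooling')) AS rank
FROM docs
WHERE to_tsvector('english', content)
      @@ plainto_tsquery('english', 'database connection pooling')
ORDER BY rank DESC
LIMIT 10;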

The good news is that the search industry solved these problems back in the 1990s; the fix just never made it into Postgres. It's called BM25 (Best Matching 25), and it powers Elasticsearch, Solr, Lucene, and virtually every production search system. It fixes exactly the problems above:
Term Frequency Saturation - Mentioning a word 12 times doesn't make a doc 12× more relevant. After a few mentions, additional repetitions barely help. Spam loses.
Inverse Document Frequency (IDF) - Rare terms matter more. "Database" appears everywhere, so it's noise. "Authentication" appears once, so it's a signal. BM25 weights accordingly.
Length Normalization - A focused 15-word tip about your query beats an 80-word doc that mentions it in passing. BM25 adjusts for document length.
Ranked Retrieval - Every doc gets a meaningful relevance score, not just "matches" or "doesn't match". Partial matches still appear, just ranked lower.
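For readers who want the math, the textbook BM25 formula packs all of this into one score. Here f(t, D) is how often term t occurs in document D, |D| is the document's length, avgdl is the average document length in the corpus, and k1 and b are tunables (typically around 1.2 and 0.75):

score(D, Q) = Σ_{t ∈ Q} IDF(t) × f(t, D) × (k1 + 1) / (f(t, D) + k1 × (1 - b + b × |D| / avgdl))

The f(t, D) in the denominator is what saturates term frequency, IDF(t) downweights common words, and the |D| / avgdl ratio handles length normalization.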

This is how Google worked from the beginning. It's table stakes for search. With the pg_textsearch extension, adding it to Postgres takes a few lines of SQL:
CREATE EXTENSION pg_textsearch;

CREATE INDEX ON articles USING bm25(content);

SELECT * FROM articles
ORDER BY content <@> to_bm25query('database performance')
LIMIT 10;

Remember the intro? AI agents and RAG pipelines need search too. And they have a problem that BM25 alone can't solve.
When a user asks “why is my database slow?”, there’s no direct keyword match to “query optimization” or “index tuning”. BM25 finds nothing. The agent fails.
Vector search understands meaning. It knows that “slow database” is related to “performance optimization”. But vectors have the opposite problem: they’re fuzzy. Search for error PG-1234 and vectors return generic error documents, not the one with your exact error code.
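For comparison, a pure vector query with pgvector looks something like this (a sketch assuming an embedding column on the docs table and a pre-computed query embedding passed in as $1):

SELECT id, title
FROM docs
ORDER BY embedding <=> $1  -- cosine distance: semantically closest first
LIMIT 10;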
The fix: use both.
Query: error PG-1234
BM25 finds: Document with the exact code
Vectors find: Generic error documents
Hybrid finds: The exact code document

Query: why is my database slow
BM25 finds: Nothing (no keyword match)
Vectors find: Performance optimization documents
Hybrid finds: Performance documents

Query: fix connection timeout
BM25 finds: Timeout configuration documents
Vectors find: Troubleshooting guides
Hybrid finds: Both, ranked by relevance
This is why every major AI search system uses hybrid search.
LangChain’s EnsembleRetriever combines BM25 and vectors using Reciprocal Rank Fusion.
Cohere Rerank recommends BM25 as a first-stage retriever.
Pinecone added hybrid search that combines sparse and dense vectors.
Postgres can do this too, with pg_textsearch and pgvector:
-- Hybrid search with Reciprocal Rank Fusion (RRF)
-- $1 = query text, $2 = query embedding
WITH bm25 AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query($1)) AS rank
    FROM docs
    ORDER BY content <@> to_bm25query($1)
    LIMIT 20
),
vector AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $2) AS rank
    FROM docs
    ORDER BY embedding <=> $2
    LIMIT 20
)
SELECT id,
       -- 60 is the standard RRF smoothing constant; a doc missing from one list scores 0 there
       COALESCE(1.0 / (60 + bm25.rank), 0) +
       COALESCE(1.0 / (60 + vector.rank), 0) AS score
FROM bm25 FULL JOIN vector USING (id)
ORDER BY score DESC
LIMIT 10;

Keywords + meaning. One database.

We built a demo that runs Native Postgres, BM25, Vector, and Hybrid search side-by-side. Same query, same documents, different results: https://github.com/rajaraodv/pg_textsearch_demo
As mentioned earlier, you can also play around with the hosted demo app here: https://pgtextsearchdemo.vercel.app/.

git clone https://github.com/rajaraodv/pg_textsearch_demo.git
cd pg_textsearch_demo
npm install
# Add DATABASE_URL and OPENAI_API_KEY to .env.local
npm run setup && npm run dev

Or add to your existing Postgres:
CREATE EXTENSION pg_textsearch;
CREATE INDEX ON your_table USING bm25(content);
SELECT * FROM your_table
ORDER BY content <@> to_bm25query('your search')
LIMIT 10;

Postgres is everywhere. Search is everywhere. Now good search is in Postgres, too.
pg_textsearch is fully open source under the PostgreSQL license. Use it anywhere, for anything.
Available now on Tiger Data.