Posted by Raja Rao DV
Dec 23, 2025
Postgres is everywhere. Millions of developers use it and love it. It's the database behind Stripe, Instagram, Spotify, and countless startups.
Search is everywhere too. Every app needs it. Product catalogs. Documentation. User content. Support tickets. And now: AI agents and RAG pipelines that need to find the right documents before generating answers.
So naturally, people try to use Postgres for search. And quickly hit its limits.
The usual next step? Add Elasticsearch. Or Algolia. Or Typesense.
Suddenly you're running a second system, syncing data into it, and debugging why the two disagree.
Yes, the 1% might need this. Elasticsearch for petabyte-scale log aggregation. ClickHouse for real-time analytics. Google and OpenAI build bespoke infrastructure.
But the 99% don't need that complexity. They need better search in the database they already have.
What if Postgres search was just... better?
It can be now. But first, let's look at what's actually wrong with native Postgres search.
BTW, you can try the demo app here: https://pgtextsearchdemo.vercel.app/. It's a playground that compares native search vs. BM25 vs. vector search (using pgvectorscale) vs. hybrid.
The main goal of search is to return the most relevant and useful results for a given query. Sounds simple, right? In practice, it’s anything but.
Depending on the search query and the corpus of documents you’re searching against, a variety of subtle (and not-so-subtle) problems start to show up. Some documents may mention the query too often, others too little. Some may be long and rambling, while others are short and focused. And not every match is equally useful.
Let’s walk through these challenges one by one. To keep things concrete, assume you have the following documents:
📄 Database Connection Pooling Guide
"Database connection pooling improves application performance. A pool maintains reusable connections. Configure pool size based on workload."
📄 PostgreSQL Authentication Setup
"Set up PostgreSQL database authentication methods. Configure pg_hba.conf for password, certificate, and LDAP authentication."
📄 Generic Blog Post
"Database database database. Learn about database. Database is important. Database database database. More database info."
📄 EXPLAIN ANALYZE Quick Tip (15 words)
"Use EXPLAIN ANALYZE to find slow PostgreSQL queries. Shows execution plan and actual timing."
📄 Complete PostgreSQL Query Tuning Guide (80 words)
"This comprehensive PostgreSQL guide covers query tuning. PostgreSQL query performance depends on proper use of EXPLAIN and EXPLAIN ANALYZE.
Run EXPLAIN ANALYZE on slow queries. The EXPLAIN output shows decisions..."
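If you want to follow along in psql, here's a minimal sketch that loads the excerpts above into a docs table (the table name and schema are just for illustration; the full demo corpus has 15 documents):

CREATE TABLE docs (id serial PRIMARY KEY, title text, content text);

INSERT INTO docs (title, content) VALUES
  ('Database Connection Pooling Guide',
   'Database connection pooling improves application performance. A pool maintains reusable connections. Configure pool size based on workload.'),
  ('PostgreSQL Authentication Setup',
   'Set up PostgreSQL database authentication methods. Configure pg_hba.conf for password, certificate, and LDAP authentication.'),
  ('Generic Blog Post',
   'Database database database. Learn about database. Database is important. Database database database. More database info.'),
  ('EXPLAIN ANALYZE Quick Tip',
   'Use EXPLAIN ANALYZE to find slow PostgreSQL queries. Shows execution plan and actual timing.'),
  ('Complete PostgreSQL Query Tuning Guide',
   'This comprehensive PostgreSQL guide covers query tuning. PostgreSQL query performance depends on proper use of EXPLAIN and EXPLAIN ANALYZE. Run EXPLAIN ANALYZE on slow queries. The EXPLAIN output shows decisions...');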
To illustrate the problems, let's run a few searches against these documents: "database", "database authentication", "EXPLAIN ANALYZE", and "database connection pooling".
When you search for “database”, Native Postgres ranks by keyword count. The spam doc with "database" repeated 12 times ranks #1. The actual useful guides rank lower.

When you search for “database authentication”, "database" appears in 10+ docs. "Authentication" appears in 1. Which word actually identifies what you're looking for?
Native Postgres treats them equally. BM25 knows "authentication" is the signal.

When you search for “EXPLAIN ANALYZE”, the 80-word guide mentions it 8 times. The 15-word tip mentions it 2 times. Native ranks the long doc higher.
But the short tip is entirely about EXPLAIN ANALYZE. It's the best result.

When you search for “database connection pooling”, Native uses Boolean AND. Only docs with ALL three terms match. You get 2 results out of 15.
Switch to OR? You get 13 results, but many have identical scores. No way to tell which is actually relevant.
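For reference, this is roughly what the native approach looks like, as a sketch against the docs table from earlier using Postgres's built-in full-text functions. plainto_tsquery joins terms with AND, which is where the all-or-nothing matching comes from; websearch_to_tsquery accepts an explicit OR.

SELECT id, title,
       ts_rank(to_tsvector('english', content),
               plainto_tsquery('english', 'database connection pooling')) AS rank
FROM docs
WHERE to_tsvector('english', content)
      @@ plainto_tsquery('english', 'database connection pooling')
ORDER BY rank DESC
LIMIT 10;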

The good news is that the search industry solved these problems back in the 1990s; the fix just never made it into Postgres. It's called BM25 (Best Matching 25), and it powers Elasticsearch, Solr, Lucene, and virtually every production search system. It fixes exactly the problems above:
Term Frequency Saturation - Mentioning a word 12 times doesn't make a doc 12× more relevant. After a few mentions, additional repetitions barely help. Spam loses.
Inverse Document Frequency (IDF) - Rare terms matter more. "Database" appears everywhere, so it's noise. "Authentication" appears once, so it's a signal. BM25 weights accordingly.
Length Normalization - A focused 15-word tip about your query beats an 80-word doc that mentions it in passing. BM25 adjusts for document length.
Ranked Retrieval - Every doc gets a meaningful relevance score, not just "matches" or "doesn't match". Partial matches still appear, just ranked lower.
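For readers who want the math, the textbook BM25 formula packs all of this into one score. Here f(t, D) is how often term t occurs in document D, |D| is the document's length, avgdl is the average document length in the corpus, and k1 and b are tunables (typically around 1.2 and 0.75):

score(D, Q) = Σ_{t ∈ Q} IDF(t) × f(t, D) × (k1 + 1) / (f(t, D) + k1 × (1 - b + b × |D| / avgdl))

The f(t, D) in the denominator is what saturates term frequency, IDF(t) downweights common words, and the |D| / avgdl ratio handles length normalization.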

This is how Google worked from the beginning. It's table stakes for search. With the pg_textsearch extension, adding it to Postgres takes a few lines of SQL:
CREATE EXTENSION pg_textsearch;

CREATE INDEX ON articles USING bm25(content);

SELECT * FROM articles
ORDER BY content <@> to_bm25query('database performance')
LIMIT 10;

Remember the intro? AI agents and RAG pipelines need search too. And they have a problem that BM25 alone can't solve.
When a user asks “why is my database slow?”, there’s no direct keyword match to “query optimization” or “index tuning”. BM25 finds nothing. The agent fails.
Vector search understands meaning. It knows that “slow database” is related to “performance optimization”. But vectors have the opposite problem: they’re fuzzy. Search for error PG-1234 and vectors return generic error documents, not the one with your exact error code.
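For comparison, a pure vector query with pgvector looks something like this (a sketch assuming an embedding column on the docs table and a pre-computed query embedding passed in as $1):

SELECT id, title
FROM docs
ORDER BY embedding <=> $1  -- cosine distance: semantically closest first
LIMIT 10;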
The fix: use both.
Query: error PG-1234
BM25 finds: Document with the exact code
Vectors find: Generic error documents
Hybrid finds: The exact code document

Query: why is my database slow
BM25 finds: Nothing (no keyword match)
Vectors find: Performance optimization documents
Hybrid finds: Performance documents

Query: fix connection timeout
BM25 finds: Timeout configuration documents
Vectors find: Troubleshooting guides
Hybrid finds: Both, ranked by relevance
This is why every major AI search system uses hybrid search.
LangChain’s EnsembleRetriever combines BM25 and vectors using Reciprocal Rank Fusion.
Cohere Rerank recommends BM25 as a first-stage retriever.
Pinecone added hybrid search that combines sparse and dense vectors.
Postgres can do this too, with pg_textsearch and pgvector:
-- Hybrid search with Reciprocal Rank Fusion (RRF)
-- $1 = query text, $2 = query embedding
WITH bm25 AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query($1)) AS rank
    FROM docs
    ORDER BY content <@> to_bm25query($1)
    LIMIT 20
),
vector AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $2) AS rank
    FROM docs
    ORDER BY embedding <=> $2
    LIMIT 20
)
SELECT id,
       -- 60 is the standard RRF smoothing constant; a doc missing from one list scores 0 there
       COALESCE(1.0 / (60 + bm25.rank), 0) +
       COALESCE(1.0 / (60 + vector.rank), 0) AS score
FROM bm25 FULL JOIN vector USING (id)
ORDER BY score DESC
LIMIT 10;

Keywords + meaning. One database.

We built a demo that runs Native Postgres, BM25, Vector, and Hybrid search side-by-side. Same query, same documents, different results: https://github.com/rajaraodv/pg_textsearch_demo
As mentioned earlier, you can also play around with the hosted demo app here: https://pgtextsearchdemo.vercel.app/.

git clone https://github.com/rajaraodv/pg_textsearch_demo.git
cd pg_textsearch_demo
npm install
# Add DATABASE_URL and OPENAI_API_KEY to .env.local
npm run setup && npm run dev

Or add to your existing Postgres:
CREATE EXTENSION pg_textsearch;
CREATE INDEX ON your_table USING bm25(content);
SELECT * FROM your_table
ORDER BY content <@> to_bm25query('your search')
LIMIT 10;

Postgres is everywhere. Search is everywhere. Now good search is in Postgres, too.
pg_textsearch is fully open source under the PostgreSQL license. Use it anywhere, for anything.
Available now on Tiger Data.