---
title: "You Don’t Need Elasticsearch: BM25 is Now in Postgres"
published: 2025-12-23T16:24:39.000-05:00
updated: 2026-02-25T10:39:44.000-05:00
excerpt: "You don't need Elasticsearch: BM25 is now in Postgres with pg_textsearch. Get better search rankings with term frequency, IDF, and length normalization."
tags: Announcements & Releases, pg_textsearch
authors: Raja Rao DV
---

> **TimescaleDB is now Tiger Data.**

Postgres is everywhere. Millions of developers use it and love it. It's the database behind Stripe, Instagram, Spotify, and countless startups.

Search is everywhere too. Every app needs it. Product catalogs. Documentation. User content. Support tickets. And now: AI agents and RAG pipelines that need to find the right documents before generating answers.

So naturally, people try to use Postgres for search. And quickly hit its limits.

The usual next step? Add Elasticsearch. Or Algolia. Or Typesense.

Suddenly you're:

-   Spinning up another cluster (and keeping it running 24/7)
-   Building data sync pipelines between Postgres and your search system
-   Debugging why search results are stale or missing
-   Adding another system to your on-call rotation
-   Paying $1000s/month for a managed service, or hiring someone who knows how to run it

Yes, the 1% might need this. Elasticsearch for petabyte-scale log aggregation. Clickhouse for real-time analytics. Google and OpenAI build bespoke infrastructure.

But the 99% don't need that complexity. They need better search in the database they already have.

What if Postgres search was just... better?

It can be now. But first, let's look at what's actually wrong with native Postgres search.

BTW, you can try the demo app here: [https://pgtextsearchdemo.vercel.app/](https://pgtextsearchdemo.vercel.app/). This app is like a playground that compares native search vs BM-25 vs vector search (using pgVectorScale) vs hybrid.

* * *

## So What Are The Problems With Postgres Native Search?

The main goal of search is to return the most relevant and useful results for a given query. Sounds simple, right? In practice, it’s anything but.

Depending on the search query and the corpus of documents you’re searching against, a variety of subtle (and not-so-subtle) problems start to show up. Some documents may mention the query too often, others too little. Some may be long and rambling, while others are short and focused. And not every match is equally useful.

Let’s walk through these challenges one by one. To keep things concrete, assume you have the following documents:

```
📄 Database Connection Pooling Guide
   "Database connection pooling improves application performance. A pool maintains reusable connections. Configure pool size based on workload."

📄 PostgreSQL Authentication Setup  
   "Set up PostgreSQL database authentication methods. Configure pg_hba.conf for password, certificate, and LDAP authentication."

📄 Generic Blog Post
   "Database database database. Learn about database. Database is important. Database database database. More database info."

📄 EXPLAIN ANALYZE Quick Tip (15 words)
   "Use EXPLAIN ANALYZE to find slow PostgreSQL queries. Shows execution plan and actual timing."

📄 Complete PostgreSQL Query Tuning Guide (80 words)
   "This comprehensive PostgreSQL guide covers query tuning. PostgreSQL query performance depends on proper use of EXPLAIN and EXPLAIN ANALYZE. 
   Run EXPLAIN ANALYZE on slow queries. The EXPLAIN output shows decisions..."
```

Let’s say we started to search for different things like “database”, “database authentication” and so on, so that we can illustrate the problem.

### Problem 1: Keyword Stuffing Wins

When you search for “**`database`**”, Native Postgres ranks by keyword count. The spam doc with "database" repeated 12 times ranks #1. The actual useful guides rank lower.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/problem1-new.png)

### Problem 2: Common Words Dominate

When you search for “**`database authentication`**”**,** "database" appears in 10+ docs. "Authentication" appears in 1. Which word actually identifies what you're looking for?

Native Postgres treats them equally. BM25 knows "authentication" is the signal.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/problem2-new.png)

### Problem 3: Long Documents Win

When you search for “**`EXPLAIN ANALYZE`**”, the 80-word guide mentions it 8 times. The 15-word tip mentions it 2 times. Native ranks the long doc higher.

But the short tip is _entirely_ about `EXPLAIN ANALYZE`. It's the best result.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/problem3-new.png)

### Problem 4: All-or-Nothing Matching

When you search for “**`database connection pooling`**”, Native uses Boolean AND. Only docs with ALL three terms match. You get 2 results out of 15.

Switch to `OR`? You get 13 results, but many have identical scores. No way to tell which is actually relevant.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/problem4-new.png)

* * *

## So What’s The Solution? 

The good news is that the search industry actually solved it in the 90s, but it just wasn’t added to Postgres. It’s called [BM25](https://en.wikipedia.org/wiki/Okapi_BM25). BM25 (Best Matching 25) powers Elasticsearch, Solr, Lucene, and virtually every production search system. It fixes exactly the problems above:

**Term Frequency Saturation** - Mentioning a word 12 times doesn't make a doc 12× more relevant. After a few mentions, additional repetitions barely help. Spam loses.

**Inverse Document Frequency (IDF)** - Rare terms matter more. "Database" appears everywhere, so it's noise. "Authentication" appears once, so it's a signal. BM25 weights accordingly.

**Length Normalization** - A focused 15-word tip about your query beats an 80-word doc that mentions it in passing. BM25 adjusts for document length.

**Ranked Retrieval** - Every doc gets a meaningful relevance score, not just "matches" or "doesn't match". Partial matches still appear, just ranked lower.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/bm25-venn-diagram.png)

This is how Google worked from the beginning. It's table stakes for search.

### Now It's in Postgres. [pg\_textsearch](https://github.com/timescale/pg_textsearch) brings BM25 to Postgres:

```sql
CREATE EXTENSION pg_textsearch;
CREATE INDEX ON articles USING bm25(content);

SELECT * FROM articles 
ORDER BY content <@> to_bm25query('database performance')
LIMIT 10;
```

* * *

## But What About AI Agents?

Remember the intro? AI agents and RAG pipelines need search too. And they have a problem that BM25 alone can’t solve.

When a user asks “why is my database slow?”, there’s no direct keyword match to “query optimization” or “index tuning”. BM25 finds nothing. The agent fails.

Vector search understands meaning. It knows that “slow database” is related to “performance optimization”. But vectors have the opposite problem: they’re fuzzy. Search for error PG-1234 and vectors return generic error documents, not the one with your exact error code.

The fix: use both.

Query: `error PG-1234`  
BM25 finds: Document with the exact code  
Vectors find: Generic error documents  
Hybrid finds: The exact code document

Query: `why is my database slow`  
BM25 finds: Nothing (no keyword match)  
Vectors find: Performance optimization documents  
Hybrid finds: Performance documents

Query: `fix connection timeout`  
BM25 finds: Timeout configuration documents  
Vectors find: Troubleshooting guides  
Hybrid finds: Both, ranked by relevance

This is why every major AI search system uses hybrid search.

LangChain’s [EnsembleRetriever](https://python.langchain.com/docs/how_to/ensemble_retriever/) combines BM25 and vectors using Reciprocal Rank Fusion.

[Cohere Rerank](https://docs.cohere.com/docs/reranking-best-practices) recommends BM25 as a first-stage retriever.

Pinecone added [hybrid search](https://docs.pinecone.io/guides/data/understanding-hybrid-search) that combines sparse and dense vectors.

Postgres can do this too. With [pgvector](https://github.com/pgvector/pgvector):

```sql
-- Hybrid search with Reciprocal Rank Fusion
WITH bm25 AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query($1)) as rank
  FROM docs LIMIT 20
),
vector AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $2) as rank  
  FROM docs LIMIT 20
)
SELECT id, 1.0/(60+bm25.rank) + 1.0/(60+vector.rank) as score
FROM bm25 FULL JOIN vector USING (id)
ORDER BY score DESC LIMIT 10;
```

Keywords + meaning. One database.

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/hybrid-search.png)

* * *

## Try It

We built a demo that runs Native Postgres, BM25, Vector, and Hybrid search side-by-side. Same query, same documents, different results: [https://github.com/rajaraodv/pg\_textsearch\_demo](https://github.com/rajaraodv/pg_textsearch_demo)  
  
As mentioned earlier, you can play around the hosted demo app [here](https://pgtextsearchdemo.vercel.app/).

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2025/12/app-image.png)

```shell
git clone https://github.com/rajaraodv/pg_textsearch_demo.git
cd pg_textsearch_demo
npm install
# Add DATABASE_URL and OPENAI_API_KEY to .env.local
npm run setup && npm run dev
```

Or add to your existing Postgres:

```sql
CREATE EXTENSION pg_textsearch;
CREATE INDEX ON your_table USING bm25(content);
SELECT * FROM your_table 
ORDER BY content <@> to_bm25query('your search')
LIMIT 10;
```

* * *

## The Point

Postgres is everywhere. Search is everywhere. Now a good search is in Postgres.

[pg\_textsearch](https://github.com/timescale/pg_textsearch) is fully open source under the [PostgreSQL license](https://opensource.org/licenses/PostgreSQL). Use it anywhere, for anything.

Available now on [Tiger Data](https://console.cloud.timescale.com/).

* * *

## Learn More

-   [pg\_textsearch GitHub](https://github.com/timescale/pg_textsearch)
-   [Documentation](https://docs.timescale.com/use-timescale/latest/extensions/pg-textsearch/)
-   [BM25 Algorithm (Wikipedia)](https://en.wikipedia.org/wiki/Okapi_BM25)
-   [https://pgtextsearchdemo.vercel.app/](https://pgtextsearchdemo.vercel.app/)
-   [https://github.com/rajaraodv/pg\_textsearch\_demo](https://github.com/rajaraodv/pg_textsearch_demo)