---
title: "Yes, You Can Do Hybrid Search in Postgres (And You Probably Should)"
published: 2026-04-20T14:30:28.000-04:00
updated: 2026-04-20T14:30:28.000-04:00
excerpt: "Most search stacks run four systems to answer one question. You don't need any of them. Build production hybrid search in Postgres with pg_textsearch for BM25, pgvectorscale for vector similarity, and Reciprocal Rank Fusion to combine them. One query. One database."
tags: pg_textsearch, Cloud, Hypertables, Hyperfunctions, pgvector, PostgreSQL, Scaling PostgreSQL, SQL, Thought Leadership, Tiger Cloud, Tiger Data, Timescale Cloud, Timescale Community, Tutorials
authors: Erin Mikail Staples
---

> **TimescaleDB is now Tiger Data.**

Picture this: you're searching through your company's internal knowledge base. You _know_ the answer is in there. You typed in what you thought were the right keywords. And yet, the top result is somebody's 2019 onboarding doc about the coffee machine.

Sound familiar?

I've watched this scene play out in every company I've worked at, across every tool that claimed to have "search." The problem isn't that search itself is broken. It's that we've been asking it to do two completely different jobs with one rigid toolset, and calling it solved.

Here's what most people miss: **search doesn't have a modality problem. It has an architecture problem.** And the fix isn't a bigger stack. It's a smarter query, running in the database you already trust.

## The two halves of "finding stuff"

We use "search" as though it's one thing. It isn't.

When someone types a query into a search bar, they're usually doing one of two things:

1.  **Looking for a specific word or phrase they remember.** "Find me that doc with 'SOC 2 audit' in it." This is **keyword search**. It rewards precision and specificity.
2.  **Trying to describe a concept, not quote it.** "Find me docs about how we handle customer data during compliance reviews." This is **semantic search**. It rewards understanding.

Traditional full-text search, powered by algorithms like **BM25**, is excellent at the first job. It's stemmed, ranked, battle-tested, and has been the backbone of search engines for decades. It finds "imposter syndrome" when you type "imposter."

**Vector search**, the newer kid on the block, is excellent at the second job. It embeds your text into a high-dimensional space where _meaning_ lives closer together than words do. It finds "imposter syndrome" when you type "feeling like a fraud at work," even though literally none of the words overlap.
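To make "closer together" concrete: similarity between embeddings is usually measured with cosine similarity. Here's a toy sketch with made-up 3-dimensional vectors standing in for real embeddings (which have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means pointing the same direction, ~0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the first two phrases mean similar things,
# the third doesn't, even though none of the words overlap.
imposter_syndrome = [0.9, 0.8, 0.1]
feeling_fraud     = [0.8, 0.9, 0.2]
episode_100       = [0.1, 0.0, 0.9]

print(cosine_similarity(imposter_syndrome, feeling_fraud))  # high
print(cosine_similarity(imposter_syndrome, episode_100))    # low
```

The phrases share no words, but the first two vectors point in nearly the same direction, and that's all the similarity metric cares about.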

Now here's the punchline: **neither of these systems is wrong, and neither of them is enough.**

If you only use BM25, you miss every user who doesn't know the exact vocabulary your docs use. If you only use vector search, you miss the person who typed a part number, a legal citation, or "episode 100." Vector search has no particular opinion about the number 100.

The answer is not to pick a side. The answer is to run both and combine the results. That's hybrid search, and it turns out you can do all of it in one database, in one query, against data you're already storing there.

## Why this matters now

Two years ago, hybrid search in Postgres was a demo. Now it's a production architecture. Here's what changed.

**BM25 actually runs fast in Postgres now.** `pg_textsearch` 1.0 implements a full BM25 search engine in C, on top of Postgres's storage layer. Not a wrapper around an external library. Not ts\_rank with extra steps. Block-Max WAND optimization, native. For a lot of workloads, [you don't need Elasticsearch anymore](https://www.tigerdata.com/blog/you-dont-need-elasticsearch-bm25-is-now-in-postgres).

**Vector search in Postgres stopped being a party trick.** `pgvectorscale`'s DiskANN-inspired index is [as fast as specialized vector databases at 75% less cost](https://www.tigerdata.com/blog/pgvector-is-now-as-fast-as-pinecone-at-75-less-cost). The index lives on disk, so you're not RAM-capped. Filtered search actually works under production RAG conditions, which is historically where specialized vector databases fall apart.

**Every team I talk to is consolidating.** LLM apps don't need five databases. They need one system that can store source content, embeddings, chat history, user context, and audit trails, all in the same transactional boundary. The teams winning on infrastructure right now are the ones who realized [Postgres is that system](https://www.tigerdata.com/blog/postgres-for-agents) and stopped building around it. (My colleagues have been saying this loudly for a while: [it's 2026, just use Postgres](https://www.tigerdata.com/blog/its-2026-just-use-postgres).)

That third one is the one I keep coming back to. Five databases to answer one question is an architecture that made sense once and then quietly stopped making sense. Most teams are a quarter or two behind on noticing.

Put those three together and hybrid search in Postgres went from "neat prototype" to a real production architecture. That's the shift this post is about.

## Why Postgres is where this belongs

I've built the multi-system search stack. Elastic for BM25, Pinecone for vectors, a custom merge service, Redis for caching, and the ritual of explaining the whole thing to every new hire who asks "but why can't we just query the database?"

I've also cussed out that architecture at 2am when the sync job fell behind and results went stale and nobody could figure out which system was wrong.

Every piece of that stack is something you provision, secure, back up, upgrade, monitor, and carry on-call. Every hop between systems is a place where results drift, eventual consistency bites you, and your latency budget gets eaten. Every six months, one of those systems has a breaking change, an incompatible update, or a pricing restructure, and the whole Jenga tower needs attention.

The pragmatic answer: **your data is almost certainly already in Postgres.** Your users, your content, your metadata, your access controls, your audit logs. So why ship all of it to three other systems just to search it?

You don't have to. Postgres now has what you need:

-   [**pg\_textsearch**](https://www.tigerdata.com/blog/introducing-pg_textsearch-true-bm25-ranking-hybrid-retrieval-postgres) for BM25 keyword search — proper ranked retrieval, not the old ts\_rank approximation
-   [**pgvectorscale**](https://www.tigerdata.com/blog/how-we-made-postgresql-the-best-vector-database) for vector similarity search with a [StreamingDiskANN index](https://www.tigerdata.com/blog/understanding-diskann) that doesn't require your entire index to fit in RAM
-   **Everything else you already know**: joins, filters, transactions, row-level security, time-based partitioning, and the SQL your whole team can read

Put those together and you get a database that can rank by keyword, rank by meaning, filter by any column in your schema, respect your access rules, and return a single fused result set. In one query. Against one system.

That's not a compromise. That's a better architecture.
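As a sketch of what that looks like in DDL — the table and column names are illustrative, and the exact index syntax for each extension is an assumption here, so check the pg_textsearch and pgvectorscale docs for the real options:

```SQL
-- One table holds the text and its embedding side by side. No sync jobs.
CREATE TABLE episodes (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  title       text NOT NULL,
  description text NOT NULL,
  embedding   vector(1536)  -- pgvector type, populated by your embedding model
);

-- BM25 index from pg_textsearch (syntax illustrative; see the extension docs).
CREATE INDEX episodes_desc_bm25 ON episodes USING bm25 (description);

-- StreamingDiskANN index from pgvectorscale.
CREATE INDEX episodes_embedding_idx ON episodes USING diskann (embedding);
```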

## How hybrid search actually works (the short version)

The full step-by-step is in the [Tiger Data docs](https://www.tigerdata.com/docs/build/examples/hybrid-search), including a walkthrough using podcast transcripts from [Conduit](https://www.relay.fm/conduit): 12 episodes, real embeddings, about 30 minutes. Swap "podcast episodes" for "support tickets" and the pattern is identical.

Strip it down to the bones: one table, two indexes, two queries, a little math, one answer.

1.  **Store your content and its embedding side by side.** A single episodes table with a description column _and_ a [vector(1536)](https://www.tigerdata.com/learn/postgresql-extensions-pgvector) column. No sync jobs. No reconciliation.
2.  **Index both.** A BM25 index on the text column. A StreamingDiskANN index on the embedding column.
3.  **Run both searches.** BM25 returns a ranked list of keyword matches. Vector search returns a ranked list of semantic matches. Each produces, say, the top 20 candidates.
4.  **Fuse the rankings with RRF.** Reciprocal Rank Fusion scores each result as 1 / (k + rank) in each list, with k conventionally set to 60, then sums the scores. An item that's #1 in both lists crushes an item that's only in one. An item that's #15 in one list and missing from the other still gets a say. (The math here is simpler than it sounds; you're basically rewarding documents that placed well in both races.)
5.  **Return the fused list.** One query. One result set. Ranked by a score that rewards showing up in both.

The SQL for step 4:

```SQL
WITH bm25_results AS (
  -- Top 20 keyword matches; rank 1 is the best BM25 score
  SELECT id, ROW_NUMBER() OVER (
    ORDER BY description <@> 'mental health boundaries'
  ) AS rank
  FROM episodes
  ORDER BY description <@> 'mental health boundaries'
  LIMIT 20
),
vector_results AS (
  -- Top 20 semantic matches; rank 1 is the nearest embedding
  SELECT id, ROW_NUMBER() OVER (
    ORDER BY embedding <=> $1
  ) AS rank
  FROM episodes
  ORDER BY embedding <=> $1
  LIMIT 20
)
SELECT
  d.id, d.title,
  -- RRF with k = 60; COALESCE turns "missing from a list" into a 0 contribution
  COALESCE(1.0 / (60 + b.rank), 0)
    + COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
FROM episodes d
LEFT JOIN bm25_results b ON d.id = b.id
LEFT JOIN vector_results v ON d.id = v.id
WHERE b.id IS NOT NULL OR v.id IS NOT NULL  -- keep rows that appear in at least one list
ORDER BY rrf_score DESC
LIMIT 10;
```
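If the SQL is hard to eyeball, here's the same fusion math as a tiny Python sketch, with made-up ranked ID lists standing in for real query results:

```python
def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: sum 1/(k + rank) across every ranked list."""
    scores = {}
    for ranked_ids in ranked_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

bm25_ids   = ["ep3", "ep7", "ep12"]  # pretend keyword ranking
vector_ids = ["ep3", "ep9", "ep7"]   # pretend semantic ranking

fused = rrf_fuse([bm25_ids, vector_ids])
print(fused[0])  # ep3 wins: it placed first in both lists
```

Same math as the COALESCE lines in the SQL: an item missing from a list simply contributes nothing for that list, while an item that shows up in both gets two boosts.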

That's hybrid search. No sidecar service, no merge layer, no second database, no "enterprise search platform" subscription. Your finance team can thank you later.

## Who should be building this right now

The teams I'd push hardest toward this pattern are the ones whose search is quietly embarrassing them without anyone saying it out loud.

-   **Internal knowledge and support teams.** Your agents need to find "ticket #48291" (a keyword query) _and_ "something similar to this customer issue" (a semantic query) in the same tool. Building two separate search surfaces for the same data is now an avoidable problem, and you're paying for that complexity every sprint.
-   **RAG pipelines.** If your retrieval layer is only doing vector search, you're leaving quality on the table. [Hybrid retrieval consistently outperforms pure semantic retrieval](https://www.tigerdata.com/blog/building-a-rag-system-with-claude-postgresql-python-on-aws), especially for queries that include specific identifiers, product names, or technical terms the embedding model treats as just more tokens. Worth fixing before you spend another month tuning your prompts.
-   **Product catalogs and discovery.** A user searching "red running shoes under $80 that won't fall apart" is giving you structured filters, keyword cues, _and_ a vibe. BM25 handles the keyword cues, vectors handle the vibe, and a SQL WHERE clause handles the filter. All three, one query.
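For that last case, the "one query" claim is worth seeing, because the relational filter is just a WHERE clause sitting next to the similarity ranking. A sketch, with hypothetical column names and `$1` standing in for the query embedding:

```SQL
-- Structured filters and semantic ranking in a single statement.
SELECT id, name, price
FROM products
WHERE category = 'running-shoes'  -- ordinary relational filter
  AND price < 80
ORDER BY embedding <=> $1         -- cosine distance to the query embedding
LIMIT 20;
```

For full hybrid ranking, the same WHERE clause drops into both CTEs of the RRF query above, so the filter applies before fusion, not after.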

## A few honest caveats

`pg_textsearch` 1.0 doesn't support native phrase queries... yet. There's an over-fetch-plus-ILIKE workaround in the tutorial, and AND/OR/NOT operators are on the roadmap. BM25 indexes are single-column by default; you can use a generated column to search across title + body. And no, a 100-million-row table isn't going to be free to embed. Cost and batching are still real things you have to think through.
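That generated-column workaround is plain Postgres. A sketch, assuming `title` and `description` columns (and the BM25 index syntax here is an assumption — check the pg_textsearch docs):

```SQL
-- Combine title and description into one indexable column.
ALTER TABLE episodes
  ADD COLUMN search_text text
  GENERATED ALWAYS AS (title || ' ' || description) STORED;

-- Then build the BM25 index on the generated column instead.
CREATE INDEX episodes_search_bm25 ON episodes USING bm25 (search_text);
```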

But those are tuning problems, not architectural ones. A tuning problem inside one database is a much better problem than an integration problem across four. (If you'd rather not hand-roll the embedding pipeline, [pgai Vectorizer](https://www.tigerdata.com/blog/pgai-vectorizer-now-works-with-any-postgres-database) automates most of that cost-and-batching work.)

## Try it yourself

The step-by-step tutorial walks through the whole thing: [Build hybrid search with BM25 and vector similarity](https://www.tigerdata.com/docs/build/examples/hybrid-search). Real podcast transcripts, about 30 minutes end to end, works on [Tiger Cloud](https://console.cloud.timescale.com), Docker, or a local Postgres install.

If you want to go straight to code, [the companion repo is here](https://github.com/timescale/cookbook-search/tree/main/Hybrid-search). Clone it, point it at a Postgres instance with `pg_textsearch` and `pgvectorscale` installed, and you have a working hybrid search system one `psql -f setup.sql` away. And if you want a deeper implementation walkthrough — including pgai auto-sync so embeddings stay current without a separate pipeline — [this post covers that pattern end to end](https://www.tigerdata.com/blog/elasticsearchs-hybrid-search-now-in-postgres-bm25-vector-rrf).

If you want the deeper "why," these are the posts I'd read next:

-   [From ts\_rank to BM25: introducing pg\_textsearch](https://www.tigerdata.com/blog/introducing-pg_textsearch-true-bm25-ranking-hybrid-retrieval-postgres)
-   [pg\_textsearch 1.0: how we built a BM25 search engine on Postgres](https://www.tigerdata.com/blog/pg-textsearch-bm25-full-text-search-postgres)
-   [Hybrid search with TimescaleDB: vector, keyword, and temporal filtering](https://www.tigerdata.com/blog/hybrid-search-timescaledb-vector-keyword-temporal-filtering)
-   [Combining semantic search and full-text search in Postgres with Cohere, pgvector, and pgai](https://www.tigerdata.com/blog/combining-semantic-search-and-full-text-search-in-postgresql-with-cohere-pgvector-and-pgai)
-   [Understanding DiskANN](https://www.tigerdata.com/blog/understanding-diskann)

## The takeaway

Search is hard. It's hard because humans are imprecise, language is slippery, and the things we want are rarely the things we type. No computer is reading our minds yet (and honestly, I'm not sure I want one to). One retrieval method will never be enough.

The solution isn't a bigger stack. **It's a smarter query.**

Postgres, with `pg_textsearch` and `pgvectorscale`, lets you run keyword search and semantic search against the same table, fuse them with a few lines of SQL, and ship one system instead of four. That's not just operationally cheaper. It's a better developer experience, a better user experience, and a better night of sleep for whoever's on call.

If your search is frustrating your users (or worse, quietly frustrating them without anyone saying anything), this is the version I'd build next.

Have you built hybrid search on Postgres? What interesting queries are you running? I want to hear about it.

Find me at [erin@tigerdata.com](mailto:erin@tigerdata.com), or drop a note on [LinkedIn](https://www.linkedin.com/in/erinmikail/) or [Bluesky](https://bsky.app/profile/erinmikail.bsky.social).