
14 min read
Feb 05, 2026
Table of contents
01 When All Search Methods Agree on the Wrong Answer
02 Why Does Vector Search Miss Exact Technical Terms?
03 Why Does Text Search Miss Synonyms and Context?
04 How Does Hybrid Search Combine Semantic and Keyword Results?
05 How Does Time-Windowed Search Prevent Stale Results?
06 How Do You Design a Schema for Hybrid Search?
07 How Does Reciprocal Rank Fusion Combine Search Results?
08 Running the Demo
09 How Do Vector, Text, and Hybrid Search Compare?
10 Production Tuning
11 Choosing Your Search Strategy

Vector databases revolutionized semantic search, but they solve only one-third of the retrieval problem. When developers search documentation for “authentication setup,” vector embeddings excel at understanding intent. Yet two critical gaps remain: keyword precision and temporal relevance.
For RAG applications, retrieval quality directly determines generation quality: feed an LLM deprecated documentation, and it will confidently generate outdated answers.
Consider a developer searching for “configure OAuth authentication with environment variables.” Vector search returns a general security guide. Text search returns a three-year-old changelog. Hybrid search surfaces an OAuth guide from 2021, before breaking changes in 2023.
Three methods, three wrong answers. Temporal filtering completes the solution.
This tutorial explores vector similarity, keyword matching, and recency scoring using PostgreSQL full-text search, pgvectorscale, and TimescaleDB temporal partitioning: a unified stack that eliminates the complexity of maintaining separate databases.
Developers expect hybrid search to outperform pure vector or text search, but that’s not always the case. The following demonstration uses a 150-document database emulating documentation for NovaCLI, a fictional CLI tool, with intentionally engineered failure scenarios. You can run these queries yourself; setup instructions follow below.
Query: “How to enable logging in NovaCLI”
We ran four different search methods on the same query and the same 150 documents. Check out the top-ranked document each one returned.

Vector search fails. Returns “Configuring Application Logging in NovaCLI” (v1.0, January 2023, deprecated). The embedding captured semantic similarity to “logging” and “NovaCLI,” but the v1.0 document was comprehensive and well-structured when written. Its embeddings remain strong.

Text search fails. Returns the same deprecated v1.0 document with a perfect keyword match score. The title contains every query term.

Hybrid search fails. RRF combines both rankings. When vector and text search agree, RRF amplifies their consensus. The wrong answer wins with higher confidence.

Hybrid + temporal succeeds. Returns “Structured Logging Configuration in NovaCLI v3.1” (October 2025). Temporal filtering restricted the search to documents published within the last 12 months. The deprecated v1.0 document never entered the ranking.
The v1.0 document wasn’t maliciously designed. It was genuinely comprehensive when published. The v3.1 document introduces JSON-structured logging with different terminology. From a pure relevance perspective, the old document arguably matches the query better.
That's precisely the problem. Relevance without recency may return wrong results. Deprecated APIs, breaking changes, and superseded workflows make old documentation problematic. In RAG pipelines, this stale context propagates to the generated response, producing hallucinations grounded in superseded information.
The logging query above exposed a consensus failure: all methods agreed on incorrect output. Understanding why requires exploring each method's blind spots.
Vector search converts text into high-dimensional embeddings that capture semantic meaning. Documents about “logging” cluster near documents about “monitoring,” “observability,” and “debugging” because the concepts relate.
This semantic understanding becomes a liability for precision queries. Vector search cannot reliably distinguish “User ID 123” from “User ID 132” because their embeddings are mathematically similar. It struggles with:

- Exact identifiers (user IDs, ticket numbers)
- Version strings and release numbers
- Error codes and API method names
Our sample query failed because the deprecated v1.0 document had stronger semantic alignment. Its comprehensive coverage of logging concepts created dense, well-formed embeddings. The newer v3.1 document, with its focus on JSON structure, is embedded differently.
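To see why near-identical strings are indistinguishable to vector search, here is a minimal cosine-similarity sketch in Python. The vectors are toy stand-ins for real embeddings; actual models like MPNet produce 768 dimensions, but the effect is the same.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the embeddings of "User ID 123" and "User ID 132":
# embeddings of near-identical strings differ only marginally, so their
# cosine similarity sits near 1.0 and vector search cannot tell them apart.
emb_123 = [0.81, 0.12, 0.55, 0.20]
emb_132 = [0.80, 0.13, 0.55, 0.21]

print(round(cosine_similarity(emb_123, emb_132), 4))  # > 0.999
```

A ranking based on this distance has no mechanism for treating the single differing digit as decisive, which is exactly what precision queries require.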
Vector search fails on precision. Text search should handle exact terms, but it has its own blind spot.
PostgreSQL full-text search tokenizes text into lexemes and matches them literally. If your query contains “database replication,” documents using “data mirroring” or “streaming replication” score zero despite describing identical concepts.
This vocabulary mismatch problem compounds with:

- Synonyms and paraphrases (“data mirroring” vs. “database replication”)
- Abbreviations and acronyms
- Terminology that shifts between product versions
The demo query failed because the v1.0 title matched perfectly: “Configuring Application Logging in NovaCLI” contains every query term. The v3.1 title “Structured Logging Configuration” introduces “Structured,” diluting the keyword density. This happens because standard ts_rank without normalization favors keyword-stuffed content.
Timescale's pg_textsearch extension addresses that limitation by providing BM25 ranking, which normalizes for document length. BM25 penalizes keyword density in shorter documents while rewarding natural term distribution.
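The key BM25 property is term-frequency saturation: repeating a keyword yields sharply diminishing returns. Below is a simplified, self-contained sketch of the single-term Okapi BM25 formula (k1=1.2, b=0.75), not pg_textsearch's actual implementation, using document counts loosely modeled on the 150-document demo.

```python
import math

def bm25_score(tf, doc_len, avg_len, n_docs, df, k1=1.2, b=0.75):
    """Simplified single-term BM25 score (one component of the full query sum)."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * (1 - b + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

# Term-frequency saturation: repeating "logging" 10x instead of 1x in an
# average-length document yields nowhere near 10x the score.
once = bm25_score(tf=1, doc_len=200, avg_len=200, n_docs=150, df=30)
stuffed = bm25_score(tf=10, doc_len=200, avg_len=200, n_docs=150, df=30)
print(round(stuffed / once, 2))  # ~1.96, not 10.0
```

Raw `ts_rank` has no such saturation term, which is why a short, keyword-dense title can dominate it.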
Here's how BM25 search performs on the same query:

BM25 alone still misses the target answer at rank #1, but notice the score distribution: BM25's normalized scores (3.598, 3.564, 3.445) differ significantly from the raw ts_rank scores shown earlier. The ranking changes because BM25 accounts for document length and term frequency saturation. Yet keyword-based ranking alone cannot capture semantic intent.
This brings us to the core question: if BM25 improves text search and vector search captures semantics, how does hybrid search merge these signals, and when does that fusion succeed or fail?
Hybrid search executes vector similarity and text matching in parallel, then merges results using Reciprocal Rank Fusion (RRF). The formula:
RRF_score = 1/(k + rank_vector) + 1/(k + rank_text)

RRF normalizes rankings by position rather than raw scores. A document ranked #1 by vector search and #5 by text search receives a combined score reflecting both placements. The constant k (typically 60) balances the influence between top-ranked and moderately-ranked results.
This approach helps when methods disagree. If vector search prefers Document A and text search prefers Document B, but both rank Document C moderately well, RRF surfaces the consensus candidate.
The logging query failed because both methods agreed. RRF cannot rescue consensus failures. When vector and text search both rank the deprecated document #1, RRF reinforces the error with higher confidence.
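The two behaviors described above can be reproduced with a minimal RRF sketch in Python (the doc ids and rankings are illustrative):

```python
def rrf_fuse(vector_ranking, text_ranking, k=60):
    """Reciprocal Rank Fusion over two rankings (lists of doc ids, best first)."""
    scores = {}
    for ranking in (vector_ranking, text_ranking):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Disagreement case: each method prefers a different #1, but both rank "C" second,
# so the consensus candidate wins.
print(rrf_fuse(["A", "C", "D"], ["B", "C", "E"])[0])  # "C"

# Consensus failure: both methods rank the deprecated doc first, so RRF
# amplifies the error instead of correcting it.
print(rrf_fuse(["trap", "target"], ["trap", "target"])[0])  # "trap"
```

Fusion can only re-weight the signals it is given; it cannot inject information neither method provides.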
However, improving the text search component changes what RRF has to work with. When using BM25 instead of standard ts_rank:

The target answer now appears at rank #2 (tied at 0.016), demonstrating how BM25's length normalization prevents keyword-stuffed documents from dominating the text search component. This allows the vector signal to influence the final ranking more effectively.
Nevertheless, even with improved text search, outdated documents may dominate rankings when their content genuinely matches query intent better than current alternatives.
Time-windowed search addresses this by restricting candidates to recent documents before any ranking occurs.
Standard hybrid search ranks by relevance alone. A 2019 configuration guide can outrank the 2024 version if its content better matches the query.
Time-windowed hybrid search restricts scope at the database level, not through post-filtering. TimescaleDB’s hypertable partitioning by timestamp enables the query planner to skip entire chunks of old data:
-- See: src/search.py (search_hybrid_temporal function)
WHERE published_date >= NOW() - INTERVAL '12 months'

This clause filters before ranking. When the predicate targets the partitioning column, TimescaleDB's chunk exclusion skips partitions outside the window entirely, reducing I/O while guaranteeing only current content enters the ranking pipeline.
Chunk exclusion also reduces query latency. On our small 150-document database:

Hybrid + temporal matched hybrid’s speed while returning correct results. On larger databases with millions of documents, partition pruning compounds these gains; queries skip entire chunks rather than filtering rows post-retrieval.
Summing up, the logging query succeeded with temporal filtering because the v1.0 document (January 2023) fell outside the 12-month window. With the trap removed, the v3.1 document rose to #1, demonstrating why time is a non-negotiable constraint in some use cases. Regardless of semantic similarity or keyword precision, documentation searches require current information. Temporal filtering encodes this requirement into the query itself.
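The filtering logic is easy to mirror in application code. This Python sketch approximates the SQL `INTERVAL '12 months'` as 30-day months (calendar-month arithmetic would need `dateutil` or similar), using the two logging documents from the demo:

```python
from datetime import datetime, timedelta, timezone

def within_window(published_date, months=12, now=None):
    """Mirror of WHERE published_date >= NOW() - INTERVAL '12 months'
    (months approximated as 30 days)."""
    now = now or datetime.now(timezone.utc)
    return published_date >= now - timedelta(days=30 * months)

now = datetime(2025, 11, 1, tzinfo=timezone.utc)
docs = [
    ("Configuring Application Logging in NovaCLI (v1.0)", datetime(2023, 1, 15, tzinfo=timezone.utc)),
    ("Structured Logging Configuration in NovaCLI v3.1", datetime(2025, 10, 1, tzinfo=timezone.utc)),
]
candidates = [title for title, date in docs if within_window(date, now=now)]
print(candidates)  # only the v3.1 document enters the ranking pipeline
```

In production you want the database, not the application, to apply this filter, so partition pruning can do the work before any rows are read.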
Understanding why each method fails points directly to the implementation requirements: a schema that supports all three dimensions, indexes optimized for each query pattern, and SQL that combines them efficiently.
The documents table in our demo app combines vector, text, and temporal dimensions in a single schema:
-- See: sql/01_create_schema.sql
-- Note: Simplified schema shown below. Production schema includes additional fields
-- for trap quartet methodology (trap_set, trap_type) and metadata (tags, deprecation_note).
CREATE TABLE documents (
id TEXT NOT NULL,
title TEXT NOT NULL,
body TEXT NOT NULL,
category TEXT,
version TEXT,
-- Dual timestamp architecture
created_at TIMESTAMPTZ NOT NULL, -- For TimescaleDB partitioning
published_date TIMESTAMPTZ, -- For temporal filtering
-- Full-text search (generated column)
search_vector TSVECTOR GENERATED ALWAYS AS (
setweight(to_tsvector('english', COALESCE(title, '')), 'A') ||
setweight(to_tsvector('english', COALESCE(body, '')), 'B')
) STORED,
-- Vector embedding (768-dim from MPNet)
embedding VECTOR(768),
PRIMARY KEY (id, created_at)
);
-- Convert to hypertable (6-month chunks)
SELECT create_hypertable('documents', 'created_at',
chunk_time_interval => INTERVAL '6 months');

It's worth mentioning that we used a dual timestamp architecture to separate partitioning (created_at) from filtering (published_date).
The primary key includes created_at, enabling time-range queries to skip entire partitions without scanning them.
The generated search_vector column assigns weight ‘A’ to titles and weight ‘B’ to body content. PostgreSQL automatically maintains this column when the title or body changes. Embeddings are generated from title and body concatenation, explicitly excluding metadata fields irrelevant to semantic search in our demonstration.
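The effect of the ‘A’/‘B’ weights can be illustrated with a rough Python analogue of `setweight` feeding `ts_rank` (the scoring function here is illustrative, not PostgreSQL's actual algorithm; the multipliers are PostgreSQL's documented defaults):

```python
# PostgreSQL's default ts_rank weight multipliers: {D, C, B, A} = {0.1, 0.2, 0.4, 1.0}
WEIGHTS = {"A": 1.0, "B": 0.4}

def weighted_term_score(term, title_tokens, body_tokens):
    """Rough analogue of setweight: a hit in the 'A'-weighted title counts
    2.5x more than a hit in the 'B'-weighted body."""
    return (WEIGHTS["A"] * title_tokens.count(term)
            + WEIGHTS["B"] * body_tokens.count(term))

title = "structured logging configuration".split()
body = "enable json structured logging output".split()
print(weighted_term_score("logging", title, body))  # 1.0 + 0.4 = 1.4
```

Title matches therefore dominate the text-search signal, which matches how users scan documentation results.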
For our demo app, two index types enabled performant hybrid search on the same table:
-- See: sql/02_create_indexes.sql
-- Note: Simplified index creation shown below. Production indexes include tuning parameters
-- for DiskANN (num_neighbors, search_list_size, max_alpha, num_dimensions, num_bits_per_dimension).
-- Vector similarity index (pgvectorscale)
CREATE INDEX documents_embedding_idx ON documents USING diskann (embedding);
-- Full-text search index (PostgreSQL)
CREATE INDEX documents_search_vector_idx ON documents USING GIN (search_vector);

These indexes coexist without conflicts because they serve orthogonal query patterns. PostgreSQL’s query planner treats them as independent access paths, selecting the appropriate index based on query predicates.
GIN (Generalized Inverted Index) on search_vector builds an inverted index where each unique lexeme points to the documents containing it. When you search for “authentication setup,” PostgreSQL stems both terms, then looks up which documents contain those lexemes. The ‘A’ and ‘B’ weights propagate into the index, influencing ranking without requiring separate indexes.
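An inverted index is straightforward to sketch. The toy tokenizer below is a crude stand-in for `to_tsvector`'s stemming (real lexeme normalization is far more sophisticated), but the lexeme-to-postings structure is the same idea a GIN index stores:

```python
from collections import defaultdict

def tokenize(text):
    """Crude stand-in for to_tsvector: lowercase, split, strip a trailing 's'."""
    return [w.rstrip("s") if len(w) > 3 else w for w in text.lower().split()]

def build_inverted_index(docs):
    """Map each lexeme to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for lexeme in tokenize(text):
            index[lexeme].add(doc_id)
    return index

docs = {
    1: "authentication setup guide",
    2: "structured logging configuration",
    3: "oauth authentication with environment variables",
}
index = build_inverted_index(docs)
# AND both query lexemes, like search_vector @@ to_tsquery('authentication & setup')
matches = index[tokenize("authentication")[0]] & index[tokenize("setup")[0]]
print(sorted(matches))  # [1]
```

Lookup cost scales with the number of matching postings, not the corpus size, which is why keyword search stays fast as tables grow.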
DiskANN on embedding builds an approximate nearest neighbor graph that keeps most data on disk rather than in memory. Unlike HNSW indexes, which require the entire graph in RAM, DiskANN scales to millions of vectors in memory-constrained environments (4-8 GB of RAM) while maintaining sub-50ms query performance.
Hybrid queries execute both index scans in parallel. Each scan runs concurrently, returning its ranked result set to the application layer, where RRF combines them.
We embedded the title and body because documentation queries target content. Your domain may require different fields; embed what users mentally query against.
With the schema and indexes in place, the next step is constructing the query that combines both search methods.
Hybrid search executes vector and text queries as separate Common Table Expressions (CTEs), then merges rankings with RRF:
-- See: src/search.py (search_hybrid function, lines 155-214)
WITH vector_search AS (
SELECT
id, title, body, version, created_at, published_date,
ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 20
),
text_search AS (
SELECT
id, title, body, version, created_at, published_date,
ROW_NUMBER() OVER (
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', $2)) DESC
) AS rank
FROM documents
WHERE search_vector @@ websearch_to_tsquery('english', $2)
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', $2)) DESC
LIMIT 20
),
combined AS (
SELECT
COALESCE(v.id, t.id) AS id,
COALESCE(v.title, t.title) AS title,
COALESCE(v.body, t.body) AS body,
COALESCE(v.version, t.version) AS version,
COALESCE(v.created_at, t.created_at) AS created_at,
COALESCE(v.published_date, t.published_date) AS published_date,
-- RRF scoring: 1/(60 + rank) with equal weights (0.5 each)
COALESCE(1.0 / (60 + v.rank), 0.0) * 0.5 +
COALESCE(1.0 / (60 + t.rank), 0.0) * 0.5 AS score
FROM vector_search v
FULL OUTER JOIN text_search t ON v.id = t.id
)
SELECT * FROM combined
ORDER BY score DESC
LIMIT 5;

Parameters:
- $1: Query embedding vector (768 dimensions)
- $2: Query text for full-text search

RRF formula: 1.0 / (60 + rank) assigns each document a score based on position rather than raw similarity. The constant 60 dampens rank differences, preventing a #1 result from completely dominating. Documents found by both methods receive combined scores; documents found by only one method receive partial credit via COALESCE.
Why LIMIT 20 in CTEs: RRF can rerank results. A document ranked #15 in vector search and #3 in text search might win overall. Retrieving 20 candidates from each method provides sufficient headroom before the final LIMIT 5.
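A quick calculation with the same weighted RRF formula shows the headroom effect (ranks chosen for illustration):

```python
def rrf_score(vector_rank=None, text_rank=None, k=60, w_vector=0.5, w_text=0.5):
    """Weighted RRF score; None means the document missed that method's top 20."""
    score = 0.0
    if vector_rank is not None:
        score += w_vector / (k + vector_rank)
    if text_rank is not None:
        score += w_text / (k + text_rank)
    return score

# A document ranked #15 by vector and #3 by text beats one that only
# vector search found, at #1 -- which is why each CTE retrieves 20 candidates.
deep_consensus = rrf_score(vector_rank=15, text_rank=3)
vector_only_top = rrf_score(vector_rank=1)
print(deep_consensus > vector_only_top)  # True
```

Trimming the CTEs to `LIMIT 5` would discard the #15 vector result before fusion ever saw it.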
Add a WHERE clause to both CTEs:
-- See: src/search.py (search_hybrid_temporal function, lines 270-330)
WITH vector_search AS (
SELECT
id, title, body, version, created_at, published_date,
ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank
FROM documents
WHERE published_date >= NOW() - INTERVAL '12 months'
ORDER BY embedding <=> $1::vector
LIMIT 20
),
text_search AS (
SELECT
id, title, body, version, created_at, published_date,
ROW_NUMBER() OVER (
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', $2)) DESC
) AS rank
FROM documents
WHERE published_date >= NOW() - INTERVAL '12 months'
AND search_vector @@ websearch_to_tsquery('english', $2)
ORDER BY ts_rank(search_vector, websearch_to_tsquery('english', $2)) DESC
LIMIT 20
),
-- combined CTE and final SELECT are identical to the hybrid query above
The temporal filter applies before ranking, not after. Documents outside the time window never enter the candidate pool. TimescaleDB’s chunk exclusion skips entire partitions when filtering on the partitioning column (created_at), but filtering on published_date still benefits from index usage on that column.
Choose a time window that matches how quickly your content goes stale; this demo uses 12 months.
With the schema, indexes, and query patterns in place, you can explore these behaviors yourself using the demo application.
Clone the repository and restore the pre-built database:
# See: setup_demo.sh and run_demo.sh in repository root
git clone https://github.com/timescale/TimescaleDB-HybridSearch
cd TimescaleDB-HybridSearch
# Configure database connection
cp .env.example .env
# Edit .env with your DATABASE_URL
# Run automated setup (creates venv, installs dependencies, restores database)
./setup_demo.sh
# Launch the interactive demo
./run_demo.sh
The demo loads a sentence-transformer model (all-mpnet-base-v2) once at startup (~5 seconds). Enter any query at the prompt. For convenience, embeddings are generated locally on the fly rather than through calls to a remote embedding API, as is typical in production.

All four search methods run in parallel against the same query.
Deployment options:
See the repository README for detailed setup instructions.
The query at the start of the tutorial demonstrated one failure pattern: consensus failure, where all methods agreed on the wrong answer. But hybrid search can fail in other ways, and understanding these patterns helps you anticipate when temporal filtering alone won’t save you.
Each search method excels in specific scenarios and fails in others:
| Method | Index Type | Best Use Case | Primary Weakness |
|---|---|---|---|
| Vector Search | DiskANN (pgvectorscale) | Conceptual queries, semantic understanding | Keyword insensitivity; fails on exact technical terms |
| Text Search | GIN (tsvector) | Exact keyword matches, technical terms | Cannot handle synonyms or context |
| Hybrid Search | Both (RRF combination) | Queries requiring semantic + keyword precision | Cannot distinguish outdated from current docs |
| Hybrid + Temporal | Both + TimescaleDB partitioning | Current documentation, time-sensitive content | Excludes historical information by design |
The following case studies demonstrate these failure modes in practice.
Query: “How to enable SCRAM-SHA-256 authentication in NovaCLI”
This query demonstrates a counterintuitive failure: vector search finds the correct answer, but hybrid search returns a deprecated document.
Vector search succeeds (rank #1, score 0.693):

The embedding captures semantic intent. Vector search ranks the current v3.1 configuration guide first.
Text search fails (rank #1, score 1.000):

Text search finds perfect lexical matches, ranking the deprecated v1.0 document first with a perfect score. The trap document title is shorter and keyword-dense (4 tokens); the target includes additional context (6 tokens). Standard ts_rank without normalization sums matching lexeme weights, favoring shorter, keyword-stuffed documents.
Hybrid search fails (rank #1, RRF score 0.016):

RRF combines both rankings, but text search’s failure creates a veto effect. Even though vector search correctly identified the best document, text search’s strong preference for the deprecated document pulled the hybrid result toward the wrong answer.
How to fix the veto effect:
- ts_rank normalization (constant 32) to penalize keyword-stuffed documents

Query: “How to configure NovaCLI”
This query demonstrates hybrid search’s intended behavior: when neither method is confident individually, RRF combines its weak signals to surface the correct answer.
Vector search fails (rank #5, score 0.704):

Vector search returns a broad getting-started guide at rank #1. The correct document “TOML Configuration Guide” (v3.1) appears at rank #5.
Text search fails (rank #3, score 0.996):

Text search finds keyword matches for “configuration” and “NovaCLI,” but ranks the deprecated v2.0 YAML guide at #1. The correct v3.1 TOML guide appears at rank #3.
Hybrid search succeeds (rank #1, RRF score 0.016):

The TOML guide wasn’t the best match for either method individually (rank #5 and #3), but it scored moderately well in both rankings. When both methods point toward the same document despite neither being certain, that convergence carries information. RRF amplifies this consensus.
Key Takeaways:
Knowing when each method fails is diagnostic. Tuning your implementation to minimize those failures is the next step.
The default 50/50 RRF weighting suits balanced workloads. Adjust the weights when one method consistently outperforms the other.
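A small experiment shows how the weight shifts outcomes. The ranks below are illustrative: Doc A is vector search's favorite, Doc B is text search's favorite.

```python
def weighted_rrf(vector_rank, text_rank, w_vector, k=60):
    """RRF with tunable method weights; w_text = 1 - w_vector."""
    return w_vector / (k + vector_rank) + (1.0 - w_vector) / (k + text_rank)

# Doc A: vector rank #1, text rank #5.  Doc B: vector rank #5, text rank #1.
for w in (0.5, 0.7):
    a = weighted_rrf(1, 5, w)       # Doc A
    b = weighted_rrf(5, 1, w)       # Doc B
    print(w, "A" if a > b else ("B" if b > a else "tie"))
# 0.5 -> tie; 0.7 -> Doc A wins
```

Tune against a labeled query set rather than intuition: a small weight change can flip every tie in the result list.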
Match time windows to your content's lifecycle.
TimescaleDB chunk size affects partition pruning efficiency:
- chunk_time_interval => INTERVAL '1 hour'
- chunk_time_interval => INTERVAL '6 months'
- chunk_time_interval => INTERVAL '3 months'

Not every query needs hybrid search.
These routing rules provide starting points. The right strategy depends on your specific use case and user behavior.
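One way to encode such routing is a small heuristic dispatcher. The rules below are assumptions for illustration, not logic from the demo repository:

```python
import re

def route_query(query):
    """Heuristic routing sketch (assumed rules, not from the demo repo):
    quoted phrases and error-code-like tokens favor text search; natural-language
    questions favor hybrid + temporal; everything else defaults to hybrid."""
    if '"' in query or re.search(r"\b[A-Z]{2,}-?\d+\b", query):
        return "text"             # exact identifiers: error codes, ticket IDs
    if re.match(r"(?i)^(how|what|why|when|where)\b", query):
        return "hybrid_temporal"  # questions about current behavior
    return "hybrid"

print(route_query("ERR-4042 on startup"))               # text
print(route_query("How to enable logging in NovaCLI"))  # hybrid_temporal
print(route_query("oauth token refresh"))               # hybrid
```

Log the chosen route alongside result clicks so you can measure whether each rule actually improves outcomes.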
No single method dominates. Vector search excels at exploratory queries where exact terminology is unknown. Text search handles exact term lookups like API method names or error codes. Hybrid search applies when queries combine technical terms with semantic intent. Hybrid + temporal becomes necessary when content freshness determines correctness.
Monitor your search logs to measure which queries succeed, which fail, and why. The demo repository includes the full implementation; experiment with RRF weights, temporal windows, and ranking functions to find what works for your database and user behavior. For RAG and AI search applications, this combination of semantic understanding, keyword precision, and temporal awareness provides the retrieval foundation that keeps generated answers current.
About the Author: Damaso is a Technical Content Writer and Content Engineer with over 20 years of hands-on IT experience. He specializes in translating deep technical expertise in Kubernetes, DevOps, CI/CD, and AI/ML into practical content for developers, architects, and IT leaders. Learn more: damasosanoja.com
