Tiger Cloud: Performance, Scale, Enterprise, Free
Self-hosted products
MST
Postgres full-text search at scale consistently hits a wall where performance degrades catastrophically.
Tiger Data's pg_textsearch brings modern BM25-based full-text search directly into Postgres,
with a memtable architecture for efficient indexing and ranking. pg_textsearch integrates seamlessly with SQL and
provides better search quality and performance than the Postgres built-in full-text search. With Block-Max WAND optimization,
pg_textsearch delivers up to 4x faster top-k queries compared to native BM25 implementations. Parallel index builds
reduce indexing times by 4x or more for large tables. Advanced compression using delta encoding and bitpacking reduces
index sizes by 41% while improving query performance by 10-20% for shorter queries.
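To make the compression claim above more concrete, here is a minimal Python sketch of delta encoding followed by fixed-width bitpacking on a sorted list of document IDs. It illustrates the general technique only: the page does not describe pg_textsearch's actual segment layout, and the function names (`delta_encode`, `bitpack`, and so on) are hypothetical.

```python
def delta_encode(doc_ids):
    """Replace sorted doc IDs with gaps; the first entry is the first ID itself."""
    deltas, prev = [], 0
    for doc_id in doc_ids:
        deltas.append(doc_id - prev)
        prev = doc_id
    return deltas


def delta_decode(deltas):
    """Rebuild the original doc IDs with a running sum over the gaps."""
    doc_ids, total = [], 0
    for gap in deltas:
        total += gap
        doc_ids.append(total)
    return doc_ids


def bitpack(values):
    """Pack non-negative ints using the smallest fixed bit width that fits them all."""
    width = max(1, max(values, default=0).bit_length())
    packed = 0
    for i, value in enumerate(values):
        packed |= value << (i * width)
    n_bytes = (len(values) * width + 7) // 8
    return width, len(values), packed.to_bytes(n_bytes, "little")


def bitunpack(width, count, data):
    """Reverse bitpack(): slice `count` values of `width` bits each."""
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


if __name__ == "__main__":
    ids = [100000, 100003, 100004, 100010, 100025]
    gaps = delta_encode(ids)[1:]          # gaps after the first ID
    print("bits per raw ID:", bitpack(ids)[0])
    print("bits per gap:   ", bitpack(gaps)[0])
```

Because neighbouring IDs in a posting list tend to be close together, the gaps need far fewer bits than the raw IDs, which is where the size reduction in a scheme like this comes from.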
BM25 scores in pg_textsearch are returned as negative values, where lower (more negative) numbers indicate better
matches. pg_textsearch implements the following:
- Corpus-aware ranking: BM25 uses inverse document frequency to weight rare terms higher
- Term frequency saturation: prevents documents with excessive term repetition from dominating results
- Length normalization: adjusts scores based on document length relative to corpus average
- Relative ranking: focuses on rank order rather than absolute score values
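The four properties above can be seen in a small, self-contained Python sketch of the textbook BM25 formula. This is not pg_textsearch's code: the whitespace tokenizer, the `bm25_scores` name, the IDF variant, and the `k1=1.2`, `b=0.75` defaults are assumptions chosen for illustration. Scores are negated to mirror the "more negative is better" convention described on this page.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one negated BM25 score per document (lower means a better match)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            # Corpus-aware ranking: rare terms get a larger IDF weight.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalization: long documents are penalized relative to the average.
            norm = 1 - b + b * len(tokens) / avg_len
            # Term frequency saturation: the ratio is bounded by (k1 + 1).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
        scores.append(-score if score else 0.0)
    return scores


if __name__ == "__main__":
    docs = ["cat", "cat dog bird fish", "dog"]
    for doc, score in zip(docs, bm25_scores("cat", docs)):
        print(f"{score:8.4f}  {doc}")
```

Only the ordering of the scores is meaningful. The absolute values depend on the corpus statistics and change as documents are added or removed.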
This page shows you how to install pg_textsearch, configure BM25 indexes, and optimize your search capabilities using
the following best practices:
- Parallel indexing: enable parallel workers for faster index creation on large tables
- Language configuration: choose appropriate text search configurations for your data language
- Hybrid search: combine with pgvector or pgvectorscale for applications requiring both semantic and keyword search
- Query optimization: use score thresholds to filter low-relevance results
- Index monitoring: regularly check index usage and memory consumption
To follow the steps on this page:
Create a target Tiger Cloud service with the Real-time analytics capability.
You need your connection details. This procedure also works for self-hosted TimescaleDB.
To install this Postgres extension:
Connect to your Tiger Cloud service
In Tiger Console, open an SQL editor. You can also connect to your service using psql.
Enable the extension on your Tiger Cloud service
For new services, simply enable the extension:
CREATE EXTENSION pg_textsearch;
For existing services, update your instance, then enable the extension:
The extension may not be available until after your next scheduled maintenance window. To pick up the update immediately, manually pause and restart your service.
Verify the installation
SELECT * FROM pg_extension WHERE extname = 'pg_textsearch';
You have installed pg_textsearch on Tiger Cloud.
BM25 indexes provide modern relevance ranking that outperforms Postgres's built-in ts_rank functions by using corpus statistics and better algorithmic design.
To create a BM25 index with pg_textsearch:
Create a table with text content
CREATE TABLE products (
    id serial PRIMARY KEY,
    name text,
    description text,
    category text,
    price numeric
);
Insert sample data
INSERT INTO products (name, description, category, price) VALUES
    ('Mechanical Keyboard', 'Durable mechanical switches with RGB backlighting for gaming and productivity', 'Electronics', 149.99),
    ('Ergonomic Mouse', 'Wireless mouse with ergonomic design to reduce wrist strain during long work sessions', 'Electronics', 79.99),
    ('Standing Desk', 'Adjustable height desk for better posture and productivity throughout the workday', 'Furniture', 599.99);
Create a BM25 index
CREATE INDEX products_search_idx ON products
USING bm25(description)
WITH (text_config='english');
BM25 supports single-column indexes only. For optimal performance, load your data first, then create the index.
You have created a BM25 index for full-text search.
pg_textsearch supports parallel index builds that can significantly reduce indexing times for large tables.
Postgres automatically uses parallel workers based on table size and available resources.
Configure parallel workers (optional)
Postgres uses server defaults, but you can adjust settings for your workload:
-- Set number of parallel workers (uses CPU count by default)
SET max_parallel_maintenance_workers = 4;
-- Set memory for index builds (must be at least 64MB for parallel builds)
SET maintenance_work_mem = '256MB';
Note: The planner requires maintenance_work_mem >= 64MB to enable parallel index builds. With insufficient memory, builds fall back to serial mode silently.
Create index (parallel workers used automatically for large tables)
CREATE INDEX products_search_idx ON products
USING bm25(description)
WITH (text_config='english');
When parallel build is used, you see a notice:
NOTICE: parallel index build: launched 4 of 4 requested workers
Verify parallel execution in partitioned tables
For partitioned tables, each partition builds its index independently with parallel workers if the partition is large enough. This allows efficient indexing of very large partitioned datasets.
You have configured parallel index builds for faster indexing.
Use efficient query patterns to leverage BM25 ranking and optimize search performance. The <@> operator provides
BM25-based ranking scores as negative values, where lower (more negative) scores indicate better matches. In ORDER BY
clauses, the index is automatically detected from the column. For WHERE clause filtering, use to_bm25query() with
an explicit index name.
Perform ranked searches using the distance operator
SELECT name, description, description <@> to_bm25query('ergonomic work', 'products_search_idx') as score
FROM products
ORDER BY score
LIMIT 3;
You see something like:
 name                | description                                                                           | score
---------------------+---------------------------------------------------------------------------------------+---------------------
 Ergonomic Mouse     | Wireless mouse with ergonomic design to reduce wrist strain during long work sessions | -1.8132977485656738
 Mechanical Keyboard | Durable mechanical switches with RGB backlighting for gaming and productivity         | 0
 Standing Desk       | Adjustable height desk for better posture and productivity throughout the workday     | 0
Filter results by score threshold
For filtering with WHERE clauses, use explicit index specification with to_bm25query():
SELECT name, description <@> to_bm25query('wireless', 'products_search_idx') as score
FROM products
WHERE description <@> to_bm25query('wireless', 'products_search_idx') < -0.5;
You see something like:
 name            | score
-----------------+---------------------
 Ergonomic Mouse | -0.9066488742828369
Combine with standard SQL operations
SELECT category, name, description <@> to_bm25query('ergonomic', 'products_search_idx') as score
FROM products
WHERE price < 500
AND description <@> to_bm25query('ergonomic', 'products_search_idx') < -0.5
ORDER BY score
LIMIT 5;
You see something like:
 category    | name            | score
-------------+-----------------+---------------------
 Electronics | Ergonomic Mouse | -0.9066488742828369
Verify index usage with EXPLAIN
EXPLAIN SELECT * FROM products
ORDER BY description <@> to_bm25query('ergonomic', 'products_search_idx')
LIMIT 5;
You see something like:
QUERY PLAN
--------------------------------------------------------------------------------------------
Limit (cost=8.55..8.56 rows=3 width=140)
  -> Sort (cost=8.55..8.56 rows=3 width=140)
       Sort Key: ((description <@> 'products_search_idx:ergonomic'::bm25query))
       -> Seq Scan on products (cost=0.00..8.53 rows=3 width=140)
You have optimized your search queries for BM25 ranking.
Combine pg_textsearch with pgvector or pgvectorscale to build powerful hybrid search systems that use both semantic vector search and keyword BM25 search.
Enable the vectorscale extension on your Tiger Cloud service
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
Create a table with both text content and vector embeddings
CREATE TABLE articles (
    id serial PRIMARY KEY,
    title text,
    content text,
    embedding vector(3) -- Using 3 dimensions for this example; use 1536 for OpenAI ada-002
);
Insert sample data
INSERT INTO articles (title, content, embedding) VALUES
    ('Database Query Optimization', 'Learn how to optimize database query performance using indexes and query planning', '[0.1, 0.15, 0.2]'),
    ('Performance Tuning Guide', 'A comprehensive guide to performance tuning in distributed systems and databases', '[0.12, 0.18, 0.25]'),
    ('Introduction to Indexing', 'Understanding how database indexes improve query performance and data retrieval', '[0.09, 0.14, 0.19]'),
    ('Advanced SQL Techniques', 'Master advanced SQL techniques for complex data analysis and reporting', '[0.5, 0.6, 0.7]'),
    ('Data Warehousing Basics', 'Getting started with data warehousing and analytical query processing', '[0.8, 0.9, 0.85]');
Create indexes for both search types
-- Vector index for semantic search
CREATE INDEX articles_embedding_idx ON articles
USING hnsw (embedding vector_cosine_ops);
-- Keyword index for BM25 search
CREATE INDEX articles_content_idx ON articles
USING bm25(content)
WITH (text_config='english');
Perform hybrid search using reciprocal rank fusion
WITH vector_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector) AS rank
    FROM articles
    ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
    LIMIT 20
),
keyword_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank
    FROM articles
    ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')
    LIMIT 20
)
SELECT a.id,
       a.title,
       COALESCE(1.0 / (60 + v.rank), 0.0) + COALESCE(1.0 / (60 + k.rank), 0.0) AS combined_score
FROM articles a
LEFT JOIN vector_search v ON a.id = v.id
LEFT JOIN keyword_search k ON a.id = k.id
WHERE v.id IS NOT NULL OR k.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 10;
You see something like:
 id | title                       | combined_score
----+-----------------------------+--------------------
  3 | Introduction to Indexing    | 0.0325224748810153
  1 | Database Query Optimization | 0.0322664584959667
  2 | Performance Tuning Guide    | 0.0320020481310804
  5 | Data Warehousing Basics     | 0.0310096153846154
  4 | Advanced SQL Techniques     | 0.0310096153846154
Adjust relative weights for different search types
WITH vector_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector) AS rank
    FROM articles
    ORDER BY embedding <=> '[0.1, 0.2, 0.3]'::vector
    LIMIT 20
),
keyword_search AS (
    SELECT id,
           ROW_NUMBER() OVER (ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')) AS rank
    FROM articles
    ORDER BY content <@> to_bm25query('query performance', 'articles_content_idx')
    LIMIT 20
)
SELECT
    a.id,
    a.title,
    0.7 * COALESCE(1.0 / (60 + v.rank), 0.0) + -- 70% weight to vectors
    0.3 * COALESCE(1.0 / (60 + k.rank), 0.0)   -- 30% weight to keywords
    AS combined_score
FROM articles a
LEFT JOIN vector_search v ON a.id = v.id
LEFT JOIN keyword_search k ON a.id = k.id
WHERE v.id IS NOT NULL OR k.id IS NOT NULL
ORDER BY combined_score DESC
LIMIT 10;
You see something like:
 id | title                       | combined_score
----+-----------------------------+--------------------
  3 | Introduction to Indexing    | 0.0163141195134849
  2 | Performance Tuning Guide    | 0.0160522273425499
  1 | Database Query Optimization | 0.0160291438979964
  4 | Advanced SQL Techniques     | 0.0155528846153846
  5 | Data Warehousing Basics     | 0.0154567307692308
You have implemented hybrid search combining semantic and keyword search.
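The scoring arithmetic in the two hybrid queries can be checked outside the database. The following Python sketch applies the same formula, `weight / (60 + rank)` summed across result lists. The `rrf_scores` helper is hypothetical, and the per-article rank positions are not given on this page: they are inferred from the sample scores shown above, so treat them as an assumption.

```python
def rrf_scores(rankings, weights=None, k=60):
    """Fuse several {doc_id: rank} mappings with (weighted) reciprocal rank fusion."""
    weights = weights or [1.0] * len(rankings)
    combined = {}
    for ranks, weight in zip(rankings, weights):
        for doc_id, rank in ranks.items():
            # A document missing from a list simply contributes nothing for it,
            # which matches COALESCE(..., 0.0) in the SQL.
            combined[doc_id] = combined.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Rank positions inferred from the sample output (assumption).
vector_ranks = {3: 1, 2: 2, 1: 3, 4: 4, 5: 5}
keyword_ranks = {1: 1, 3: 2, 2: 3, 5: 4, 4: 5}

if __name__ == "__main__":
    print("equal weights:")
    for doc_id, score in rrf_scores([vector_ranks, keyword_ranks]):
        print(f"  {doc_id}  {score:.16f}")
    print("70% vector / 30% keyword:")
    for doc_id, score in rrf_scores([vector_ranks, keyword_ranks], weights=[0.7, 0.3]):
        print(f"  {doc_id}  {score:.16f}")
```

Because the formula uses only rank positions, the BM25 scores and vector distances never need to be put on a common scale. The constant 60 dampens the gap between adjacent ranks, and the weights shift how much each list can move the final order.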
Customize pg_textsearch behavior for your specific use case and data characteristics.
Configure memory and performance settings
To manage memory usage, you control when the in-memory index spills to disk segments. When the memtable reaches the threshold, it automatically flushes to a segment at transaction commit.
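As a mental model for the spill behavior just described, the following Python sketch keeps postings in memory and moves them to a "segment" at commit time once a posting-count threshold is reached. It is a toy under stated assumptions, not pg_textsearch's implementation: the `ToyMemtable` class, its whitespace tokenizer, and the in-memory list standing in for disk segments are all hypothetical.

```python
from collections import defaultdict


class ToyMemtable:
    """In-memory inverted index that spills to a segment list at commit."""

    def __init__(self, spill_threshold):
        self.spill_threshold = spill_threshold
        self.postings = defaultdict(list)  # term -> [doc_id, ...]
        self.posting_count = 0
        self.segments = []                 # stand-in for on-disk segments

    def add_document(self, doc_id, text):
        """Index one document; nothing is spilled until commit()."""
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
            self.posting_count += 1

    def commit(self):
        """Spill the memtable if the threshold was reached; return True if it spilled."""
        if self.posting_count < self.spill_threshold:
            return False
        self.segments.append({term: sorted(ids) for term, ids in self.postings.items()})
        self.postings.clear()
        self.posting_count = 0
        return True


if __name__ == "__main__":
    memtable = ToyMemtable(spill_threshold=4)
    memtable.add_document(1, "wireless mouse")
    print("spilled:", memtable.commit())
    memtable.add_document(2, "wireless ergonomic keyboard")
    print("spilled:", memtable.commit())
    print("segments:", len(memtable.segments))
```

In this model a lower threshold bounds memory use at the cost of producing more, smaller segments, which is the trade-off a spill threshold setting controls.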
Since pg_textsearch v0.1.0
-- Set memtable spill threshold (default 32000000 posting entries, ~1M docs/segment)
SET pg_textsearch.memtable_spill_threshold = 32000000;
-- Set bulk load spill threshold (default 100000 terms per transaction)
SET pg_textsearch.bulk_load_threshold = 150000;
-- Set default query limit when no LIMIT clause is present (default 1000)
SET pg_textsearch.default_limit = 5000;
-- Enable Block-Max WAND optimization for faster top-k queries (enabled by default)
SET pg_textsearch.enable_bmw = true;
-- Log block skip statistics for debugging query performance (disabled by default)
SET pg_textsearch.log_bmw_stats = false;
Since pg_textsearch v0.4.0
-- Enable segment compression using delta encoding and bitpacking (enabled by default)
-- Reduces index size by ~41% with 10-20% query performance improvement for shorter queries
SET pg_textsearch.compress_segments = on;
Configure language-specific text processing
You can create multiple BM25 indexes on the same column with different language configurations:
-- Create an additional index with simple tokenization (no stemming)
CREATE INDEX products_simple_idx ON products
USING bm25(description)
WITH (text_config='simple');
-- Example: French language configuration for a French products table
-- CREATE INDEX products_fr_idx ON products_fr
-- USING bm25(description)
-- WITH (text_config='french');
Tune BM25 parameters
-- Adjust term frequency saturation (k1) and length normalization (b)
CREATE INDEX products_custom_idx ON products
USING bm25(description)
WITH (text_config='english', k1=1.5, b=0.8);
Monitor index usage and memory consumption
Check index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
WHERE indexrelid::regclass::text ~ 'bm25';
View index summary with corpus statistics and memory usage
SELECT bm25_summarize_index('products_search_idx');
View detailed index structure (output is truncated for display)
SELECT bm25_dump_index('products_search_idx');
Export full index dump to a file for detailed analysis
SELECT bm25_dump_index('products_search_idx', '/tmp/index_dump.txt');
Force memtable spill to disk (useful for testing or memory management)
SELECT bm25_spill_index('products_search_idx');
You have configured pg_textsearch for optimal performance. For production applications, consider implementing result
caching and pagination to improve user experience with large result sets.
The preview releases focus on core BM25 functionality. Current limitations include:
- No phrase search: you cannot search for exact multi-word phrases.
- No compressed data support: pg_textsearch does not work with compressed data.