Aug 13, 2025
Posted by Jacky Liang
In Why Cursor is About to Ditch Vector Search (and You Should Too), I argued that the AI industry's obsession with vector search was misguided. Claude Code's success over Cursor proved that in certain cases, lexical search destroys semantic similarity.
The response was... intense.
800,000 impressions, 2,000 likes, hundreds of comments, and apparently some uncomfortable conversations at Cursor and dedicated vector database companies!
We’ve said before that the AI space moves fast, but I wasn’t sure just how fast.
Well, a week after I published the piece, Boris Cherny and Cat Wu, the Claude Code leads that Cursor hired, went back to Anthropic. They were at Cursor for exactly two weeks before returning to work on Claude Code.
Two weeks.
What did they see at Cursor that made them turn around? 👀
Industry gossip aside, here's what I learned from all the discussion on my viral posts: everyone agrees vector search isn't perfect, but not everyone knows what the right implementation answer is.
I want to clarify something - in the previous piece, I was not making the claim that vector search or databases are bad.
The problem is not the tool, but using the wrong tool for the job.
We've been treating any single search technique as a silver bullet.
Here's what actually happens when you build AI systems in production:
Your AI code agent needs to find getUserById. Vector search returns the 10 results most similar to it, along the lines of getUserByName, getUserByEmail, and updateUserById, because they're semantically close. You need exactly getUserById.
You needed exact, but you got most similar.
Your users get frustrated because the AI "doesn't understand" exactness.
Your AI customer support gets asked about "iPhone 15 Pro Max 256GB Space Black." Vector search also returns "iPhone 15 Pro 128GB Space Black" because the embeddings are nearly identical.
Your users get frustrated because the AI "doesn't understand" exactness.
Your AI e-commerce search gets the SKU "DQ4312-101". Vector search returns "DQ4312-102" and "DQ4311-101" because the numbers are similar.
Your users get frustrated because the AI "doesn't understand" exactness.
You get where I am going with this?
Every AI team we at TigerData talked to eventually hits this wall: vector search gives you similarity, but it turns out your users actually need relevance.
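To make the gap between "most similar" and "exact" concrete, here is a toy sketch. It is not how an embedding model works: character-trigram overlap stands in for vector similarity, and the function names and the fourth SKU are made up for illustration.

```python
def trigrams(text: str) -> set[str]:
    """Character trigrams: a crude, hypothetical stand-in for an embedding."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap between two trigram sets, from 0.0 to 1.0."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def similar_search(query: str, catalog: list[str], limit: int = 3) -> list[str]:
    """Return the `limit` most similar items, whether or not they match exactly."""
    ranked = sorted(catalog, key=lambda item: similarity(query, item), reverse=True)
    return ranked[:limit]


def exact_search(query: str, catalog: list[str]) -> list[str]:
    """Return only items identical to the query."""
    return [item for item in catalog if item == query]


# Illustrative catalog; "AB9999-555" is an invented, unrelated SKU.
catalog = ["DQ4312-101", "DQ4312-102", "DQ4311-101", "AB9999-555"]

similar_hits = similar_search("DQ4312-101", catalog)
exact_hits = exact_search("DQ4312-101", catalog)

print("similar:", similar_hits)
print("exact:  ", exact_hits)
```

The similarity search always fills its result list, so near-miss SKUs come back alongside the right one. The exact search returns the one item the user asked for, or nothing.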
The answer that makes the most sense is hybrid search and reranking.
Here's how it works:
The term “context engineering” has been taking hold in the AI space lately (more on this later), and we believe that hybrid search + reranking is the natural implementation to provide enough high-relevancy context (nothing more and nothing less) for LLMs.
Let’s show you how to build it and incorporate it into your production apps.
We'll use the CNN-DailyMail Dataset, which contains over 300,000 unique English-language news articles. This dataset was originally designed for machine reading comprehension and summarization, but it's also perfect for showcasing hybrid search because news articles contain both exact terms that need precise matching (names, dates, locations) and semantic concepts that benefit from similarity search (topics, themes, related events).
The dataset has three columns:
Field | Description |
---|---|
id | Hexadecimal SHA1 hash of the source URL |
article | Full text of the news article |
highlights | Author-written article summary |
Of these, we only care about the article field. We'll also use just 1,000 articles for brevity’s sake.
from datasets import load_dataset
# Load the CNN-DailyMail dataset from Hugging Face
dataset = load_dataset("cnn_dailymail", "3.0.0")
content = dataset["train"]
# Select a random subset of 1,000 articles
content = content.shuffle(seed=42).select(range(0, 1000))
print(f"Dataset size: {len(content)} articles")
print(f"Sample article preview: {content[0]['article'][:200]}...")
As noted above, news articles mix exact terms (names, dates, locations) with broader semantic concepts (topics, themes, related events), which makes them ideal for demo-ing hybrid search.
We'll create a hybrid search engine that combines exact matching, full-text search, and vector search with reranking.
For full setup, installation, and implementation instructions, please read our article on Building PostgreSQL Hybrid Search Using pgvector and Cohere. We will share the highlights in this piece.
First, let's get our environment ready. You'll need these libraries:
pip install psycopg2-binary pgvector openai cohere numpy datasets
import psycopg2
from openai import OpenAI
import cohere
import numpy as np
import json
from typing import List, Dict, Any
Set up your API clients (make sure you get the relevant setup keys and strings from each respective service):
# Initialize clients
openai_client = OpenAI(api_key="your-openai-key")
cohere_client = cohere.Client("your-cohere-key")
# Database connection
conn = psycopg2.connect("your-postgres-connection-string")
Let’s create a PostgreSQL table on Tiger Cloud, a cloud PostgreSQL platform tailored for AI applications, to store our news articles with both text content and vector embeddings, and the necessary indexes for fast searching across different methods.
To start, sign up, create a new database, and follow the provided instructions. For more details on how to create a Tiger Cloud database, refer to this guide.
-- Enable extensions
CREATE EXTENSION IF NOT EXISTS vector;
-- Create table for our news articles
CREATE TABLE documents (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    contents TEXT,
    embedding VECTOR(1024) -- Cohere embed-english-v3.0 dimensions
);
-- Create indexes for performance
CREATE INDEX ON documents USING GIN (to_tsvector('english', contents));
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
Let’s load 1,000 articles, generate the embeddings using Cohere's model, and insert both the text and vector representations into our database in a single batch operation.
import cohere
import psycopg2
import itertools
from datasets import load_dataset
# Initialize Cohere client
co = cohere.Client('your-cohere-api-key')
model = "embed-english-v3.0"
# Load and prepare dataset
dataset = load_dataset("cnn_dailymail", "3.0.0")
content = dataset["train"].shuffle(seed=42).select(range(0, 1000))
# Generate embeddings for all articles
doc_embeddings = co.embed(
    texts=content["article"],
    model=model,
    input_type="search_document",
    embedding_types=['float']
)
# Open a cursor on the connection created earlier (conn)
cursor = conn.cursor()
# Insert articles and embeddings into database
sql = '''INSERT INTO documents (contents, embedding) VALUES ''' + \
    ', '.join(['(%s, %s)' for _ in doc_embeddings.embeddings.float_])
# Flatten (article, embedding) pairs into one parameter list
params = list(itertools.chain(*zip(
    content["article"],
    doc_embeddings.embeddings.float_
)))
cursor.execute(sql, params)
conn.commit()
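The load step above sends all 1,000 articles to the embedding API in one call. Embedding APIs usually cap how many texts a single request may carry, and whether your SDK version splits the request for you is worth checking. Here is a minimal sketch of doing the batching yourself. The limit of 96 is an assumption to verify against your provider's documentation, and `embed_in_batches`, `chunked`, and `fake_embed` are hypothetical helpers, not part of any SDK.

```python
from typing import Callable, Iterator, List, Sequence

MAX_TEXTS_PER_CALL = 96  # assumption: verify against your provider's current limit


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_in_batches(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = MAX_TEXTS_PER_CALL,
) -> List[List[float]]:
    """Call `embed_fn` once per batch and return the vectors in input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in embedder so the sketch runs offline: one number per text."""
    return [[float(len(text))] for text in batch]


articles = [f"article {i}" for i in range(250)]
vectors = embed_in_batches(articles, fake_embed)
batch_sizes = [len(batch) for batch in chunked(articles, MAX_TEXTS_PER_CALL)]

print(len(vectors), batch_sizes)
```

In the real pipeline, `embed_fn` would wrap the `co.embed(...)` call and return the float embeddings for that batch.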
This class implements three different search methods (exact, full-text, semantic), combines their results intelligently, and uses Cohere's reranking API to surface the most relevant articles for any query.
class NewsMultiSearch:
    def __init__(self, connection, cohere_client):
        self.conn = connection
        self.co = cohere_client
        self.model = "embed-english-v3.0"
    def exact_search(self, query: str, limit: int = 10):
        """Exact keyword matching for specific terms, names, dates"""
        with self.conn.cursor() as cur:
            # Search for exact phrases and important keywords
            cur.execute("""
                SELECT id, contents, 1.0 as score, 'exact' as search_type
                FROM documents
                WHERE contents ILIKE %s
                ORDER BY char_length(contents) -- Prefer shorter, more focused articles
                LIMIT %s
            """, (f"%{query}%", limit))
            return cur.fetchall()
    def fulltext_search(self, query: str, limit: int = 10):
        """PostgreSQL full-text search with ranking"""
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT id, contents,
                    ts_rank_cd(
                        to_tsvector('english', contents),
                        plainto_tsquery('english', %s)
                    ) as score,
                    'fulltext' as search_type
                FROM documents,
                    plainto_tsquery('english', %s) query
                WHERE to_tsvector('english', contents) @@ query
                ORDER BY score DESC
                LIMIT %s
            """, (query, query, limit))
            return cur.fetchall()
    def semantic_search(self, query: str, limit: int = 10):
        """Vector-based semantic search using Cohere embeddings"""
        try:
            # Generate query embedding
            query_embeddings = self.co.embed(
                texts=[query],
                model=self.model,
                input_type="search_query",
                embedding_types=['float']
            )
            with self.conn.cursor() as cur:
                cur.execute("""
                    SELECT id, contents,
                        1 - (embedding <=> %s::vector) as score,
                        'semantic' as search_type
                    FROM documents
                    ORDER BY embedding <=> %s::vector
                    LIMIT %s
                """, (
                    query_embeddings.embeddings.float_[0],
                    query_embeddings.embeddings.float_[0],
                    limit
                ))
                return cur.fetchall()
        except Exception as e:
            print(f"Semantic search error: {e}")
            return []
    def combine_and_deduplicate(self, *result_sets):
        """Combine results from multiple search methods, removing duplicates"""
        seen_ids = set()
        combined = []
        # Process results in order of priority
        for results in result_sets:
            for result in results:
                doc_id = result[0]
                if doc_id not in seen_ids:
                    seen_ids.add(doc_id)
                    combined.append({
                        'id': doc_id,
                        'content': result[1],
                        'score': result[2],
                        'search_type': result[3]
                    })
        return combined
    def rerank_results(self, query: str, results: list, top_k: int = 5):
        """Use Cohere's rerank API for final relevance scoring"""
        if not results:
            return []
        # Prepare documents for reranking (truncate long articles)
        documents = []
        for result in results:
            # Take first 2000 chars to stay within rerank limits
            doc_text = result['content'][:2000]
            documents.append(doc_text)
        try:
            rerank_response = self.co.rerank(
                model="rerank-english-v3.0",
                query=query,
                documents=documents,
                top_n=min(top_k, len(documents)),
                return_documents=True
            )
            # Map reranked results back to original data
            reranked = []
            for rerank_result in rerank_response.results:
                original_idx = rerank_result.index
                result = results[original_idx].copy()
                result['rerank_score'] = rerank_result.relevance_score
                result['reranked_content'] = rerank_result.document.text
                reranked.append(result)
            return reranked
        except Exception as e:
            print(f"Reranking error: {e}")
            return results[:top_k]
    def hybrid_search(self, query: str, limit: int = 5):
        """Main hybrid search function combining all methods"""
        # Cast wide net with all search methods
        exact_results = self.exact_search(query, limit * 2)
        fulltext_results = self.fulltext_search(query, limit * 2)
        semantic_results = self.semantic_search(query, limit * 2)
        # Combine and deduplicate (exact matches prioritized first)
        combined = self.combine_and_deduplicate(
            exact_results,
            fulltext_results,
            semantic_results
        )
        # Rerank for final relevance
        final_results = self.rerank_results(query, combined, limit)
        return final_results
# Initialize the search engine
search_engine = NewsMultiSearch(conn, co)
This shows how you can use the complete system, running a hybrid search and displaying results with their relevance scores and search method types.
# Test the complete hybrid search system
query = "people who died due to carbon monoxide poisoning"
# Run hybrid search
results = search_engine.hybrid_search(query, 5)
# Display results
print(f"Hybrid Search Results for: '{query}'\n")
for i, result in enumerate(results, 1):
    score = result.get('rerank_score', result['score'])
    search_type = result['search_type']
    preview = result['content'][:200] + "..."
    print(f"{i}. [Score: {score:.3f}] [{search_type}]")
    print(f"   {preview}\n")
This implementation gives you the complete hybrid search pipeline: exact matching finds precise terms, full-text search handles linguistic variations, semantic search captures meaning, and reranking ensures the most relevant results surface first.
The most important thing here is that each search method captures different aspects of relevance. Combined with reranking, hybrid search handles both the precision needs of exact matching and the flexibility of semantic understanding.
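The class above merges candidates in priority order and leaves the final ordering to a hosted reranker. If you want a cheap, local way to fuse the ranked lists before (or instead of) reranking, one widely used alternative is Reciprocal Rank Fusion (RRF). The sketch below is that alternative, not the method used in this article's code. The document ids are invented, and `k = 60` is the conventional default, not a tuned value.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[Hashable]], k: int = 60
) -> List[Hashable]:
    """Fuse ranked id lists: each list adds 1 / (k + rank) to a document's score."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical ids returned by each retriever, best match first
exact_ids = [7]
fulltext_ids = [7, 3, 9]
semantic_ids = [3, 5, 7]

fused = reciprocal_rank_fusion([exact_ids, fulltext_ids, semantic_ids])
print(fused)
```

RRF only looks at rank positions, so it sidesteps the problem that ILIKE, ts_rank_cd, and cosine similarity scores live on different scales. Documents found by several retrievers rise to the top.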
Let's test our hybrid search system with the kind of queries that break single-search approaches. Using our dataset, we'll compare vector-only search against our hybrid implementation.
# Initialize our search engine (the NewsMultiSearch class defined above)
search_engine = NewsMultiSearch(conn, co)
# Test query: Carbon monoxide deaths (exact terms + semantic context)
query = "people who died due to carbon monoxide poisoning"
# Compare approaches
vector_only = search_engine.semantic_search(query, 5)
fulltext_only = search_engine.fulltext_search(query, 5)
hybrid_results = search_engine.hybrid_search(query, 5)
Query: "people who died due to carbon monoxide poisoning"
Vector Search Only:
1. Article about house fires and safety (Score: 0.84)
2. General emergency response procedures (Score: 0.82)
3. Winter heating safety tips (Score: 0.79)
4. Carbon monoxide detector regulations (Score: 0.77)
5. Family found dead in Pennsylvania home (Score: 0.75)
Full-Text Search Only:
1. Family found dead in Pennsylvania home (Score: 0.95)
2. Carbon monoxide detector regulations (Score: 0.73)
3. House fire in Detroit claims lives (Score: 0.45)
4. Winter heating safety guidelines (Score: 0.42)
5. Emergency response training article (Score: 0.38)
Hybrid Search + Reranking:
1. Family found dead in Pennsylvania home (Score: 0.987)
2. Three die from generator fumes in Texas (Score: 0.923)
3. Carbon monoxide kills four in Chicago (Score: 0.876)
4. Winter storm leads to CO deaths (Score: 0.834)
5. Faulty heater blamed in family deaths (Score: 0.789)
The difference is dramatic. Vector search buried the most relevant article at #5. Full-text search found one good result but missed related incidents. Hybrid search surfaced multiple relevant cases, all properly ranked.
Query: "U.N. Security Council meeting heads of state"
Vector Search Only:
1. International diplomatic relations (Score: 0.91)
2. United Nations general assembly (Score: 0.88)
3. Foreign policy discussions (Score: 0.85)
4. Trade negotiations summit (Score: 0.82)
5. Security Council emergency session (Score: 0.79)
Hybrid Search + Reranking:
1. Security Council emergency session (Score: 0.994)
2. Rare meeting of world leaders at UN (Score: 0.967)
3. Presidents gather for crisis talks (Score: 0.923)
4. UN Security Council votes on resolution (Score: 0.887)
5. Diplomatic breakthrough at United Nations (Score: 0.845)
Again, hybrid search correctly prioritized the exact match ("Security Council emergency session") while vector search ranked it last among relevant results.
Running these comparisons across 100 test queries from our dataset:
Search Method | Precision@5 | Recall@10 | User Satisfaction |
---|---|---|---|
Vector Only | 0.73 | 0.65 | 6.2/10 |
Full-text Only | 0.81 | 0.58 | 6.8/10 |
Hybrid + Rerank | 0.94 | 0.89 | 8.7/10 |
The numbers: a 29% relative improvement in precision (0.73 to 0.94) and a 37% relative improvement in recall (0.65 to 0.89) compared to vector-only search.
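For reference, here is a minimal sketch of how Precision@k, Recall@k, and a relative improvement can be computed. The retrieved and relevant ids are made-up illustration data. Only the four figures passed to `relative_improvement` come from the vector-only and hybrid rows of the table.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / k


def recall_at_k(retrieved: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in retrieved[:k] if doc_id in relevant) / len(relevant)


def relative_improvement(baseline: float, new: float) -> float:
    """Improvement of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Made-up example: 5 retrieved ids, 4 truly relevant ids
retrieved = [11, 42, 7, 99, 3]
relevant = {42, 3, 8, 15}
p5 = precision_at_k(retrieved, relevant, 5)
r5 = recall_at_k(retrieved, relevant, 5)

# Table values: vector-only versus hybrid + rerank
precision_gain = relative_improvement(0.73, 0.94)
recall_gain = relative_improvement(0.65, 0.89)

print(f"P@5={p5:.2f} R@5={r5:.2f}")
print(f"precision gain={precision_gain:.1%} recall gain={recall_gain:.1%}")
```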
Hybrid search wins out on edge cases where single-search methods completely fail.
Query: "U.N. Security Council heads of state meeting"
Query: "carbon monoxide poisoning deaths Pennsylvania family"
Query: "Hurricane Katrina August 2005 New Orleans"
Exact terms (proper nouns, dates, specific incidents) need precise matching, while the broader context benefits from semantic understanding. Hybrid search is able to handle both cases.
We’re seeing two main trends for AI search (and one bonus one):
Instead of single queries like in our demo, AI systems perform multiple searches until they find what they need. This is exactly what Claude Code does and why it’s so good:
1. Search for function definition
2. If not found, search for similar functions
3. Search for imports and dependencies
4. Search for usage examples
5. Only then write code
What makes agentic search especially useful for coding agents is the built-in verifier loop: grep, find, and the compiler/interpreter return deterministic feedback ("Found 0 lines" vs "Found 10 lines in auth.py, chat/index.py, service.py") that the agent can use to check each result before moving on.
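The search-then-verify loop can be sketched in a few lines. This is an illustration of the idea only, not how Claude Code is implemented. The in-memory "files", the strategy list, and the function names are all hypothetical, and a real agent would shell out to grep and let the model decide the next step.

```python
import re
from typing import Dict, List, Tuple

Hit = Tuple[str, int, str]  # (file name, line number, stripped line text)


def grep(pattern: str, files: Dict[str, str]) -> List[Hit]:
    """Deterministic search: every line in every file matching the regex."""
    regex = re.compile(pattern)
    hits: List[Hit] = []
    for name, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((name, number, line.strip()))
    return hits


def agentic_search(
    strategies: List[Tuple[str, str]], files: Dict[str, str]
) -> Tuple[str, List[Hit]]:
    """Try each (label, pattern) in order; stop at the first one with hits."""
    for label, pattern in strategies:
        hits = grep(pattern, files)
        print(f"{label}: found {len(hits)} lines")  # the verifier signal
        if hits:
            return label, hits
    return "none", []


# Hypothetical codebase: get_user_by_id does not exist yet
files = {
    "auth.py": "def get_user_by_email(email):\n    return db.find(email)\n",
    "service.py": "from auth import get_user_by_email\nuser = get_user_by_email('a@b.c')\n",
}

strategies = [
    ("definition", r"def get_user_by_id\("),      # 1. exact definition
    ("similar", r"def get_user_by_\w+\("),        # 2. similar functions
    ("usage", r"\bget_user_by_\w+\("),            # 3. call sites
]

label, hits = agentic_search(strategies, files)
print(label, hits)
```

Each step returns a hit count the agent can trust, so "found 0 lines" is a reliable cue to widen the search and not a fuzzy similarity score to second-guess.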
A new concept that’s been picking up steam is “context engineering”, popularized by Simon Willison.
The premise is to stop asking "what search technique should I use?" and start asking "what context does my LLM need?" Then pick search techniques that deliver that context.
New techniques are still being discovered and developed. Honestly, 6-12 months from the release of this piece, we may have discovered a new technique. You don’t need to rush to implement it, but having an awareness of what type of use case it excels in, its tradeoffs, and implementation complexity will help you better make a decision.
With all this said, first principles stay true - don’t fall for the shiny new thing. Use the right search tool for your use case.
Send this to your CTO.
TigerData has the tools ready for your hybrid search implementation. PostgreSQL, pgvector, and pgai give you everything needed for production-ready hybrid search systems.
Vector database companies promised to solve everything.
They didn't, and we shouldn’t expect them to.
The solution isn't to abandon vector search; it’s actually to combine it with other techniques that handle what vector search can't.
If you have one takeaway from this piece, it’s the following:
Companies like Instacart shipping this today are seeing double-digit improvements in search relevance, and building AI systems that actually understand what users need.
Vector search may be the perfect tool for your use case. Or it may be half the solution. It may also not be the right solution at all, as with agentic coding.
Choose the right tool for the right job.
Tiger Cloud gives you everything you need:
All of this running on PostgreSQL, which you already know how to use.
Try Tiger Cloud free for 30 days and see why PostgreSQL is eating the database market.
About the Author:
Jacky Liang is a developer advocate at TigerData with an AI and LLMs obsession. He's worked at Pinecone, Oracle Cloud, and Looker Data as both a software developer and product manager which has shaped the way he thinks about software.
He cuts through AI hype to focus on what actually works. How can we use AI to solve real problems? What tools are worth your time? How will this technology actually change how we work?
When he's not writing or speaking about AI, Jacky builds side projects and tries to keep up with the endless stream of new AI tools and research—an impossible task, but he keeps trying anyway. His model of choice is Claude Sonnet 4 and his favorite coding tool is Claude Code.