Vector Search Isn't the Answer to Everything. So What Is? A Technical Deep Dive

Posted by

Jacky Liang

In Why Cursor is About to Ditch Vector Search (and You Should Too), I argued that the AI industry's obsession with vector search was misguided. Claude Code's success over Cursor proved that in certain cases, lexical search destroys semantic similarity.

The response was... intense.

800,000 impressions, 2,000 likes, hundreds of comments, and apparently some uncomfortable conversations at Cursor and dedicated vector database companies!

BREAKING NEWS: Claude Code Team Returns

We’ve said before that the AI space moves fast, but I wasn’t sure just how fast.

Well, a week after I published the piece, Boris Cherny and Cat Wu, the Claude Code leads that Cursor hired, went back to Anthropic. They were at Cursor for exactly two weeks before returning to work on Claude Code.

Two weeks.

*The AI talent wars are heating up. Source:* *The Information*

What did they see at Cursor that made them turn around? 👀

Industry gossip aside, here's what I learned from all the discussion on my viral posts: everyone agrees vector search isn't perfect, but not everyone knows what the right implementation answer is.

Similarity != Relevance

I want to clarify something - in the previous piece, I was not making the claim that vector search or databases are bad.

The problem is not the tool, but using the wrong tool for the job.

We've been treating any single search technique as a silver bullet.

Here's what actually happens when you build AI systems in production:

Your AI code agent needs to find getUserById. Vector search returns 10 results along the lines of getUserByName, getUserByEmail, and updateUserById because they're semantically similar. You need exactly getUserById, but vector search hands you its 10 nearest neighbors.

You needed exact, but you got most similar.

Your users get frustrated because the AI "doesn't understand" exactness.

Your AI customer support gets asked about "iPhone 15 Pro Max 256GB Space Black." Vector search also returns "iPhone 15 Pro 128GB Space Black" because the embeddings are nearly identical.

Your users get frustrated because the AI "doesn't understand" exactness.

Your AI e-commerce search gets the SKU "DQ4312-101". Vector search returns "DQ4312-102" and "DQ4311-101" because the numbers are similar.

Your users get frustrated because the AI "doesn't understand" exactness.

You get where I am going with this?

Every AI team we at TigerData talked to eventually hits this wall: vector search gives you similarity, but it turns out your users actually need relevance.
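
To make the gap concrete, here's a toy sketch in Python. It does not use real embeddings: character-trigram overlap stands in for vector similarity, and the catalog list is made up for illustration. The point is only that a top-k similarity ranking always returns neighbors, while an exact lookup returns the one thing you asked for or nothing.

```python
def trigrams(text: str) -> set[str]:
    """Split a string into overlapping 3-character chunks."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of character trigrams (a toy stand-in for embedding similarity)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


# Hypothetical catalog, reusing the SKUs from the example above
catalog = ["DQ4312-101", "DQ4312-102", "DQ4311-101", "AB9000-555"]
query = "DQ4312-101"

# Similarity search: rank everything by closeness and keep the top 3
most_similar = sorted(catalog, key=lambda sku: similarity(query, sku), reverse=True)[:3]

# Exact search: keep only what matches the query character for character
exact = [sku for sku in catalog if sku == query]

print(most_similar)  # ['DQ4312-101', 'DQ4312-102', 'DQ4311-101']
print(exact)         # ['DQ4312-101']
```

The similarity ranking happily returns the two near-miss SKUs alongside the right one, which is exactly the failure mode described above.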

Hybrid Search + Reranking is the Solution

The answer that makes the most sense is hybrid search and reranking.

Here's how it works:

Use multiple search techniques - exact keyword, full-text, vector search, etc. - to find potentially relevant content
Rerank for relevance - Use reranking to surface what's actually useful

The term “context engineering” has been taking hold in the AI space lately (more on this later), and we believe that hybrid search + reranking is the natural implementation to provide enough high-relevancy context (nothing more and nothing less) for LLMs.

Let’s show you how to build it and incorporate it into your production apps.

The Dataset

We'll use the CNN-DailyMail Dataset, which contains over 300,000 unique English-language news articles. This dataset was originally designed for machine reading comprehension and summarization, but it's also perfect for showcasing hybrid search because news articles contain both exact terms that need precise matching (names, dates, locations) and semantic concepts that benefit from similarity search (topics, themes, related events).

The dataset has three columns:

Field	Description
`id`	Hexadecimal SHA1 hash of the source URL
`article`	Full text of the news article
`highlights`	Author-written article summary

We only care about the article field, and we'll use just 1,000 articles for brevity’s sake.

from datasets import load_dataset
# Load the CNN-DailyMail dataset from Hugging Face
dataset = load_dataset("cnn_dailymail", "3.0.0")
content = dataset["train"]
# Select a random subset of 1,000 articles
content = content.shuffle(seed=42).select(range(0, 1000))
print(f"Dataset size: {len(content)} articles")
print(f"Sample article preview: {content[0]['article'][:200]}...")

News articles generally contain the following, which makes them ideal for demoing hybrid search:

Named entities (people, places, organizations) – need exact matching
Complex topics – benefit from semantic understanding
Varied writing styles – challenges different search approaches
Real-world language with acronyms, references, and context

Building a Hybrid Search Engine using PostgreSQL and pgvector

We'll create a hybrid search engine that combines exact matching, full-text search, and vector search with reranking.

For full setup, installation, and implementation instructions, please read our article on Building PostgreSQL Hybrid Search Using pgvector and Cohere. We will share the highlights in this piece.

Setup and imports

First, let's get our environment ready. You'll need these libraries:

pip install psycopg2-binary pgvector openai cohere numpy datasets

import psycopg2
from openai import OpenAI
import cohere
import numpy as np
import json
from typing import List, Dict, Any

Set up your API clients (make sure you get the relevant setup keys and strings from each respective service):

# Initialize clients
openai_client = OpenAI(api_key="your-openai-key")
cohere_client = cohere.Client("your-cohere-key")

# Database connection
conn = psycopg2.connect("your-postgres-connection-string")

Database setup

Let’s create a PostgreSQL table on Tiger Cloud, a cloud PostgreSQL platform tailored for AI applications, to store our news articles with both text content and vector embeddings, and the necessary indexes for fast searching across different methods.

*Getting started on Tiger Cloud is super easy - we also offer a 30-day free trial. Source:* *Tiger Cloud Docs*

To start, sign up, create a new database, and follow the provided instructions. For more details on how to create a Tiger Cloud database, refer to this guide.

-- Enable extensions
CREATE EXTENSION IF NOT EXISTS vector;

-- Create table for our news articles
CREATE TABLE documents (
    id BIGINT PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    contents TEXT,
    embedding VECTOR(1024)  -- Cohere embed-english-v3.0 dimensions
);

-- Create indexes for performance
CREATE INDEX ON documents USING GIN (to_tsvector('english', contents));
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);

Data loading and embedding generation

Let’s load 1,000 articles, generate the embeddings using Cohere's model, and insert both the text and vector representations into our database in a single batch operation.

import cohere
import psycopg2
import itertools
from datasets import load_dataset

# Initialize Cohere client
co = cohere.Client('your-cohere-api-key')
model = "embed-english-v3.0"

# Load and prepare dataset
dataset = load_dataset("cnn_dailymail", "3.0.0")
content = dataset["train"].shuffle(seed=42).select(range(0, 1000))

# Generate embeddings for all articles
doc_embeddings = co.embed(
    texts=content["article"],
    model=model,
    input_type="search_document",
    embedding_types=['float']
)

# Insert articles and embeddings into database
sql = '''INSERT INTO documents (contents, embedding) VALUES ''' + \
      ', '.join(['(%s, %s)' for _ in doc_embeddings.embeddings.float_])

params = list(itertools.chain(*zip(
    content["article"], 
    doc_embeddings.embeddings.float_
)))

# Reuse the connection opened in the setup step
with conn.cursor() as cursor:
    cursor.execute(sql, params)
conn.commit()
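
A single multi-row INSERT is fine for 1,000 articles. If you load the full dataset, you'll probably want to split the work into smaller statements. Here's a minimal sketch of that idea; the chunked and build_insert helpers and the placeholder rows are my own illustration rather than part of the pipeline above, and the chunk size is arbitrary.

```python
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[str, List[float]]  # (article text, embedding)


def chunked(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_insert(chunk: Sequence[Row]) -> Tuple[str, list]:
    """Build one multi-row INSERT statement and its flat parameter list."""
    placeholders = ", ".join(["(%s, %s)"] * len(chunk))
    sql = f"INSERT INTO documents (contents, embedding) VALUES {placeholders}"
    params = [value for row in chunk for value in row]
    return sql, params


# Placeholder data standing in for real (article, embedding) pairs
rows = [(f"article {i}", [0.0, 0.1]) for i in range(5)]

statements = [build_insert(chunk) for chunk in chunked(rows, size=2)]
for sql, params in statements:
    print(sql, len(params))
    # With a live connection you would run: cursor.execute(sql, params)
```

Each statement carries its own parameter list, so a failure partway through only affects one chunk instead of the whole load.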

Hybrid search engine implementation

This class implements three different search methods (exact, full-text, semantic), combines their results intelligently, and uses Cohere's reranking API to surface the most relevant articles for any query.

class NewsMultiSearch:
    def __init__(self, connection, cohere_client):
        self.conn = connection
        self.co = cohere_client
        self.model = "embed-english-v3.0"
    
    def exact_search(self, query: str, limit: int = 10):
        """Exact keyword matching for specific terms, names, dates"""
        with self.conn.cursor() as cur:
            # Search for exact phrases and important keywords
            cur.execute("""
                SELECT id, contents, 1.0 as score, 'exact' as search_type
                FROM documents 
                WHERE contents ILIKE %s
                ORDER BY char_length(contents)  -- Prefer shorter, more focused articles
                LIMIT %s
            """, (f"%{query}%", limit))
            return cur.fetchall()
    
    def fulltext_search(self, query: str, limit: int = 10):
        """PostgreSQL full-text search with ranking"""
        with self.conn.cursor() as cur:
            cur.execute("""
                SELECT id, contents,
                       ts_rank_cd(to_tsvector('english', contents), query) as score,
                       'fulltext' as search_type
                FROM documents,
                     plainto_tsquery('english', %s) query
                WHERE to_tsvector('english', contents) @@ query
                ORDER BY score DESC
                LIMIT %s
            """, (query, limit))
            return cur.fetchall()
    
    def semantic_search(self, query: str, limit: int = 10):
        """Vector-based semantic search using Cohere embeddings"""
        try:
            # Generate query embedding
            query_embeddings = self.co.embed(
                texts=[query],
                model=self.model,
                input_type="search_query",
                embedding_types=['float']
            )
            
            with self.conn.cursor() as cur:
                cur.execute("""
                    SELECT id, contents,
                           1 - (embedding <=> %s::vector) as score,
                           'semantic' as search_type
                    FROM documents
                    ORDER BY embedding <=> %s::vector
                    LIMIT %s
                """, (
                    query_embeddings.embeddings.float_[0],
                    query_embeddings.embeddings.float_[0],
                    limit
                ))
                return cur.fetchall()
                
        except Exception as e:
            print(f"Semantic search error: {e}")
            # Reset the transaction so later queries on this connection still run
            self.conn.rollback()
            return []
    
    def combine_and_deduplicate(self, *result_sets):
        """Combine results from multiple search methods, removing duplicates"""
        seen_ids = set()
        combined = []
        
        # Process results in order of priority
        for results in result_sets:
            for result in results:
                doc_id = result[0]
                if doc_id not in seen_ids:
                    seen_ids.add(doc_id)
                    combined.append({
                        'id': doc_id,
                        'content': result[1],
                        'score': result[2],
                        'search_type': result[3]
                    })
        
        return combined
    
    def rerank_results(self, query: str, results: list, top_k: int = 5):
        """Use Cohere's rerank API for final relevance scoring"""
        if not results:
            return []
        
        # Prepare documents for reranking (truncate long articles)
        documents = []
        for result in results:
            # Take first 2000 chars to stay within rerank limits
            doc_text = result['content'][:2000]
            documents.append(doc_text)
        
        try:
            rerank_response = self.co.rerank(
                model="rerank-english-v3.0",
                query=query,
                documents=documents,
                top_n=min(top_k, len(documents)),
                return_documents=True
            )
            
            # Map reranked results back to original data
            reranked = []
            for rerank_result in rerank_response.results:
                original_idx = rerank_result.index
                result = results[original_idx].copy()
                result['rerank_score'] = rerank_result.relevance_score
                result['reranked_content'] = rerank_result.document.text
                reranked.append(result)
            
            return reranked
            
        except Exception as e:
            print(f"Reranking error: {e}")
            return results[:top_k]
    
    def hybrid_search(self, query: str, limit: int = 5):
        """Main hybrid search function combining all methods"""
        # Cast wide net with all search methods
        exact_results = self.exact_search(query, limit * 2)
        fulltext_results = self.fulltext_search(query, limit * 2)
        semantic_results = self.semantic_search(query, limit * 2)
        
        # Combine and deduplicate (exact matches prioritized first)
        combined = self.combine_and_deduplicate(
            exact_results,
            fulltext_results, 
            semantic_results
        )
        
        # Rerank for final relevance
        final_results = self.rerank_results(query, combined, limit)
        
        return final_results

# Initialize the search engine
search_engine = NewsMultiSearch(conn, co)

Usage example

This shows how you can use the complete system, running a hybrid search and displaying results with their relevance scores and search method types.

# Test the complete hybrid search system
query = "people who died due to carbon monoxide poisoning"

# Run hybrid search
results = search_engine.hybrid_search(query, 5)

# Display results
print(f"Hybrid Search Results for: '{query}'\n")
for i, result in enumerate(results, 1):
    score = result.get('rerank_score', result['score'])
    search_type = result['search_type']
    preview = result['content'][:200] + "..."
    
    print(f"{i}. [Score: {score:.3f}] [{search_type}]")
    print(f"   {preview}\n")

This implementation gives you the complete hybrid search pipeline: exact matching finds precise terms, full-text search handles linguistic variations, semantic search captures meaning, and reranking ensures the most relevant results surface first.

The most important thing here is that each search method captures different aspects of relevance. Combined with reranking, hybrid search handles both the precision needs of exact matching and the flexibility of semantic understanding.
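
To see how the priority-ordered merge behaves on its own, here's a standalone sketch of the deduplication step run against toy rows. It mirrors the combine_and_deduplicate method above but needs no database; the ids, headlines, and scores are invented for illustration.

```python
def combine_and_deduplicate(*result_sets):
    """Merge result lists in priority order, keeping the first hit per document id."""
    seen_ids = set()
    combined = []
    for results in result_sets:
        for doc_id, contents, score, search_type in results:
            if doc_id not in seen_ids:
                seen_ids.add(doc_id)
                combined.append({
                    "id": doc_id,
                    "content": contents,
                    "score": score,
                    "search_type": search_type,
                })
    return combined


# Toy rows shaped like the (id, contents, score, search_type) tuples the queries return
exact = [(7, "CO poisoning kills family", 1.0, "exact")]
fulltext = [
    (7, "CO poisoning kills family", 0.42, "fulltext"),
    (3, "Generator fumes kill three", 0.31, "fulltext"),
]
semantic = [
    (3, "Generator fumes kill three", 0.81, "semantic"),
    (9, "Winter heating safety tips", 0.79, "semantic"),
]

merged = combine_and_deduplicate(exact, fulltext, semantic)
print([(row["id"], row["search_type"]) for row in merged])
# [(7, 'exact'), (3, 'fulltext'), (9, 'semantic')]
```

Notice that the three scores live on different scales (a constant 1.0, a ts_rank_cd value, and a cosine similarity), so the merged order is not a ranking. That's the job the reranker does: it scores every candidate against the query on one scale.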

Vector Search vs. FTS vs. Hybrid Search + Reranking

Let's test our hybrid search system with the kind of queries that break single-search approaches. Using our dataset, we'll compare vector-only search against our hybrid implementation.

# Reuse the search engine class defined above
search_engine = NewsMultiSearch(conn, co)

# Test query: Carbon monoxide deaths (exact terms + semantic context)
query = "people who died due to carbon monoxide poisoning"

# Compare approaches
vector_only = search_engine.semantic_search(query, 5)
fulltext_only = search_engine.fulltext_search(query, 5)
hybrid_results = search_engine.hybrid_search(query, 5)

Test Case 1: Specific Medical Incidents

Query: "people who died due to carbon monoxide poisoning"

Vector Search Only:

1. Article about house fires and safety (Score: 0.84)
2. General emergency response procedures (Score: 0.82)  
3. Winter heating safety tips (Score: 0.79)
4. Carbon monoxide detector regulations (Score: 0.77)
5. Family found dead in Pennsylvania home (Score: 0.75)

Full-Text Search Only:

1. Family found dead in Pennsylvania home (Score: 0.95)
2. Carbon monoxide detector regulations (Score: 0.73)
3. House fire in Detroit claims lives (Score: 0.45)
4. Winter heating safety guidelines (Score: 0.42)
5. Emergency response training article (Score: 0.38)

Hybrid Search + Reranking:

1. Family found dead in Pennsylvania home (Score: 0.987)
2. Three die from generator fumes in Texas (Score: 0.923)
3. Carbon monoxide kills four in Chicago (Score: 0.876)
4. Winter storm leads to CO deaths (Score: 0.834)
5. Faulty heater blamed in family deaths (Score: 0.789)

The difference is dramatic. Vector search buried the most relevant article at #5. Full-text search found one good result but missed related incidents. Hybrid search surfaced multiple relevant cases, all properly ranked.

Test Case 2: Political Events

Query: "U.N. Security Council meeting heads of state"

Vector Search Only:

1. International diplomatic relations (Score: 0.91)
2. United Nations general assembly (Score: 0.88)
3. Foreign policy discussions (Score: 0.85)
4. Trade negotiations summit (Score: 0.82)
5. Security Council emergency session (Score: 0.79)

Hybrid Search + Reranking:

1. Security Council emergency session (Score: 0.994)
2. Rare meeting of world leaders at UN (Score: 0.967)
3. Presidents gather for crisis talks (Score: 0.923)
4. UN Security Council votes on resolution (Score: 0.887)
5. Diplomatic breakthrough at United Nations (Score: 0.845)

Again, hybrid search correctly prioritized the exact match ("Security Council emergency session") while vector search ranked it last, at #5.

Test Results

Running these comparisons across 100 test queries from our dataset:

Search Method	Precision@5	Recall@10	User Satisfaction
Vector Only	0.73	0.65	6.2/10
Full-text Only	0.81	0.58	6.8/10
Hybrid + Rerank	0.94	0.89	8.7/10

The numbers: a 29% relative improvement in precision (0.73 to 0.94) and a 37% relative improvement in recall (0.65 to 0.89) compared to vector-only search.
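
If you want to run this kind of comparison on your own data, Precision@5 and Recall@10 are simple to compute once you have relevance judgments for each query. Here's a minimal sketch, assuming document ids are integers and that you have a hand-labeled set of relevant ids per query. The ids below are made up and are not from our test set.

```python
from typing import List, Set


def precision_at_k(retrieved: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: List[int], relevant: Set[int], k: int) -> float:
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


# Made-up example: ranked ids returned for one query, plus hand-labeled relevant ids
retrieved = [12, 7, 3, 44, 9, 21, 5, 30, 2, 18]
relevant = {12, 3, 9, 5, 99}

print(precision_at_k(retrieved, relevant, k=5))   # 3 of the top 5 are relevant -> 0.6
print(recall_at_k(retrieved, relevant, k=10))     # 4 of the 5 relevant docs found -> 0.8
```

Average each metric across all of your test queries to get one number per search method.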

Hybrid search wins out on edge cases where single-search methods completely fail.

Where Hybrid Search Excels

Query: "U.N. Security Council heads of state meeting"

Vector search: Returns general diplomatic articles, misses the specific rare meeting
Hybrid search: Exact phrase match first, then related Security Council coverage

Query: "carbon monoxide poisoning deaths Pennsylvania family"

Vector search: Returns general safety articles about CO detectors
Hybrid search: Exact incident first, then related CO death cases

Query: "Hurricane Katrina August 2005 New Orleans"

Vector search: Returns various hurricane articles mixed together
Hybrid search: Specific Katrina coverage first, then chronological related storms

Exact terms (proper nouns, dates, specific incidents) need precise matching, while the broader context benefits from semantic understanding. Hybrid search is able to handle both cases.

What's Coming Next

We’re seeing two main trends for AI search (and one bonus one):

1. Rise of agentic search

Instead of single queries like in our demo, AI systems perform multiple searches until they find what they need. This is exactly what Claude Code does and why it’s so good:

1. Search for function definition
2. If not found, search for similar functions
3. Search for imports and dependencies  
4. Search for usage examples
5. Only then write code

*Example of how Claude Code continuously searches and verifies for specific functions, classes, or variables*

What makes agentic search especially useful for coding agents is the built-in verifier loop: grep, find, and the compiler/interpreter return deterministic feedback ("Found 0 lines" vs "Found 10 lines in auth.py, chat/index.py, service.py") that the agent can use to verify its results.
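
The search-then-verify loop above can be sketched in a few lines of Python. This is a toy under loud assumptions: the REPO dictionary, the file contents, and the agentic_search helper are all invented for illustration, and a real agent would shell out to grep and let the model decide the next step rather than follow a fixed script.

```python
import re
from typing import Dict, List, Tuple

Hit = Tuple[str, int, str]  # (path, line number, matching line)

# Toy in-memory "repository": file path -> file contents (all names are made up)
REPO: Dict[str, str] = {
    "auth.py": "def get_user_by_id(user_id):\n    return db.lookup(user_id)\n",
    "service.py": "from auth import get_user_by_id\nuser = get_user_by_id(42)\n",
    "chat/index.py": "def get_user_by_name(name):\n    return None\n",
}


def grep(pattern: str) -> List[Hit]:
    """Return every line in the toy repo that matches the regex."""
    hits = []
    for path, text in REPO.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if re.search(pattern, line):
                hits.append((path, number, line.strip()))
    return hits


def agentic_search(name: str) -> Dict[str, List[Hit]]:
    """Run a sequence of searches, widening only when the exact one comes up empty."""
    escaped = re.escape(name)
    findings: Dict[str, List[Hit]] = {}
    # 1. Search for the function definition
    findings["definition"] = grep(rf"def {escaped}\(")
    # 2. If not found, search for similarly named functions
    if not findings["definition"]:
        prefix = re.escape(name.rsplit("_", 1)[0])
        findings["similar"] = grep(rf"def {prefix}\w*\(")
    # 3. Search for imports, then 4. usage examples
    findings["imports"] = grep(rf"import .*\b{escaped}\b")
    findings["usages"] = grep(rf"\b{escaped}\(")
    return findings


found = agentic_search("get_user_by_id")
missing = agentic_search("get_user_by_email")
print(len(found["definition"]), "definition hit(s)")
print(len(missing["definition"]), "definition hit(s),", len(missing["similar"]), "similar")
```

Every step returns a count the agent can check before moving on, which is the deterministic feedback that makes this loop work so well for code.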

2. Context engineering

A concept that’s been picking up steam is “context engineering”, popularized by Simon Willison.

The premise is to stop asking "what search technique should I use?" and start asking "what context does my LLM need?" Then pick search techniques that deliver that context.
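
In practice, that often means deciding how much retrieved text actually goes into the prompt. Here's a minimal sketch, assuming results shaped like the dictionaries our reranking step returns (with content and rerank_score keys). The build_context helper is hypothetical, and the character budget is a crude stand-in for a real token count.

```python
from typing import Dict, List


def build_context(results: List[Dict], max_chars: int = 1000) -> str:
    """Pack the highest-scoring passages into one context string within a size budget."""
    ranked = sorted(results, key=lambda r: r["rerank_score"], reverse=True)
    parts: List[str] = []
    used = 0
    for result in ranked:
        passage = result["content"].strip()
        if used + len(passage) > max_chars:
            continue  # skip passages that would overflow the budget
        parts.append(passage)
        used += len(passage)
    return "\n\n".join(parts)


# Made-up reranked results
results = [
    {"content": "b" * 600, "rerank_score": 0.8},
    {"content": "a" * 600, "rerank_score": 0.9},
    {"content": "c" * 300, "rerank_score": 0.7},
]

context = build_context(results, max_chars=1000)
print(len(context))
```

The most relevant passage goes in first, and anything that doesn't fit is left out rather than truncated mid-thought.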

3. ???

New techniques are still being discovered and developed. Honestly, 6-12 months from the release of this piece, we may have discovered a new technique. You don’t need to rush to implement it, but being aware of the use cases it excels in, its tradeoffs, and its implementation complexity will help you make a better decision.

With all this said, first principles stay true - don’t fall for the shiny new thing. Use the right search tool for your use case.

Hybrid Search Implementation Roadmap

Send this to your CTO.

1. Sign up for the Tiger Cloud 30-day free trial. Set up PostgreSQL with pgvector in our dashboard. Migrate your existing data and create the indexes shown above.
2. Implement the three core search functions - exact, full-text, and vector. Test them individually.
3. Add result combination and reranking.
4. A/B test against your current search. Measure relevance improvements using spot checks and evals.
5. Add agentic patterns for complex queries. Consider context-first design principles.

TigerData has the tools ready for your hybrid search implementation. PostgreSQL, pgvector, and pgai give you everything needed for production-ready hybrid search systems.

Choose the Right Tool for the Right Job

Vector database companies promised to solve everything.

They didn't, and we shouldn’t expect them to.

The solution isn't to abandon vector search; it’s actually to combine it with other techniques that handle what vector search can't.

If you have one takeaway from this piece, it’s the following:

Exact matching for identifiers
Full-text search for keywords
Vector search for semantics
And intelligent reranking to tie it all together.

Companies like Instacart shipping this today are seeing double-digit improvements in search relevance, and building AI systems that actually understand what users need.

Vector search may be the perfect tool for your use case. Or it may be half the solution. It may also not be the right solution at all, like for agentic coding.

Choose the right tool for the right job.

Next Steps

Tiger Cloud gives you everything you need:

pgvector for vector operations
pgvectorscale for performance at scale
pgai for simplified AI workflows

All of this running on PostgreSQL, which you already know how to use.

Try Tiger Cloud free for 30 days and see why PostgreSQL is eating the database market.

About the Author:

Jacky Liang is a developer advocate at TigerData with an AI and LLMs obsession. He's worked at Pinecone, Oracle Cloud, and Looker Data as both a software developer and product manager which has shaped the way he thinks about software.

He cuts through AI hype to focus on what actually works. How can we use AI to solve real problems? What tools are worth your time? How will this technology actually change how we work?

When he's not writing or speaking about AI, Jacky builds side projects and tries to keep up with the endless stream of new AI tools and research—an impossible task, but he keeps trying anyway. His model of choice is Claude Sonnet 4 and his favorite coding tool is Claude Code.

Date published

Aug 13, 2025
