
PostgreSQL is extremely versatile. With thousands of extensions available, you can add time-series, vector search, geospatial, encryption, and more without switching databases. That versatility is why PostgreSQL is the most popular database in the world.
But with so many extensions out there, which ones do people actually use in 2026, especially now that every app seems to need AI?
To find out, we analyzed tens of thousands of customer databases on Tiger Data to see which extensions are actively used in production. Knowing the names is only half the story, though: plenty of developers aren't sure why they'd need a given extension or how to use it.
So in this blog post, we won't just list them. We'll explain each one: why you need it, how it works, and how to use it, all with simple before-and-after examples.
Let's get started.
| Category | Extensions | What It Means |
|---|---|---|
| Pre-installed | plpgsql, pg_stat_statements, pgcrypto, file_fdw, timescaledb | Every Tiger Data instance has these. Zero setup. |
| User-installed | vector, vectorscale, ai, pg_auth_mon, postgis | Developers explicitly enabled these for their workloads. |
The pre-installed extensions represent our opinion on what every modern Postgres deployment needs. The user-installed ones? That's the market voting with their feet.
Let's dive into each one.
These five extensions ship on every Tiger Data instance. You don't need to run CREATE EXTENSION—they're already there.
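Want to see for yourself what's already enabled? The pg_extension system catalog lists everything installed in the current database:
-- List the extensions installed in this database
SELECT extname, extversion FROM pg_extension ORDER BY extname;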
You're tracking data over time: server metrics every second, IoT sensor readings, user events, stock prices. Your table grows to 100 million rows.
The problem: "Show me the last hour" scans 100 million rows. "Average per hour for last week" takes 45 seconds. Everything is slow.
TimescaleDB automatically partitions your data by time. Queries for "last hour" only touch last hour's data—not all 100 million rows.

Before: Dashboard takes forever to load
Frontend (React):
useEffect(() => {
fetch('/api/metrics/cpu').then(res => res.json()).then(setCpuData);
}, []); // User clicks "Last Hour" - Loading... 30 seconds... still loading...
Backend (Node.js):
app.get('/api/metrics/cpu', async (req, res) => {
const result = await db.query(`
SELECT date_trunc('minute', time) AS minute, AVG(cpu_percent) AS avg_cpu
FROM metrics WHERE time > NOW() - INTERVAL '1 hour' GROUP BY minute ORDER BY minute
`);
res.json(result.rows); // With 100M rows, this takes 30+ seconds
});
After: Dashboard loads instantly
Frontend (React): Same code—no changes!
useEffect(() => {
fetch('/api/metrics/cpu').then(res => res.json()).then(setCpuData);
}, []); // User clicks "Last Hour" - Data appears in 50ms
Backend (Node.js): Same query—just use time_bucket!
app.get('/api/metrics/cpu', async (req, res) => {
const result = await db.query(`
SELECT time_bucket('1 minute', time) AS minute, AVG(cpu_percent) AS avg_cpu
FROM metrics WHERE time > NOW() - INTERVAL '1 hour' GROUP BY minute ORDER BY minute
`);
res.json(result.rows); // Hypertable: only scans last hour's chunk. 50ms!
});
Database (one-time setup):
CREATE TABLE metrics (time TIMESTAMPTZ NOT NULL, server TEXT, cpu_percent FLOAT);
SELECT create_hypertable('metrics', 'time'); -- This one line makes it fast!
The magic: TimescaleDB automatically partitions by time. "Last hour" only touches the last hour's chunk.
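You can peek at those chunks directly. On recent TimescaleDB versions, the timescaledb_information.chunks view shows each chunk and the time range it covers (a quick sketch for the metrics hypertable above):
-- See the chunks behind the metrics hypertable
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metrics'
ORDER BY range_start DESC;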
-- Compress old data (1TB to 100GB)
ALTER TABLE metrics SET (timescaledb.compress);
SELECT add_compression_policy('metrics', INTERVAL '7 days');
-- Auto-delete data older than 90 days
SELECT add_retention_policy('metrics', INTERVAL '90 days');
Imagine you run an e-commerce site. Every time someone updates their profile, you want to log who changed what, when.
The problem: You have 5 different services that can update user profiles: the web app, mobile app, admin dashboard, customer support tool, and a background sync job. Do you really want to copy-paste the same logging code into all 5 places?
PL/pgSQL lets you write code that runs inside your database. Instead of adding logging to 5 services, you add it once to the database—and it works everywhere automatically.

Before: Logging logic in every service
Frontend (React):
const updateProfile = async (name) => {
await fetch('/api/user', {
method: 'PUT',
body: JSON.stringify({ name })
});
};
Backend - web-app/routes/user.js:
app.put('/api/user', async (req, res) => {
const old = await db.query('SELECT * FROM users WHERE id = $1', [req.userId]);
await db.query('UPDATE users SET name = $1 WHERE id = $2', [req.body.name, req.userId]);
await db.query('INSERT INTO audit_log VALUES ($1, $2, $3)', [req.userId, old.rows[0], req.body]);
res.json({ success: true });
});
Backend - mobile-api/routes/user.js: Same code again!
app.put('/api/user', async (req, res) => {
const old = await db.query('SELECT * FROM users WHERE id = $1', [req.userId]);
await db.query('UPDATE users SET name = $1 WHERE id = $2', [req.body.name, req.userId]);
await db.query('INSERT INTO audit_log VALUES ($1, $2, $3)', [req.userId, old.rows[0], req.body]);
res.json({ success: true });
});
Backend - admin-dashboard/routes/user.js: And again...
app.put('/api/user', async (req, res) => {
const old = await db.query('SELECT * FROM users WHERE id = $1', [req.userId]);
await db.query('UPDATE users SET name = $1 WHERE id = $2', [req.body.name, req.userId]);
// Oops! Forgot the audit log here. Bug found 6 months later.
res.json({ success: true });
});
After: Database handles logging automatically
Frontend (React): Same code-no changes!
const updateProfile = async (name) => {
await fetch('/api/user', {
method: 'PUT',
body: JSON.stringify({ name })
});
};
Backend (Node.js): Much simpler!
app.put('/api/user', async (req, res) => {
const { name } = req.body;
await db.query('UPDATE users SET name = $1 WHERE id = $2', [name, req.userId]);
res.json({ success: true });
});
// That's it. No audit logic needed. The database handles it.
Database (one-time setup):
CREATE FUNCTION log_changes() RETURNS TRIGGER AS $$
BEGIN
INSERT INTO audit_log (user_id, old_data, new_data, changed_at)
VALUES (NEW.id, to_jsonb(OLD), to_jsonb(NEW), NOW());
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER audit_users AFTER UPDATE ON users
FOR EACH ROW EXECUTE FUNCTION log_changes();
Now any update—web app, mobile app, admin tool, even manual SQL—gets logged automatically. One place. Zero chance of forgetting.
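A quick way to convince yourself it works: make an update, then read the audit table (the user id here is just an example):
-- Change a row...
UPDATE users SET name = 'Alice B.' WHERE id = 42;
-- ...and see the change recorded automatically
SELECT user_id, old_data->>'name' AS old_name, new_data->>'name' AS new_name, changed_at
FROM audit_log ORDER BY changed_at DESC LIMIT 5;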
Your app is slow. Users are complaining. Your boss is asking questions.
The problem: You have 500 different queries running against your database. Which one is the bottleneck? Is it the search query? The dashboard? The nightly report? Without data, you're just guessing.
pg_stat_statements automatically tracks every query: how many times it ran, how long it took, how much data it touched. It's like having a fitness tracker for your database.

Before: Guessing and praying
Developer: "I think it's the search query?"
DBA: "Let me enable query logging..."
*3 hours later, parsing 50GB of logs*
DBA: "Actually, it might be the dashboard query?"
*Users still complaining*
After: Instant answers
-- Show me the 5 queries eating the most time
SELECT
substring(query, 1, 50) AS query,
calls,
round(total_exec_time / 1000) AS total_seconds,
round(mean_exec_time) AS avg_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
| query | calls | total_seconds | avg_ms |
|---|---|---|---|
| SELECT * FROM orders WHERE user_id = $1 | 2,847,293 | 4,521 | 1.6 |
| SELECT COUNT(*) FROM products WHERE... | 15,234 | 1,891 | 124 |
| UPDATE cart SET updated_at = $1 WHERE... | 892,441 | 423 | 0.5 |
Look at that second query: it only runs 15K times, but each call takes 124 milliseconds, roughly 100x slower per call than the other two. Probably a missing index. Fix that one query and your app gets dramatically faster.
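The fix itself is usually small. A sketch of what it might look like, assuming the slow query filters on a category column (the column name is hypothetical):
-- Add the missing index without blocking writes
CREATE INDEX CONCURRENTLY idx_products_category ON products (category);
-- Reset the counters so you can measure the before/after difference
SELECT pg_stat_statements_reset();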
-- Queries that return few rows but read lots of data = missing index
SELECT query, calls, rows
FROM pg_stat_statements
WHERE calls > 100 -- Runs often
AND rows / calls < 10 -- Returns few rows per call
AND shared_blks_read / calls > 100 -- But reads tons of data
LIMIT 5;
This is like finding a needle in a haystack—except you're scanning the entire haystack every time instead of using a magnet.
You're storing user passwords. Or credit card numbers. Or medical records. Or anything sensitive.
The problem: If your database gets leaked (it happens to the best companies), what do attackers see? Plain-text passwords mean everyone's account is compromised. Everywhere. Immediately.
pgcrypto gives you military-grade encryption and hashing, right in your database. Passwords get hashed so even you can't read them. Sensitive data gets encrypted so leaks are useless without the key.

Before: Plain text passwords (disaster waiting to happen)
Frontend (React):
const login = async (email, password) => {
const res = await fetch('/api/login', { method: 'POST', body: JSON.stringify({ email, password }) });
return res.json();
};
Backend (Node.js):
app.post('/api/login', async (req, res) => {
const user = await db.query('SELECT * FROM users WHERE email = $1 AND password = $2', [req.body.email, req.body.password]);
res.json(user.rows[0]); // Plain text! Database leaked = everyone's password exposed
});
After: Properly hashed with pgcrypto
Frontend (React): Same code-no changes!
Backend (Node.js): One-line change!
app.post('/api/login', async (req, res) => {
const user = await db.query(
`SELECT * FROM users WHERE email = $1 AND password_hash = crypt($2, password_hash)`,
[req.body.email, req.body.password] // pgcrypto compares hashes
);
res.json(user.rows[0]); // Database leaked? Attackers see '$2a$06$xK9m...' - useless!
});
Database (signup):
INSERT INTO users (email, password_hash) VALUES ('alice@example.com', crypt('password123', gen_salt('bf')));
Real-World Example: Encrypting Sensitive Data
-- Store encrypted (only you can decrypt with the key)
INSERT INTO medical_records (patient_id, diagnosis)
VALUES (42, pgp_sym_encrypt('Diabetes Type 2', 'secret-key'));
-- Decrypt when needed
SELECT pgp_sym_decrypt(diagnosis, 'secret-key') FROM medical_records WHERE patient_id = 42;
Generating Secure Tokens
SELECT encode(gen_random_bytes(32), 'hex') AS api_key; -- 'a7f3b2c1d4e5...'
SELECT encode(gen_random_bytes(16), 'base64') AS token; -- e.g. 'rN3kP9xQzL7mB2vC8dF1jg==' (24-char base64)
Your marketing team sends you a CSV: "Here's the list of customers from our trade show. Can you match them against our database?"
The problem: You need to write a script to parse the CSV, create a temp table, import the data, run your query, then clean up. For a one-time analysis? That's a lot of work.
file_fdw lets you query files directly with SQL. The CSV stays where it is—you just point to it and query.

Before: Write a script
import csv
import psycopg2
# Parse the CSV
with open('tradeshow_leads.csv') as f:
leads = list(csv.DictReader(f))
# Create temp table, insert rows, run query...
# 30 lines of code for a simple question
After: Just query it
First, upload your CSV to the server (or a location the database can access):
# Copy file to server's data directory
scp tradeshow_leads.csv your-server:/var/lib/postgresql/data/imports/
Then query it directly:
-- One-time setup: file_fdw needs a server object, then a foreign table pointing at the file
CREATE SERVER files FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE tradeshow_leads (email TEXT, company TEXT, booth TEXT)
SERVER files OPTIONS (filename '/var/lib/postgresql/data/imports/tradeshow_leads.csv', format 'csv', header 'true');
-- Now query it like any table!
SELECT t.email, t.company, c.plan
FROM tradeshow_leads t JOIN customers c ON t.email = c.email;
Note: The file path is from the database server's perspective, not your local machine. The file must be readable by the PostgreSQL process.
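And if the data turns out to be worth keeping, one statement turns the foreign table into a regular one you can index and update:
-- Snapshot the CSV into a normal table
CREATE TABLE tradeshow_leads_imported AS SELECT * FROM tradeshow_leads;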
These extensions are available on Tiger Data but not pre-installed. Developers explicitly enable them for specific workloads.
Let’s say you're building a search for your help docs. User searches "how do I reset my password?"
The problem: Your article is titled "Account Recovery Steps." Keyword search finds nothing—no matching words. But it's exactly what the user needs!
Vector search matches meaning, not words. It understands that "reset password" and "account recovery" are the same concept.

Before: Keyword search misses relevant results
Frontend (React):
const search = async (query) => {
const res = await fetch(`/api/search?q=${query}`);
setResults(await res.json()); // User types: "reset my password" - No results
};
Backend (Node.js):
app.get('/api/search', async (req, res) => {
const results = await db.query(
`SELECT * FROM articles WHERE title ILIKE $1 OR content ILIKE $1`,
[`%${req.query.q}%`]
);
res.json(results.rows); // Misses "Account Recovery Steps" - no matching keywords!
});
After: Semantic search finds meaning
Frontend (React): Same code—no changes!
const search = async (query) => {
const res = await fetch(`/api/search?q=${query}`);
setResults(await res.json()); // "reset my password" finds "Account Recovery Steps"
};
Backend (Node.js): One line change!
app.get('/api/search', async (req, res) => {
const results = await db.query(
`SELECT title, content FROM articles
ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', $1) LIMIT 5`,
[req.query.q]
);
res.json(results.rows); // Finds "Account Recovery Steps" - same meaning!
});
Database (one-time setup):
ALTER TABLE articles ADD COLUMN embedding vector(1536);
CREATE INDEX ON articles USING ivfflat (embedding vector_cosine_ops);
UPDATE articles SET embedding = ai.openai_embed('text-embedding-3-small', content);
-- Bonus: "similar products" recommendations, reusing the same embeddings
SELECT name, price FROM products WHERE id != 42
ORDER BY embedding <=> (SELECT embedding FROM products WHERE id = 42) LIMIT 4;
pgvector is great, but it has a limitation: its indexes (HNSW, IVFFlat) live entirely in RAM. With 1 million vectors, no problem. With 50 million? You need 64GB+ of RAM just for the index.
The problem: You're building a product search with 50 million items. Your cloud bill is exploding because you need massive RAM instances. Index builds take 8 hours. And queries are getting slower as you add more data.
pgvectorscale adds a new index type called StreamingDiskANN (inspired by the algorithm behind Bing's image search). The key difference: the index lives on disk, not RAM. This means you can handle 10x more vectors on the same hardware.
In benchmarks, pgvector + pgvectorscale achieved 28x lower latency and 16x higher throughput than Pinecone, at 75% less cost.

Before: pgvector's HNSW index (RAM-hungry)
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);
-- 50M vectors = 64GB RAM needed
-- Build time: 8 hours
-- Query time: 500ms (slowing down as data grows)
After: pgvectorscale's StreamingDiskANN (disk-based)
CREATE INDEX ON products USING diskann (embedding vector_cosine_ops);
-- 50M vectors = 8GB RAM (index on disk)
-- Build time: 1 hour
-- Query time: 5ms (stays fast)
| Vector Count | Recommendation |
|---|---|
| Under 1 million | pgvector (HNSW) is fine |
| 1-10 million | Either works, but pgvectorscale is cheaper |
| 10+ million | pgvectorscale (you'll save a lot on RAM) |
Note: pgvectorscale works alongside pgvector. You still store vectors with pgvector's vector type. pgvectorscale just adds a better index for large scale.
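Switching is a small change on top of an existing pgvector setup. Roughly, assuming the products table from the example above (the HNSW index name here is whatever yours is called):
-- vectorscale depends on pgvector, so CASCADE installs both
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;
-- Replace the RAM-hungry HNSW index with a disk-backed StreamingDiskANN index
DROP INDEX IF EXISTS products_embedding_hnsw_idx;
CREATE INDEX ON products USING diskann (embedding vector_cosine_ops);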
Building AI features today means juggling multiple services: your database, an embedding API, and an LLM API. Every feature requires orchestrating calls between them.
The problem: Want to build a simple "Ask AI" feature? Your backend becomes a mess: call an embedding API to embed the question, query the database for similar documents, send those documents to an LLM, then return the answer to the user.
That's 4 round trips, 2 different APIs, and lots of code that can break.
pgai brings AI models directly into PostgreSQL. You can call OpenAI, Anthropic, Cohere, or even local models (Ollama) from SQL. This means:
SELECT ai.openai_embed('text-embedding-3-small', 'your text')
SELECT ai.openai_chat_complete('gpt-4', ...)
SELECT ai.ollama_generate('llama3', 'your prompt')
One query. No orchestration. No separate API keys in your backend.

Before: Multiple API calls in your backend
app.post('/api/ask', async (req, res) => {
// Call 1: Get embedding from OpenAI
const embedding = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: req.body.question
});
// Call 2: Find similar docs in database
const docs = await db.query(
`SELECT content FROM docs ORDER BY embedding <=> $1 LIMIT 3`,
[embedding.data[0].embedding]
);
// Call 3: Get answer from OpenAI
const answer = await openai.chat.completions.create({
model: 'gpt-4',
messages: [{ role: 'user', content: docs.rows.map(d => d.content).join('\n') + '\n\n' + req.body.question }]
});
res.json({ answer: answer.choices[0].message.content });
});
After: One SQL query does everything
app.post('/api/ask', async (req, res) => {
const result = await db.query(`
WITH relevant_docs AS (
SELECT content FROM docs
ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', $1)
LIMIT 3
)
SELECT ai.openai_chat_complete('gpt-4',
jsonb_build_array(jsonb_build_object('role', 'user', 'content',
(SELECT string_agg(content, E'\n') FROM relevant_docs) || E'\n\n' || $1
))
)->'choices'->0->'message'->>'content' AS answer
`, [req.body.question]);
res.json(result.rows[0]);
});
The database handles embedding, retrieval, and LLM call in one query. Your backend is just a pass-through.
-- Generate embeddings for all your articles (one-time)
UPDATE articles SET embedding = ai.openai_embed('text-embedding-3-small', content);
-- Summarize text with GPT-4
SELECT ai.openai_chat_complete('gpt-4',
jsonb_build_array(jsonb_build_object('role', 'user', 'content', 'Summarize: ' || article_text))
);
-- Use local Ollama models (free, private, no API costs)
SELECT ai.ollama_generate('llama3', 'Explain this error: connection refused');
SELECT ai.ollama_embed('nomic-embed-text', 'your text here');
Why this matters: Your AI logic lives in the database, close to the data. No network hops between services. No API keys scattered across your codebase. And if you want to switch from OpenAI to a local model, you change one function call.
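For example, swapping the semantic-search query from OpenAI to a local Ollama model is roughly this (a sketch; a different embedding model means different dimensions, so you'd regenerate the embedding column too):
-- Hosted OpenAI embeddings
SELECT title FROM articles
ORDER BY embedding <=> ai.openai_embed('text-embedding-3-small', 'reset my password') LIMIT 5;
-- Local Ollama embeddings: no API key, no per-call cost
SELECT title FROM articles
ORDER BY embedding <=> ai.ollama_embed('nomic-embed-text', 'reset my password') LIMIT 5;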
Your users ask: "Show me coffee shops within 1 mile." Or: "Which delivery zone is this address in?" Or: "Find drivers near this pickup location."
The problem: Location math is hard. And slow. And easy to get wrong.
PostGIS adds location superpowers: store points/polygons, create spatial indexes, and answer location questions with simple functions.

Before: Painful math in your backend
Frontend (React):
const findNearbyStores = async (lat, lng) => {
const res = await fetch(`/api/stores/nearby?lat=${lat}&lng=${lng}`);
return res.json();
};
Backend (Node.js): Ugly Haversine formula!
app.get('/api/stores/nearby', async (req, res) => {
const stores = await db.query(`
SELECT * FROM (
SELECT *, (3959 * acos(cos(radians($1)) * cos(radians(lat)) *
cos(radians(lng) - radians($2)) + sin(radians($1)) * sin(radians(lat)))) AS distance
FROM stores
) s WHERE distance < 5 ORDER BY distance
`, [req.query.lat, req.query.lng]);
res.json(stores.rows); // Slow. No index. Wrong at poles.
});
After: Clean and fast with PostGIS
Frontend (React): Same code-no changes!
const findNearbyStores = async (lat, lng) => {
const res = await fetch(`/api/stores/nearby?lat=${lat}&lng=${lng}`);
return res.json();
};
Backend (Node.js): Simple and readable!
app.get('/api/stores/nearby', async (req, res) => {
const stores = await db.query(`
SELECT name, address, ST_Distance(location, ST_Point($2, $1)::geography) / 1609 AS miles
FROM stores
WHERE ST_DWithin(location, ST_Point($2, $1)::geography, 8047)
ORDER BY location <-> ST_Point($2, $1)::geography
`, [req.query.lat, req.query.lng]);
res.json(stores.rows); // Uses spatial index. Fast!
});
Database (one-time setup):
ALTER TABLE stores ADD COLUMN location GEOGRAPHY(POINT, 4326);
CREATE INDEX ON stores USING GIST(location);
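If your stores table already has plain lat/lng columns (like the Before example), backfilling the new column is a single UPDATE:
-- Backfill the geography column from existing lat/lng values
UPDATE stores SET location = ST_Point(lng, lat)::geography;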
-- Which delivery zone is this address in?
SELECT zone_name, delivery_fee FROM delivery_zones
WHERE ST_Contains(zone_polygon, ST_Point(-73.99, 40.73)::geometry);
-- Uber-style: find available drivers within 2 miles
SELECT name, ST_Distance(location, ST_Point(-73.99, 40.73)::geography) AS meters
FROM drivers
WHERE ST_DWithin(location, ST_Point(-73.99, 40.73)::geography, 3218) AND available = true
ORDER BY meters LIMIT 5;
Looking at the list, clear patterns emerge:
| Category | Extensions | What It Tells Us |
|---|---|---|
| AI/ML | vector, vectorscale, ai | Every app needs semantic search |
| Time-Series | timescaledb | Metrics, IoT, events are everywhere |
| Security | pgcrypto, pg_auth_mon | Compliance is non-negotiable |
| Geospatial | postgis | Location features are table stakes |
| Performance | pg_stat_statements | Everyone needs to optimize |
The PostgreSQL ecosystem isn't just surviving—it's thriving. With these extensions, you can build AI-powered apps, handle time-series at scale, and add location features without leaving Postgres.
That's the power of extensions. That's why PostgreSQL is a platform, not just a database.
Data based on actual Tiger Data usage as of January 2026. Extensions ranked by number of instances where they're enabled.
-- Pre-installed (already available)
CREATE EXTENSION IF NOT EXISTS plpgsql;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- User-installed (available on Tiger Data)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS vectorscale;
CREATE EXTENSION IF NOT EXISTS ai;
CREATE EXTENSION IF NOT EXISTS pg_auth_mon;
CREATE EXTENSION IF NOT EXISTS postgis;
Ready to try these extensions? Sign up for Tiger Data and get most of them pre-configured and ready to use.
