Published on Jan 15, 2026
Written by Jakkie Koekemoer
Most database selection guides fall into two camps: vendor pitches disguised as advice, or vague platitudes that end with "it depends." Neither helps when you're staring at a blank architecture diagram, wondering if you really need multiple databases to handle user accounts, sensor logs, and vector embeddings.
The main principle: Start with PostgreSQL extensions (Tiger Data/TimescaleDB for time-series, pgvector for embeddings) and only fragment your stack when you hit proven bottlenecks at 100M+ events/day or face hard compliance isolation requirements. Modern applications need hybrid data models, and managing multiple specialized databases creates synchronization lag, double storage costs, and distributed join complexity in application code.
This guide provides a structured decision framework based on data shape (i.e., whether the data consists of flat rows, nested documents, time-series points, or high-dimensional vectors), query patterns, and scale thresholds, with concrete performance benchmarks for PostgreSQL, Tiger Data (TimescaleDB), InfluxDB, MongoDB, and vector databases.
Here is an overview of the four main database categories, their typical uses, and the golden rule:
| # | Category | Examples | When to Use |
|---|----------|----------|-------------|
| 1 | General Purpose Relational | PostgreSQL, MySQL | Default choice for most applications. ACID guarantees, complex queries, data integrity. |
| 2 | Time-Series Optimized | TimescaleDB (Tiger Data), InfluxDB | High-volume timestamped data, metrics, IoT sensors, monitoring systems. |
| 3 | Document Databases | MongoDB | Flexible schemas, semi-structured data, rapid iteration, content management. |
| 4 | Vector Databases | Pinecone, Weaviate, pgvector | Embeddings, semantic search, AI/ML applications, similarity matching. |
Golden Rule: Don’t specialize until you hit >100M events/day or have specific isolation requirements for compliance.
Database selection starts with understanding your data and how you’ll query it—not with feature lists or marketing claims.
Different data structures require different storage patterns. Relational data (users, orders) needs ACID transactions and joins. Time-series data (sensor readings, metrics) requires compression and temporal queries. Document data (product catalogs) works with flexible schemas. Vector data (embeddings) needs specialized similarity search. When multiple types must be queried together, unified databases prevent data silos and eliminate synchronization lag.
Relational Data: Structured tables with foreign keys and complex relationships. Best stored in RDBMS systems that enforce ACID guarantees and support complex joins. Example: e-commerce with users, orders, and inventory.
Time-Series Data: Ordered sequences of timestamped events. Optimized storage requires compression, time-based partitioning, and efficient range queries. Example: IoT sensor readings, application metrics, financial tick data.
Document Data: Semi-structured JSON with flexible schemas. Works well in document stores or JSONB columns in PostgreSQL. Example: product catalogs with varying attributes.
Vector Data: High-dimensional arrays for semantic search and AI applications. Requires specialized indexing (HNSW, IVF) for performant similarity queries. Example: RAG applications, image search.
Key insight: If you have multiple data types that need to be queried together, a multi-model database prevents data silos. For example, an IoT platform storing both device metadata (relational) and sensor readings (time-series) benefits from querying both in a single transaction.
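A minimal sketch of what that single transaction buys you, assuming hypothetical devices and sensor_readings tables living in the same PostgreSQL instance: metadata and readings are written atomically, with no cross-system synchronization.

-- Register a device and record its first reading atomically,
-- something a split PostgreSQL + InfluxDB stack cannot do in one statement
WITH new_device AS (
    INSERT INTO devices (name, location)
    VALUES ('thermostat-7', 'Building A')  -- relational metadata
    RETURNING id
)
INSERT INTO sensor_readings (time, device_id, temperature)
SELECT NOW(), id, 21.5                     -- time-series reading
FROM new_device;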
Your access patterns determine database architecture more than data shape.
Online transaction processing (OLTP): Point lookups, updates, and inserts optimized for low-latency single-row operations. Example: SELECT * FROM users WHERE id = 123.
Online analytical processing (OLAP): Aggregations across large datasets with time ranges. Example: SELECT device_id, AVG(temperature) FROM sensors WHERE time > NOW() - INTERVAL '7 days' GROUP BY device_id.
Vector Search: Nearest-neighbor queries using distance metrics. Example: Finding semantically similar documents for RAG applications.
Hybrid Queries: Combining multiple patterns. Example: “Find products similar to this one (vector) purchased in the last month (time-series) by premium users (relational).”
Most real-world applications require hybrid queries. This is where the “one database per data type” approach breaks down, forcing complex orchestration in application code.
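A hedged sketch of the hybrid query from the example above in a unified PostgreSQL stack (table and column names are hypothetical; assumes pgvector for the embedding column):

-- "Products similar to this one, purchased in the last month, by premium users"
SELECT
    p.name,
    p.embedding <=> $1::vector AS distance        -- vector similarity
FROM products p
JOIN purchases pu ON pu.product_id = p.id
JOIN users u ON u.id = pu.user_id
WHERE
    pu.purchased_at > NOW() - INTERVAL '1 month'  -- temporal filter
    AND u.subscription_tier = 'premium'           -- relational filter
ORDER BY p.embedding <=> $1::vector
LIMIT 10;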
Scale determines whether optimization matters or if developer velocity should win.
Under 10K events/day: Any database works. Choose based on team familiarity. At this scale, tuning quality matters far more than performance differences between engines.
10K to 1M events/day: Optimization starts to matter. Indexes, partitioning, and query planning become relevant. PostgreSQL with extensions (TimescaleDB for time-series, pgvector for embeddings) provides the best balance; a setup sketch follows this list.
1M to 100M events/day: Purpose-built optimizations become critical but don’t necessarily require separate databases. Features like hypertables (automatic time-based partitioning), continuous aggregates, and compression deliver 90%+ compression ratios and sub-millisecond queries.
Over 100M events/day: Horizontal scaling or highly specialized databases may be required. Operational complexity becomes justified by performance gains.
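Enabling the two extensions mentioned above is a one-time step per database. A minimal setup sketch, assuming both packages are already installed on the server:

-- One-time setup per database
-- (timescaledb also requires shared_preload_libraries = 'timescaledb' in postgresql.conf)
CREATE EXTENSION IF NOT EXISTS timescaledb;  -- time-series optimizations
CREATE EXTENSION IF NOT EXISTS vector;       -- pgvector for embeddings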
The table below summarizes how to choose the right database based on your data type and scale:
| Database Type | Best For | Key Strengths | Limitations | Typical Use Case |
|---|---|---|---|---|
| PostgreSQL | ACID compliance, complex joins, structured data | Mature ecosystem, strong consistency, SQL standardization | High ingestion rates (>100K rows/sec) require tuning | E-commerce platforms |
| Tiger Data (TimescaleDB) | High ingest (>100K rows/sec), temporal queries, IoT | 90%+ compression, automatic partitioning, SQL compatible | Learning curve for time-series features | Sensor networks, monitoring |
| InfluxDB | Pure time-series workloads, metrics | Purpose-built for time-series, SQL support (v3+) | Limited relational capabilities | Application performance monitoring |
| MongoDB | Unstructured data, rapid prototyping | Schema flexibility, horizontal scaling | Complex analytical queries are expensive | Content management, catalogs |
| Pinecone/Weaviate | Massive-scale vector search (100M+ vectors) | Automatic scaling, optimized similarity search | Limited metadata filtering, operational overhead | Large-scale semantic search |
| pgvector | Vector search with relational/temporal filters | Integrated filtering, PostgreSQL ecosystem | Cost efficiency declines beyond ~50M vectors | RAG applications, hybrid search |
Use a unified database (PostgreSQL with extensions) when: Your team has fewer than 10 engineers, your data types overlap (IoT platforms querying sensor readings with device metadata), or you operate between 1M-100M events/day where optimization matters but horizontal scaling doesn’t.
Use multiple databases when: Compliance requires physical isolation (HIPAA, PCI-DSS), workloads have fundamentally incompatible requirements (strict serializable isolation vs eventual consistency), or you’ve proven a bottleneck that exhausts optimization options.
The “polyglot persistence” pattern (using multiple specialized databases in the same application, each for its own workload) creates:
Synchronization lag: Vector search returns documents deleted 5 seconds ago because the delete event hasn’t propagated from PostgreSQL to Pinecone.
Double storage costs: Same data exists in PostgreSQL (metadata) and InfluxDB (time-series), often with minimal compression in one system.
Distributed joins in application code: Your backend becomes a database query coordinator, implementing joins, filters, and pagination across multiple systems, which adds latency and complexity.
Scenario 1: IoT platform. Requirements: Ingest 500K sensor readings/second, join with device metadata, query historical trends.
Recommendation: Tiger Data (TimescaleDB)
Why: You need to join time-series data (sensor readings) with relational data (device configurations, alert rules). A split architecture (InfluxDB + PostgreSQL) requires denormalizing device metadata into every sensor reading (bloat), implementing joins in application code (complexity), or accepting stale metadata (correctness issues).
-- Single query joining time-series and relational data
SELECT
    d.name AS device_name,
    AVG(s.temperature) AS avg_temp,
    d.alert_threshold
FROM sensor_readings s
JOIN devices d ON s.device_id = d.id
WHERE
    s.time > NOW() - INTERVAL '1 hour'
    AND d.location = 'Building A'
GROUP BY d.id, d.name, d.alert_threshold
HAVING AVG(s.temperature) > d.alert_threshold;
Scenario 2: AI agent memory. Requirements: Store conversation history (time-series), embeddings for semantic search (vector), user preferences (relational).
Recommendation: Tiger Data with pgvector
Why: AI agents need episodic memory (“What did the user ask 10 minutes ago?”) and semantic memory (“What past conversations are relevant?”). These must be queried together for context-aware responses.
-- AI agent memory: hybrid query
SELECT
    c.message,
    c.timestamp,
    u.name AS user_name,
    c.embedding <=> $1::vector AS distance
FROM conversations c
JOIN users u ON c.user_id = u.id
WHERE
    c.user_id = $2
    AND c.timestamp > NOW() - INTERVAL '7 days'
    AND u.subscription_tier = 'premium'
ORDER BY c.embedding <=> $1::vector
LIMIT 5;
Scenario 3: Content management system. Requirements: Flexible schemas, document relationships, full-text search.
Recommendation: PostgreSQL with JSONB
Why: PostgreSQL’s JSONB provides schema flexibility without sacrificing indexing, full-text search via GIN indexes, and JSON path queries comparable to MongoDB. MongoDB is only necessary if you need horizontal sharding from day one or lack PostgreSQL expertise.
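The full-text search mentioned above pairs naturally with JSONB. A minimal sketch, assuming a hypothetical articles table with a JSONB body:

-- Hypothetical CMS table: structured title plus flexible JSONB body
CREATE TABLE articles (
    id    SERIAL PRIMARY KEY,
    title TEXT,
    body  JSONB
);

-- GIN index over a tsvector expression for fast full-text search
CREATE INDEX idx_articles_fts ON articles USING GIN (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body->>'summary', ''))
);

-- Queries use the same expression so the planner can use the index
SELECT title
FROM articles
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body->>'summary', ''))
      @@ to_tsquery('english', 'postgres & migration');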
Scenario 4: Financial trading platform. Requirements: Sub-millisecond latency, millions of ticks per second, complex event processing.
Recommendation: Tiger Data for storage/analytics + in-memory processing (Redis/Apache Flink) for execution
Why: Trading systems need both real-time processing (<1ms) and historical analysis (backtesting). Tiger Data handles the analytical workload while purpose-built in-memory systems handle execution. This is a justified split because the latency requirements are fundamentally different.
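On the analytical side of that split, a minimal sketch of tick storage with pre-computed bars for backtesting (table and view names are hypothetical; first() and last() are TimescaleDB functions):

-- Store every tick in a hypertable
CREATE TABLE ticks (
    time   TIMESTAMPTZ NOT NULL,
    symbol TEXT NOT NULL,
    price  DOUBLE PRECISION,
    volume BIGINT
);
SELECT create_hypertable('ticks', 'time');

-- Continuously maintain 1-minute OHLCV bars for backtesting queries
CREATE MATERIALIZED VIEW ohlcv_1m
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 minute', time) AS bucket,
    symbol,
    first(price, time) AS open,
    MAX(price) AS high,
    MIN(price) AS low,
    last(price, time) AS close,
    SUM(volume) AS volume
FROM ticks
GROUP BY bucket, symbol;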
The following benchmarks compare Tiger Data/TimescaleDB against standard PostgreSQL and InfluxDB for typical time-series workloads. Test setup: 100M rows inserted, 1000 concurrent queries over 7-day time ranges on AWS EC2 m5.2xlarge instances.
| Metric | Tiger Data/TimescaleDB | Standard PostgreSQL | InfluxDB 3.x |
|---|---|---|---|
| Sustained ingest rate | 100K-115K rows/sec | 15K-20K rows/sec | 300K-400K rows/sec |
| Peak ingest (optimized) | Up to 1.2M rows/sec* | 50K rows/sec | 800K rows/sec |
| Compression ratio | 90-97% | 0% (no native compression) | 60% |
| Aggregate query (p95) | 45-120ms | 8,200ms | 120ms |
| Point query (p95) | 2ms | 3ms | 5ms |
| Storage (100M rows) | 2-4 GB | 42 GB | 16 GB |
* Peak rates require optimized conditions: high cardinality, 32 concurrent connections, ordered data. Real-world sustained rates typically range from 100K-140K rows/sec depending on hardware and data characteristics. Source: Official TimescaleDB benchmarks.
Key observations: Tiger Data’s compression achieves 90-97% reduction through columnar storage and delta encoding. Smaller datasets fit in memory, improving query performance. Aggregate queries show the biggest gap due to continuous aggregates pre-computing common rollups.
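The compression behind those numbers is opt-in per hypertable. A minimal sketch using TimescaleDB's compression policy (the segmentby column is an assumption based on the sensor_readings example used throughout this guide):

-- Enable columnar compression, segmenting by device for fast per-device scans
ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

-- Automatically compress chunks once they are older than 7 days
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');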
Can PostgreSQL handle time-series data? Yes, with extensions like TimescaleDB. Standard PostgreSQL lacks automatic time-based partitioning, compression optimized for temporal patterns, and time-bucketing functions. TimescaleDB adds these features while maintaining PostgreSQL’s ACID guarantees and full SQL compatibility.
-- Create a hypertable for sensor data
CREATE TABLE sensor_readings (
    time        TIMESTAMPTZ NOT NULL,
    device_id   INTEGER NOT NULL,
    temperature DOUBLE PRECISION,
    humidity    DOUBLE PRECISION
);

-- Convert to hypertable (automatic time-based partitioning)
SELECT create_hypertable('sensor_readings', 'time');

-- Create continuous aggregate for real-time dashboards
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    device_id,
    AVG(temperature) AS avg_temp
FROM sensor_readings
GROUP BY hour, device_id;

-- Set up data retention (auto-delete data older than 90 days)
SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');
Why this matters: You can join sensor data with device metadata without ETL pipelines. Performance scales automatically as data grows. Retention policies are native SQL, not custom scripts.
Do you need a dedicated vector database? Not unless you exceed 50 million embeddings or require specialized distributed architectures. Integrated solutions (pgvector in PostgreSQL/Tiger Data) provide a critical advantage: filtering vectors by metadata or time in a single query.
pgvector scales efficiently to tens of millions of vectors with proper tuning. At approximately 10M vectors with 1536 dimensions, you need ~60GB of RAM for in-memory HNSW indexes (10M × 1536 dimensions × 4 bytes per float ≈ 61 GB, before index overhead). Beyond this scale, cost efficiency may favor disk-based indexes (pgvectorscale’s StreamingDiskANN) or dedicated vector databases.
Verified capabilities: Tiger Data benchmarks show pgvector handling 50 million vectors successfully. With the pgvectorscale extension, performance shows 28x lower latency and 16x higher throughput versus Pinecone’s s1 pods at 99% recall. Source: Tiger Data benchmarks.
Consider this RAG (Retrieval-Augmented Generation) scenario:
-- Create table for document embeddings
CREATE TABLE documents (
    id         SERIAL PRIMARY KEY,
    content    TEXT,
    embedding  vector(1536),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    category   TEXT,
    author_id  INTEGER
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops);

-- Hybrid query: similarity + time filter + relational join
SELECT
    d.content,
    d.embedding <=> $1::vector AS distance,
    a.name AS author_name
FROM documents d
JOIN authors a ON d.author_id = a.id
WHERE
    d.created_at > NOW() - INTERVAL '30 days'
    AND d.category = 'technical'
ORDER BY d.embedding <=> $1::vector
LIMIT 10;
In a dedicated vector database, you’d need to: (1) Store embeddings in Pinecone, (2) Store metadata in PostgreSQL, (3) Query Pinecone for top 1000 similar vectors, (4) Filter in application code for time/category, (5) Query PostgreSQL for author names, (6) Merge results in application logic. The integrated approach executes this in a single query with proper indexing for all filters.
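Before splitting embeddings out at larger scales, the pgvectorscale path mentioned earlier can extend the same documents table. A hedged sketch, assuming the pgvectorscale extension is installed on the server:

-- Assumes pgvectorscale is available; it builds on pgvector
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- Disk-based StreamingDiskANN index as an alternative to in-memory HNSW
CREATE INDEX ON documents USING diskann (embedding vector_cosine_ops);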
When does MongoDB make sense? Migrate to MongoDB when: Your data is genuinely schema-less with unpredictable structure, you need horizontal sharding from day one, or your team has MongoDB expertise but not PostgreSQL.
Stay with PostgreSQL when: You need consistent joins, ACID transactions across documents, or complex analytics. PostgreSQL’s JSONB data type provides schema flexibility with better query performance for most use cases.
Modern PostgreSQL handles semi-structured data efficiently:
-- Store flexible product catalog in PostgreSQL
CREATE TABLE products (
    id         SERIAL PRIMARY KEY,
    name       TEXT,
    attributes JSONB
);

-- Create GIN index for fast JSONB queries
CREATE INDEX idx_attributes ON products USING GIN (attributes);

-- Query nested JSON with indexing
SELECT name, attributes->'specs'->>'weight' AS weight
FROM products
WHERE attributes @> '{"category": "electronics", "in_stock": true}'::jsonb
ORDER BY (attributes->>'price')::numeric;
Trade-offs: MongoDB excels at horizontal scaling and schema evolution. PostgreSQL excels at complex joins (MongoDB’s $lookup is 10-100x slower according to independent benchmarks), analytical queries, and maintaining data consistency. MongoDB 5.0+ includes native time-series collections, narrowing the gap for temporal workloads.
InfluxDB 3.x (current version) uses SQL as its primary query language via Apache DataFusion, maintaining backward compatibility with InfluxQL. Previous versions used Flux (v2.x) or InfluxQL (v1.x). InfluxDB excels at pure metrics workloads but lacks the relational capabilities needed for hybrid queries. Learn more: InfluxDB vs TigerData comparison.
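For comparison, a hedged sketch of what a pure-metrics query looks like in InfluxDB 3.x’s SQL dialect (measurement and column names are hypothetical):

-- InfluxDB 3.x SQL (executed by Apache DataFusion), not PostgreSQL
SELECT host, AVG(usage_percent) AS avg_usage
FROM cpu
WHERE time >= now() - INTERVAL '1 hour'
GROUP BY host;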
Tiger Data is the company name (rebranded from Timescale Inc. in June 2025). TimescaleDB is the PostgreSQL extension that provides time-series optimizations. The extension works with PostgreSQL 15, 16, and 17.
Most applications benefit from a unified architecture until they have proven bottlenecks. Start with PostgreSQL. Add Tiger Data extensions for time-series optimization. Use pgvector for vector search. Leverage JSONB for document flexibility. Only split your stack when you’ve exhausted optimization options or hit hard constraints like compliance isolation.
The database decision rule: Don’t introduce operational complexity to solve theoretical problems. Benchmark your actual workload. If you’re considering multiple databases for relational, time-series, and vector data without hitting 100M+ events/day or facing compliance requirements, you’re likely over-engineering.
Stop managing three different databases for one application. Unify your relational, time-series, and vector workloads with Tiger Data. Experience the speed of a specialized database with the reliability of PostgreSQL. Try TigerData free today.