---
title: "Understanding Postgres Performance Limits for Analytics on Live Data"
published: 2026-02-25T14:18:16.000-05:00
updated: 2026-03-24T13:21:22.000-04:00
excerpt: "PostgreSQL hits hard limits under analytics workloads. Here's why MVCC, WAL, and row storage compound — and what to do instead."
tags: PostgreSQL, Analytics, PostgreSQL Performance
authors: Matty Stratton
---

> **TimescaleDB is now Tiger Data.**

## The Pattern Recognition Moment

You're reviewing monitoring on a normal workday. There hasn't been a new deployment, no weird traffic spike, and no schema changes. But p95 write latency has crept from 8ms to 25ms over the past month, and last week it touched 45ms. Your largest tables crossed 500M rows sometime in March and they're still climbing.

Six weeks of data points, all trending the same direction.

![](https://timescale.ghost.io/blog/content/images/2026/02/diagram-2-p95-write-latency.png)

You've run Postgres in production for years. You've tuned queries, rebuilt indexes, and right-sized instances. But this time the fixes don't stick; every new index or config tweak brings the metrics back down for a few weeks, then they climb again. You can plot the trajectory out three months and know exactly where it lands.

So you do a proper audit: query plans, connection overhead, table stats, bloat. Everything checks out. Schema is sound, indexes cover the hot paths, and configs follow best practices. A consultant confirms the same: nothing misconfigured. But performance keeps degrading, and it correlates with data volume, not traffic.

You look closer at the workload. Most writes are inserts, not updates, and every row carries a timestamp. Queries almost always filter by time range. Data arrives continuously, not in batches or bursts, but as a steady stream that never pauses. You need months or years of retention, and you're not just storing this data. You're querying it under latency requirements.

This doesn't fit the profile of a transactional workload, and it doesn't fit a data warehouse either. It's continuous high-frequency ingestion that needs to stay operationally queryable.

Postgres is a brilliant general-purpose database. The same design choices that make it handle e-commerce, SaaS backends, and CMS workloads so well create compounding overhead for sustained high-frequency [time-series](https://www.tigerdata.com/learn/time-series-database-what-it-is-how-it-works-and-when-you-need-one) ingestion with long retention. Design tradeoffs, not bugs. Baked into the architecture by intent.

You are not fighting misconfiguration. You are fighting architectural boundaries designed for a different workload class.

This piece walks through what we call the Optimization Treadmill: the sequence of phases most teams follow, each a correct response to observed symptoms, each providing temporary relief without changing the underlying trajectory. Understanding the mechanics of why the treadmill exists is what lets you recognize it early. If you recognize the scenario above, this is a common path. The question isn't whether you'll hit the ceiling. It's when, and how much runway you have left when you do.

## What This Workload Looks Like

Not all high-write workloads will hit this wall. Postgres handles enormous write volumes for e-commerce, social networks, and SaaS backends without issue. The friction comes from a specific combination of six characteristics. If four or five describe your system, the optimization phases in the next section will be familiar.

**Continuous high-frequency ingestion.** Thousands to hundreds of thousands of inserts per second, 24/7, with no pause: IoT sensors reporting every few seconds, financial systems processing trades in real time, or APM platforms collecting metrics from thousands of hosts. High-frequency data generation is independent of user count. Batch systems get quiet periods where the database can run maintenance, but continuous ingestion never stops. Maintenance competes directly with writes, and there is no scheduling window.

**Time-series access patterns.** Nearly every row has a timestamp, and queries almost always include time range filters. "Last 30 minutes of CPU utilization," "this week compared to last week," "all transactions between two dates." This goes beyond a `created_at` column; the entire query pattern revolves around time. General-purpose indexes aren't built for this access pattern, which is why teams end up reimplementing time-based data organization through manual partitioning scripts and custom tooling.

**Append-only data.** Once written, rows rarely change. Sensor readings don't get updated, financial transactions are immutable, log entries are permanent. Deletes happen in bulk (drop an entire month's partition), not row by row. MVCC exists to handle concurrent reads and writes on the same rows. Append-only workloads pay that overhead on data they never touch again, and autovacuum still runs constantly against tables whose rows will never see an UPDATE or DELETE.

**Long retention.** Months to years, not days or weeks. Compliance might require seven years of financial records, manufacturing teams need root cause analysis across quarters, and ML pipelines need two-plus years of training data. Shortening retention only hides the architectural problem by aging data out faster, and long retention means unbounded table growth. At 50K inserts per second, that's over 4 billion rows per day, roughly 1.6 trillion rows per year. After three years? Nearly 5 trillion rows.
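The growth arithmetic is worth sanity-checking; a one-line psql calculation (using 50K/sec as an assumed sustained rate) shows why "just keep everything" stops being casual:

```sql
-- Back-of-envelope ingest volume at an assumed sustained 50K rows/sec
SELECT 50000::bigint * 86400       AS rows_per_day,   -- 4.32 billion
       50000::bigint * 86400 * 365 AS rows_per_year;  -- ~1.58 trillion
```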

**Operational query requirements.** This isn't cold storage or an analytics warehouse you query once a day. You need millisecond responses on the last day's data, sub-second on the last week, and reasonable performance across the full retention window. Real-time dashboards, alert systems, user-facing analytics, ad-hoc investigation, all querying the same database. Data warehouse depth with operational latency requirements.

**Sustained growth.** Data volume growing 50–100%+ year over year on a predictable curve. Static workloads can be over-provisioned once and left alone, but growing workloads demand constant re-optimization. You're not solving for current scale. You're chasing projected scale, and the gap keeps widening.

If four or five of these apply, the next section maps the optimization path most teams follow. If your workload is standard OLTP, batch warehouse, low-volume time-series, or short-retention, the underlying issues are likely different.

This combination of characteristics didn't exist at scale 15 years ago. It's a product of specific infrastructure shifts: billions of connected devices generating continuous telemetry, high-frequency trading systems that treat microseconds as a competitive moat, AI pipelines that require years of operational history as training data, and observability platforms collecting metrics from every process in a distributed system. The cloud didn't just scale these workloads up. It made them continuous. Machines that never go offline generate data that never stops. That changed what operational databases are asked to do, and general-purpose engines weren't redesigned to match.

## The Optimization Path

Most teams working this pattern follow roughly the same sequence. Each phase is a reasonable response to observed symptoms, but each buys 3–6 months of relief at most, adds operational complexity, and has diminishing returns. The optimizations address symptoms without changing the underlying architecture. The ceiling doesn't move. You do, until you run out of room.

### Phase 1: Index optimization

The trigger is predictable: query performance degrades as tables grow past 50–100M rows, or sequential scans on a 100M-row table take minutes. The textbook answer is to add B-tree indexes on timestamp columns, build composite indexes for common filter combinations, create partial indexes on hot time ranges, and run ANALYZE to refresh `pg_statistic`.

```sql
-- Composite index for the most common dashboard query pattern
CREATE INDEX idx_metrics_device_time
  ON device_metrics (device_id, ts DESC);

-- Partial index covering only the hot partition
CREATE INDEX idx_metrics_recent
  ON device_metrics (ts DESC)
  WHERE ts > now() - interval '7 days';
```

A query that did a sequential scan across 100M rows now hits an index and returns in milliseconds. 10–100x improvement on read performance is typical. Problem solved, for now.

Issues start showing up as tables grow past 300M rows. Every INSERT must update every index on the table. With five indexes, each insert performs six write operations: one heap tuple write and five B-tree leaf page insertions. At 50K inserts/sec, that's 300K write operations per second. Each index insertion traverses the B-tree, potentially causing page splits that trigger additional I/O. `pg_stat_user_indexes` starts showing index bloat climbing:

```sql
-- Monitoring index bloat
SELECT schemaname, tablename, indexname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
       idx_scan as index_scans,
       idx_tup_read,
       idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
```

Index sizes grow faster than table sizes because B-trees don't reclaim space efficiently for append-heavy, time-ordered data. For keys that increase monotonically like timestamps, inserts concentrate on the rightmost leaf pages, resulting in repeated splits. Old leaf pages become sparse but remain allocated. You've improved read latency at the cost of write throughput, and this workload needs both.
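If the `pgstattuple` extension is available, this sparseness is directly observable: `pgstatindex()` reports leaf density for a B-tree (the index name here is the hypothetical one from the example above):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Low avg_leaf_density plus high leaf_fragmentation is the
-- signature of right-edge insert churn on a timestamp index
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_metrics_device_time');
```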

### Phase 2: Table partitioning

Your largest table has crossed 800M to 1B rows, and dropping old data via DELETE causes table bloat and long-running transactions that block autovacuum. You implement time-based range partitioning (typically daily or weekly).

```sql
-- Partitioned table setup
CREATE TABLE device_metrics (
    ts          timestamptz NOT NULL,
    device_id   bigint NOT NULL,
    metric      text NOT NULL,
    value       double precision
) PARTITION BY RANGE (ts);

-- Daily partitions created by cron or pg_partman
CREATE TABLE device_metrics_20250601
  PARTITION OF device_metrics
  FOR VALUES FROM ('2025-06-01') TO ('2025-06-02');
```

Implementation requires automation: cron jobs or pg\_partman to create future partitions, monitoring to detect gaps where partition creation failed, and careful handling of queries that span partition boundaries. Backup and restore now operate on hundreds of individual tables, `pg_dump` time scales with partition count, and schema migrations touch every partition.
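As a sketch, the `pg_partman` setup looks roughly like this (call signature follows pg\_partman 5.x; 4.x adds a `p_type` argument and spells intervals differently, so treat the exact arguments as version-dependent):

```sql
-- Register the parent table; the background worker (or a cron'd
-- run_maintenance() call) then pre-creates daily partitions
SELECT partman.create_parent(
    p_parent_table := 'public.device_metrics',
    p_control      := 'ts',
    p_interval     := '1 day'
);

-- Retention: maintenance detaches and drops partitions past 90 days
UPDATE partman.part_config
SET retention = '90 days', retention_keep_table = false
WHERE parent_table = 'public.device_metrics';
```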

The wins are concrete. Queries with time-range filters trigger partition pruning, and EXPLAIN shows the planner excluding irrelevant partitions:

```sql
EXPLAIN SELECT avg(value) FROM device_metrics
WHERE ts > now() - interval '1 hour';

-- Scans 1-2 partitions instead of the entire table
-- "Partitions removed: 498 of 500"
```

Dropping old data becomes `DROP TABLE device_metrics_20240101` instead of a multi-hour DELETE that generates gigabytes of WAL and dead tuples.

What happens at 500+ partitions? The [PostgreSQL documentation on partitioning best practices](https://www.postgresql.org/docs/current/ddl-partitioning.html) is direct about the cost: "Planning times become longer and memory consumption becomes higher when more partitions remain after the planner performs partition pruning." `pg_partman` maintenance jobs occasionally fail silently, leaving gaps. Queries spanning long ranges (quarterly reports, year-over-year comparisons) hit hundreds of partitions and regress in performance. Each active partition still has its own autovacuum overhead. The write path is faster per-partition but aggregate write load is unchanged. And the operational complexity is real. New engineers need to understand the partitioning scheme, the automation scripts, the monitoring for gaps, the procedures for backfills, and the implications for schema changes.
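The planning cost is easy to see for yourself, since `EXPLAIN` reports it separately from execution (the timings in the comment are illustrative, not measured):

```sql
EXPLAIN (ANALYZE, SUMMARY)
SELECT avg(value) FROM device_metrics
WHERE ts > now() - interval '1 hour';

-- With hundreds of partitions, the summary can look like:
--   Planning Time: 45.0 ms    <-- scales with partition count
--   Execution Time: 3.0 ms
```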

### Phase 3: Autovacuum tuning

This is where it starts to feel wrong. You're tuning a cleanup process for data you never modify. `n_dead_tup` counts are climbing on active partitions, `last_autovacuum` timestamps show vacuum running constantly but falling behind during write peaks, and `pg_stat_activity` regularly shows autovacuum workers competing for I/O.

Even append-only workloads generate work for autovacuum. Aborted transactions leave dead tuples. Hint-bit setting (marking tuples as known-committed or known-aborted to avoid future `pg_xact` lookups) requires dirtying pages. And since PostgreSQL 13, autovacuum triggers based on insert count (not just dead tuples) specifically to freeze tuples and update the visibility map. At high insert rates, this means autovacuum fires continuously on tables that never see a single UPDATE or DELETE.
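The insert-driven trigger is visible in `pg_stat_user_tables`; the `n_ins_since_vacuum` column exists from PostgreSQL 13 on:

```sql
-- Append-only tables with zero updates still vacuum constantly
SELECT relname,
       n_tup_ins,
       n_tup_upd,            -- ~0 for append-only tables
       n_ins_since_vacuum,   -- inserts since last vacuum (PG 13+)
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_tup_ins DESC
LIMIT 10;
```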

```sql
-- Per-table autovacuum settings on high-traffic partitions
ALTER TABLE device_metrics_20250601 SET (
    autovacuum_vacuum_scale_factor = 0.01,    -- default 0.2
    autovacuum_vacuum_cost_delay = 2,         -- default 2ms (20ms before PG 12)
    autovacuum_vacuum_cost_limit = 1000       -- default 200
);
```

```
# postgresql.conf adjustments
autovacuum_max_workers = 6            # default 3
autovacuum_naptime = 15s              # default 1min
maintenance_work_mem = 2GB            # default 64MB
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_cost_limit = 800
```

This helps stabilize bloat, and `pg_stat_user_tables.n_dead_tup` stays under control. But autovacuum workers now consume measurable CPU and I/O continuously, and monitoring shows autovacuum in `pg_stat_activity` at all hours. During write peaks, vacuum falls behind, bloat creeps back, and query performance becomes variable. You're tuning a process that exists to clean up overhead your workload doesn't fundamentally produce, but that the storage engine creates anyway.

### Phase 4: Vertical scaling

All of your optimizations are showing diminishing returns. The next logical step is to add more resources: upgrade from 16 vCPU/64GB to 32 vCPU/128GB with provisioned IOPS storage (e.g., io2 at 10,000+ IOPS on AWS).

More CPU gives autovacuum workers room to operate without starving query execution. More RAM increases `shared_buffers` and OS page cache hit rates, reducing physical disk reads. Faster storage reduces I/O wait time across the board. This gives you roughly six months of headroom.

Math doesn't lie: the infrastructure cost doubled or tripled, but data growth is still exponential. At the current trajectory, you'll need another upgrade in 12 months. The database cost line item is growing faster than the product revenue it supports.

### Phase 5: Read replicas

Dashboards and analytics queries compete with ingestion for CPU and I/O on the primary. You add 1–3 streaming replicas, configure pgbouncer or pgpool to route read traffic, and separate the connection pools. Immediately, write performance on the primary improves. Expensive analytical queries run against replicas without blocking ingestion.

The primary still carries the full write load. At sustained high insert rates generating tens of megabytes per second of WAL, replicas that fall behind accumulate WAL on the primary, consuming disk. The further behind a replica gets, the more WAL the primary must retain, and high write volume is exactly what causes replicas to fall behind in the first place. Real-time dashboards pointing at lagging replicas show stale data. You're now managing multiple Postgres instances with their own monitoring, autovacuum tuning, and connection pooling. The write bottleneck is still untouched.
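Both symptoms are observable from the primary; a sketch of the lag query:

```sql
-- Per-replica lag: bytes of WAL not yet replayed, plus the
-- time-based replay_lag column available since PG 10
SELECT application_name,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)
       ) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;
```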

### Taking stock

After all five phases, this is what the infrastructure looks like: partitioned tables across 500+ partitions with `pg_partman` automation and monitoring, aggressive per-table autovacuum settings under constant adjustment, instances upgraded 2–3x from original specs with provisioned IOPS, 2–3 streaming replicas with connection-level routing, detailed runbooks covering partition management, vacuum procedures, and failover scenarios.

Each optimization was the right response. Each bought time. Yet the trajectory is unchanged.

![](https://timescale.ghost.io/blog/content/images/2026/02/diagram-1-latency-across-optimization-phases.png)

Senior engineers are now spending 20–30% of their time on database operations. Quarterly planning includes a database scalability line item. New hire onboarding takes weeks before someone can safely operate the partitioning scheme. The team has become part product engineering, part DBA.

Is this inherent to the scale, or is it inherent to the architecture?

The answer matters because the two problems have different solutions. Optimization within the right architecture has a ceiling you can raise. Optimization against an architectural mismatch has a ceiling that doesn't move. Only the timeline changes. For this workload pattern, the ceiling is structural. The question was never if you'd hit it. It was always when.

## Why These Optimizations Hit a Ceiling

The optimization phases above aren't ineffective. Each one operates within architectural boundaries that weren't designed for this workload pattern, and those boundaries constrain how much any optimization can actually move the needle. Understanding the mechanics explains why returns diminish.

Postgres is a brilliant general-purpose relational database. Its design handles an enormous range of workloads well: e-commerce, content management, authentication, SaaS backends. "General-purpose" means optimized for the average case. High-frequency time-series ingestion with long retention is not the average case. Four core design decisions create this compounding overhead.

### MVCC (Multi-Version Concurrency Control)

MVCC lets readers and writers operate concurrently without lock contention. The [PostgreSQL documentation on concurrency control](https://www.postgresql.org/docs/current/mvcc-intro.html) describes the core guarantee: "reading never blocks writing and writing never blocks reading." When a row is updated, Postgres keeps the old tuple version visible to in-flight transactions, and autovacuum later marks dead tuples as reusable. For workloads with concurrent reads and updates on shared rows, this is an excellent tradeoff.

For append-only ingestion, every insert still pays the full MVCC cost. Each heap tuple carries a fixed-size header (23 bytes on most machines) containing `t_xmin`, `t_xmax`, `t_cid`, `t_ctid`, `t_infomask`, `t_infomask2`, and `t_hoff`. These fields track transaction visibility, even though the row will never be updated or deleted by a transaction. Extra cost with no extra value.

The write amplification is easily observable. A 1KB sensor reading becomes:

-   23-byte heap tuple header (plus alignment padding and a 4-byte `ItemIdData` pointer)
-   1,024 bytes of row data
-   5 index entries (assuming 5 indexes, ~40–80 bytes each in B-tree leaf pages)
-   ~1.2KB WAL record (heap insert + index insertions)

Total actual I/O: roughly 2.5–3.5KB per 1KB of logical data. At 100K inserts/sec of 1KB rows, you're writing 250–350MB/sec of actual I/O for 100MB/sec of application data. The exact ratio varies with row width, index count, and whether `full_page_writes` triggers after a checkpoint.
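The WAL share of this is measurable on a live instance by sampling the WAL position around a test batch (a psql session is assumed for `\gset`, and the table is the hypothetical `device_metrics` from earlier):

```sql
SELECT pg_current_wal_lsn() AS before_lsn \gset

INSERT INTO device_metrics
SELECT now(), i % 1000, 'cpu', random()
FROM generate_series(1, 100000) i;

-- WAL bytes generated per inserted row
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn') / 100000.0
       AS wal_bytes_per_row;
```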

![](https://timescale.ghost.io/blog/content/images/2026/02/diagram-3-logical-data-vs-IO-breakdown.png)

Autovacuum still has work to do on append-only tables: aborted transactions leave dead tuples, hint-bit setting dirties pages, and insert-triggered vacuums keep tuples frozen and the visibility map current. At high insert rates, even these minor sources of work keep autovacuum continuously active. `pg_stat_user_tables.n_dead_tup` may stay low, but `vacuum_count` and `autovacuum_count` keep climbing steadily.

### Row-based storage with B-tree indexes

Postgres stores data as a heap of 8KB pages, each containing variable-length tuples laid out row by row. Every tuple contains all columns. B-tree indexes map key values to ctid (page number + offset) pointers into the heap.

For time-series analytics, this creates read amplification:

```sql
SELECT avg(temperature)
FROM sensor_readings
WHERE ts > now() - interval '1 hour'
  AND device_id = 42;
```

This query needs two columns: `ts` and `temperature`. If the table has 30 columns, Postgres reads all 30 columns for every matching row from the heap pages. The I/O is 15x what a columnar layout would require, where only the referenced columns are read from disk.

Time-series data also compresses extremely well in columnar formats. Sequential timestamps delta-encode to near-zero storage (a regular interval collapses from 8 bytes per timestamp down to a single bit via delta-of-delta encoding), and repeated device IDs run-length-encode. Floating-point sensor values compress with XOR-based compression derived from Facebook's Gorilla algorithm ([Pelkonen et al., "Gorilla: A Fast, Scalable, In-Memory Time Series Database," VLDB, 2015](http://www.vldb.org/pvldb/vol8/p1816-teller.pdf)). Columnar storage routinely achieves 10–20x compression on time-series data. Row-based heap storage can't apply any of these techniques because values from different columns are interleaved on the same page.

On the write side, B-tree index maintenance creates significant overhead. Each insert traverses every index's B-tree from root to leaf, finds the correct leaf page, and inserts the new entry. If the leaf page is full, it splits, which can cascade up the tree. For time-ordered data, inserts concentrate on the right edge of timestamp indexes, creating contention on a small number of leaf pages. Non-timestamp indexes (device ID, metric type) scatter inserts across the tree, causing random I/O. With five indexes on a table, every row insert performs one heap page write, five B-tree traversals and leaf page insertions, plus WAL records for each. At 50K inserts/sec, that's 50K heap writes + 250K index insertions per second.

### Query planning overhead

The Postgres planner runs a full optimization pass on every query: it enumerates possible paths, estimates costs from `pg_statistic` entries, considers index usage, evaluates join orders, and selects an execution plan. For workloads with diverse, unpredictable query patterns involving complex joins, this is the right approach.

For time-series workloads, query shapes are highly repetitive. The same `WHERE ts > now() - interval '...'` filter runs thousands of times per second. The full planning cycle executes every time. At high query rates, planning overhead is measurable in `pg_stat_statements` as the gap between `total_plan_time` and `total_exec_time`.
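If `pg_stat_statements` is installed with `pg_stat_statements.track_planning = on`, the split is directly queryable (the plan-time columns exist from PostgreSQL 13 on):

```sql
-- Statements where planning is a large share of total cost
SELECT left(query, 60)                    AS query,
       calls,
       round(total_plan_time::numeric, 1) AS plan_ms,
       round(total_exec_time::numeric, 1) AS exec_ms
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 10;
```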

Statistics maintenance creates its own cost. ANALYZE samples rows to populate `pg_statistic`, with the sample size scaled by `default_statistics_target` (default: 100, which yields roughly 30,000 sampled rows). On billion-row tables, even this sampling-based statistics collection is expensive and must run frequently to keep estimates accurate. Stale statistics provide poor cardinality estimates, leading the planner to choose sequential scans over index scans, or vice versa.

With hundreds of partitions, the planner must evaluate partition pruning for each partition's bounds against the query predicates. This is fast per-partition but scales linearly with partition count. At 500+ partitions, plan time for simple queries can exceed execution time.

### Write-Ahead Logging (WAL) volume

Every data modification generates a WAL record before it's applied to the heap or index pages. WAL writes are sequential and synchronous (fsync per commit, or per `wal_writer_delay` interval with asynchronous commit). At 100K inserts/sec, WAL generation is roughly:

-   Heap insert records: ~100–150 bytes each = 10–15MB/sec
-   Index insert records: 5 indexes × ~60–80 bytes each = 30–40MB/sec
-   Full-page writes (after checkpoint): intermittent bursts of 8KB per dirtied page

Total sustained WAL throughput: 50–100MB/sec under normal operation, spiking higher after checkpoints when `full_page_writes` triggers 8KB records for newly dirtied pages. [The PostgreSQL documentation](https://www.postgresql.org/docs/current/wal-reliability.html) describes why: "the first modification of a data page after each checkpoint results in logging the entire page content." At those rates, that's 3–6GB/min, 180–360GB/hour.

WAL I/O becomes a direct throughput bottleneck. `pg_stat_wal` shows `wal_write` and `wal_sync` times climbing. Replicas that can't apply WAL fast enough fall behind, and unprocessed WAL files accumulate on the primary's `pg_wal` directory, consuming disk. `max_wal_size` and checkpoint frequency become critical tuning parameters.
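On PostgreSQL 14+, `pg_stat_wal` gathers the relevant counters in one row (the timing columns need `track_wal_io_timing = on`):

```sql
-- Cumulative WAL volume and write/sync timing since stats reset
SELECT wal_records,
       wal_fpi,                            -- full-page images
       pg_size_pretty(wal_bytes) AS wal_bytes,
       wal_write_time,                     -- ms; needs track_wal_io_timing
       wal_sync_time
FROM pg_stat_wal;
```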

### The compounding effect

None of these four constraints operates in isolation. Each amplifies the others, and that's where the math gets ugly.

MVCC overhead creates per-tuple bloat, which accumulates faster than autovacuum can clean at high insert rates. Autovacuum competing for I/O degrades write throughput. Degraded write throughput causes queries on bloated tables to slow down, which increases pressure to add more indexes. More indexes produce more write amplification, more WAL, and more replication lag. Row storage forces read amplification on time-range queries, which creates pressure to add covering indexes. Those indexes add to the write overhead feeding back into the MVCC/autovacuum loop.

![](https://timescale.ghost.io/blog/content/images/2026/02/diagram-5-updated.png)

At 50K inserts/sec with five indexes on a table, the steady-state database workload is: 50K heap tuple writes/sec, 250K B-tree index insertions/sec, 50–100MB/sec sustained WAL generation, continuous autovacuum activity across active partitions, and full query planning on every incoming query.

This is why a 16-core/64GB instance struggles with what appears to be a straightforward append-only workload.

Partitioning reduces per-partition table size but doesn't change the per-row overhead. Adding RAM improves buffer cache hit rates but doesn't reduce write amplification. Autovacuum tuning manages bloat but can't eliminate the cost of producing it. Each optimization operates within these constraints. None removes the constraints themselves.

This is the Optimization Treadmill at the mechanical level. You're not fighting configuration. You're fighting the storage model, the concurrency architecture, and the write path. All of which are designed for a workload that looks nothing like yours.

## When to Choose a Different Path

Most teams recognize this pattern 12–18 months too late. By then, the tables are massive, the partitioning scheme is deeply embedded, and migration has become a multi-month project. The difference between acting at 10M rows and acting at 1B rows is roughly an order of magnitude in engineering cost.

### Postgres Workload Scoring Framework

Go back to [the six characteristics](#what-this-workload-looks-like). Be honest about how many describe your system right now, and then score yourself again against where you'll be in 12–18 months.

If four or five apply, you're in this pattern. The optimization phases above are already in your future, or you've started them.

If all six apply, you're past the point of easy exits. Architectural friction is the dominant factor in your performance trajectory, and the migration cost is climbing every quarter you wait.

If three or fewer apply, you likely have a different problem. Standard Postgres optimization should change the trajectory.

### Early warning signs

Before the pattern becomes a crisis, it shows up in how the team spends its time:

**Optimization dominates planning.** 10–20% of engineering time goes to database performance, and every quarterly roadmap includes a scalability line item.

**Costs grow faster than revenue.** Finance is asking why the database bill increased 40% while user growth was only 15%.

**Operational complexity accumulates.** 20+ pages of runbooks, partition management scripts, monitoring for autovacuum lag, replication delay, and index bloat. New engineers need weeks of onboarding before they can safely operate the database.

**Growth outpaces optimization.** You ship a 2x improvement and data growth erases it within two quarters.

**Autovacuum is a constant concern.** It's in the top five processes by CPU and I/O at all hours, and tuning it is a recurring conversation.

Two or three of these signs mean you should be paying attention. Four or more means you're already in the pattern.

### Migration complexity at different scales

**10M–50M rows.** A few days to 1–2 weeks. Simple dump/restore or logical replication. Low risk, fast validation, easy rollback. 1–2 engineers part-time (roughly 80 engineer-hours).

**100M–500M rows.** 2–6 weeks. Partition-by-partition migration. More dependencies to account for, more thorough testing required. 2–3 engineers, mostly full-time (roughly 400 engineer-hours).

**1B+ rows.** 2–6 months. Hundreds or thousands of partitions. Zero-downtime required, complex rollback planning. Application-level dual-write or change-data-capture pipelines are in play. 3–5 engineers full-time plus a validation period (roughly 2,000 engineer-hours).

Those hours are not spent on product features. And there's no point on this curve where migration gets easier by waiting.

### What "purpose-built Postgres variants" means

TimescaleDB is built on top of Postgres, not in place of it. The PostgreSQL wire protocol, SQL query language, extensions like PostGIS and pgvector, your application code, and your ecosystem tooling all stay the same. What changes is the storage engine and execution layer underneath.

**MVCC overhead addressed through columnar compression.** The problem: every row insert in vanilla Postgres generates per-tuple MVCC headers, index entries, and WAL records regardless of whether the data will ever be updated, driving 3–5x write amplification and continuous autovacuum load. TimescaleDB's columnar storage (the `Columnstore` layer) batches up to 1,000 row versions per column into compressed arrays before writing to disk. Each batch write replaces thousands of individual heap tuple insertions with a single compressed segment write. The per-tuple MVCC header overhead is amortized across the batch, and autovacuum pressure drops proportionally. Far less row-level churn to clean up. In practice, write amplification drops from the 3–5x range to near 1:1 for sustained append workloads. The [Tiger Data architecture whitepaper](https://www.tigerdata.com/docs/about/latest/whitepaper) covers the columnar layout and compression pipeline in detail.

**Row storage replaced by columnar layout for [time-series data](https://www.tigerdata.com/learn/the-best-time-series-databases-compared).** The problem: vanilla Postgres reads all columns of every matching row even when a query needs two, creating 15x+ read amplification on wide tables, with none of the compression techniques applicable to time-series data. Rather than reading all 30 columns of a row to get two values, queries against the columnar layer read only the referenced columns from compressed column arrays. The 15x read amplification drops to near 1:1. Time-series compression (delta-of-delta for timestamps, gorilla-style XOR for floats, run-length encoding for repeated values) routinely achieves 10–20x compression ratios vs. heap storage. A dataset that occupies 1TB in vanilla Postgres often fits in 50–100GB with columnar compression enabled.
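As a sketch of what enabling this looks like (using the long-standing compression API names; newer releases also expose a "columnstore" spelling, and the 7-day cutoff is illustrative):

```sql
-- Convert the plain table into a hypertable partitioned on ts
SELECT create_hypertable('device_metrics', 'ts');

-- Columnar compression: segment by device, order by time
ALTER TABLE device_metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby   = 'ts DESC'
);

-- Compress chunks once they are older than 7 days
SELECT add_compression_policy('device_metrics', interval '7 days');
```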

**Query planning overhead reduced through chunk exclusion and continuous aggregates.** The problem: the Postgres planner runs a full optimization pass on every query, and with hundreds of partitions, partition pruning overhead can exceed execution time for simple queries. TimescaleDB's planner extension adds chunk exclusion that operates at a lower level than Postgres's partition pruning. Chunks are indexed by time range in a catalog table, and the planner excludes non-overlapping chunks before the standard planning phase. For query shapes that repeat thousands of times per second, this eliminates most of the per-partition pruning overhead. Continuous aggregates go further: pre-computed rollups stored as materialized views, updated incrementally as new data arrives, so dashboards querying hourly or daily aggregations hit a small summary table instead of scanning billions of raw rows.
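A continuous aggregate for an hourly dashboard rollup can be sketched like this (view name and policy windows are illustrative):

```sql
CREATE MATERIALIZED VIEW device_metrics_hourly
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', ts) AS bucket,
       device_id,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM device_metrics
GROUP BY bucket, device_id;

-- Refresh incrementally, trailing the last 30 days
SELECT add_continuous_aggregate_policy('device_metrics_hourly',
    start_offset      => interval '30 days',
    end_offset        => interval '1 hour',
    schedule_interval => interval '30 minutes');
```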

**WAL volume reduced through batched ingestion.** The problem: at 100K inserts/sec, vanilla Postgres generates 50–100MB/sec of WAL, creating I/O bottlenecks and causing replicas to fall behind. Lagging replicas force the primary to retain more unprocessed WAL, which consumes disk and makes the lag worse. The root cause is per-row WAL records: one per heap insert, one per index insertion. Columnar storage's batch writes generate WAL at the segment level rather than the row level. At 100K inserts/sec, WAL volume drops from 50–100MB/sec to roughly 5–15MB/sec in typical deployments, which eliminates most replication lag issues. Replicas that previously fell behind during write peaks can keep up without tuning.
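The arithmetic behind those WAL figures is straightforward. In this sketch, the per-row and per-segment byte sizes are stated assumptions chosen to land in the ranges quoted above, not measured Postgres record sizes:

```python
# Rough WAL-volume arithmetic: per-row records vs. segment-level records.
# Byte sizes are assumptions for illustration.

INSERTS_PER_SEC = 100_000
WAL_PER_ROW = 700          # assumed: heap record + index record + headers, bytes
BATCH = 1000               # rows per compressed segment
WAL_PER_SEGMENT = 100_000  # assumed bytes for one compressed segment write

row_wal = INSERTS_PER_SEC * WAL_PER_ROW                  # per-row path
batch_wal = (INSERTS_PER_SEC // BATCH) * WAL_PER_SEGMENT # segment path

print(f"per-row: {row_wal / 1e6:.0f} MB/s")   # -> 70 MB/s (in the 50-100 range)
print(f"batched: {batch_wal / 1e6:.0f} MB/s") # -> 10 MB/s (in the 5-15 range)
```

The ~7x reduction falls directly out of paying fixed WAL overhead once per thousand-row segment instead of once (or twice) per row.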

![](https://timescale.ghost.io/blog/content/images/2026/02/comparison-chart.png)

**Concrete numbers.** Benchmark results vary by workload, but the directional data is consistent: ingestion throughput 10–20x higher than vanilla Postgres at equivalent instance size, query performance on time-range aggregations 100x+ faster with columnar storage, storage footprint 10–20x smaller with compression enabled. RTABench, a benchmark for real-time analytics workloads, publishes results showing the performance gap between vanilla PostgreSQL and TimescaleDB across real-world query patterns. [See the benchmark results](https://rtabench.com/).

### Decision framework

Choose a specialized architecture if you score 4+ on the [Postgres Workload Scoring Framework](#when-to-choose-a-different-path) AND you're experiencing 2+ early warning signs AND you can project continued data growth.

Strong indicators to act now: you're under 100M rows, you're already building custom partitioning, your team spends 15%+ of engineering time on database optimization, and you can project 500M+ rows within 12 months.

You might not need this if writes are bursty rather than continuous, retention is 7–30 days, queries don't predominantly filter on time ranges, or growth is stable and slow.
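The gate above reads as a three-part checklist. A minimal sketch, with the criteria paraphrased from the text (the actual scoring rubric lives in the linked framework, so treat the function name and inputs as illustrative):

```python
# Hypothetical checklist for the decision gate described above:
# act when the workload scores 4+, shows 2+ early warning signs,
# and data growth is projected to continue.

def should_consider_specialized(workload_score, warning_signs, growth_projected):
    return workload_score >= 4 and warning_signs >= 2 and growth_projected

print(should_consider_specialized(5, 3, True))   # -> True
print(should_consider_specialized(5, 1, True))   # -> False: too few warning signs
print(should_consider_specialized(5, 3, False))  # -> False: growth has plateaued
```

All three conditions have to hold; a high score alone, without warning signs or projected growth, is a reason to keep optimizing rather than migrate.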

## Optimization vs. Architecture

There are two different problems that both show up as "database performance is degrading."

**Problem 1: Optimization within the right architecture.** The workload fits the database's design. Better indexes, query rewrites, configuration tuning, and hardware upgrades directly improve the trajectory. Postgres expertise solves the problem. For most workloads, vanilla Postgres is the right choice.

**Problem 2: The Optimization Treadmill.** The workload hits design tradeoffs baked into the storage engine, concurrency model, and query planner. Optimization helps in the short term but doesn't change the trajectory. Each phase buys time. None buys a different outcome. You're working around the architecture rather than with it.

Knowing which problem you have determines the path forward.

If you followed the optimization phases in this piece, you weren't doing anything wrong. Those were correct responses to the symptoms. Any experienced Postgres team would have done the same. The pattern is common precisely because the progression makes sense at each step.

What changes with recognition is agency. At 10M–50M rows, you can choose a purpose-built architecture in days to weeks and redirect engineering time to product work. At 100M–500M rows, migration is harder but still reasonable, taking 2–6 weeks. At 1B+, it's a multi-month project, and every quarter of delay adds complexity.

The broader principle applies beyond this workload. Different databases have different architectural strengths, so the best choice depends on the workload. Postgres is brilliant for general-purpose relational work. Specialized variants built on top of Postgres excel at specialized patterns. Recognizing when architecture matters more than optimization is an engineering judgment call, not a criticism of the tool.

Architectural fit determines your ceiling. Optimization determines where you operate relative to that ceiling. When you're hitting the ceiling repeatedly, the productive question isn't "how do we optimize better?" It's "are we operating within the right architecture?" With this workload pattern, the ceiling was always there. You just needed enough data volume to find it.

Score your workload. If you're at 8+ and under 100M rows, this is the cheapest architectural decision you'll make this year. [The whitepaper](https://www.tigerdata.com/docs/about/latest/whitepaper) covers the mechanics. The [Tiger Data free trial](https://console.cloud.timescale.com/signup) lets you validate on your own data.

[New: Learn how Plexigrid moved from 4 databases to 1 with Tiger Data.](https://timescale.ghost.io/blog/from-4-databases-to-1-how-plexigrid-replaced-influxdb-got-350x-faster-queries-tiger-data/)