Postgres Performance: Why Peak Throughput Benchmarks Miss the Real Problem

By Matty Stratton · 7 min read · Mar 27, 2026

Tags: PostgreSQL, PostgreSQL Performance

Table of contents

1. What benchmarks actually measure
2. The specific ways sustained load differs from peak load
3. The number you should actually be looking at
4. Why this question is structurally hard to ask
5. What the right benchmark looks like
6. The benchmark question and the architecture question
7. Ask the right question before you ship


You ran the benchmark. 80,000 inserts per second. The database handled it cleanly, latency stayed flat, no alarms. You shipped with confidence.

Three months later, p95 write latency is creeping. Six months later, autovacuum is in your top processes by CPU. Nine months later, you're rebuilding indexes on a table that's crossed 400 million rows.

The benchmark wasn't wrong. The question it answered just wasn't the right one.

Peak throughput tells you what the database can do in a sprint. Production asks what it can do running forever. Those are different questions with different answers, and most teams only ask the first one.

The number that actually matters is the sustained throughput ceiling: the write rate at which all of the database's maintenance processes (autovacuum, checkpointing, WAL archiving, replication) can keep up indefinitely. It's always lower than peak throughput. It drops over time as data volume grows. And almost nobody measures it.

What benchmarks actually measure

A typical load test runs for minutes. Sometimes an hour if you're thorough. It hits the database hard, measures throughput and latency, and stops. During that window, the buffer cache is warm from the test setup. Autovacuum hasn't had time to accumulate a backlog. WAL hasn't been piling up for 72 hours straight. The indexes are fresh. The table fits mostly in memory.

These are ideal conditions. Not because anyone cheated. That's just what a bounded test looks like. The database performs brilliantly under bounded load because its maintenance subsystems haven't been outrun yet.

Production is unbounded. The data keeps arriving after the benchmark ends. Autovacuum runs against a table that grows every hour. The buffer cache works against a dataset that expands past RAM over weeks. The indexes that fit in memory at 50 million rows don't fit at 500 million. The checkpoint cycle that completed cleanly at low data volume starts competing with writes as WAL volume climbs.

The specific ways sustained load differs from peak load

There are four concrete mechanisms at work here. All four run simultaneously in production. None of them show up in a benchmark.

Your hot data stops being hot

At launch, your hot data fits in shared_buffers and the OS page cache. Read performance is largely a RAM question. As data volume grows past available RAM, cache hit rates fall. Queries that returned in milliseconds start hitting disk. The degradation is slow enough that it looks like a query regression, not a growth problem, and that's what makes it dangerous. You'll spend a sprint chasing query plans and index strategies before someone checks pg_statio_user_tables and realizes the hit rate has been sliding since month four. The latency change wasn't a code problem. It was a ratio problem.
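One way to catch that slide early is to trend the hit rate straight out of Postgres's own counters. A minimal sketch (the LIMIT and rounding are arbitrary choices; watch the slope over weeks, not the absolute number):

```sql
-- Buffer cache hit rate per table. A hit_rate that drifts downward
-- month over month is the "ratio problem" in numbers.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric
             / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS hit_rate
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
```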

Autovacuum falls behind and can't catch up

A benchmark run doesn't give autovacuum time to fall behind. Production does.

At high sustained insert rates, autovacuum fires continuously. During write peaks, it falls behind. The backlog accumulates. Bloat builds. By the time monitoring catches it, the table has weeks of accumulated dead tuples and hint-bit work queued up.

Here's the part that really gets you: clearing the backlog requires running autovacuum harder, which competes with writes, which slows ingestion. The fix and the problem share the same resource pool. You're asking the database to clean up faster while also writing faster, and there's only so much I/O to go around.
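You can see whether cleanup is losing that race from the standard statistics views. A sketch, not a full bloat audit:

```sql
-- Dead tuples versus live tuples, plus when autovacuum last finished.
-- A dead_ratio that keeps climbing between runs means autovacuum is
-- being outrun by ingestion, not just running late.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup::numeric / nullif(n_live_tup, 0), 3) AS dead_ratio,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```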

Indexes rot

Fresh B-tree indexes on a small table are compact and cache-friendly. The same indexes a year later on a table with a billion rows are fragmented, partially sparse from the hot-right-edge problem on timestamp columns, and too large to stay in cache.

Traversal costs go up. Page splits happen more often. The 10x read improvement you got from careful indexing in the first month erodes slowly, then faster. You'll REINDEX and get performance back for a while, but the table is still growing. The next degradation cycle is already in progress.
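Tracking that erosion takes one periodic snapshot of index size against table size. A sketch using the standard catalog functions; log the output somewhere and plot it over time:

```sql
-- Index size relative to its table. On an append-only table, a ratio
-- that climbs between snapshots is fragmentation, not data growth.
SELECT relname,
       indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       round(pg_relation_size(indexrelid)::numeric
             / nullif(pg_relation_size(relid), 0), 3) AS index_to_table_ratio
FROM pg_stat_user_indexes
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 10;
```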

WAL never stops arriving

WAL volume scales directly with insert rate. At sustained high rates, WAL generation is constant. Replicas that keep up at launch start falling behind as write volume grows. The primary retains unprocessed WAL. Disk fills. And the replica needs to process a growing backlog while new WAL keeps arriving, which means there's no quiet period to catch up. If you've ever watched pg_stat_replication and seen replay_lag tick steadily upward with no sign of plateauing, you know exactly how this ends.
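If you'd rather have that as numbers than as a bad feeling, a sketch against pg_stat_replication (the *_lag columns assume Postgres 10 or later):

```sql
-- Replay lag per replica. The question isn't the value at any one
-- moment; it's whether the lag recovers after write peaks or only grows.
SELECT application_name,
       state,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_bytes,
       write_lag,
       flush_lag,
       replay_lag
FROM pg_stat_replication;
```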

Each of these mechanisms is invisible in a benchmark. In production, they compound.

The number you should actually be looking at

So how do you actually find the sustained throughput ceiling?

You can estimate it. Look at autovacuum activity under current load: is it finishing cycles or perpetually falling behind? Check pg_stat_bgwriter for checkpoint pressure. Watch pg_wal directory size trends. Plot the ratio of index size to table size over time. These aren't exotic metrics. They're already in Postgres. Most teams aren't watching them together.
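The checkpoint check, for instance, is a single query. A sketch against the pre-Postgres-17 layout (these counters moved to pg_stat_checkpointer in 17):

```sql
-- Checkpoint pressure. Requested checkpoints consistently outnumbering
-- timed ones means WAL volume is forcing them early, ahead of schedule.
SELECT checkpoints_timed,
       checkpoints_req,
       checkpoint_write_time,
       checkpoint_sync_time
FROM pg_stat_bgwriter;
```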

The leading indicators of a sustained throughput ceiling are: autovacuum workers consistently showing in pg_stat_activity, checkpoint completion times trending up, replica lag growing during write peaks, and n_dead_tup climbing faster than vacuum runs can clear it.

None of these show up in a benchmark. All of them show up in production, usually together, usually around month six or nine.
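The first of those indicators is also the easiest to automate. Autovacuum workers report their query text with an "autovacuum:" prefix, so a sketch like this shows whether they've become a permanent fixture:

```sql
-- Currently running autovacuum workers. Seeing the same tables here at
-- every check is the "consistently showing" signal described above.
SELECT pid,
       now() - xact_start AS running_for,
       query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%'
ORDER BY xact_start;
```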


Why this question is structurally hard to ask

Smart teams miss this. The reasons are structural.

Benchmarks have a natural stopping point. Load tests end. Sustained load doesn't have a natural evaluation moment until something breaks. There's no "sustained throughput benchmark" in most team playbooks because the concept doesn't have a clean boundary. When do you declare the test over?

The degradation timeline is also longer than most planning cycles. Indexing starts showing stress at 300 million rows. Partitioning gets complicated at 500+ partitions. WAL volume becomes a crisis when replica lag crosses a threshold that trips an alert. These events are six to eighteen months apart. The engineer who ran the initial benchmark often isn't the one debugging the production incident.

Then there's the procurement problem. Peak throughput is a good number for architecture decisions. "This database handles 80K inserts per second" is a clean, defensible statement. "This database handles 80K inserts per second now, but that number will effectively be lower in eight months as the buffer cache hit rate falls and autovacuum starts competing for I/O" is harder to put in a slide. (Both statements are true. Only one of them gets you budget approval.)

And most capacity planning frameworks are built around static estimates. How many users, how many requests, how much storage. Sustained throughput degradation is a dynamic problem. The ceiling moves as the system runs. That doesn't fit neatly into a capacity model built for stable workloads.

This adds up to something bigger than individual teams making mistakes. The entire way the industry evaluates databases is optimized for procurement, not production. Vendor benchmarks measure peak throughput because it's the largest number. Load testing frameworks default to bounded runs because unbounded runs don't have a natural end state. Capacity planning templates assume static ceilings because dynamic ceilings are harder to model. Every layer of the evaluation stack is designed to produce a number that looks good in a slide deck. None of it answers the question you'll actually need answered in month twelve.

So if the standard evaluation framework is structurally set up to miss this, what does a better one look like?

What the right benchmark looks like

Run the load test for longer. Hours, not minutes. Watch what happens to autovacuum, not just query latency.

Start the test with a table that already has data in it, sized to your 12-month projection. A benchmark on an empty table tells you about cold start performance. It tells you almost nothing about what the system looks like after a year of continuous ingestion.
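A minimal way to do that pre-loading, assuming a hypothetical metrics table (the name, columns, and row density are stand-ins for your real schema and your real projection):

```sql
-- Hypothetical seed: a year of synthetic rows, so the benchmark starts
-- at the data volume you expect twelve months in, not at zero.
CREATE TABLE metrics (
    ts        timestamptz NOT NULL,
    device_id int         NOT NULL,
    value     double precision
);

-- One row per second for a year is roughly 31 million rows; shrink the
-- step interval to scale the volume up toward your projection.
INSERT INTO metrics (ts, device_id, value)
SELECT g,
       (random() * 10000)::int,
       random()
FROM generate_series(now() - interval '12 months', now(),
                     interval '1 second') AS g;

-- Build the index on the pre-loaded data so the benchmark exercises a
-- production-sized index from the first write.
CREATE INDEX ON metrics (ts);
```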

Measure these things during the test:

  • pg_stat_bgwriter: checkpoint frequency and write volume
  • pg_stat_activity: autovacuum activity
  • Replica lag if you're running replicas
  • pg_stat_wal: WAL generation rate
  • Index size relative to table size
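For the WAL line item, pg_stat_wal (Postgres 14 and later) exposes cumulative counters; a sketch is to sample it on an interval during the run and diff consecutive readings to get a rate:

```sql
-- WAL generated since the last stats reset. Two samples a few minutes
-- apart, subtracted, give you the generation rate under load.
SELECT wal_records,
       wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_generated,
       stats_reset
FROM pg_stat_wal;
```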

Repeat the test with 3x the data volume. If performance drops more than linearly, you've found where the architecture starts to strain. That's the number you want before you ship, not after.

The test that catches the Optimization Treadmill is a test that asks: what happens when this runs for a year? You can simulate that in a day if you load the data upfront and run the benchmark against a realistic data volume.

The benchmark question and the architecture question

If your system has the six workload characteristics (continuous ingestion, time-series access patterns, append-only data, long retention, operational query requirements, sustained growth), the sustained throughput ceiling is structural. Better benchmarking tells you earlier where the ceiling is, but it won't raise it.

Benchmarking tells you how fast the ceiling approaches. Architecture determines where it sits.

Teams that run good sustained-load benchmarks early find out at 30 million rows that they're on the Optimization Treadmill. Teams that only run peak throughput benchmarks find out at 800 million rows. The underlying architectural problem is identical in both cases. The migration cost is not.

Ask the right question before you ship

Peak throughput is a useful number. It tells you whether the hardware can keep up with the write rate at a point in time. Worth knowing.

It just doesn't tell you whether the maintenance processes can keep up with that write rate indefinitely, as data volume grows and the vacuum backlog and WAL volume and cache pressure all grow with it.

The question nobody asks before shipping is usually the one that generates the incident nine months later. Ask it now. Run the load test against a full-size dataset. Watch autovacuum, not just query latency. Track the ceiling as a moving target, not a static spec.

And if the benchmark reveals what the scoring framework already suggested, the cheapest architectural decision you'll make is the one you make before the table crosses 100 million rows.
