When Continuous Ingestion Breaks Traditional Postgres

PostgreSQL
By Matty Stratton

6 min read

Mar 13, 2026

Table of contents

01 What "breathing room" actually means in Postgres
02 The maintenance competition problem
03 WAL as the throughput ceiling you can't tune past
04 Why the standard toolkit doesn't solve this
05 The workloads where this actually matters
06 Recognizing the pattern early


Your system writes data constantly. Not in jobs. Not in batches. A stream that runs at 3am the same as it runs at 3pm. IoT sensors. Trade feeds. Metrics collectors. The data never stops.

For a while, Postgres handles it fine. Then you start noticing things. Autovacuum is always running. Write latency has a pattern you can't explain by traffic alone. Maintenance tasks that used to take minutes now take hours. And the really annoying part: nothing is misconfigured.

You check the usual suspects. Indexes are correct. Query plans look reasonable. Configs follow best practices. A colleague confirms the same.

The problem isn't a missing index or a bad query plan. The problem is that Postgres was designed with a quiet period baked into its assumptions. Your system eliminated that quiet period. Now you're paying for it.

What "breathing room" actually means in Postgres

Most database systems are designed around a workload shape that includes peaks and valleys. Peaks are when users are active. Valleys are when the database catches up.

Postgres maintenance is built around the valley.

Autovacuum runs more aggressively when the database is quiet. ANALYZE refreshes statistics without competing for I/O. Checkpoint cycles complete cleanly. WAL accumulation clears out. The buffer cache warms up on predictable patterns.

Batch ETL fits this model perfectly. A nightly job writes data for two hours. The database writes, then rests, then writes again. Maintenance runs in the gaps. Everything resets before the next cycle starts.

Continuous ingestion has no gaps. The window that used to be quiet at 2am is now the same as the window at 2pm. Every maintenance process that depends on quiet time now runs in direct competition with writes. All day. All night.

The maintenance competition problem

Three maintenance processes need quiet time and don't get it under continuous ingestion.

Autovacuum. Even on append-only tables, autovacuum fires continuously at high insert rates. Since PostgreSQL 13, inserts themselves trigger autovacuum to freeze tuples and update the visibility map. This isn't about dead tuples from updates or deletes. It's insert-driven vacuum, and under continuous ingestion the work it's trying to catch up on never stops arriving.

At 50K inserts/second, autovacuum never finishes a cycle before the next one starts. It competes for I/O with your writes. When it loses, bloat accumulates. When it wins, write latency spikes.

There's no configuration fix for this. You can tune autovacuum_vacuum_cost_delay and autovacuum_max_workers all day. What you're tuning is how autovacuum loses gracefully. Not how it stops competing.
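You can watch this competition in your own system through the insert-driven vacuum knobs and the per-table counters. A minimal sketch, assuming a hypothetical append-heavy table called sensor_readings; the numbers are illustrative, not recommendations:

-- Per-table overrides for insert-driven autovacuum (PostgreSQL 13+).
ALTER TABLE sensor_readings SET (
  autovacuum_vacuum_insert_threshold    = 100000,  -- inserts before an insert-driven vacuum fires
  autovacuum_vacuum_insert_scale_factor = 0.05,    -- plus 5% of the table's estimated rows
  autovacuum_vacuum_cost_delay          = 2        -- how politely vacuum yields I/O, in milliseconds
);

-- How far behind insert-driven vacuum is right now, per table.
SELECT relname, n_ins_since_vacuum, last_autovacuum, autovacuum_count
FROM pg_stat_user_tables
ORDER BY n_ins_since_vacuum DESC
LIMIT 10;

None of these settings change the argument above; they only decide how the competition is scheduled.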

Checkpoints. Postgres writes dirty pages to disk at checkpoint intervals. After a checkpoint completes, the first write to any previously-clean page triggers a full-page write to WAL (that's the full_page_writes mechanism, and it's on by default for good reason). At high insert rates, checkpoint cycles are constant. The full-page write burst that follows each one adds significant WAL volume on top of your baseline write load.

Batch systems checkpoint, rest, then return to normal. Continuous systems checkpoint and immediately start generating the next burst. There's no recovery window.
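Both halves of this are visible in the server's own counters. A rough sketch; pg_stat_wal needs PostgreSQL 14 or later, and in PostgreSQL 17 the checkpoint counters moved from pg_stat_bgwriter to pg_stat_checkpointer:

-- How much of your WAL is full-page images versus ordinary records.
SELECT wal_records, wal_fpi, pg_size_pretty(wal_bytes) AS total_wal
FROM pg_stat_wal;

-- How often checkpoints fire, and whether they're forced by WAL volume
-- (requested) rather than by the timer.
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;

Raising checkpoint_timeout and max_wal_size spreads the full-page-write burst out over a longer interval; it doesn't make the burst disappear.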

ANALYZE and statistics. Query planning accuracy depends on fresh statistics. On a billion-row table, ANALYZE is expensive. On a batch system, you schedule it after the load completes. On a continuous system, there is no "after." You run it during writes or you let statistics go stale. Stale statistics mean bad query plans. Bad query plans mean unexpected sequential scans at the worst possible time.
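The staleness is measurable before the bad plans show up. A sketch of the two usual moves, with sensor_readings again standing in for your largest table:

-- Rows modified since the planner's statistics were last refreshed.
SELECT relname, n_mod_since_analyze, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;

-- On a billion-row table a percentage-based trigger almost never fires.
-- A fixed row count keeps statistics fresher, at the cost of running
-- ANALYZE while writes are in flight.
ALTER TABLE sensor_readings SET (
  autovacuum_analyze_scale_factor = 0,
  autovacuum_analyze_threshold    = 500000
);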

WAL as the throughput ceiling you can't tune past

This is the mechanical core of the problem.

Every insert generates WAL. Heap insert record, index insertion records for every index on the table, plus full-page writes after checkpoints. A single 1KB sensor reading with five indexes generates roughly 2.5-3.5KB of actual I/O once you account for the heap tuple, B-tree leaf page insertions, and WAL records. At 100K inserts/second, that puts sustained WAL throughput at 50-100MB/sec under normal conditions. After a checkpoint, it spikes higher because of full-page writes.

That's 3-6GB per minute. 180-360GB per hour. Just WAL.

WAL writes are sequential and synchronous by default. That's a hard ceiling on write throughput for a given storage configuration. You can raise the ceiling by buying faster storage. You can't eliminate it, because WAL is how Postgres guarantees durability. And you shouldn't want to eliminate it. Durability matters. But you should understand that your write throughput has a physical upper bound set by how fast your storage can absorb WAL, and continuous ingestion pushes against that bound constantly.
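The estimates above are just that: estimates. Your own WAL rate is easy to measure directly; in this sketch the LSN literal is a placeholder for whatever the first query returns on your system:

-- Note the current write-ahead log position...
SELECT pg_current_wal_lsn();

-- ...wait 60 seconds, then compute how much WAL was generated in between.
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), '7D4/8E2A1B30'::pg_lsn)
       ) AS wal_last_minute;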

Here's where continuous ingestion and batch ETL diverge completely.

Batch ETL generates bursts of WAL followed by silence. The silence lets replicas catch up. A streaming replica can fall behind during a batch load and recover in the gap. Nobody notices because the gap is long enough.

Continuous ingestion generates WAL constantly. Replicas that fall slightly behind have no gap to recover in. They fall further behind. With replication slots, the primary retains the unconsumed WAL in pg_wal, consuming disk until the replica catches up. The further behind a replica gets, the more WAL it has to replay, and the more disk the primary holds. It's a feedback loop. The thing that causes the problem (WAL volume) is the same thing that prevents recovery (WAL volume).

Adding replicas makes it worse, not better. Each replica is another consumer that needs to keep up with the same WAL stream, and the primary holds WAL until the slowest one catches up.
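Two catalog views make the feedback loop visible before pg_wal fills the disk. A sketch, assuming physical replication with slots:

-- How far behind each replica is, in bytes of WAL and in time.
SELECT application_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_bytes,
       replay_lag
FROM pg_stat_replication;

-- How much WAL the primary is retaining on behalf of each slot.
SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;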

The standard fix is more provisioned IOPS. It works for a while. Then data volume grows and you're having the same conversation again, just with bigger numbers on the invoice.

Why the standard toolkit doesn't solve this

Walk through each common response and you'll see exactly where it runs out.

More autovacuum workers. By default the vacuum cost limit is shared across all running workers, so adding workers doesn't add vacuum throughput; it spreads the same I/O budget across more tables. Raise the cost limit to compensate and you've increased the I/O competition with your writes. Either way, the contention doesn't go away.

Aggressive autovacuum cost limits. You can configure vacuum to run faster and harder. It cleans up faster but hits writes harder. There's no setting that makes the competition disappear. You're choosing which process suffers.

More RAM. Bigger shared_buffers and page cache reduce physical reads. Write amplification is unchanged. WAL volume is unchanged. Autovacuum competition is unchanged. You bought better read performance for a write-bound problem.

Faster storage. Raises the WAL ceiling. Doesn't change the ratio of actual I/O to logical data. At 3-5x write amplification, faster storage lets you sustain a higher write rate before hitting the ceiling. But data volume grows, and the ceiling moves up proportionally.

Vertical scaling. Same as faster storage with more CPU. You've bought headroom measured in months. At the current data growth trajectory, that math doesn't improve over time.

Each of these is the right response to the symptom. None of them changes the underlying dynamic: continuous ingestion is in constant competition with the maintenance processes Postgres needs to stay healthy.

The workloads where this actually matters

Not every write-heavy system has this problem. Let's be precise.

The pattern shows up when three things are true at once: writes are continuous rather than bursty, data volume is growing on a sustained curve, and the database needs to stay queryable under latency requirements while ingestion is running.

Industrial IoT is the clearest example. A wind farm with 10,000 sensors reporting every five seconds generates roughly 2,000 inserts/second. That's modest by financial or observability standards, but it never pauses. The turbines don't stop overnight. Maintenance windows don't exist because the data source doesn't know what a maintenance window is.

Financial market data is the high-frequency version. Trade feeds run at hundreds of thousands of events per second during market hours. Pre-market and after-market data keeps coming. Systems that aggregate this data for risk and compliance queries need it available immediately, not at end of day.

Observability platforms are the distributed version. Metrics, traces, and logs from thousands of hosts. Each host generates data independently. The aggregate rate is enormous and constant.

What these have in common: the data source runs on its own schedule, completely independent of what the database needs. The wind turbine doesn't care that autovacuum is behind. The trading engine doesn't wait for a checkpoint to finish.

If your write pattern is bursty (user-driven traffic, nightly batch jobs, periodic syncs), you probably don't have this problem. The database gets its breathing room, maintenance catches up, and standard Postgres optimization works the way it's supposed to. The pattern described in this post shows up specifically when the gap disappears.

Recognizing the pattern early

The instinct when Postgres starts struggling under continuous ingestion is to tune harder. Add workers. Raise limits. Upgrade storage.

Those are correct responses for a database with a misconfiguration or a bad schema. But that isn't what's happening here. Postgres is doing exactly what it was designed to do. The MVCC model, the WAL architecture, the maintenance scheduler: these are good design decisions for the workloads Postgres was built to handle. The workload changed underneath it. That's not a criticism of the tool.

But continuous ingestion isn't a heavier version of batch ETL. It's a different workload class. The architectural assumptions underneath Postgres were built around a workload that breathes. Continuous ingestion doesn't breathe. And that distinction matters because it determines whether optimization will change your trajectory or just delay the same outcome.

Recognizing that early is worth a lot. At 50M rows, switching to a purpose-built architecture takes days. At 1B rows, it takes months. Every quarter you spend optimizing within the wrong architecture is a quarter where migration gets harder and the engineering team spends more time managing the database than building product.

If this sounds familiar, the full analysis covers the scoring framework and the mechanics behind why each optimization phase hits a ceiling. It's the same trajectory described here, zoomed out to show the complete path and where it leads.

Read the full analysis: Understanding Postgres Performance Limits for Analytics on Live Data →
