---
title: "Read Replicas Don't Solve Write Bottlenecks"
published: 2026-04-07T12:15:23.000-04:00
updated: 2026-04-07T12:15:23.000-04:00
excerpt: "Read replicas fix read contention. They don't fix write throughput. Here's the mechanical reason why, and what actually changes the trajectory."
tags: PostgreSQL, PostgreSQL Performance
authors: Matty Stratton
---

You added a read replica and something real happened. Dashboard queries stopped competing with ingestion. The primary's CPU dropped. Write latency came down. You watched the metrics and thought: that worked.

It did work. For a while.

Now the replica is lagging during write peaks. The primary is accumulating WAL because the replica can't keep up. Write latency is climbing again, despite the replica handling every read you can throw at it. You're managing two Postgres instances instead of one, and the problem you solved is back.

Read replicas are a real solution to a real problem. You did the right thing. The right thing ran out.

What you have now is a write bottleneck. Read replicas solve a read bottleneck. These sound similar. The mechanics are completely different.

## What read replicas actually solve

Let's be fair about this. The win is real.

Read replicas fix resource contention on the primary. When expensive analytical queries run on the same instance that handles ingestion, they compete directly for CPU, I/O, and buffer cache. A dashboard query scanning 200M rows is doing real work. It holds back vacuum cleanup for as long as it runs, slows the write path, and evicts hot data from shared buffers. Moving that query to a replica removes it from the primary entirely.

The immediate result: primary CPU drops, write latency improves, buffer cache stops getting thrashed by analytical scans. For a system where reads were the dominant load, this can be transformative. A SaaS application with thousands of concurrent users reading data that a small number of writers produce? Read replicas are the right call, and they solve the problem completely.

The mechanism is [streaming replication](https://www.tigerdata.com/blog/scalable-postgresql-high-availability-read-scalability-streaming-replication-fb95023e2af). WAL generated by the primary gets shipped to the replica and applied in order. The replica stays current. Reads on the replica are reads against real data.

That's the win. The primary has less to do because reads moved elsewhere. Full stop.

## What read replicas don't touch

Here's where it gets mechanical. Each of these is specific, and none of them changes when you add a replica.

**The write path is identical.** Every INSERT on the primary still goes through the [full Postgres write path](https://www.tigerdata.com/testing-postgres-ingest-insert-vs-batch-insert-vs-copy): a heap tuple write with its [MVCC header](https://www.tigerdata.com/blog/mvcc-feature-youre-paying-for-but-not-using), a B-tree insertion for every index on the table, WAL records for the heap and for each index, and autovacuum work triggered by insert volume (freezing, hint bits). Nothing about streaming replication changes any of this. The replica receives the WAL and applies it. The WAL is generated at the same rate it always was, with the same per-row overhead.

Write amplification at 50K inserts/sec with five indexes is still 300K write operations per second on the primary. Adding a replica doesn't change that number. It adds the same write operations on the replica as well, applied from WAL.
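That amplification factor is just multiplication, but it's worth making concrete. A minimal sketch, assuming one physical write per heap tuple and one per index entry (a simplification that ignores page splits, TOAST, and the WAL writes themselves):

```python
# Back-of-the-envelope write amplification, using the numbers above.
insert_rate = 50_000      # rows/sec arriving at the primary
num_indexes = 5           # B-tree indexes on the table

writes_per_row = 1 + num_indexes          # heap tuple + one entry per index
total_ops = insert_rate * writes_per_row  # physical write operations/sec

print(total_ops)  # 300000 on the primary, repeated on each replica via WAL
```

The replica doesn't subtract from this number; WAL apply reproduces the same heap and index writes on every standby.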

**Autovacuum still runs on the primary.** The maintenance work that drives continuous autovacuum activity on high-insert tables (hint bit setting, freeze passes, dead tuple cleanup from aborted transactions) still happens on the primary. Routing reads to a replica doesn't reduce the insert rate that triggers autovacuum. It doesn't reduce dead tuple accumulation from aborted transactions. It doesn't change the XID freeze schedule.

Autovacuum workers still show up in `pg_stat_activity` at 3am. They still compete with writes during peaks. The tuning conversation still happens every quarter.

**WAL volume increases.** Here's the counterintuitive part. Adding replicas increases the total WAL-related workload rather than decreasing it. The primary now has to ship WAL to every replica in addition to writing it locally. At 50-100MB/sec sustained WAL generation, that's 50-100MB/sec of outbound data per replica. With two replicas, you're managing twice that outbound load.
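The egress arithmetic is simple enough to sanity-check, taking the top of the range quoted above and the two-replica case from this paragraph:

```python
# Outbound WAL shipping load on the primary.
wal_rate_mb = 100      # MB/sec of WAL generated (upper end of the range above)
replicas = 2           # each replica receives its own full WAL stream

outbound_mb = wal_rate_mb * replicas

print(outbound_mb)  # 200 MB/sec of network egress, on top of local WAL writes
```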

This is usually invisible when replicas are keeping up. It becomes visible the moment a replica falls behind.

## The replica lag problem

This is where things get interesting. And by interesting, I mean expensive.

Streaming replication works by applying WAL on the replica fast enough to stay current. Under normal conditions this is fine. WAL apply is sequential and fast. Replicas keep up.

High sustained write volume changes this. At 50-100MB/sec of WAL generation, the replica is continuously applying writes. If the replica hits any slowdown (a large analytical query competing for I/O, a vacuum cycle, a momentary resource spike) it falls behind. A small lag at high WAL volume becomes a large lag fast.
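A toy model makes the accumulation visible. Nothing here is measured from a real system; the rates are illustrative figures in the range this section uses, and `lag_after` is a hypothetical helper assuming constant generation and apply rates:

```python
# Lag grows whenever WAL arrives faster than the replica can apply it.
def lag_after(seconds, gen_mb_s, apply_mb_s, start_lag_mb=0.0):
    """Return the WAL backlog in MB after `seconds` at constant rates."""
    return max(0.0, start_lag_mb + (gen_mb_s - apply_mb_s) * seconds)

# A replica applying at 80 MB/s against 100 MB/s of generation:
print(lag_after(600, gen_mb_s=100, apply_mb_s=80))  # 12000.0 MB behind in 10 min
```

A 20% apply-rate shortfall that would be invisible at low write volume turns into gigabytes of backlog in minutes at this scale.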

Here's the self-reinforcing part, and it's the thing most people don't see coming.

A lagging replica can't be used for real-time dashboards. So reads start routing back to the primary. The primary is handling reads again. More resource contention on the primary. Write latency climbs. You're back where you started, except now you're also managing a lagging replica.

Meanwhile, the primary has to retain unprocessed WAL in `pg_wal` until the replica catches up. The further behind the replica falls, the more disk the primary uses holding WAL. On a system generating 100MB/sec of WAL, a replica that falls 30 minutes behind represents 180GB of retained WAL on the primary. That's not a theoretical edge case. That's a disk alert at 4am.
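The retention math, as stated, using the figures from this paragraph:

```python
# Retained WAL on the primary while a replica catches up.
wal_rate_mb = 100          # MB/sec of WAL generation
lag_seconds = 30 * 60      # replica is 30 minutes behind

retained_gb = wal_rate_mb * lag_seconds / 1000   # MB -> GB (decimal)

print(retained_gb)  # 180.0 GB of pg_wal the primary cannot recycle yet
```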

`max_wal_size` and checkpoint frequency become critical tuning parameters in a way they weren't before. Another surface to monitor. Another thing to tune. Another runbook entry.

## The operational load nobody budgets for

Adding replicas means running multiple Postgres instances. The cost of that gets absorbed into engineering time, and it's rarely accounted for when the decision is made.

Each replica needs its own monitoring: replication lag alerts, vacuum state, connection pooling, failover procedures. `pg_stat_replication` on the primary becomes a permanent fixture in the ops dashboard. Lag thresholds need calibration. Alerts need tuning to avoid false positives during expected spikes.

Connection routing adds its own complexity. PgBouncer or Pgpool-II sitting in front of the cluster, routing reads to replicas and writes to the primary. The routing layer needs its own monitoring, and it has its own failure modes. A misconfigured connection router that sends writes to a replica produces cryptic application errors that take a while to diagnose. That incident happens once on every team that adds replicas. (Ask me how I know.)
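The routing decision itself is small; the operational weight is everything around it. A sketch of the lag-aware fallback behavior described in this section (the function, threshold, and return values are hypothetical; a real deployment would read lag from `pg_stat_replication` or a proxy health check):

```python
# Hypothetical lag-aware routing: reads use a replica only while its
# measured lag stays under a threshold; otherwise they fall back.
def route(query_is_write: bool, replica_lag_s: float, max_lag_s: float = 5.0) -> str:
    if query_is_write:
        return "primary"      # writes always hit the primary
    if replica_lag_s > max_lag_s:
        return "primary"      # lagging replica: reads fall back, contention returns
    return "replica"

print(route(False, replica_lag_s=1.2))    # replica
print(route(False, replica_lag_s=42.0))   # primary
```

Note the failure mode encoded in the second branch: the worse the replica lags, the more read traffic lands back on the already-struggling primary.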

Schema migrations now touch multiple instances. An `ALTER TABLE` on the primary replicates to replicas, but the timing matters. Long-running schema changes that lock tables on the primary lock them on replicas too, which means reads route back to the primary, which defeats the purpose of having replicas in the first place.

New engineers need to understand the topology. Which instance handles writes. Which handles reads. What happens when a replica lags. When to promote a replica. The replica architecture that seemed like a contained infrastructure change has tendrils into application code, deployment procedures, and incident response.

None of this is unusual or unreasonable. It's the actual cost of the solution.

## Isolation vs. solution

Step back from the mechanics for a second.

There are two different things that can happen when you add capacity to a struggling system. You can solve the problem, meaning the root cause goes away and performance improves permanently. Or you can isolate the problem, meaning you move it further from the things it was affecting. You buy time without changing trajectory.

Read replicas are isolation.

The write bottleneck on the primary doesn't go away. The autovacuum tax doesn't go away. The WAL volume doesn't decrease. What changes is that reads no longer compete with these things. The primary's problems are contained to the primary.

That's not nothing. Isolation buys real headroom, and it's the right call in the right situation. A system where reads actually were the bottleneck is solved, not just isolated. The problem class matters.

For [continuous high-frequency ingestion](https://www.tigerdata.com/blog/six-signs-postgres-tuning-wont-fix-performance-problems), the bottleneck isn't reads stealing write resources. It's write overhead at the storage level: MVCC headers on every tuple, B-tree index maintenance on every insert, WAL records per row per index, autovacuum competing for the same I/O budget. Routing reads elsewhere reduces resource contention on the primary but doesn't change the write overhead mechanics at all.

Six months after adding replicas, the primary is slower than it was when you added them. The data volume grew. The write overhead grew with it. The ceiling you isolated is now back in frame.

## What actually addresses the write bottleneck

The write bottleneck in this workload pattern is architectural. It's in the storage model: per-row MVCC overhead, per-insert WAL records, and the maintenance machinery built around row-level concurrency management. These are fixed costs baked into vanilla Postgres heap storage. They don't go away through replication topology changes.

What changes them is a different storage model.

[Columnar storage](https://www.tigerdata.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics) batches 1,000 rows into compressed segments before writing. Instead of 1,000 individual heap inserts, each generating its own WAL record plus WAL records for every index entry, you get one segment write and one WAL record. The per-row MVCC headers that added 23 bytes to every tuple get amortized across the batch. Autovacuum pressure drops because the storage engine isn't creating row-level churn that needs cleaning up.

Do the math on WAL alone. At 100K inserts/sec in vanilla Postgres with five indexes, you're generating a WAL record for each heap tuple and each index insertion: roughly 600K WAL entries per second. Columnar batching at 1,000 rows per segment reduces that to ~600 segment-level WAL entries per second. WAL volume goes from 50-100MB/sec to roughly 5-15MB/sec. Replica lag stops being a crisis because replicas can apply WAL at the rate it arrives. The primary stops retaining large WAL backlogs. The monitoring surface shrinks. The replication architecture that was struggling to keep up suddenly has room to breathe.
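That reduction is the same arithmetic as the amplification earlier, run in reverse. A sketch under the same simplifying assumptions (one WAL record per heap tuple and per index entry, and batch amortization at the segment size quoted above):

```python
# WAL entry arithmetic for the vanilla vs. batched columnar write path.
insert_rate = 100_000      # rows/sec
num_indexes = 5            # B-tree indexes in the vanilla layout
rows_per_segment = 1_000   # rows batched into one compressed segment

row_wal_entries = insert_rate * (1 + num_indexes)      # per-row heap path
batched_entries = row_wal_entries // rows_per_segment  # amortized by batching

print(row_wal_entries, batched_entries)  # 600000 vs 600 entries/sec
```

A thousand-fold drop in WAL entries is what turns replica apply from a race the standby keeps losing into background work.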

The replicas don't go away. They just stop being the bottleneck.

## Where this leaves you

Read replicas were the right call. The write performance improvement was real. The isolation of read traffic from ingestion was real. You did the correct thing given what you knew.

What it doesn't do is change the underlying write cost. That cost is in the storage model, and it scales with data volume. Adding replicas doesn't touch it. Tuning replication doesn't touch it. The ceiling that read replicas pushed into the background is still there, and it moves closer as ingestion continues.

The write bottleneck gets solved at the storage level or it doesn't get solved. Replication topology is orthogonal to that problem.

If continuous high-frequency ingestion describes what you're running and you're already managing replica lag, [the full picture of the optimization treadmill](https://www.tigerdata.com/blog/postgres-optimization-treadmill) covers what's left on the path and what the off-ramp looks like.