---
title: "Vertical Scaling: Buying Time You Can't Afford"
published: 2026-02-26T09:48:27.000-05:00
updated: 2026-03-24T13:05:00.000-04:00
excerpt: "Postgres vertical scaling works, until it doesn't. Learn why high-frequency ingestion workloads hit an architectural wall and what to do about it."
tags: PostgreSQL, PostgreSQL Performance
authors: Matty Stratton
---

> **TimescaleDB is now Tiger Data.**

Your Postgres database is struggling. Write latency is climbing, autovacuum is fighting for I/O, and the indexes you added three months ago aren't cutting it anymore. So you do the obvious thing.

You upgrade the instance. Metrics drop. Everyone exhales.

Six months later, you do it again.

Nobody puts this in a postmortem, because vertical scaling works. That's why teams keep reaching for it. But if you're running continuous high-frequency ingestion on Postgres, it's not a fix. It's a payment plan on a debt that keeps growing.

## The Cost Curve Doesn't Lie

You've probably already run the numbers. At 50K inserts per second, you're adding over 1.5 trillion rows per year. Your data volume climbs in a smooth, relentless line. Your infrastructure cost moves in steps, doubling each time you provision the next tier up.

Plot both lines on the same chart. Watch them diverge.
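
Here's a back-of-envelope sketch of those two lines. The starting price, the doubling, and the six-month headroom are illustrative assumptions, not anyone's actual pricing:

```python
# Back-of-envelope: cumulative rows vs. stepped instance cost.
# The starting price, doubling factor, and upgrade cadence are
# illustrative assumptions, not real provider pricing.
INSERTS_PER_SEC = 50_000
SECONDS_PER_MONTH = 30 * 24 * 3600

rows = 0
monthly_cost = 2_000           # hypothetical starting tier, $/month
upgrade_every = 6              # each tier buys ~6 months of headroom

for month in range(1, 25):
    rows += INSERTS_PER_SEC * SECONDS_PER_MONTH
    if month % upgrade_every == 0:
        monthly_cost *= 2      # the next tier up roughly doubles the bill
    print(f"month {month:2}: {rows / 1e12:5.2f}T rows, ${monthly_cost:,}/mo")
```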

You upgrade from 16 vCPU/64GB to 32 vCPU/128GB with provisioned IOPS (say, io2 at 10,000+ IOPS on AWS). Cost roughly doubles. You get six months of breathing room. Then the data keeps growing, and the metrics start climbing again.

So you upgrade again. The cost doubles again. Twelve months out, you're projecting another upgrade. The database line item is growing faster than the product revenue it supports.

Oof.

## What You're Actually Buying

More CPU gives autovacuum room to run without starving query execution. More RAM improves `shared_buffers` and OS page cache hit rates. Faster storage reduces I/O wait across the board.
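
Those wins are easy to verify. Here's a minimal sketch that reads the buffer cache hit rate from `pg_stat_database`; the connection string is a placeholder for your own:

```python
# What the bigger instance is buying: buffer cache hit rate across all
# databases, from pg_stat_database. The DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection details
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT sum(blks_hit)::float
             / NULLIF(sum(blks_hit) + sum(blks_read), 0)
        FROM pg_stat_database
    """)
    hit_rate = cur.fetchone()[0]
    print(f"buffer cache hit rate: {hit_rate:.2%}")
```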

All real wins. None of them touch the per-row overhead.

Here's what's actually happening underneath. Every row carries MVCC headers, index entries, and WAL records whether you asked for them or not. A 1KB sensor reading becomes roughly 2.5 to 3.5KB of actual I/O: a 23-byte heap tuple header, five index entries at ~60 bytes each, plus a ~1.2KB WAL record stacking on top.

At 100K inserts/sec, that's 250-350MB/sec of real I/O to move 100MB/sec of data. A bigger instance tolerates that overhead more gracefully. It does not reduce it.
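
You can reproduce that arithmetic directly. This sketch uses the approximate per-row figures above; your row width, index count, and WAL volume will differ:

```python
# Rough per-row write amplification, using the approximate figures above.
ROW_BYTES = 1_000       # the 1KB application payload
TUPLE_HEADER = 23       # heap tuple header added by MVCC
INDEXES = 5
INDEX_ENTRY = 60        # bytes per index entry, roughly
WAL_RECORD = 1_200      # WAL overhead per insert, roughly

total = ROW_BYTES + TUPLE_HEADER + INDEXES * INDEX_ENTRY + WAL_RECORD
print(f"~{total} bytes of I/O per 1KB row ({total / ROW_BYTES:.1f}x)")

rate = 100_000          # inserts/sec
print(f"at {rate:,}/sec: ~{rate * total / 1e6:.0f} MB/sec of real I/O")
```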

So the trajectory holds. Six months of headroom, metrics creep back, another upgrade, another budget conversation. Each step costs more than the last one and buys roughly the same amount of time.

## The Invisible Cost Nobody Tracks

Here's where it gets uncomfortable. The latency graphs are one thing. Engineers watch latency graphs. Finance watches the _invoice_.

At some point the database line item becomes visible enough that someone schedules a meeting. Now you're explaining autovacuum to a person who manages a spreadsheet for a living. (That meeting is not fun. The prep work for that meeting costs engineering time you don't have.)

But that's the visible cost. The invisible one is worse.

When teams hit this pattern, senior engineers typically spend 20-30% of their time on database operations. Not firefighting, routine weekly work: monitoring autovacuum lag, tuning per-partition settings, watching replication delay, reviewing runbooks before anyone touches the schema, making sure the `pg_partman` automation didn't silently fail again.

None of that shows up in the cloud bill. It doesn't trigger a finance meeting – it just quietly drains your best people every single week. New engineers need weeks of onboarding before they can safely operate the partitioning scheme. What should be a one-person schema change becomes a team event with a rollback plan.

You've built a database operations practice inside your product engineering team. That wasn't the plan.

## Why Vertical Scaling Feels Like It's Working

The thing that makes this pattern so persistent is that each optimization phase genuinely does help. Vertical scaling is no exception.

You add the bigger instance, and autovacuum workers stop competing with queries for CPU. Shared buffers expand, and buffer cache hit rates climb. Those io2 IOPS stop being the bottleneck. For a while, the system breathes.

But here's the thing: Postgres wasn't designed for continuous, high-frequency, append-only ingestion at scale. The design choices that make it excellent for general-purpose workloads – MVCC for concurrency, row-based heap storage, B-tree indexes, the WAL architecture – all generate overhead that multiplies when you're hammering it with hundreds of thousands of inserts per second that never pause.

Vertical scaling gives the existing architecture more room to operate. It doesn't change the architecture.

MVCC creates per-tuple overhead on data you'll never update. Row storage forces you to read all 30 columns when your query needs two. B-tree indexes mean every insert has to traverse and update every index, and at 50K inserts/sec with five indexes, that's 250K index insertions per second. WAL records every single one of those operations before touching a data page, so at 100K inserts/sec you're generating 50-100MB/sec of WAL just to do normal work.
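
You don't have to take the WAL number on faith. This sketch samples the WAL insert position twice with `pg_current_wal_lsn()` (Postgres 10+) and computes your actual generation rate; the connection string is a placeholder:

```python
# Sample the WAL insert position twice to measure your actual WAL
# generation rate. pg_current_wal_lsn() requires Postgres 10+; the DSN
# is a placeholder.
import time
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection details
conn.autocommit = True

with conn.cursor() as cur:
    cur.execute("SELECT pg_current_wal_lsn()")
    start = cur.fetchone()[0]

    time.sleep(10)  # measurement window

    cur.execute(
        "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), %s::pg_lsn)", (start,)
    )
    wal_bytes = float(cur.fetchone()[0])
    print(f"WAL generation: {wal_bytes / 10 / 1e6:.1f} MB/sec")
```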

None of those problems shrink when you add more vCPUs.

## How It Shows Up Before It's a Crisis

The real tell isn't in a p95 latency chart. It's the _pattern_.

You optimize. You get relief. The metrics climb back. You optimize again. The relief lasts a little less time than before.

Before it becomes a full crisis, it shows up in how the team is spending its time.

Optimization is on every quarterly roadmap, not as a one-time project, but as a line item, every quarter, competing with features for engineering time.

The database bill goes up 40% while user growth is 15%. Finance notices. Those numbers don't get ignored.

You ship a 2x performance improvement and data growth erases it within two quarters. The treadmill doesn't slow down – it **speeds up**.

And autovacuum just keeps coming up! It's in the top five processes by CPU and I/O at all hours, and tuning it is somehow _always_ on someone's plate (there's a quick check sketched below).

Two or three of these? Pay attention. Four? You're _already_ in the pattern.
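
On the autovacuum symptom in particular, here's that quick check, again with a placeholder connection string: what's vacuuming right now, and which tables are piling up dead tuples.

```python
# Quick check on the autovacuum symptom: what's vacuuming right now, and
# which tables are accumulating dead tuples. The DSN is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection details
with conn, conn.cursor() as cur:
    cur.execute("SELECT pid, relid::regclass, phase FROM pg_stat_progress_vacuum")
    for pid, table, phase in cur.fetchall():
        print(f"vacuum pid {pid}: {table}, phase {phase}")

    cur.execute("""
        SELECT relname, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 5
    """)
    for name, dead, last in cur.fetchall():
        print(f"{name}: {dead:,} dead tuples, last autovacuum {last}")
```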

## Optimization Problem vs. Architecture Problem

There are two different problems that both show up as "database performance is degrading."

The first is an optimization problem. The workload fits the database design. Better indexes, query rewrites, config tuning, vertical scaling. These directly improve the trajectory, and Postgres expertise solves it. For most workloads, vanilla Postgres is the right tool and this is the right path.

The second is an **architectural mismatch**. The workload is hitting design tradeoffs baked into the storage engine and the write path. Optimization helps short-term, but it doesn't change the trajectory. You're working _around_ the architecture instead of _with_ it.

Both of these look identical from the outside: degrading query latency, climbing infrastructure costs, teams spending more time on database operations than product work. The difference only becomes obvious when you notice each fix is lasting a little less time than the last one.

Vertical scaling is the right move for the first problem. For the second, it's just the most expensive item on the treadmill.

## When to Think About Architecture Instead

If your workload is continuous high-frequency ingestion, your data is append-only, queries predominantly filter on time ranges, and you're measuring retention in months or years, you're probably dealing with an architectural mismatch, not an optimization problem.

You also don't need to replace Postgres. TimescaleDB extends vanilla Postgres with columnar compression, hypertables with automatic chunking, and a query planner that understands [time-based access patterns](https://www.tigerdata.com/learn/the-best-time-series-databases-compared). You keep SQL, your extensions, your team's knowledge, and the entire Postgres ecosystem. What changes is the storage engine and write path underneath (the parts _actually_ generating the overhead).
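
A minimal sketch of what that looks like, assuming the `timescaledb` extension is installed and a table named `metrics` with a `time` column (the names and the seven-day window are illustrative):

```python
# Minimal sketch: convert a plain table into a hypertable and enable
# columnar compression. Table and column names are illustrative; the
# timescaledb extension must already be installed.
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection details
with conn, conn.cursor() as cur:
    # Chunk the table by time. For a table that already holds data,
    # pass migrate_data => true (slower, takes a lock).
    cur.execute("SELECT create_hypertable('metrics', 'time')")

    # Turn on columnar compression and compress chunks older than a week.
    cur.execute("ALTER TABLE metrics SET (timescaledb.compress)")
    cur.execute("SELECT add_compression_policy('metrics', INTERVAL '7 days')")
```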

Migration complexity scales with data volume. At 10M-50M rows, it's days to two weeks. At 100M-500M rows, two to six weeks. At 1B+, you're looking at months. That time doesn't go toward product features. And there's no point on that curve where waiting makes it cheaper.

If your team is spending 20%+ of engineering time on database operations and scalability is on every quarterly roadmap, you already know something is off. The upgrade cycles don't get cheaper, and each one buys a little less time than the last – until a bigger instance stops helping at all.

_This post is part of a series on Postgres performance limits for high-frequency data workloads. The full analysis, including a workload scoring framework and migration complexity breakdown at different scales, is in the anchor essay:_ [_Understanding Postgres Performance Limits for Analytics on Live Data_](https://www.tigerdata.com/blog/postgres-optimization-treadmill)_. Ready to test it on your own data?_ [_Start a free Tiger Data trial._](https://console.cloud.timescale.com/signup)