
Every database starts its life in a state of grace. Your tables are small, your indexes fit entirely in memory, and queries return in a few milliseconds. But as your application grows, you eventually hit a transition point that feels like falling off a cliff. One day, your p95 latency is 10ms, and the next it is 200ms without a single change to your code.
This article teaches you how to identify and survive the transition from memory-resident data to disk-bound reality. You will learn the mechanics of the “working set” and why your database performance degrades once your active data exceeds available RAM. We will break down how Postgres manages active data in memory and show you how to measure your system's health before the cliff arrives.
Why the Working Set Dictates Your Reality

The most important concept in database performance is the working set. This is the portion of your total data that your application actually accesses regularly. In a typical time-series workload, this usually represents the most recent 24 to 48 hours of ingestion. If your total dataset is 5TB but your users only query the last day of metrics, your working set is a small fraction of that.
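For a first approximation of where you stand, compare the size of your busiest tables against the configured buffer cache. A minimal sketch, assuming a hypothetical metrics table; keep in mind that shared_buffers is only part of the picture, since the OS page cache also holds data pages.

-- How much memory is allocated to the Postgres buffer cache?
SHOW shared_buffers;

-- Total on-disk size of the current database.
SELECT pg_size_pretty(pg_database_size(current_database())) AS total_size;

-- Size of one hot table (substitute your own busiest tables for "metrics").
SELECT pg_size_pretty(pg_total_relation_size('metrics')) AS metrics_size;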
Your database keeps as much of this working set as possible in a memory area called the buffer cache. RAM is orders of magnitude faster than even the best NVMe drives. When a query finds its data in RAM, it is a cache hit. When the database has to go to disk, it is a cache miss. As long as your working set fits in RAM, performance stays fast and predictable.
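If you want to see what is actually sitting in the buffer cache, the contrib extension pg_buffercache exposes it. A quick sketch, assuming the default 8KB page size:

-- Install once per database: CREATE EXTENSION pg_buffercache;
SELECT c.relname,
       count(*) AS buffers,
       pg_size_pretty(count(*) * 8192) AS cached  -- assumes 8KB pages
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                            WHERE datname = current_database()))
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;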
The performance cliff occurs when your working set exceeds your allocated memory. When it does, the database starts a process called “eviction”: to make room for new data, it must kick old data out to disk. If a query then asks for that evicted data, the backend must stall and wait for the slow disk I/O to complete. This creates a binary performance reality: you are either running at memory speed or crawling at disk speed.
The Cost of Performance Unpredictability

Why does this matter beyond simple speed? When your database becomes disk-bound, you introduce performance jitter. Jitter is the variance in latency that makes dashboards feel sluggish and causes automated alerts to flake. A query might take 5ms one time and 500ms the next because the specific data pages it needed were swapped out to the disk between requests.
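You can watch this happen on a single query with EXPLAIN (ANALYZE, BUFFERS), which reports how many pages came from memory (shared hit) versus disk (shared read). A sketch against a hypothetical metrics table; the output numbers below are illustrative, not real measurements:

EXPLAIN (ANALYZE, BUFFERS)
SELECT avg(value)
FROM metrics
WHERE ts > now() - interval '1 hour';

-- Cold cache: "Buffers: shared hit=120 read=9480"  -> hundreds of ms
-- Warm cache: "Buffers: shared hit=9600"           -> a few ms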
This constant swapping also places immense physical strain on your hardware. High-frequency ingestion combined with large-scale queries creates a “churn” effect. Your storage controllers stay pegged at 100% utilization, just trying to keep up with data movement between layers. This leads to premature hardware wear and higher infrastructure costs because you are forced to pay for expensive provisioned IOPS to mask an architectural bottleneck.
As noted in our Postgres performance limits guide, this isn’t a configuration error. It is a mathematical certainty: if your ingestion rate is constant and your retention is long, your working set will eventually outgrow your RAM. Ingest 30GB a day with a 48-hour hot window and your working set is 60GB; a 64GB server is already at the edge.
The Mechanics of the Memory Bottleneck

To manage this, you need to understand the cache hit ratio. This metric represents the percentage of time your database finds data in memory rather than having to fetch it from disk. For an operational database, you want this number as close to 100% as possible. Once this ratio drops below 95%, you are officially entering the danger zone.
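Before drilling into individual tables (the per-table query appears in the health check section below), you can read the database-wide ratio straight out of pg_stat_database:

SELECT datname,
       round(blks_hit * 100.0
             / NULLIF(blks_hit + blks_read, 0), 2) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();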
A specific phenomenon called “cache poisoning” often accelerates this decline. Imagine you have a healthy database where the last hour of data is sitting comfortably in RAM. Then a developer runs a large analytical query that performs a full-table scan over six months of history.
Because Postgres uses a row-based heap, it must pull every old row into the buffer cache to check whether it matches the query filters. The “hot” data from the last hour gets evicted to make room for “cold” data from six months ago, and your ingestion performance tanks because the database is busy shuffling pages back and forth. Your memory-resident advantage disappears in an instant. Time-based partitioning mitigates this by limiting scans to the relevant chunks, but only when queries filter on the partition key.
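For reference, here is a minimal sketch of native range partitioning on time. Table and column names are hypothetical; the point is that a query filtering on the partition key only touches the relevant chunks instead of dragging cold history through the cache.

CREATE TABLE metrics (
    ts        timestamptz NOT NULL,
    device_id bigint      NOT NULL,
    value     double precision
) PARTITION BY RANGE (ts);

CREATE TABLE metrics_2026_04 PARTITION OF metrics
    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

-- Partition pruning keeps this scan inside the April chunk:
SELECT avg(value)
FROM metrics
WHERE ts >= '2026-04-01' AND ts < '2026-04-02';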
Scaling Strategies: Moving the Ceiling

When you hit this wall, you generally have three options.
The first is vertical scaling: you buy more RAM. This is the easiest fix but also the most expensive, and it has a hard physical limit. You cannot buy a server with infinite memory, and high-RAM instance prices climb far faster than linearly. You are essentially paying to delay the inevitable.
The second path is horizontal sharding. You split your data across multiple smaller machines. While this increases your total available RAM, it introduces massive operational complexity. You now have to manage connection pooling, distributed joins, and complex backup-and-restore procedures across a fleet of servers.
The third path is changing how the data is stored. As explained in the guide linked above, row-based storage is inefficient for large-scale analytics. With a columnar architecture, you can often compress a 5TB dataset to around 500GB. Suddenly, your entire working set fits back into the RAM you already have. You solve the bottleneck by shrinking the data footprint rather than buying more hardware.
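The exact DDL depends on the engine you choose. As one concrete example, assuming TimescaleDB's columnar compression on a hypothetical metrics hypertable (the article does not prescribe a specific tool), the change can be as small as:

-- Enable columnar compression, segmenting by a high-cardinality key.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);

-- Automatically compress chunks once they leave the hot window.
SELECT add_compression_policy('metrics', INTERVAL '7 days');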
Health Check: Calculating Your Cache Hit Ratio

You should not wait for your users to complain about latency to check your memory health. You can query the Postgres internal statistics tables to see exactly how your buffer cache is performing right now.
Use the following SQL query to see your current hit ratio. This looks at the heap_blks_read (disk fetches) versus heap_blks_hit (memory fetches).
SELECT
    relname AS table_name,
    heap_blks_read AS from_disk,    -- blocks fetched from disk
    heap_blks_hit AS from_memory,   -- blocks served from the buffer cache
    round(heap_blks_hit * 100.0
          / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS hit_ratio_percentage
FROM pg_statio_user_tables
WHERE heap_blks_hit + heap_blks_read > 0  -- skip tables with no traffic
ORDER BY hit_ratio_percentage ASC;
If you see your largest or most active tables showing a hit ratio below 95%, your database is struggling with its working set. You are likely seeing the start of the performance cliff mentioned at the beginning of this article.
Next Step

Take five minutes today to run the health check query on your production environment. If your ratio looks healthy but performance still feels off, check whether you're seeing other signs that tuning won't fix the problem. If your cache hit ratio is below 95%, you need to investigate why your working set is exceeding your memory. Start by identifying which queries are causing the most disk I/O and determine whether better indexing can keep that data in RAM.
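If you have pg_stat_statements enabled, one way to find those disk-heavy queries is to sort by blocks read from disk. A sketch (the extension must be in shared_preload_libraries and created in your database):

SELECT query,
       calls,
       shared_blks_read AS disk_blocks,
       shared_blks_hit  AS memory_blocks
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;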
