
Back to blog
6 min read
Apr 17, 2026
Table of contents
01 The Mechanical Failure of Row Deletes02 Identify the Treadmill Symptoms03 Shift to Metadata Operations04 Implementation05 Compare the Approaches06 Automate Your Retention07 Take the Next StepEvery database reaches a volume where the standard DELETE command stops working. You might notice it first in your monitoring dashboards when p95 write latency creeps upward. Perhaps your autovacuum workers are running 24/7, yet the n_dead_tup count (a count of rows marked for deletion that autovacuum hasn't reclaimed yet) on your largest tables continues to climb. You are likely on the optimization treadmill, where every manual cleanup or configuration tweak buys you a few weeks of relief before performance degrades again.
This guide explains why row-level deletes fail in high-volume environments and how to replace them with a partition-based retention strategy. You will learn the mechanical reasons behind Write-Ahead Log (WAL) bottlenecks and how to identify when your cleanup processes are consuming too much CPU. We will walk through the process of reorganizing data into discrete time-based chunks and implementing a retention policy that drops data in milliseconds. By the end of this article, you will know how to replace resource-heavy row scans with simple metadata operations.
Postgres is a row-based engine that uses Multi-Version Concurrency Control (MVCC) to handle data. When you execute a DELETE command, the database does not immediately remove the data from disk. It marks the existing row as a dead tuple. This process requires the database to find each row, update the visibility headers, and generate a WAL record for every single change.
If you delete one million rows from a table with 500 million entries, you are forcing the database to perform one million write operations. This generates a massive WAL volume, which bottlenecks your overall throughput. In systems with continuous high-frequency ingestion, there is no quiet period or maintenance window to absorb this load. The cleanup process competes directly with your incoming data stream for I/O.
Traditional row-level deletes eventually hit an architectural ceiling. As tables grow, the cost of scanning the heap to find expired rows increases. Autovacuum must then visit those pages to reclaim the space, which dirties additional pages and triggers additional WAL writes. This cycle of read and write amplification is a design tradeoff of general-purpose engines. It works well for small SaaS backends but creates compounding overhead for long-retention telemetry.
You can see the impact of row-level deletes by inspecting your table statistics. Tables that rely on frequent DELETE commands often show high levels of bloat and falling autovacuum efficiency. Use the pg_stat_user_tables view to find where your maintenance tasks are falling behind.
SELECT
relname AS table_name,
n_dead_tup AS dead_rows,
last_autovacuum,
autovacuum_count
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000000
ORDER BY n_dead_tup DESC;
A high n_dead_tup count, paired with a very recent last_autovacuum timestamp, suggests that the vacuum worker is running continuously but cannot keep up with the rate of deletions. This is the moment where optimization within the current architecture starts to show diminishing returns. You are spending significant hardware resources to clean up the overhead your storage engine creates during deletions.
The solution to this bottleneck is to stop treating data as a single continuous heap. Instead, you should reorganize your data into discrete time-based storage units called partitions or chunks. This transforms a high-overhead row scan into a file-system operation.
When data is partitioned by time, a retention policy does not need to look at individual rows. It looks at the metadata for the partitions. If a partition contains only data that is older than your retention window, the database can unlink the entire file from the disk. This is an atomic operation that takes milliseconds, regardless of whether the partition contains 10 rows or 10 million rows.
This strategy removes the need for MVCC overhead during cleanup. No dead tuples are created because the rows are never marked for deletion. No WAL records are generated for individual row changes. The database updates its internal catalog to show that the storage unit no longer exists. This frees up CPU cycles for your primary ingestion and query workloads.
To use a partition-based retention policy, your table needs to be a hypertable. The setup takes two commands. Once the hypertable exists, the retention policy is one more.
If you are setting up a new metrics table, create it as you normally would, then convert it immediately:
CREATE TABLE device_metrics (
ts TIMESTAMPTZ NOT NULL,
device_id TEXT,
metric_name TEXT,
value DOUBLE PRECISION
);
SELECT create_hypertable('device_metrics', by_range('ts'));
TimescaleDB now manages the partitioning automatically. As data arrives, chunks are created to cover each time range. The default chunk interval is seven days, which works well for most time-series workloads. If your data volume is unusually high or low, you can tune the chunk interval:
-- Example: 1-day chunks for very high ingest rates
SELECT create_hypertable('device_metrics', by_range('ts', INTERVAL '1 day'));
If you already have data in a standard Postgres table, you can convert it in place. The migrate_data flag moves existing rows into the new chunk structure:
SELECT create_hypertable(
'device_metrics',
by_range('ts'),
migrate_data => true
);
For large tables, this migration will take time proportional to how much data you have. Run it during a low-traffic window. Your application does not need any schema changes — the hypertable behaves as a standard Postgres table for reads and writes.
Once your table is a hypertable, add the retention policy:
SELECT add_retention_policy('device_metrics', INTERVAL '90 days');
That is the full setup. The engine will now run a scheduled job to drop chunks older than 90 days. You can confirm the policy is registered:
SELECT job_id, proc_name, schedule_interval, config
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention';
And verify which chunks currently exist on your table:
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'device_metrics'
ORDER BY range_start DESC
LIMIT 10;
The chunks outside your retention window will be dropped automatically on the next scheduled run. No cron jobs, no manual DROP TABLE calls, no gap risk if a job fails.
The difference in resource consumption between these methods is significant. Imagine you need to remove data older than 90 days from a high-traffic metrics table with hundreds of millions of rows.
DELETE FROM device_metrics
WHERE ts < NOW() - INTERVAL '90 days';
This forces the engine to scan the index for ts, visit every matching row in the heap, and mark each one as a dead tuple. On a large table, this operation takes minutes to hours. It generates gigabytes of WAL, puts pressure on your replicas, and creates a cleanup backlog that autovacuum must chase. Your p95 write latency will likely spike during this window, because the disk is absorbing both deletions and new inserts simultaneously.
With native Postgres table partitioning, you skip the row scan entirely. When a partition contains only expired data, you drop the entire file:
DROP TABLE device_metrics_y2026_m01;
This returns in milliseconds. It generates a single WAL record for the catalog update and immediately reclaims disk space. There is no row-level overhead, no dead tuple accumulation, no autovacuum backlog.
The tradeoff is operational: you are now responsible for creating partitions ahead of time and running a scheduled job to drop them. A typical setup uses a cron job or a tool like pg_partman to manage the partition lifecycle. If the job fails silently, old data accumulates. If partitions are not pre-created, ingestion breaks. This is better than DELETE, but it adds a maintenance surface you have to own.
Tiger Data's hypertable architecture applies the same drop-the-file strategy, but manages the partition lifecycle automatically. You set the policy once:
SELECT add_retention_policy('device_metrics', INTERVAL '90 days');
The engine handles chunk creation as data arrives and chunk removal on the schedule you defined. There is no cron job to monitor, no manual partition creation, no gap risk. The drop operation is the same fast metadata update as the native Postgres approach — it takes milliseconds regardless of how many rows the chunk contains. The difference is that the operational burden does not follow you.
The goal is to make retention invisible. By setting up a TTL policy, you define the rules once and let the database handle the execution.
Tiger Data uses columnar compression to reduce retention costs further. When data is compressed into arrays, the storage footprint is already 10-20x smaller than the standard Postgres heap. This means you can keep more data in the hot storage tier for longer periods. When the time eventually comes to drop the data, the metadata operation remains just as fast.
If you are currently managing retention through manual DELETE scripts, your first step is to quantify the cost of that maintenance. Look at your I/O wait times and WAL generation rates. If these metrics spike during your cleanup window, you have found a clear path to reclaiming your database performance.
Before changing anything, confirm the cost of your current approach. Run this query to find the tables where autovacuum is working hardest:
SELECT
relname AS table_name,
n_dead_tup AS dead_rows,
autovacuum_count,
last_autovacuum
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC
LIMIT 10;If the tables with the highest autovacuum_count are also your highest-write tables, you have confirmed the problem. Those are the candidates for a partition-based retention policy.
The fastest way to test the fix is to create a hypertable in Tiger Cloud and set a retention policy on your target table. You can have a policy running against real data in an afternoon. Start a free Tiger Cloud trial to test it on your own schema without touching production.
If you want to understand the full policy API before committing, the TimescaleDB data retention docs cover every option with working examples. For a broader look at data lifecycle management in Postgres, How to Manage Your Data With Data Retention Policies is a good companion read.

Why Giant IN Clauses Slow Down Your App
Giant `IN` clauses inflate PostgreSQL planning time and spike p99 latency. Learn how `ANY(ARRAY[])` cuts the hidden planning tax and keeps your app fast at scale.
Read more
Receive the latest technical articles and release notes in your inbox.