---
title: "Autovacuum: The Tax You're Always Paying"
published: 2026-05-05T15:04:21.000-04:00
updated: 2026-05-05T15:04:21.000-04:00
excerpt: "Your append-only table doesn't need autovacuum. It runs anyway. Here's what that actually costs your team."
tags: PostgreSQL, PostgreSQL Performance
authors: Matty Stratton
---

You open `pg_stat_activity` during a write peak and there it is. Autovacuum. Again.

You've tuned this three times. You wrote a runbook for it. At this point it has its own section in the quarterly database review.

The table it's working on hasn't seen a single UPDATE in six months. Every write is an INSERT. Old data gets dropped by partition, not deleted row by row. By every reasonable intuition about what autovacuum is for, this table should basically run itself.

But there it is.

The intuition isn't wrong. It's incomplete. Autovacuum exists to clean up after concurrent row modifications, and your workload doesn't do that. What your workload _does_ generate is a steady stream of other work that lands in the same process: dead tuples from aborted transactions, hint bits that need setting, transaction IDs that need freezing before the counter wraps. None of it is the problem autovacuum was designed to solve. All of it runs through the same mechanism.

This post isn't about how to tune autovacuum. It's about understanding what it's actually doing on your tables, why tuning helps at the margin but not at the root, and what it means that a process built for row modification cleanup is your most persistent background worker on a table that never modifies rows.

## What autovacuum is actually for

Postgres MVCC keeps old row versions alive as long as any active transaction might need to see them. When a row gets updated, the old version stays on the heap page, marked dead. When a row gets deleted, same thing. These dead tuples accumulate until something cleans them up. That something is autovacuum.

Without it, dead tuples pile up permanently. [Table bloat](https://www.tigerdata.com/learn/how-to-reduce-bloat-in-large-postgresql-tables) grows without bound. The space dead tuples occupy on heap pages can't be reused until vacuum reclaims it. Query performance degrades as scans trip over dead rows.
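You can watch this happen on a throwaway table. The names here are illustrative, nothing more:

```sql
-- One UPDATE on a disposable table leaves one dead tuple behind
-- until vacuum reclaims it.
CREATE TABLE mvcc_demo (id int PRIMARY KEY, v text);
INSERT INTO mvcc_demo VALUES (1, 'original');
UPDATE mvcc_demo SET v = 'replacement' WHERE id = 1;

-- Statistics are reported asynchronously, so the counters may lag
-- by a moment.
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'mvcc_demo';
-- expected: n_live_tup = 1, n_dead_tup = 1
```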

And then there's the harder problem: [XID wraparound](https://www.tigerdata.com/blog/how-to-fix-transaction-id-wraparound). Transaction IDs are 32-bit counters. About 2 billion transactions in, Postgres loses the ability to distinguish old from new. Rows from before the wraparound point become invisible. This isn't theoretical. It has happened in production. Autovacuum's freeze pass exists to prevent it by marking old tuples as frozen before the counter laps them.
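If you want to see how close you are, a minimal check against the system catalogs looks like this. `age()` counts transactions since the database's oldest unfrozen XID; autovacuum forces a freeze pass once that age crosses `autovacuum_freeze_max_age` (200 million by default):

```sql
-- Distance to forced anti-wraparound vacuuming, per database.
SELECT datname,
       age(datfrozenxid) AS oldest_unfrozen_xid_age
FROM pg_database
ORDER BY oldest_unfrozen_xid_age DESC;
```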

For a standard OLTP workload with concurrent reads and updates on shared rows, this is essential infrastructure. Dead tuple accumulation is a direct consequence of normal operation. The cleanup cost is proportional to the update and delete rate. Makes sense.

The confusing part is what happens when your workload never updates or deletes anything.

## Three reasons autovacuum runs on append-only tables

Each of these is a different mechanism, and none of them is the concurrent-row-modification problem autovacuum was designed around.

**Aborted transactions leave dead tuples.** Not every INSERT commits. Connection drops mid-transaction. Application errors trigger rollbacks. Explicit transaction management has bugs. At [high insert rates](https://www.tigerdata.com/blog/13-tips-to-improve-postgresql-insert-performance), even a small abort rate produces a steady trickle of dead tuples. A 0.1% abort rate at 50,000 inserts per second is 50 dead tuples per second. Autovacuum has to find and mark them, even though those rows were never part of a committed write.

This is real dead tuple work. Just not from updates. You can see it directly in `n_dead_tup` in `pg_stat_user_tables`. Tuning autovacuum to run more aggressively cleans it up faster. There's no way to eliminate it without eliminating aborted transactions, which isn't realistic.
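Watching that trickle is one query away. The partition naming pattern here is illustrative:

```sql
-- Dead tuples on insert-only partitions. On a table that truly never
-- sees UPDATE or DELETE, these all came from aborted transactions.
SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'events_%'   -- illustrative partition naming
ORDER BY n_dead_tup DESC
LIMIT 10;
```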

**Hint bits require page dirtying.** This one surprises most people who haven't dug deep into Postgres internals. When a row is first read after being written, Postgres doesn't just hand you the data. It verifies the writing transaction committed. It checks `pg_xact`. Once confirmed, it sets a hint bit in `t_infomask` to cache that result so future reads don't have to hit `pg_xact` again.

Setting that hint bit modifies the tuple header. A modified tuple header dirties the page. A dirty page needs writing back to disk.

So: your append-only table with immutable rows is generating I/O from reads. Not writes. Reads. The rows don't change. The headers do. This affects checkpoint pressure and the overall I/O budget autovacuum has to compete for.
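You can inspect the hint bits directly with the `pageinspect` extension (superuser required). `HEAP_XMIN_COMMITTED` is the `0x0100` bit of `t_infomask`; the table name is illustrative:

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Tuple headers on page 0 of an illustrative table. Once the
-- xmin-committed hint bit is set, readers skip the pg_xact lookup
-- for the inserting transaction. Setting it dirtied the page.
SELECT lp,
       t_xmin,
       (t_infomask & 256) <> 0 AS xmin_committed_hint  -- 0x0100
FROM heap_page_items(get_raw_page('events', 0));
```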

**Insert volume alone triggers autovacuum for freezing.** Since PostgreSQL 13, `autovacuum_vacuum_insert_threshold` and `autovacuum_vacuum_insert_scale_factor` control a separate trigger: autovacuum fires based on insert count, not just dead tuple count. The reason is XID wraparound prevention. At high insert rates, unfrozen tuples accumulate fast. Postgres needs to freeze them before the counter laps. So autovacuum runs a freeze pass continuously on high-insert tables, regardless of whether any rows have been updated or deleted.
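Both knobs are visible in `pg_settings` if you want to confirm what your cluster is actually running with:

```sql
SELECT name, setting, boot_val
FROM pg_settings
WHERE name IN ('autovacuum_vacuum_insert_threshold',
               'autovacuum_vacuum_insert_scale_factor');
```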

Go check `vacuum_count` and `autovacuum_count` in `pg_stat_user_tables` on your busiest append-only partition. They're climbing. `n_dead_tup` might be low. The freeze passes are happening anyway. [This BPFtrace walkthrough](https://www.tigerdata.com/blog/using-bpftrace-to-trace-postgresql-vacuum-operations) shows exactly which passes and when.
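Something like this, with the ordering or a filter adjusted to your own naming scheme:

```sql
-- Vacuum activity on tables that intuition says should never need it.
SELECT relname,
       autovacuum_count,
       vacuum_count,
       n_dead_tup,
       last_autovacuum
FROM pg_stat_user_tables
ORDER BY autovacuum_count DESC
LIMIT 10;
```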

## What tuning actually does

Most autovacuum tuning falls into two categories: make it run more aggressively, or make it yield more to writes.

Running it more aggressively means lowering `autovacuum_vacuum_scale_factor`, increasing `autovacuum_max_workers`, reducing `autovacuum_naptime`. Dead tuples get cleaned before they affect query plans. Freeze passes complete before XID pressure builds. Real improvement.
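Concretely, that looks something like the following. The values are illustrative starting points, not recommendations:

```sql
-- Cluster-wide changes. The first two apply on config reload;
-- autovacuum_max_workers has traditionally required a restart.
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- default 0.2
ALTER SYSTEM SET autovacuum_naptime = '15s';             -- default 1min
ALTER SYSTEM SET autovacuum_max_workers = 6;             -- default 3
SELECT pg_reload_conf();
```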

Yielding more to writes means increasing `autovacuum_vacuum_cost_delay`, lowering `autovacuum_vacuum_cost_limit`. Write latency stabilizes. The tradeoff is that vacuum falls further behind and bloat accumulates more between cycles.

Here's what neither category does: reduce the amount of work autovacuum needs to do. Every configuration choice is about how the work gets distributed across time and how aggressively it competes with your actual workload. You're tuning the scheduler. Not the workload.

Per-table overrides are the right implementation of this: more aggressive settings on active partitions, letting older ones vacuum on a slower cycle. Good practice. It's still adjusting the tax rate, not the taxable activity.
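In practice that's a couple of reloptions per partition. Names and numbers here are illustrative:

```sql
-- Hot partition: react to insert volume sooner, stay gentle on I/O.
ALTER TABLE events_2026_05 SET (
  autovacuum_vacuum_insert_scale_factor = 0.05,
  autovacuum_vacuum_cost_delay = 2
);

-- Cold partition: let it vacuum on a much slower cycle.
ALTER TABLE events_2025_11 SET (
  autovacuum_vacuum_insert_scale_factor = 0.5
);
```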

## The operational cost people don't account for

Autovacuum tuning isn't free engineering time.

Someone has to write the per-table `ALTER` statements. Someone has to monitor whether the settings are actually working. At 500 partitions, "monitor autovacuum lag" is a real job that runs on a recurring schedule. New partitions inherit defaults unless automation creates them with the right settings. That automation needs maintaining.

The monitoring surface for autovacuum lag spans three separate system views and requires correlating timestamps across them. It's not a dashboard that lights up. It's a debugging session.
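One sketch of what that correlation looks like, joining a worker's progress to its session (with `pg_stat_user_tables` supplying the `last_autovacuum` timestamps to compare against):

```sql
-- Live autovacuum workers: target table, phase, progress, runtime.
SELECT a.pid,
       p.relid::regclass AS target_table,
       p.phase,
       round(100.0 * p.heap_blks_scanned
                   / nullif(p.heap_blks_total, 0), 1) AS pct_scanned,
       now() - a.xact_start AS running_for
FROM pg_stat_progress_vacuum AS p
JOIN pg_stat_activity AS a USING (pid);
```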

When autovacuum falls behind, the symptom usually isn't an autovacuum alert. It's query latency regression. Write performance degradation. The connection between autovacuum backlog and query performance is real but indirect. Connecting the two is senior engineer work. Hours, not a quick look at a graph. Sigh.

If you've tracked where your team actually spends time on database operations, a meaningful slice of it is this: watching autovacuum, tuning autovacuum, debugging incidents that turn out to be autovacuum-adjacent, and writing runbooks for autovacuum behavior on new partition types.

All of that cost is real. None of it is the kind of cost an append-only workload should be paying.

## Why append-only data shouldn't work this way

An append-only system has a simple storage contract. Data arrives. It gets written. It ages. Eventually it gets dropped, wholesale, by time window.

Nothing is ever modified in place. There is no concurrent row modification to manage. Reads never block writes and writes never block reads because there's no contention to prevent. The MVCC guarantee is irrelevant to the workload.

Postgres MVCC is the right model for workloads that need that concurrency guarantee. For workloads that don't, it's overhead. Autovacuum running continuously on your append-only table isn't Postgres failing. It's Postgres correctly maintaining the MVCC invariants on a workload that doesn't benefit from them.

The cost is real. The benefit isn't.

Tuning autovacuum makes the cost more manageable. It doesn't make the benefit appear.

## What changes when the storage model matches the workload

[Hypercore](https://www.tigerdata.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics) is built for exactly that contract. It abandons the per-tuple MVCC model for compressed column segments. Up to 1,000 row versions get batched into a single compressed segment before writing. Dead tuples in the traditional sense don't accumulate because the storage model doesn't create them.

When rows land in a Hypercore segment, transaction visibility is tracked at the segment level, not the tuple level. There is no per-tuple `t_xmin` to freeze. No hint bit to set on first read. The three mechanisms that drive autovacuum on your append-only heap table have nothing to work with here because the storage model does not create the conditions they require.

The result: autovacuum pressure drops with the work that feeds it. Autovacuum itself doesn't disappear. There's still housekeeping work. But the continuous autovacuum activity on high-insert partitions goes away. So do the I/O competition during write peaks, the tuning overhead, and the monitoring surface for vacuum lag. The conditions that produced all of it no longer exist.

`vacuum_count` stops climbing on tables that nobody updates. `pg_stat_activity` stops showing vacuum workers at 3am. The runbook section on autovacuum tuning gets shorter.

Same SQL. Same wire protocol. Different storage contract underneath, matched to the workload that's actually running.

## Conclusion

Autovacuum isn't broken. It isn't misconfigured. It's working correctly given what it's been handed.

The problem is that "working correctly" on an append-only high-insert table means running constantly, competing with writes, requiring ongoing tuning, and consuming real engineering time. All to manage overhead that a purpose-built append system wouldn't generate in the first place.

That's the tax. The settings determine how you pay it. The workload determines that you owe it.

If autovacuum is in your top processes by CPU and I/O on a table that nobody updates, that's not a Postgres problem to fix. It's a signal about the relationship between your workload and your storage model. If your write pattern is append-only, the question worth asking is not how to tune autovacuum better. It is whether your storage model is matched to your workload at all.

[Understanding Postgres Performance Limits for Analytics on Live Data](https://www.tigerdata.com/blog/postgres-optimization-treadmill) goes deeper on where that mismatch shows up across the system, and what the path forward looks like at different data volumes.