---
title: "PostgreSQL Compression: Every Option, When To Use Each, and What To Expect"
description: " Every PostgreSQL compression option compared — TOAST, LZ4, column-level, and TimescaleDB columnar via Hypercore. When to use each and what storage reduction to expect."
section: "Postgres best practices"
---

> **TimescaleDB is now Tiger Data.**

PostgreSQL ships with TOAST, which compresses individual large values automatically - and most teams stop there. For workloads storing millions of rows of time-series, IoT, or analytical data, TOAST's 20-40% storage reduction on eligible values is rarely enough, because TOAST never fires on the rows where it's needed most: narrow rows with small scalar values.

This page maps every PostgreSQL compression option, when to use each, and what real-world storage reduction to expect. It covers the full range from built-in TOAST to filesystem-level compression to TimescaleDB's columnar engine via [<u>Hypercore</u>](https://www.tigerdata.com/blog/hypercore-a-hybrid-row-storage-engine-for-real-time-analytics). Tiger Data builds [<u>TimescaleDB</u>](https://www.tigerdata.com/timescaledb), so that option gets the most depth - it's the only approach that extends PostgreSQL's compression into the columnar tier. The others are covered because most teams will reach for them first, and they're the right choice for many workloads.

By the end of this page, you should be able to choose a compression strategy for your workload, write the SQL to enable it, and know what storage reduction to expect. For data compression concepts more broadly, see [<u>What is data compression and how does it work?</u>](https://www.tigerdata.com/learn/what-is-data-compression-and-how-does-it-work).

## The four PostgreSQL compression tiers

PostgreSQL compression operates at four distinct levels. Each solves a different problem for a different scope of data.

| **Tier** | **Scope** | **Best for** | **Expected reduction** | **Extension required?** |
| --- | --- | --- | --- | --- |
| TOAST (built-in) | Individual large column values (>2 KB) | Variable-length text, JSONB, arrays | 20-40% on eligible values | No |
| Column-level COMPRESSION setting (PG14+) | Per-column, still row-stored | Tuning pglz vs LZ4 per column type | Marginal vs default TOAST | No |
| Filesystem compression (ZFS/Btrfs) | All files on the storage volume | Ops-level compression, no SQL changes | 30-60% depending on data | No (OS/infra level) |
| TimescaleDB columnar via Hypercore | Chunks of time-partitioned data | Time-series, IoT, append-heavy workloads | Up to 98%; typical 90-97% | Yes (TimescaleDB extension) |

The sections below cover each tier in order, with SQL and expected ratios. Jump to the [<u>decision framework</u>](https://docs.google.com/document/d/16V6DsbA8FwRFTz3HDSwJFcsPDUEVgag1UNnh5T8fR5I/edit#decision-framework-which-compression-option-is-right-for-your-workload) if you already know which tier fits your workload.

## Tier 1: TOAST - PostgreSQL's built-in compression

TOAST stands for The Oversized Attribute Storage Technique. When a row's stored value exceeds roughly 2 KB, PostgreSQL automatically moves it to a separate TOAST table and optionally compresses it. This is transparent to queries - you read and write the table normally and decompression happens automatically.

### TOAST storage strategies

Each column has one of four storage strategies:

- **PLAIN** - No compression, no out-of-line storage. Data always stays in the main table. Used for fixed-width types like integers and timestamps.
- **EXTENDED** - Compress first, then store out-of-line if still too large. This is the default for `text`, `bytea`, and `jsonb`.
- **EXTERNAL** - Store out-of-line without compression. Useful when you need fast access to substrings without decompression overhead.
- **MAIN** - Compress if possible, prefer inline storage. A middle ground between PLAIN and EXTENDED.

### pglz vs LZ4 for TOAST

Two algorithms are available for TOAST compression:

- **pglz** - PostgreSQL's native algorithm, the default prior to PG14. Slower to compress but built-in with no configuration needed.
- **LZ4** - Available from PG14+. Faster compression and decompression than pglz, with slightly lower compression ratio in benchmarks (pglz achieves roughly 2.23x, LZ4 roughly 2.07x for typical TOAST workloads). For most workloads on PG14+, LZ4 is the better default.

To inspect and change TOAST settings:

`-- Check TOAST storage strategy for all columns in a table
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'your_table'::regclass AND attnum > 0;

-- Change TOAST storage strategy
ALTER TABLE your_table ALTER COLUMN your_column SET STORAGE EXTENDED;

-- Set default TOAST compression to LZ4 (PG14+)
ALTER TABLE your_table SET (toast_compression = lz4);`

### The core TOAST limitation

TOAST compresses individual column values, not entire rows or the table as a whole. If you have a table with millions of narrow rows - a sensor reading schema like `(device_id INT, time TIMESTAMPTZ, value FLOAT)` - where no single value exceeds 2 KB, TOAST does nothing. Every value is too small to trigger TOAST, so all those rows sit uncompressed regardless of how many billions of them accumulate.

This is the fundamental limitation for time-series workloads. For a full breakdown of TOAST mechanics and its limits, see [<u>What Is TOAST and Why It Isn't Enough for Data Compression in Postgres</u>](https://www.tigerdata.com/blog/what-is-toast-and-why-it-isnt-enough-for-data-compression-in-postgres).

## Tier 2: Column-level COMPRESSION settings (PostgreSQL 14+)

PostgreSQL 14 introduced the ability to set a per-column compression algorithm - pglz or lz4 - at the DDL level. This controls which algorithm TOAST uses for that column when it does compress.

`-- Set LZ4 compression for a specific column at table creation
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   INT,
    payload     JSONB COMPRESSION lz4
);

-- Change compression algorithm on an existing column
ALTER TABLE metrics ALTER COLUMN payload SET COMPRESSION lz4;

-- Check compression setting per column
SELECT attname, compression
FROM pg_attribute
WHERE attrelid = 'metrics'::regclass AND attnum > 0;`

You can also set the database-wide default using the default_toast_compression GUC parameter:

`-- Set LZ4 as the default for all new toast-eligible columns
ALTER DATABASE mydb SET default_toast_compression = lz4;`

This setting controls the algorithm used within TOAST compression - it does not enable row-level or table-level compression. It's a tuning lever, not a compression strategy. For JSONB and large text columns, switching from pglz to LZ4 typically yields faster compression with a marginally lower ratio.

When to use this tier: teams running PG14+ on workloads with large variable-length columns (JSONB payloads, text blobs) who want to tune pglz vs LZ4 without adding extensions. Not useful for time-series tables with small float or integer values. For ratio and speed data, see our benchmark of [<u>pglz vs. LZ4 compression in PostgreSQL</u>](https://www.tigerdata.com/blog/optimizing-postgresql-performance-compression-pglz-vs-lz4).

## Tier 3: Filesystem compression

Filesystem compression is real, but it operates outside PostgreSQL entirely. ZFS and Btrfs support transparent compression at the storage layer. PostgreSQL data files are compressed on disk by the OS, with no SQL-level configuration needed. `ZFS compression=lz4` and `compression=zstd` are common configurations for Postgres data directories.

Expected reduction is typically 30-60% depending on data type and compressibility. This complements TOAST rather than replacing it - the two operate at different layers and stack.

This option applies to teams managing their own Postgres infrastructure on Linux with control over the storage layer. Managed cloud databases (Tiger Cloud, RDS, Cloud SQL) handle storage-level optimization at the infrastructure level - it is not user-configurable.

One practical note: Btrfs has a known regression with PostgreSQL where `fallocate()` calls cause Btrfs to skip compression on PostgreSQL data files under normal database I/O patterns; the compression that appears to work on bulk restores does not apply during regular operations. This became especially pronounced in PostgreSQL 17, which introduced `FileFallocate()`: any file touched by this call is permanently marked NOCOW by Btrfs, disabling compression regardless of mount options. 

The limitation: filesystem compression applies uniformly to all files, including indexes and WAL. It has no awareness of data structure or column types. It cannot achieve the column-aware ratios that TimescaleDB's Hypercore delivers for numeric time-series, because the filesystem sees bytes, not database columns.

## Tier 4: TimescaleDB columnar compression via Hypercore

This is where the first three tiers run out for time-series workloads, and why a fourth tier exists.

TOAST compresses column values, not rows. A sensor reading of (`timestamp, device_id, value`) has no individual value over 2 KB - TOAST never fires. Filesystem compression is blunt - it applies the same algorithm regardless of whether it's compressing a float column, a timestamp sequence, or a WAL segment. The problem requires a fundamentally different approach: storing data by column rather than by row, then applying type-aware algorithms to each column.

That's what TimescaleDB's Hypercore does. For background on why columnar storage enables better compression, see [<u>columnar databases vs. row-oriented databases</u>](https://www.tigerdata.com/learn/columnar-databases-vs-row-oriented-databases-which-to-choose).

### How Hypercore columnar compression works

TimescaleDB automatically partitions time-series data into fixed-time chunks via hypertables. When a chunk ages past a configurable threshold, Hypercore converts it from row-oriented storage to columnar format, then applies type-aware compression algorithms per column:

- **XOR-based compression (Gorilla-style)** - For floating-point values (sensor readings, metrics). XOR-encodes consecutive float values; when values change gradually between timestamps, the XOR result contains many leading zeros that can be stripped. This achieves very high compression for slowly-varying float series - the dominant data type in IoT and monitoring workloads.
- **Delta-of-delta encoding** - For timestamps and integer-like types. Stores the second derivative of the data: the difference of differences. For regular-interval time-series (one reading every 5 seconds), the delta-of-delta reduces to a series of zeroes. This compresses an 8-byte timestamp down to a single bit in the ideal case - 64x compression on the time column alone.
- **Simple-8b** - For low-cardinality integer sequences (device IDs, status codes). Packs multiple values into 64-bit integers.
- **Dictionary compression** - For all other types, and for columns with high repeated-value counts.

These algorithms are not available in vanilla PostgreSQL - they are part of the TimescaleDB extension's Hypercore storage engine. The [<u>compression methods in hypercore docs</u>](https://www.tigerdata.com/docs/learn/columnar-storage/compression-methods) cover each algorithm in depth.

### Enabling Hypercore compression

The current API uses `timescaledb.enable_columnstore` to convert chunks to the columnar format and `add_columnstore_policy()` to automate it:

`-- Step 1: Create a hypertable (time-based partitioning)
CREATE TABLE metrics (
    time        TIMESTAMPTZ NOT NULL,
    device_id   INT,
    value       FLOAT
) WITH (tsdb.hypertable);

-- Step 2: Enable the columnstore on an existing hypertable
ALTER TABLE metrics SET (
    timescaledb.enable_columnstore,
    timescaledb.segmentby = 'device_id',
    timescaledb.orderby = 'time DESC'
);

-- Step 3: Add an automated columnstore policy (convert chunks older than 7 days)
CALL add_columnstore_policy('metrics', after => INTERVAL '7 days');

-- Manually convert a specific chunk to the columnstore
SELECT convert_to_columnstore(c)
FROM show_chunks('metrics', older_than => INTERVAL '7 days') c;`

The `segmentby` column controls how rows are grouped within a chunk for compression. Grouping by `device_id` means all readings from a single device are stored together - improving both compression ratios and query performance for per-device queries.

### Expected compression ratios

| **Data type** | **TOAST (pglz)** | **TOAST (LZ4)** | **TimescaleDB columnar** |
| --- | --- | --- | --- |
| Float time-series (sensor data) | 0-5% (no compression if < 2 KB per value) | 0-5% | 90-97% storage reduction |
| Integer sequences | 0-5% | 0-5% | 85-95% storage reduction |
| JSONB payloads | 40-60% | 35-55% | 70-90% (varies by key cardinality) |
| Text / log data | 50-70% | 45-65% | 60-80% |

The real-world numbers match the theory. Ndustrial achieved [<u>97% storage reduction on industrial energy data</u>](https://www.tigerdata.com/learn/how-ndustrial-is-providing-fast-real-time-queries-and-safely-storing-client-data-with-97-compression) using TimescaleDB compression. Cloudflare uses TimescaleDB's columnar engine for analytics at scale - see [<u>How TimescaleDB Helped Cloudflare Scale Analytics and Reporting</u>](https://www.tigerdata.com/blog/how-timescaledb-helped-cloudflare-scale-analytics-and-reporting). Across Tiger Data's case studies, 90-97% reduction is typical for regular-interval float time-series. Current docs indicate chunks can be compressed by up to 98%.

For the engineering backstory on how this was built, see [<u>Building Columnar Compression in a Row-Oriented Database</u>](https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database).

### Direct Compress: write-path compression (tech preview)

Traditionally, compression ran as a background job that converted older chunks to the columnstore asynchronously. Direct Compress changes this: data is compressed in memory during insertion and written directly to the columnstore, eliminating the write-amplification of compressing already-written data.

`-- For COPY operations
SET timescaledb.enable_direct_compress_copy = on;`

The result is significantly reduced I/O footprint at ingest time. This removes the operational trade-off between ingestion speed and compression - previously, teams had to decide between writing data fast (uncompressed) and managing a background compression lag. 

Note that this feature is a tech preview and is not yet production-ready; it works best for batch ingestion of 1,000 or more rows per `segmentby` value, and does not work on tables with unique constraints, triggers, or continuous aggregates. For the announcement details, see [<u>Introducing Direct Compress</u>](https://www.tigerdata.com/blog/introducing-direct-compress-up-to-40x-faster-leaner-data-ingestion-for-developers-tech-preview).

### Querying compressed data: indexes and DML

Two objections come up consistently when teams evaluate columnar compression. Both have been resolved.

**"I can't query compressed chunks efficiently."** Hypercore supports B-tree and hash indexes on compressed data. Lookup queries on compressed data are up to 1,185x faster than unindexed scans. Compressed data is fully indexable. See [<u>PostgreSQL Indexes for Columnstore: 1,185x Faster Lookup Queries</u>](https://www.tigerdata.com/blog/postgresql-indexes-for-columnstore).

**"I can't UPDATE or DELETE compressed rows."** Compression tuple filtering resolved this. UPDATEs and DELETEs on compressed chunks are supported without decompressing the entire chunk. Only batches matching the query filter are decompressed, delivering up to 500x faster updates and deletes and up to 10x faster upserts. See [<u>Bridging the Gap: Introducing Compression Tuple Filtering</u>](https://www.tigerdata.com/blog/bridging-the-gap-between-compressed-and-uncompressed-data-in-postgres).

## Decision framework: which compression option is right for your workload?

### Choose TOAST (with LZ4) if:

- You are on PostgreSQL 14+ and have not yet set `default_toast_compression = lz4` - this is a free, zero-risk upgrade
- Your tables include large variable-length columns: `JSONB, text, bytea, xml`
- You want zero additional dependencies - TOAST is built in and always on
- Your workload is mixed OLTP (reads and writes with large values, not narrow time-series rows)

### Choose column-level COMPRESSION settings if:

- You are on PG14+ and want to tune the algorithm per column without changing storage strategy
- You have specific columns where LZ4 is the right trade-off (faster, slightly lower ratio) vs. pglz (slower, slightly higher ratio)
- You are not adding extensions and need to stay within the vanilla PostgreSQL feature set

### Choose filesystem compression if:

- You control the storage layer (self-managed Linux deployment with ZFS or Btrfs)
- You want database-agnostic compression that applies to all files, not just data that triggers TOAST
- Your primary goal is reducing infrastructure cost without any application or schema changes
- You understand the limitation: no data-type awareness, applies uniformly to all files including indexes and WAL

### Choose TimescaleDB columnar compression (Hypercore) if:

- Your data is time-series: sensor readings, metrics, events, logs - anything with a timestamp as the primary partitioning key
- You need 90%+ storage reduction and TOAST gives you 0-5% on your narrow rows
- You want to keep full SQL semantics and stay in PostgreSQL - not migrate to a specialized columnar database
- You need to run analytical queries on compressed data without decompressing to a separate system
- You are on Tiger Cloud or running the TimescaleDB extension on self-managed Postgres

### When columnar compression is not the right choice:

- Pure OLTP workloads with many point-row UPDATEs and DELETEs (though compression tuple filtering has substantially reduced this constraint)
- Tables where rows are not time-ordered and do not benefit from time-partitioned chunks
- Teams who cannot or will not add a PostgreSQL extension to their deployment

For implementation details, the [<u>TimescaleDB compression documentation</u>](https://www.tigerdata.com/docs/use-timescale/latest/compression/about-compression) covers setup, segmentby/orderby optimization, and policy configuration. 