---
title: "Compression Faceoff: Postgres TOAST vs Timescale Compression"
published: 2023-11-29T10:39:12.000-05:00
updated: 2025-12-09T13:53:07.000-05:00
excerpt: "PostgreSQL TOAST vs. Timescale compression: read the comparison and learn why TOAST is not the most effective database compression mechanism."
tags: PostgreSQL, Blog, #carlota, Columnstore, Benchmarks & Comparisons
authors: Muhammad Ali Iqbal
---

> **TimescaleDB is now Tiger Data.**

[Database compression is a critical technique in the world of data management](https://www.timescale.com/learn/what-is-data-compression-and-how-does-it-work)—and an infamous gap in the PostgreSQL toolkit. Although TOAST (The Oversized-Attribute Storage Technique) plays a pivotal role in PostgreSQL when it comes to efficiently storing large data values, [it is neither functional nor efficient enough to serve as a modern database compression mechanism](https://timescale.ghost.io/blog/what-is-toast-and-why-it-isnt-enough-for-data-compression-in-postgres/).

But we’ll let the numbers prove our point! In this blog post, we show you how [Timescale’s columnar compression](https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/) can help you reduce your PostgreSQL database size (in a way that TOAST could never do!) by running a comparative analysis between both compression techniques.

You can then learn how the Timescale compression mechanism works and get advice on implementing it: Timescale is fully built on PostgreSQL, and you can [try it for free](https://console.cloud.timescale.com/signup) to experiment with compression in your PostgreSQL databases. 

## [PostgreSQL TOAST](https://www.tigerdata.com/blog/what-is-toast-and-why-it-isnt-enough-for-data-compression-in-postgres): A Quick Recap 

In essence, TOAST is PostgreSQL's built-in method for managing large data values that don't fit within its fixed page size (8 kB). It compresses these oversized values and, if they're still too large, moves them to a separate TOAST table, leaving pointers in the original table​​.
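For reference, a column's TOAST strategy and the cluster-wide TOAST compression algorithm can be inspected and changed with standard SQL (the table and column names here are illustrative):

```SQL
-- Inspect each column's TOAST strategy
-- (p = plain, e = external, m = main, x = extended, the default)
SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'my_table'::regclass AND attnum > 0;

-- Force one column to stay inline and compressed
ALTER TABLE my_table ALTER COLUMN payload SET STORAGE MAIN;

-- Switch the TOAST compression algorithm (PostgreSQL 14+)
SET default_toast_compression = 'lz4';
```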

💡

We covered both TOAST and Timescale compression extensively in these blog posts—give them a read if you haven’t already:

- [What Is TOAST (and Why It Isn’t Enough for Data Compression in Postgres)](https://www.timescale.com/blog/what-is-toast-and-why-it-isnt-enough-for-data-compression-in-postgres/)
- [Building Columnar Compression for Large PostgreSQL Databases](https://www.timescale.com/blog/building-columnar-compression-in-a-row-oriented-database/)

However, TOAST is not a comprehensive solution for data compression in PostgreSQL. Its primary role is to manage large values within a Postgres page, not to optimize overall storage space. TOAST lacks user-friendly mechanisms for dictating compression policies, and accessing TOASTed data can add overhead, especially when stored out of line​​.

To give PostgreSQL a more modern compression mechanism, Timescale introduced columnar compression through the TimescaleDB extension. By shifting from PostgreSQL’s row-based storage to a columnar approach, this method can reduce the size of your large PostgreSQL databases by up to 10x, with compression rates of over 90 %. Columnar compression not only reduces storage needs but also improves query performance, making PostgreSQL databases far more scalable. 

## TOAST vs. Timescale Compression: Comparing Compression Rates 

Before getting into what Timescale compression is all about (and how to configure it if you decide to test it), let’s jump straight into the purpose of this blog post—to see how Timescale compression compares to TOAST in terms of compression ratios. We want to know which compression technique is more effective at reducing PostgreSQL storage. 

To run this comparison, we used the table structures and other helper functions [found in this repo](https://github.com/stormatics/Timescale-PostgreSQL-Toast-Compression). To compare the effects of different TOAST strategies and Timescale compression, we looked at the size of the following tables: 

-   `hit_uncompressed`: A regular PostgreSQL heap table with TOAST strategy set to `external`, no compression applied.
-   `hit_compressed_pglz`: Regular table with TOAST set to `main`, using `PGLZ` for `default_toast_compression`.
-   `hit_compressed_lz4`: Similar to above, but using `LZ4` for `default_toast_compression`.
-   `hit_hyper_compressed`: A compressed [hypertable](https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered) (with Timescale compression) with 22 compressed chunks.
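The exact DDL lives in the linked repo, but the TOAST-related setup for the first three tables boils down to something like this (the column name is illustrative):

```SQL
-- No TOAST compression: store oversized values out of line, uncompressed
ALTER TABLE hit_uncompressed ALTER COLUMN title SET STORAGE EXTERNAL;

-- PGLZ-compressed, kept inline when possible
SET default_toast_compression = 'pglz';
ALTER TABLE hit_compressed_pglz ALTER COLUMN title SET STORAGE MAIN;

-- LZ4-compressed, kept inline when possible
SET default_toast_compression = 'lz4';
ALTER TABLE hit_compressed_lz4 ALTER COLUMN title SET STORAGE MAIN;
```

Note that `default_toast_compression` applies to values stored after it is set, so it has to be configured before loading the data.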

Initially, we loaded a 70 GB dataset with around 100 million rows. These are the sizes of each table and the compression ratio in each case:

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2023/11/Compression-Faceoff-Postgres-TOAST-vs-Timescale-Compression-table-1.png)

_**Note**: We are not including index sizes here, as most of the indexes applied to the hypertable are disregarded or eliminated. Timescale creates internal indexes on the `compress_segmentby` and `compress_orderby` columns when applying compression—more on that later._

In this case, it's evident that TOAST compression offers only marginal advantages in terms of reducing the table size. As we argued in the blog post linked above, TOAST is effective at compressing larger data values but has no mechanism for reducing table sizes independently of the size of the individual rows. In contrast, Timescale compression yields a 72 % reduction in storage requirements for this particular table.  
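If you want to reproduce these measurements, the relevant sizes can be pulled with a couple of queries (table names as above):

```SQL
-- Total size of a regular table, TOAST table included
SELECT pg_size_pretty(pg_total_relation_size('hit_compressed_lz4'));

-- Before/after sizes of a compressed hypertable
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('hit_hyper_compressed');
```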

To better illustrate the nature of TOAST, we ran another test, this time introducing a new `payload` column of type `JSONB`, populated with randomly generated data ranging in size from 1 byte to 3 kilobytes. This is how the results varied: 

![](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2023/11/Compression-Faceoff-Postgres-TOAST-vs-Timescale-Compression-table-2-1.png)

As you can see, TOAST results improve when the JSONB column is introduced. JSONB columns often contain more complex and larger data structures than plain text or numeric data—therefore, TOAST is more effective at compressing it. 

Still, Timescale compression performs much better. Timescale's columnar compression is particularly adept at handling complex data types like JSONB; this format often contains repetitive and patterned data, which especially fits Timescale’s columnar storage. Timescale also applies different algorithms optimized for various data types, including LZ4 for JSONB values. With further optimization, Timescale’s compression rate could even improve—[many Timescale users see compression rates over 90 %](https://timescale.ghost.io/blog/how-ndustrial-is-providing-fast-real-time-queries-and-safely-storing-client-data-with-97-compression/)! 
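For reference, a `payload` column like the one used in this test can be populated with something along these lines (a sketch; the actual generator may differ):

```SQL
-- Hypothetical generator for a JSONB payload of up to ~3 kB per row
ALTER TABLE hit_uncompressed ADD COLUMN payload JSONB;

UPDATE hit_uncompressed
SET payload = jsonb_build_object(
    -- repeat a 32-character md5 hash 1..90 times (~32 B to ~2.9 kB)
    'data', repeat(md5(random()::text), 1 + floor(random() * 90)::int)
);
```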

## Implementing Timescale Compression

Let’s walk you through the mechanics of implementing Timescale compression, briefly covering how it works architecturally and how you configure it as a user—so you can test it out more effectively afterward. If you’re already up to date with Timescale compression, feel free to skip this section. 

### Timescale compression architecture 

In Timescale, compression operates at the level of individual [chunks (partitions) in a hypertable](https://www.timescale.com/learn/is-postgres-partitioning-really-that-hard-introducing-hypertables). Each compressed chunk utilizes a hybrid row-columnar format, where 1,000 rows of uncompressed data are grouped together and transformed into column segments.  

The primary objective here is to consolidate this data into a single row, thereby eliminating the need for numerous rows to accommodate the same information. This design also allows the Timescale engine to apply various [compression algorithms](https://www.tigerdata.com/blog/time-series-compression-algorithms-explained) depending on the data type, such as Gorilla compression for floats and delta-of-delta with Simple-8b and run-length encoding for timestamps and integer-like types.

![Timescale's columnar compression consolidates many tuples into a single row, applying compression algorithms specific to each data type for maximum compression ratios ](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2023/11/Postgres-TOAST-vs-Timescale-Compression_compressed-chunks-1.png)

__Timescale's columnar compression consolidates many tuples into a single row, applying compression algorithms specific to each data type for maximum compression ratios__

Let's delve further into the mechanics of how this process operates: 

-   As we previously mentioned, Timescale's compression operates on individual chunks in a hypertable. The chunks, once identified by a user-defined [compression policy](https://docs.timescale.com/api/latest/compression/add_compression_policy/) based on their age, undergo a transformation process.
-   When this transformation process begins, it consolidates multiple records within each chunk into a single row with an array-like structure. This leads to a substantial reduction in storage overhead, as we’ll see later. For instance, a chunk containing 10,000 records can be reduced to 10 rows, each housing 1,000 arrayed values per column.
-   [Following the consolidation of records, TimescaleDB employs compression algorithms tailored to each column based on their data types.](https://timescale.ghost.io/blog/time-series-compression-algorithms-explained/)
-   [Utilizing properties such as `timescaledb.compress_orderby` and `timescaledb.compress_segmentby` can further optimize the compression process.](https://timescale.ghost.io/blog/time-series-compression-algorithms-explained/)
-   If the `timescaledb.compress_orderby` property is used, items inside the array are ordered based on the specified columns before compression. This operation functions similarly to the SQL `ORDER BY` clause. By default, `compress_orderby` is set to the hypertable's time column in `DESC` order.
-   If the `timescaledb.compress_segmentby` property is set, the specified columns are used to group the compressed data rows. Apply it to columns that appear in your queries' `WHERE` clauses. It's worth noting that `compress_segmentby` columns remain in an uncompressed state (which we will see in the upcoming example). By default, `compress_segmentby` is set to null.
-   After Timescale compression, if the size of a compressed row exceeds `TOAST_TUPLE_THRESHOLD` (2 kB), that row is relocated to secondary storage (the TOAST table), and a pointer is established within the chunk to reference this data.
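Once compression is configured, the resulting settings and per-chunk status can be inspected through TimescaleDB's informational views and functions:

```SQL
-- Which columns are used for segmentby/orderby
SELECT * FROM timescaledb_information.compression_settings
WHERE hypertable_name = 'example_hypertable';

-- Per-chunk compression status
SELECT chunk_name, compression_status
FROM chunk_compression_stats('example_hypertable');
```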

### Enabling Timescale compression 

To enable compression in a particular hypertable, you would [define a compression policy](https://docs.timescale.com/api/latest/compression/add_compression_policy/#required-arguments) that automatically compresses chunks (partitions) once they reach a certain age. For example, the compression policy below automatically compresses all chunks older than seven (7) days: 

```SQL
-- Enable compression and configure how rows are ordered and grouped
ALTER TABLE test SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'time_column DESC',
    timescaledb.compress_segmentby = 'segment_column'
);

-- Set the compression policy: compress chunks older than 7 days
SELECT add_compression_policy('test', INTERVAL '7 days');
```

  
If you wish, you could also [manually compress individual chunks](https://docs.timescale.com/use-timescale/latest/compression/manual-compression/), for example, if you want to run some tests to experiment with compression:   

```SQL
SELECT compress_chunk('<chunk_name>');
```
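To find which chunks are candidates in the first place, you can combine this with `show_chunks()` (the chunk name below is illustrative):

```SQL
-- List chunks older than 7 days
SELECT show_chunks('test', older_than => INTERVAL '7 days');

-- Compress one of them by hand...
SELECT compress_chunk('_timescaledb_internal._hyper_1_2_chunk');

-- ...and decompress it again if needed
SELECT decompress_chunk('_timescaledb_internal._hyper_1_2_chunk');
```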

### Understanding the role of `orderby` and `segmentby` in Timescale compression 

The process of compression changes how indexes are used on hypertables: since we’re switching to [columnar storage](https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database), most traditional indexes applied to the hypertable are no longer utilized after compression. To still allow for efficient querying and ingestion, Timescale leverages specialized indexes that are tailored to work with the compression setup via the `compress_orderby` and `compress_segmentby` parameters we mentioned earlier. 

`compress_orderby` is a parameter that dictates the ordering of data within each compressed chunk. It functions similarly to the SQL `ORDER BY` clause, helping to organize data in a way that optimizes compression efficiency and query performance.

`compress_segmentby` is used for grouping data within the compressed chunks. It's especially useful for queries that frequently access certain columns, as it ensures that these columns are stored together, enhancing the efficiency of data retrieval.

Let’s look at a practical example to understand how `compress_orderby` and `compress_segmentby` influence compression. Consider this example hypertable: 

| device_id | cpu | disk_io | timestamp |
|---|---|---|---|
| 1 | 0.10 | 2 | 2022-10-21 00:10:01 |
| 2 | 28 | 12 | 2022-10-21 00:10:01 |
| 2 | 25 | 12 | 2022-10-21 00:10:02 |
| 3 | 91 | 0.1 | 2022-10-21 00:10:02 |
| 1 | 0.11 | 3 | 2022-10-21 00:10:02 |
| 2 | 22 | 14 | 2022-10-21 00:10:03 |
| 3 | 98 | 0.1 | 2022-10-21 00:10:03 |

Now, **let’s compress the previous hypertable with `compress_segmentby = null` and `compress_orderby = timestamp desc`**:  

```SQL
ALTER TABLE example_hypertable SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'timestamp DESC'
);

SELECT add_compression_policy('example_hypertable', INTERVAL '7 days');
```

By applying `ORDER BY` according to `timestamp desc`, we’re sorting the records from the most recent to the oldest based on the timestamp column—something like this: 

| device_id | cpu | disk_io | timestamp |
|---|---|---|---|
| 3 | 98 | 0.1 | 2022-10-21 00:10:03 |
| 2 | 22 | 14 | 2022-10-21 00:10:03 |
| 2 | 25 | 12 | 2022-10-21 00:10:02 |
| 3 | 91 | 0.1 | 2022-10-21 00:10:02 |
| 1 | 0.11 | 3 | 2022-10-21 00:10:02 |
| 2 | 28 | 12 | 2022-10-21 00:10:01 |
| 1 | 0.10 | 2 | 2022-10-21 00:10:01 |

  
Then, the compression engine would merge the rows into an array format, giving the compressed chunk its columnar look: 

| device_id | cpu | disk_io | timestamp |
|---|---|---|---|
| [3,2,2,3,1,2,1] | [98,22,25,91,0.11,28,0.10] | [0.1,14,12,0.1,3,12,2] | [2022-10-21 00:10:03,2022-10-21 00:10:03,2022-10-21 00:10:02,2022-10-21 00:10:02,2022-10-21 00:10:02,2022-10-21 00:10:01,2022-10-21 00:10:01] |

When the `ORDER BY timestamp desc` is applied without setting the `compress_segmentby` parameter, the resulting arrays for each column contain values that appear random and are not grouped by any specific category or attribute—this won’t result in very efficient compression ratios. 

The `compress_segmentby` parameter can be utilized to enhance the efficiency of Timescale compression. Setting this parameter would allow the compression engine to group rows based on certain columns, leading to arrays with more similar values, which is more conducive to efficient compression.

For instance, segmenting by `device_id` would group similar device records together, potentially offering better compression outcomes. Let’s see how this would look by **compressing the hypertable with `compress_segmentby = device_id` and `compress_orderby = timestamp desc`:** 

```SQL
ALTER TABLE example_hypertable SET (
    timescaledb.compress,
    timescaledb.compress_orderby = 'timestamp DESC',
    timescaledb.compress_segmentby = 'device_id'
);

SELECT add_compression_policy('example_hypertable', INTERVAL '7 days');
```

This configuration will first segment the data by `device_id` and then order it within each segment by timestamp in descending order. This means that within each compressed chunk, records are grouped according to their `device_id` and ordered by time, leading to potentially more efficient compression as similar device records are clustered together.

Our compressed chunk would now look like this: 

| device_id | cpu | disk_io | timestamp |
|---|---|---|---|
| 3 | [98,91] | [0.1,0.1] | [2022-10-21 00:10:03,2022-10-21 00:10:02] |
| 2 | [22,25,28] | [14,12,12] | [2022-10-21 00:10:03,2022-10-21 00:10:02,2022-10-21 00:10:01] |
| 1 | [0.11,0.10] | [3,2] | [2022-10-21 00:10:02,2022-10-21 00:10:01] |

  
This reorganization results in each `device_id` having its own set of arrays for `cpu`, `disk_io`, and `timestamp`. Such clustering makes the data more homogeneous within each segment, which is beneficial for compression algorithms. Queries that filter on `device_id` become more efficient thanks to this logical grouping, and the relative homogeneity of values within each segment leads to a better compression ratio, as similar values are more effectively compressed when stored adjacently.
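As a hypothetical example of a query that benefits from this layout, filtering on the `segmentby` column lets the engine discard entire compressed segments before any decompression happens:

```SQL
-- device_id is the segmentby column, so non-matching segments
-- are filtered out without being decompressed
SELECT avg(cpu), max(disk_io)
FROM example_hypertable
WHERE device_id = 2
  AND "timestamp" > now() - INTERVAL '1 day';
```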

## Best Practices for Timescale Compression 

To optimize storage reduction with Timescale compression, consider these best practices:

-   [Ensure each hypertable chunk has a significant number of records](https://www.timescale.com/learn/determining-optimal-postgres-partition-size), and adjust the `chunk_time_interval` settings accordingly.
-   Set up a compression policy that aligns with your query patterns. Uncompressed chunks are better for frequent updates and shallow-and-wide queries on recent data, while compressed chunks are more efficient for aggregates and analytical queries. 
-   Use `compress_segmentby` columns in `WHERE` clauses for more efficient queries, allowing decompression to occur after filtering.
-   When selecting `compress_segmentby` fields, avoid primary key columns, as they represent unique values and are less amenable to compression.
-   Avoid too many `compress_segmentby` columns: this can reduce the number of items per compressed column, diminishing compression effectiveness. Aim for at least 1,000 rows per segment in each chunk.
-   Leverage data similarity—adjacent rows with similar values or trends compress better. Use `compress_orderby` to organize data effectively.
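A couple of these practices map directly to SQL. The chunk interval can be adjusted for future chunks, and a quick aggregate gives a rough idea of whether each `segmentby` value will reach the ~1,000-rows-per-segment mark (a sanity check across the whole hypertable, not an exact per-chunk count):

```SQL
-- Change the partition interval for chunks created from now on
SELECT set_chunk_time_interval('example_hypertable', INTERVAL '1 day');

-- Rough row count per prospective segmentby value
SELECT device_id, count(*)
FROM example_hypertable
GROUP BY device_id
ORDER BY count(*) DESC;
```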

## Conclusion 

If you’re struggling with the size of your PostgreSQL database, Timescale—with its effective columnar compression engine—is the right tool for you. 

We have the numbers to back it up: Timescale’s chunk-by-chunk (partition-by-partition) compression in a hybrid row-columnar format, together with its clever use of type-specific compression algorithms, consistently surpassed TOAST in our comparison, reinforcing our argument: while effective at storing large data values in PostgreSQL, TOAST cannot meet the compression demands of modern applications.

Now that you’ve seen the figures and learned more about how Timescale’s compression works and how to enable it, go test it out! You can [create a Timescale account for free](http://console.cloud.timescale.com/signup) and start experimenting with compressing your PostgreSQL databases today.