Jun 02, 2025
We're excited to announce TimescaleDB 2.20, a release that further solidifies TimescaleDB as the fastest PostgreSQL for time-series and analytical workloads. This version delivers remarkable performance enhancements, especially for demanding analytical queries and crucial real-world data management tasks like updates and upserts, without asking developers to sacrifice the flexibility, full SQL capabilities, or rich ecosystem of PostgreSQL.
Beyond significantly boosting query performance, TimescaleDB 2.20 streamlines the developer experience, particularly when managing data efficiently and working with continuous aggregates. Get ready for impressive speed-ups and powerful enhancements that help you move faster and scale smarter.
Below, we bring you all the details on this edition’s new features and developer experience enhancements. But if you want the TL;DR, here it is.
TimescaleDB 2.20 makes the fastest PostgreSQL for real-time analytics on time-series data even faster, delivering speed without the sacrifices found in specialized OLAP databases.
With the TL;DR out of the way, we can deep dive into everything 2.20 has to offer!
Inserting or updating data into compressed columnstore chunks has been a pain point for some users, often requiring creative workarounds to manage throughput. We've heard you, and TimescaleDB 2.20 brings a massive improvement to this experience!
For UPSERT and ON CONFLICT operations on heavily constrained tables, performance is now up to 10 times faster.
Consider a worst-case scenario backfill of 3,000 records where every record conflicts with an existing unique constraint:
INSERT INTO uk_ts_comp_table
SELECT * FROM insert_sample
ON CONFLICT (uk, ts) DO NOTHING;
TimescaleDB 2.19.3: ~17 seconds
TimescaleDB 2.20.0: ~1.7 seconds (a 10x improvement!)
This is significantly closer to the performance of inserting into uncompressed rowstore chunks (which took ~300ms in this test). If no unique constraints are present in a chunk, the backfill performance between rowstore and columnstore chunks is now nearly equivalent.
We're not stopping here! We have more improvements lined up for backfilling, including leveraging bloom filters, to further enhance performance in upcoming releases. Our goal is to make backfilling into columnstore only marginally slower than into uncompressed tables, even for the most complex scenarios.
TimescaleDB's columnstore on PostgreSQL already provides powerful analytical capabilities. Version 2.20 pushes these boundaries, offering speeds that challenge dedicated OLAP systems while retaining the full power and flexibility of PostgreSQL.
Traditional OLAP databases excel at scanning large datasets but often falter with point lookups or finding rare values (like a specific UUID or transaction ID). This is where TimescaleDB 2.20 further differentiates itself by enhancing PostgreSQL's capabilities. We introduce bloom filters for columnstore batches, addressing a common pain point and delivering "speed without sacrifice"—you don't lose point query performance for analytical speed.
A bloom filter is a space-efficient, probabilistic data structure that quickly determines if an element might be in a set or definitely is not.
How bloom filters help bridge the gap:
For columns that aren't segmentby columns, we now build a bloom filter for each batch. When you query for a specific value (e.g., WHERE subject = 'unsubscribe'), TimescaleDB first checks the bloom filter:
If the value is definitely not in the batch, the batch is skipped without decompression.
If it might be there, we then decompress and check.
This drastically reduces data processing for such lookups, making your PostgreSQL-based time-series database performant for mixed workloads.
Performance impact:
SELECT min(sent), max(sent), count(*) FROM hackers WHERE subject = 'unsubscribe';
can execute 3.5 times faster (e.g., from 12 ms down to 2.7 ms). Point lookups on high-cardinality columns benefit as well, for example:
SELECT * FROM token_transfers WHERE from_address = '0xa148e876bbd30e1937be268a2f4387b14eff5cb0' ORDER BY block_timestamp DESC LIMIT 10;
Storage and availability:
Each bloom filter adds a small amount of storage per batch (up to 1 kB, but as little as 9 bytes, depending on cardinality). This feature is enabled by default for all new chunks created in TimescaleDB 2.20. Existing chunks require recompression to gain bloom filters.
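Since existing chunks only pick up bloom filters after recompression, one way to trigger that is a chunk-by-chunk decompress/recompress pass. The sketch below uses the standard chunk compression API; the hypertable name hits is illustrative:

```sql
-- Decompress every compressed chunk of the hypertable...
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('hits') AS c;

-- ...then recompress, so the rewritten batches include bloom filter metadata.
SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('hits') AS c;
```

Run this during a maintenance window if the table is large, since decompression temporarily increases storage use.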
Building on its initial availability for rowstore since version 2.2.1, a standout feature in TimescaleDB 2.20 is the major advancement and optimization of SkipScan for columnstore hypertables. This optimization dramatically speeds up queries that retrieve distinct values from segmentby columns, a common operation that can be slow in many databases.
What is SkipScan?
SkipScan is an intelligent query execution technique that significantly accelerates finding distinct values from an ordered index. Instead of scanning every entry, SkipScan efficiently jumps from one distinct item to the next, bypassing duplicates. SkipScan was first supported for regular PostgreSQL tables and rowstore hypertables in version 2.2.1. In TimescaleDB 2.20, we've massively enhanced this capability for columnstore, unlocking its full potential and performance.
How SkipScan accelerates your columnstore queries
Columnstore in TimescaleDB organizes data into compressed batches, with an internal B-tree index on segmentby and orderby min/max metadata columns. SkipScan leverages this structure. If you need to find distinct devices or get the latest reading for each, SkipScan shines, delivering PostgreSQL performance at speeds that rival specialized systems.
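As a concrete sketch of the layout SkipScan exploits (assuming the metrics table used in the scenarios below), you could configure device as the segmentby column and time as the orderby column using the long-standing compression options:

```sql
-- device becomes the segmentby column, time the orderby column:
-- the internal index over this layout is what SkipScan jumps across.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device',
    timescaledb.compress_orderby   = 'time DESC'
);
```

With this layout, each compressed batch holds rows for a single device, ordered by time, so "next distinct device" and "latest reading per device" become cheap index jumps.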
Significant performance gains:
The benchmarks speak for themselves:
SELECT DISTINCT ON (device) device FROM ht_metrics_compressed;
2500x faster: From ~2.2 seconds down to a mere 0.85 milliseconds!
WITH temp as (SELECT DISTINCT ON (device) * FROM ht_metrics_compressed) SELECT avg(value) FROM temp;
2000x faster: From ~3.3 seconds down to 1.65 milliseconds.
SELECT count(DISTINCT device) FROM ht_metrics_compressed;
2000x faster: From ~2.42 seconds down to 1.16 milliseconds.
The technique applies to finding distinct values of segmentby columns in a columnstore.
Scenario 1: Show latest readings on each device
Imagine you want the most recent metric reading for every device:
SELECT DISTINCT ON (device) device, time, value
FROM metrics
ORDER BY device, time DESC;
With SkipScan, this avoids scanning and decompressing unnecessary data. For a table with 10 distinct devices and millions of rows spread across thousands of batches, SkipScan might only need to perform a handful of targeted reads and decompressions, leading to execution times like 4.589 ms instead of 3912.069 ms (a ~850x improvement in this specific example).
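To confirm SkipScan kicks in for a query like this, you can inspect the plan. The exact node name may vary by version, but a dedicated SkipScan node should appear:

```sql
EXPLAIN (COSTS OFF)
SELECT DISTINCT ON (device) device, time, value
FROM metrics
ORDER BY device, time DESC;
-- Look for a "Custom Scan (SkipScan)" node in the output.
```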
Scenario 2: Show latest readings for devices updated in the last hour
What if you add a time filter?
SELECT DISTINCT ON (device) device, time, value
FROM metrics
WHERE time > (now() - INTERVAL '1 hour')
ORDER BY device, time DESC;
TimescaleDB's columnstore already uses min/max metadata per batch to prune batches that don't overlap with the time filter. SkipScan builds on this. Even if many batches are filtered out by the time condition, SkipScan still provides an advantage by efficiently finding the first relevant record for each distinct device within the remaining candidate batches.
In one test, this query ran in 2.537 ms with SkipScan, compared to 9.236 ms without it. While the relative improvement is smaller because the WHERE clause is already highly selective, SkipScan still ensures optimal performance.
Scenario 3: Show latest readings where measurement exceeds a threshold
Consider a filter on a non-indexed, non-segmentby column:
SELECT DISTINCT ON (device) *
FROM metrics
WHERE value > 50
ORDER BY device, time DESC;
Here, batch-level filtering on value isn't possible. Without SkipScan, the system would have to decompress and scan many rows for each device to find those with value > 50 before picking the latest.
With SkipScan, the scan can stop for each device as soon as it finds the latest row with value > 50. This "early exit" per device offers huge savings: one such query saw execution time drop from 2145.536 ms to 45.392 ms (a ~47x improvement) thanks to SkipScan.
SkipScan, now highly optimized for columnstore in addition to its existing rowstore capabilities, is a game-changer for DISTINCT queries on your primary segmentby column, especially across hybrid tables where it can efficiently process both data formats. It makes your analytical dashboards and reports snappier than ever.
We're continuing to optimize storage and processing for all data types. In TimescaleDB 2.20, we've finalized improvements for boolean columns in our columnstore.
Performance gains:
These improvements show up in benchmarks:
ClickBench Query 6 (URL, PageViews):
SELECT URL, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = false AND IsRefresh = false AND URL <> '' GROUP BY URL ORDER BY PageViews DESC LIMIT 10;
ClickBench Query 7 (Title, PageViews):
SELECT Title, COUNT(*) AS PageViews FROM hits WHERE CounterID = 62 AND EventDate >= '2013-07-01' AND EventDate <= '2013-07-31' AND DontCountHits = false AND IsRefresh = false AND Title <> '' GROUP BY Title ORDER BY PageViews DESC LIMIT 10;
tsbench (uk_price_paid dataset):
SELECT town, sum(price) FROM uk_price_paid WHERE is_new = true AND duration > 1 GROUP BY 1;
These enhancements mean more efficient storage and faster queries when dealing with boolean flags and conditions in your time-series data.
Getting started with TimescaleDB just got dramatically easier. In earlier versions, creating a hypertable with columnstore and compression often required four separate SQL statements and prior knowledge of how to configure each option. This created a steep learning curve for new users and extra friction for experienced developers looking to spin up new time-series tables quickly.
In TimescaleDB 2.20, we’re collapsing those steps into a single, streamlined CREATE TABLE statement that gets you up and running faster:
CREATE TABLE metrics (
    time timestamptz NOT NULL,
    device text,
    value float
)
WITH (
    timescaledb.hypertable,
    timescaledb.partition_column = 'time',
    timescaledb.chunk_interval = '1 day'
);
This new declarative syntax offers a smarter default experience:
- A single statement turns your metrics table into a hypertable (timescaledb.hypertable).
- Compression is enabled by default, with segmentby and orderby settings chosen automatically based on statistical heuristics when compression first runs.
This means you can get started analyzing time-series data with compression and columnar performance without writing multiple DDL statements or tweaking config knobs upfront.
Note: While compression is now automatically enabled, a compression policy is still not configured by default. That step remains manual (for now)—but we’re working to automate this too in an upcoming release.
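Until that automation lands, you can add the policy yourself. For example (the interval here is purely illustrative):

```sql
-- Compress chunks once their data is older than 7 days.
SELECT add_compression_policy('metrics', INTERVAL '7 days');
```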
The previous APIs for creating hypertables and enabling compression still work and will continue to be supported. This new approach is fully backward-compatible and designed to improve the developer experience, not disrupt existing workflows.
Continuous aggregates are a cornerstone of managing and querying large time-series datasets efficiently. TimescaleDB 2.20 introduces more flexibility and power to how you define and manage them.
We're excited to announce that you can now use window functions and mutable functions within your continuous aggregate query definitions! The support is experimental, as there is a risk of data inconsistency if the user is not careful. For example, in backfill scenarios, buckets could be missed.
Window Functions: Support for window functions is experimental and must be enabled via a GUC (SET timescaledb.enable_cagg_window_functions = on;). This allows for more complex calculations like moving averages, rankings, and period-over-period comparisons directly within your cagg definition.
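For illustration, a continuous aggregate using a window function might look like the following. The view name is hypothetical, it assumes the metrics table from earlier examples, and (per the caution below) the semantics of window functions over refreshed buckets deserve care:

```sql
SET timescaledb.enable_cagg_window_functions = on;

CREATE MATERIALIZED VIEW daily_device_stats
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', time) AS bucket,
    device,
    avg(value) AS avg_value,
    -- Rank devices by average value within each daily bucket.
    rank() OVER (PARTITION BY time_bucket('1 day', time)
                 ORDER BY avg(value) DESC) AS device_rank
FROM metrics
GROUP BY bucket, device;
```

Note that a device's rank depends on the other devices in the same bucket, so a partial refresh of a bucket can change previously materialized ranks.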
Mutable Functions: Mutable functions (functions whose results can change even with the same inputs, e.g., random() or now()) are now permitted. The system will emit a warning if you use them in a materialized query.
Why is this important? These functions were previously blocked due to the potential risk of data inconsistency, especially during backfills or re-materializations where historical "states" might not be perfectly reproducible. While these risks still exist and require careful consideration by the user, enabling these functions significantly improves developer experience and allows for a wider range of analytical rollups.
Use with caution: Be mindful of the implications, especially with window functions that depend on specific data ordering, or mutable functions that could lead to non-deterministic aggregate values over time if not handled carefully.
Previously (since the batched refreshes introduced in the 2.19.0 release), when a continuous aggregate refresh policy ran, it would always prioritize materializing the newest data first. This makes sense for many real-time monitoring use cases.
However, some scenarios benefit from refreshing the oldest data first (e.g., backfilling historical data or ensuring completeness of older periods before newer ones). In TimescaleDB 2.20, you now have control over this behavior.
The add_continuous_aggregate_policy function (and related policy modification functions) now includes a new option: refresh_newest_first.
- refresh_newest_first = TRUE (default): refreshes start from the newest data and work backward.
- refresh_newest_first = FALSE: refreshes start from the oldest data and work forward.
This added flexibility allows you to tailor the materialization strategy to your specific application needs.
As announced with the release of TimescaleDB 2.19.0, version 2.20 drops support for PostgreSQL 14.
Please plan your PostgreSQL upgrades accordingly to take advantage of the latest TimescaleDB features.
TimescaleDB 2.20 marks a significant leap forward, reinforcing its position as the fastest PostgreSQL for time-series and demanding workloads. We've delivered substantial speed improvements for columnstore queries and data management without sacrificing the robust capabilities, full SQL, and developer experience you expect from PostgreSQL. Features like SkipScan, bloom filters for point queries, and dramatically faster, more flexible data updates on historic data allow you to tackle demanding workloads with greater efficiency and flexibility than typical OLAP solutions might offer.
For Timescale Cloud users on PostgreSQL 15 and above, these enhancements are already live and available, while self-hosted users can upgrade now to explore these benefits firsthand. Check out our release notes for a full list of improvements and bug fixes. As always, we welcome your feedback in our Community Forum or on GitHub.
Happy querying!