<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[Tiger Data Blog]]></title>
        <description><![CDATA[Insights, product updates, and tips from TigerData (Creators of TimescaleDB) engineers on Postgres, time series & AI. IoT, crypto, and analytics tutorials & use cases.]]></description>
        <link>https://www.tigerdata.com/blog</link>
        <image>
            <url>https://www.tigerdata.com/icon.ico</url>
            <title>Tiger Data Blog</title>
            <link>https://www.tigerdata.com/blog</link>
        </image>
        <generator>RSS for Node</generator>
        <lastBuildDate>Tue, 07 Apr 2026 17:58:34 GMT</lastBuildDate>
        <atom:link href="https://www.tigerdata.com/blog" rel="self" type="application/rss+xml"/>
        <ttl>60</ttl>
        <item>
            <title><![CDATA[Introducing Direct Compress: Up to 40x Faster, Leaner Data Ingestion for Developers (Tech Preview)]]></title>
            <description><![CDATA[Direct Compress delivers up to 40x faster data ingestion for TimescaleDB by compressing time-series data in memory during insertion, eliminating background jobs.]]></description>
            <link>https://www.tigerdata.com/blog/introducing-direct-compress-up-to-40x-faster-leaner-data-ingestion-for-developers-tech-preview</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/introducing-direct-compress-up-to-40x-faster-leaner-data-ingestion-for-developers-tech-preview</guid>
            <category><![CDATA[Announcements & Releases]]></category>
            <category><![CDATA[TimescaleDB]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Product & Engineering]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Tue, 09 Sep 2025 13:00:52 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2025/09/direct-compress-thumbnail.png">
            </media:content>
            <content:encoded><![CDATA[<p>Time-series and analytical data continues to grow at an unprecedented pace, and with it comes the challenge of efficiently storing and querying massive datasets. Traditionally, compressing this data required background jobs and additional tuning. This slowed down ingestion, added operational headaches, and delayed storage savings. </p><p>That’s why today, we're excited to announce <strong>Direct Compress</strong>, a new feature coming to TimescaleDB that compresses data during ingestion in memory, eliminating the need for traditional compression policies and improving insert performance by up to 40x.</p><p><em>Note: Direct Compress is currently available as a tech preview in TimescaleDB </em><a href="https://github.com/timescale/timescaledb/releases/tag/2.21.0"><em><u>2.21</u></em></a><em> for COPY operations, with full support for INSERT operations coming in a later version.</em></p><h2 id="the-evolution-of-timescaledb%E2%80%99s-columnstore">The Evolution of TimescaleDB’s Columnstore&nbsp;</h2><p>TimescaleDB has long been recognized for its industry-leading compression capabilities. With <a href="https://docs.tigerdata.com/use-timescale/latest/hypercore/"><u>hypercore</u></a>, TimescaleDB's hybrid row-columnar storage engine, users can achieve compression ratios of over 90% while maintaining fast query performance. Traditionally, the system would:</p><ol><li>Insert data in uncompressed row format</li><li>Write individual WAL records for each tuple</li><li>Later compress chunks through background policies</li></ol><p>Now, Direct Compress fundamentally changes this approach by compressing data <strong>during the ingestion process itself</strong>.</p><h2 id="what-is-direct-compress">What is Direct Compress?</h2><p>Direct Compress is a feature that allows TimescaleDB to compress data in memory as it's being ingested. Instead of writing WAL records for individual tuples, the system writes compressed batches directly to disk. 
This approach addresses several key challenges that developers and database administrators face when working with high-volume time-series data:</p><ul><li><strong>Excessive I/O overhead</strong>: Traditional ingestion requires writing each tuple individually to the WAL, creating significant I/O bottlenecks</li><li><strong>Dependency on compression policies</strong>: Previously, you had to wait for background compression jobs to optimize storage</li><li><strong>Insert performance limitations</strong>: Large-scale data ingestion was constrained by the overhead of individual tuple processing</li></ul><h2 id="benchmark-results-37x-improvement">Benchmark Results (37x Improvement)</h2><p>To test the per-tuple overhead, a narrow table with only one integer column was used. Direct compression provided considerable performance improvements, with the single integer table achieving 148.8 million tuples per second using 10k batch compression—a 37x improvement over uncompressed insertion. For a table with a timestamp column and two integer columns, we achieved an insert rate of 66 million tuples per second with compression.</p><p>The schema used has a big impact on the achievable insert rate, with more complex datatypes like jsonb or wider rows having lower ingest rates. Parsing integer columns was found to have the least overhead compared to other datatypes, and for these benchmarks more than half of the CPU time was spent parsing input even when using binary input format. Performance scaled linearly across all thread counts until reaching the storage I/O bottleneck. During these tests, we used a <a href="https://www.tigerdata.com/cloud"><u>Tiger Cloud</u></a> instance with 64 cores and EBS storage—with more optimized storage, higher numbers are likely achievable. For the uncompressed tests, no indexes were present on the <a href="https://docs.tigerdata.com/use-timescale/latest/hypertables/"><u>hypertables</u></a>. 
The 1k and 10k batch sizes refer to the batch size used internally during compression, not the batch size used by the client sending the data.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2025/09/insert-rate-vs-number-of-threads-.png" class="kg-image" alt="insert rate vs number of threads" loading="lazy" width="896" height="672" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2025/09/insert-rate-vs-number-of-threads-.png 600w, https://timescale.ghost.io/blog/content/images/2025/09/insert-rate-vs-number-of-threads-.png 896w" sizes="(min-width: 720px) 720px"></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2025/09/Table-with-timestamp-and-2-int-columns-1.png" class="kg-image" alt="Table with timestamp and 2 int columns" loading="lazy" width="810" height="607" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2025/09/Table-with-timestamp-and-2-int-columns-1.png 600w, https://timescale.ghost.io/blog/content/images/2025/09/Table-with-timestamp-and-2-int-columns-1.png 810w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Table with timestamp and 2 int columns</span></figcaption></figure><h2 id="key-benefits">Key Benefits</h2><h3 id="reduced-io-operations">Reduced I/O operations</h3><p>By compressing data in memory before writing to disk, Direct Compress eliminates the need to write individual WAL records for each tuple. Instead, only compressed batches are written, dramatically reducing I/O overhead.&nbsp;</p><h3 id="eliminated-policy-dependencies">Eliminated policy dependencies</h3><p>With Direct Compress, your <code>INSERT</code> operations already produce compressed chunks. 
This means <code>compress_chunk()</code> functions and compression policies become less critical to your workflow, simplifying your database maintenance.</p><h3 id="immediate-storage-efficiency">Immediate storage efficiency</h3><p>Unlike traditional compression that happens after ingestion, Direct Compress provides storage benefits immediately, reducing your storage footprint from the moment data arrives.</p><h2 id="how-direct-compress-works">How Direct Compress Works</h2><p>Direct Compress operates by intercepting data during the ingestion process and compressing it in memory before writing to disk. The process involves:</p><ol><li><strong>Batch Collection</strong>: Data is collected in configurable batches during <code>COPY</code> or <code>INSERT</code> operations.</li><li><strong>In-Memory Compression</strong>: Each batch is compressed using TimescaleDB's proven compression algorithms.</li><li><strong>Optimized Writing</strong>: Compressed batches are written directly to disk with minimal WAL overhead.</li></ol><p>This approach differs from traditional compression methods because it eliminates the two-step process of "ingest then compress," instead performing both operations simultaneously. <strong>Importantly, Direct Compress requires batched operations on the client side</strong> to achieve these performance benefits. 
With direct compression, data ingestion becomes limited by CPU processing rather than I/O speed.</p><p><strong>Roadmap</strong></p><ul><li><strong>COPY support</strong> (TimescaleDB 2.21 - Tech Preview)</li><li><strong>INSERT support</strong> (coming soon)</li><li><strong>Continuous aggregate support</strong> (coming soon)</li></ul><h2 id="getting-started-with-direct-compress">Getting Started with Direct Compress</h2><h3 id="prerequisites">Prerequisites</h3><p>Before using Direct Compress, ensure you have:</p><ul><li>TimescaleDB version 2.21 or later (currently in tech preview)</li><li>A hypertable with compression enabled (<a href="#basic-usage-example" rel="noreferrer"><u>see example</u></a>)</li><li>Batched client operations to make use of the feature</li></ul><h3 id="important-requirements-and-limitations">Important requirements and limitations</h3><p>Direct Compress <strong>requires batching</strong> on the client side to function effectively. It cannot be used:</p><ul><li>If the hypertable schema has unique constraints</li><li>If the hypertable has triggers</li><li>If there are continuous aggregates on the target hypertable</li></ul><h3 id="configuration-options">Configuration options</h3><p>Direct Compress is controlled through several GUCs (Grand Unified Configuration parameters):</p><p><code>timescaledb.enable_direct_compress_copy</code><strong> (default: off)</strong></p><p>Enables the core Direct Compress feature for <code>COPY</code> operations. When enabled, chunks will be marked as unordered, so presorting is not required.</p><p><code>timescaledb.enable_direct_compress_copy_sort_batches</code><strong> (default: on)</strong></p><p>Enables per-batch sorting before writing compressed data, which can improve query performance.</p><p><code>timescaledb.enable_direct_compress_copy_client_sorted</code><strong> (default: off)</strong></p><p><strong>⚠️ DANGER</strong>: When enabled, chunks will not be marked as unordered. 
Only use this if your data is globally sorted, as queries that rely on ordering will produce incorrect results otherwise. In the context of this feature, we distinguish between local and global sorting: local sorting means that the data within each batch is sorted; global sorting additionally means that no two batches overlap.</p><h2 id="basic-usage-example">Basic Usage Example</h2><pre><code class="language-SQL">-- Create a hypertable with compression
CREATE TABLE sensor_data(
    time timestamptz, 
    device text, 
    value float
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='time'
);

-- Enable Direct Compress
SET timescaledb.enable_direct_compress_copy = on;

-- Use binary format for maximum performance
COPY sensor_data FROM '/tmp/sensor_data.binary' WITH (format binary);</code></pre><h2 id="best-practices-and-recommendations">Best Practices and Recommendations</h2><h3 id="1-use-binary-format">1. Use binary format</h3><p>Binary format achieves the highest insert rates. CSV and text formats are also supported but carry more parsing overhead.</p><h3 id="2-consider-order-by-configuration">2. Consider the orderby configuration</h3><p>The default <code>orderby</code> configuration is <code>time DESC</code> for query optimization. However, for maximum Direct Compress benefits, consider changing this to <code>time</code> to optimize for insert performance:</p><pre><code class="language-SQL">ALTER TABLE sensor_data SET (timescaledb.orderby = 'time');</code></pre><p>This represents a trade-off between insert performance and query performance—choose based on your primary use case.</p><h3 id="3-presort-data-before-ingestion">3. Presort data before ingestion</h3><p>While TimescaleDB can sort data as part of Direct Compress, doing so takes CPU resources away from other tasks, so presort your data on the client side where possible.</p><h3 id="4-leverage-multiple-threads">4. Leverage multiple threads</h3><p>The benchmark results show significant benefits from parallel ingestion. Consider using multiple threads for large data imports.</p><h2 id="migration-and-compatibility">Migration and Compatibility</h2><h3 id="upgrading-existing-tables">Upgrading existing tables</h3><p>Direct Compress works with any existing hypertable that has the columnstore enabled, provided the limitations (no unique constraints, triggers, or continuous aggregates) are met.&nbsp;</p><h3 id="backward-compatibility">Backward compatibility</h3><p>Direct Compress is fully compatible with existing TimescaleDB compression features. 
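</p><p>To verify that freshly ingested chunks really arrive compressed, you can inspect TimescaleDB's compression statistics. The sketch below assumes the <code>sensor_data</code> hypertable from the earlier example and the standard stats function:</p><pre><code class="language-SQL">-- Compare per-chunk sizes before and after compression
SELECT chunk_name,
       before_compression_total_bytes,
       after_compression_total_bytes
FROM chunk_compression_stats('sensor_data');</code></pre><p>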
You can use both traditional columnstore policies and Direct Compress simultaneously, though Direct Compress reduces the need for background compression jobs.</p><h2 id="looking-forward">Looking Forward</h2><p>Direct Compress represents a significant milestone in TimescaleDB's ongoing evolution toward real-time analytics at scale. This feature is part of our broader commitment to eliminating the traditional trade-offs between ingestion speed and storage efficiency.</p><p>Future enhancements to Direct Compress will include:</p><ul><li>Support for <code>INSERT</code></li><li>Additional optimizations for unsorted data when using Direct Compress</li><li>Compatibility with continuous aggregates</li><li>Enhanced client-side tooling for optimal batching</li></ul><h2 id="try-direct-compress-today">Try Direct Compress Today</h2><p>Direct Compress brings considerable performance improvements to TimescaleDB users by eliminating the traditional ingestion bottleneck. With up to 40x faster ingestion rates and immediate storage benefits, this feature is a game-changer for high-volume time-series applications.</p><p>Whether you're managing IoT sensor data, financial market feeds, or application monitoring metrics, Direct Compress can help you achieve unprecedented ingestion performance while reducing storage costs from day one.</p><p>We encourage you to try the tech preview of Direct Compress in your development environment and share your experiences with the community. Your feedback will help us refine this feature as we move toward full release. As always, our team is available to help you optimize your TimescaleDB deployment for your specific use case.</p><hr><p><strong>Ready to get started?</strong> Check out our <a href="https://docs.tigerdata.com/"><u>documentation</u></a> or <a href="https://www.tigerdata.com/contact"><u>contact our team</u></a> for personalized assistance with Direct Compress implementation.</p><p><em>Have questions about Direct Compress or want to share your results? 
Join the conversation in our </em><a href="https://forum.tigerdata.com/forum/"><em><u>community forum</u></em></a><em> or reach out to us on </em><a href="https://github.com/timescale/timescaledb"><em><u>GitHub</u></em></a><em>.</em></p><hr><h3 id="about-the-author">About the Author</h3><p>Sven is the tech lead for TimescaleDB, but his journey with databases started a long time ago. For over 25 years, he has been a huge fan of PostgreSQL, and it's that deep-seated passion that led him to where he is today. His work on planner optimizations and diving into the columnstore to squeeze out every bit of performance is a direct extension of his goal: to make the Postgres ecosystem even more powerful and efficient for everyone who uses it.</p><p>That long history with Postgres also informs his work on the security front. One of the projects he is most passionate about is <a href="https://github.com/timescale/pgspot"><u>pgspot</u></a>, where he gets to help build a more secure future for the database. After all these years, he has seen firsthand how a strong, trustworthy foundation is essential. To him, a great database isn't just about speed; it's about protecting the data with unwavering reliability. This blend of performance and security is what truly excites him every day.</p><p>When he's not in the weeds of database code, you can find him thinking about the bigger picture—how to make the community and product stronger, safer, and more user-friendly. He loves the challenge of taking a complex problem and finding a simple, elegant solution. His journey with Postgres has taught him that the best technology is built on a foundation of trust and a commitment to continuous improvement.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Bridging the Gap Between Compressed and Uncompressed Data in Postgres: Introducing Compression Tuple Filtering]]></title>
            <description><![CDATA[Compressed data in Postgres now performs more like uncompressed data. Discover how we achieved up to 500x faster updates & deletes on compressed data.]]></description>
            <link>https://www.tigerdata.com/blog/bridging-the-gap-between-compressed-and-uncompressed-data-in-postgres</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/bridging-the-gap-between-compressed-and-uncompressed-data-in-postgres</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Announcements & Releases]]></category>
            <category><![CDATA[Analytics]]></category>
            <category><![CDATA[PostgreSQL Performance]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Wed, 18 Sep 2024 13:00:59 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2024/09/Blog-2.png">
            </media:content>
            <content:encoded><![CDATA[<p>When we introduced <a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/"><u>columnar compression for Postgres</u></a> in 2019, our goal was to help developers scale Postgres and efficiently manage growing datasets, such as IoT sensors, financial ticks, product metrics, and even vector data. Compression quickly became a game-changer, saving users significant storage costs and boosting query performance—all while keeping their data in Postgres. With many seeing over 95&nbsp;% compression rates, the impact was immediate.</p><p>But we didn’t stop there. Recognizing that many real-time analytics workloads demand flexibility for updating and backfilling data, we <a href="https://timescale.ghost.io/blog/timescaledb-2-3-improving-columnar-compression-for-time-series-on-postgresql/"><u>slowly</u></a> but <a href="https://timescale.ghost.io/blog/allowing-dml-operations-in-highly-compressed-time-series-data-in-postgresql/"><u>surely enhanced</u></a> our compression engine to support <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> (DML) operations directly on compressed data. This allowed users to work with compressed data almost as easily as they do with uncompressed data.</p><p>However, it also created a problem. While we had originally intended mutating compressed chunks to be a rare event, people were now pushing its limits with frequent inserts, updates, and deletes. Seeing our customers go all in on this feature confirmed that we were on the right track, but we had to double down on performance.</p><p>Today, we’re proud to announce significant improvements as of TimescaleDB 2.16.0, delivering up to <strong>500x faster updates and deletes</strong> and <strong>10x faster upserts</strong> on compressed data. 
These optimizations make compressed data behave even more like uncompressed data—without sacrificing performance or flexibility.</p><p>Let’s dive into how we achieved these performance gains and what they mean for you. To check this week’s previous launches and keep track of upcoming ones, head to this <a href="https://timescale.ghost.io/blog/making-postgres-faster/"><u>blog post</u></a> or our <a href="https://www.timescale.com/launch/2024"><u>launch page</u></a>.&nbsp;</p><h2 id="how-we-allowed-dml-operations-on-compressed-data">How We Allowed DML Operations on Compressed Data</h2><p>To understand our latest improvements, it helps to revisit how we initially threw away the rule book and <a href="https://timescale.ghost.io/blog/allowing-dml-operations-in-highly-compressed-time-series-data-in-postgresql/"><u>enabled DML operations on compressed data</u></a>.</p><p>Working with compressed data is tricky. Imagine trying to update a zipped file. You’d need to unzip the file, make your changes, and then zip it back up. Similarly, updating or deleting data in a compressed database often involves decompressing and reprocessing large chunks (potentially full tables) of data, which can slow things down significantly.</p><p>Our solution was to segment and group records into batches of 1,000, so instead of working with millions of records at once, we operate on smaller, more manageable groups. We also used techniques like <strong>segment indexes</strong> and <strong>sparse indexes </strong>on batch metadata to identify only the relevant batches of compressed data that contain the values to be updated or deleted.</p><p>However, a challenge remained: if no metadata is stored for a batch, we have to assume there could be a row inside that needs to be updated. 
This assumption requires decompressing the batch and materializing it into an uncompressed row format, which takes up precious time and disk space.</p><h2 id="optimizing-batch-processing-in-timescaledb-2160">Optimizing Batch Processing in TimescaleDB 2.16.0</h2><p>With the release of TimescaleDB 2.16.0, we focused on reducing the number of batches that need to be decompressed and materialized. This improvement, known as <strong>compression tuple filtering</strong>, allows us to filter out unnecessary data at various processing stages, dramatically speeding up DML operations.</p><h3 id="real-world-performance-gains">Real-world performance gains</h3><p>Here’s what that looks like in practice:</p><ul><li><strong>Up to 500x faster updates and deletes</strong>: by avoiding the need to decompress and materialize irrelevant batches, DML operations like <code>UPDATE</code> and <code>DELETE</code> can now be completed significantly faster.</li><li><strong>Up to 10x faster upserts</strong>: similarly, upserts (a combination of <code>INSERT</code> and <code>UPDATE</code>) are optimized to avoid unnecessary decompression (<a href="https://timescale.ghost.io/blog/how-we-made-postgresql-upserts-300x-faster-on-compressed-data/" rel="noreferrer">and that’s not the only boost we made for upserts</a>).</li></ul><p>These gains translate to major real-world performance improvements, particularly for users dealing with large datasets that require high-frequency updates of data that has already been compressed.</p><h2 id="how-compression-tuple-filtering-works">How Compression Tuple Filtering Works</h2><p>To achieve these optimizations, we filter data at multiple stages during DML operations. 
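</p><p>For context, the batch metadata discussed here comes from the hypertable's compression settings. A typical configuration (the <code>device_id</code> column is illustrative; the <code>metrics</code> table matches the example later in this post) looks like this:</p><pre><code class="language-SQL">-- Batches are grouped by device_id; min/max metadata is kept for time
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id',
    timescaledb.compress_orderby = 'time DESC'
);</code></pre><p>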
Previously, filtering was only possible using constraints on <strong><code>segmentby</code></strong> columns or columns with metadata, such as <strong><code>orderby</code></strong> or columns with a <a href="https://timescale.ghost.io/blog/boost-postgres-performance-by-7x-with-chunk-skipping-indexes/" rel="noreferrer">chunk skipping index</a>.</p><p>With TimescaleDB 2.16.0, we’ve taken a microscope to our decompression pipeline and added an additional layer of inline filtering. When running DML operations, if a column doesn’t have metadata, we now apply constraints incrementally during decompression. If a batch is fully filtered out by the constraint, it’s skipped entirely and never materialized to disk or passed to Postgres to continue evaluation. This saves significant resources and time by reducing the amount of work left to do further down the query pipeline.</p><h3 id="insert-optimizations">INSERT optimizations</h3><p>Let’s break it down by DML operation, starting with <code>INSERT</code>.</p><ul><li><strong>Regular <code>INSERT</code> operations</strong>: When inserting data into compressed chunks, no decompression is required unless you have a <code>UNIQUE</code> constraint on the <a href="https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered" rel="noreferrer">hypertable</a>. This makes inserts almost as fast as with uncompressed data.</li><li><strong>Inserts with <code>UNIQUE</code> constraints</strong>: When a <code>UNIQUE</code> constraint is in place, there are three possible scenarios:<ol><li><strong>No <code>ON CONFLICT</code> clause</strong>: We check for constraint violations during decompression. 
If a violation is found, it’s flagged before any data is materialized to disk.</li><li><code><strong>ON CONFLICT DO NOTHING</strong></code>: Similar to the first case, violations are flagged during decompression, and the <code>INSERT</code> is skipped without materializing data.</li><li><strong><code>ON CONFLICT DO UPDATE</code> (<code>UPSERT</code>)</strong>: In cases of conflict, we decompress only the batch containing the conflicting tuple. If no conflict is detected, there’s no need to materialize anything.</li></ol></li></ul><h3 id="updatedelete-optimizations">UPDATE/DELETE Optimizations</h3><p>For <code>UPDATE</code> and <code>DELETE</code> operations, we first check if any tuples in a batch match the query constraints. If none do, the batch is skipped, and no decompression or materialization is needed. This skip leads to dramatically faster update and delete operations.</p><h3 id="real-world-comparison-before-and-after">Real-world comparison: Before and after</h3><p>Let’s look at a concrete example to illustrate the performance difference with tuple filtering.</p><p>Assume you have a hypertable with 1,000 batches, each containing 1,000 tuples, for a total of 1,000,000 rows. Now, you want to delete all rows where <code>value &lt; 10</code>. 
In this case, these rows are all contained within a single batch (maybe values lower than 10 are very rare and happen only for a short period of time).</p><p><code>DELETE FROM metrics WHERE value &lt; 10;</code></p><p>Before TimescaleDB 2.16.0, we would have to decompress all 1,000 batches and materialize 1,000,000 tuples to disk—even if only a small portion matched the query constraints.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres_compression-tuple-filtering_decompression.png" class="kg-image" alt="A diagram illustrating the before: we would decompress all 1,000 batches and materialize them even if only a small part of them matched the query constraints" loading="lazy" width="1200" height="1089" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres_compression-tuple-filtering_decompression.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres_compression-tuple-filtering_decompression.png 1000w, https://timescale.ghost.io/blog/content/images/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres_compression-tuple-filtering_decompression.png 1200w" sizes="(min-width: 720px) 720px"></figure><p>In TimescaleDB 2.16.0, this changes with compressed tuple filtering. 
We now only need to decompress the relevant batches, avoiding unnecessary materialization.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres-Introducing-Compression-Tuple-Filtering_tuple-filtering.png" class="kg-image" alt="A diagram illustrating how tuple filtering during compression works" loading="lazy" width="777" height="826" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres-Introducing-Compression-Tuple-Filtering_tuple-filtering.png 600w, https://timescale.ghost.io/blog/content/images/2024/09/Bridging-the-Gap-Between-Compressed-and-Uncompressed-Data-in-Postgres-Introducing-Compression-Tuple-Filtering_tuple-filtering.png 777w" sizes="(min-width: 720px) 720px"></figure><p>In our example query, this reduces the total query time from 829,517.741 ms to 1,487.494 ms—<strong>557x faster</strong>! As you can see, tuple filtering allows us to drastically reduce the work needed to execute this query, resulting in a huge speed-up. The speed-up also scales with selectivity: the more tuples that can be filtered out, the faster your query becomes.</p><h2 id="why-do-dml-operations-on-compressed-data-matter">Why DML Operations on Compressed Data Matter</h2><p>When working with massive datasets, every millisecond counts, and resource efficiency becomes crucial. The improvements introduced in TimescaleDB 2.16.0 directly address these needs by minimizing the amount of data that must be decompressed, written to disk, and then evaluated by Postgres. This not only reduces disk I/O and CPU usage but also significantly lowers execution time. The result? 
More headroom to handle larger datasets, scale your systems seamlessly, and improve overall application performance.</p><p>For developers managing frequent updates, inserts, or deletes in TimescaleDB hypertables, these optimizations mean that less thought has to be given to the current format of the data. Regardless of whether the data is currently stored on disk as rows or compressed columns, DML operations will work as expected.</p><h2 id="final-words">Final Words</h2><p>With the release of TimescaleDB 2.16.0, our <a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noreferrer">columnar</a> compression engine takes another big leap forward. Users can now benefit from up to <strong>500x faster updates and deletes</strong> and <strong>10x faster upserts</strong>—all while continuing to enjoy the storage savings and performance gains of compression.</p><p>Looking ahead, we’re committed to further enhancing our compression engine, delivering even more flexibility and performance gains. Stay tuned—there’s more to come.</p><p>Want to see these improvements in action? <a href="https://www.timescale.com"><u>Sign up for Timescale today</u></a> and experience the power of our compression engine for your real-time analytics workloads.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How We Fixed Long-Running PostgreSQL now( ) Queries (and Made Them Lightning Fast)]]></title>
            <description><![CDATA[Help requests about slowdowns in PostgreSQL now( ) queries are a thing of the past. Learn how we fixed it in TimescaleDB 2.7 for lightning-fast performance (up to 400x faster!).]]></description>
            <link>https://www.tigerdata.com/blog/how-we-fixed-long-running-postgresql-now-queries</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/how-we-fixed-long-running-postgresql-now-queries</guid>
            <category><![CDATA[Announcements & Releases]]></category>
            <category><![CDATA[PostgreSQL Performance]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Wed, 22 Jun 2022 13:00:24 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2022/06/dog-racing-2878713_1920--1-.jpg">
            </media:content>
            <content:encoded><![CDATA[<p>It was just another regular Wednesday in our home offices when we received a question in the <a href="https://www.timescale.com/forum">forum</a> about a query using the Postgres <code>now()</code> function. A TimescaleDB user with dozens of tables of IoT data reported a slow degradation in query performance and creeping server CPU usage. After struggling with the issue, they turned to our community for help.</p>
<!--kg-card-begin: html-->
<iframe src="https://giphy.com/embed/QBAzA0CaPCKwGg3pDs" width="480" height="270" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/groundhogday-groundhog-day-movie-QBAzA0CaPCKwGg3pDs">via GIPHY</a></p>

<!--kg-card-end: html-->
<p>That same question came up in our forum, <a href="http://timescaledb.slack.com">Community Slack</a>, and <a href="https://www.timescale.com/support">support</a> more often than we’d like. We could relate to this particular pain point because we also struggled with it in partitioned vanilla PostgreSQL. After a closer look at the user’s query, we found the usual suspect: the issue of high planning time in the presence of many chunks—<a href="https://docs.timescale.com/timescaledb/latest/overview/core-concepts/hypertables-and-chunks/">in Timescale slang, chunks are data partitions within a table</a>—and in a query using a rather common function: <code>now()</code>.</p><p>Usually, the problem with these queries is that the chunk exclusion happens late. Chunk exclusion is what happens when some data partitions are not even considered during the query to speed up the process. The logic is simple: the less data a query has to go through, the faster it is.</p><p>However, the problem is that <code>now()</code>, <a href="https://www.postgresql.org/docs/current/xfunc-volatility.html">similarly to other stable functions in PostgreSQL</a>, is not considered during plan-time chunk exclusion, those precious moments in which your machine is trying to find the quickest way to execute your query while excluding some of your data partitions to further speed up the process. So, your chunks are only excluded later, at execution time, which results in higher planning time—and yes, you guessed it—slower performance.</p><p>Until now, every time this issue popped up, we knew what to do. 
We had written a wrapper function, marked as immutable, that simply called <code>now()</code>; its only purpose was to add the immutable marking so that PostgreSQL would consider it during plan-time chunk exclusion, thus improving query performance.</p><p>Well, not anymore.</p><p><strong>Today, we’re announcing the optimization of the <code>now()</code> function with the release of TimescaleDB 2.7</strong>, which solves this problem natively, matching the performance of our previous workaround. </p><p>In this blog post, we’ll look at the basics of the <code>now()</code> function, explain how it works in vanilla PostgreSQL and our previous TimescaleDB version, and wrap everything up with a description of our optimization, which evaluates <code>now()</code> expressions during plan-time chunk exclusion, significantly reducing planning time. Finally, we include a performance comparison that will blow you away (all we can say for now is “more than 400 times faster”).</p>
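<p>For reference, the old workaround described above can be sketched like this (a sketch; the function name is illustrative, not the exact one we shipped). Marking the wrapper as immutable deliberately overstates its stability guarantees, which is exactly what lets the planner constify it early:</p><pre><code class="language-sql">CREATE OR REPLACE FUNCTION approx_now() RETURNS timestamptz
LANGUAGE sql IMMUTABLE AS $$ SELECT now() $$;

-- Used in place of now() so chunk exclusion can happen at plan time:
SELECT * FROM hypertable WHERE time &gt; approx_now() - INTERVAL '5 minutes';</code></pre>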
<!--kg-card-begin: html-->
<iframe src="https://giphy.com/embed/Gpu3skdN58ApO" width="480" height="382" frameBorder="0" class="giphy-embed" allowFullScreen></iframe><p><a href="https://giphy.com/gifs/funny-elephant-fast-Gpu3skdN58ApO">via GIPHY</a></p>
<!--kg-card-end: html-->
<p>If you are already a TimescaleDB user, <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/update-timescaledb/">check out our docs for instructions on how to upgrade</a>. If you are using Timescale, upgrades are automatic, so all you need to do is sit back and enjoy this very fast ride! (New to Timescale? <a href="https://console.cloud.timescale.com/signup">You can start a free 30-day trial, no credit card required</a>.)</p><h2 id="now-in-vanilla-postgresql">now( ) in Vanilla PostgreSQL</h2><p>Queries with <code>now()</code> expressions are common in time-series data to retrieve readings of the last five minutes, three hours, three days, or other time intervals. In sum, <a href="https://www.postgresql.org/docs/current/functions-datetime.html"><code>now()</code> is a function</a> that returns the current time or, more accurately, the start time of the current transaction. These queries usually only need data from the most recent partition in a <a href="https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered" rel="noreferrer">hypertable</a>, also called a chunk. </p><p>A query to retrieve readings from the last five minutes could look like this:</p><pre><code class="language-sql">SELECT * FROM hypertable WHERE time &gt; now() - interval '5 minutes';
</code></pre>
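<p>If you want to see this planning overhead yourself, <code>EXPLAIN ANALYZE</code> reports planning and execution time separately (a generic sketch, reusing the hypertable from the query above):</p><pre><code class="language-sql">EXPLAIN (ANALYZE, TIMING OFF)
SELECT * FROM hypertable
WHERE time &gt; now() - interval '5 minutes';
-- With many chunks, the "Planning Time" line dominates "Execution Time"</code></pre>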
<p>To understand our users' slowdown, it’s vital to know that constraints in PostgreSQL can be constified at different stages in the planning process. The problem with <code>now()</code> is that it can only be constified during execution because the planning and execution times may differ. </p><p>Since <code>now()</code> is a stable function, it’s not considered for plan-time constraint exclusion; therefore, all chunks will have to be part of the planning process. For hypertables with many chunks, the total time for this query is often dominated by planning time, resulting in poor query performance.</p><p>If we dig a little deeper with the EXPLAIN output, we can see that all chunks of the hypertable are part of the plan, painfully inflating planning time.<br></p><pre><code class="language-sql"> Append  (cost=0.00..1118.94 rows=1097 width=20)
   -&gt;  Seq Scan on _hyper_3_38356_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
   -&gt;  Seq Scan on _hyper_3_38357_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
   -&gt;  Seq Scan on _hyper_3_38358_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
   -&gt;  Seq Scan on _hyper_3_38359_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
   -&gt;  Seq Scan on _hyper_3_38360_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
   -&gt;  Seq Scan on _hyper_3_38361_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
</code></pre>
<p>We had to do something to improve this, and so we did.</p><h2 id="now-in-timescaledb">now( ) in TimescaleDB</h2><p>As proud builders on top of PostgreSQL, we wanted to come up with a solution. So in previous versions of TimescaleDB, we did not use the <code>now()</code> expression for plan-time constraint exclusion. </p><p>In turn, we implemented constraint exclusion at execution time in a bid to improve query performance. If you want to learn more about how we did this, <a href="https://timescale.ghost.io/blog/implementing-constraint-exclusion-for-faster-query-performance/">check out this blog post, which offers a detailed behind-the-scenes explanation of what happens when you execute a query in PostgreSQL</a>. </p><p>While the resulting plan does look much slimmer than the original, all the chunks were still considered during planning and removed only during execution. So, even though the resulting plan looks very different (look at those 1,096 excluded chunks), the effort is very similar to the vanilla PostgreSQL plan.</p><pre><code class="language-sql">Custom Scan (ChunkAppend) on metrics1k  (cost=0.00..1113.45 rows=1097 width=20)
   Chunks excluded during startup: 1096
   -&gt;  Seq Scan on _hyper_3_39453_chunk  (cost=0.00..1.01 rows=1 width=20)
         Filter: ("time" &gt; now())
</code></pre>
<p>Close, but not good enough.</p><h2 id="now-were-talking">now( ) We're Talking</h2><p>With our latest release, TimescaleDB 2.7, we approached things differently, adding an optimization that would allow the evaluation of <code>now()</code> expressions during plan-time chunk exclusion. </p><p>Looking at the root of the problem, the reason why constifying <code>now()</code> at plan time would not be correct is prepared statements. If you execute <code>now()</code> but only use that value in a transaction half an hour later, the value no longer reflects the current time.</p><p>However, <strong>certain expressions will still hold true even as time goes by.</strong> For example, <code>time &gt;= now()</code> will be true at this moment, in 5 minutes, and in 10 hours. So, when optimizing this, we looked for expressions that held as time passed and used those during plan-time exclusion. <br><br>The initial implementation of this feature works for intervals of hours, minutes, and seconds (e.g., <code>now() - INTERVAL '1 hour'</code>).<br><br>As you can see from the EXPLAIN output, chunks are no longer excluded during execution. The exclusion happens earlier, during planning, speeding up the query. Success!</p><pre><code class="language-sql"> Custom Scan (ChunkAppend) on metrics1k  (cost=0.00..1.02 rows=1 width=20)
   Chunks excluded during startup: 0
   -&gt;  Seq Scan on _hyper_3_39453_chunk  (cost=0.00..1.02 rows=1 width=20)
         Filter: (("time" &gt; '2022-05-24 12:41:31.266968+02'::timestamp with time zone) AND ("time" &gt; now()))
</code></pre>
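<p>The prepared-statement pitfall mentioned above can be sketched like this (hypothetical statement name): the plan may be built once and reused much later, so the plan-time value of <code>now()</code> cannot simply replace the expression outright.</p><pre><code class="language-sql">PREPARE recent_rows AS
  SELECT * FROM metrics1k WHERE time &gt; now() - INTERVAL '1 hour';

-- Executed hours after planning, this must still evaluate now() at
-- execution time; a plan-time bound is only safe because it is ANDed
-- with the original filter and time only moves forward.
EXECUTE recent_rows;</code></pre>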
<p>In the next TimescaleDB version, 2.8, we are removing the initial limitations of the <code>now()</code> optimization, making it also available for intervals of months and years. This means that you will be able to make the most of this improvement in a wider range of situations, as any <code>time &gt; now() - INTERVAL</code> expression will be usable during plan-time chunk exclusion. </p><pre><code class="language-sql"> Custom Scan (ChunkAppend) on metrics1k  (cost=0.00..1.02 rows=1 width=20)
   Chunks excluded during startup: 0
   -&gt;  Seq Scan on _hyper_3_39453_chunk  (cost=0.00..1.02 rows=1 width=20)
         Filter: ("time" &gt; now())
</code></pre>
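<p>For example, once that limitation is lifted, a month-based filter like the following (sketched against the same <code>metrics1k</code> hypertable) also qualifies for plan-time chunk exclusion:</p><pre><code class="language-sql">SELECT * FROM metrics1k
WHERE time &gt; now() - INTERVAL '3 months';</code></pre>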
<p>This code is already <a href="https://github.com/timescale/timescaledb/pull/4397">committed</a> in our <a href="https://github.com/timescale/timescaledb/pull/4393">GitHub repo</a>, and will be available shortly.</p><h2 id="how-does-it-work">How Does It Work?</h2><p>But how did we make this happen? The optimization works by rewriting the constraint. For example: <br></p><pre><code class="language-sql">time &gt; now() - INTERVAL '5 min'
</code></pre>
<p>turns into</p><pre><code class="language-sql">(("time" &gt; (now() - '00:05:00'::interval)) AND ("time" &gt; '2022-06-10 09:58:04.224996+02'::timestamp with time zone))
</code></pre>
<p>This means that the constified part of the constraint will be used during plan-time chunk exclusion. And, assuming that time only moves forward, the result will still be correct even in the presence of prepared statements, as the original constraint is ANDed with the constified value.</p><p>Rewriting the constraint makes the constified value available to plan-time constraint exclusion, leading to massive reductions in planning time, especially in the presence of many chunks.</p><p>So we know that this translates into faster queries. But how fast?</p><h2 id="performance-comparison%E2%80%94now-that-is-fast">Performance Comparison—now( ) That Is Fast!</h2><p>As shown in our table, the optimization’s performance improvement scales with the total number of chunks in the hypertables. The more data partitions you’re dealing with, the more you’ll notice the speed improvement—<strong>up to 401x faster in TimescaleDB 2.7</strong> for a total of 20,000 chunks when compared to the previous version.</p><p><code>now()</code> that is fast. 🔥</p><figure class="kg-card kg-image-card kg-width-wide"><img src="https://timescale.ghost.io/blog/content/images/2022/06/Screenshot-2022-06-23-at-10.19.45.png" class="kg-image" alt="" loading="lazy" width="605" height="247" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/06/Screenshot-2022-06-23-at-10.19.45.png 600w, https://timescale.ghost.io/blog/content/images/2022/06/Screenshot-2022-06-23-at-10.19.45.png 605w"></figure><p><em>The table lists the total execution time of the query (at the beginning of the post) on hypertables with different numbers of chunks</em></p><h2 id="now-go-try-it">now( ) Go Try It</h2><p>There are few things more satisfying for a developer than solving a problem for your users, especially a recurring one. Achieving such a performance optimization is just the icing on the cake. 
</p><p>If you want to experience the lightning-fast performance of PostgreSQL <code>now()</code> queries for yourself, TimescaleDB 2.7 is available for Timescale and self-managed TimescaleDB.</p><ul><li>If you are a Timescale user, you will be automatically upgraded to TimescaleDB 2.7. No action is required from your side. You can also create a free Timescale account to get <a href="https://console.cloud.timescale.com/signup">a free 30-day trial</a> (no credit card required).</li><li>If you are using TimescaleDB in your own instances, <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/update-timescaledb/">check out our docs for instructions on how to upgrade</a>.</li></ul><p>Once you’re using TimescaleDB, connect with us! You can find us in our <a href="http://slack.timescale.com/">Community Slack</a> and the <a href="http://timescale.com/forum/">Timescale Community Forum</a>. We’ll be more than happy to answer any questions about query performance improvements, TimescaleDB, PostgreSQL, or other time-series issues.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Improving DISTINCT Query Performance Up to 8,000x on PostgreSQL]]></title>
            <description><![CDATA[Learn common performance pitfalls and discover techniques to optimize your DISTINCT PostgreSQL queries.]]></description>
            <link>https://www.tigerdata.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[PostgreSQL Performance]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Thu, 06 May 2021 11:01:13 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2021/05/pexels-pixabay-373543.jpg">
            </media:content>
            <content:encoded><![CDATA[<p>PostgreSQL is an amazing database, but it can struggle with certain types of queries, especially as tables approach tens and hundreds of millions of rows (or more). <a href="https://www.timescale.com/learn/understanding-distinct-in-postgresql-with-examples" rel="noreferrer"><code>DISTINCT</code> queries</a> are an example of this.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2022/01/baby-yoda-tea.gif" class="kg-image" alt="Baby Yoda drinking tea" loading="lazy" width="320" height="320"><figcaption><i><em class="italic" style="white-space: pre-wrap;">Waiting for our DISTINCT queries to return</em></i></figcaption></figure><p>Why are <code>DISTINCT</code> queries slow on PostgreSQL when they seem to ask an "easy" question? It turns out that PostgreSQL currently lacks the ability to efficiently pull a list of unique values from an ordered index. </p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">🔖</div><div class="kg-callout-text">Learning PostgreSQL? <a href="https://www.timescale.com/learn/understanding-distinct-in-postgresql-with-examples" rel="noreferrer">Read the basics on DISTINCT</a>.</div></div><p></p><p><br></p><p>Even when you have an index that matches the exact order and columns for these "last-point" queries, PostgreSQL is still forced to scan the entire index to find all unique values. As a table grows (and <a href="https://timescale.ghost.io/blog/blog/what-the-heck-is-time-series-data-and-why-do-i-need-a-time-series-database-dcf3b1b18563/">they grow quickly with time-series data</a>), this operation keeps getting slower.</p><p>Other databases, such as MySQL, Oracle, and DB2, implement a feature called "Loose index scan," "Index Skip Scan," or “Skip Scan,” to speed up the performance of queries like this. 
</p><p>When a database has a feature like "Skip Scan," it can incrementally jump from one ordered value to the next without reading all of the rows in between. <em>Without</em> support for this feature, the database engine has to scan the entire ordered index and then deduplicate it at the end—which is a much slower process.</p><p>Since 2018, there have been <a href="https://commitfest.postgresql.org/19/1741/">plans to support something similar</a> in PostgreSQL. <em>(<strong>Note</strong>: We couldn’t use this implementation directly due to some limitations of what is possible within the </em><a href="https://www.tigerdata.com/blog/top-8-postgresql-extensions" rel="noreferrer"><em>Postgres extension</em></a><em> framework.)</em></p><p>Unfortunately, this patch wasn't included in the <a href="https://commitfest.postgresql.org/32/">CommitFest</a> for PostgreSQL 14, so it won't be included until PostgreSQL 15 at the earliest (i.e., no sooner than Fall 2022, at least 1.5 years from now). </p><p>We don’t want our users to have to wait that long.</p><h2 id="what-is-timescales-skipscan">What is Timescale's SkipScan?</h2><p>Today, via TimescaleDB 2.2.1, we are releasing <strong>TimescaleDB SkipScan</strong>, a custom query planner node that makes ordered <code>DISTINCT</code> queries blazing fast in PostgreSQL 🔥. </p><p>As you'll see in the benchmarks below, <strong>some queries performed more than</strong> <strong>8,000x better than before</strong>—and many of the SQL queries your applications and analytics tools use could also see dramatic improvements with this new feature.</p><p>This feature works in both Timescale <a href="https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered" rel="noreferrer">hypertables</a> and normal PostgreSQL tables. 
</p><p>This means that with Timescale, not only will your time-series <code>DISTINCT</code> queries be faster, but <strong>any other related queries you may have on normal PostgreSQL tables will also be faster. </strong></p><p>This is because Timescale is not just a time-series database. It’s a relational database, specifically, a relational database for <a href="https://www.tigerdata.com/blog/time-series-introduction" rel="noreferrer">time series</a>. Developers who use Timescale benefit from a purpose-built time-series database plus a classic relational (Postgres) database, all in one, with full SQL support.</p><p>And to be clear, we love PostgreSQL. We employ engineers who contribute to PostgreSQL. We contribute to the ecosystem around PostgreSQL. PostgreSQL is the world’s fastest-growing database, and we are excited to support it alongside thousands of other users and contributors.</p><p>We constantly seek to advance the state of the art with databases, and features like SkipScan are only our latest contribution to the industry. SkipScan makes Timescale and PostgreSQL better, more competitive databases overall, especially compared to MySQL, Oracle, DB2, and others. </p><h3 id="how-to-check-and-optimize-your-query-performance-in-postgresql">How to check (and optimize) your query performance in PostgreSQL</h3><p>If you're new to PostgreSQL and are wondering how to check your query performance in the first place (and optimize it!), we're going to leave two helpful resources here:</p><ul><li><a href="https://www.timescale.com/forum/t/a-beginners-guide-to-explain-analyze/77">This beginner's guide to <code>EXPLAIN ANALYZE</code> </a>by Michael Christofides in one of our Timescale Community Days. 
And here's a blog post on <a href="https://www.timescale.com/learn/explaining-postgresql-explain" rel="noreferrer">Explaining EXPLAIN</a> in case you're more of a reader.</li></ul><figure class="kg-card kg-embed-card"><iframe width="200" height="113" src="https://www.youtube.com/embed/31EmOKBP1PY?start=1&amp;feature=oembed" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen="" title="A beginners guide to EXPLAIN ANALYZE – Michael Christofides"></iframe></figure><ul><li>And our <a href="https://timescale.ghost.io/blog/identify-postgresql-performance-bottlenecks-with-pg_stat_statements/">blog post on using pg_stat_statements to optimize queries</a>.</li></ul><h3 id="optimizing-distinct-query-performance-what-about-recursive-ctes">Optimizing DISTINCT query performance: What about RECURSIVE CTEs?</h3><p>However, if you're an experienced PostgreSQL user, you might point out that it is already possible to get reasonably fast <code>DISTINCT</code> queries via <code>RECURSIVE CTEs</code>.</p><p>From the <a href="https://wiki.postgresql.org/wiki/Loose_indexscan">PostgreSQL Wiki</a>, using a <code>RECURSIVE CTE</code> can get you good results, but writing these kinds of queries can often feel cumbersome and unintuitive, especially for developers new to PostgreSQL:</p><pre><code class="language-sql">WITH RECURSIVE cte AS (
   (SELECT tags_id FROM cpu ORDER BY tags_id, time DESC LIMIT 1)
   UNION ALL
   SELECT (
      SELECT tags_id FROM cpu
      WHERE tags_id &gt; t.tags_id 
      ORDER BY tags_id, time DESC LIMIT 1
   )
   FROM cte t
   WHERE t.tags_id IS NOT NULL
)
SELECT * FROM cte LIMIT 50;
</code></pre><p>But even if writing a <code>RECURSIVE CTE</code> like this in day-to-day querying felt natural to you, there's a bigger problem. Most application developers, ORMs, and charting tools like Grafana or Tableau will still use the simpler, straightforward form:</p><pre><code class="language-sql">SELECT DISTINCT ON (tags_id) * FROM cpu
WHERE tags_id &gt;=1 
ORDER BY tags_id, time DESC
LIMIT 50;</code></pre><p>In PostgreSQL, without a "Skip Scan" node, this query has to scan the entire index (a full Index Only Scan), causing your applications and graphing tools to feel clunky and slow.</p><p>Surely there's a better way, right?</p><h2 id="skipscan-is-the-way">SkipScan Is the Way</h2><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/01/Screen-Shot-2021-04-16-at-1.45.56-PM.png" class="kg-image" alt="" loading="lazy" width="1472" height="608" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/01/Screen-Shot-2021-04-16-at-1.45.56-PM.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/01/Screen-Shot-2021-04-16-at-1.45.56-PM.png 1000w, https://timescale.ghost.io/blog/content/images/2022/01/Screen-Shot-2021-04-16-at-1.45.56-PM.png 1472w" sizes="(min-width: 720px) 720px"></figure><p>SkipScan is an optimization for queries in the form of <code>SELECT DISTINCT ON (column)</code>. Conceptually, a SkipScan is a regular IndexScan that “skips” across an index looking for the next value that is greater than the current value:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2021/05/skip-scan-illustration.png" class="kg-image" alt="Illustration of how a Skip Scan search works on a Btree index" loading="lazy" width="519" height="253"><figcaption><i><em class="italic" style="white-space: pre-wrap;">SkipScan: An index scan that “skips” across an index looking for the next greater value</em></i></figcaption></figure><p>With SkipScan in Timescale/PostgreSQL, query planning and execution can now utilize a new node (displayed as <code>(SkipScan)</code> in the <code>EXPLAIN</code> output) to quickly return distinct items from a properly ordered index. 
</p><p>Rather than scanning the entire index with an Index Only Scan, SkipScan incrementally searches for each successive item in the ordered index. As it locates one item, the <code>(SkipScan)</code> node quickly restarts the search for the next item. This is a <em>much</em> more efficient way of finding distinct items in an ordered index. (<a href="https://github.com/timescale/timescaledb/blob/master/tsl/src/nodes/skip_scan/exec.c">See GitHub for more details.</a>)</p><h2 id="benchmarking-timescaledb-skipscan-vs-a-normal-postgresql-index-scan">Benchmarking TimescaleDB SkipScan vs. a Normal PostgreSQL Index Scan</h2><p>In every example query, <strong>Timescale with SkipScan improved query response times by at least 26x</strong>. </p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">✨</div><div class="kg-callout-text">If you don't want to go through the entire benchmark, here's a short and sweet piece on <a href="https://www.timescale.com/blog/skip-scan-under-load/" rel="noreferrer">SkipScan's performance under load</a>.</div></div><p>But the real surprise is <strong>how much of a difference it makes at lower cardinalities with lots of data—</strong>it is <strong>almost 8,500x faster to retrieve <em>all columns</em> for the most recent reading of each device</strong>. 
That's fast!</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/01/mandalorian-ships.gif" class="kg-image" alt="Mandolorian Razor Crest being chased by X-wing fighters" loading="lazy" width="1280" height="720" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/01/mandalorian-ships.gif 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/01/mandalorian-ships.gif 1000w, https://timescale.ghost.io/blog/content/images/2022/01/mandalorian-ships.gif 1280w" sizes="(min-width: 720px) 720px"></figure><p>In our tests, <strong>SkipScan is also consistently faster—by 80x or more—in our 4,000 device benchmarks</strong>. (This level of cardinality is typical for many users of Timescale.)</p><p>Before we share the full results, here is how our benchmark was set up.</p><h3 id="benchmark-setup">Benchmark setup</h3><p>To perform our benchmarks, we installed Timescale on a DigitalOcean Droplet using the following specifications. PostgreSQL and Timescale were installed from packages, and we applied the recommended tuning from <a href="https://github.com/timescale/timescaledb-tune"><code>timescaledb-tune</code></a>.</p><ul><li>8 Intel vCPUs</li><li>16&nbsp;GB of RAM</li><li>320&nbsp;GB NVMe SSD</li><li>Ubuntu 20.04 LTS</li><li>Postgres 12.6</li><li>TimescaleDB 2.2 <em>(The first release with SkipScan. TimescaleDB 2.2.1 primarily adds distributed hypertable support and some bug fixes.)</em></li></ul><p>To demonstrate the performance impact of SkipScan on varying degrees of cardinality, we benchmarked three separate datasets of varying sizes. To generate our datasets, we used the 'cpu-only' use case in the <a href="https://github.com/timescale/tsbs">Time Series Benchmark Suite (TSBS)</a>, which creates 10 metrics every 10 seconds for each device (identified by the <code>tag_id</code> in our benchmark queries).</p><table>
<thead>
<tr>
<th>Dataset 1</th>
<th>Dataset 2</th>
<th>Dataset 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>100 devices</td>
<td>4000 devices</td>
<td>10,000 devices</td>
</tr>
<tr>
<td>4 months of data</td>
<td>4 days of data</td>
<td>36 hours of data</td>
</tr>
<tr>
<td>~103,000,000 rows</td>
<td>~103,000,000 rows</td>
<td>~144,000,000 rows</td>
</tr>
</tbody>
</table>
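<p>If you want to reproduce a similar setup, a hypertable of this shape could be created like so (a sketch with illustrative columns, not the exact TSBS schema):</p><pre><code class="language-sql">CREATE TABLE cpu (
  "time"     timestamptz NOT NULL,
  tags_id    integer     NOT NULL,
  usage_user double precision
);
SELECT create_hypertable('cpu', 'time');

-- create_hypertable adds the ("time" DESC) index by default;
-- the composite index matches the DISTINCT ON query shape.
CREATE INDEX ON cpu (tags_id, "time" DESC);</code></pre>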
<h3 id="additional-data-preparation">Additional data preparation</h3><p>Not all device data is up-to-date in real life because devices go offline and internet connections get interrupted. Therefore, to simulate a more realistic scenario (i.e., that some devices had stopped reporting for a period of time), we deleted rows for random devices over each of the following periods.</p><table>
<thead>
<tr>
<th>Dataset 1</th>
<th>Dataset 2</th>
<th>Dataset 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>5 random devices over:</td>
<td>100 random devices over:</td>
<td>250 random devices over:</td>
</tr>
<tr>
<td>30 minutes</td>
<td>1 hour</td>
<td>10 minutes</td>
</tr>
<tr>
<td>36 hours</td>
<td>12 hours</td>
<td>1 hour</td>
</tr>
<tr>
<td>7 days</td>
<td>36 hours</td>
<td>12 hours</td>
</tr>
<tr>
<td>1 month</td>
<td>3 days</td>
<td>24 hours</td>
</tr>
</tbody>
</table>
<p>To delete the data, we utilized the <code>TABLESAMPLE</code> clause of Postgres. This <code>SELECT</code> feature allows you to return a random sample of rows from a table based on a percentage of the total rows. In the example below, we randomly sample 10% of the rows (<code>bernoulli(10)</code>) and then take the first 10 (<code>LIMIT 10</code>).</p><pre><code class="language-sql">DELETE FROM cpu
WHERE tags_id IN 
  (SELECT id FROM tags tablesample bernoulli(10) LIMIT 10)
  AND time &gt;= now() - INTERVAL '30 minutes';</code></pre><p>From there, we ran each benchmarking query multiple times to account for caching, with and without SkipScan enabled.</p><p>As mentioned earlier, the following two indexes were present on the hypertable for all queries.</p><pre><code class="language-sql">"cpu_tags_id_time_idx" btree (tags_id, "time" DESC)
"cpu_time_idx" btree ("time" DESC)</code></pre><h3 id="benchmark-results">Benchmark results</h3><p>Here are the results:</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2021/05/Skip-Scan-vs-Normal.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="2183" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2021/05/Skip-Scan-vs-Normal.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2021/05/Skip-Scan-vs-Normal.jpg 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2021/05/Skip-Scan-vs-Normal.jpg 1600w, https://timescale.ghost.io/blog/content/images/2021/05/Skip-Scan-vs-Normal.jpg 2000w" sizes="(min-width: 720px) 720px"><figcaption><i><em class="italic" style="white-space: pre-wrap;">TimescaleDB with SkipScan improved the query response by at least 26x, up to 8500x in some cases.</em></i></figcaption></figure><h2 id="about-the-queries-benchmarked">About the Queries Benchmarked</h2><p>For this test, we benchmarked five types of common queries:</p><h3 id="scenario-1-what-was-the-last-reported-time-of-each-device-in-a-paged-list">Scenario #1: What was the last reported time of each device in a paged list?</h3><pre><code class="language-sql">SELECT DISTINCT ON (tags_id) tags_id, time FROM cpu
ORDER BY tags_id, time DESC
LIMIT 10 OFFSET 50;</code></pre><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/01/SkipScan---Scenario-1.jpg" class="kg-image" alt="" loading="lazy" width="1095" height="236" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/01/SkipScan---Scenario-1.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/01/SkipScan---Scenario-1.jpg 1000w, https://timescale.ghost.io/blog/content/images/2022/01/SkipScan---Scenario-1.jpg 1095w" sizes="(min-width: 720px) 720px"></figure><h3 id="scenario-2-what-was-the-time-and-most-recently-reported-set-of-values-for-each-device-in-a-paged-list">Scenario #2: What was the time and most recently reported set of values for each device in a paged list?</h3><pre><code class="language-sql">SELECT DISTINCT ON (tags_id) * FROM cpu
ORDER BY tags_id, time DESC
LIMIT 10 OFFSET 50;</code></pre><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/01/SkipScan---Scenario-2.jpg" class="kg-image" alt="" loading="lazy" width="1095" height="236" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/01/SkipScan---Scenario-2.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/01/SkipScan---Scenario-2.jpg 1000w, https://timescale.ghost.io/blog/content/images/2022/01/SkipScan---Scenario-2.jpg 1095w" sizes="(min-width: 720px) 720px"></figure><h3 id="scenario-3-what-is-the-most-recent-point-for-all-reporting-devices-in-the-last-5-minutes">Scenario #3: What is the most recent point for all reporting devices in the last 5 minutes?</h3><pre><code class="language-sql">SELECT DISTINCT ON (tags_id) * FROM cpu 
WHERE time &gt;= now() - INTERVAL '5 minutes' 
ORDER BY tags_id, time DESC;</code></pre><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/01/SkipScan---Scenario-3.jpg" class="kg-image" alt="" loading="lazy" width="1095" height="236" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/01/SkipScan---Scenario-3.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/01/SkipScan---Scenario-3.jpg 1000w, https://timescale.ghost.io/blog/content/images/2022/01/SkipScan---Scenario-3.jpg 1095w" sizes="(min-width: 720px) 720px"></figure><h3 id="scenario-4-which-devices-reported-at-some-time-today-but-not-within-the-last-hour">Scenario #4: Which devices reported at some time today but not within the last hour?</h3><pre><code class="language-sql">WITH older AS (
  SELECT DISTINCT ON (tags_id) tags_id FROM cpu 
  WHERE time &gt; now() - INTERVAL '24 hours'
)                                          
SELECT * FROM older o 
WHERE NOT EXISTS (
  SELECT 1 FROM cpu 
  WHERE cpu.tags_id = o.tags_id 
  AND time &gt; now() - INTERVAL '1 hour'
);</code></pre><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2021/05/SkipScan---Scenario-4.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="431" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2021/05/SkipScan---Scenario-4.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2021/05/SkipScan---Scenario-4.jpg 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2021/05/SkipScan---Scenario-4.jpg 1600w, https://timescale.ghost.io/blog/content/images/2021/05/SkipScan---Scenario-4.jpg 2000w" sizes="(min-width: 720px) 720px"></figure><h3 id="scenario-5-which-devices-reported-yesterday-but-not-in-the-last-24-hours">Scenario #5: Which devices reported yesterday but not in the last 24 hours?</h3><pre><code class="language-sql">WITH older AS (
  SELECT DISTINCT ON (tags_id) tags_id FROM cpu 
  WHERE time &gt; now() - INTERVAL '48 hours'
  AND time &lt; now() - INTERVAL '24 hours'
)                                          
SELECT * FROM older o 
WHERE NOT EXISTS (
  SELECT 1 FROM cpu 
  WHERE cpu.tags_id = o.tags_id 
  AND time &gt; now() - INTERVAL '24 hours'
);</code></pre><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2021/05/SkipScan---Scenario-5.jpg" class="kg-image" alt="" loading="lazy" width="2000" height="431" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2021/05/SkipScan---Scenario-5.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2021/05/SkipScan---Scenario-5.jpg 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2021/05/SkipScan---Scenario-5.jpg 1600w, https://timescale.ghost.io/blog/content/images/2021/05/SkipScan---Scenario-5.jpg 2000w" sizes="(min-width: 720px) 720px"></figure><h2 id="how-will-your-application-improve">How Will Your Application Improve?</h2><p>But SkipScan isn’t a theoretical improvement reserved for benchmarking blog posts 😉—it has real-world implications, and many applications we use rely on getting this data as fast as possible.</p><p>Think about the applications you use (or develop) every day. Do they retrieve paged lists of unique items from database tables to fill dropdown options (or grids of data)?</p><p>At a few thousand items, the query latency might not be very noticeable. But, as your data grows and you have millions of rows of data and tens of thousands of distinct items, that dropdown menu might take seconds—or minutes—to populate. 
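</p><p>As a concrete sketch (with a hypothetical <code>readings</code> table and <code>device_label</code> column — illustrative names, not from the benchmark), such a dropdown query is just a paged <code>DISTINCT</code>:</p><pre><code class="language-sql">-- Hypothetical schema: fetch page 6 of the distinct device labels, 10 per page.
SELECT DISTINCT ON (device_label) device_label FROM readings
ORDER BY device_label
LIMIT 10 OFFSET 50;</code></pre><p>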
</p><p>SkipScan can reduce that to tens of <em>milliseconds</em>!</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2022/01/tenor.gif" class="kg-image" alt="" loading="lazy" width="498" height="287"><figcaption><span style="white-space: pre-wrap;">Baby Yoda</span></figcaption></figure><p>Even better, SkipScan also provides a fast, efficient way of answering the question that so many people with time-series data ask every day:</p><p><em>"What was the last time and value recorded for each of my [devices / users / services / crypto and stock investments / etc]?"</em></p><p>As long as there is an index on "device_id" and "time" descending, SkipScan makes queries like the following much more efficient:</p><pre><code class="language-sql">SELECT DISTINCT ON (device_id) * FROM cpu 
ORDER BY device_id, time DESC;</code></pre><p>With SkipScan, your application and dashboards that rely on these types of queries will now load a whole lot faster 🚀  (see below).</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2021/05/4k_with_skipscan.gif" class="kg-image" alt="" loading="lazy" width="658" height="527" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2021/05/4k_with_skipscan.gif 600w, https://timescale.ghost.io/blog/content/images/2021/05/4k_with_skipscan.gif 658w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">TimescaleDB 2.2 </em></i><i><b><strong class="italic" style="white-space: pre-wrap;">with</strong></b></i><i><em class="italic" style="white-space: pre-wrap;"> SkipScan enabled runs in less than 400&nbsp;ms</em></i></figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2021/05/4k_without_skipscan.gif" class="kg-image" alt="" loading="lazy" width="658" height="527" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2021/05/4k_without_skipscan.gif 600w, https://timescale.ghost.io/blog/content/images/2021/05/4k_without_skipscan.gif 658w"><figcaption><i><em class="italic" style="white-space: pre-wrap;">TimescaleDB 2.2</em></i><i><b><strong class="italic" style="white-space: pre-wrap;"> without</strong></b></i><i><em class="italic" style="white-space: pre-wrap;"> SkipScan enabled runs in 23 seconds</em></i></figcaption></figure><h2 id="how-to-use-skipscan-on-timescale">How to Use SkipScan on Timescale</h2><p>How do you get started? Upgrade to TimescaleDB 2.2.1 and set up your schema and indexing as described below. 
You should start to see immediate speed improvements in many of your <code>DISTINCT</code> queries.</p><p><strong>To ensure that a <code>Custom Scan (SkipScan)</code> node can be chosen for your query plan:</strong></p><p><strong>First, the query must use the <code>DISTINCT</code> keyword on a single column</strong>. The benchmarking queries above will give you some examples to draw from.</p><p><strong>Second, there must be an index that contains the <code>DISTINCT</code> column first, and any other <code>ORDER BY</code> columns.</strong> Specifically:</p><ul><li>The index needs to be a <code>BTREE</code> index.</li><li>The index needs to match the <code>ORDER BY</code> in your query.</li><li>The <code>DISTINCT</code> column must either be the first column of the index, or any leading column(s) must be used as constraints in your query.</li></ul><p>In practice, this means that if we use the questions from the beginning of this blog post ("retrieve a list of unique IDs in order" and "retrieve the last reading of each ID"), we would need at least one index like this (but if you're using a TimescaleDB hypertable, this likely already exists):</p><pre><code class="language-sql"> "cpu_tags_id_time_idx" btree (tags_id, "time" DESC)</code></pre><p>With that index in place, you should start to see immediate benefit if your queries look similar to the benchmarking examples above. When SkipScan is chosen for your query, the <code>EXPLAIN ANALYZE</code> output will show one or more <code>Custom Scan (SkipScan)</code> nodes similar to this:</p><pre><code>-&gt;  Unique
      -&gt;  Merge Append
            Sort Key: _hyper_8_79_chunk.tags_id, _hyper_8_79_chunk."time" DESC
            -&gt;  Custom Scan (SkipScan) on _hyper_8_79_chunk
                  -&gt;  Index Only Scan using _hyper_8_79_chunk_cpu_tags_id_time_idx on _hyper_8_79_chunk
                        Index Cond: (tags_id &gt; NULL::integer)
            -&gt;  Custom Scan (SkipScan) on _hyper_8_80_chunk
                  -&gt;  Index Only Scan using _hyper_8_80_chunk_cpu_tags_id_time_idx on _hyper_8_80_chunk
                        Index Cond: (tags_id &gt; NULL::integer)
...</code></pre><h2 id="learn-more-and-get-started">Learn More and Get Started</h2><p>If you’re new to Timescale, <a href="https://console.cloud.timescale.com/signup">create a free account</a> to get started with a fully managed TimescaleDB instance (100&nbsp;% free for 30 days).</p><p>If you are an existing user:</p><ul><li><strong>Timescale: </strong>TimescaleDB 2.2.1 is now the default for all new services on Timescale, and any of your existing services will be automatically upgraded during your next maintenance window.</li><li><strong>Self-managed TimescaleDB</strong>: <a href="https://docs.timescale.com/latest/update-timescaledb">Here are the upgrade instructions</a>. </li></ul><p>Join our <a href="https://slack.timescale.com">Slack Community</a> to share your results, ask questions, get advice, and connect with other developers (I, as well as our co-founders, engineers, and passionate community members, are active on all channels).</p><p>You can also <a href="https://github.com/timescale/timescaledb">visit our GitHub</a> to learn more (and, as always, ⭐️ are appreciated!)<br></p><p></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Achieving the Best of Both Worlds: Ensuring Up-To-Date Results With Real-Time Aggregation]]></title>
            <description><![CDATA[Real-time aggregates (released with TimescaleDB 1.7) build on continuous aggregates' ability to increase query speed and optimize storage. Learn what's new, details about how they work, and how to get started. ]]></description>
            <link>https://www.tigerdata.com/blog/achieving-the-best-of-both-worlds-ensuring-up-to-date-results-with-real-time-aggregation</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/achieving-the-best-of-both-worlds-ensuring-up-to-date-results-with-real-time-aggregation</guid>
            <category><![CDATA[Product & Engineering]]></category>
            <category><![CDATA[Engineering]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Thu, 07 May 2020 15:11:33 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2020/05/mana5280-dkeOcAkors4-unsplash.jpg">
            </media:content>
            <content:encoded><![CDATA[<p>Real-time aggregates (released with TimescaleDB 1.7) build on continuous aggregates' ability to increase query speed and optimize storage. Learn what's new, details about how they work, and how to get started. </p><p>One constant across all time-series use cases is data: metrics, logs, events, sensor readings; IT and application performance monitoring, SaaS applications, IoT, martech, fintech, and more.  Lots (and lots) of data. What’s more, it typically arrives <em>continuously.</em></p><p>This need to handle large volumes of constantly generated data motivated some of our earliest TimescaleDB architectural decisions, such as its use of automated time-based partitioning and local-only indexing to achieve high insert rates.  And last year, we added type-specific <a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noreferrer">columnar</a> compression to significantly shrink the overhead involved in storing all of this data (often by 90% or higher – <a href="https://timescale.ghost.io/blog/blog/building-columnar-compression-in-a-row-oriented-database/?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=1-6-release-blog">see our technical description and benchmarking results</a>). </p><p>And another key capability in TimescaleDB, which is the focus of this post, has been <em>continuous aggregates</em>, which we first <a href="https://timescale.ghost.io/blog/blog/continuous-aggregates-faster-queries-with-automatically-maintained-materialized-views/?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=continuous-aggs-1-3-blog">introduced</a> in TimescaleDB 1.3.  Continuous aggregates allow one to specify a SQL query that continually processes raw data into a so-called materialized table.  
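</p><p>As a minimal sketch (against a hypothetical <code>conditions</code> hypertable; the full, runnable <code>cpu</code> example appears later in this post), defining a continuous aggregate in TimescaleDB 1.x looks like this:</p><pre><code class="language-SQL">-- Hypothetical hypertable conditions(time, device_id, temperature):
-- continually maintain an hourly average per device in the background.
CREATE VIEW conditions_1h
   WITH (timescaledb.continuous)
   AS
      SELECT time_bucket('1 hour', time) AS hour,
             device_id,
             avg(temperature) AS avg_temp
      FROM conditions
      GROUP BY hour, device_id;</code></pre><p>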
</p><p>Continuous aggregates are somewhat similar to materialized views in databases, but unlike a materialized view (as in <a href="https://www.postgresql.org/docs/current/rules-materializedviews.html">PostgreSQL</a>), continuous aggregates do not need to be refreshed manually; the view will be refreshed automatically in the background as new data is added, or old data is modified. Additionally, TimescaleDB does not need to re-calculate all of the data on every refresh. Only new and/or invalidated data will be calculated. And since this re-aggregation is automatic – it executes as a background job at regular intervals – this process doesn’t add any maintenance burden to your database.</p><p>This is where most database or streaming systems that offer continuous aggregates or continuous queries give up.  We knew we could do better.</p><p>Enter Real-Time Aggregation, introduced in TimescaleDB 1.7 (<a href="https://timescale.ghost.io/blog/blog/timescaledb-1-7-fast-continuous-aggregates-with-real-time-views-postgresql-12-support-and-more-community-features/?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=1-7-release-announcement-blog">see our release blog</a>).</p><h2 id="quick-background-on-continuous-aggregates">Quick Background on Continuous Aggregates</h2><p>The benefits of continuous aggregates are twofold:</p><ul><li><strong>Query performance.</strong>  By executing queries against pre-calculated results, rather than the underlying raw data,  continuous aggregates can significantly improve query performance.</li><li><strong>Storage savings with </strong><a href="https://docs.timescale.com/latest/using-timescaledb/continuous-aggregates?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=continuous-aggs-drop-data-docs#dropping-data"><strong>downsampling</strong></a><strong><em>. 
</em></strong> Continuous aggregates are often combined with data retention policies for better storage management.  Raw data can be continually aggregated into a materialized table, and dropped after it reaches a certain age.  So the database may only store some fixed period of raw data (say, one week), yet store aggregate data for much longer.</li></ul><p>Consider the following example, collecting system metrics around CPU usage and storing them in a CPU metrics <a href="https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered" rel="noreferrer">hypertable</a>, where each row includes a timestamp, hostname, and three usage metrics (usage_user, usage_system, usage_iowait).  </p><p>We collect these statistics every second per server.</p><pre><code>            time              | hostname |     usage_user     |    usage_system     |    usage_iowait
-------------------------------+----------+--------------------+---------------------+---------------------
2020-05-06 02:32:34.627143+00 | host0    | 0.5378765249290502 |  0.2958572490961302 | 0.10685818344495246
2020-05-06 02:32:34.627143+00 | host1    | 0.3175958910709298 |  0.7874926624954846 | 0.16615243032654803
2020-05-06 02:32:34.627143+00 | host2    | 0.4788377981501064 | 0.18277343256546175 |  0.7183967491020162</code></pre><p>So a query that wants to compute the per-hourly histogram of usage consumption over the course of 7 days for 10 servers will process 10 servers * 60 seconds * 60 minutes * 24 hours * 7 days = 6,048,000 rows of data.</p><p>On the other hand, if we pre-compute a histogram per hour, then the same query on the continuous aggregate table will only need to process 10 servers * 24 hours * 7 days = 1680 rows of data.</p><p>But pre-computed results in the continuous aggregate view will lag behind the latest data, as the materialization only runs at scheduled intervals.  So, both to more cheaply handle out-of-order data and to avoid excessive load, there is typically some <em>refresh lag </em>between the raw data and when it’s materialized.  In fact, this refresh lag is configurable in TimescaleDB, such that the continuous aggregation engine will not materialize data that’s newer than the refresh lag.  </p><p>(Slightly more specifically, if we compute aggregations across some <a href="https://docs.timescale.com/latest/using-timescaledb/reading-data?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=time-bucket-docs#time-bucket">time bucket</a>, such as hourly, then each hourly interval has a start time and end time.  TimescaleDB will only materialize data when its corresponding aggregation interval’s <em>end time</em> is older than the refresh lag. So, if we are doing hourly rollups with a 30-minute refresh lag, then we’d only perform the materialized aggregation for, say, 2:00am - 3:00am <em>after</em> 3:30am.)</p><p>So, on one hand, using a continuous aggregate view has cut down the amount of data we process at query time by 3600x (i.e., from more than 6 million rows to fewer than 2000).  
But, in this view, we’re often missing the last hour or so of data.</p><p>While you could just make the refresh lag smaller and smaller to work around this problem, it comes at the cost of higher and higher load; unless these aggregates are recomputed on <em>every</em> new insert (expensive!), they’re fundamentally always stale.</p><h2 id="introducing-real-time-aggregation">Introducing Real-Time Aggregation</h2><p>With real-time aggregation, when you query a continuous aggregate view, rather than just getting the pre-computed aggregate from the materialized table, the query will transparently combine this pre-computed aggregate with raw data from the hypertable that’s yet to be materialized.  And, by combining raw and materialized data in this way, you get accurate and up-to-date results, while still enjoying the speedups that come from pre-computing a large portion of the result.</p><p>Let’s return to the example above.  Recall that when we created hourly rollups, we set the refresh lag to 30 minutes, so our continuous aggregate view will lag behind by 30-90 minutes.</p><p>But, when querying a view that supports real-time aggregation, the same query as before for hourly data across the past week will process and combine the results from two tables:</p><ul><li>Materialized table: 10 servers * (22 hours + 24 hours * 6 days) = 1660 rows</li><li>Raw data: 10 servers * 60 seconds * 90 minutes = 54,000 rows  </li></ul><p>So now, with these “back of the envelope” calculations, we’ve processed a total of 55,660 rows, still well below the 6 million from before. 
Moreover, the last 90 minutes of data are more likely to already be memory resident for even better performance, given the database page caching already happening for recent data.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2020/09/image.png" class="kg-image" alt="Diagram showing how data moves to a materialized table as it ages and continuous aggregate queries execute, and how real-time aggregates combine this data with newer, not yet materialized data" loading="lazy" width="1500" height="1154" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2020/09/image.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2020/09/image.png 1000w, https://timescale.ghost.io/blog/content/images/2020/09/image.png 1500w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Real-time aggregates allow you to query your pre-calculated data </span><b><strong style="white-space: pre-wrap;">and</strong></b><span style="white-space: pre-wrap;"> newer, not yet materialized "raw" data</span></figcaption></figure><p>The above illustration shows this in practice. The database internally maintains a <strong><em>completion threshold</em></strong> as metadata, which records the point-in-time to which all previous records from the raw table have been materialized.  This completion threshold lags behind the <em>refresh lag </em>we discussed earlier, and gets updated by the database engine whenever a background task updates the materialized view.</p><p><em>(In fact, it’s a bit more complicated given TimescaleDB’s ability to handle late data that gets written after some time region has already been materialized, i.e., behind the completion threshold.  
But we’re going to ignore how TimescaleDB tracks invalidation regions in this post.)</em></p><p>So now when processing our query covering the last 7 days, the database engine will conceptually take a UNION ALL between results from the materialized table starting at <code>now() - interval '7 days'</code> up to the completion threshold, with results from the raw table from the completion threshold up to <code>now()</code>.</p><p>But rather than just describe this behavior, let’s walk through a concrete example and compare our query times without continuous aggregates, with vanilla continuous aggregates, and with real-time aggregation enabled.</p><p>These capabilities were developed by Timescale engineers: <a href="https://github.com/svenklemm"><em>Sven Klemm</em></a><em>, </em><a href="https://github.com/cevian"><em>Matvey Arye</em></a><em>, </em><a href="https://github.com/gayyappan"><em>Gayathri Ayyapan</em></a><em>, </em><a href="https://github.com/davidkohn88"><em>David Kohn</em></a>, and <a href="https://github.com/JLockerman"><em>Josh Lockerman</em></a>.</p><h2 id="testing-real-time-aggregation">Testing Real-Time Aggregation</h2><p>In the following, I’ve created a TimescaleDB 1.7 instance via <a href="https://www.timescale.com/products">Managed Service for TimescaleDB</a> (specifically, a “basic-100-compute-optimized” instance with PostgreSQL 12, 4 vCPU, and 100GB SSD storage), and then created the following hypertable:</p><pre><code class="language-SQL">$ psql postgres://tsdbadmin@tsdb-bb8e760-internal-90d0.a.timescaledb.io:26479/defaultdb?sslmode=require

=&gt; CREATE TABLE cpu (
      time TIMESTAMPTZ,
      hostname TEXT,
      usage_user FLOAT,
      usage_system FLOAT,
      usage_iowait FLOAT
   );

=&gt; SELECT create_hypertable ('cpu', 'time', 
      chunk_time_interval =&gt; interval '1d');</code></pre><p>I’m now going to load the hypertable with 14 days of synthetic data (which is created with the following INSERT statement):</p><pre><code class="language-SQL">=&gt; INSERT INTO cpu (
   SELECT time, hostname, random(), random(), random()
      FROM generate_series(NOW() - interval '14d', NOW(), '1s') AS time
      CROSS JOIN LATERAL (
         SELECT 'host' || host_id::text AS hostname 
            FROM generate_series(0,9) AS host_id
      ) h
   );</code></pre><p>Okay, so that inserted 12,096,010 rows of synthetic data into our hypertable of the following format, stretching from 2:32am UTC on April 22 to 2:32am UTC on May 6:</p><pre><code class="language-SQL">=&gt; SELECT * FROM cpu ORDER BY time DESC LIMIT 3;

             time              | hostname |     usage_user     |    usage_system     |    usage_iowait     
-------------------------------+----------+--------------------+---------------------+---------------------
 2020-05-06 02:32:34.627143+00 | host0    | 0.5378765249290502 |  0.2958572490961302 | 0.10685818344495246
 2020-05-06 02:32:34.627143+00 | host1    | 0.3175958910709298 |  0.7874926624954846 | 0.16615243032654803
 2020-05-06 02:32:34.627143+00 | host2    | 0.4788377981501064 | 0.18277343256546175 |  0.7183967491020162


=&gt; SELECT min(time) AS start, max(time) AS end FROM cpu;

-[ RECORD 1 ]------------------------
start | 2020-04-22 02:32:34.627143+00
end   | 2020-05-06 02:32:34.627143+00</code></pre><p>Let’s now create a continuous aggregate view on this table with hourly <a href="https://docs.timescale.com/latest/api?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=api-docs-histograms#histogram">histograms</a>: </p><pre><code class="language-SQL">=&gt; CREATE VIEW cpu_1h 
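   -- refresh_lag = '30m': only materialize a bucket once its end time is
   -- older than now() - 30 minutes; refresh_interval = '30m': run the
   -- background materialization job every 30 minutes.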
   WITH (timescaledb.continuous, 
         timescaledb.refresh_lag = '30m',
         timescaledb.refresh_interval = '30m')
   AS
      SELECT 
         time_bucket('1 hour', time) AS hour,
         hostname, 
         histogram(usage_user, 0.0, 1.0, 5) AS hist_usage_user,
         histogram(usage_system, 0.0, 1.0, 5) AS hist_usage_system,
         histogram(usage_iowait, 0.0, 1.0, 5) AS hist_usage_iowait
      FROM cpu
      GROUP BY hour, hostname;</code></pre><p>By default, queries to this view use these real-time aggregation features.  If you want to disable real-time aggregation, set <code>materialized_only = true</code> when creating the view or by later ALTERing the view.  (See <a href="https://docs.timescale.com/latest/api?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=continuous-aggs-create-view-docs#continuous_aggregate-create_view">API docs here</a>.)</p><p>Now, the job scheduling framework will start to asynchronously process this view, which we can see in our <a href="https://docs.timescale.com/latest/api?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=continuous-aggs-stats-docs#timescaledb_information-continuous_aggregate_stats">informational view</a>.  (You can also <a href="https://docs.timescale.com/latest/api?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=continuous-aggs-refresh-view-docs#continuous_aggregate-refresh_view">manually force</a> the materialization to occur if needed.)  <br></p><pre><code class="language-SQL">=&gt; SELECT * FROM timescaledb_information.continuous_aggregate_stats;

-[ RECORD 1 ]
view_name              | cpu_1h
completed_threshold    | 2020-05-06 02:00:00+00
invalidation_threshold | 2020-05-06 02:00:00+00
job_id                 | 1000
last_run_started_at    | 2020-05-06 02:34:08.300524+00
last_successful_finish | 2020-05-06 02:34:09.04923+00
last_run_status        | Success
job_status             | Scheduled
last_run_duration      | 00:00:00.748706
next_scheduled_run     | 2020-05-06 03:04:09.04923+00
total_runs             | 17
total_successes        | 17
total_failures         | 0
total_crashes          | 0
</code></pre><p>From this data, we see that the materialized view includes data up to 2:00am on May 6, while from above we’ve learned that the raw data goes up to 2:32am. </p><p>Let’s try our query directly on the raw table, and use an EXPLAIN ANALYZE to both show the database plan, as well as actually execute the query and collect timing information.  (Note that in many use cases, one would offset queries from <code>now() - &lt;some interval&gt;</code>. But to ensure that we use identical datasets in our subsequent analysis, we explicitly select the interval offset from the dataset’s last timestamp.)</p><pre><code class="language-SQL">=&gt; EXPLAIN (ANALYZE, COSTS OFF)
   SELECT 
      time_bucket('1 hour', time) AS hour,
      hostname, 
      histogram(usage_user, 0.0, 1.0, 5) AS hist_usage_user,
      histogram(usage_system, 0.0, 1.0, 5) AS hist_usage_system,
      histogram(usage_iowait, 0.0, 1.0, 5) AS hist_usage_iowait
   FROM cpu
   WHERE time &gt; '2020-05-06 02:32:34.627143+00'::timestamptz - interval '7 days'
   GROUP BY hour, hostname
   ORDER BY hour DESC;

QUERY PLAN             
----------------------------------------------------------------
 Finalize GroupAggregate (actual time=1859.306..1862.331 rows=1690 loops=1)
   Group Key: (time_bucket('01:00:00'::interval, cpu."time")), cpu.hostname
   -&gt;  Gather Merge (actual time=1841.735..1849.604 rows=1881 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         -&gt;  Sort (actual time=1194.162..1194.222 rows=627 loops=3)
               Sort Key: (time_bucket('01:00:00'::interval, cpu."time")) DESC, cpu.hostname
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: quicksort  Memory: 274kB
               Worker 1:  Sort Method: quicksort  Memory: 274kB
               -&gt;  Partial HashAggregate (actual time=1193.198..1193.594 rows=627 loops=3)
                     Group Key: time_bucket('01:00:00'::interval, cpu."time"), cpu.hostname
                     -&gt;  Parallel Custom Scan (ChunkAppend) on cpu (actual time=9.840..716.952 rows=2016000 loops=3)
                           Chunks excluded during startup: 7
                           -&gt;  Parallel Seq Scan on _hyper_1_14_chunk (actual time=14.751..199.098 rows=864000 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                           -&gt;  Parallel Seq Scan on _hyper_1_13_chunk (actual time=14.749..201.100 rows=864000 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                           -&gt;  Parallel Seq Scan on _hyper_1_12_chunk (actual time=0.025..182.591 rows=864000 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                           -&gt;  Parallel Seq Scan on _hyper_1_11_chunk (actual time=0.031..182.812 rows=864000 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                           -&gt;  Parallel Seq Scan on _hyper_1_10_chunk (actual time=0.035..183.918 rows=864000 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                           -&gt;  Parallel Seq Scan on _hyper_1_9_chunk (actual time=0.019..184.416 rows=864000 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                           -&gt;  Parallel Seq Scan on _hyper_1_8_chunk (actual time=0.823..91.605 rows=386225 loops=2)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                                 Rows Removed by Filter: 45775
                           -&gt;  Parallel Seq Scan on _hyper_1_15_chunk (actual time=0.022..20.277 rows=91550 loops=1)
                                 Filter: ("time" &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))

 Planning Time: 1.917 ms
 Execution Time: 1921.753 ms</code></pre><p>Note that TimescaleDB’s constraint exclusion excluded 7 of the chunks from being queried given the WHERE predicate (as the query was for the last 7 days of the 14-day dataset), then processed the query on the remaining 8 chunks (performing a scan over 6,048,000 rows) using two parallel workers. The query in total took just over 1.9 seconds.</p><p>Now let’s try the query on our materialized table, first turning off real-time aggregation just for this experiment:</p><pre><code class="language-SQL">=&gt; ALTER VIEW cpu_1h set (timescaledb.materialized_only = true);</code></pre><p>First, let’s look at the view definition, which runs a SELECT over the materialized hypertable with the specified GROUP BYs. But we also see that each of the histogram columns calls “finalize_agg.” TimescaleDB doesn’t pre-compute and store the exact answer specified in the query, but rather a <a href="https://www.postgresql.org/docs/current/xaggr.html#XAGGR-PARTIAL-AGGREGATES">partial aggregate</a> that is then “finalized” at query time, a design that allows for greater parallelization and, in a future release, rebucketing at query time.</p><pre><code class="language-SQL"> \d+ cpu_1h;

                                          View "public.cpu_1h"
      Column       |           Type           | Collation | Nullable | Default | Storage  | Description 
-------------------+--------------------------+-----------+----------+---------+----------+-------------
 hour              | timestamp with time zone |           |          |         | plain    | 
 hostname          | text                     |           |          |         | extended | 
 hist_usage_user   | integer[]                |           |          |         | extended | 
 hist_usage_system | integer[]                |           |          |         | extended | 
 hist_usage_iowait | integer[]                |           |          |         | extended | 

View definition:
 SELECT _materialized_hypertable_2.hour,
    _materialized_hypertable_2.hostname,
    _timescaledb_internal.finalize_agg('histogram(double precision,double precision,double precision,integer)'::text, NULL::name, NULL::name, '{{pg_catalog,float8},{pg_catalog,float8},{pg_catalog,float8},{pg_catalog,int4}}'::name[], _materialized_hypertable_2.agg_3_3, NULL::integer[]) AS hist_usage_user,
    _timescaledb_internal.finalize_agg(...) AS hist_usage_system,
    _timescaledb_internal.finalize_agg(...) AS hist_usage_iowait
   FROM _timescaledb_internal._materialized_hypertable_2
  GROUP BY _materialized_hypertable_2.hour, _materialized_hypertable_2.hostname;</code></pre><p>Now let’s run the query with vanilla continuous aggregates enabled:</p><pre><code class="language-SQL">=&gt; EXPLAIN (ANALYZE, COSTS OFF)
   SELECT * FROM cpu_1h
   WHERE hour &gt; '2020-05-06 02:32:34.627143+00'::timestamptz - interval '7 days'
   ORDER BY hour DESC;

QUERY PLAN
----------------------------------------------------------------
 Sort (actual time=3.218..3.312 rows=1670 loops=1)
   Sort Key: _materialized_hypertable_2.hour DESC
   Sort Method: quicksort  Memory: 492kB
   -&gt;  HashAggregate (actual time=1.943..2.891 rows=1670 loops=1)
         Group Key: _materialized_hypertable_2.hour, _materialized_hypertable_2.hostname
         -&gt;  Custom Scan (ChunkAppend) on _materialized_hypertable_2 (actual time=0.064..0.688 rows=1670 loops=1)
               Chunks excluded during startup: 1
               -&gt;  Seq Scan on _hyper_2_17_chunk (actual time=0.063..0.590 rows=1670 loops=1)
                     Filter: (hour &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))
                     Rows Removed by Filter: 270

 Planning Time: 0.645 ms
 Execution Time: 3.461 ms</code></pre><p>Just 4 milliseconds, after a scan of 1,670 rows in the materialized hypertable.  And let’s look at the most recent 3 rows returned for a specific host:</p><pre><code class="language-SQL">=&gt; SELECT hour, hostname, hist_usage_user
    FROM cpu_1h
    WHERE hour &gt; '2020-05-06 02:32:34.627143+00'::timestamptz - interval '7 days'         
       AND hostname = 'host0'
    ORDER BY hour DESC LIMIT 3;

          hour          | hostname |      hist_usage_user      
------------------------+----------+---------------------------
 2020-05-06 01:00:00+00 | host0    | {0,781,676,712,719,712,0}
 2020-05-06 00:00:00+00 | host0    | {0,736,714,776,689,685,0}
 2020-05-05 23:00:00+00 | host0    | {0,714,759,715,692,720,0}</code></pre><p>Note that the most recent record is from the 1:00am - 2:00am hour.</p><p>Now let’s re-enable real-time aggregation and try the same query, first showing how the real-time aggregation is defined as a UNION ALL between the materialized and raw data.</p><pre><code class="language-SQL">=&gt; ALTER VIEW cpu_1h set (timescaledb.materialized_only = false);

=&gt; \d+ cpu_1h;

                                          View "public.cpu_1h"
      Column       |           Type           | Collation | Nullable | Default | Storage  | Description 
-------------------+--------------------------+-----------+----------+---------+----------+-------------
 hour              | timestamp with time zone |           |          |         | plain    | 
 hostname          | text                     |           |          |         | extended | 
 hist_usage_user   | integer[]                |           |          |         | extended | 
 hist_usage_system | integer[]                |           |          |         | extended | 
 hist_usage_iowait | integer[]                |           |          |         | extended | 

View definition:
 SELECT _materialized_hypertable_2.hour,
    _materialized_hypertable_2.hostname,
    _timescaledb_internal.finalize_agg(...) AS hist_usage_user,
    _timescaledb_internal.finalize_agg(...) AS hist_usage_system,
    _timescaledb_internal.finalize_agg(...) AS hist_usage_iowait
   FROM _timescaledb_internal._materialized_hypertable_2
  WHERE _materialized_hypertable_2.hour &lt; COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(1)), '-infinity'::timestamp with time zone)
  GROUP BY _materialized_hypertable_2.hour, _materialized_hypertable_2.hostname
UNION ALL
 SELECT time_bucket('01:00:00'::interval, cpu."time") AS hour,
    cpu.hostname,
    histogram(cpu.usage_user, 0.0::double precision, 1.0::double precision, 5) AS hist_usage_user,
    histogram(cpu.usage_system, 0.0::double precision, 1.0::double precision, 5) AS hist_usage_system,
    histogram(cpu.usage_iowait, 0.0::double precision, 1.0::double precision, 5) AS hist_usage_iowait
   FROM cpu
  WHERE cpu."time" &gt;= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(1)), '-infinity'::timestamp with time zone)
  GROUP BY (time_bucket('01:00:00'::interval, cpu."time")), cpu.hostname;
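-- cagg_watermark(1) above marks the materialization's completion threshold:
-- buckets strictly before the watermark are served from the materialized
-- hypertable, while buckets at or after it are aggregated from the raw
-- cpu hypertable at query time.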


=&gt; EXPLAIN (ANALYZE, COSTS OFF)
   SELECT * FROM cpu_1h
   WHERE hour &gt; '2020-05-06 02:32:34.627143+00'::timestamptz - interval '7 days'
   ORDER BY hour DESC;

QUERY PLAN               
----------------------------------------------------------------
 Sort (actual time=20.871..21.055 rows=1680 loops=1)
   Sort Key: _materialized_hypertable_2.hour DESC
   Sort Method: quicksort  Memory: 495kB
   -&gt;  Append (actual time=1.842..20.536 rows=1680 loops=1)
         -&gt;  HashAggregate (actual time=1.841..2.789 rows=1670 loops=1)
               Group Key: _materialized_hypertable_2.hour, _materialized_hypertable_2.hostname
               -&gt;  Custom Scan (ChunkAppend) on _materialized_hypertable_2 (actual time=0.105..0.580 rows=1670 loops=1)
                     Chunks excluded during startup: 1
                     -&gt;  Index Scan using _hyper_2_17_chunk__materialized_hypertable_2_hour_idx on _hyper_2_17_chunk (actual time=0.104..0.475 rows=1670 loops=1)
                           Index Cond: ((hour &lt; COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(1)), '-infinity'::timestamp with time zone)) AND (hour &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval)))
         -&gt;  HashAggregate (actual time=17.641..17.655 rows=10 loops=1)
               Group Key: time_bucket('01:00:00'::interval, cpu."time"), cpu.hostname
               -&gt;  Custom Scan (ChunkAppend) on cpu (actual time=0.165..12.297 rows=19550 loops=1)
                     Chunks excluded during startup: 14
                     -&gt;  Index Scan using _hyper_1_15_chunk_cpu_time_idx on _hyper_1_15_chunk (actual time=0.163..9.723 rows=19550 loops=1)
                           Index Cond: ("time" &gt;= COALESCE(_timescaledb_internal.to_timestamp(_timescaledb_internal.cagg_watermark(1)), '-infinity'::timestamp with time zone))
                           Filter: (time_bucket('01:00:00'::interval, "time") &gt; ('2020-05-06 02:32:34.627143+00'::timestamp with time zone - '7 days'::interval))

 Planning Time: 3.532 ms
 Execution Time: 22.905 ms
</code></pre><p>Still very fast at just over 26 milliseconds (scanning 1,670 materialized rows and 19,550 raw rows), and now the results:</p><pre><code class="language-SQL">=&gt; SELECT hour, hostname, hist_usage_user
   FROM cpu_1h
   WHERE hour &gt; '2020-05-06 02:32:34.627143+00'::timestamptz - interval '7 days'
      AND hostname = 'host0'
   ORDER BY hour DESC LIMIT 3;

          hour          | hostname |      hist_usage_user      
------------------------+----------+---------------------------
 2020-05-06 02:00:00+00 | host0    | {0,384,388,385,400,398,0}
 2020-05-06 01:00:00+00 | host0    | {0,781,676,712,719,712,0}
 2020-05-06 00:00:00+00 | host0    | {0,736,714,776,689,685,0}

</code></pre><p>Unlike when we were processing the materialized table without the real-time aggregation, we now have up-to-date results, including data from the 2:00 - 3:00am hour. This is because the materialized table didn’t have data from the last hour, while the real-time aggregation was able to compute that result from the raw data at query time. You can also notice that there is less data in the newest row (namely, each histogram bucket has about half the counts of the prior rows), as this row was the aggregation of 32 minutes of raw data, not a full hour.</p><p>You can also observe these two stages of real-time aggregation in the above query plan: the materialized hypertable is processed in the first section via <code>Custom Scan (ChunkAppend) on _materialized_hypertable_2</code>, while the underlying raw hypertable is processed in the second section via <code>Custom Scan (ChunkAppend) on cpu</code>, and each processes only data before or after the offset specified by the completion threshold (shown with <code>_timescaledb_internal.cagg_watermark(1)</code> in the plan).</p><p>So, in summary: a complete, up-to-date aggregate over the data, both at a fraction of the latency of querying the raw data, and avoiding the excessive overhead of schemes that update materializations through per-row or per-statement triggers.</p><table>
<thead>
<tr>
<th>Query Type</th>
<th>Latency</th>
<th>Freshness</th>
</tr>
</thead>
<tbody>
<tr>
<td>Raw Data</td>
<td>1924 ms</td>
<td>Up-to-date</td>
</tr>
<tr>
<td>Continuous Aggregates</td>
<td>4 ms</td>
<td>Lags up to 90 minutes</td>
</tr>
<tr>
<td>Real-Time Aggregation</td>
<td>26 ms</td>
<td>Up-to-date</td>
</tr>
</tbody>
</table>
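<p>For reference, the <code>cpu_1h</code> continuous aggregate used throughout this comparison can be defined along the following lines (a sketch in the TimescaleDB 1.x syntax; the bucket width and histogram parameters mirror the view definition shown above):</p><pre><code class="language-SQL">CREATE VIEW cpu_1h
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
       hostname,
       histogram(usage_user, 0.0, 1.0, 5) AS hist_usage_user,
       histogram(usage_system, 0.0, 1.0, 5) AS hist_usage_system,
       histogram(usage_iowait, 0.0, 1.0, 5) AS hist_usage_iowait
FROM cpu
GROUP BY time_bucket('1 hour', time), hostname;</code></pre>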
<p><strong>Continuous aggregates and real-time aggregation for the win!</strong></p><h2 id="conclusions">Conclusions</h2><p>What motivated us to build TimescaleDB is the firm belief that time-series use cases need a best-in-class, flexible time-series database, with advanced capabilities specifically designed for time-series workloads.  We developed real-time aggregation for time-series use cases such as devops monitoring, real-time analytics, and IoT, where fast queries over high-volume workloads and accurate, real-time results really matter. </p><p>Real-time aggregation joins a number of advanced capabilities in TimescaleDB around data lifecycle management and time-series analytics, including automated data retention, data reordering, native compression, downsampling, and traditional continuous aggregates.</p><p>And, <strong>there’s still much more to come</strong>. Keep an eye out for our much-anticipated TimescaleDB 2.0 release, which introduces horizontal scaling to TimescaleDB for terabyte to petabyte workloads.</p><h3 id="want-to-check-out-real-time-aggregation">Want to check out real-time aggregation?</h3><ul><li>Ready to dig in? Check out our <a href="https://docs.timescale.com/latest/using-timescaledb/continuous-aggregates/?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=continuous-aggregates-docs">docs</a>.</li><li>Brand new to TimescaleDB?  
Get started <a href="https://docs.timescale.com/latest/getting-started/?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=getting-started-docs">here</a>.</li></ul><p>If you have any questions along the way, we’re always available via our <a href="https://slack.timescale.com">community Slack</a> (we’re <a href="https://timescaledb.slack.com/archives/D011A62GNR0">@mike</a> and <a href="https://timescaledb.slack.com/archives/D0137UNE550">@sven </a>, come say hi 👋).</p><p>And, if you are interested in keeping up-to-date with future TimescaleDB releases, <a href="https://www.timescale.com/signup/release-notes/?utm_source=timescale-real-time-aggregates-details&amp;utm_medium=blog&amp;utm_campaign=1-7-release&amp;utm_content=release-notes-subscribe">sign up for our Release Notes</a>.  It’s low-traffic, we promise.</p><p>Until next time, keep it real!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OrderedAppend: An Optimization for Range Partitioning]]></title>
            <description><![CDATA[With this feature, we’ve seen up to 100x performance improvements for certain queries.]]></description>
            <link>https://www.tigerdata.com/blog/ordered-append-postgresql-optimization</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/ordered-append-postgresql-optimization</guid>
            <category><![CDATA[Product & Engineering]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Wed, 31 Jul 2019 17:00:07 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2019/07/blogorderedappend.jpg">
            </media:content>
            <content:encoded><![CDATA[<p>In our previous post on <a href="https://www.timescale.com/blog/implementing-constraint-exclusion-for-faster-query-performance/">implementing constraint exclusion</a>, we discussed how TimescaleDB leverages PostgreSQL’s foundation and expands on its capabilities to improve performance. Continuing with the same theme, in this post, we will discuss how we’ve added support for ordered appends, which optimize a wide range of queries, particularly those that are ordered by time.</p><p><strong>We’ve seen performance improvements up to 100x for certain queries</strong> after applying this feature, so we encourage you to keep reading!</p><h2 id="optimizing-appends-for-large-queries">Optimizing Appends for Large Queries</h2><p>PostgreSQL represents how plans should be executed using “nodes.” Various nodes may appear in an EXPLAIN output, but we want to focus specifically on Append nodes, which essentially combine the results from multiple sources into a single result.</p><p>PostgreSQL has two standard Append node types that you will commonly find in an EXPLAIN output:</p><ul><li><strong>Append:</strong> appends results of child nodes to return a unioned result</li><li><strong>MergeAppend:</strong> merges the output of child nodes by sort key; all child nodes must be sorted by that same sort key; accesses every chunk when used in TimescaleDB</li></ul><p>When MergeAppend nodes are used with TimescaleDB, every chunk must be accessed to determine whether it has keys that need to be merged, which is inefficient when only a few chunks could satisfy the query.</p><p>To address this issue, with the release of <a href="https://github.com/timescale/timescaledb/releases/tag/1.2.0">TimescaleDB 1.2</a>, we introduced <strong>OrderedAppend</strong> as an optimization for range partitioning. This feature optimizes a large range of queries, particularly those that are ordered by time and contain a LIMIT clause. 
</p><p>This optimization takes advantage of the fact that we know the range of time held in each chunk and can stop accessing chunks once we’ve found enough rows to satisfy the LIMIT clause. As mentioned above, this optimization can improve performance by up to 100x, depending on the query. </p><p>With the release of <a href="https://github.com/timescale/timescaledb/releases/tag/1.4.0">TimescaleDB 1.4</a>, we wanted to extend the cases in which OrderedAppend can be used. This meant making OrderedAppend space-partition aware and removing the LIMIT clause restriction. With these additions, more users can benefit from the performance benefits achieved through leveraging OrderedAppend.</p><h2 id="developing-query-plans-with-the-optimization">Developing Query Plans With the Optimization</h2><p>As an optimization for range partitioning, OrderedAppend eliminates sort steps because it is aware of the way data is partitioned. </p><p>Since each chunk has a known time range it covers to get sorted output, no global sort step is needed. Only local sort steps have to be completed and then appended in the correct order. If index scans are utilized, which return the output sorted, sorting can be completely avoided.</p><p><strong>For a query ordering by the time dimension with a LIMIT clause, you would normally get something like this: </strong></p><pre><code>dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
dev-# SELECT * FROM metrics ORDER BY time LIMIT 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (actual rows=1 loops=1)
   Buffers: shared hit=16
   -&gt;  Merge Append (actual rows=1 loops=1)
         Sort Key: metrics."time"
         Buffers: shared hit=16
         -&gt;  Index Scan using metrics_time_idx on metrics (actual rows=0 loops=1)
               Buffers: shared hit=1
         -&gt;  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         -&gt;  Index Scan using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         -&gt;  Index Scan using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         -&gt;  Index Scan using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         -&gt;  Index Scan using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
</code></pre><p>You can see that three pages are read from every chunk, plus an additional page from the parent table, which contains no actual rows.</p><p><strong>With this optimization enabled, you would get a plan looking like this:</strong></p><pre><code>dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
dev-# SELECT * FROM metrics ORDER BY time LIMIT 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (actual rows=1 loops=1)
   Buffers: shared hit=3
   -&gt;  Custom Scan (ChunkAppend) on metrics (actual rows=1 loops=1)
         Order: metrics."time"
         Buffers: shared hit=3
         -&gt;  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         -&gt;  Index Scan using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk (never executed)
         -&gt;  Index Scan using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk (never executed)
         -&gt;  Index Scan using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk (never executed)
         -&gt;  Index Scan using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk (never executed)</code></pre><p>After the first chunk, the remaining chunks never get executed, and to complete the query, only three pages have to be read. TimescaleDB removes parent tables from plans like this because we know the parent table does not contain any data.</p><h2 id="mergeappend-vs-chunkappend">MergeAppend vs. ChunkAppend</h2><p>The main difference between these two examples is the type of Append node we used. In the first case, a MergeAppend node is used. In the second case, we used a ChunkAppend node (also introduced in 1.4), which is a TimescaleDB custom node that works similarly to the PostgreSQL Append node but contains additional optimizations. </p><p>The MergeAppend node implements the global sort and requires locally sorted input, with every child sorted by the same sort key. To produce one tuple, the MergeAppend node has to read one tuple from every chunk to decide which one to return next.</p><p>For the very simple example query above, you will see 16 pages read (with MergeAppend) vs. three pages (with ChunkAppend), a 5x improvement over the unoptimized case (ignoring the single page from the parent table), matching the five chunks present in that <a href="https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered" rel="noreferrer">hypertable</a>. So for a hypertable with 100 chunks, there would be 100 times fewer pages to be read to produce the result for the query.</p><p>As you can see, you gain the most benefit from OrderedAppend with a LIMIT clause, as older chunks don’t have to be touched if the required results can be satisfied from more recent chunks. This type of query is very common in time-series workloads (e.g., if you want to get the last reading from a sensor). 
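For instance, the last-reading case is just the example query from above flipped to descending order (a sketch against the same <code>metrics</code> hypertable):</p><pre><code>SELECT * FROM metrics ORDER BY time DESC LIMIT 1;</code></pre><p>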
However, even for queries without a LIMIT clause, this feature is beneficial because it eliminates the sorting of data.</p><h2 id="next-steps">Next Steps</h2><p>If you are interested in using OrderedAppend, make sure you have the latest version of TimescaleDB installed (<a href="https://docs.timescale.com/self-hosted/latest/install/" rel="noreferrer">installation guide</a>). You can also <a href="https://console.cloud.timescale.com/signup" rel="noreferrer">create a free Timescale account</a> (30-day trial, no credit card required) and never worry about upgrading again (we'll do it for you).</p><p>If you are brand new to TimescaleDB, <a href="https://docs.timescale.com/getting-started" rel="noreferrer">get started here</a>. Have questions? Join our <a href="http://slack.timescale.com">Slack</a> channel!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Mind the Gap: Using SQL Functions for Time-Series Analysis]]></title>
            <description><![CDATA[Write more efficient and readable SQL queries with a new set of time-series analytic tools.]]></description>
            <link>https://www.tigerdata.com/blog/sql-functions-for-time-series-analysis</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/sql-functions-for-time-series-analysis</guid>
            <category><![CDATA[Product & Engineering]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Sven Klemm]]></dc:creator>
            <pubDate>Thu, 24 Jan 2019 20:01:11 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2019/01/20190123_TimeBucketGapFill.jpg">
            </media:content>
            <content:encoded><![CDATA[<p>SQL functions are reusable routines written in SQL or supported procedural languages that perform operations on input values and return a result, often used to encapsulate logic and simplify complex queries.</p><p>With the release of <a href="https://github.com/timescale/timescaledb" rel="noreferrer">TimescaleDB 1.2</a> came three new SQL functions for time-series analysis: <code>time_bucket_gapfill</code>, <code>interpolate</code>, and <code>locf</code>. Used together, these SQL functions will enable you to write more efficient and readable SQL queries for <a href="https://www.timescale.com/blog/time-series-analysis-what-is-it-how-to-use-it" rel="noreferrer">time-series analysis</a>. </p><p>The efficiency gains were so evident that we have since developed a complete set of <a href="https://www.timescale.com/learn/time-series-data-analysis-hyperfunctions" rel="noreferrer">hyperfunctions</a> for faster time-series analysis with fewer lines of code. You can find them in the <a href="https://docs.timescale.com/self-hosted/latest/tooling/install-toolkit/" rel="noreferrer">Timescale Toolkit</a>.</p><p>In this post, we'll discuss why you'd want to use time buckets, the related gapfilling techniques, and how they’re implemented under the hood.<em> </em>Ultimately, it's the story of how we extended SQL and the PostgreSQL query planner to create a set of highly optimized functions for time-series analysis.</p><h2 id="sql-functions-for-time-series-analysis-introduction-to-time-bucketing">SQL Functions for Time-Series Analysis: Introduction to Time Bucketing</h2><p>Many <a href="https://www.timescale.com/blog/time-series-analysis-what-is-it-how-to-use-it" rel="noreferrer">common techniques for time-series analysis</a> assume that our temporal observations are aggregated to fixed time intervals. 
Dashboards and most visualizations of time series rely on this technique to make sense of our raw data, turning the noise into a smoother trend line that is more easily interpretable and analytically tractable.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2019/01/timebucket-1.gif" class="kg-image" alt="" loading="lazy" width="556" height="347"></figure><p>When writing queries for this type of reporting, you need an efficient way to aggregate raw observations (often noisy and irregular) to fixed time intervals. Examples of such queries might be average temperature per hour or the average CPU utilization per five seconds.</p><p>The solution is <strong>time bucketing</strong>. The <code>time_bucket</code> function has been a core feature of TimescaleDB since the <a href="https://timescale.ghost.io/blog/when-boring-is-awesome-building-a-scalable-time-series-database-on-postgresql-2900ea453ee2">first public beta release</a>. With time bucketing, we can get a clear picture of the important data trends using a concise, declarative SQL query.</p><pre><code class="language-SQL">SELECT
  time_bucket('1 minute', time) as one_minute_bucket,
  avg(value) as avg_value
FROM observations
GROUP BY one_minute_bucket
ORDER BY one_minute_bucket;</code></pre><h2 id="challenges-with-time-bucketing-for-time-series">Challenges With Time Bucketing for Time Series</h2><p>The reality of time-series data engineering is not always so easy.</p><p>Consider measurements recorded at <strong>irregular sampling intervals</strong>, either intentionally, as with measurements recorded in response to external events (e.g., a motion sensor), or inadvertently due to network problems, out-of-sync clocks, or equipment taken offline for maintenance.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2019/01/none.jpg" class="kg-image" alt="" loading="lazy" width="580" height="253"><figcaption><span style="white-space: pre-wrap;">Time bucket: none</span></figcaption></figure><p>We should also consider analyzing multiple measurements recorded at <strong>mismatched sampling intervals</strong>. For instance, you might collect some of your data every second and some every minute, but still need to analyze both metrics at 15-second intervals.</p><p>The <code>time_bucket</code> function will only aggregate your data to a given time bucket if there is data in it. With either mismatched or irregular sampling, a time bucket interval might come back with missing data (i.e., gaps). 
</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2019/01/20mins.jpg" class="kg-image" alt="" loading="lazy" width="579" height="220"><figcaption><span style="white-space: pre-wrap;">Time bucket: 20 minutes</span></figcaption></figure><p>If your analysis requires data aggregated to contiguous time intervals, time bucketing with <strong>gapfilling</strong> solves this problem.</p><h2 id="sql-functions-time-bucketing-with-gapfilling">SQL Functions: Time Bucketing With Gapfilling</h2><p>TimescaleDB community users have access to a set of SQL functions:</p><ul><li><code>time_bucket_gapfill</code> for creating contiguous, ordered time buckets</li><li><code>interpolate</code> to perform linear interpolation between the previous and next value</li><li><code>locf</code> or <em>last observation carried forward </em>to fill in gaps with the previous known value </li></ul><h3 id="gapfilling">Gapfilling</h3><p>The new <code>time_bucket_gapfill</code> function is similar to <code>time_bucket</code> except that it guarantees a contiguous, ordered set of time buckets.</p><p>The function requires that you provide a <code>start</code> and <code>finish</code> argument to specify the time range for which you need contiguous buckets. The result set will contain additional rows in place of any gaps, ensuring that the returned rows are in chronological order and contiguous.</p><p>Let’s look at the SQL:</p><pre><code class="language-SQL">SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start =&gt; '2019-01-21 9:00', 
        finish =&gt; '2019-01-21 17:00') AS hour,
    avg(value) AS avg_val
FROM temperature
GROUP BY hour;

          hour          |         avg_val
------------------------+-------------------------
 2019-01-21 09:00:00+00 |     26.5867799823790905
 2019-01-21 10:00:00+00 |    23.25141648529633607
 2019-01-21 11:00:00+00 |     21.9964633100885991
 2019-01-21 12:00:00+00 |    23.08512263446292656
 2019-01-21 13:00:00+00 |
 2019-01-21 14:00:00+00 |     27.9968220672055895
 2019-01-21 15:00:00+00 |     26.4914455532679670
 2019-01-21 16:00:00+00 |   24.07531628738616732</code></pre><p>Note that one of the hours is missing data entirely, and the average value is represented as <code>NULL</code>. Gapfilling gives us a contiguous set of time buckets but no data for those rows. That's where the <code>locf</code> and <code>interpolate</code> functions come into play.</p><h3 id="locf-or-last-observation-carried-forward">LOCF or last observation carried forward</h3><p>The “last observation carried forward” technique can be used to impute missing values by assuming the previous known value. </p><pre><code class="language-SQL">SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start =&gt; '2019-01-21 9:00', 
        finish =&gt; '2019-01-21 17:00') AS hour,
  -- instead of avg(val)
  locf(avg(val))
FROM temperature
GROUP BY hour
ORDER BY hour</code></pre><p>Shown here: </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2019/01/LOCF_-20-minutes.jpg" class="kg-image" alt="" loading="lazy" width="588" height="210"><figcaption><span style="white-space: pre-wrap;">LOCF at 20 minutes</span></figcaption></figure><h3 id="linear-interpolation">Linear interpolation</h3><p>Linear interpolation imputes missing values by assuming a line between the previous known value and the next known value.</p><pre><code class="language-SQL">SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start =&gt; '2019-01-21 9:00', 
        finish =&gt; '2019-01-21 17:00') AS hour,
  -- instead of avg(value)
  interpolate(avg(value))
FROM temperature
GROUP BY hour
ORDER BY hour;</code></pre><p>Shown here: </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2019/01/inter20mins-1.jpg" class="kg-image" alt="" loading="lazy" width="550" height="199"><figcaption><span style="white-space: pre-wrap;">Interpolate at 20 minutes</span></figcaption></figure><p>These techniques are not exclusive; you can combine them as needed in a single time-bucketed query:</p><pre><code class="language-SQL">locf(avg(temperature)), interpolate(max(humidity)), avg(other_val)</code></pre><h2 id="best-practices-for-time-series-analysis-with-sql-functions">Best Practices for Time-Series Analysis With SQL Functions</h2><p>Whether you use <code>locf</code>, <code>interpolate</code>, or plain gapfilling with NULLs depends on your assumptions about the time-series data and your analytical approach.</p><ul><li>Use <code>locf</code> if you assume your measurement changes only when you've received new data.</li><li>Use <code>interpolate</code> if you assume your continuous measurement would have a smooth, roughly linear trend if sampled at a higher rate.</li><li>Use standard aggregate functions (without <code>locf</code> or <code>interpolate</code>) if your data is not continuous on the time axis. Where there is no data, the result is NULL.</li><li>If you want to assume scalar values (typically zero) in place of NULLs, you can use PostgreSQL’s <code>COALESCE</code> function: <code>COALESCE(avg(value), 0)</code>.</li></ul><p>If you choose to explicitly <code>ORDER</code> your results, keep in mind that the gapfilling will sort by time in ascending order.
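</p><p>For instance, the imputation functions, <code>COALESCE</code>, and an explicit ascending sort can be combined in one gapfilled query. This is a sketch only: the <code>conditions</code> table and its columns are illustrative, not from the example data above.</p><pre><code class="language-SQL">SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start =&gt; '2019-01-21 9:00',
        finish =&gt; '2019-01-21 17:00') AS hour,
    locf(avg(temperature)),
    interpolate(max(humidity)),
    -- report zero instead of NULL where a bucket has no rows
    COALESCE(avg(other_val), 0)
FROM conditions
GROUP BY hour
ORDER BY hour ASC;  -- matches the gapfilling sort order</code></pre><p>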
Any other explicit ordering may introduce additional sorting steps in the query plan.</p><h2 id="extending-sql-for-time-series-analysis">Extending SQL for Time-Series Analysis</h2><p>Queries built on the new <code>time_bucket_gapfill</code> function are significantly more readable, less error-prone, more flexible regarding grouping, and faster to execute.</p><p>How does TimescaleDB achieve this? Under the hood, these are not ordinary functions but specially optimized hooks into the database query planner itself. </p><p>The <code>time_bucket_gapfill</code> function inserts a <a href="https://www.postgresql.org/docs/11/custom-scan.html">custom scan</a> node and a sort node (if needed) into the query plan. This creates ordered, contiguous time buckets even if some of the buckets are missing observations. The <code>locf</code> and <code>interpolate</code> functions are not executed directly but serve as markers so that the gapfilling node can track the previous and next known values. </p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2019/01/customscan.png" class="kg-image" alt="" loading="lazy" width="1070" height="1128" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2019/01/customscan.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2019/01/customscan.png 1000w, https://timescale.ghost.io/blog/content/images/2019/01/customscan.png 1070w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Query plan visualization resulting from time_bucket_gapfill; courtesy of https://tatiyants.com/pev</span></figcaption></figure><p>The result is a semantically cleaner language for expressing time-series analysis: it is easier to debug, performs better, and saves the application developer from having to implement any of these tricks on the application side.
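</p><p>If you want to see this planner hook yourself, run <code>EXPLAIN</code> on a gapfilled query (a sketch; the exact plan text varies by TimescaleDB and PostgreSQL version, but it should include a custom scan node for the gapfill):</p><pre><code class="language-SQL">EXPLAIN
SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start =&gt; '2019-01-21 9:00',
        finish =&gt; '2019-01-21 17:00') AS hour,
    locf(avg(value))
FROM temperature
GROUP BY hour;</code></pre><p>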
This is another example of how Timescale is extending PostgreSQL for high-performance, general-purpose time-series data management.</p><h2 id="supercharge-your-time-series-analysis">Supercharge Your Time-Series Analysis</h2><p>Time buckets with gapfilling and the related imputation functions are available as community features under the TSL license. (For more information on the license, read this <a href="https://timescale.ghost.io/blog/how-we-are-building-an-open-source-business-a7701516a480">blog post</a>.) </p><p>If you’re interested in learning more about <a href="https://docs.timescale.com/use-timescale/latest/hyperfunctions/gapfilling-interpolation/" rel="noreferrer">gapfilling, check out our docs</a>. If you are new to TimescaleDB and ready to get started, follow the <a href="https://docs.timescale.com/self-hosted/latest/install/" rel="noreferrer">installation instructions</a>. </p><p>We encourage active TimescaleDB users to join our <a href="https://slack.timescale.com/">Slack community</a> and post any questions you may have there. Finally, if you are looking for a modern cloud-native PostgreSQL platform, <a href="https://www.timescale.com/cloud" rel="noreferrer">check out Timescale Cloud</a>.</p><hr><p><em>Interested in learning more? Follow us on </em><a href="https://twitter.com/" rel="noopener"><em>Twitter</em></a><em> or sign up below to receive more posts like this! </em></p>]]></content:encoded>
        </item>
    </channel>
</rss>