
OrderedAppend: An Optimization for Range Partitioning

Product & Engineering | Apr 08, 2025 | Posted by Sven Klemm

01 Optimizing Appends for Large Queries
02 Developing Query Plans With the Optimization
03 MergeAppend vs. ChunkAppend
04 Next Steps

In our previous post on implementing constraint exclusion, we discussed how TimescaleDB leverages PostgreSQL’s foundation and expands on its capabilities to improve performance. Continuing with the same theme, in this post, we will discuss how we’ve added support for ordered appends, which optimize a wide range of queries, particularly those that are ordered by time.

We’ve seen performance improvements up to 100x for certain queries after applying this feature, so we encourage you to keep reading!

Optimizing Appends for Large Queries

PostgreSQL represents how plans should be executed using “nodes.” Various nodes may appear in an EXPLAIN output, but we want to focus specifically on Append nodes, which essentially combine the results from multiple sources into a single result.

PostgreSQL has two standard Append node types that you will commonly find in an EXPLAIN output (a minimal sketch illustrating both follows the list):

  • Append: appends the results of its child nodes to return a unioned result
  • MergeAppend: merges the output of its child nodes by sort key; all child nodes must be sorted by that same sort key; when used in TimescaleDB, this means accessing every chunk
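
To see where each node type typically shows up, here is a minimal sketch on a plain PostgreSQL range-partitioned table; the table and partition names are illustrative assumptions, not from this post, and the planner's choice can vary with data volume and settings.

CREATE TABLE readings (
    time  timestamptz NOT NULL,
    value double precision
) PARTITION BY RANGE (time);

CREATE TABLE readings_2025_01 PARTITION OF readings
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE readings_2025_02 PARTITION OF readings
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

-- Creates a partitioned index with a matching index on every partition
CREATE INDEX ON readings (time);

-- An unordered scan over all partitions is typically planned as an Append node
EXPLAIN SELECT * FROM readings;

-- Globally sorted output typically produces a MergeAppend node that merges
-- the per-partition index scans by the sort key
EXPLAIN SELECT * FROM readings ORDER BY time;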

When MergeAppend nodes are used with TimescaleDB, every chunk must be accessed to determine whether it holds keys that need to be merged, which is inefficient when only a few chunks can actually contribute rows to the result.

To address this issue, with the release of TimescaleDB 1.2, we introduced OrderedAppend as an optimization for range partitioning. This feature optimizes a large range of queries, particularly those that are ordered by time and contain a LIMIT clause.

This optimization takes advantage of the fact that we know the range of time held in each chunk and can stop accessing chunks once we’ve found enough rows to satisfy the LIMIT clause. As mentioned above, this optimization can improve performance by up to 100x, depending on the query.

With the release of TimescaleDB 1.4, we wanted to extend the cases in which OrderedAppend can be used. This meant making OrderedAppend space-partition aware and removing the LIMIT clause restriction. With these additions, more queries, and therefore more users, can take advantage of the performance gains OrderedAppend provides.

Developing Query Plans With the Optimization

As an optimization for range partitioning, OrderedAppend eliminates sort steps because it is aware of the way data is partitioned.

Because each chunk covers a known time range, no global sort step is needed to produce sorted output: only local sort steps have to be completed, and their results appended in the correct order. If index scans are used, which already return their output sorted, sorting can be avoided entirely.
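
To make the example plans below reproducible, here is a minimal, hypothetical setup; the schema, chunk interval, and sample data for the metrics table are assumptions for illustration and were not part of the original example.

CREATE TABLE metrics (
    time      timestamptz NOT NULL,
    device_id integer,
    value     double precision
);

-- create_hypertable() range-partitions the table on time into chunks and,
-- by default, creates an index on the time column (metrics_time_idx)
SELECT create_hypertable('metrics', 'time', chunk_time_interval => INTERVAL '1 day');

-- A few days of sample data, producing a handful of daily chunks
INSERT INTO metrics
SELECT ts, (random() * 10)::int, random()
FROM generate_series('2025-01-01'::timestamptz,
                     '2025-01-05'::timestamptz,
                     INTERVAL '1 minute') AS ts;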

For a query ordering by the time dimension with a LIMIT clause, you would normally get something like this:

dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
dev-# SELECT * FROM metrics ORDER BY time LIMIT 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (actual rows=1 loops=1)
   Buffers: shared hit=16
   ->  Merge Append (actual rows=1 loops=1)
         Sort Key: metrics."time"
         Buffers: shared hit=16
         ->  Index Scan using metrics_time_idx on metrics (actual rows=0 loops=1)
               Buffers: shared hit=1
         ->  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         ->  Index Scan using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         ->  Index Scan using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         ->  Index Scan using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         ->  Index Scan using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3

You can see that three pages are read from every chunk, plus an additional page from the parent table, which contains no actual rows.

With this optimization enabled, you would get a plan looking like this:

dev=# EXPLAIN (ANALYZE,COSTS OFF,BUFFERS,TIMING OFF,SUMMARY OFF)
dev-# SELECT * FROM metrics ORDER BY time LIMIT 1;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Limit (actual rows=1 loops=1)
   Buffers: shared hit=3
   ->  Custom Scan (ChunkAppend) on metrics (actual rows=1 loops=1)
         Order: metrics."time"
         Buffers: shared hit=3
         ->  Index Scan using _hyper_1_1_chunk_metrics_time_idx on _hyper_1_1_chunk (actual rows=1 loops=1)
               Buffers: shared hit=3
         ->  Index Scan using _hyper_1_2_chunk_metrics_time_idx on _hyper_1_2_chunk (never executed)
         ->  Index Scan using _hyper_1_3_chunk_metrics_time_idx on _hyper_1_3_chunk (never executed)
         ->  Index Scan using _hyper_1_4_chunk_metrics_time_idx on _hyper_1_4_chunk (never executed)
         ->  Index Scan using _hyper_1_5_chunk_metrics_time_idx on _hyper_1_5_chunk (never executed)

After the first chunk, the remaining chunks are never executed, and only three pages have to be read to complete the query. TimescaleDB also removes the parent table from plans like this because we know it does not contain any data.

MergeAppend vs. ChunkAppend

The main difference between these two examples is the type of Append node we used. In the first case, a MergeAppend node is used. In the second case, we used a ChunkAppend node (also introduced in 1.4), which is a TimescaleDB custom node that works similarly to the PostgreSQL Append node but contains additional optimizations.

The MergeAppend node implements the global sort and requires locally sorted input, with every child sorted by the same sort key. To produce one output tuple, the MergeAppend node has to read one tuple from every chunk to decide which one to return.

For the very simple example query above, you see 16 pages read with MergeAppend vs. three pages with ChunkAppend. Ignoring the single page read from the parent table, that is a 5x improvement over the unoptimized case, and the factor matches the number of chunks in the hypertable. So for a hypertable with 100 chunks, roughly 100 times fewer pages would have to be read to produce the result of the query.
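
If your TimescaleDB version exposes the timescaledb.enable_ordered_append setting (an assumption worth checking against your version's documentation), you can toggle it in a session to reproduce both plans and compare the buffer counts yourself:

-- Disable the optimization to fall back to a MergeAppend plan
SET timescaledb.enable_ordered_append = off;
EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, TIMING OFF, SUMMARY OFF)
SELECT * FROM metrics ORDER BY time LIMIT 1;

-- Re-enable it to get the ChunkAppend plan with ordered output again
RESET timescaledb.enable_ordered_append;
EXPLAIN (ANALYZE, COSTS OFF, BUFFERS, TIMING OFF, SUMMARY OFF)
SELECT * FROM metrics ORDER BY time LIMIT 1;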

As you can see, you gain the most benefit from OrderedAppend with a LIMIT clause, as older chunks don’t have to be touched if the required results can be satisfied from more recent chunks. This type of query is very common in time-series workloads (e.g., if you want to get the last reading from a sensor). However, even for queries without a LIMIT clause, this feature is beneficial because it eliminates the sorting of data.
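
As a concrete illustration of those query shapes (the column names come from the hypothetical setup above, not from the post):

-- Last reading from a single sensor: with OrderedAppend, only the most
-- recent chunks have to be touched to satisfy the LIMIT
SELECT * FROM metrics
WHERE device_id = 7
ORDER BY time DESC
LIMIT 1;

-- Even without a LIMIT, per-chunk ordered output can be appended in chunk
-- order, avoiding a global sort of the result
SELECT * FROM metrics
WHERE time > now() - INTERVAL '1 day'
ORDER BY time;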

Next Steps

If you are interested in using OrderedAppend, make sure you have the latest version of TimescaleDB installed (installation guide). You can also create a free Timescale account (30-day trial, no credit card required) and never worry about upgrading again (we'll do it for you).
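
A quick way to check which extension version a database is running, and to upgrade it in place after installing newer packages, is with standard PostgreSQL extension commands:

-- Show the TimescaleDB version installed in the current database
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';

-- Upgrade the extension; run this as the first command in a fresh session
ALTER EXTENSION timescaledb UPDATE;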

If you are brand new to TimescaleDB, get started here. Have questions? Join our Slack channel!
