Sep 19, 2025
Everyone eventually asks the same “easy” questions: How many distinct things do I have right now? How many active devices? Which trading pairs saw activity today? What’s the latest reading per sensor? On PostgreSQL at scale, those questions turn from harmless SQL into multi‑second waits, because DISTINCT usually means walking every qualifying row and deduplicating afterward. Even with an ordered index, the engine traverses the full range and pushes keys through a unique step—O(N) work when N is tens or hundreds of millions.
We built SkipScan to flip that script. Instead of visiting every row, SkipScan uses the B‑tree’s order to jump from one distinct value to the next: find device = 1, output it, then restart the index scan at device > 1, and so on. The cost becomes O(K × log N), where K is the number of distinct values. When K ≪ N, you get millisecond answers instead of coffee breaks.
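The jump‑and‑re‑seek idea is easy to model in a few lines of Python (a sketch, not TimescaleDB’s implementation): treat a sorted list as the ordered index and use binary search to restart the scan after each distinct value.

```python
from bisect import bisect_right

def distinct_skip(index):
    """Collect distinct values from a sorted list (our stand-in for an
    ordered B-tree) by re-seeking past each value instead of walking
    every entry: K binary searches, O(K x log N) overall."""
    out = []
    pos = 0
    while pos < len(index):
        value = index[pos]                        # first entry for this value
        out.append(value)
        pos = bisect_right(index, value, lo=pos)  # jump to value > current
    return out

# 30,000 entries but only 3 distinct values: 3 seeks instead of a full walk
keys = [1] * 10_000 + [2] * 10_000 + [7] * 10_000
print(distinct_skip(keys))  # [1, 2, 7]
```

The loop runs once per distinct value, not once per row, which is exactly why K ≪ N pays off.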
SkipScan first landed in TimescaleDB 2.2.0 for rowstores. With 2.20.0, we extended it to columnstore hypertables and to distinct aggregates like COUNT(DISTINCT …), leveraging PostgreSQL 16’s ability to feed presorted inputs into ORDER BY/DISTINCT aggregates. With 2.22.0, we added multi-column SkipScan for queries where all distinct column values are guaranteed to be not-null. What follows is the problem it solves, the design that makes columnstore skipping safe and fast, and the exact queries and plans you can reproduce.
Rowstores are bad enough for DISTINCT at large N. Columnstores raise the stakes: data are packed in compressed batches (≤1,000 rows each), and reading a single tuple can mean decompressing a whole batch. If we naïvely walked every batch to deduplicate, compression would help storage but hurt latency.
To make DISTINCT fast on columnstore, the executor has to locate each distinct value directly rather than visiting every batch, decompress only the batches it actually needs, and use per‑batch metadata to rule out batches without touching them. That’s the essence of SkipScan on columnstore.
Columnstore chunks maintain an automatic B‑tree on (segmentby, orderby) (e.g., (device, time DESC)), plus per‑batch metadata columns like _ts_meta_min_1 and _ts_meta_max_1 for time. SkipScan hooks into that structure:
Seek the first batch for a device using the (segmentby, orderby) index; for “latest per device,” this is typically the batch with the newest time for that device.
Decompress only that batch and emit the row the query asks for (e.g., DISTINCT ON (device) ... ORDER BY device, time DESC).
Re‑seek with device > current_device to hop to the next device’s first batch.
Apply batch metadata filters (e.g., _ts_meta_max_1 > now() - '1 hour') so entire stretches of batches are skipped without decompression.
There’s one important constraint: SkipScan on columnstore applies when the DISTINCT key is the leading segmentby column (or the leading columns, for multi-key SkipScan). That’s what allows safe jumps.
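Here’s a toy model of that per‑device hop (the batch layout and field names are hypothetical stand‑ins, not TimescaleDB internals): batches are kept in (segmentby, orderby) order, and only the first batch per device is ever “decompressed.”

```python
from itertools import groupby

def latest_per_device(batches):
    """batches: (device, rows) pairs sorted like the (device, time DESC)
    columnstore index, where rows are the decompressed timestamps,
    newest first. Touch only the first batch per device and count how
    many batches we had to decompress."""
    decompressed = 0
    latest = {}
    for device, group in groupby(batches, key=lambda b: b[0]):
        _, rows = next(group)     # jump: the batch with the newest time
        decompressed += 1         # the only batch we decompress
        latest[device] = rows[0]  # rows are stored time DESC
    return latest, decompressed

batches = [
    (1, [100, 99, 98]), (1, [97, 96]),  # device 1: two batches
    (2, [120, 110]),                    # device 2: one batch
]
print(latest_per_device(batches))  # ({1: 100, 2: 120}, 2)
```

Three batches exist, but only two are decompressed: one per distinct device.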
Note: by leading column in the index we mean the first index column used in a query, not necessarily the first column of the index. For an index on (a, b), b is the leading index column in a query like SELECT DISTINCT a, b FROM t WHERE a = 1, because a is pinned by the filter.
Under the covers you’ll see a Custom Scan (SkipScan) node above an index scan and (when needed) a DecompressChunk node. After emitting a distinct value, the executor rewrites its start condition from col >= v to col > v and re‑seeks—classic B‑tree hopscotch.
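That rewrite can be sketched with a list-based model (again, not the actual executor): an index ordered by (device, time DESC) is represented as tuples of (device, -time), and each emitted row triggers a re‑seek that only admits device > v.

```python
from bisect import bisect_right

def distinct_on_latest(index):
    """index: (device, neg_time) tuples sorted ascending, which matches a
    (device, time DESC) B-tree. Emit the first (newest) row per device,
    then rewrite the start condition from device >= v to device > v."""
    out = []
    pos = 0
    while pos < len(index):
        device, neg_time = index[pos]  # newest row for this device
        out.append((device, -neg_time))
        # re-seek: first position where device > current value
        pos = bisect_right(index, (device, float("inf")), lo=pos)
    return out

index = [(1, -99), (1, -50), (2, -80), (3, -70), (3, -10)]
print(distinct_on_latest(index))  # [(1, 99), (2, 80), (3, 70)]
```

Seeking with (device, +inf) as the probe is the list analogue of the executor’s col > v start condition.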
Table
CREATE TABLE metrics (
time timestamptz NOT NULL,
device integer,
value double precision
);
Rowstore index
CREATE INDEX ON metrics(device, time DESC);
Scale: ~23,328,010 rows; device has 10 distinct values.
Columnstore layout
ALTER TABLE metrics SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device',
timescaledb.compress_orderby = 'time DESC'
);
SELECT compress_chunk(ch) FROM show_chunks('metrics') ch;
On one representative compressed chunk we see 5,370 batches totaling 5,364,693 compressed rows and an index on (device, _ts_meta_min_1 DESC, _ts_meta_max_1 DESC).
Complexity notes: the O(·) terms are theoretical guides. PostgreSQL uses multi‑way B‑trees, so constants, caching, and pruning matter. We use them to show scale, not to predict exact milliseconds.
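A back‑of‑the‑envelope comparison for this table (K = 10 distinct devices, N ≈ 23.3 M rows, binary log as a rough proxy for B‑tree seek depth):

```python
import math

N = 23_328_010                 # index entries in the metrics table
K = 10                         # distinct devices
skip_work = K * math.log2(N)   # ~245 comparison-steps worth of seeking
full_walk = N                  # every index entry visited once
print(round(skip_work), full_walk)
```

Roughly a 90,000× difference in touched entries, which is why the constants hidden by O(·) barely matter here.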
Counting devices (rowstore)
SELECT COUNT(DISTINCT device) FROM metrics;
With SkipScan: 3.430 ms.
Without (vanilla path): 6756.322 ms (~6.8 s).
Same 23M‑row table; SkipScan is roughly 2000× faster because it does ~K log‑seeks instead of scanning all N entries.
Latest reading per device (columnstore)
SELECT DISTINCT ON (device) device, time, value
FROM metrics
ORDER BY device, time DESC;
With SkipScan: 4.589 ms by touching the first batch per device.
Without: 3912.069 ms—the engine decompresses and inspects ~5.3 M tuples to deduplicate.
Latest reading per device in the last hour (columnstore + metadata pruning)
SELECT DISTINCT ON (device) device, time, value
FROM metrics
WHERE time > now() - INTERVAL '1 hour'
ORDER BY device, time DESC;
With SkipScan: 2.537 ms.
Without: 9.236 ms.
The gap narrows because _ts_meta_max_1 lets the planner discard 5,330 of 5,370 batches up front; SkipScan still wins by visiting only the first qualifying batch per device.
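Batch‑level pruning itself is just a min/max check against the metadata, with no decompression involved. A sketch, using a hypothetical meta_max field standing in for _ts_meta_max_1:

```python
def prune_batches(batches, cutoff):
    """Keep only batches that could contain rows with time > cutoff,
    judged purely from per-batch max-time metadata (never decompressing)."""
    return [b for b in batches if b["meta_max"] > cutoff]

batches = [{"id": i, "meta_max": i * 10} for i in range(1, 7)]
survivors = prune_batches(batches, cutoff=35)
print([b["id"] for b in survivors])  # [4, 5, 6]
```

Every batch whose newest timestamp falls at or below the cutoff is dropped before SkipScan even starts hopping.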
Thresholded latest reading (columnstore; tuple‑level predicate)
SELECT DISTINCT ON (device) *
FROM metrics
WHERE value > 50
ORDER BY device, time DESC;
With SkipScan: 45.392 ms after scanning ~3,002 decompressed tuples on the first chunk.
Without: 2145.536 ms after touching ~6,303,586 tuples across chunks.
When a predicate can’t be answered by batch metadata, SkipScan still stops early per device once a match is found.
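That early stop can be modeled like this (a simplification: rows are (device, time, value) in index order, and the scan abandons a device as soon as one row qualifies):

```python
def first_match_per_device(rows, predicate):
    """rows: sorted by (device, time DESC). Scan each device only until
    the predicate matches, then skip the device's remaining rows."""
    out = []
    i, n = 0, len(rows)
    while i < n:
        device = rows[i][0]
        while i < n and rows[i][0] == device and not predicate(rows[i]):
            i += 1                   # tuple-level filtering
        if i < n and rows[i][0] == device:
            out.append(rows[i])      # first qualifying row: emit and stop
        while i < n and rows[i][0] == device:
            i += 1                   # hop to the next device
    return out

rows = [(1, 99, 12.0), (1, 98, 77.0), (2, 95, 30.0), (3, 90, 91.0)]
print(first_match_per_device(rows, lambda r: r[2] > 50))
# [(1, 98, 77.0), (3, 90, 91.0)]
```

Device 2 never qualifies and is simply skipped; devices 1 and 3 stop at their first match instead of scanning to the end.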
In IoT fleets, “show me the latest per device” powers every status board. On rowstores it’s a recurring spike; on columnstore without SkipScan it can become a wall. With SkipScan, the query returns in a few milliseconds, decompressing only on demand.
At exchanges and brokerages, COUNT(DISTINCT symbol)
and “latest quote per pair” show up in every dashboard and alert. With 200M+ rows and hundreds of thousands of pairs, SkipScan changes the economics of distinct—turning once‑expensive questions into cheap, always‑on queries.
In crypto analytics, analysts slice by wallet, contract, or chain. Setting the segmentby to the entity you dedupe most often (e.g., address) lets SkipScan jump entity‑to‑entity, keeping answer times stable even under heavy ingest.
Multi-column SkipScan support was added in 2.22.0, with a caveat: it can only be chosen for queries that cannot produce NULL distinct values. In other words, a query with multiple distinct columns is eligible for multi-column SkipScan only if NULLs can never appear among its distinct values.
Examples:
CREATE INDEX ON metrics(region, device, metric_type);
-- All distinct columns have filters which don't allow NULLs: can use SkipScan
SELECT DISTINCT ON (region, device, metric_type) *
FROM metrics
WHERE region IN ('UK','EU','JP') AND device > 1 AND metric_type IS NOT NULL
ORDER BY region, device, metric_type, time DESC;
-- Distinct columns are declared NOT NULL: can use SkipScan
CREATE TABLE metrics(region TEXT NOT NULL, device INT NOT NULL, ...);
SELECT DISTINCT ON (region, device) *
FROM metrics
ORDER BY region, device, time DESC;
Multi-column SkipScan skips over the current key values much like single-column SkipScan: it freezes the values of the keys preceding the current key, allows any values for the keys following it, and searches for the first value of the current key greater than its current value.
When no more values are found for the current key, SkipScan moves back to the previous key and repeats the process; it is done when there are no previous keys left.
For the above example, suppose we are currently scanning for values (AUS, 2, >56) of the tuple (region, device, metric_type). We find the next tuple (AUS, 2, 59), so we now search for (AUS, 2, >59).
If no more tuples are found, we switch the search to (AUS, >2, any value), find the next tuple (AUS, 5, 3), and search for (AUS, 5, >3), and so on.
If no more tuples for AUS are found, we search for (>AUS, any value, any value); suppose we find (BR, 4, 15) next, and restart the search with (BR, 4, >15), etc.
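On a fully sorted list, that whole cascade collapses into one successor search, since seeking past the current tuple covers every level at which the key can change (a model of the idea, not the executor’s actual probes):

```python
from bisect import bisect_right

def distinct_multicol(index):
    """index: sorted (region, device, metric_type) tuples. Seeking past
    the current tuple finds the next distinct combination, whichever key
    level it changes at: (AUS, 2, >59), then (AUS, >2, any), then
    (>AUS, any, any) are all covered by one bisect on the full tuple."""
    out = []
    pos = 0
    while pos < len(index):
        cur = index[pos]
        out.append(cur)
        pos = bisect_right(index, cur, lo=pos)
    return out

index = [
    ("AUS", 2, 56), ("AUS", 2, 59),
    ("AUS", 5, 3),
    ("BR", 4, 15), ("BR", 4, 15),
]
print(distinct_multicol(index))
# [('AUS', 2, 56), ('AUS', 2, 59), ('AUS', 5, 3), ('BR', 4, 15)]
```

A real B‑tree executor cannot bisect the whole keyspace at once, which is why it issues the cascade of frozen‑prefix probes instead.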
The reason we don’t use SkipScan when NULL values are possible for distinct columns is the complexity of implementing “any value” in the examples above.
Because PostgreSQL sorts NULLs separately from other values, we would have to probe the NULL and NOT NULL ranges separately for each key to exhaust the “any value” possibilities.
For 2 keys that means 4 probes: (NULL, NULL), (NULL, NOT NULL), (NOT NULL, NULL), and (NOT NULL, NOT NULL); for M keys, 2^M probes.
That is a lot of complexity for an edge case, since users usually don’t care about NULL devices or regions.
To take advantage of multi-column SkipScan, it’s enough to add an IS NOT NULL check, or to declare the distinct columns NOT NULL if no NULL values are anticipated in them. Strict conditions on distinct columns such as a > 1 or b IN (1, 2, 3) already exclude NULLs, so SkipScan can be applied to those columns as well.
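The 2^M blow‑up is just the Cartesian product of NULL‑position choices per key:

```python
from itertools import product

def null_probe_combos(m):
    """Every combination of NULL / NOT NULL ranges that would have to be
    probed separately to exhaust "any value" across m keys."""
    return list(product(("NULL", "NOT NULL"), repeat=m))

print(len(null_probe_combos(2)))  # 4
print(len(null_probe_combos(5)))  # 32
```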
Design your layout so the column you deduplicate is first:
Rowstore: create an index whose leading column is the distinct key (e.g., (device, time DESC)).
Columnstore: set timescaledb.compress_segmentby to that column (or columns, for multi-column SkipScan) and compress_orderby to match your query’s sort (e.g., time DESC). Compress your historical chunks.
Verify with EXPLAIN—look for Custom Scan (SkipScan) above the index/decompress nodes. If the planner is shy, you can nudge it:
SET enable_seqscan = false; -- prefer indexes
SET timescaledb.skip_scan_run_cost_multiplier = 0; -- bias for SkipScan (2.20+)
To check whether SkipScan was used in a query, and if so on which keys and whether those keys are NOT NULL, set the following flag:
SET timescaledb.debug_skip_scan_info TO true; -- (2.22+)
-- NULLs are excluded via ">1"
SELECT DISTINCT ON (device) device, time, value
FROM metrics
WHERE device > 1
ORDER BY device, time DESC;
-- SkipScan key info is output
INFO: SkipScan used on metrics_device_idx(device NOT NULL)
-- NULLs are not excluded
SELECT DISTINCT ON (device) device, time, value
FROM metrics
ORDER BY device, time DESC;
-- SkipScan key info is output
INFO: SkipScan used on metrics_device_idx(device NULLS LAST)
When it doesn’t apply: multiple NULL-allowing distinct keys at once, sort orders that don’t start with the distinct column, or a columnstore where your distinct key isn’t the leading segmentby.
-- Ordered index for rowstore
CREATE INDEX ON metrics(device, time DESC);
-- Ordered index for columnstore based on segmentby + orderby settings
ALTER TABLE metrics SET (timescaledb.compress, timescaledb.compress_orderby='time DESC', timescaledb.compress_segmentby='device');
SELECT compress_chunk(ch) FROM show_chunks('metrics') ch;
These queries can use SkipScan:
SELECT DISTINCT ON (device) * FROM metrics;
SELECT DISTINCT ON (device) * FROM metrics ORDER BY device, time DESC;
SELECT DISTINCT ON (device, value) * FROM metrics WHERE value = 10;
SELECT count(DISTINCT device), max(DISTINCT device) FROM metrics;
These queries cannot use SkipScan, because the distinct columns are not in the index, non-distinct aggregates are present, or the index doesn’t match the query order:
SELECT DISTINCT ON (time) * FROM metrics;
SELECT DISTINCT ON (device) * FROM metrics ORDER BY device, time;
SELECT DISTINCT ON (device, value) * FROM metrics;
SELECT count(DISTINCT device), max(device) FROM metrics;
SELECT count(DISTINCT device), count(DISTINCT value) FROM metrics;
Millisecond‑fast deduplication on datasets measured in billions of rows; far fewer tuples decompressed per query; predictable dashboard SLOs under bursty ingest; and no application changes—just the right index or segment layout. Rowstore users get wins back to 2.2.0; columnstore and distinct‑aggregate users get the full experience on 2.20.0+ (with PostgreSQL 16+); multi-column SkipScan for not-null distinct values arrived in 2.22.0.
Skip less relevant rows. Skip entire batches. Skip the scan.
Sign up for a free trial today to see the benefit for your own queries and workloads.