---
title: "Time Series Downsampling in SQL: The Complete Guide "
description: "Time series downsampling in SQL: which method to use, how to automate it, and how to build a tiered data-resolution ladder in PostgreSQL."
section: "Tiger Data basics"
---

> **TimescaleDB is now Tiger Data.**

Most teams that downsample time-series data do it in Python. A batch job summarizes rows into a DataFrame, writes aggregates back to a table, and runs on a schedule. This works until it doesn't: the job falls behind, raw data bloats storage, and dashboards show stale curves with missing peaks. Moving downsampling into the database fixes all three problems. It runs continuously, scales with your data, and preserves curve shape through the entire data lifecycle.

This guide covers when to downsample, which method to use, and how to build the full tiered resolution ladder in SQL. Tiger Data builds a database purpose-built for time-series data, so the examples here use our toolset. Alternatives are noted where relevant.

## What Is Time-Series Downsampling?

Time-series downsampling is the process of reducing the temporal resolution of time-series data by replacing multiple raw data points within a time window with a single representative value.

A sensor emitting readings every second produces 86,400 data points per day. Over a year, querying that raw data for dashboard rendering or long-term trend analysis becomes slow and expensive. Downsampling solves this by trading resolution for performance and cost.

Engineers downsample for three reasons:

1. **Storage cost reduction:** fewer rows stored means lower storage bills, especially for high-frequency IoT and infrastructure data
2. **Query performance:** pre-aggregated data returns results orders of magnitude faster than scanning raw rows
3. **Visualization readability:** a chart renderer displays 500 points identically to 50,000; the human eye cannot distinguish them at dashboard scale

Downsampling is not the same as [<u>data compression</u>](https://www.tigerdata.com/learn/what-is-data-compression-and-how-does-it-work). Compression reduces the storage footprint of raw data without altering its resolution. Downsampling reduces resolution itself. Both are complementary strategies for long-term time-series data management.

## When Do You Need to Downsample?

Three signals indicate it is time to downsample:

1. **Query latency is growing** as your hypertable ages past 30-90 days of raw data
2. **Storage costs are scaling linearly** with data volume, and your long-term analysis can tolerate lower granularity
3. **Grafana (or equivalent) renders identically** whether it receives 10,000 or 1,000 data points for a given time range

One situation where downsampling is not appropriate: forensic replay of raw events. Security event logs, compliance audit trails, and medical device records must retain full raw resolution. Downsampling here creates gaps that can invalidate investigations or violate regulatory requirements.

| **Downsample** | **Keep raw** |
| --- | --- |
| IoT sensor telemetry (temperature, humidity, vibration) | Security event logs |
| Infrastructure metrics (CPU, memory, network) | Financial transaction records |
| Energy and utility consumption readings | Medical device event records |
| Dashboards aggregating more than 1 week of data | Data with regulatory retention requirements |

## The Four Downsampling Methods in Tiger Data

Tiger Data provides four approaches to downsampling, each suited to a different context. This section covers all four; the decision framework appears at the end of the article.

1. `time_bucket()` + aggregate: general-purpose, standard SQL
2. `lttb()` hyperfunction: visualization-optimized, shape-preserving
3. `asap_smooth()` hyperfunction: smoothing-optimized, noise-reducing
4. Continuous aggregates: database-level automation of any of the above

### `time_bucket()` Averaging

`time_bucket()` divides a time-series into equal-width intervals. Aggregate functions (AVG, MAX, MIN, SUM) collapse each interval to a single row.

Given a sensor readings table:

`CREATE TABLE sensor_readings (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   INT,
  temperature FLOAT
) WITH (timescaledb.hypertable);`

A 1-hour average downsampled query looks like this:

`SELECT
  time_bucket('1 hour', time) AS hour,
  sensor_id,
  AVG(temperature) AS avg_temp
FROM sensor_readings
GROUP BY 1, 2
ORDER BY 1, 2;`

**Strengths:** Simple, requires no extension beyond TimescaleDB, works with any column type, and is compatible with all aggregate functions including SUM, COUNT, MIN, and MAX.

**Limitation:** Averaging destroys the shape of the time-series. A sensor that spikes to 95°C for 10 seconds will show only a fraction of a degree above baseline in a 1-hour average; the spike is effectively invisible. For anomaly detection or alerting use cases, this matters.

**Best for:** Storage reduction and long-term trend queries where the general trajectory matters more than the exact curve shape.

Learn more about `time_bucket` in the [<u>Tiger Data docs</u>](https://www.tigerdata.com/docs/learn/data-lifecycle/time-buckets/about-time-buckets).

### `lttb()`: Largest Triangle Three Buckets

LTTB selects a fixed number of representative points from a larger dataset such that the resulting line chart is visually indistinguishable from the original.

The [<u>algorithm</u>](https://github.com/pingec/downsample-lttb) was developed by Sveinn Steinarsson and validated by the [<u>tsdownsample benchmark study</u>](https://www.sciencedirect.com/science/article/pii/S2352711025000123) published in ScienceDirect in January 2025, which benchmarks leading downsampling algorithms for visualization. Tiger Data's `lttb()` hyperfunction is the SQL-native implementation.

`SELECT
  time,
  value
FROM unnest(
  (SELECT lttb(time, temperature, 200)
   FROM sensor_readings
   WHERE time > NOW() - INTERVAL '7 days')
);`

This returns 200 representative (time, value) pairs drawn from the original dataset. The output must be unnested before use in a `SELECT` list.

**Strengths:** Preserves curve shape including peaks, valleys, and inflection points. Output is visually lossless for dashboard rendering.

**Limitation:** `lttb()` selects original data points rather than computing new aggregate values. It cannot be used for SUM or running totals. It is primarily a read-time function for visualization, not a write-time storage reduction tool.

**Best for:** Grafana panels and chart rendering where visual fidelity is required, and when you want to downsample for display without altering stored data.

See the full [`<u>lttb()</u>`<u> API reference</u>](https://www.tigerdata.com/docs/reference/toolkit/downsampling/lttb) for complete syntax.

### `asap_smooth()`: Automated Smoothing and Predictability

`asap_smooth()` returns a smoothed version of the time-series that removes high-frequency noise while preserving the overall trend shape. It automatically selects the degree of smoothing.

`SELECT
  time,
  value
FROM unnest(
  (SELECT asap_smooth(time, temperature, 100)
   FROM sensor_readings
   WHERE time > NOW() - INTERVAL '30 days')
);`

**Contrast with LTTB:** `lttb()` selects original data points from the dataset. `asap_smooth()` computes new smoothed values that were not present in the raw data. Use `lttb()` when you want to show actual data with fewer points. Use `asap_smooth()` when you want to present the trend signal without noise.

**Best for:** Dashboards where high-frequency noise obscures the trend, and executive-level summaries of sensor telemetry where a clean signal matters more than individual data points.

**Limitation:** ASAP output is computed values rather than original data points. The smoothed series is not suitable for anomaly detection or alerting thresholds, since it removes the very spikes those systems need to detect.

See the [`<u>asap_smooth()</u>`<u> API reference</u>](https://www.tigerdata.com/docs/reference/toolkit/downsampling/asap_smooth) for full options.

### Continuous Aggregates: Database-Level Automation

`time_bucket()`, `lttb()`, and `asap_smooth()` all operate at query time. Continuous aggregates operate at write time: they precompute aggregations incrementally as new data arrives, so queries hit precomputed results rather than raw rows.

A continuous aggregate is a materialized view defined with a `time_bucket()` group, refreshed automatically on a schedule (or in real time as data lands). This makes continuous aggregates the production-grade implementation of `time_bucket()` downsampling, not a separate concept but an automated wrapper around it.

Continuous aggregates can also be stacked hierarchically. An hourly CAGG built on top of a 10-minute CAGG is the mechanism behind the tiered resolution ladder described in the next section.

For the full guide on creating and managing continuous aggregates, see [<u>Continuous Aggregates in Tiger Data</u>](https://www.tigerdata.com/docs/learn/continuous-aggregates). For refresh policy syntax, see the [<u>continuous aggregates documentation</u>](https://www.tigerdata.com/docs/build/continuous-aggregates/refresh-policies).

## The Tiered Resolution Ladder: A Complete SQL Pattern

This is the production pattern most practitioners are searching for: raw data at full resolution for 1 day, 10-minute averages for 1 week, hourly averages for 1 month, daily averages for 1 year or longer. All of it is maintained automatically by the database. It is the most-requested pattern in community forums, and no existing guide implements it end-to-end in SQL.

### Step 1: Create the Base Hypertable

`CREATE TABLE sensor_readings (
  time        TIMESTAMPTZ NOT NULL,
  sensor_id   INT,
  temperature FLOAT
) WITH (timescaledb.hypertable);`

### Step 2: Create the 10-Minute Continuous Aggregate

`CREATE MATERIALIZED VIEW readings_10min
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('10 minutes', time) AS bucket,
  sensor_id,
  AVG(temperature) AS avg_temp
FROM sensor_readings
GROUP BY 1, 2;

SELECT add_continuous_aggregate_policy('readings_10min',
  start_offset => INTERVAL '1 hour',
  end_offset   => INTERVAL '10 minutes',
  schedule_interval => INTERVAL '10 minutes'
);`

### Step 3: Create the Hourly CAGG on Top of the 10-Minute CAGG

This is the pattern that surprises practitioners: you can aggregate an aggregate. The hourly CAGG reads from `readings_10min`, not from the raw table.

`CREATE MATERIALIZED VIEW readings_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', bucket) AS bucket,
  sensor_id,
  AVG(avg_temp) AS avg_temp
FROM readings_10min
GROUP BY 1, 2;

SELECT add_continuous_aggregate_policy('readings_hourly',
  start_offset => INTERVAL '1 day',
  end_offset   => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);`

### Step 4: Add Retention Policies

Retention policies drop the raw data and lower-resolution CAGGs once they are no longer needed.

`-- Drop raw data older than 1 day
SELECT add_retention_policy('sensor_readings', INTERVAL '1 day');

-- Drop 10-minute rollups older than 1 week
SELECT add_retention_policy('readings_10min', INTERVAL '1 week');

-- Drop hourly rollups older than 1 month
SELECT add_retention_policy('readings_hourly', INTERVAL '1 month');`

**Order of operations matters.** Apply the retention policy on raw data only after confirming the CAGG is fully refreshed. Dropping raw data before the CAGG has processed it creates permanent data loss. Verify CAGG population before enabling raw data retention.

For more on [<u>data retention policies</u>](https://www.tigerdata.com/learn/what-is-data-retention-policy), including how to check policy status and handle edge cases, see the dedicated guide.

The result: this pattern gives you queryable daily averages going back years, while keeping the last day of raw data available for forensic queries. You do not have to choose between raw access and historical coverage.

## Five Common Downsampling Myths

### Myth 1: "Downsampling permanently destroys your data."

Downsampling with continuous aggregates is non-destructive. The CAGG is a separate materialized view. Raw data stays exactly where it is until a retention policy explicitly drops it. You control whether and when raw data is deleted. Adding a CAGG alone changes nothing about the raw table.

### Myth 2: "Just use averaging. It's good enough."

Averaging destroys the shape of the time-series. A sensor that spikes to 95°C for 10 seconds will show only a fraction of a degree above baseline in a 10-minute average; the spike is effectively invisible. Peaks, valleys, and inflection points disappear. For visualization, `lttb()` selects actual data points that preserve curve shape. Use averaging for storage reduction when the trend matters; use `lttb()` when visual accuracy matters.

### Myth 3: "Downsampling is only for Grafana dashboards."

Visualization is one use case. Storage cost reduction and long-term analytics are equally valid and often more impactful. A hypertable growing at 1M rows per day generates roughly 365M rows per year at raw resolution. Daily averages per sensor reduce that to 365 rows per day per sensor, a reduction of several orders of magnitude for high-cardinality device fleets.

### Myth 4: "Downsampling belongs in Python, not the database."

Downsampling in Python requires extracting data from the database, transforming it in a DataFrame, and writing results back. That adds latency, network overhead, and ETL complexity. Doing it in the database, where the data already lives, eliminates the round-trip, runs continuously without a scheduler to maintain, and scales with the database rather than with application servers.

### Myth 5: "You must choose between raw data access and long-term historical coverage."

The tiered resolution ladder gives you both. Raw data is available for the most recent window (for example, 1 day). Lower-resolution aggregates cover longer history. Queries route to the appropriate layer automatically. You get forensic access to recent data and fast long-range analytics over historical data without managing two separate storage systems.

## Decision Framework: Which Method Should You Use?

For [<u>hyperfunctions for time-series analysis</u>](https://www.tigerdata.com/learn/time-series-data-analysis-hyperfunctions) beyond downsampling, see the overview of all available Tiger Data hyperfunctions.

**Choose **`**time_bucket()**`** averaging if:**

- You need storage reduction and trend accuracy matters more than shape preservation
- You are running a one-off query or exploratory analysis rather than building production automation
- Your aggregate function is SUM, COUNT, or MIN/MAX rather than AVG

**Choose **`**lttb()**`** if:**

- You are rendering a Grafana panel or chart and need visual fidelity with fewer data points
- You want to downsample for display without altering stored data
- Visualization latency is the problem (too many points to render smoothly)

**Choose **`**asap_smooth()**`** if:**

- High-frequency noise is obscuring the underlying trend in charts
- Your audience is non-technical and needs a clean signal without manual smoothing configuration

**Choose continuous aggregates if:**

- You need automated, production-grade downsampling that runs without manual queries
- You are building the tiered resolution ladder
- Query performance on historical data is degrading as the hypertable grows
- You want to combine downsampling with a retention policy for full lifecycle management

