
By Sven Klemm and Matthew Perry

6 min read

Jan 24, 2019

Product & Engineering | PostgreSQL

Table of contents

01 SQL Functions for Time-Series Analysis: Introduction to Time Bucketing
02 Challenges With Time Bucketing for Time Series
03 SQL Functions: Time Bucketing With Gapfilling
04 Best Practices for Time-Series Analysis With SQL Functions
05 Extending SQL for Time-Series Analysis
06 Supercharge Your Time-Series Analysis

Mind the Gap: Using SQL Functions for Time-Series Analysis



SQL functions are reusable routines written in SQL or supported procedural languages that perform operations on input values and return a result, often used to encapsulate logic and simplify complex queries.

With the release of TimescaleDB 1.2 came three new SQL functions for time-series analysis: time_bucket_gapfill, interpolate, and locf. Used together, these SQL functions will enable you to write more efficient and readable SQL queries for time-series analysis.

The efficiency gains were so evident that we have since developed a complete set of hyperfunctions for faster time-series analysis with fewer lines of code. You can find them in the Timescale Toolkit.

In this post, we'll discuss why you'd want to use time buckets, the related gapfilling techniques, and how they’re implemented under the hood. Ultimately, it's the story of how we extended SQL and the PostgreSQL query planner to create a set of highly optimized functions for time-series analysis.

SQL Functions for Time-Series Analysis: Introduction to Time Bucketing

Many common techniques for time-series analysis assume that our temporal observations are aggregated to fixed time intervals. Dashboards and most visualizations of time series rely on this technique to make sense of our raw data, turning the noise into a smoother trend line that is more easily interpretable and analytically tractable.


When writing queries for this type of reporting, you need an efficient way to aggregate raw observations (often noisy and irregular) to fixed time intervals. Examples of such queries might be average temperature per hour or the average CPU utilization per five seconds.

The solution is time bucketing. The time_bucket function has been a core feature of TimescaleDB since the first public beta release. With time bucketing, we can get a clear picture of the important data trends using a concise, declarative SQL query.

SELECT
  time_bucket('1 minute', time) as one_minute_bucket,
  avg(value) as avg_value
FROM observations
GROUP BY one_minute_bucket
ORDER BY one_minute_bucket;
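Conceptually, time_bucket floors each timestamp to the start of its fixed-width bucket, so every observation in the same minute maps to the same bucket value and GROUP BY can aggregate them. As a rough illustration, here is a simplified Python sketch of that arithmetic (not TimescaleDB's actual implementation; the origin shown is arbitrary):

```python
from datetime import datetime, timedelta, timezone

def time_bucket(width: timedelta, ts: datetime) -> datetime:
    """Floor a timestamp to the start of its fixed-width bucket."""
    origin = datetime(2000, 1, 1, tzinfo=timezone.utc)  # arbitrary bucket origin
    buckets = (ts - origin) // width  # whole buckets elapsed since the origin
    return origin + buckets * width

ts = datetime(2019, 1, 21, 9, 37, 42, tzinfo=timezone.utc)
print(time_bucket(timedelta(minutes=1), ts))  # 2019-01-21 09:37:00+00:00
```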

Challenges With Time Bucketing for Time Series

The reality of time-series data engineering is not always so easy.

Consider measurements recorded at irregular sampling intervals, either intentionally, as with measurements recorded in response to external events (e.g., a motion sensor), or inadvertently, due to network problems, out-of-sync clocks, or equipment taken offline for maintenance.

[Image: Time bucket: none]

We should also consider analyzing multiple measurements recorded at mismatched sampling intervals. For instance, you might collect some of your data every second and some every minute, but still need to analyze both metrics at 15-second intervals.

The time_bucket function will only aggregate your data to a given time bucket if there is data in it. In both cases, mismatched and irregular sampling can leave a time bucket interval with missing data (i.e., gaps).

[Image: Time bucket: 20 minutes]

If your analysis requires data aggregated to contiguous time intervals, time bucketing with gapfilling solves this problem.

SQL Functions: Time Bucketing With Gapfilling

TimescaleDB community users have access to a set of SQL functions:

  • time_bucket_gapfill for creating contiguous, ordered time buckets
  • interpolate to perform linear interpolation between the previous and next value
  • locf or last observation carried forward to fill in gaps with the previous known value

Gapfilling

The new time_bucket_gapfill function is similar to time_bucket except that it guarantees a contiguous, ordered set of time buckets.

The function requires that you provide a start and finish argument to specify the time range for which you need contiguous buckets. The result set will contain additional rows in place of any gaps, ensuring that the returned rows are in chronological order and contiguous.

Let’s look at the SQL:

SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start => '2019-01-21 9:00', 
        finish => '2019-01-21 17:00') AS hour,
    avg(value) AS avg_val
FROM temperature
GROUP BY hour;

          hour          |         avg_val
------------------------+-------------------------
 2019-01-21 09:00:00+00 |     26.5867799823790905
 2019-01-21 10:00:00+00 |    23.25141648529633607
 2019-01-21 11:00:00+00 |     21.9964633100885991
 2019-01-21 12:00:00+00 |    23.08512263446292656
 2019-01-21 13:00:00+00 |
 2019-01-21 14:00:00+00 |     27.9968220672055895
 2019-01-21 15:00:00+00 |     26.4914455532679670
 2019-01-21 16:00:00+00 |   24.07531628738616732

Note that one of the hours (13:00) is missing data entirely, so its average value comes back as NULL. Gapfilling gives us a contiguous set of time buckets but no data for those rows. That's where the locf and interpolate functions come into play.

LOCF or last observation carried forward

The “last observation carried forward” technique can be used to impute missing values by assuming the previous known value.

SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start => '2019-01-21 9:00',
        finish => '2019-01-21 17:00') AS hour,
  -- instead of avg(value)
  locf(avg(value))
FROM temperature
GROUP BY hour
ORDER BY hour;

Shown here:

[Image: LOCF at 20 minutes]
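The semantics of locf are easy to picture outside SQL. Here is a minimal Python sketch of the same idea (illustrative only; the real work happens inside TimescaleDB's gapfill node):

```python
def locf(values):
    """Fill None gaps by carrying the last observed value forward.

    Buckets before the first observation stay None, since there is
    nothing to carry forward yet.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# One bucketed series with two missing buckets in the middle:
print(locf([26.5, None, None, 27.9]))  # [26.5, 26.5, 26.5, 27.9]
```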

Linear interpolation

Linear interpolation imputes missing values by assuming a line between the previous known value and the next known value.

SELECT
    time_bucket_gapfill(
        '1 hour', time,
        start => '2019-01-21 9:00',
        finish => '2019-01-21 17:00') AS hour,
  -- instead of avg(value)
  interpolate(avg(value))
FROM temperature
GROUP BY hour
ORDER BY hour;

Shown here:

[Image: Interpolate at 20 minutes]
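In the same illustrative spirit, linear interpolation between known neighbors can be sketched in a few lines of Python (again, a simplification of what the gapfill node actually does):

```python
def interpolate(values):
    """Linearly interpolate None gaps between the surrounding known values.

    Gaps before the first or after the last known value stay None,
    since there is no neighbor on one side to draw the line to.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for lo, hi in zip(known, known[1:]):
        slope = (out[hi] - out[lo]) / (hi - lo)
        for i in range(lo + 1, hi):
            out[i] = out[lo] + slope * (i - lo)
    return out

print(interpolate([23.0, None, 27.0]))  # [23.0, 25.0, 27.0]
```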

These techniques are not exclusive; you can combine them as needed in a single time-bucketed query:

locf(avg(temperature)), interpolate(max(humidity)), avg(other_val)

Best Practices for Time-Series Analysis With SQL Functions

Whether you choose LOCF, interpolation, or plain gapfilling (leaving NULLs) depends on your assumptions about the time-series data and your analytical approach.

  • Use locf if you assume your measurement changes only when you've received new data.
  • Use interpolation if you assume your continuous measurement would have a smooth, roughly linear trend if sampled at a higher rate.
  • Use standard aggregate functions (without locf or interpolation) if your data is not continuous on the time axis. Where there is no data, the result is assumed NULL.
  • If you want to assume scalar values (typically zero) in place of NULLs, you can use PostgreSQL’s coalesce function: COALESCE(avg(val), 0)

If you choose to explicitly ORDER your results, keep in mind that the gapfilling will sort by time in ascending order. Any other explicit ordering may introduce additional sorting steps in the query plan.

Extending SQL for Time-Series Analysis

Compared with assembling the same result in application code, the new time_bucket_gapfill query is significantly more readable, less error-prone, more flexible in its grouping, and faster to execute.

How does TimescaleDB achieve this? Under the hood, these are not ordinary functions but specially optimized hooks into the database query planner itself.

The time_bucket_gapfill function inserts a custom scan node and sort node (if needed) into the query plan. This creates ordered, contiguous time buckets even if some of the buckets are missing observations. The locf and interpolate functions are not executed directly but serve as markers so that the gapfilling node can track the previous and next known values.

[Image: Query plan visualization resulting from time_bucket_gapfill; courtesy of https://tatiyants.com/pev]

The result: a semantically cleaner language for expressing time-series analysis, easier to debug, more performant, and saving the application developer from having to implement any of these tricks on the application side. This is another example of how Timescale is extending PostgreSQL for high-performance, general-purpose time-series data management.

Supercharge Your Time-Series Analysis

Time buckets with gapfilling and the related imputation functions are available as community features under the TSL license. (For more information on the license, read this blog post.)

If you’re interested in learning more about gapfilling, check out our docs. If you are new to TimescaleDB and ready to get started, follow the installation instructions.

We encourage active TimescaleDB users to join our Slack community and post any questions you may have there. Finally, if you are looking for a modern cloud-native PostgreSQL platform, check out Timescale Cloud.


