Gapfilling overview
Aggregate data by time interval while filling in gaps of missing data
time_bucket_gapfill works similarly to time_bucket, but adds
gapfilling capabilities. The other functions in this group must be used in the
same query as time_bucket_gapfill. They control how missing values are treated.
Samples
Section titled “Samples”Use time_bucket_gapfill without a gapfilling algorithm
Section titled “Use time_bucket_gapfill without a gapfilling algorithm”Get the daily average metric value. Use time_bucket_gapfill without specifying
a gapfilling algorithm. This leaves the missing values as NULL:
SELECT time_bucket_gapfill('1 day', time) AS day, avg(value) as value FROM metrics WHERE time > '2021-12-31 00:00:00+00'::timestamptz AND time < '2022-01-10 00:00:00-00'::timestamptz GROUP BY day ORDER BY day desc;day | value-----------------------+--------------------2022-01-09 00:00:00+00 |2022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 |2022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 |2022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 |(10 rows)Use time_bucket_gapfill and carry last value forward
Section titled “Use time_bucket_gapfill and carry last value forward”Get the daily average metric value. Use locf to carry the last value forward
if a value is missing. Note that avg is nested inside locf, and not the
other way around.
SELECT time_bucket_gapfill('1 day', time) AS day, locf(avg(value)) as value FROM metrics WHERE time > '2021-12-31 00:00:00+00'::timestamptz AND time < '2022-01-10 00:00:00-00'::timestamptz GROUP BY day ORDER BY day desc;day | value-----------------------+--------------------2022-01-09 00:00:00+00 | 48.612931559931082022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 58.2575206347852662022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 47.844200014159752022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 |(10 rows)Use time_bucket_gapfill and carry last value forward with prev expression
Section titled “Use time_bucket_gapfill and carry last value forward with prev expression”Get the daily average metric value. Use the optional prev argument to locf to fill gaps at the beginning of the queried time range. Note that the prev expression returns just a value to fill the gap with. This is sufficient since the value is just carried forward and not further processed.
SELECT time_bucket_gapfill('1 day', time) AS day, device_id, locf( avg(value), ( SELECT value FROM metrics m2 WHERE time < '2021-12-31 00:00:00+00'::timestamptz AND m.device_id=m2.device_id ORDER BY time DESC LIMIT 1 ) ) as value FROM metrics m WHERE time > '2021-12-31 00:00:00+00'::timestamptz AND time < '2022-01-10 00:00:00-00'::timestamptz GROUP BY day, device_id ORDER BY day desc;day | value-----------------------+--------------------2022-01-09 00:00:00+00 | 48.612931559931082022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 58.2575206347852662022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 47.844200014159752022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 | 47.84420001415975(10 rows)Use time_bucket_gapfill and use linear interpolation
Section titled “Use time_bucket_gapfill and use linear interpolation”Get the daily average metric value. Use interpolate to linearly interpolate
the value if it is missing. Note that avg is nested inside interpolate.
SELECT time_bucket_gapfill('1 day', time) AS day, interpolate(avg(value)) as value FROM metrics WHERE time > '2021-12-31 00:00:00+00'::timestamptz AND time < '2022-01-10 00:00:00-00'::timestamptz GROUP BY day ORDER BY day desc;day | value-----------------------+--------------------2022-01-09 00:00:00+00 |2022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 56.322894080385882022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 45.1895935461800142022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 |(10 rows)Use time_bucket_gapfill and use linear interpolation with prev and next expression
Section titled “Use time_bucket_gapfill and use linear interpolation with prev and next expression”Get the daily average metric value. Use the optional prev and next arguments to interpolate to extrapolate the missing values starting and ending the queried time range. Note that the prev and next expressions each return a tuple with time and value. The time is necessary to compute the missing values correctly.
SELECT time_bucket_gapfill('1 day', time) AS day, device_id, interpolate( avg(value), ( SELECT (time, value) FROM metrics m2 WHERE time < '2021-12-31 00:00:00+00'::timestamptz AND m.device_id=m2.device_id ORDER BY time DESC LIMIT 1 ), ( SELECT (time, value) FROM metrics m2 WHERE time > '2021-12-10 00:00:00-00'::timestamptz AND m.device_id=m2.device_id ORDER BY time ASC LIMIT 1 ) ) as value FROM metrics m WHERE time > '2021-12-31 00:00:00+00'::timestamptz AND time < '2022-01-10 00:00:00-00'::timestamptz GROUP BY day, device_id ORDER BY day desc;day | value-----------------------+--------------------2022-01-09 00:00:00+00 | 48.612931559931082022-01-08 00:00:00+00 | 48.612931559931082022-01-07 00:00:00+00 | 54.3882675259864852022-01-06 00:00:00+00 | 56.322894080385882022-01-05 00:00:00+00 | 58.2575206347852662022-01-04 00:00:00+00 | 46.091724242617652022-01-03 00:00:00+00 | 42.534987078200272022-01-02 00:00:00+00 | 45.1895935461800142022-01-01 00:00:00+00 | 47.844200014159752021-12-31 00:00:00+00 | 47.84420001415975(10 rows)Use time_bucket_gapfill with a timezone argument
Section titled “Use time_bucket_gapfill with a timezone argument”Get the daily average metric value, using Europe/Berlin as the timezone. Note
that daily time buckets now start at 23:00 UTC, which is equivalent to
midnight in Berlin for the selected dates:
SELECT time_bucket_gapfill('1 day', time, 'Europe/Berlin') AS day, interpolate(avg(value)) as value FROM metrics WHERE time > '2021-12-31 00:00:00+00'::timestamptz AND time < '2022-01-10 00:00:00-00'::timestamptz GROUP BY day ORDER BY day desc;day | value-----------------------+--------------------2022-01-09 23:00:00+00 |2022-01-08 23:00:00+00 | 48.650791279137032022-01-07 23:00:00+00 | 47.318477770991542022-01-06 23:00:00+00 | 55.988457403438592022-01-05 23:00:00+00 | 55.616674017771082022-01-04 23:00:00+00 | 58.741155745220122022-01-03 23:00:00+00 | 45.779936359882732022-01-02 23:00:00+00 | 41.786899234532022022-01-01 23:00:00+00 | 24.3243134777439742021-12-31 23:00:00+00 | 48.866803776612612021-12-30 23:00:00+00 |(11 rows)Available functions
Section titled “Available functions”Bucket function
Section titled “Bucket function”time_bucket_gapfill(): bucket rows by time interval while filling gaps in data
Interpolators
Section titled “Interpolators”locf(): fill in missing values by carrying the last observed value forwardinterpolate(): fill in missing values by linear interpolation