---
title: "Function Pipelines: Building Functional Programming Into PostgreSQL Using Custom Operators"
published: 2021-10-19T09:33:44.000-04:00
updated: 2025-12-10T10:51:27.000-05:00
excerpt: "Function pipelines allow you to analyze data by composing multiple functions in SQL and express complex logic in PostgreSQL in a cleaner way."
tags: Announcements & Releases, PostgreSQL, Engineering
authors: David Kohn
---

> **TimescaleDB is now Tiger Data.**

**Today, we are announcing _function pipelines_, a new capability that introduces functional programming concepts inside PostgreSQL (and SQL) using custom operators.**

_Function pipelines_ radically improve the developer ergonomics of analyzing data in PostgreSQL and SQL, by applying principles from [functional programming](https://en.wikipedia.org/wiki/Functional_programming) and popular tools like [Python’s Pandas](https://pandas.pydata.org/docs/index.html) and [PromQL](https://prometheus.io/docs/prometheus/latest/querying/basics/).

At Timescale, our mission is to serve developers worldwide and enable them to build exceptional data-driven products that measure everything that matters: software applications, industrial equipment, financial markets, blockchain activity, user actions, consumer behavior, machine learning models, climate change, and more.

[We believe SQL is the best language for data analysis](https://timescale.ghost.io/blog/blog/why-sql-beating-nosql-what-this-means-for-future-of-data-time-series-database-348b777b847a/). We’ve championed its benefits for several years, even when many were abandoning it for custom domain-specific languages. And we were right—SQL has resurged and become the universal language for data analysis, and now many NoSQL databases are adding SQL interfaces to keep up.

But SQL is not perfect and, at times, can get quite unwieldy. For example:

```SQL
SELECT device_id, 
	sum(abs_delta) as volatility
FROM (
	SELECT device_id, 
		abs(val - lag(val) OVER (PARTITION BY device_id ORDER BY ts))
        	as abs_delta 
	FROM measurements
	WHERE ts >= now() - '1 day'::interval) calc_delta
GROUP BY device_id; 
```

_Pop quiz: What does this query do?_

Even if you are a SQL expert, queries like this can be quite difficult to read—and even harder to express. Complex data analysis in SQL can be hard.

## The Advantages of Function Pipelines

_Function pipelines_ let you express that same query like this:

```SQL
SELECT device_id, 
	timevector(ts, val) -> sort() -> delta() -> abs() -> sum() 
    		as volatility
FROM measurements
WHERE ts >= now() - '1 day'::interval
GROUP BY device_id;
```

Now it is much clearer what this query is doing:

-   It gets the last day’s data from the measurements table, grouped by `device_id`.
-   It sorts the data by the time column.
-   It calculates the delta (or change) between values.
-   It takes the absolute value of the delta.
-   And then takes the sum of the results of the previous steps.

**Function pipelines improve your own coding productivity, while also making your SQL code easier for others to comprehend and maintain.**

Inspired by functional programming languages, function pipelines enable you to analyze data by composing multiple functions, leading to a simpler, cleaner way of expressing complex logic in PostgreSQL.

The best part is that we built function pipelines in a way that is fully PostgreSQL compliant—we did not change any SQL syntax—meaning that any tool that speaks PostgreSQL will be able to support data analysis using function pipelines.

## How We Built Function Pipelines

How did we build this? By taking advantage of PostgreSQL's incredible extensibility, in particular [custom types](https://www.postgresql.org/docs/current/sql-createtype.html), [custom operators](https://www.postgresql.org/docs/current/sql-createoperator.html), and [custom functions](https://www.postgresql.org/docs/current/sql-createfunction.html).

In our previous example, you can see the key elements of function pipelines:

-   **Custom data types**: in this case, the `timevector`, which is a set of `(time, value)` pairs.
-   **Custom operator**: `->`, used to _compose_ and _apply_ function pipeline elements to the data that comes in.
-   And finally, **custom functions**, called _pipeline elements_. Pipeline elements can transform and analyze `timevector`s (or other data types) in a function pipeline. For this initial release, we’ve built _60 custom functions!_ ([Full list here](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/)).

We’ll go into more detail on function pipelines in the rest of this post, but if you just want to get started as soon as possible, **the easiest way to try function pipelines is through a fully managed Timescale Cloud service**. [Try it for free](https://console.cloud.timescale.com/signup) (no credit card required) for 30 days.

Function pipelines are pre-loaded on each new database service on Timescale Cloud, available immediately—so after you’ve created a new service, you’re all set to use them!

If you prefer to manage your own database instances, [you can install the `timescaledb_toolkit`](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/install-toolkit/) into your existing PostgreSQL installation, completely for free.

We’ve been working on this capability for a long time, but in line with our belief of “[move fast but don’t break things](https://timescale.ghost.io/blog/blog/move-fast-but-dont-break-things-introducing-the-experimental-schema-with-new-experimental-features-in-timescaledb-2-4/),” we’re initially releasing function pipelines as an [experimental feature](https://docs.timescale.com/api/latest/api-tag-overview/#experimental-timescaledb-toolkit)—and we would absolutely love to **get your feedback**. You can [open an issue](https://github.com/timescale/timescaledb-toolkit/issues) or join a [discussion thread](https://github.com/timescale/timescaledb-toolkit/discussions) in GitHub (and, if you like what you see, GitHub ⭐ is always welcome and appreciated, too!).

_We’d also like to take this opportunity to give a huge shoutout to_ [_`pgx`, the Rust-based framework for building PostgreSQL extensions_](https://github.com/zombodb/pgx)_—it handles a lot of the heavy lifting for this project. We have over 600 custom types, operators, and functions in the `timescaledb_toolkit` extension at this point; managing this without `pgx` (and the ease of use that comes from working with Rust) would be a real bear of a job._

## Function Pipelines: Why Are They Useful?

It’s October. In the northern hemisphere (where most of Team Timescale, including your author, lives), it is starting to get cold.

Now imagine a restaurant in New York City whose owners care about their customers and their customers’ comfort. You are working on an IoT product designed to help small businesses like this one minimize their heating bill while maximizing their customers’ happiness. So you install two thermometers: one at the front, measuring the temperature right by the door, and another at the back of the restaurant.

Now, as many of you may know (if you’ve ever had to sit by the door of a restaurant in the fall or winter), when someone enters, the temperature drops—and once the door is closed, the temperature warms back up. The temperature at the back of the restaurant will vary much less than at the front, right by the door. Both of them will drop slowly down to a lower set point during non-business hours and warm back up sometime before business hours based on the setpoints on our thermostat. So overall we’ll end up with a graph that looks something like this:

![A graph with time on the x axis and temperature on the y axis showing two curves. First a curve labeled back which starts low steadily rises stays relatively constant in the section labeled operating hours and then drops slowly back down. Second a curve labeled front which starts following the other, starts low, rises then it starts getting jumpy, drastically varying while the other stays constant during operating hours, then it falls back down. ](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2021/10/graph--1-.jpg)

A graph of the temperature at the front (near the door) and back. The back is much steadier, while the front is more volatile. The graph is for illustrative purposes only; the data is fabricated. No restaurants or restaurant patrons were harmed in the making of this post.

As we can see, the temperature by the front door varies much more than at the back of the restaurant. Another way to say this is the temperature by the front door is more _volatile_. Now, the owners of this restaurant want to measure this because frequent temperature changes means uncomfortable customers.

In order to measure volatility, we could first calculate the delta (the change) between each point and the one before it. If we simply summed these deltas, large positive and negative deltas would cancel out. But we only care about the magnitude of each delta, not its sign, so what we really should do is take the absolute value of each delta and then sum the results.
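To make the arithmetic concrete, here's the same volatility calculation sketched in plain Python (illustrative only; the `(ts, val)` pairs stand in for rows from our `measurements` table):

```python
def volatility(readings):
    """Sum of absolute deltas between consecutive readings.

    readings: a list of (ts, val) pairs, possibly unsorted.
    """
    ordered = sorted(readings)                  # sort by timestamp
    values = [val for _, val in ordered]
    # delta between each value and the previous one, then abs, then sum
    return sum(abs(curr - prev) for prev, curr in zip(values, values[1:]))

front = [(1, 20.0), (2, 18.5), (3, 20.0), (4, 17.0)]
print(volatility(front))  # |18.5-20| + |20-18.5| + |17-20| = 6.0
```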

We now have a metric that might help us measure customer comfort, and also the efficacy of different weatherproofing methods (for example, adding one of those little vestibules that acts as a windbreak).

To track this, we collect measurements from our thermometers and store them in a table:

```SQL
CREATE TABLE measurements(
	device_id BIGINT,
	ts TIMESTAMPTZ,
	val DOUBLE PRECISION
);
```

The `device_id` column identifies the thermostat, `ts` the time of the reading, and `val` the temperature.

Using the data in our measurements table, let’s look at how we calculate volatility using function pipelines.

_Note: because all of the function pipeline features are still experimental, they live in the `toolkit_experimental` schema. Before running any of the SQL in this post, you will need to set your `search_path` to include the experimental schema, as we do in the example below. We won’t repeat this in the rest of the post, so as not to distract._

```SQL
set search_path to toolkit_experimental, public; --still experimental, so do this to make it easier to read

SELECT device_id, 
	timevector(ts, val) -> sort() -> delta() -> abs() -> sum() 
    	as volatility
FROM measurements
WHERE ts >= now()-'1 day'::interval
GROUP BY device_id;
```

And now we have the same query that we used as our example in the introduction.

In this query, the function pipeline `timevector(ts, val) -> sort() -> delta() -> abs() -> sum()` succinctly expresses the following operations:

1.  Create `timevector`s (more detail on this later) out of the `ts` and `val` columns.
2.  Sort each `timevector` by the time column.
3.  Calculate the delta (or change) between each pair in the `timevector` by subtracting the previous `val` from the current.
4.  Take the absolute value of the delta.
5.  Take the sum of the results from the previous steps.

The `FROM`, `WHERE` and `GROUP BY` clauses do the rest of the work telling us:

1.  We’re getting data _FROM_ the `measurements` table
2.  _WHERE_ the ts, or timestamp column, contains values over the last day
3.  Showing one pipeline output per `device_id` (the GROUP BY column)

As we noted before, if you were to do this same calculation using SQL and PostgreSQL functionality, your query would look like this:

```SQL
SELECT device_id, 
sum(abs_delta) as volatility
FROM (
	SELECT device_id, 
		abs(val - lag(val) OVER (PARTITION BY device_id ORDER BY ts) ) 
        	as abs_delta 
	FROM measurements
	WHERE ts >= now() - '1 day'::interval) calc_delta
GROUP BY device_id; 
```

This does the same five steps as the above but is much harder to understand. We have to use a [window function](https://www.postgresql.org/docs/current/functions-window.html) and aggregate the results—but also, because aggregates are performed before window functions, we need to actually execute the window function in a subquery.

As we can see, function pipelines make it significantly easier to comprehend the overall analysis of our data. There’s no need to completely understand what’s going on in these functions just yet, but for now, it’s enough to understand that we’ve essentially implemented a small functional programming language inside of PostgreSQL. You can still use all of the normal, expressive SQL you’ve come to know and love. Function pipelines just add new tools to your SQL toolbox that make it easier to work with time-series data.

Some avid SQL users might find the syntax a bit foreign at first, but for many people who work in other programming languages, especially using tools like [Python’s Pandas Package](https://pandas.pydata.org/docs/index.html), this type of [successive operation on data sets](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html) will feel natural.
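For instance, the volatility query maps almost one-to-one onto a pandas method chain (a rough sketch; the DataFrame and its values are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the measurements table
df = pd.DataFrame({
    "device_id": [1, 1, 1, 2, 2, 2],
    "ts": pd.to_datetime(["2021-10-18 00:02", "2021-10-18 00:00",
                          "2021-10-18 00:01", "2021-10-18 00:00",
                          "2021-10-18 00:01", "2021-10-18 00:02"]),
    "val": [3.0, 1.0, 2.0, 10.0, 10.5, 9.5],
})

volatility = (
    df.sort_values("ts")                    # -> sort()
      .groupby("device_id")["val"]          # GROUP BY device_id
      .apply(lambda s: s.diff()             # -> delta()
                        .abs()              # -> abs()
                        .sum())             # -> sum()
)
print(volatility)
```

Each chained method plays the role of one pipeline element, which is exactly the left-to-right reading order that function pipelines bring to SQL.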

Again, this is still fully PostgreSQL-compliant: we introduce no changes to the parser or anything that should break compatibility with PostgreSQL drivers.

## Building Function Pipelines Without Forking PostgreSQL

We built function pipelines—without modifying the [parser](https://www.postgresql.org/docs/10/parser-stage.html) or anything that would require a fork of PostgreSQL—by taking advantage of three of the many ways that PostgreSQL enables extensibility: [custom types](https://www.postgresql.org/docs/current/sql-createtype.html), [custom functions](https://www.postgresql.org/docs/current/sql-createfunction.html), and [custom operators](https://www.postgresql.org/docs/current/sql-createoperator.html).

-   **Custom data types**, starting with the `timevector`, which is a set of `(time, value)` pairs
-   **A custom operator**: `->`, which is used to _compose_ and _apply_ function pipeline elements to the data that comes in.
-   **Custom functions**, called _pipeline elements_, which can transform and analyze `timevector`s (or other data types) in a function pipeline (with 60 functions in this initial release)

We believe that new idioms like these are exactly what PostgreSQL was _meant to enable_. That’s why it has supported custom types, functions and operators from its earliest days. (And is one of the many reasons why we love PostgreSQL.)

## A Custom Data Type: The `timevector`

A `timevector` is a collection of `(time, value)` pairs. As of now, the times must be `TIMESTAMPTZ`s and the values must be `DOUBLE PRECISION` numbers. (But this may change in the future as we continue to develop this data type. If you have ideas/input, please [file feature requests on GitHub](https://github.com/timescale/timescaledb-toolkit/issues) explaining what you’d like!)

You can think of the `timevector` as something like this:

![ A box with `timevector` on the outside, inside there is a table with one column labeled time containing multiple timestamps and another column labeled value containing floating point numbers.](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2021/10/table1--1-.jpg)

A depiction of a `timevector`.

One of the first questions you might ask is: how does a `timevector` relate to time-series data? (If you want to know more about time-series data, [we have a great blog post on that](https://timescale.ghost.io/blog/time-series-data/).)

Let’s consider our example from above, where we were talking about a restaurant that was measuring temperatures, and we had a `measurements` table like so:

```SQL
CREATE TABLE measurements(
	device_id BIGINT,
	ts TIMESTAMPTZ,
	val DOUBLE PRECISION
);
```

In this example, we can think of a single time-series dataset as all historical and future time and temperature measurements from a device.

Given this definition, we can think of a `timevector` as a **finite subset of a time-series dataset**. The larger time-series dataset may extend back into the past, and it may extend into the future, but the `timevector` is bounded.

![A box labeled time-series with a table in it. The table has one column labeled time and another labeled value like the diagram above. There is a timevector box inside the table with timestamps and values. Each column also has an arrow at the top pointing to the word past and another arrow at the bottom pointing to the word future conveying that the time-series extends into the past and future while the timevector contains a subset of the values. ](https://storage.ghost.io/c/6b/cb/6bcb39cf-9421-4bd1-9c9d-fa7b6755ba0e/content/images/2021/10/table2--1-.jpg)

A `timevector` is a finite subset of a time-series and contains all the `(time, value)` pairs in some region of the time-series.

In order to construct a `timevector` from the data gathered from a thermometer, we use a custom aggregate and pass in the columns we want to become our `(time, value)` pairs. We can use the `WHERE` clause to define the extent of the `timevector` (i.e., the limits of this subset), and the `GROUP BY` clause to provide identifying information about the [time series](https://www.tigerdata.com/blog/time-series-introduction) that are represented.

Building on our example, this is how we construct a `timevector` for each thermometer in our dataset:

```SQL
SELECT device_id, 
	timevector(ts, val)
FROM measurements
WHERE ts >= now() - '1 day'::interval
GROUP BY device_id;
```

But a `timevector` doesn't provide much value by itself. So now, let’s also consider some complex calculations that we can apply to the `timevector`, starting with the custom operator used to apply these functions.

## A Custom Operator: `->`

In function pipelines, the `->` operator is used to apply and compose multiple functions in an easy-to-write and easy-to-read format.

Fundamentally, `->` means: “apply the operation on the right to the inputs on the left” or, more simply, “do the next thing”.

We created a general-purpose operator for this because we think that too many operators meaning different things can get very confusing and difficult to read.

One thing that you’ll notice about the pipeline elements is that the arguments are in an unusual place in a statement like:

```SQL
SELECT device_id, 
 	timevector(ts, val) -> sort() -> delta() -> abs() -> sum() 
    	as volatility
FROM measurements
WHERE ts >= now() - '1 day'::interval
GROUP BY device_id;
```

It _appears_ (from the semantics) that the `timevector(ts, val)` is an argument to `sort()`, the resulting `timevector` is an argument to `delta()` and so on.  

The thing is that `sort()` (and the others) are regular function calls; they can’t see anything outside of their parentheses and don’t know about anything to their left in the statement; so we need a way to get the `timevector` into the `sort()` (and the rest of the pipeline).

We solved this by taking advantage of one of the fundamental computing insights that functional programming languages use: _code and data are really the same thing_.

Each of our functions returns a special type that describes the function and its arguments. We call these types _pipeline elements_ (more on them later).

The `->` operator then performs one of two different types of actions depending on the types on its right and left sides. It can either:

1.  _Apply_ a pipeline element to the left-hand argument—perform the function described by the pipeline element directly on the incoming data type.
2.  _Compose_ pipeline elements into a combined element that can be applied at some point in the future (this is an optimization that allows us to apply multiple elements in a “nested” manner so that we don’t perform multiple unnecessary passes over the data).

The operator determines the action to perform based on its left and right arguments.

Let’s look at our `timevector` from before: `timevector(ts, val) -> sort() -> delta() -> abs() -> sum()`. If you remember from before, I noted that this function pipeline performs the following steps:

1.  Create `timevector`s out of the `ts` and `val` columns.
2.  Sort it by the time column.
3.  Calculate the delta (or change) between each pair in the `timevector` by subtracting the previous `val` from the current.
4.  Take the absolute value of the delta.
5.  Take the sum of the results from the previous steps.

And logically, at each step, we can think of the `timevector` being materialized and passed to the next step in the pipeline.

However, while this will produce a correct result, it’s not the most efficient way to compute this. Instead, it would be more efficient to compute as much as possible in a single pass over the data.

In order to do this, we allow not only the _apply_ operation but also the _compose_ operation. Once we’ve composed a pipeline into a logically equivalent higher-order pipeline containing all of the elements, we can choose the most efficient way to execute it internally. (Importantly, even if we have to perform each step sequentially, we don’t need to _materialize_ the `timevector` and pass it between steps, so there is significantly less overhead even without other optimizations.)
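Here is a toy sketch of the apply/compose idea in Python (purely illustrative; the toolkit's Rust internals work differently, and we use `>>` only because Python won't let us define a `->` operator):

```python
class Element:
    """A pipeline element: a description of a function, not its result."""
    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):
        # compose: build one fused element out of two
        return Element(lambda tv: other.fn(self.fn(tv)))

    def apply(self, timevector):
        # apply: run the fused pipeline over the data in one go
        return self.fn(timevector)

sort_ = Element(sorted)
delta = Element(lambda tv: [b - a for a, b in zip(tv, tv[1:])])
abs_  = Element(lambda tv: [abs(x) for x in tv])
sum_  = Element(sum)

# Composition happens first; no intermediate result is handed back to
# the caller between steps.
pipeline = sort_ >> delta >> abs_ >> sum_
print(pipeline.apply([3.0, 1.0, 2.0]))  # sorted: 1,2,3 -> deltas: 1,1 -> 2.0
```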

## Custom Functions: Pipeline Elements

Now let’s discuss the third, and final, key piece that makes up function pipelines: custom functions, or as we call them, _pipeline elements_.

We have implemented over 60 individual pipeline elements, which fall into 4 categories (with a few subcategories):

### `timevector` transforms

These elements take in a `timevector` and produce a `timevector`. They are the easiest to compose, as they produce the same type.

Example pipeline:

```SQL
SELECT device_id, 
	timevector(ts, val) 
    	-> sort() 
        -> delta() 
        -> map($$ ($value^3 + $value^2 + $value * 2) $$) 
        -> lttb(100) 
FROM measurements
GROUP BY device_id;
```

Organized by sub-category:

#### Unary mathematical

Simple mathematical functions applied to the value in each point in a `timevector`. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#unary-mathematical-functions))

| Element | Description |
| --- | --- |
| abs() | Computes the absolute value of each value |
| cbrt() | Computes the cube root of each value |
| ceil() | Computes the first integer greater than or equal to each value |
| floor() | Computes the first integer less than or equal to each value |
| ln() | Computes the natural logarithm of each value |
| log10() | Computes the base 10 logarithm of each value |
| round() | Computes the closest integer to each value |
| sign() | Computes +/-1 for each positive/negative value |
| sqrt() | Computes the square root for each value |
| trunc() | Computes only the integer portion of each value |

#### Binary mathematical

Simple mathematical functions with a scalar input applied to the value in each point in a `timevector`. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#binary-mathematical-functions))

| Element | Description |
| --- | --- |
| add(N) | Computes each value plus N |
| div(N) | Computes each value divided by N |
| logn(N) | Computes the logarithm base N of each value |
| mod(N) | Computes the remainder when each number is divided by N |
| mul(N) | Computes each value multiplied by N |
| power(N) | Computes each value taken to the N power |
| sub(N) | Computes each value less N |

#### Compound transforms

Transforms involving multiple points inside of a `timevector`. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#compound-transforms))

| Element | Description |
| --- | --- |
| delta() | Subtracts the previous value from each value |
| fill_to(interval, fill_method) | Fills gaps larger than interval with points at interval from the previous using fill_method |
| lttb(resolution) | Downsamples a timevector using the largest triangle three buckets algorithm at `resolution`; requires sorted input |
| sort() | Sorts the timevector by the time column ascending |

#### Lambda elements

These elements use lambda expressions, which allow the user to write small functions to be evaluated over each point in a `timevector`.
Lambda expressions can return a `DOUBLE PRECISION` value like `$$ $value^2 + $value + 3 $$`. They can return a `BOOL` like `$$ $time > '2020-01-01't $$`. They can also return a `(time, value)` pair like `$$ ($time + '1 day'i, sin($value) * 4) $$`. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#lambda-elements))

You can apply them using the elements below:

| Element | Description |
| --- | --- |
| filter(lambda (bool) ) | Removes points from the timevector where the lambda expression evaluates to false |
| map(lambda (value) ) | Applies the lambda expression to all the values in the timevector |
| map(lambda (time, value) ) | Applies the lambda expression to all the times and values in the timevector |
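To see what these three element shapes do, here is a rough Python rendering of the same ideas over a toy list of `(time, value)` pairs (the SQL lambda syntax is as shown above; everything else here is illustrative):

```python
from datetime import datetime, timedelta

# A toy timevector: a list of (time, value) pairs
tv = [
    (datetime(2021, 10, 18, 0), 2.0),
    (datetime(2021, 10, 18, 1), -3.0),
    (datetime(2021, 10, 18, 2), 4.0),
]

# filter(lambda (bool)): keep only points where the predicate holds
positive = [(t, v) for t, v in tv if v > 0]

# map(lambda (value)): rewrite each value, like $$ $value^2 + $value + 3 $$
squared = [(t, v ** 2 + v + 3) for t, v in positive]

# map(lambda (time, value)): rewrite the whole pair, like
# $$ ($time + '1 day'i, $value) $$
shifted = [(t + timedelta(days=1), v) for t, v in squared]

print([v for _, v in shifted])  # [9.0, 23.0]
```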

### `timevector` finalizers

These elements end the `timevector` portion of a pipeline; they can either help with output or produce an aggregate over the entire `timevector`. They act as an optimization barrier to composition, as they (usually) produce types other than `timevector`.

Example pipelines:

```SQL
SELECT device_id, 
	timevector(ts, val) -> sort() -> delta() -> unnest()
FROM measurements
GROUP BY device_id;
```

```SQL
SELECT device_id, 
	timevector(ts, val) -> sort() -> delta() -> time_weight()
FROM measurements
GROUP BY device_id;
```

Finalizer pipeline elements organized by sub-category:

#### `timevector` output

These elements help with output, and can produce either a set of `(time, value)` pairs or a materialized `timevector`. Note: this is an area where we’d love further feedback. Are there particular data formats that would be especially useful for, say, graphing, that we can add? [File an issue in our GitHub](https://github.com/timescale/timescaledb-toolkit/issues)!

| Element | Description |
| --- | --- |
| unnest() | Produces a set of (time, value) pairs. You can wrap and expand the pipeline as a composite type, i.e., `(pipeline -> unnest()).*`, to produce separate columns |
| materialize() | Materializes a timevector to pass to an application or other operation directly, blocks any optimizations that would materialize it lazily. |

#### `timevector` aggregates

Aggregate all the points in a `timevector` to produce a single value as a result. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#aggregate-output-elements))

| Element | Description |
| --- | --- |
| average() | Computes the average of the values in the timevector |
| counter_agg() | Computes the counter_agg aggregate over the times and values in the timevector |
| stats_agg() | Computes a range of statistical aggregates and returns a 1DStatsAgg over the values in the timevector |
| sum() | Computes the sum of the values in the timevector |
| num_vals() | Counts the points in the timevector |

## Aggregate Accessors and Mutators

These function pipeline elements act like the accessors that I described in our previous [post on aggregates](https://timescale.ghost.io/blog/blog/how-postgresql-aggregation-works-and-how-it-inspired-our-hyperfunctions-design-2/). You can use them to get a value from the aggregate part of a function pipeline like so:

```SQL
SELECT device_id, 
	timevector(ts, val) -> sort() -> delta() -> stats_agg() -> variance() 
FROM measurements
GROUP BY device_id;
```

But these don’t _just_ work on `timevector`s; they also work on a normally produced aggregate.

When used instead of normal function accessors and mutators, they can make the syntax clearer by getting rid of nested function calls like:

```SQL
SELECT approx_percentile(0.5, percentile_agg(val)) 
FROM measurements
```

Instead, we can use the arrow accessor to convey the same thing:

```SQL
SELECT percentile_agg(val) -> approx_percentile(0.5) 
FROM measurements
```

By aggregate family:

#### Counter aggregates

[Counter aggregates](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/counter-aggregation/counter-aggs/#counter-aggregates) deal with resetting counters (and were stabilized in our 1.3 release this week!). Counters are a common type of metric in the application performance monitoring and metrics world; all values have resets accounted for. These elements must have a `CounterSummary` to their left when used in a pipeline, produced by a `counter_agg()` aggregate or pipeline element. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#counter-aggregates))

| Element | Description |
| --- | --- |
| counter_zero_time() | The time at which the counter value is predicted to have been zero, based on the least squares fit of the points input to the CounterSummary (x-intercept) |
| corr() | The correlation coefficient of the least squares fit line of the adjusted counter value. |
| delta() | Computes the last - first value of the counter |
| extrapolated_delta(method) | Computes the delta extrapolated using the provided method to the bounds of the range. Bounds must have been provided in the aggregate or a with_bounds call |
| idelta_left() / idelta_right() | Computes the instantaneous difference between the second and first points (left) or last and next-to-last points (right) |
| intercept() | The y-intercept of the least squares fit line of the adjusted counter value. |
| irate_left() / irate_right() | Computes the instantaneous rate of change between the second and first points (left) or last and next-to-last points (right) |
| num_changes() | Number of times the counter changed values. |
| num_elements() | Number of points; any with the exact same time are counted only once. |
| num_resets() | Number of times the counter reset. |
| slope() | The slope of the least squares fit line of the adjusted counter value. |
| with_bounds(range) | Applies bounds using the range (a TSTZRANGE) to the CounterSummary if they weren’t provided in the aggregation step |
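The "resets accounted for" part is the heart of counter aggregates. A minimal Python sketch of the general technique (not the toolkit's actual implementation): whenever the raw counter drops, assume a reset and carry the previous total forward as an offset.

```python
def adjusted_values(raw):
    """Adjust a resetting counter so the series is non-decreasing.

    Whenever a value is lower than its predecessor, we assume the
    counter reset to zero and add the accumulated offset.
    (A sketch of the general technique, not the toolkit internals.)
    """
    offset, prev, out = 0.0, None, []
    for v in raw:
        if prev is not None and v < prev:
            offset += prev          # a reset: carry the old total forward
        prev = v
        out.append(v + offset)
    return out

raw = [10.0, 20.0, 5.0, 15.0]       # counter reset between 20 and 5
adj = adjusted_values(raw)
print(adj)                          # [10.0, 20.0, 25.0, 35.0]
print(adj[-1] - adj[0])             # delta() analogue: 25.0
```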

#### Percentile approximation

These aggregate accessors deal with [percentile approximation](https://timescale.ghost.io/blog/blog/how-percentile-approximation-works-and-why-its-more-useful-than-averages/). For now we’ve only implemented them for `percentile_agg` and `uddsketch` based aggregates. We have not yet implemented them for `tdigest`. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#percentile-approximation))

| Element | Description |
| --- | --- |
| approx_percentile(p) | The approximate value at percentile p |
| approx_percentile_rank(v) | The approximate percentile a value v would fall in |
| error() | The maximum relative error guaranteed by the approximation |
| mean() | The exact average of the input values. |
| num_vals() | The number of input values |
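If the approximation machinery is unfamiliar, it may help to see the two questions these accessors answer computed exactly on a small list (a nearest-rank sketch; the SQL accessors trade this exactness for bounded memory on large data sets):

```python
def percentile(values, p):
    """Exact value at percentile p (nearest-rank convention)."""
    s = sorted(values)
    idx = min(len(s) - 1, int(p * len(s)))
    return s[idx]

def percentile_rank(values, v):
    """Fraction of values <= v: what approx_percentile_rank(v) estimates."""
    return sum(1 for x in values if x <= v) / len(values)

data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
print(percentile(data, 0.5))        # 6.0 under this convention
print(percentile_rank(data, 6.0))   # 0.6
```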

#### Statistical aggregates

These aggregate accessors add support for common statistical aggregates (and were stabilized in our 1.3 release this week!). These allow you to compute and `rollup()` common statistical aggregates like `average` and `stddev`, more advanced ones like `skewness`, and 2-dimensional aggregates like `slope` and `covariance`. Because there are both 1D and 2D versions of these, the accessors can have multiple forms: for instance, `average()` calculates the average on a 1D aggregate, while `average_y()` & `average_x()` do so on each dimension of a 2D aggregate. ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/function-pipelines/#statistical-aggregates))

| Element | Description |
| --- | --- |
| average() / average_y() / average_x() | The average of the values. |
| corr() | The correlation coefficient of the least squares fit line. |
| covariance(method) | The covariance of the values using either population or sample method. |
| determination_coeff() | The determination coefficient (aka R squared) of the values. |
| kurtosis(method) / kurtosis_y(method) / kurtosis_x(method) | The kurtosis (4th moment) of the values using either population or sample method. |
| intercept() | The intercept of the least squares fit line. |
| num_vals() | The number of (non-null) values seen. |
| sum() / sum_x() / sum_y() | The sum of the values seen. |
| skewness(method) / skewness_y(method) / skewness_x(method) | The skewness (3rd moment) of the values using either population or sample method. |
| slope() | The slope of the least squares fit line. |
| stddev(method) / stddev_y(method) / stddev_x(method) | The standard deviation of the values using either population or sample method. |
| variance(method) / variance_y(method) / variance_x(method) | The variance of the values using either population or sample method. |
| x_intercept() | The x intercept of the least squares fit line. |

#### Time-weighted averages

([More info](https://timescale.ghost.io/blog/blog/what-time-weighted-averages-are-and-why-you-should-care/))  
The `average()` accessor may be called on the output of a `time_weight()` like so:

```SQL
SELECT time_weight('Linear', ts, val) -> average()  FROM measurements;
```
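Conceptually, the `'Linear'` method integrates the piecewise-linear curve through the points (trapezoid rule) and divides by the elapsed time, so readings that were in effect longer count for more. A hedged Python sketch of that idea (times are plain seconds here for simplicity):

```python
def time_weighted_average(points):
    """Linear time-weighted average of (seconds, value) points.

    Integrates the piecewise-linear curve with the trapezoid rule and
    divides by total elapsed time; a sketch of what
    time_weight('Linear', ...) -> average() computes conceptually.
    """
    pts = sorted(points)
    area = sum((t1 - t0) * (v0 + v1) / 2
               for (t0, v0), (t1, v1) in zip(pts, pts[1:]))
    return area / (pts[-1][0] - pts[0][0])

# The value sits at 10 for 9 seconds, then jumps to 20 for 1 second:
pts = [(0, 10.0), (9, 10.0), (9, 20.0), (10, 20.0)]
print(time_weighted_average(pts))   # 11.0, vs. a naive mean of the
                                    # four readings, which would be 15.0
```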

#### Approximate count distinct (Hyperloglog)

This is an approximation for distinct counts that was stabilized in our 1.3 release! ([Docs link](https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/approx-count-distincts/)) The `distinct_count()` accessor may be called on the output of a `hyperloglog()` like so:

```SQL
SELECT hyperloglog(device_id) -> distinct_count() FROM measurements;
```

## Next Steps

We hope this post helped you understand how function pipelines leverage PostgreSQL extensibility to offer functional programming concepts in a way that is fully PostgreSQL compliant. And how function pipelines can improve the ergonomics of your code, making it easier to write, read, and maintain.

[**You can try function pipelines today**](https://console.cloud.timescale.com/signup) with a fully managed Timescale Cloud service (no credit card required, free for 30 days). Function pipelines are available now on every new database service on Timescale Cloud, so after you’ve created a new service, you’re all set to use them!

If you prefer to manage your own database instances, you can [download and install the timescaledb\_toolkit extension](https://github.com/timescale/timescaledb-toolkit) on GitHub for free, after which you’ll be able to use function pipelines.