---
title: Frequency aggregation overview | Tiger Data Docs
description: Functions for finding the most common values using the SpaceSaving algorithm
---

Get the most common elements of a set and their relative frequency. The estimation uses the [SpaceSaving](https://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf) algorithm.

This group of functions contains two aggregate functions, which let you set the cutoff for keeping track of a value in different ways. [`freq_agg`](/docs/reference/toolkit/frequency-analysis/freq_agg/freq_agg/index.md) allows you to specify a minimum frequency, and [`mcv_agg`](/docs/reference/toolkit/frequency-analysis/freq_agg/mcv_agg/index.md) allows you to specify the target number of values to keep.

To estimate the absolute number of times a value appears, use [`count_min_sketch`](/docs/reference/toolkit/frequency-analysis/count_min_sketch/index.md).

## Two-step aggregation

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.

The two-step aggregation pattern has several advantages:

1. More efficient because multiple accessors can reuse the same aggregate
2. Easier to reason about performance, because aggregation is separate from final computation
3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
4. Perform retrospective analysis even when underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result

To learn more, see the [blog post on two-step aggregates](https://www.timescale.com/blog/how-postgresql-aggregation-works-and-how-it-inspired-our-hyperfunctions-design).

## Samples

### Get the 5 most common values from a table

This test uses a table of randomly generated data. The values used are the integer square roots of a random number in the range 0 to 400.

```
CREATE TABLE value_test(value INTEGER);
INSERT INTO value_test SELECT floor(sqrt(random() * 400)) FROM generate_series(1,100000);
```

This returns the 5 most common values seen in the table:

```
SELECT topn(
    toolkit_experimental.freq_agg(0.05, value),
    5)
FROM value_test;
```

The output for this query:

```
 topn
------
   19
   18
   17
   16
   15
```

### Generate a table with frequencies of the most commonly seen values

Return values that represent more than 5% of the input:

```
SELECT value, min_freq, max_freq
FROM into_values(
    (SELECT toolkit_experimental.freq_agg(0.05, value) FROM value_test));
```

The output for this query looks like this, with some variation due to randomness:

```
 value | min_freq | max_freq
-------+----------+----------
    19 |  0.09815 |  0.09815
    18 |  0.09169 |  0.09169
    17 |  0.08804 |  0.08804
    16 |  0.08248 |  0.08248
    15 |  0.07703 |  0.07703
    14 |  0.07157 |  0.07157
    13 |  0.06746 |  0.06746
    12 |  0.06378 |  0.06378
    11 |  0.05565 |  0.05595
    10 |  0.05286 |  0.05289
```

## Advanced use

You can adjust the following advanced options to suit your data distribution:

### Estimated skew

`mcv_agg` assumes that the data is skewed. In other words, some values are more frequent than others. The degree of skew is defined by the `s` parameter of a [zeta distribution](https://en.wikipedia.org/wiki/Zeta_distribution).

The default value of `1.1` works on data with this distribution or a more extreme one:

| N   | Minimum percentage of all values represented by the top N (approximate) |
| --- | ----------------------------------------------------------------------- |
| 5   | 20%                                                                     |
| 10  | 25%                                                                     |
| 20  | 30%                                                                     |
| 50  | 36%                                                                     |
| 100 | 40%                                                                     |

## Available functions

### Aggregates

- [`freq_agg()`](/docs/reference/toolkit/frequency-analysis/freq_agg/freq_agg/index.md): aggregate data into a space-saving aggregate with a minimum frequency cutoff
- [`mcv_agg()`](/docs/reference/toolkit/frequency-analysis/freq_agg/mcv_agg/index.md): aggregate data into a space-saving aggregate with a target number of values

### Accessors

- [`into_values()`](/docs/reference/toolkit/frequency-analysis/freq_agg/into_values/index.md): return the values and their estimated frequencies from a frequency aggregate
- [`max_frequency()`](/docs/reference/toolkit/frequency-analysis/freq_agg/max_frequency/index.md): get the maximum frequency of a value from a frequency aggregate
- [`min_frequency()`](/docs/reference/toolkit/frequency-analysis/freq_agg/min_frequency/index.md): get the minimum frequency of a value from a frequency aggregate
- [`topn()`](/docs/reference/toolkit/frequency-analysis/freq_agg/topn/index.md): get the N most common values from a frequency aggregate

### Rollup

- [`rollup()`](/docs/reference/toolkit/frequency-analysis/freq_agg/rollup/index.md): combine multiple frequency aggregates
