---
title: Approximate count distinct overview | Tiger Data Docs
description: Estimate the number of distinct values in a dataset, also known as cardinality estimation
---

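Estimate the number of distinct values in a dataset, also known as cardinality estimation. For large datasets and datasets with high cardinality (many distinct values), an approximate count can be much more efficient in both CPU and memory than an exact count using `count(DISTINCT)`.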

The estimation uses the [`hyperloglog++`](https://en.wikipedia.org/wiki/HyperLogLog) algorithm. If you aren’t sure how many buckets to give `hyperloglog`, try the [`approx_count_distinct`](/docs/reference/toolkit/approximate-count-distinct/approx_count_distinct/index.md) aggregate, which sets reasonable default values for you.
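As a rough comparison, the two queries below count the same 100,000 values exactly and approximately. The input comes from `generate_series` so the sketch is self-contained; the approximate result is expected to be close to 100,000, but not necessarily equal to it.

```sql
-- Exact count: PostgreSQL must track every distinct value it sees.
SELECT count(DISTINCT v::text) AS exact_count
FROM generate_series(1, 100000) v;

-- Approximate count: approx_count_distinct builds a hyperloglog with
-- default settings, and the distinct_count accessor turns it into an estimate.
SELECT distinct_count(approx_count_distinct(v::text)) AS approx_count
FROM generate_series(1, 100000) v;
```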

This function group uses the [two-step aggregation](#two-step-aggregation) pattern. In addition to the usual aggregate function, [`hyperloglog`](/docs/reference/toolkit/approximate-count-distinct/hyperloglog/index.md), it also includes the alternate aggregate function [`approx_count_distinct`](/docs/reference/toolkit/approximate-count-distinct/approx_count_distinct/index.md). Both produce a hyperloglog aggregate, which can then be used with the accessor and rollup functions in this group.
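Because both aggregate functions return a hyperloglog, the same accessor accepts either one. A minimal sketch, where the 8,192 buckets passed to `hyperloglog` are an arbitrary illustrative choice:

```sql
-- hyperloglog takes an explicit bucket count; approx_count_distinct does not.
-- Both return a hyperloglog, so distinct_count works on either result.
SELECT
    distinct_count(hyperloglog(8192, v::text))     AS with_explicit_buckets,
    distinct_count(approx_count_distinct(v::text)) AS with_default_buckets
FROM generate_series(1, 100000) v;
```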

## Two-step aggregation

This group of functions uses the two-step aggregation pattern.

Rather than calculating the final result in one step, you first create an intermediate aggregate by using the aggregate function.

Then, use any of the accessors on the intermediate aggregate to calculate a final result. You can also roll up multiple intermediate aggregates with the rollup functions.
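The aggregate, rollup, and accessor steps can be sketched in a single query. The grouping by `v % 4` is only there to produce several intermediate aggregates from synthetic data; the CTE names `partial` and `combined` are illustrative.

```sql
WITH partial AS (
    -- Step 1: build one intermediate hyperloglog per group.
    SELECT v % 4 AS grp, hyperloglog(4096, v::text) AS hll
    FROM generate_series(1, 100000) v
    GROUP BY v % 4
),
combined AS (
    -- Optional rollup: merge the per-group hyperloglogs into one.
    SELECT rollup(hll) AS hll
    FROM partial
)
-- Step 2: apply several accessors to the same intermediate aggregate.
SELECT distinct_count(hll) AS estimate,
       stderror(hll)       AS relative_error
FROM combined;
```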

The two-step aggregation pattern has several advantages:

1. More efficient because multiple accessors can reuse the same aggregate
2. Easier to reason about performance, because aggregation is separate from final computation
3. Easier to understand when calculations can be rolled up into larger intervals, especially in window functions and continuous aggregates
4. Retrospective analysis is possible even when the underlying data is dropped, because the intermediate aggregate stores extra information not available in the final result
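Rolling up into larger intervals and retrospective analysis can be illustrated by storing intermediate aggregates and combining them later. This is a sketch with hypothetical names: `hourly_visitors` is an illustrative table, and the synthetic one-event-per-minute data stands in for a real event table. Once the hourly hyperloglogs are stored, the daily query never reads the raw events.

```sql
-- Hypothetical summary table: one hyperloglog of user IDs per hour.
-- The synthetic events cover two days, one event per minute.
CREATE TABLE hourly_visitors AS
SELECT date_trunc('hour', ts)           AS hour,
       hyperloglog(4096, user_id::text) AS visitors
FROM (
    SELECT '2024-01-01'::timestamp + i * interval '1 minute' AS ts,
           i % 1000 AS user_id
    FROM generate_series(0, 2879) i
) events
GROUP BY 1;

-- Later: roll the hourly aggregates up into daily estimates
-- without rereading the original events.
SELECT date_trunc('day', hour)          AS day,
       distinct_count(rollup(visitors)) AS daily_visitors
FROM hourly_visitors
GROUP BY 1
ORDER BY 1;
```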

To learn more, see the [blog post on two-step aggregates](https://www.timescale.com/blog/how-postgresql-aggregation-works-and-how-it-inspired-our-hyperfunctions-design).

## Samples

### Roll up two hyperloglogs

The first hyperloglog aggregates the integers from 1 to 100,000, and the second aggregates the integers from 50,000 to 150,000. The two ranges overlap, so the combined set contains exactly 150,000 distinct values.

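Calling `distinct_count` on the rolled-up hyperloglog yields 150,552, so the approximation is off by only 0.368%. This is well within the approximate relative error of 1.63% listed for 4,096 registers in the table of approximate relative errors: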

```sql
SELECT distinct_count(rollup(logs))
FROM (
    (SELECT hyperloglog(4096, v::text) logs FROM generate_series(1, 100000) v)
    UNION ALL
    (SELECT hyperloglog(4096, v::text) FROM generate_series(50000, 150000) v)
) hll;
```

Output:

```
 distinct_count
----------------
         150552
```
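To check an estimate like this yourself, the exact and approximate counts can be computed side by side. This sketch feeds the combined values into a single `hyperloglog` instead of rolling up two, so its estimate should be similar to the rolled-up result but is not guaranteed to match it exactly.

```sql
WITH vals AS (
    SELECT v FROM generate_series(1, 100000) v
    UNION ALL
    SELECT v FROM generate_series(50000, 150000) v
),
counts AS (
    SELECT count(DISTINCT v)                          AS exact_count,  -- 150000
           distinct_count(hyperloglog(4096, v::text)) AS approx_count
    FROM vals
)
-- Relative error of the estimate, as a percentage.
SELECT exact_count,
       approx_count,
       round(100.0 * abs(approx_count - exact_count) / exact_count, 3) AS error_pct
FROM counts;
```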

## Approximate relative errors

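The following table lists the approximate relative error and storage size for each number of registers (buckets). The number of registers is 2 raised to the precision: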

| precision | registers (buckets)     | relative error | column size (bytes) |
| --------- | ----------------------- | ------ | ---------------------- |
| 4         | 16                      | 0.2600 | 12                     |
| 5         | 32                      | 0.1838 | 24                     |
| 6         | 64                      | 0.1300 | 48                     |
| 7         | 128                     | 0.0919 | 96                     |
| 8         | 256                     | 0.0650 | 192                    |
| 9         | 512                     | 0.0460 | 384                    |
| 10        | 1024                    | 0.0325 | 768                    |
| 11        | 2048                    | 0.0230 | 1536                   |
| 12        | 4096                    | 0.0163 | 3072                   |
| 13        | 8192                    | 0.0115 | 6144                   |
| 14        | 16384                   | 0.0081 | 12288                  |
| 15        | 32768                   | 0.0057 | 24576                  |
| 16        | 65536                   | 0.0041 | 49152                  |
| 17        | 131072                  | 0.0029 | 98304                  |
| 18        | 262144                  | 0.0020 | 196608                 |

## Available functions

### Aggregate

- [`hyperloglog()`](/docs/reference/toolkit/approximate-count-distinct/hyperloglog/index.md): aggregate data into a hyperloglog for approximate counting

### Alternate aggregate

- [`approx_count_distinct()`](/docs/reference/toolkit/approximate-count-distinct/approx_count_distinct/index.md): aggregate data into a hyperloglog without specifying the number of buckets

### Accessors

- [`distinct_count()`](/docs/reference/toolkit/approximate-count-distinct/distinct_count/index.md): estimate the number of distinct values from a hyperloglog
- [`stderror()`](/docs/reference/toolkit/approximate-count-distinct/stderror/index.md): estimate the relative standard error of a hyperloglog

### Rollup

- [`rollup()`](/docs/reference/toolkit/approximate-count-distinct/rollup/index.md): combine multiple hyperloglogs
