---
title: "Scale PostgreSQL via Partitioning: A Dev’s Intro to Hypertables"
published: 2024-11-18T09:16:27.000-05:00
updated: 2024-11-22T05:01:24.000-05:00
excerpt: "PostgreSQL partitioning can be faster and more efficient. Get started quickly with hypertables in this step-by-step developer guide."
tags: PostgreSQL, Blog, Hypertables, PostgreSQL Performance
authors: Sarah Conway
---

> **TimescaleDB is now Tiger Data.**

As a PostgreSQL enthusiast, it’s been very interesting to join Timescale and learn about [TimescaleDB](https://github.com/timescale/timescaledb), a PostgreSQL extension with numerous solutions for handling large volumes of data. While [Timescale has excellent documentation](https://docs.timescale.com/getting-started/latest/), having a quick and easy getting started guide is always nice. 

Scaling PostgreSQL via partitioning is a challenge for many PostgreSQL users, but TimescaleDB has a core feature to help solve it. Here’s a straightforward breakdown of _hypertables_, a great feature designed to significantly improve query performance on massive volumes of data. A benchmark vs. vanilla PostgreSQL is also included.

(Looking for more details? Check out [Timescale’s official docs on hypertables](https://docs.timescale.com/use-timescale/latest/hypertables/).)

## Testing Hypertables vs. Regular PostgreSQL Partitioning

Generally speaking, hypertables are designed to make working with time-series data easy by automatically partitioning regular PostgreSQL tables into smaller data partitions, or chunks. There are many examples of time-series data, including IoT applications, weather data, financial data analysis, and system monitoring.

Simply put: [time-series data is _any data that has been collected at regular or irregular intervals_](https://timescale.ghost.io/blog/time-series-introduction/) and associated with timestamps.

In this blog post, we’ll set up a basic IoT sensor dataset and compare the differences in query run time between regular PostgreSQL tables and hypertables.

For real-time dashboards and alerts, this kind of data would need to be collected frequently; efficient querying is key for optimal performance and timely rendering of the results. This is a perfect example use case for hypertables. Why? They enable you to do the following:

1. Handle high-frequency inserts efficiently
2. Optimize queries for time-based data
3. Ensure fast query performance, especially at scale

So, how do you actually use them?

### Let’s create a hypertable

First, let's create a hypertable to store our sensor data:

```SQL
CREATE TABLE iot_sensor_data (
    sensor_id       INTEGER NOT NULL,
    ts              TIMESTAMPTZ NOT NULL,
    temperature     DOUBLE PRECISION,
    humidity        DOUBLE PRECISION,
    battery_level   DOUBLE PRECISION,
    PRIMARY KEY (sensor_id, ts)
);
```

Now, let’s convert that to a hypertable.

```SQL
SELECT create_hypertable('iot_sensor_data', 'ts');
```

This creates a hypertable optimized for time-series data, partitioned on the `ts` column. The TimescaleDB extension automatically manages the partitioning for you.
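By default, each chunk covers seven days of data. If your workload would benefit from a different chunk size, you can optionally set it at creation time with the `chunk_time_interval` parameter; the one-day interval below is just an illustrative value, not a recommendation:

```SQL
-- Optional: choose a custom chunk size when creating the hypertable.
-- '1 day' is an example value; pick an interval that fits your ingest rate.
SELECT create_hypertable(
    'iot_sensor_data',
    'ts',
    chunk_time_interval => INTERVAL '1 day'
);
```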

### Inserting data

Here's how you might insert data into your hypertable:

```SQL
INSERT INTO iot_sensor_data (sensor_id, ts, temperature, humidity, battery_level)
VALUES 
    (1, '2023-01-01 00:00:00+00', 22.5, 60.2, 85.0),
    (1, '2023-01-01 00:01:00+00', 23.1, 59.8, 84.7),
    (1, '2023-01-01 00:02:00+00', 21.9, 61.0, 84.3),
    (1, '2023-01-01 00:03:00+00', 22.8, 60.5, 84.1),
    (1, '2023-01-01 00:04:00+00', 22.0, 61.3, 83.9);
```

In practice, you'd have a script or application inserting data at regular intervals (such as every minute or as data arrives).

### Hypertables automatically partition by time (and by space)

One of my favorite features of hypertables is that they automatically partition time-series data for you without a lot of extra work on your part. This automatic partitioning means you can work with hypertables as if they were _just_ normal PostgreSQL tables. Behind the scenes, they do the heavy lifting of partitioning your PostgreSQL tables and speeding up your queries and data ingestion rate while keeping up with your data as it scales. 
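Curious what the extension is doing behind the scenes? You can list the chunks backing a hypertable with the `show_chunks` function, or query the `timescaledb_information.chunks` view for metadata such as each chunk's time range:

```SQL
-- List the chunks backing the hypertable
SELECT show_chunks('iot_sensor_data');

-- Inspect chunk metadata, including each chunk's time range
SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'iot_sensor_data';
```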

As part of this functionality, when you create a hypertable, [indexes are automatically created](https://www.timescale.com/learn/explaining-postgresql-explain) along with it. By default, you’ll see a descending index on the time column and, if a space partition exists, a composite index on the space column and time. Let’s check it out.

List the associated indexes:

```SQL
SELECT 
    indexname AS index_name,
    indexdef AS index_definition
FROM 
    pg_indexes
WHERE 
    tablename = 'iot_sensor_data';
```

And you should see the following results:

```SQL
[
  {
    "index_name": "iot_sensor_data_pkey",
    "index_definition": "CREATE UNIQUE INDEX iot_sensor_data_pkey ON public.iot_sensor_data USING btree (sensor_id, ts)"
  },
  {
    "index_name": "iot_sensor_data_ts_idx",
    "index_definition": "CREATE INDEX iot_sensor_data_ts_idx ON public.iot_sensor_data USING btree (ts DESC)"
  }
]
```

It is possible, of course, to disable the automatic creation of default indexes: just set `create_default_indexes` to false when creating the hypertable, like so:

```SQL
SELECT create_hypertable('iot_sensor_data', 'ts', create_default_indexes => false);
```
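As the section title hints, hypertables can also be partitioned by space in addition to time. As a hedged aside, one way to do this is `add_dimension`, run while the hypertable is still empty; the four partitions on `sensor_id` below are an illustrative choice:

```SQL
-- Add a space dimension on sensor_id (4 partitions is an example value).
-- add_dimension must be run before the hypertable contains data.
SELECT add_dimension('iot_sensor_data', 'sensor_id', number_partitions => 4);
```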

Additionally, take note that if you’d like to [configure unique indexes](https://docs.timescale.com/use-timescale/latest/hypertables/hypertables-and-unique-indexes/), they must reference all the partitioning columns for the table. This requirement ensures unique constraints can be enforced across partitions based on the time column.

For example, to create a unique index on both `sensor_id` and `ts`, you would run:

```SQL
CREATE UNIQUE INDEX ON iot_sensor_data (sensor_id, ts);
```
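Conversely, a unique index that omits the time partitioning column will be rejected. This is a sketch of the failure mode; the exact error text varies by TimescaleDB version:

```SQL
-- This fails: the unique index omits the partitioning column ts,
-- so TimescaleDB cannot enforce uniqueness across chunks.
CREATE UNIQUE INDEX ON iot_sensor_data (sensor_id);
```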

## Performance Benefits: Hypertables vs. Regular PostgreSQL Tables

Here's a simple benchmark you can run to compare query performance:

Generate a time series for each minute over a month, and associate each timestamp with 100 devices. This inserts 4,320,100 rows (100 sensors \* 60 minutes/hour \* 24 hours/day \* 30 days, plus one extra timestamp per sensor because the series includes both endpoints).

```SQL
INSERT INTO iot_sensor_data (sensor_id, ts, temperature, humidity, battery_level)
SELECT 
    sensor_id,
    ts,
    20 + random() * 10,          -- Temperature between 20 and 30°C
    50 + random() * 10,          -- Humidity between 50% and 60%
    20 + random() * 20           -- Battery level between 20% and 100%
FROM generate_series(1, 100) AS sensor_id,  -- Generate 100 distinct sensor IDs
     generate_series(
        '2024-09-01 00:00:00+00'::timestamptz,
        '2024-10-01 00:00:00+00', 
        '1 minute'::interval
     ) AS ts;
```
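As a quick sanity check, you can confirm the row count matches the arithmetic above:

```SQL
-- Expect 4,320,100 rows: 100 sensors x 43,201 one-minute timestamps
SELECT count(*) FROM iot_sensor_data;
```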

Now let’s create a regular table with the same structure as our hypertable:

```SQL
CREATE TABLE regular_iot_sensor_data (
    sensor_id       INTEGER NOT NULL,
    ts              TIMESTAMPTZ NOT NULL,
    temperature     DOUBLE PRECISION,
    humidity        DOUBLE PRECISION,
    battery_level   DOUBLE PRECISION,
    PRIMARY KEY (sensor_id, ts)
);
```

Let’s reproduce the same additional index that exists on our hypertable on this regular table:

```SQL
CREATE INDEX regular_iot_sensor_data_ts_idx 
ON public.regular_iot_sensor_data USING btree (ts DESC);
```

Copy over the records we just inserted into the hypertable into `regular_iot_sensor_data`:

```SQL
INSERT INTO regular_iot_sensor_data (sensor_id, ts, temperature, humidity, battery_level)
SELECT * FROM iot_sensor_data;
```

Then, check the results of `EXPLAIN ANALYZE` against each table, starting with the hypertable: 

```SQL
EXPLAIN ANALYZE
SELECT AVG(temperature)
FROM iot_sensor_data
WHERE ts BETWEEN '2024-09-01' AND '2024-09-05';
```

Which returns:

```SQL
Finalize Aggregate  (cost=10112.93..10112.94 rows=1 width=8) (actual time=50.737..58.663 rows=1 loops=1)
  ->  Gather  (cost=1112.49..10112.92 rows=4 width=32) (actual time=50.650..58.650 rows=4 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Parallel Append  (cost=112.49..9112.52 rows=2 width=32) (actual time=31.895..45.882 rows=1 loops=3)
              ->  Partial Aggregate  (cost=112.49..112.50 rows=1 width=32) (actual time=1.865..1.866 rows=1 loops=1)
                    ->  Parallel Bitmap Heap Scan on _hyper_52_268_chunk  (cost=2.55..112.34 rows=59 width=8) (actual time=0.444..1.848 rows=100 loops=1)
                          Recheck Cond: ((ts >= '2024-09-01 00:00:00+00'::timestamp with time zone) AND (ts <= '2024-09-05 00:00:00+00'::timestamp with time zone))
                          ->  Bitmap Index Scan on _hyper_52_268_chunk_iot_sensor_data_ts_idx  (cost=0.00..2.52 rows=100 width=0) (actual time=0.413..0.413 rows=100 loops=1)
                                Index Cond: ((ts >= '2024-09-01 00:00:00+00'::timestamp with time zone) AND (ts <= '2024-09-05 00:00:00+00'::timestamp with time zone))
              ->  Partial Aggregate  (cost=9000.00..9000.01 rows=1 width=32) (actual time=45.256..45.256 rows=1 loops=3)
                    ->  Parallel Seq Scan on _hyper_52_267_chunk  (cost=0.00..8400.00 rows=240000 width=8) (actual time=0.018..32.989 rows=192000 loops=3)
                          Filter: ((ts >= '2024-09-01 00:00:00+00'::timestamp with time zone) AND (ts <= '2024-09-05 00:00:00+00'::timestamp with time zone))
Planning Time: 0.409 ms
Execution Time: 58.710 ms
```

Then do the same against the regular table:

```SQL
EXPLAIN ANALYZE
SELECT AVG(temperature)
FROM regular_iot_sensor_data
WHERE ts BETWEEN '2024-09-01' AND '2024-09-05';
```

That returns:

```SQL
Finalize Aggregate  (cost=17131.34..17131.35 rows=1 width=8) (actual time=135.917..145.755 rows=1 loops=1)
  ->  Gather  (cost=17131.13..17131.34 rows=2 width=32) (actual time=135.724..145.744 rows=3 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial Aggregate  (cost=16131.13..16131.14 rows=1 width=32) (actual time=130.502..130.503 rows=1 loops=3)
              ->  Parallel Index Scan using regular_iot_sensor_data_ts_idx on regular_iot_sensor_data  (cost=0.43..15581.22 rows=219961 width=8) (actual time=0.039..107.362 rows=192033 loops=3)
                    Index Cond: ((ts >= '2024-09-01 00:00:00+00'::timestamp with time zone) AND (ts <= '2024-09-05 00:00:00+00'::timestamp with time zone))
Planning Time: 0.348 ms
Execution Time: 145.799 ms

```

[Learn how to use PostgreSQL EXPLAIN](https://www.timescale.com/learn/explaining-postgresql-explain).

As your datasets grow, you’ll notice even more benefits to query performance. Already, we can see quite a difference: execution time increases by about 148.3% for the regular table, from 58.710 ms for the hypertable to 145.799 ms for the regular table:

![The hypertable execution time on the right (58.710 ms) and the regular table execution time on the left (145.799 ms).](https://timescale.ghost.io/blog/content/images/2024/11/Scale-PostgreSQL-via-Partitioning_Hypertable-benchmark.png)


For fun, I also ran a simple `SELECT *` query against both the `iot_sensor_data` and `regular_iot_sensor_data` tables to compare query execution times (no limit).

The `iot_sensor_data` table returned results in 1,762.9 ms vs. 1,940.6 ms for the regular table. That’s about 10% slower for the regular table just to retrieve results, without performing any calculations.

## What’s Next?

Hypertables are great for storing and querying large volumes of time-stamped data, whether you’re just looking for a performance boost or hoping to enable real-time analytics and monitoring at scale.

We covered a number of useful features unlocked by hypertables in TimescaleDB, and there’s still more to explore. Here’s a fuller list of what hypertables provide:

-   [Automatic data partitioning](https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/#hypertable-partitioning) (by time or optionally by space) 
-   Optimization for [high insert rates](https://docs.timescale.com/use-timescale/latest/ingest-data/) and [management](https://www.timescale.com/learn/postgres-data-management-best-practices) of large data volumes
-   [Continuous aggregates](https://docs.timescale.com/use-timescale/latest/continuous-aggregates/) (incrementally updated materialized views)
-   [Data retention policies](https://docs.timescale.com/use-timescale/latest/data-retention/) (to automatically drop unneeded data and manage storage effectively)
-   Time-based [compression](https://docs.timescale.com/use-timescale/latest/compression/) for improved storage footprint and query performance
-   [Automated data tiering](https://docs.timescale.com/use-timescale/latest/data-tiering/) for cost management
-   Advanced time-series [hyperfunctions](https://docs.timescale.com/use-timescale/latest/hyperfunctions/) specifically designed for hypertables
-   Automatic [indexing](https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/#hypertable-indexes)
-   [Real-time continuous aggregation](https://docs.timescale.com/use-timescale/latest/continuous-aggregates/real-time-aggregates/)

Look out for more articles like this to see other real-world examples of TimescaleDB features like continuous aggregates and compression.

Interested in trying out hypertables yourself? If you’re self-hosting, [install the open-source TimescaleDB extension](https://docs.timescale.com/self-hosted/latest/install/?ref=timescale.com); otherwise, you can sign up for the fully managed Timescale Cloud solution [free for 30 days](https://console.cloud.timescale.com/signup?ref=timescale.com) (no credit card required).