
Published on Mar 6, 2024

Postgres Partition

When to Consider Postgres Partitioning


Written by James Blackwood-Sewell and Carlota Soto

PostgreSQL is one of the most popular relational databases on the planet. Its rich feature set and open-source nature make it a popular choice for businesses of all sizes. But it can run into trouble as data grows, especially with single large tables.

One popular strategy to tackle these issues is table partitioning. In this article, we'll explore when and why you should consider Postgres partitioning (especially for your large tables) and share best practices and common pitfalls to avoid if you decide to partition your data.

What Is Table Partitioning?

Table partitioning is a database design technique that divides a large table into smaller, more manageable pieces known as partitions. Each partition is a sub-table that holds a subset of the data, with each row existing in exactly one partition. Partitions can have their own indexes, but any unique indexes (including primary keys) must include the column used to partition the table.
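For example (a minimal sketch; the table and columns are illustrative), a primary key on a partitioned table must include the partition column:

CREATE TABLE events (
    event_id   BIGINT GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ NOT NULL,
    payload    TEXT,
    PRIMARY KEY (event_id, created_at) -- PRIMARY KEY (event_id) alone would be rejected
) PARTITION BY RANGE (created_at);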

This technique can offer significant advantages when managing large tables. By breaking down a large table into smaller partitions, you can benefit from improved query performance, optimized index sizes, and more efficient data maintenance operations, keeping your large-scale databases more agile and responsive overall.

But partitioning is not a one-size-fits-all solution: sometimes, it won't benefit you. It can even degrade your performance, as we’ll see later on.

PostgreSQL supports three partitioning strategies:

  1. Range partitioning: this strategy is ideal for time-series data or incrementing sequences (such as a BIGINT primary key), where you partition data based on a range of values (e.g., by day or by blocks of key values).

  2. List partitioning: you would use list partitioning when you want to partition data based on specific values in a column (e.g., country or department).

  3. Hash partitioning: hash partitioning is suitable when there is no clear partitioning key, as it distributes data evenly across partitions based on a hash function.

One crucial characteristic of partitioning in PostgreSQL is that it does not support “global indexes”: there is no way to create an index across partitions. Instead, when querying across partitions, PostgreSQL will evaluate the WHERE clause, and if there are constraints on the partition column, it will use those to exclude partitions that don’t hold data relevant to the query.
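You can see this exclusion (known as partition pruning) in a query plan. As a quick sketch, using the illustrative events table above:

EXPLAIN SELECT * FROM events WHERE created_at >= now() - INTERVAL '1 day';
-- The plan lists only the partitions whose ranges overlap the predicate;
-- all other partitions are pruned and never scanned.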

It’s probably clear to you at this point that if you were to implement partitioning in a production setup, you'd need automation in place to create and maintain partitions, especially if the partitioned table is expected to receive data continuously. There are multiple ways to do this, the most common being the following:

  • You can write SQL procedures/functions that check whether the necessary partitions for upcoming days/weeks/months exist, using cron (on Unix-like systems) or Task Scheduler (on Windows) to run the script regularly (see the sketch after this list).

  • You can use a PostgreSQL extension dedicated to this, the most popular being pg_partman (which will probably need to be paired with a scheduler).

  • You can use Timescale, whose hypertables have built-in automatic partitioning (reducing the creation and maintenance of partitions to a single command).
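As an illustration of the first approach, here is a minimal sketch of a partition-creation function. The function name is hypothetical, and it assumes a table named orders that is range-partitioned by order_date (like the one created later in this article):

-- Pre-create tomorrow's daily partition of orders (range-partitioned by order_date)
CREATE OR REPLACE FUNCTION create_tomorrow_partition() RETURNS void AS $$
DECLARE
    start_date date := current_date + 1;
    end_date   date := current_date + 2;
    part_name  text := format('orders_%s', to_char(start_date, 'YYYY_MM_DD'));
BEGIN
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF orders
             FOR VALUES FROM (%L) TO (%L)',
        part_name, start_date, end_date
    );
END;
$$ LANGUAGE plpgsql;

-- Scheduled daily, e.g., via cron: SELECT create_tomorrow_partition();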

Read a comparison between two of these methodologies—pg_partman and hypertables.

When Should You Consider Partitioning?

So, yes, partitioning can be very powerful, but it is certainly not for every use case. The decision to partition a PostgreSQL table is not strictly based on the table's absolute size. Evaluating your database's characteristics and requirements before implementing partitioning is essential.

Generally speaking, you should start thinking about partitioning in one (or more) of the following situations:

  • You have large tables. As we mentioned, the size of your tables is not the only factor determining whether you may see benefits from partitioning; that said, if you have large tables (tens of millions to billions of rows), partitioning will probably pay off.

  • Your ingestion rate is very high. Even if the current table size isn't massive, a high data ingestion rate can indicate that the table will grow significantly soon. It might be beneficial to preemptively manage this growth with a partitioning strategy before it starts affecting your performance and maintenance operations.

  • You’re beginning to notice query performance degradation. Partitioning may also be beneficial if your queries are slowing down, especially those that will only touch a subset of your data. You can benefit from this even when your tables are smaller. For example, partitioning can significantly enhance query performance when your daily queries include searches based on a specific range or criteria. Let's say you're dealing with time-series data: partitioning by date can help you quickly retrieve records within a particular time frame without scanning the entire table.

  • You’re dealing with maintenance overhead. As a table grows, maintenance operations like VACUUM, ANALYZE, and indexing can take longer and might start impacting your operational efficiency. Partitioning can simplify these operations because you can focus on maintaining smaller partitions independently, reducing the impact on your database's overall performance.

  • You’re managing data retention policies. If your dataset has built-in obsolescence that periodically purges older data, partitioning can make these operations much more efficient. Dropping an old partition is much faster and less resource-intensive than deleting rows.

  • You want to use less memory. If you operate with limited memory, you might benefit from partitioning, as smaller indexes and data chunks fit better in memory and improve cache hit rates. In most cases, this will also improve performance.

When Not to Use Partitioning

There are situations where introducing partitioning may be counterproductive, and you may want to look at other optimizations instead, such as tuning your indexes or queries:

  • Your tables are small, and you’re not ingesting at high rates. If your tables are not large and are unlikely to grow significantly, partitioning would add unnecessary complexity without delivering noticeable benefits.

  • You’re just starting with PostgreSQL. Implementing and maintaining partitioned tables introduces a certain level of operational and architectural complexity: managing such tables is more challenging, so make sure you feel ready to go there—or use a hosted service like Timescale to manage this for you.

  • You have uniform data access patterns. If your queries usually access data uniformly across the table rather than focusing on specific subsets (like recent data, ranges of data, specific categories, and so on), then your queries could actually perform worse.

  • You run frequent full scans. If most of your queries don’t use your partition key in the WHERE clause, you’ll scan every partition every time. These scans will be slow and will get slower the more partitions you have.

How to Partition a PostgreSQL Table

List partitioning

List partitioning is useful when you want to partition data based on a finite set of values, such as categories or regions. Each partition will contain data that matches a particular list of values.

Example:

Suppose we have a table called orders and we want to partition it based on different regions: 'North', 'South', 'East', and 'West'.

-- Create the parent table
CREATE TABLE orders (
    order_id   SERIAL,
    order_date DATE,
    region     TEXT,
    amount     NUMERIC,
    PRIMARY KEY (order_id, region) -- the primary key must include the partition column
) PARTITION BY LIST (region);

-- Create partitions for each region
CREATE TABLE orders_north PARTITION OF orders FOR VALUES IN ('North');
CREATE TABLE orders_south PARTITION OF orders FOR VALUES IN ('South');
CREATE TABLE orders_east PARTITION OF orders FOR VALUES IN ('East');
CREATE TABLE orders_west PARTITION OF orders FOR VALUES IN ('West');

In this example, any INSERT statement will automatically route data to the correct partition based on the region value.
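A quick way to verify the routing (with hypothetical sample data) is to insert a row and check which partition it landed in via the tableoid system column:

INSERT INTO orders (order_date, region, amount)
VALUES ('2024-03-01', 'North', 120.50);

SELECT tableoid::regclass AS partition, region, amount FROM orders;
-- The partition column should read orders_north for this row.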

Range partitioning

Range partitioning is useful when you want to partition data based on a continuous range of values, such as dates, numerical values, or other ordered data.

Example:

Suppose we want to partition the orders table by the year of the order_date.

-- Create the parent table
CREATE TABLE orders (
    order_id   SERIAL,
    order_date DATE,
    region     TEXT,
    amount     NUMERIC,
    PRIMARY KEY (order_id, order_date) -- the primary key must include the partition column
) PARTITION BY RANGE (order_date);

-- Create partitions for each year
CREATE TABLE orders_2023 PARTITION OF orders FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE orders_2022 PARTITION OF orders FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
CREATE TABLE orders_2021 PARTITION OF orders FOR VALUES FROM ('2021-01-01') TO ('2022-01-01');

Here, records are routed based on their order_date. For example, a record with an order_date of 2023-05-15 will be placed in the orders_2023 partition.
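Nothing is created automatically, so before 2024 data arrives you would add the next partition yourself (or automate it, as discussed above):

-- Add the next yearly partition ahead of time
CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');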

Hash partitioning

Hash partitioning is useful when you want to distribute rows across partitions in a way that balances the load. Data is divided using a hash function that evenly distributes data across the specified number of partitions.

Example:

Let’s partition the orders table by a hash of the order_id, spreading the rows across four partitions.

-- Create the parent table
CREATE TABLE orders (
    order_id   SERIAL PRIMARY KEY,
    order_date DATE,
    region     TEXT,
    amount     NUMERIC
) PARTITION BY HASH (order_id);

-- Create 4 partitions
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p4 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);

In this case, each row will be placed in one of the four partitions based on the result of a hash function applied to order_id. PostgreSQL will compute the hash and then distribute the rows across the partitions.
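To sanity-check the distribution (a sketch, assuming some rows have already been inserted), count the rows per partition:

SELECT tableoid::regclass AS partition, count(*)
FROM orders
GROUP BY 1
ORDER BY 1;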

How to create a partitioned hypertable 

Now let’s see how you can create a partitioned PostgreSQL table in Timescale using hypertables. Hypertables automatically partition your data—you can learn more about them in the Timescale documentation.

1. Log into your Timescale account or install the TimescaleDB extension

Log into your Timescale Cloud account. If you don’t have an account, you can create one for free and try it for 30 days. Alternatively, you can install TimescaleDB on your machine and then enable it.

-- Enable the TimescaleDB extension
CREATE EXTENSION IF NOT EXISTS timescaledb;

2. Create the main table

Next, create an orders table similar to the previous examples, which will store the time-series data.

-- Create the orders table
CREATE TABLE orders (
    customer_id BIGINT NOT NULL,
    order_time  TIMESTAMPTZ NOT NULL,
    region      TEXT,
    amount      NUMERIC,
    PRIMARY KEY (order_time, customer_id) -- Composite key for uniqueness
);

This is a standard table that will later be converted into a hypertable. The order_time column is required for time-series data, and the primary key is a composite of order_time and customer_id so that each customer's orders are uniquely identified.

3. Convert the table to a hypertable

Convert the orders table into a hypertable using the create_hypertable() function. Here, we will partition the data by time into chunks of one day.

-- Convert the table to a hypertable partitioned by time (daily chunks)
SELECT create_hypertable('orders', by_range('order_time', INTERVAL '1 day'));

Similar to the hash partitioning presented earlier, you can use space partitioning on region to distribute the data across multiple partitions using add_dimension():

SELECT add_dimension('orders', by_hash('region', 4));

Explanation:

  • The order_time column is used for time-based partitioning.

  • by_range('order_time', INTERVAL '1 day') specifies that the time-based partitioning should create one chunk of data per day.

  • The region column is used for space partitioning, with four partitions.

4. Insert data into the hypertable

You can now insert data into the hypertable, and TimescaleDB will automatically manage the partitioning and chunking of the data.

-- Insert some sample data
INSERT INTO orders (order_time, customer_id, region, amount) VALUES
    ('2024-08-01 00:00:00', 1, 'Region A', 600),
    ('2024-08-01 00:00:00', 2, 'Region A', 580),
    ('2024-08-01 01:00:00', 1, 'Region B', 610),
    ('2024-08-01 01:00:00', 2, 'Region C', 590),
    ('2024-08-02 00:00:00', 1, 'Region B', 620),
    ('2024-08-02 00:00:00', 2, 'Region C', 570);

Each row will be routed to the appropriate chunk based on the order_time column. The data will be further distributed across the space partitions based on region.

5. Query the hypertable

You can query the orders hypertable just like a normal PostgreSQL table. TimescaleDB will automatically optimize the query by accessing only the relevant chunks.

-- Query data for a specific time range
SELECT * FROM orders
WHERE order_time BETWEEN '2024-08-01 00:00:00' AND '2024-08-01 23:59:59';
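To confirm that only the relevant chunks are touched, you can run the same query through EXPLAIN (a sketch; the exact plan output depends on your TimescaleDB version):

EXPLAIN SELECT * FROM orders
WHERE order_time BETWEEN '2024-08-01 00:00:00' AND '2024-08-01 23:59:59';
-- The plan should reference only the chunk(s) covering 2024-08-01.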

6. Check chunks created

You can view the chunks that TimescaleDB has created using the show_chunks() function.

-- Show the chunks created by TimescaleDB
SELECT show_chunks('orders');

This lists the chunks that TimescaleDB has automatically created to store your data.
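To see each chunk's time range as well, you can query the timescaledb_information.chunks view (a sketch):

SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'orders';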

For more advanced tips on how to use hypertables in TimescaleDB, check out this video.

Best Practices for Data Partitioning

If you do decide to partition your data, here's how to do it properly:

  • Choose the right partition size. What’s your ideal partition size? It depends, but you should aim for a balance. While PostgreSQL can handle a high number of partitions, having too many will increase planning time and could negatively affect query performance. At the same time, if your partitions are too large, you won’t be able to use ranges to exclude data, undermining the effectiveness of partition pruning.

  • Keep your partition size consistent. Aim to keep partitions relatively uniform in size so that maintenance tasks and query performance are consistent across partitions.

  • Choose the right partitioning key. Opt for a key that aligns with your query patterns. For instance, if most of your queries filter by date, a timestamp or date column would be an ideal partitioning key.

  • Create partitions in advance. Be sure to create partitions for upcoming periods (e.g., future days or months) ahead of time so there's no interruption in data ingestion. While you can use a default partition to catch orphaned records, in practice, this introduces a maintenance burden and does not perform well.

  • Take advantage of data retention policies to maintain old partitions. For example, if you're partitioning by time and data has a limited useful life, schedule regular tasks to drop or archive old partitions (see the sketch after this list).

  • Optimize your queries. If you’re especially interested in optimizing query performance, analyze and understand the query execution plan to validate that it is only scanning necessary partitions.

  • Properly place partitions across different storage media. If you're using tablespaces to place partitions on different storage media (e.g., EBS volumes vs. S3), ensure that frequently accessed partitions are on faster storage, while older or less accessed partitions can sit on slower, cheaper storage. Timescale makes this easy via data tiering.
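As a sketch of such a retention task, here is what dropping old data looks like on plain PostgreSQL and, for comparison, on a Timescale hypertable:

-- Plain Postgres: detach the old partition (archive it if needed), then drop it
ALTER TABLE orders DETACH PARTITION orders_2021;
DROP TABLE orders_2021;

-- Timescale hypertable: drop all chunks older than a cutoff in one call
SELECT drop_chunks('orders', older_than => INTERVAL '2 years');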

Avoid These Pitfalls

Similarly, make sure to avoid these common mistakes:

  • Over-partitioning. Creating many small partitions is tempting, but it won’t work well: you’ll run into query planning and management challenges.

  • Inefficient indexing. Avoid creating unnecessary indexes on your partitions. Only index the columns that are frequently filtered or joined on.

  • Unoptimized query patterns. Queries spanning multiple partitions or not using the partition key in the WHERE clause might suffer in performance. Be sure to optimize the majority of your queries for the partitioning scheme.

  • Running out of partitions. If you insert a row for which no partition exists, it will either be rejected or stored in a DEFAULT partition (see the sketch after this list). Ensure you either pre-create partitions at a quiet time (as creating them will lock your table) or use an extension that creates new partitions on the fly.

  • Monitor disk usage (partitions need extra space). If you're creating many partitions, especially on different tablespaces or disks, monitor disk usage to avoid out-of-space issues.
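For reference, creating a catch-all partition looks like this; it prevents failed inserts but, as noted above, adds a maintenance burden and can hurt performance:

-- A DEFAULT partition catches rows that match no other partition
CREATE TABLE orders_default PARTITION OF orders DEFAULT;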

Conclusion

Partitioning PostgreSQL tables can be a great ally in your production database, especially if your tables are getting big, your ingestion is growing, or your query performance is degrading. Still, partitioning is not a one-size-fits-all solution: consider if it’s the right fit for your use case before deciding to implement it.

If you’ve decided to partition your large PostgreSQL tables by time, check out Timescale, which completely automates the creation and maintenance of partitions: you will interact with your tables as if they were regular PostgreSQL tables, while Timescale handles partitioning under the hood. You will get the extra performance—without the extra work.

Want to keep reading? Learn more about determining the optimal Postgres partition size.
