
Published at Feb 16, 2024

Data Analysis

Best Practices for PostgreSQL Aggregation


PostgreSQL aggregation is essential for deriving meaningful insight from data—it transforms rows and rows of raw data into useful information that’s crucial for decision-making. It takes in multiple rows of data and outputs a single row, allowing users to analyze and summarize data efficiently.

PostgreSQL supports numerous built-in aggregate functions. Common ones include SUM(), MAX(), MIN(), and AVG(), used for basic statistical operations, while STDDEV() and VARIANCE() support more complex statistical analysis.

Learn how to compute standard deviation in Postgres.

In this article, we’ll discuss some best practices to help you leverage PostgreSQL aggregate functions and get the most out of them for enhanced data analysis. 

How Do PostgreSQL Aggregates Work?

While aggregates are a subset of functions, they’re fundamentally different from standard functions in the way they work. Aggregates take in a group of related rows to output a single result, while standard functions provide one result per row. To put it simply, functions work on rows, while aggregates work on columns. 

Let’s consider an example to understand this better. Suppose we have a sales table with three columns: product_id, quantity_sold, and price_per_unit, and we want to calculate the total revenue for each product and the total quantity sold for each. 

Let’s first create a table:

```sql
CREATE TABLE sales (
  product_id     INTEGER,
  quantity_sold  INTEGER,
  price_per_unit NUMERIC
);
```

Now, let’s add a few values to see the difference between functions and aggregates:

```sql
INSERT INTO sales VALUES (1, 10, 15.25);
INSERT INTO sales VALUES (2, 12, 22.50);
INSERT INTO sales VALUES (2, 10, 20.00);
INSERT INTO sales VALUES (1, 8, 15.00);
```

We can now create a function to calculate the revenue by taking in the price per unit and the quantity sold:

```sql
CREATE FUNCTION calculate_revenue(quantity_sold INT, price_per_unit NUMERIC)
RETURNS NUMERIC AS $$
BEGIN
  RETURN quantity_sold * price_per_unit;
END;
$$ LANGUAGE plpgsql;
```

Now, let’s use this function to calculate the revenue for each:

```sql
SELECT product_id,
       quantity_sold,
       price_per_unit,
       calculate_revenue(quantity_sold, price_per_unit) AS total_revenue
FROM sales;
```

In this example, our calculate_revenue function returns a total revenue value for each individual row.

Now, to find the total quantity sold for each, we can just use the built-in SUM() aggregate function:

```sql
SELECT product_id,
       SUM(quantity_sold) AS total_quantity_sold
FROM sales
GROUP BY product_id;
```

As you can see, it works on the quantity_sold column and calculates the total quantity sold for each product:

```
 product_id | total_quantity_sold
------------+---------------------
          1 |                  18
          2 |                  22
```

In other words, aggregates combine inputs from multiple rows into a single result (grouped by the product_id in this case). Under the hood, these PostgreSQL aggregates work row by row, which raises an important question: how do aggregates know the values stored in the previous rows? This is where state transition functions come in. 

State transition functions

The aggregate function stores the state of the rows it has already seen, and as new rows are added, the internal state is updated. In our example, the internal state is just the running sum of the quantities seen so far.

The function that processes all the incoming rows and updates the internal state is called a state transition function. It takes in two arguments, the current state and the value of the incoming row, and outputs a new state. As the aggregate function scans over different rows, the state transition function updates the internal state, allowing PostgreSQL to move through a column quickly.
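To make this concrete, here's a minimal sketch of a user-defined aggregate whose state is a single running sum. The names my_sum and sum_step are hypothetical, but SFUNC and STYPE are the actual CREATE AGGREGATE options PostgreSQL uses to declare the state transition function and the state type:

```sql
-- Hypothetical running-sum aggregate; the internal state is one NUMERIC value.
CREATE FUNCTION sum_step(state NUMERIC, next_value NUMERIC)
RETURNS NUMERIC AS $$
  SELECT COALESCE(state, 0) + COALESCE(next_value, 0);
$$ LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE my_sum (NUMERIC) (
  SFUNC = sum_step,  -- state transition: (old state, new row value) -> new state
  STYPE = NUMERIC    -- type of the internal state
);

-- Behaves like the built-in SUM():
-- SELECT my_sum(quantity_sold) FROM sales;
```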

However, that’s not all there is to it. Aggregate functions like SUM(), MAX(), and MIN() have a pretty straightforward state consisting of just one value, but that’s not always the case: some aggregates need a composite state.

For instance, in the case of the AVG() aggregate, you need to store both the count and the sum as the internal state. But then, there’s another step we need to take to get the result, which is to divide the total sum by the total count. This calculation is performed by another function called the final function; it takes the state and does the calculations necessary to get the final result. 

So, the state transition function is called for every new row, while the final function is called only once, after the state transition function has processed the whole group. And although a single call to the state transition function is no more expensive than a call to the final function, the transition step still dominates the cost of an aggregate once you factor in the number of rows it has to process.
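To make the composite state concrete, here's a hypothetical re-implementation of AVG(). SFUNC, STYPE, FINALFUNC, and INITCOND are the actual CREATE AGGREGATE options; the names my_avg, avg_step, and avg_final are ours:

```sql
-- The state is a composite: a running (count, sum) pair.
CREATE TYPE avg_state AS (n BIGINT, total NUMERIC);

CREATE FUNCTION avg_step(state avg_state, next_value NUMERIC)
RETURNS avg_state AS $$
  SELECT ROW(state.n + 1, state.total + next_value)::avg_state;
$$ LANGUAGE SQL IMMUTABLE STRICT;  -- STRICT: NULL inputs are skipped

CREATE FUNCTION avg_final(state avg_state)
RETURNS NUMERIC AS $$
  SELECT CASE WHEN state.n = 0 THEN NULL
              ELSE state.total / state.n END;
$$ LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE my_avg (NUMERIC) (
  SFUNC     = avg_step,
  STYPE     = avg_state,
  FINALFUNC = avg_final,  -- divides sum by count once, at the end
  INITCOND  = '(0,0)'     -- state before any rows are seen
);
```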

When you have a large volume of time-series data continuously being ingested, you want something that can help improve performance. The good news is that PostgreSQL already has mechanisms for optimizing aggregates.    

Parallelization and combine functions

Since the state transition function runs on each row, we can parallelize it to improve performance. We can do so by initializing multiple instances of the state transition function and providing each a subset of the rows as the input. 

Once these parallel aggregates run, we’ll end up with multiple partial states (one per parallel aggregate). However, since we need to aggregate the entire set of rows, we need an intermediate function that combines all the partial aggregates before running the final function. This is where we need another function, called the combine function, which we can run iteratively over all the partial states to get the combined state. Then, finally, we can run the final function to get the final result.

If this is unclear, consider the AVG() function again. By parallelizing the state transition functions, we can calculate the total sum and count of a subset of rows. Then, we can use the combine function to add up all the sums and counts of all subsets before running the final function.
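Extending that idea, here's a self-contained sketch of a parallel-safe average. COMBINEFUNC and PARALLEL = SAFE are the actual CREATE AGGREGATE options; the parallel_avg and pavg_* names are hypothetical:

```sql
-- Each worker builds a partial (count, sum) state; the combine function
-- merges those partial states before the final function runs.
CREATE TYPE pavg_state AS (n BIGINT, total NUMERIC);

CREATE FUNCTION pavg_step(state pavg_state, next_value NUMERIC)
RETURNS pavg_state AS $$
  SELECT ROW(state.n + 1, state.total + next_value)::pavg_state;
$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION pavg_combine(a pavg_state, b pavg_state)
RETURNS pavg_state AS $$
  SELECT ROW(a.n + b.n, a.total + b.total)::pavg_state;
$$ LANGUAGE SQL IMMUTABLE STRICT PARALLEL SAFE;

CREATE FUNCTION pavg_final(state pavg_state)
RETURNS NUMERIC AS $$
  SELECT CASE WHEN state.n = 0 THEN NULL ELSE state.total / state.n END;
$$ LANGUAGE SQL IMMUTABLE PARALLEL SAFE;

CREATE AGGREGATE parallel_avg (NUMERIC) (
  SFUNC       = pavg_step,
  STYPE       = pavg_state,
  FINALFUNC   = pavg_final,
  COMBINEFUNC = pavg_combine,  -- merges two partial states
  INITCOND    = '(0,0)',
  PARALLEL    = SAFE           -- lets workers run the transition in parallel
);
```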

Best Practices for Aggregate Design

Optimizing the design of these aggregates is essential if you want to get the most value from your data analytics. Here are some practices that can allow for effective aggregate design:

Two-step aggregation

One way to optimize data aggregation is to use a two-step aggregation process that emulates the way PostgreSQL implements the state transition and final functions for aggregates. The approach involves aggregate calls that return the internal state, exactly like the transition function we discussed above, and accessor calls that take in that internal state and return the result, exactly like the final function mentioned earlier.

This is particularly useful for time-series data. By exposing the aggregates’ internal architecture using accessors and aggregates, we can better understand how to structure our calls. 
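TimescaleDB's hyperfunctions (in the timescaledb_toolkit extension) expose exactly this pattern. As a sketch against the sales table from earlier: stats_agg() is the aggregate that builds the internal statistical state, and accessors like average() and stddev() finalize it (the explicit double precision cast is ours):

```sql
-- Two-step aggregation: stats_agg() builds one internal state, and the
-- accessor functions each extract a result from that same state.
SELECT average(stats_agg(price_per_unit::double precision)) AS avg_price,
       stddev(stats_agg(price_per_unit::double precision))  AS price_stddev
FROM sales;
```

Because the state is exposed, you can compute it once, store it, and apply different accessors later without rescanning the raw rows.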

Caching results

Materialized views in PostgreSQL are a powerful way of optimizing performance by caching query results. They pre-compute frequently run queries and store the results in the database, so the database doesn’t need to re-execute the query each time; the results are already there, and you get your answer quickly. This reduces repetitive computation, allowing for more efficient analytics.

However, you’ll need to refresh materialized views every time the data is updated, which can be quite resource-intensive.    
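Using the sales table from earlier, a minimal sketch (the view name product_revenue is ours):

```sql
CREATE MATERIALIZED VIEW product_revenue AS
SELECT product_id,
       SUM(quantity_sold * price_per_unit) AS revenue
FROM sales
GROUP BY product_id;

-- Reads are now cheap...
SELECT * FROM product_revenue;

-- ...but the view must be refreshed by hand (or on a schedule) after new
-- sales arrive, and a plain refresh recomputes the whole view:
REFRESH MATERIALIZED VIEW product_revenue;
```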

Pre-aggregation

Pre-aggregations refer to materialized query results that persist as tables. You can create a separate roll-up table to store the aggregated data and use triggers to manage updates to aggregates, allowing access to the table instead of repeatedly calling the aggregate function. This can save many re-computations and improve overall performance, especially when the same aggregate is computed multiple times.  
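As a sketch (assuming sales only ever receives inserts; the roll-up table and trigger names are hypothetical), a trigger can keep a roll-up table current so queries never re-run the aggregate:

```sql
CREATE TABLE sales_rollup (
  product_id     INTEGER PRIMARY KEY,
  total_quantity BIGINT NOT NULL DEFAULT 0
);

-- On every insert into sales, fold the new row into the roll-up.
CREATE FUNCTION sales_rollup_fn() RETURNS trigger AS $$
BEGIN
  INSERT INTO sales_rollup (product_id, total_quantity)
  VALUES (NEW.product_id, NEW.quantity_sold)
  ON CONFLICT (product_id)
  DO UPDATE SET total_quantity = sales_rollup.total_quantity
                                 + EXCLUDED.total_quantity;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_rollup_trg
AFTER INSERT ON sales
FOR EACH ROW EXECUTE FUNCTION sales_rollup_fn();

-- Queries read the roll-up instead of re-aggregating the raw rows:
-- SELECT total_quantity FROM sales_rollup WHERE product_id = 1;
```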

Continuous Aggregates With Timescale

While materialized views are a powerful way of speeding up commonly run queries and avoiding expensive re-computations, there’s still a big problem—you need to manually refresh them to keep the view up to date, especially since it quickly becomes stale as new data comes in. 

And it’s not a one-time thing: you’ll have to refresh the views again and again to keep avoiding recomputation, and each plain refresh recomputes the entire view. To add to that, materialized views don’t have a built-in refresh mechanism that runs automatically. This means a materialized view won’t include data that was updated or added after the last refresh, which is particularly problematic if you have real-time data.

We built continuous aggregates to overcome these limitations of materialized views and make real-time analytics possible. You can think of them as materialized views for real-time aggregates that are refreshed automatically via a refresh policy. Every time a refresh runs, it only uses the data changed since the last refresh (and not the entire dataset), making the process more efficient. Once you get up-to-date results, you can use these materialized views for use cases like live dashboards and real-time analytics.
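As a sketch (the hypertable name conditions and its columns are hypothetical), defining a continuous aggregate looks like defining a materialized view with one extra option, timescaledb.continuous:

```sql
CREATE MATERIALIZED VIEW daily_temps
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', ts) AS day,
       AVG(temperature) AS avg_temp
FROM conditions
GROUP BY day;
```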

time_bucket() with continuous aggregates

Continuous aggregates involve the use of the time_bucket() function, which allows you to group data over different time intervals. It’s quite similar to PostgreSQL’s date_bin() function but is more flexible in terms of the start time and bucket size.  
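For illustration, here's time_bucket() used on its own, against a hypothetical readings(ts, value) table:

```sql
-- Group readings into 15-minute buckets and average each bucket.
SELECT time_bucket('15 minutes', ts) AS bucket,
       AVG(value) AS avg_value
FROM readings
GROUP BY bucket
ORDER BY bucket;
```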

Creating a refresh policy is pretty straightforward; you just need to define the refresh interval so that your continuous aggregates are periodically and automatically updated. This whole process is much more efficient than a materialized view, minimizes computation, and enables real-time analysis. 
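Assuming a continuous aggregate named daily_temps already exists, a refresh policy is created with TimescaleDB's add_continuous_aggregate_policy function (the interval values here are illustrative):

```sql
SELECT add_continuous_aggregate_policy('daily_temps',
  start_offset      => INTERVAL '3 days',  -- refresh data newer than 3 days old
  end_offset        => INTERVAL '1 hour',  -- but leave the most recent hour alone
  schedule_interval => INTERVAL '1 hour'); -- run the refresh job every hour
```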

Real-time analytics in PostgreSQL don't have to be hard—read how.

Start Aggregating Your Data

Understanding the best practices to get the most out of PostgreSQL aggregation is crucial for improving data analytics and deriving more meaningful information from your data. We’ve talked about how PostgreSQL aggregates work and the best practices for their design.

And while built-in aggregates are great for minimizing computations, they come with a big limitation—they’re not always up-to-date, which can be a big problem, particularly if you have time-series data. If you want to improve your time-series aggregate performance, try Timescale today. You can experiment with continuous aggregates and see how they can improve your analytical capabilities. 
