
Published at Feb 16, 2024

Data Analysis

Best Practices for PostgreSQL Aggregation

[Image: Colorful cubes merging as in data aggregation]

PostgreSQL aggregation is essential for deriving meaningful insights from data: it transforms rows and rows of raw data into information that's crucial for decision-making. An aggregate takes in multiple rows of data and outputs a single row, allowing you to analyze and summarize data efficiently.

PostgreSQL supports numerous built-in aggregate functions. Common ones include SUM(), MAX(), MIN(), and AVG(), which cover basic summary operations, while functions like STDDEV() and VARIANCE() support more advanced statistical analysis.

Learn how to compute standard deviation in Postgres.

In this article, we’ll discuss some best practices to help you leverage PostgreSQL aggregate functions and get the most out of them for enhanced data analysis. 

How Do PostgreSQL Aggregates Work?

While aggregates are a subset of functions, they’re fundamentally different from standard functions in the way they work. Aggregates take in a group of related rows to output a single result, while standard functions provide one result per row. To put it simply, functions work on rows, while aggregates work on columns. 

Let’s consider an example to understand this better. Suppose we have a sales table with three columns: product_id, quantity_sold, and price_per_unit, and we want to calculate the total revenue for each product and the total quantity sold for each. 

Let’s first create a table:

CREATE TABLE sales (
  product_id INTEGER,
  quantity_sold INTEGER,
  price_per_unit NUMERIC
);

Now, let’s add a few values to see the difference between functions and aggregates:

INSERT INTO sales VALUES (1, 10, 15.25);
INSERT INTO sales VALUES (2, 12, 22.50);
INSERT INTO sales VALUES (2, 10, 20.00);
INSERT INTO sales VALUES (1, 8, 15.00);

We can now create a function to calculate the revenue by taking in the price per unit and the quantity sold:

CREATE FUNCTION calculate_revenue(quantity_sold INT, price_per_unit NUMERIC)
RETURNS NUMERIC AS $$
BEGIN
  RETURN quantity_sold * price_per_unit;
END;
$$ LANGUAGE plpgsql;

Now, let’s use this function to calculate the revenue for each row:

SELECT product_id,
       quantity_sold,
       price_per_unit,
       calculate_revenue(quantity_sold, price_per_unit) AS total_revenue
FROM sales;

Here’s the result:

product_id | quantity_sold | price_per_unit | total_revenue
-----------+---------------+----------------+---------------
         1 |            10 |          15.25 |        152.50
         2 |            12 |          22.50 |        270.00
         2 |            10 |          20.00 |        200.00
         1 |             8 |          15.00 |        120.00

In this example, our calculate_revenue function returns the total revenue for each row. 

Now, to find the total quantity sold for each product, we can just use the built-in SUM() aggregate function:

SELECT product_id,
       SUM(quantity_sold) AS total_quantity_sold
FROM sales
GROUP BY product_id;

As you can see, it works on the quantity_sold column and calculates the total quantity sold for each product:

product_id | total_quantity_sold
-----------+---------------------
         1 |                  18
         2 |                  22

In other words, aggregates combine inputs from multiple rows into a single result (grouped by the product_id in this case). Under the hood, these PostgreSQL aggregates work row by row, which raises an important question: how do aggregates know the values stored in the previous rows? This is where state transition functions come in. 

State transition functions

The aggregate function stores state for the rows it has already seen, and as new rows come in, that internal state is updated. In our example, the internal state is simply the running total of quantities sold so far.

The function that processes all the incoming rows and updates the internal state is called a state transition function. It takes in two arguments, the current state and the value of the incoming row, and outputs a new state. As the aggregate function scans over different rows, the state transition function updates the internal state, allowing PostgreSQL to move through a column quickly.

However, that’s not all there is to it. Aggregate functions like SUM(), MAX(), and MIN() have a straightforward state consisting of just one value, but that’s not always the case: some aggregates keep a composite state.

For instance, in the case of the AVG() aggregate, you need to store both the count and the sum as the internal state. But then, there’s another step we need to take to get the result, which is to divide the total sum by the total count. This calculation is performed by another function called the final function; it takes the state and does the calculations necessary to get the final result. 

So, the state transition function is called for every new row, while the final function is called only once, after the state transition function has processed the whole group. And although a single state transition call is no more expensive than a final function call, the transition step is still the most expensive part of the aggregate once you factor in the number of rows it has to process.
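You can see this anatomy for yourself in the system catalogs. As a sketch, the following query inspects the built-in AVG(numeric) aggregate and lists its transition function, final function, and the type of its internal state:

```sql
-- Inspect the transition and final functions of the built-in AVG(numeric).
SELECT aggtransfn, aggfinalfn, aggtranstype::regtype
FROM pg_aggregate
WHERE aggfnoid = 'avg(numeric)'::regprocedure;
```

The pg_aggregate catalog has one row per aggregate, so you can run the same query against any aggregate you're curious about.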

When you're continuously ingesting large volumes of time-series data, you need ways to keep these aggregations fast. The good news is that PostgreSQL already has mechanisms for optimizing aggregates.

Parallelization and combine functions

Since the state transition function runs on each row, we can parallelize it to improve performance. We can do so by initializing multiple instances of the state transition function and providing each a subset of the rows as the input. 

Once these parallel aggregates run, we’ll end up with multiple partial states (one per parallel aggregate). However, since we need to aggregate the entire set of rows, we need an intermediate function that combines all the partial aggregates before running the final function. This is where we need another function, called the combine function, which we can run iteratively over all the partial states to get the combined state. Then, finally, we can run the final function to get the final result.

If this is unclear, consider the AVG() function again. By parallelizing the state transition functions, we can calculate the total sum and count of a subset of rows. Then, we can use the combine function to add up all the sums and counts of all subsets before running the final function.
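Putting the pieces together, here is a minimal sketch of a custom average aggregate that declares all three functions explicitly. The names (my_avg and its helpers) are ours, not built-ins; the state is a two-element array holding the running sum and count:

```sql
-- State transition function: folds one value into the {sum, count} state.
CREATE FUNCTION my_avg_sfunc(state numeric[], val numeric)
RETURNS numeric[] AS $$
  SELECT ARRAY[state[1] + val, state[2] + 1];
$$ LANGUAGE sql STRICT;

-- Combine function: merges two partial states from parallel workers.
CREATE FUNCTION my_avg_combine(s1 numeric[], s2 numeric[])
RETURNS numeric[] AS $$
  SELECT ARRAY[s1[1] + s2[1], s1[2] + s2[2]];
$$ LANGUAGE sql STRICT;

-- Final function: turns the state into the result.
CREATE FUNCTION my_avg_final(state numeric[])
RETURNS numeric AS $$
  SELECT CASE WHEN state[2] = 0 THEN NULL
              ELSE state[1] / state[2] END;
$$ LANGUAGE sql;

CREATE AGGREGATE my_avg(numeric) (
  SFUNC       = my_avg_sfunc,
  STYPE       = numeric[],
  FINALFUNC   = my_avg_final,
  COMBINEFUNC = my_avg_combine,
  INITCOND    = '{0,0}',
  PARALLEL    = SAFE
);
```

With COMBINEFUNC defined and the aggregate marked PARALLEL SAFE, the planner is free to run the transition function in parallel workers and merge the partial states, exactly as described above. You could then call it like any built-in: SELECT product_id, my_avg(quantity_sold) FROM sales GROUP BY product_id;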

Best Practices for Aggregate Design

Optimizing the design of these aggregates is essential if you want to get the most value from your data analytics. Here are some practices that can allow for effective aggregate design:

Two-step aggregation

One way to optimize data aggregation is to use a two-step aggregation process that mirrors the way PostgreSQL itself implements aggregates. The approach splits each aggregation into aggregate calls that build and return the internal state, just like the state transition function we discussed above, and accessor calls that take that internal state and return the result, just like the final function mentioned earlier.

This is particularly useful for time-series data. By exposing the aggregate's internal state through separate aggregate and accessor calls, you can compute an expensive aggregation once and derive several results from it, instead of recomputing it for each one.
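As a sketch of what this looks like in practice, assuming the timescaledb_toolkit extension and a hypothetical conditions(ts TIMESTAMPTZ, temperature DOUBLE PRECISION) table:

```sql
-- Two-step style: stats_agg() builds the internal state once per bucket,
-- and the average() and stddev() accessors each extract a result from it.
SELECT time_bucket('1 hour', ts) AS bucket,
       average(stats_agg(temperature)) AS avg_temp,
       stddev(stats_agg(temperature))  AS stddev_temp
FROM conditions
GROUP BY bucket;
```

Because the accessors operate on the aggregate's state rather than on the raw rows, adding another statistic is just another accessor call, not another pass over the data.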

Caching results

Materialized views in PostgreSQL are a powerful way of optimizing performance by caching query results. They precompute frequently run queries and store the results in the database. So the next time the query is needed, the database doesn't have to execute it from scratch; the results are already there, and you get your answer quickly. This reduces repetitive computation, allowing for more efficient analytics.

However, you’ll need to refresh materialized views every time the data is updated, which can be quite resource-intensive.    
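A minimal sketch using the sales table from earlier (the view name is ours):

```sql
-- Precompute per-product revenue and store it in the database.
CREATE MATERIALIZED VIEW product_revenue AS
SELECT product_id,
       SUM(quantity_sold * price_per_unit) AS total_revenue
FROM sales
GROUP BY product_id;

-- Reads are now a cheap scan of the stored results:
SELECT * FROM product_revenue;

-- ...but after the underlying data changes, you must refresh manually,
-- which recomputes the whole view:
REFRESH MATERIALIZED VIEW product_revenue;
```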

Pre-aggregation

Pre-aggregations refer to materialized query results that persist as tables. You can create a separate roll-up table to store the aggregated data and use triggers to manage updates to aggregates, allowing access to the table instead of repeatedly calling the aggregate function. This can save many re-computations and improve overall performance, especially when the same aggregate is computed multiple times.  
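One way to sketch this, again using the sales table from earlier (the roll-up table and trigger names are ours), is an insert trigger that keeps a per-product total current:

```sql
-- Roll-up table holding the pre-aggregated totals.
CREATE TABLE sales_rollup (
  product_id     INTEGER PRIMARY KEY,
  total_quantity BIGINT NOT NULL DEFAULT 0
);

-- Trigger function: fold each new sale into the roll-up row.
CREATE FUNCTION sales_rollup_trg() RETURNS trigger AS $$
BEGIN
  INSERT INTO sales_rollup (product_id, total_quantity)
  VALUES (NEW.product_id, NEW.quantity_sold)
  ON CONFLICT (product_id)
  DO UPDATE SET total_quantity =
    sales_rollup.total_quantity + EXCLUDED.total_quantity;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_rollup_after_insert
AFTER INSERT ON sales
FOR EACH ROW EXECUTE FUNCTION sales_rollup_trg();
```

Queries for total quantity sold can now read sales_rollup directly instead of re-aggregating the base table. (A production version would also handle UPDATE and DELETE.)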

Continuous Aggregates With Timescale

While materialized views are a powerful way of speeding up commonly run queries and avoiding expensive re-computations, there’s still a big problem—you need to manually refresh them to keep the view up to date, especially since it quickly becomes stale as new data comes in. 

And it’s not a one-time thing; you’ll have to keep refreshing the view to continue avoiding stale results, and plain materialized views have no built-in refresh mechanism that runs automatically. Any data added or updated after the last refresh simply won’t appear in the view, which is particularly problematic if you have real-time data.

We built continuous aggregates to overcome these limitations of materialized views and make real-time analytics possible. You can think of them as materialized views for real-time aggregates that are refreshed automatically via a refresh policy. Every time a refresh runs, it only uses the data changed since the last refresh (and not the entire dataset), making the process more efficient. Once you get up-to-date results, you can use these materialized views for use cases like live dashboards and real-time analytics.

time_bucket() with continuous aggregates

Continuous aggregates involve the use of the time_bucket() function, which allows you to group data over different time intervals. It’s quite similar to PostgreSQL’s date_bin() function but is more flexible in terms of the start time and bucket size.  

Creating a refresh policy is pretty straightforward; you just need to define the refresh interval so that your continuous aggregates are periodically and automatically updated. This whole process is much more efficient than a materialized view, minimizes computation, and enables real-time analysis. 
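As a sketch of both pieces together, assuming the sales table has been made a hypertable and extended with a sold_at TIMESTAMPTZ column (the earlier example table doesn't have one):

```sql
-- A continuous aggregate bucketing sales per product per hour.
CREATE MATERIALIZED VIEW hourly_sales
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', sold_at) AS bucket,
       product_id,
       SUM(quantity_sold) AS total_quantity
FROM sales
GROUP BY bucket, product_id;

-- Refresh policy: every hour, re-materialize only the recent window,
-- not the entire dataset.
SELECT add_continuous_aggregate_policy('hourly_sales',
  start_offset      => INTERVAL '3 hours',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
```

From then on, querying hourly_sales returns up-to-date buckets without you ever issuing a manual refresh.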

Real-time analytics in PostgreSQL don't have to be hard—read how.

Start Aggregating Your Data

Understanding the best practices to get the most out of PostgreSQL aggregation is crucial for improving data analytics and deriving more meaningful information from your data. We’ve talked about how PostgreSQL aggregates work and the best practices for their design.

And while built-in aggregates are great for minimizing computations, they come with a big limitation—they’re not always up-to-date, which can be a big problem, particularly if you have time-series data. If you want to improve your time-series aggregate performance, try Timescale today. You can experiment with continuous aggregates and see how they can improve your analytical capabilities. 
