<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[Tiger Data Blog]]></title>
        <description><![CDATA[Insights, product updates, and tips from TigerData (Creators of TimescaleDB) engineers on Postgres, time series & AI. IoT, crypto, and analytics tutorials & use cases.]]></description>
        <link>https://www.tigerdata.com/blog</link>
        <image>
            <url>https://www.tigerdata.com/icon.ico</url>
            <title>Tiger Data Blog</title>
            <link>https://www.tigerdata.com/blog</link>
        </image>
        <generator>RSS for Node</generator>
        <lastBuildDate>Tue, 07 Apr 2026 10:10:05 GMT</lastBuildDate>
        <atom:link href="https://www.tigerdata.com/blog" rel="self" type="application/rss+xml"/>
        <ttl>60</ttl>
        <item>
            <title><![CDATA[Benchmarking PostgreSQL Batch Ingest]]></title>
            <description><![CDATA[See what PostgreSQL batch ingest method is right for your use case: in this article, we benchmark INSERT (VALUES and UNNEST) vs. COPY (text and binary).]]></description>
            <link>https://www.tigerdata.com/blog/benchmarking-postgresql-batch-ingest</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/benchmarking-postgresql-batch-ingest</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[PostgreSQL Performance]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Tue, 26 Nov 2024 14:00:51 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2024/11/to_char-11111---FM9-999-999-----2-.png">
            </media:content>
            <content:encoded><![CDATA[<p>In a previous <a href="https://timescale.ghost.io/blog/tag/performance/"><u>article</u></a> in this <a href="https://timescale.ghost.io/blog/tag/performance/"><u>series</u></a>, I explored the magic of <code>INSERT...UNNEST</code> for improving PostgreSQL batch <code>INSERT</code> performance. While it’s a fantastic technique, I know it’s not the fastest option available (although it is very flexible). Originally, I hadn't intended to loop back and benchmark all the batch ingest methods, but I saw a lot of confusion out there, so I'm back, and this time I'm looking at <code>COPY</code> too. As usual for this series, it’s not going to be a long post, but it is going to be an informative one.&nbsp;</p><p>I flipped my approach for this post, comparing not just the PostgreSQL database performance in isolation but the practical performance from an application. To do this, I built a custom benchmarking tool in <a href="https://www.rust-lang.org/">Rust</a> to measure the end-to-end performance of each method. In this article, I’ll walk you through the batch ingest options you’ve got, and how they stack up (spoiler alert, the spread is over 19x!).</p><h2 id="the-introduction-batch-ingest-in-postgresql">The Introduction: Batch Ingest in PostgreSQL</h2><p>I’m defining batch ingest as writing a dataset to PostgreSQL in batches or chunks. You’d usually do this because the data is being collected in (near) real time (think a flow of IoT data from sensors) before being persisted into PostgreSQL (hopefully with <a href="https://docs.timescale.com/">TimescaleDB</a>, although that's out of scope for this post). </p><p>Writing a single record at a time is incredibly inefficient, so writing batches makes sense (the size probably depends on how long you can delay writing). 
Just to be clear, this isn't about loading a very large dataset in one go; I’d call that <a href="https://www.timescale.com/learn/testing-postgres-ingest-insert-vs-batch-insert-vs-copy">bulk ingest</a>, not batch ingest (and you'd usually do that from a file).</p><p>Broadly speaking, there are two methods for ingesting multiple values at once in PostgreSQL: <code>INSERT</code> and <code>COPY</code>. Each of these methods has a few variants, so let's look at the differences.</p><h2 id="insert-values-and-unnest">INSERT: VALUES and UNNEST</h2><p>The most common method to ingest data in PostgreSQL is the standard <code>INSERT</code> statement using the <code>VALUES</code> clause. Everyone recognizes it, and every language and ORM (object-relational mapper) can make use of it. While you can insert a single row, we are interested in batch ingest here, passing multiple values using the following syntax (this example is a batch of three for a table with seven columns).</p><pre><code class="language-SQL">INSERT INTO sensors
VALUES&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;($1,  $2,  $3,  $4,  $5,  $6,  $7),
&nbsp;&nbsp;&nbsp;&nbsp;($8,  $9,  $10, $11, $12, $13, $14),
    ($15, $16, $17, $18, $19, $20, $21);</code></pre><p>The other method is <code>INSERT...UNNEST</code>. Here, instead of passing a value per attribute (so <code>batch size * columns</code> total values), we pass an array of values per column.</p><pre><code class="language-SQL">INSERT INTO sensors
SELECT * FROM unnest(
&nbsp;&nbsp;&nbsp;&nbsp;$1::int[],
&nbsp;&nbsp;&nbsp;&nbsp;$2::timestamp[],
&nbsp;&nbsp;&nbsp;&nbsp;$3::float8[],
    $4::float8[],
    $5::float8[],
    $6::float8[],
    $7::float8[]
);</code></pre><p>If you’re after a discussion on the difference between the two, then check out <a href="https://timescale.ghost.io/blog/boosting-postgres-insert-performance/"><u>Boosting Postgres INSERT Performance by 2x With UNNEST</u></a>.</p><p>Each of these queries can actually be sent to the database in a few ways:</p><ul><li>You could construct the query string manually with a literal value in place of the <code>$</code> placeholders. I haven’t benchmarked this because it’s bad practice and can open you up to SQL injection attacks (never forget about <a href="https://www.explainxkcd.com/wiki/index.php/327:_Exploits_of_a_Mom"><u>Little Bobby Tables</u></a>).&nbsp;</li><li>You could use your framework to send a parameterized query (which looks like the ones above with <code>$</code> placeholders), which sends the query body and the values to Postgres as separate items. This protects against SQL injection and speeds up query parsing.</li><li>You could use a prepared statement (which would also be parameterized) to let the database know about your query ahead of time, then just send the values each time you want to run it. This provides the benefits of parameterization, and also speeds up your queries by reducing the planning time.</li></ul><p>Most frameworks implement prepared statements using the binary protocol directly, but you can use the <code>PREPARE</code> and <code>EXECUTE</code> SQL commands to do the same thing from SQL.</p><p>Keep in mind that PostgreSQL has a limit of 32,767 parameterized variables in a query. So if you had seven columns, then your maximum batch size for <code>INSERT…VALUES</code> would be 4,681. When you’re using <code>INSERT…UNNEST</code>, you’re only sending one parameter per column. 
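</p><p>For reference, the <code>PREPARE</code>/<code>EXECUTE</code> route looks something like the following sketch (using a hypothetical three-column <code>sensors</code> table rather than the seven-column one above):</p><pre><code class="language-SQL">-- Prepared once per session, with one array parameter per column
PREPARE sensor_batch (timestamptz[], text[], float8[]) AS
    INSERT INTO sensors (ts, sensorid, value)
    SELECT * FROM unnest($1, $2, $3);

-- Executed once per batch; each array holds one value per row
EXECUTE sensor_batch(
    ARRAY['2024-11-26 14:00:00+00', '2024-11-26 14:00:01+00']::timestamptz[],
    ARRAY['sensor-a', 'sensor-b'],
    ARRAY[20.1, 20.5]
);</code></pre><p>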
Because PostgreSQL can support at most 1,600 columns, you'll never hit the limit.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">When using prepared statements for batch ingest, ensure that <code spellcheck="false" style="white-space: pre-wrap;">plan_cache_mode</code> is not set to <code spellcheck="false" style="white-space: pre-wrap;">force_custom_plan</code>. This setting is designed for queries that benefit from being re-planned for each execution, which isn’t the case for batch inserts. <br><br>By default, <code spellcheck="false" style="white-space: pre-wrap;">plan_cache_mode</code> is set to <code spellcheck="false" style="white-space: pre-wrap;">auto</code>, meaning PostgreSQL will use custom plans for the first five executions before switching to a generic plan. To optimize performance, you could consider changing your session to <code spellcheck="false" style="white-space: pre-wrap;">force_generic_plan</code>, ensuring the query is planned just once and reused for all subsequent executions.</div></div><h2 id="copy-text-and-binary">COPY: Text and Binary</h2><p><code>COPY</code> is a PostgreSQL-specific extension to the SQL standard for bulk ingestion (strictly speaking, we are talking about <code>COPY FROM</code> here because you can also <code>COPY TO</code>, which moves data from a table to a file). <br><br><code>COPY</code> shortcuts the process of writing multiple records in a number of ways, with two of the most critical ones being:</p><ol><li><strong>WAL MULTI-INSERT:</strong> PostgreSQL optimizes <code>COPY</code> write-ahead operations by writing <code>MULTI_INSERT</code> records to the WAL (write-ahead log) instead of logging each row individually. 
This results in less data being written to the WAL files, which means less I/O (input/output) on your database.</li><li><strong>COPY ring buffer:</strong> To avoid polluting shared buffers, <code>COPY</code> uses a dedicated ring buffer for its I/O operations. This minimizes the impact on the buffer cache used for regular queries, preserving performance for other <a href="https://www.tigerdata.com/learn/guide-to-postgresql-database-operations" rel="noreferrer">database operations</a>. So, less about raw speed and more about not being a noisy neighbor.</li></ol><p><code>COPY</code> can read data from multiple sources: local files, local commands, or standard input. For batch ingestion, standard input makes the most sense as the data can be sent directly from the client without an intermediate step. I was actually surprised by the number of people <a href="https://www.reddit.com/r/PostgreSQL/comments/1gsynek/boosting_postgres_insert_performance_by_50_with/" rel="noreferrer">who reached out on Reddit</a> following my last post saying they couldn’t use <code>COPY</code> because they would need to write out their stream of data as a file. That’s 100 percent what the <code>STDIN</code> setting is for!</p><p><code>COPY</code> can use two different protocols, text and binary. </p><ul><li>Text supports formats like CSV and involves sending text strings over the wire to be parsed by the server before ingestion. You can actually just dump raw CSV records into <code>COPY</code>.</li><li>Binary supports writing data in the native PostgreSQL format from the client, removing the need for parsing on the server side. It’s much faster but also much less flexible, with limited support in many languages. 
To do this, your client needs to know the type of each column so it can write the values in the correct binary format.</li></ul><p>The two variants of <code>COPY</code> we'll be testing are the text version using:</p><pre><code class="language-SQL">COPY sensors FROM STDIN;</code></pre><p>And the binary version using:</p><pre><code>COPY sensors FROM STDIN WITH (FORMAT BINARY);</code></pre><p><code>COPY</code> isn't a normal SQL statement, so it can’t exist within a larger query. It also can’t perform an upsert (like <code>INSERT … ON CONFLICT</code>), although from <a href="https://www.postgresql.org/docs/current/sql-copy.html">PostgreSQL 17</a>, a text <code>COPY</code> can now simulate <code>INSERT … ON CONFLICT DO NOTHING</code> by ignoring errors with <code>ON_ERROR IGNORE</code>.</p><h2 id="the-setup">The Setup&nbsp;</h2><p>I created my own <a href="https://github.com/jamessewell/pgingester">Rust CLI tool</a> to run this benchmark. That might seem like overkill, but I did it for the following reasons:</p><ul><li>I needed something that supported <code>COPY FROM STDIN</code> and <code>COPY WITH (FORMAT BINARY)</code> directly, ruling out <a href="https://k6.io/">Grafana K6</a> and the PostgreSQL native <a href="https://www.postgresql.org/docs/current/pgbench.html">pgbench</a>.</li><li>I needed something that would let me run parameterized and prepared queries using the binary protocol directly, not using <code>PREPARE</code> and <code>EXECUTE</code>, because this is how most frameworks operate.</li><li>I wanted to measure the timing from an application's viewpoint, including data wrangling, network round-trip latency, and database operations.</li><li>I start measuring time after I've read the CSV file from disk and loaded it into a Rust data structure. This is to avoid measuring the I/O limits of the benchmark client. Batch ingest normally takes place in a stream without data being read from files. 
</li><li>I love Rust (if you’re after more Rust x PostgreSQL content, check out <a href="https://www.youtube.com/watch?v=C9TopAI1Hnk"><u>my talk </u></a>at PGConf.EU 2024)!</li></ul><p>The tool tests the insertion of data from a CSV file into the database (the default file is one million records) with multiple batch sizes and ingest methods into a table with five metric columns (the actual schema isn’t important, I just love the fact a lot of power and renewables companies use Timescale ♺):</p><pre><code class="language-sql">CREATE TABLE IF NOT EXISTS power_generation (
    generator_id INTEGER,
    timestamp TIMESTAMP WITH TIME ZONE,
    power_output_kw DOUBLE PRECISION,
    voltage DOUBLE PRECISION,
    current DOUBLE PRECISION,
    frequency DOUBLE PRECISION,
    temperature DOUBLE PRECISION
);</code></pre><p>It supports a combination of the following methods (or you can use the <code>--all</code> shortcut) for insertion over multiple batch sizes per run:</p><ul><li>Batch insert (parameterized)</li><li>Prepared batch insert</li><li><code>UNNEST</code> insert (parameterized)</li><li>Prepared <code>UNNEST</code> insert</li><li><code>COPY</code></li><li>Binary <code>COPY</code></li></ul><p>The tool supports a few options, the most important being the comma-separated list of batch sizes.</p><pre><code>Usage: pgingester [OPTIONS] [METHODS]...

Arguments:
  [METHODS]...  [possible values: insert-values, prepared-insert-values, insert-unnest, prepared-insert-unnest, copy, binary-copy]

Options:
  -b, --batch-sizes &lt;BATCH_SIZES&gt;              [default: 1000]
  -t, --transactions                           
  -c, --csv-output                             
  -a, --all                                    
  -c, --connection-string &lt;CONNECTION_STRING&gt;  [env: CONNECTION_STRING=]
  -f, --input-file &lt;INPUT_FILE&gt;                [default: ingest.csv]
  -h, --help                                   Print help
  -V, --version                                Print version</code></pre><p>I tested with a connection to a Timescale 8&nbsp;CPU/32&nbsp;GB memory server (although it was only using a single connection, so this is overkill).&nbsp;</p><h2 id="the-results">The Results</h2><p>Running the CLI tool with the following arguments will output a full list of ingest performance results, including the relative speed for each tested method.</p><pre><code>pgingester --all --batch-sizes 1000,5000,10000,100000,1000000</code></pre><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/11/carbon--8-.png" class="kg-image" alt="" loading="lazy" width="1594" height="1116" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/11/carbon--8-.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2024/11/carbon--8-.png 1000w, https://timescale.ghost.io/blog/content/images/2024/11/carbon--8-.png 1594w" sizes="(min-width: 720px) 720px"></figure><p>I ran all queries with multiple batch sizes with the default CSV input (one million lines). The <code>Insert VALUES</code> and <code>Prepared Insert VALUES</code> queries only run for the 1,000 batch size, because larger batches would exceed the parameter limit (the warnings to standard error have been removed below).</p><p>We can make a number of interesting conclusions from this data:</p><ol><li>With a larger batch size (anything above 1,000), binary <code>COPY</code> is substantially faster (at least 3.6x) than anything else (19x faster than a naive parameterized <code>INSERT...VALUES</code>). This is because it doesn’t have to do any data parsing on the server side. 
The more data you load in a single batch, the more pronounced the difference will become.</li><li>Text <code>COPY</code> also performs well, but surprisingly it’s surpassed in speed by prepared statements for batches of 10,000 or less.&nbsp;</li><li>Both <code>COPY</code> variants perform poorly with batches of 1,000. Interestingly, I've seen a lot of batch ingest tools actually use this batch size.</li><li>When you’re using <code>INSERT</code> for batch ingest, prepared statements always outperform parameterized ones. If you want maximum speed, the same number of parameters regardless of the batch size, and to avoid hitting the maximum number of parameters on larger batches, then use <code>INSERT…UNNEST</code>.</li><li><code>INSERT...UNNEST</code> at a batch size of 100,000 does a lot better against any of the text <code>COPY</code> variants than I thought it would: there is actually only 3 ms in it 👀!</li></ol><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">I ran this with a larger dataset of 100 million rows as well. Performance is <i><em class="italic" style="white-space: pre-wrap;">slightly</em></i> worse, probably because PostgreSQL is checkpointing in the background. However, the general numbers and relative speeds remain very similar.</div></div><h2 id="so-which-should-you-use">So, Which Should You Use?</h2><p>If you're looking to optimize batch ingestion in PostgreSQL, the right method depends on your specific use case, batch size, and application requirements. Here’s how the options stack up:</p><ol><li><strong>Small Batch Sizes (&lt;= 10,000 rows)</strong>: Prepared <code>INSERT...UNNEST</code> can be surprisingly competitive. Down at a batch size of 1,000, <code>COPY</code> is actually much slower.&nbsp;</li><li><strong>Large Batch Sizes (&gt; 10,000 rows)</strong>: For maximum throughput with larger batches, binary <code>COPY</code> is unbeatable. 
Its ability to bypass server-side parsing and its use of a dedicated ring buffer make it the top choice for high-velocity data pipelines. If you need speed, can use larger batches, and your application can support the binary protocol, this should be your default.</li><li><strong>Ease of Implementation</strong>: If you prioritize ease of implementation or need compatibility across a wide range of tools, text <code>COPY</code> is a great middle-ground. It doesn't require complex client-side libraries and is supported in nearly every language that interacts with PostgreSQL. You can also just throw your CSV data at it.</li><li><strong>Considerations Beyond Speed</strong>:<ul><li><strong>Upserts:</strong> If you need conflict handling (<code>INSERT...ON CONFLICT</code>), <code>COPY</code> isn't an option, and you'll need to stick with <code>INSERT</code> (unless you just want to ignore errors and you're happy with text <code>COPY</code>, in which case <a href="https://www.postgresql.org/docs/current/sql-copy.html">PostgreSQL 17 </a>has your back with <code>ON_ERROR</code>).</li><li><strong>Framework support:</strong> Ensure your preferred framework supports your chosen method; <code>COPY</code> usually requires a different API to be used and binary <code>COPY</code> may require an extension library or not be supported.</li><li><strong>Batch size limits:</strong> Watch for the 32,767-parameter limit when using parameterized <code>INSERT...VALUES</code>.</li><li><strong>Memory and disk write overheads: </strong><code>COPY</code> is designed to have the least impact on your system, writing less data to disk and not polluting shared_buffers. This is actually a big consideration! In fact, both the <code>COPY</code> methods write 62&nbsp;MB of WAL for the one million row test, while <code>INSERT</code> writes 109&nbsp;MB. 
This ~1.7x rule seems to hold across any ingest size.</li></ul></li></ol><h2 id="final-thoughts-for-developers">Final Thoughts for Developers</h2><p>When it comes to PostgreSQL batch ingestion, there is no one-size-fits-all solution. Each method offers trade-offs between performance, complexity, and flexibility:</p><ul><li><strong>For maximum raw speed</strong> over larger batches, binary <code>COPY</code> is your best bet.</li><li><strong>For flexibility and ease of use</strong> over larger batches, text <code>COPY</code> balances speed with broad support.</li><li><strong>For smaller batches or compatibility-focused workflows,</strong> prepared <code>INSERT...UNNEST</code> statements can hold their own, offering competitive speeds with maximum flexibility (but remember, if you have a heavy ingest pipeline, you risk disrupting shared_buffers, and you will be writing more to WAL).</li></ul><p>Remember, the “best” method isn’t just about ingest speed; it’s about what fits your workflow and scales with your application. Happy ingesting!</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Boosting Postgres INSERT Performance by 2x With UNNEST]]></title>
            <description><![CDATA[Read how you can double your Postgres INSERT performance using the UNNEST function.]]></description>
            <link>https://www.tigerdata.com/blog/boosting-postgres-insert-performance</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/boosting-postgres-insert-performance</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[PostgreSQL Performance]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Fri, 15 Nov 2024 17:00:33 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2024/11/Untitled-design--1--2.png">
            </media:content>
            <content:encoded><![CDATA[<p>If you Google Postgres <code>INSERT</code> performance for long enough, you’ll find some hushed mentions of using an arcane <code>UNNEST</code> function (if you squint, it looks like a <a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noreferrer">columnar</a> insert) over a series of arrays to increase performance. Any performance gains sound good to me, but what's actually going on here? </p><p>I’ve been aware of this technique for a long time (in fact, several object-relational mappers use it under the hood), but I’ve never fully understood what's happening, and any analysis I’ve seen has always left me wondering if the gains were as much about data wrangling in the programming language used as Postgres speed. This week I decided to change that and do some testing myself.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">This used to be called "Boosting Postgres INSERT Performance by 50% with UNNEST". The performance went from 2.19s to 1.03s, which is 52.97% less time, but also 113% faster.<br><br>I changed the wording in this article to 2x because I think that's always clearer (thanks /u/a3kov and /u/lobster_johnson)</div></div><h2 id="the-introduction-inserts-in-postgres">The Introduction: INSERTs in Postgres</h2><p>At Tiger Data, I work with <a href="https://timescale.ghost.io/blog/time-series-introduction/" rel="noreferrer">time-series data</a>, so I gave my analysis a time-series slant. 
I want to simulate inserting a stream of records into my database with the <code>INSERT</code> statement (yes, I know <code>COPY</code> is a thing, see the callout below), and in doing so, I want to minimize the load I create as much as possible (saving my precious CPU cycles for my real-time analytics queries).</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">If you’re aiming to load data into your database as quickly and efficiently as possible, check out the PostgreSQL <code spellcheck="false" style="white-space: pre-wrap;">COPY</code> command—it’s almost always faster than using regular <code spellcheck="false" style="white-space: pre-wrap;">INSERT</code>. We benchmarked <a href="https://www.tigerdata.com/blog/postgres-for-everything" rel="noreferrer">Postgres data</a> ingestion methods in an earlier post.<br><br>However, even though <code spellcheck="false" style="white-space: pre-wrap;">COPY</code> is faster, many developers still prefer <code spellcheck="false" style="white-space: pre-wrap;">INSERT</code> for its flexibility. <code spellcheck="false" style="white-space: pre-wrap;">INSERT</code> supports useful features like upserts (<code spellcheck="false" style="white-space: pre-wrap;">INSERT ... ON CONFLICT</code>), returning the inserted rows, and has better integration with language libraries. Plus, it can be part of a larger SQL query, giving you more control over the data insertion process.</div></div><p><br></p><p>Let’s take a closer look at the <code>INSERT</code> queries I tested using a batch size of 1,000, 5,000, and 10,000 records.</p><p>In one corner, we have the multi-record <code>INSERT</code> variant we all know and love, using a <code>VALUES</code> clause followed by a tuple per row in the batch. These queries look long but also pretty easy to understand.</p><pre><code class="language-SQL">INSERT INTO sensors (sensorid, ts, value)
VALUES 
  ($1, $2, $3), 
  ($4, $5, $6), 
   ..., 
  ($2998, $2999, $3000);</code></pre><p>In the other corner, we have our <code>UNNEST</code> variant, using a <code>SELECT</code> query that takes one array per column and uses the <code>UNNEST</code> function to convert them into rows at execution time.</p><pre><code class="language-SQL">INSERT INTO sensors (ts, sensorid, value)&nbsp;
&nbsp;&nbsp;SELECT *&nbsp;
&nbsp;&nbsp;FROM unnest(
&nbsp;&nbsp;&nbsp;&nbsp;$1::timestamptz[],&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;$2::text[],&nbsp;
&nbsp;&nbsp;&nbsp;&nbsp;$3::float8[]
);</code></pre><p>The <a href="https://www.postgresql.org/docs/9.2/functions-array.html" rel="noreferrer">Postgres documentation describes <code>UNNEST</code></a> as a function that <em>“expands multiple arrays (possibly of different data types) into a set of rows.”</em> This actually makes sense: it’s basically flattening a series of arrays into a row set, much like the one in the <code>INSERT .. VALUES</code> query. </p><p>One key difference is that where the first variant has <code>batch_size * num_columns</code> values in the query, the <code>UNNEST</code> variant only has <code>num_columns</code> arrays (each of which contains <code>batch_size</code> records when it’s flattened). This will be important later, so take note!</p><h2 id="the-setup">The Setup&nbsp;</h2><p>I ran the benchmark on a single TimescaleDB 4&nbsp;CPU/16&nbsp;GB memory instance (the spec isn't really important for this benchmark) with a very simple schema (the same table I used on the <a href="https://timescale.ghost.io/blog/skip-scan-under-load/" rel="noreferrer"><u>SkipScan performance post</u></a>).</p><pre><code class="language-sql">CREATE TABLE sensors (
    sensorid TEXT,
    ts TIMESTAMPTZ,
    value FLOAT8
);
</code></pre><p>I was hoping to use <a href="https://k6.io/" rel="noreferrer">Grafana k6</a> for all my performance articles, but in this case, it didn’t make sense. I don’t want to measure the time that application code takes to get my data into the format an <code>INSERT .. VALUES</code> or <code>INSERT .. UNNEST</code> statement needs (especially in TypeScript), I just want the time the database spends processing the statements and loading my data.</p><p>I fell back to using good old <a href="https://www.postgresql.org/docs/current/pgbench.html" rel="noreferrer">pgbench</a> for these tests with a static file for each <code>INSERT</code> variant and batch combination. As usual, you can find the files in the <a href="https://github.com/timescale/performance" rel="noreferrer">timescale/performance GitHub repo</a>.</p><p>I ran each of the following queries to insert one million records using a single thread:</p><ul><li><code>INSERT .. VALUES</code> with a batch size of 1,000</li><li><code>INSERT .. VALUES</code> with a batch size of 5,000</li><li><code>INSERT .. VALUES</code> with a batch size of 10,000</li><li><code>INSERT .. UNNEST</code> with a batch size of 1,000</li><li><code>INSERT .. UNNEST</code> with a batch size of 5,000</li><li><code>INSERT .. UNNEST</code> with a batch size of 10,000</li></ul><p>I used the <code>pg_stat_statements</code> (if you don’t know about this amazing extension, then do yourself a favor and <a href="https://timescale.ghost.io/blog/using-pg-stat-statements-to-optimize-queries/">look it up</a>!) statistics in the database to extract the <code>total_plan_time</code> and <code>total_exec_time</code> for each run.</p><h2 id="the-results-insert-values-vs-insert-unnest">The Results: INSERT VALUES vs. 
INSERT UNNEST</h2><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/11/Untitled-design--1--3-1.png" class="kg-image" alt="" loading="lazy" width="1200" height="502" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/11/Untitled-design--1--3-1.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2024/11/Untitled-design--1--3-1.png 1000w, https://timescale.ghost.io/blog/content/images/2024/11/Untitled-design--1--3-1.png 1200w" sizes="(min-width: 720px) 720px"></figure><p>The results were very clear: at the database layer, <strong><code>INSERT .. UNNEST</code> is 2.13x faster than</strong> <code>INSERT .. VALUES</code> at a batch size of 1,000! This ratio held steady regardless of batch size (and even with multiple parallel jobs).</p><ul><li><strong>The primary savings come at query planning time.</strong> With the <code>INSERT .. VALUES</code> approach, Postgres must parse and plan each value individually (remember how many there were?). In contrast, <code>INSERT .. 
UNNEST</code> processes one array per column, which reduces the planning workload by not working with individual elements at plan time.</li><li><strong>Execution time is similar between both methods.</strong> The actual execution time was slightly slower for <code>UNNEST</code>, which reflects the extra work that the <code>UNNEST</code> function needs to do.&nbsp;This was more than made up for by the planning gain.</li></ul><p>As you might expect, adding columns makes things even better for <code>UNNEST</code>: with 10 float columns (rather than one) we get a massive 5.02x speedup. So if you've got a wide schema, you're in for even more performance gains (but I wanted to leave this article at what most people could reasonably expect).</p><p>If you’d like to see the graphs for the 5,000 and 10,000 batch sizes, then check out the <a href="https://www.tigerdata.com/blog/best-postgresql-gui-popsql-joins-timescale" rel="noreferrer">PopSQL</a> dashboard.</p><p>A reasonable response to this might be, "What if we prepared the <code>INSERT .. VALUES</code> query, would that reduce planning time and make it the winner?". Some quick tests (unfortunately, <code>pg_stat_statements</code> can't track statistics for <code>EXECUTE</code> queries on prepared statements) show that this is not the case; <code>UNNEST</code> is still king.</p><h2 id="should-i-use-unnest">Should I use UNNEST?</h2><p>There’s no question that in terms of <strong>database performance</strong>, <code>INSERT .. UNNEST</code> beats <code>INSERT .. VALUES</code> for batch inserts. By minimizing planning overhead, <code>UNNEST</code> unlocks an almost magical speed boost, making it a fantastic option for scenarios where ingestion speed is critical. 
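</p><p>If you want to see the planning/execution split on your own workload, <code>EXPLAIN ANALYZE</code> reports the two timings separately (a minimal sketch against the <code>sensors</code> table above; note that it really does insert the row):</p><pre><code class="language-SQL">EXPLAIN (ANALYZE)
INSERT INTO sensors (ts, sensorid, value)
SELECT * FROM unnest(
    ARRAY['2024-11-15 17:00:00+00']::timestamptz[],
    ARRAY['sensor-a']::text[],
    ARRAY[42.0]::float8[]
);</code></pre><p>The <code>Planning Time</code> and <code>Execution Time</code> lines at the bottom of the output are the per-statement equivalents of the totals that <code>pg_stat_statements</code> aggregates.</p><p>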
One thing to keep in mind is that the overhead of your language and network latency often contribute just as much to the total time you see in your application, but still, your database will be working less, which is always a good thing.</p><p>As with any optimization, there’s a trade-off. The key consideration isn’t always just speed; it’s also <strong>usability</strong>. The <code>INSERT .. VALUES</code> syntax is intuitive and widely understood, making it easier to adopt and maintain, especially in teams or projects where SQL expertise varies. Pivoting to use <code>UNNEST</code> introduces complexity. You’ll need to wrangle your data into arrays, and if you’re using an ORM, you might discover it doesn’t support this pattern at all. If you're writing raw SQL, <code>UNNEST</code> might be less familiar to future developers inheriting your codebase.</p><p>And while <code>UNNEST</code> is fast, let’s not forget about <code>COPY</code>, which <a href="https://www.timescale.com/learn/testing-postgres-ingest-insert-vs-batch-insert-vs-copy" rel="noreferrer">remains the undisputed gold standard for ingestion</a>. If you don’t need features like upserts (<code>ON CONFLICT</code> clauses), <code>COPY</code> will get your data in faster, and with less overhead.</p><h2 id="final-thoughts-for-developers">Final Thoughts for Developers</h2><p>Think of <code>INSERT .. UNNEST</code> as a magic performance hack sitting squarely between traditional <code>INSERT .. VALUES</code> and <code>COPY</code>. It delivers significant speed improvements for batch ingestion while retaining the flexibility and composability of SQL <code>INSERT</code> statements. </p><p>At Tiger Data, we love exploring the edges of what Postgres can do and techniques like <code>INSERT .. UNNEST</code> remind us why. It’s elegant, fast, and underutilized, but hopefully no longer misunderstood. If you’re aiming to push your database to its limits, we highly recommend adding this pattern to your SQL toolkit. 
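</p><p>That flexibility is worth spelling out: unlike <code>COPY</code>, the <code>INSERT</code> forms compose with <code>ON CONFLICT</code>. A minimal sketch, assuming a hypothetical <code>sensors</code> table with a unique constraint on <code>ts</code>:</p><pre><code class="language-SQL">-- A batch upsert: COPY has no equivalent of this clause
INSERT INTO sensors (ts, value)
SELECT * FROM unnest($1::timestamptz[], $2::float8[])
ON CONFLICT (ts) DO UPDATE SET value = EXCLUDED.value;</code></pre><p>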
It’s another example of how understanding Postgres deeply can help you get the most out of your system. And if you want to optimize your PostgreSQL database for <a href="https://www.tigerdata.com/blog/time-series-introduction" rel="noreferrer">time series</a>, events, real-time analytics, or vector data, <a href="https://www.tigerdata.com/docs/self-hosted/latest/install" rel="noreferrer">take TimescaleDB out for a spin</a>.</p><p></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[PostgreSQL DISTINCT: TimescaleDB’s SkipScan Under Load]]></title>
            <description><![CDATA[We benchmarked TimescaleDB's SkipScan under load to see what effect it has on DISTINCT queries.]]></description>
            <link>https://www.tigerdata.com/blog/skip-scan-under-load</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/skip-scan-under-load</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[performance]]></category>
            <category><![CDATA[PostgreSQL Performance]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Thu, 07 Nov 2024 14:00:24 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2024/11/PostgreSQL-DISTINCT_response-times-2.png">
            </media:content>
            <content:encoded><![CDATA[<h2 id="the-introduction-distinct-queries-in-postgresql">The Introduction: DISTINCT Queries in PostgreSQL</h2><p>Let’s say you’re working with sensor data in PostgreSQL, with each reading containing a sensor ID, timestamp, and value. You want to power an application dashboard that needs to know the last known state of each sensor in your fleet. Your query might look like this:</p><pre><code class="language-SQL">SELECT DISTINCT ON (sensorid) *
FROM sensors
ORDER BY sensorid, ts DESC;</code></pre><p>The <code>DISTINCT ON</code> clause ensures only one record per sensor is selected, and because the query is ordered by descending timestamp, you’ll get the latest reading for each sensor (although you could also use a <code>WHERE</code> clause to get the latest value at another point in time). Simple enough, right?</p><p>In practice, this query pattern can be inefficient, even with proper indexing. In this post, I’ll explain why and walk through a benchmark demonstrating that TimescaleDB’s SkipScan can optimize this query by an astonishing 10,548x at p50 and 9,603x at p95.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">This post is about optimizing <code spellcheck="false" style="white-space: pre-wrap;">DISTINCT</code> queries to get the last values associated with an ID quickly. If you want to estimate the cardinality of your dataset (count the unique IDs), then check out the <a href="https://docs.timescale.com/use-timescale/latest/hyperfunctions/" rel="noreferrer">timescaledb-toolkit</a>, which gives you <a href="https://docs.timescale.com/use-timescale/latest/hyperfunctions/approx-count-distincts/hyperloglog/" rel="noreferrer">hyperloglog</a>.</div></div><h2 id="skipscan-details">SkipScan Details</h2><p>SkipScan is one of those TimescaleDB features that flies under the radar but provides impressive performance improvements—especially given it works with both Timescale’s <a href="https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/" rel="noreferrer">hypertables</a> and standard PostgreSQL tables (although not currently on compressed hypertables).</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/11/Postgres-DISTINCT-TimescaleDB-s-Skip-Scan-Under-Load_tweet.png" class="kg-image" alt="A tweet from Volkan Alkilic praising SkipScan's speed for time-series data" 
loading="lazy" width="923" height="268" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/11/Postgres-DISTINCT-TimescaleDB-s-Skip-Scan-Under-Load_tweet.png 600w, https://timescale.ghost.io/blog/content/images/2024/11/Postgres-DISTINCT-TimescaleDB-s-Skip-Scan-Under-Load_tweet.png 923w" sizes="(min-width: 720px) 720px"></figure><p>As tables and indexes grow, <a href="https://www.timescale.com/learn/understanding-distinct-in-postgresql-with-examples" rel="noreferrer"><code>DISTINCT</code> queries</a> slow down in PostgreSQL because it doesn’t natively pull unique values directly from ordered indexes. Even if you have a perfect index in place, PostgreSQL will still scan the full index, filtering out duplicates only after the fact. This approach leads to a significant slowdown as tables grow larger.</p><p>SkipScan enhances the efficiency of <code>SELECT DISTINCT ON .. ORDER BY</code> queries by allowing PostgreSQL to directly jump to each new unique value within an ordered index, skipping over intermediate rows. This approach eliminates the need to scan the entire index and then deduplicate, as SkipScan directly retrieves the next distinct value, significantly accelerating query performance. If you're after a deep dive, <a href="https://docs.timescale.com/use-timescale/latest/query-data/skipscan/">check out the docs.</a></p><p>We’ve run <a href="https://timescale.ghost.io/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/"><u>benchmarks on SkipScan</u></a> before, but this time, I wanted to see how it interacts in a more realistic environment with ingest and query running at the same time.</p><h2 id="the-setup">The Setup</h2><p>I set up two <a href="https://docs.timescale.com/#:~:text=What%20is%20Timescale%20Cloud%3F" rel="noreferrer">Timescale Cloud</a> instances with identical configurations (4 CPUs and 16&nbsp;GB of memory). 
On one instance, I disabled SkipScan (<code>SET timescaledb.skip_scan=off</code>), allowing it to default to standard PostgreSQL behavior. The other instance had SkipScan enabled to compare performance.</p><p>I created an empty test table using the following SQL (and without any TimescaleDB-specific features):</p><pre><code class="language-SQL">CREATE TABLE sensors (
  sensorid TEXT, 
  ts TIMESTAMPTZ,
  value FLOAT8);
  
CREATE UNIQUE INDEX ON sensors (sensorid, ts DESC);</code></pre><p>Using Grafana K6 (with the<a href="https://github.com/grafana/xk6-sql"> <u>xk6-sql</u></a><u> </u>extension), I ran the following test for twenty minutes:</p><ul><li><strong>Data ingest</strong>: Ingest ran at a target rate of 200K rows per second, using INSERT to ingest data from 1000 sensors, in batches of 1000, with up to 10 concurrent workers (watch this space for a deep dive into the performance of different PostgreSQL INSERT patterns coming soon).</li><li><strong>Query load</strong>: A <code>SELECT DISTINCT ON</code> query, running 10 times per second with up to 5 concurrent workers. This query pulls the latest reading for all 1000 sensors, simulating an application's needs.</li></ul><p>You'll remember the query from earlier:</p><pre><code class="language-SQL">SELECT DISTINCT ON (sensorid) *
FROM sensors
ORDER BY sensorid, ts DESC;</code></pre><p>If you’d like to recreate the benchmark, then check out the <a href="https://github.com/timescale/performance/tree/main"><u>GitHub repository</u></a> for the series.</p><h2 id="the-results-skipscan-vs-vanilla-postgresql">The Results: SkipScan vs. Vanilla PostgreSQL</h2><p>The graphs speak for themselves (please note the X axis in the query graph is a logarithmic scale), but here's a summary:</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/11/Your-paragraph-text--2-.png" class="kg-image" alt="Two line graphs, one illustrating the DISTINCT query response times at p50 and p90, and another benchmarking the data ingest performance" loading="lazy" width="2000" height="1457" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/11/Your-paragraph-text--2-.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2024/11/Your-paragraph-text--2-.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2024/11/Your-paragraph-text--2-.png 1600w, https://timescale.ghost.io/blog/content/images/2024/11/Your-paragraph-text--2-.png 2048w" sizes="(min-width: 720px) 720px"></figure><ul><li>The standard PostgreSQL server started <strong>ingesting 13&nbsp;% slower</strong> and couldn’t sustain the 200K/second goal (it only caught up once the <code>DISTINCT</code> queries stopped returning at the 14-minute mark).</li><li>SkipScan performed <strong>over 11x faster at p50 and p95</strong> right from the start.</li><li>By the 14-minute mark, SkipScan was <strong>10,548x faster at p50 and 9,603x faster at p95</strong> than standard PostgreSQL.</li><li>SkipScan maintained stable performance throughout the run, while PostgreSQL didn’t return any results after 14 minutes (RIP your dashboard). 
</li></ul><p>If you’d like to interact with the data then you can check out this <a href="https://www.tigerdata.com/blog/best-postgresql-gui-popsql-joins-timescale" rel="noreferrer"><u>PopSQL dashboard</u></a>.</p><h2 id="the-conclusion">The Conclusion</h2><p>SkipScan is a pretty remarkable feature, transforming underperforming <code>DISTINCT</code> queries into highly efficient operations. While there has been some discussion on adding it to PostgreSQL, TimescaleDB has your back today. Because SkipScan is not limited to hypertables, it benefits regular PostgreSQL tables as well, giving developers a performance boost just by adding the <a href="https://github.com/timescale/timescaledb" rel="noreferrer">TimescaleDB extension</a>.</p><p>In environments where you need fast, up-to-date insights—like the dashboard example with sensor data—SkipScan lets you keep pace without sacrificing performance. It’s one of those “small but mighty” features that often goes unnoticed but has an outsized impact on real-time analytics workloads.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What We’re Excited About PostgreSQL 17]]></title>
            <description><![CDATA[As we count the days until September’s release, here are the features we’re excited about in PostgreSQL 17.]]></description>
            <link>https://www.tigerdata.com/blog/what-were-excited-about-postgresql-17</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/what-were-excited-about-postgresql-17</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Aleksander Alekseev]]></dc:creator>
            <pubDate>Thu, 16 May 2024 12:59:30 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2024/05/What-we-re-excited-about-postgres-17--1-.png">
            </media:content>
            <content:encoded><![CDATA[<p>The next major PostgreSQL release (PostgreSQL 17) is <a href="https://www.postgresql.org/developer/roadmap/"><u>scheduled for September</u></a>.</p><figure class="kg-card kg-image-card"><img src="https://media.tenor.com/R7dseIrp4N8AAAAC/party-the-office.gif" class="kg-image" alt="" loading="lazy" width="410" height="200"></figure><p>In 2023, PostgreSQL regained the attention it deserves as a rock-solid relational database. It was voted the <a href="https://survey.stackoverflow.co/2023/?ref=timescale.com#most-popular-technologies-database-prof"><u>most popular DB in the Stack Overflow Developer Survey</u></a> and named <a href="https://db-engines.com/en/blog_post/106"><u>database management system of the year by DB-Engines</u></a>. Here at Timescale, we also consolidated our status as fierce PostgreSQL fans: besides having built Timescale on PostgreSQL, we believe PostgreSQL is evolving as a platform and becoming the <a href="https://timescale.ghost.io/blog/postgres-for-everything/"><u>bedrock for the future of data</u></a>. So, excuse us for being a <em>bit </em>excited about PostgreSQL 17.</p><p>In its latest releases, we’ve watched PostgreSQL develop toward higher performance, scalability, security, and compatibility while introducing new features to meet the evolving needs of users and applications, especially enterprise ones. The improvements to privilege administration, logical replication, and monitoring are examples of that. 
More importantly, during this time, <a href="https://timescale.ghost.io/blog/how-and-why-to-become-a-postgresql-contributor/"><u>we contributed</u></a>, <a href="https://timescale.ghost.io/blog/what-does-a-postgresql-commitfest-manager-do-and-should-you-become-one/"><u>managed commitfests</u></a>, and created new features and products to expand it—from <a href="https://timescale.ghost.io/blog/how-we-made-real-time-data-aggregation-in-postgres-faster-by-50-000/"><u>boosting real-time aggregation by 50,000&nbsp;%</u></a> to <a href="https://docs.timescale.com/ai/latest/"><u>powering production AI applications</u></a>.</p><p>In this blog post, we gathered Timescale contributors and enthusiasts to discuss a few of the most exciting PostgreSQL 17 commits. As we count the days until September, we’ll also examine PostgreSQL’s direction for this release. Finally, we’ll share some of our commits, as we help build up PostgreSQL as a <a href="https://timescale.ghost.io/blog/postgres-for-everything/"><u>versatile development platform for everything</u></a>.&nbsp;</p><h2 id="postgresql-17-where-it-came-from-and-where-it%E2%80%99s-headed">PostgreSQL 17: Where It Came From and Where It’s Headed</h2><p>Looking at the several PostgreSQL 17 commits, <a href="https://www.linkedin.com/in/afiskon/?ref=timescale.com"><u>Aleksander Alekseev</u></a>, long-time PostgreSQL contributor and Timescaler, says significant changes to modernize PostgreSQL are underway. 
“I believe the future of Postgres is bright,” he notes, adding that “new people are <a href="https://www.postgresql.org/message-id/ccbc2cfa-7711-4a52-bd8e-8746e28550a2%40joeconway.com"><u>joining the project</u></a>.” Perhaps influenced by the new wave of contributors, the changes to PostgreSQL 17 reflect the project’s commitment to embracing modern methodologies and adapting to the ever-evolving tech landscape.</p><p>One such notable change in version 17, says Aleksander, is the decision to <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0b16bb8776bb834eb1ef8204ca95dd7667ab948b"><u>drop support for AIX</u></a>, an operating system developed by IBM. AIX, while historically significant, has seen declining usage in recent years, prompting PostgreSQL to reallocate resources towards supporting more widely adopted platforms. This strategic move enables PostgreSQL to focus on enhancing compatibility with modern operating systems.&nbsp;</p><p>While they may seem more focused today, the PostgreSQL community's efforts to make PostgreSQL a solid database for modern data needs were already visible in previous versions, including the current one, PostgreSQL 16. As a specific example, Aleksander mentions the transition from Autotools to the Meson build system. Autotools, a long-standing suite of tools for configuring, building, and installing software packages, has been a stalwart in the development process of PostgreSQL.&nbsp;</p><p>However, with the advent of <a href="https://mesonbuild.com/"><u>Meson</u></a>, a contemporary build system known for its simplicity, speed, and scalability, PostgreSQL managed to streamline its development workflows. 
Meson offers advantages such as improved performance, easier maintenance, and better cross-platform compatibility, which PostgreSQL currently extends to its users.</p><h2 id="what-we%E2%80%99re-excited-about-postgresql-17">What We’re Excited About PostgreSQL 17</h2><p>Now that we’ve seen where PostgreSQL 17 is headed, let’s discuss some of the commits that have caught our 👀.</p><h3 id="pgcreatesubscriber">pg_createsubscriber</h3><p>Suggested by Timescaler and PostgreSQL contributor <a href="https://br.linkedin.com/in/fabriziomello"><u>Fabrízio de Mello</u></a>, <a href="https://www.postgresql.org/docs/devel/app-pgcreatesubscriber.html"><u>pg_createsubscriber is a new PostgreSQL 17 tool</u></a> that allows users to create a new logical replica from a physical standby server. “The main advantage of this tool over a common logical replication setup is the initial data copy, which can take longer on large databases and have side effects, like autovacuum issues, due to the long-running transaction to copy data from one server to another. This tool will also help reduce the catchup phase,” explains Fabrízio.</p><h3 id="support-for-merge-partitions-and-split-partitions">Support for MERGE PARTITIONS and SPLIT PARTITIONS</h3><p>While <code>ALTER TABLE</code> is a well-known statement that changes the structure of a PostgreSQL table, PostgreSQL 17 comes along with two new commands: <code>MERGE PARTITIONS</code> and <code>SPLIT PARTITIONS</code>. As the name indicates, these new DDL commands merge or split several partitions. “The current implementation has certain limitations though,” says Aleksander. “It works as a single process and holds the <code>ACCESS EXCLUSIVE LOCK</code> on the parent table during all operations. 
This is why the new DDL commands are not advisable for large partitioned tables under a high load,” he adds.</p><h3 id="add-support-for-incremental-file-system-backup">Add support for incremental file system backup</h3><p>“This is another feature worth mentioning,” says Aleksander. Adding support for incremental file system backup in PostgreSQL enhances the database's ability to perform efficient and effective backups. Incremental backups only save changes made since the last backup (full or incremental). This significantly reduces the volume of data to be backed up compared to full backups, which capture the entire database. And since incremental backups involve less data, the backup process is faster, minimizing the impact on system performance and reducing downtime. </p><p>Developed by Robert Haas, Jakub Wartak, and Tomas Vondra, <a href="http://rhaas.blogspot.com/2024/05/hacking-on-postgresql-is-really-hard.html"><u>this commit has been struggling with stability issues</u></a>, as explained by Robert on his blog. “Hopefully it won’t be reverted (as many other commits this month),” comments Aleksander.</p><h3 id="enable-the-failover-of-logical-slots">Enable the failover of logical slots&nbsp;</h3><p>Picked by two Timescalers, Fabrízio and our head of Developer Advocacy, <a href="https://twitter.com/jamessewell"><u>James Blackwood-Sewell</u></a>, this commit by Hou Zhijie, Shveta Malik, and Ajin Cherian lets high-availability <a href="https://www.timescale.com/learn/postgresql-database-replication-guide"><u>PostgreSQL use logical replication</u></a> and not lose downstream data in case of a failover. 
Enabling the failover of logical replication slots in PostgreSQL enhances the robustness and reliability of logical replication setups by allowing logical slots to be transferred and maintained across different database instances.</p><h3 id="allow-explain-to-report-optimizer-memory-usage">Allow EXPLAIN to report optimizer memory usage</h3><p>“This commit by Ashutosh Bapat is another good one,” notes Aleksander. Allowing the <code>EXPLAIN</code> command to report optimizer memory usage in PostgreSQL provides valuable insights into the resources consumed by the query planner and optimizer during the preparation of query execution plans. “It will allow the developer to choose the query that uses less memory,” explains Aleksander. This makes it especially helpful for those trying to fine-tune PostgreSQL’s performance.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💻</div><div class="kg-callout-text"><a href="https://www.timescale.com/learn/postgres-guides#:~:text=Overview-,Performance,-Guide%20to%20PostgreSQL"><u>If you’re struggling to improve your PostgreSQL performance, these resources will help you get the most out of your database</u></a>.</div></div><h3 id="any-on-this-list-really">Any on this list, really</h3><p><a href="https://timescale.ghost.io/blog/how-postgresql-aggregation-works-and-how-it-inspired-our-hyperfunctions-design/"><u>Bruce Momjian has always been an inspiration to us—bow tie included</u></a>—so we can safely say that any of the contributions <a href="https://momjian.us/pgsql_docs/release-17.html#RELEASE-17-SERVER"><u>on this list</u></a>, which Aleksander describes as “overall performance improvements,” make us excited about getting our hands on the new PostgreSQL version.&nbsp;</p><h2 id="what-we-committed-to-postgresql-17">What We Committed to PostgreSQL 17</h2><p>In total, 90 commits (3.5 percent of all commits) were authored, co-authored, and/or reviewed by Timescalers during the PostgreSQL 17 cycle. 
😎 We’re not going to bother you by going over all of them, but we asked our team of upstreamers to name some of their personal favorites.</p><h3 id="the-slru-move-to-64-bit-indexes">The SLRU move to 64-bit indexes</h3><p>“Personally, I’m most excited about the series of patches that moved SLRU (simple least recently used) caches to the 64-bit indexes,” says Aleksander. While we’re not there yet, this opens the path to 64-bit XIDs, which will mitigate the problem of <a href="https://timescale.ghost.io/blog/how-to-fix-transaction-id-wraparound/"><u>XID wraparound</u></a> certain users face under specific workloads, such as mixing long-living OLAP (online analytical processing) transactions and <a href="https://www.tigerdata.com/learn/understanding-oltp" rel="noreferrer">OLTP</a> (online transaction processing) workloads on the same PostgreSQL instance.&nbsp;</p><h3 id="transitive-comparisons">Transitive comparisons</h3><p>Another Timescaler who contributed to PostgreSQL was database architect <a href="https://se.linkedin.com/in/matskindahl"><u>Mats Kindahl</u></a>. Mats helped with refactoring to ensure transitive comparisons in PostgreSQL, which brings several benefits to users. Transitive comparisons allow for more concise and intuitive query expressions, improve <a href="https://www.tigerdata.com/blog/best-practices-for-query-optimization-in-postgresql" rel="noreferrer">query optimization</a>, enhance index usage, and facilitate data modeling, as developers can define relationships between entities more naturally.</p><h3 id="standardexplainonequery">standard_ExplainOneQuery</h3><p>Mats also worked on the introduction of <code>standard_ExplainOneQuery</code> in PostgreSQL 17. This addition helps ensure consistent behavior when adding explain hooks, making it easier to predict and understand the effects of explain hooks on query explanation. 
Developers can focus on implementing specific hooks without worrying about the nuances of query explanation behavior, leading to more efficient development processes and facilitating query performance tuning.</p><h3 id="uuidv7">UUIDv7</h3><p>On the reviewing front, Aleksander reviewed (along with other contributors) the <a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=794f10f6b920670cb9750b043a2b2587059d5051"><u>partial merge of UUIDv7 support</u></a> authored by Andrey Borodin. “While there are several UUIDv7 implementations available, the UUIDv7 standard is currently a draft,” explains Aleksander, adding that PostgreSQL will only support it once the standard is finalized. Once it’s fully supported by PostgreSQL, UUIDv7 will help make time-based queries more efficient.&nbsp;</p><h2 id="expanding-postgresql">Expanding PostgreSQL</h2><p>Here you have it, a reflection on the direction of PostgreSQL 17, the new updates we’re excited about, and some of the contributions we made. If, like us, you want to carry on (or start) building on PostgreSQL, give Timescale a try. Features like hypertables (automatically partitioned PostgreSQL tables), continuous aggregates (automatically refreshed materialized views), and advanced data management techniques will significantly enhance PostgreSQL's ability to manage your most demanding workloads effectively.</p><p>If you want to expand PostgreSQL’s capabilities while using the PostgreSQL you know and love, <a href="https://console.cloud.timescale.com/signup"><u>create a free Timescale account today</u></a>. </p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Amazon Aurora vs. PostgreSQL: 35% Faster Ingest, Up to 16x Faster Queries, and 78% Cheaper Storage]]></title>
            <description><![CDATA[Read about our journey benchmarking Amazon Aurora. Spoiler alert: 35% faster ingest, up to 16x faster queries, less than ½ the price, zero fuss.
]]></description>
            <link>https://www.tigerdata.com/blog/benchmarking-amazon-aurora-vs-postgresql</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/benchmarking-amazon-aurora-vs-postgresql</guid>
            <category><![CDATA[Benchmarks & Comparisons]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Amazon Aurora]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Wed, 22 Nov 2023 18:38:28 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless-_over-1.png">
            </media:content>
            <content:encoded><![CDATA[<p>At Timescale, we pride ourselves on making PostgreSQL fast. We started by extending PostgreSQL for new workloads, first for <a href="https://www.tigerdata.com/blog/time-series-introduction" rel="noreferrer">time series</a> with TimescaleDB, then with Timescale Vector, and soon in other directions (keep an 👀 out). We don’t modify PostgreSQL in any way. Our innovation comes from how we integrate with, run, and schedule databases.&nbsp;</p><p>Many users come to us from Amazon RDS. They started there, but as their database grows and their performance suffers, they come to Timescale as a high-performance alternative. To see why, just look at our <a href="https://timescale.ghost.io/blog/timescale-cloud-vs-amazon-rds-postgresql-up-to-350-times-faster-queries-44-faster-ingest-95-storage-savings-for-time-series-data/"><u>time-series benchmark,</u></a> our <a href="https://timescale.ghost.io/blog/savings-unlocked-why-we-switched-to-a-pay-for-what-you-store-database-storage-model/" rel="noreferrer"><u>usage-based storage pricing model</u></a>, and our <a href="https://timescale.ghost.io/blog/introducing-dynamic-postgresql/"><u>response to serverless</u></a>, which gives you a better way of running non-time-series PostgreSQL workloads in the cloud without any wacky abstractions.</p><p>Amazon Aurora is another popular cloud database option. Sometimes, users start using Aurora right away; other times, these users migrate from RDS to Aurora looking for performance from a faster, more scalable PostgreSQL. But is this what they find?</p><p>This article looks at what Aurora is, why you’d use it, and presents some interesting benchmark results that may surprise you. </p><h3 id="what-is-aurora-it%E2%80%99s-not-postgresql">What is Aurora? (It’s not PostgreSQL)</h3><p>Amazon Aurora is a database as a service (DBaaS) product released by AWS in 2015. 
The original selling point was a relational database engine custom-built to combine the performance and availability of high-end commercial databases (which we guess means Oracle and SQLServer) with the simplicity and cost-effectiveness of open-source databases (MySQL and PostgreSQL).&nbsp;</p><p>Originally, Amazon Aurora only supported MySQL, but PostgreSQL support was added in 2017. There have been a bunch of updates over the years, with the most important being Aurora Serverless (and then, when that fell a bit flat, Serverless v2), which aims to bring the serverless “scale to zero” model to databases.</p><p>Aurora’s key pillars have always been performance and availability. It’s marketed as being faster than RDS (“up to three times the throughput of PostgreSQL”), supporting multi-region clusters, and highly scalable. Not much is known about the internals of Aurora (it’s closed-source, after all), but we do know that compute and storage have been decoupled, resulting in a cloud-native architecture that is PostgreSQL-compatible but isn’t Postgres.&nbsp;</p><h3 id="investigating-aurora">Investigating Aurora</h3><p>There are a few ways of running Aurora for PostgreSQL, and you’ll be asked two critical questions from the <strong>Create Database </strong>screen.</p><p>First up, you need to select a cluster storage configuration:</p><ul><li>Do you want to pay slightly less for your compute and stored data with an additional charge per I/O request (Aurora Standard)?</li><li>Or, do you want to pay a small premium on compute and stored data, but I/O is included (Aurora I/O-Optimized)?</li></ul><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_cluster-storage-confirmation.png" class="kg-image" alt="Cluster storage configuration screen in Amazon Aurora" loading="lazy" width="1972" height="878" 
srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_cluster-storage-confirmation.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_cluster-storage-confirmation.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_cluster-storage-confirmation.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_cluster-storage-confirmation.png 1972w" sizes="(min-width: 720px) 720px"></figure><p>In our benchmark, we saw a 33&nbsp;% increase in CPU costs and a massive 125&nbsp;% increase in storage costs when moving from Standard to I/O-Optimized, although I/O-Optimized still came in cheaper once the I/O was factored in. AWS recommends using an I/O-Optimized instance if your I/O costs exceed 25&nbsp;% of your database costs.</p><p><br>I/O-Optimized turns out to be a billing construct: we saw roughly equivalent performance between the two storage configurations.</p><p>After you’ve chosen that, there’s another big decision coming up: do you want to enable Serverless v2?&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-config.png" class="kg-image" alt="Instance config screen in Amazon Aurora" loading="lazy" width="1972" height="956" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-config.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-config.png 1000w, 
https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-config.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-config.png 1972w" sizes="(min-width: 720px) 720px"></figure><p>Although three options are shown, there are really only two: Provisioned and Serverless. Provisioned is where you choose the instance class for your database, which comes with a fixed hourly cost. Serverless is where your prices are driven by your usage.&nbsp;</p><p>If you have quiet periods, Serverless might save you money; if you burst all the time, it might not. When you choose a Provisioned type, you get a familiar “choose your instance type” dialog; when you select Serverless, you get something new.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_capacity-range.png" class="kg-image" alt="Selecting the capacity range in Amazon Aurora" loading="lazy" width="2000" height="490" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_capacity-range.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_capacity-range.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_capacity-range.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_capacity-range.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>So, instead of choosing a CPU and memory allocation associated with an instance, you set a range of resources in ACUs (Aurora Capacity 
Unit), which your cluster will operate within.&nbsp;</p><p>So, what exactly is an ACU? That’s an excellent question, and one we still can’t entirely answer. You can see that the description states an ACU provides “2 GiB of memory and corresponding compute and networking,” but what on Earth is <em>corresponding compute and networking</em>?&nbsp;</p><p>How do you compare this to Provisioned if you have no idea how many CPUs are in an ACU? Is an ACU one CPU, half a CPU, a quarter of a CPU? We have no idea, and we can see no way to quickly find out. The opacity was frustrating during our tests. It feels obfuscated for no good reason.&nbsp;</p><p>Confusion aside, the general idea is that, at any time, Amazon Aurora will use the number of ACUs (in 0.5-ACU increments) that it needs to sustain your current workload within the range you specify. If your workload scales up and down, Serverless might be a good idea. Or is it?</p><h3 id="aurora-costs">Aurora costs</h3><p>So, why isn’t everybody using Aurora? The other axis is price, and while <a href="https://www.tigerdata.com/blog/estimate-amazon-aurora-costs" rel="noreferrer"><u>Amazon Aurora pricing is significantly harder to model than RDS</u></a>, it’s definitely more expensive, with the difference soaring as you scale out replicas or multiple regions.</p><p>But is the premium worth it? We have heard some interesting testimonials from customers telling us that they had lost confidence in Aurora. So, to draw our own conclusions, we started where any reasonable engineer would—we benchmarked.</p><h2 id="benchmarking-configuration">Benchmarking Configuration&nbsp;</h2><p>But, before we started, we had to decide what we would benchmark against. 
We ended up choosing the Serverless (v2) I/O-Optimized configuration because that’s what we tend to see people using in the wild when they talk to us about migration.</p><p>When deploying Amazon Aurora Serverless, we need to choose a range of ACUs (our mystery billing units). We wanted to compare with a Timescale 8&nbsp;CPU/32&nbsp;GB memory instance, so we selected a minimum of 8&nbsp;ACUs (16&nbsp;GB) and a maximum of 16&nbsp;ACUs (32&nbsp;GB memory). Again, this veneer over CPUs is very confusing. In a perfect world, one would hope that an ACU provides one CPU from the underlying instance type—but we just don’t know.</p><p>We used the <a href="https://github.com/timescale/tsbs"><u>Time Series Benchmark Suite </u></a>(TSBS) to compare Amazon Aurora for PostgreSQL because we wanted to benchmark for a specific workload type (in this case, time series) to see how the generic Aurora compared to PostgreSQL that has been extended for a particular workload (and also because we ❤️ time series).&nbsp;</p><p><em><strong>Note:</strong> Many types of workloads are actually time series, more than you would think. This doesn’t only apply to the more traditional time-series use cases (e.g., finance) but also to workloads like energy metrics, sensor data, website events, </em><a href="https://www.timescale.com/learn/types-of-data-supported-by-postgresql-and-timescale/#:~:text=If%20you%20want%20to%20know,dealing%20with%20time%2Dseries%20data."><em><u>and others</u></em></a><em><u>.</u></em></p><p>We used the following TSBS configuration across all runs (for more info about how we run TSBS, you can see our <a href="https://timescale.ghost.io/blog/timescale-cloud-vs-amazon-rds-postgresql-up-to-350-times-faster-queries-44-faster-ingest-95-storage-savings-for-time-series-data/"><u>RDS Benchmark</u></a>):</p>
<!--kg-card-begin: html-->
<table><thead><tr><th></th><th>Timescale</th><th>Amazon Aurora Serverless for PostgreSQL</th></tr></thead><tbody>
<tr><td><strong>PostgreSQL version</strong></td><td>15.4</td><td>15.3 (latest available)</td></tr>
<tr><td><strong>PostgreSQL configuration</strong></td><td>No changes</td><td><code>synchronous_commit=off</code> (to match Timescale)</td></tr>
<tr><td><strong>Partitioning system</strong></td><td>TimescaleDB (<a href="https://www.timescale.com/learn/is-postgres-partitioning-really-that-hard-introducing-hypertables">partitions configured transparently at ingest time</a>)</td><td>pg_partman (partitions manually configured ahead of time)</td></tr>
<tr><td><strong>Compression into columnar</strong></td><td>Yes</td><td>Not supported</td></tr>
<tr><td><strong>Partition size</strong></td><td colspan="2">4h (each system ended up with 26 non-default partitions)</td></tr>
<tr><td><strong>Scale (number of devices)</strong></td><td colspan="2">25,000</td></tr>
<tr><td><strong>Ingest workers</strong></td><td colspan="2">16</td></tr>
<tr><td><strong>Rows ingested</strong></td><td colspan="2">868,000,000</td></tr>
<tr><td><strong>TSBS profile</strong></td><td colspan="2">DevOps</td></tr>
<tr><td><strong>CPU / Memory</strong></td><td>8 vCPU / 32 GB memory</td><td>8-16 ACUs (see below for more details)</td></tr>
<tr><td><strong>Volume size</strong></td><td>Dynamic</td><td>Dynamic</td></tr>
<tr><td><strong>Disk type</strong></td><td colspan="2">Default provisioned IOPS (no changes)</td></tr>
</tbody></table>
<!--kg-card-end: html-->
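<p>If you want to reproduce something close to this setup, the configuration above maps roughly onto a TSBS invocation like the following sketch. Flag names are taken from the TSBS README and may differ between versions; the host, credentials, and time range are placeholders, not the values we used:</p><pre><code class="language-shell"># Sketch only: flags per the TSBS README; host/credentials/time range
# are placeholders. This is the Timescale side; for Aurora you would
# point an equivalent loader at the Aurora endpoint instead.

# 1. Generate the DevOps dataset for 25,000 simulated hosts,
#    one reading per host every 10 seconds.
tsbs_generate_data \
  --use-case="devops" \
  --scale=25000 \
  --timestamp-start="2023-01-01T00:00:00Z" \
  --timestamp-end="2023-01-05T00:00:00Z" \
  --log-interval="10s" \
  --format="timescaledb" | gzip > /tmp/devops-data.gz

# 2. Load it with 16 parallel workers into 4-hour partitions (chunks).
cat /tmp/devops-data.gz | gunzip | tsbs_load_timescaledb \
  --host="your-db-host" --user="postgres" --pass="your-password" \
  --workers=16 \
  --chunk-time=4h</code></pre><p>The <code>--scale</code>, <code>--workers</code>, and <code>--chunk-time</code> values mirror the table above; everything else is illustrative.</p>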
<h2 id="aurora-vs-postgresql-ingest-performance-comparison">Aurora vs. PostgreSQL Ingest Performance Comparison</h2><p>We weren’t expecting Timescale to come out ahead when it came to ingesting data (we know the gap between us and PostgreSQL for ingest has been narrowing as PostgreSQL native partitioning gets better). Because Aurora separates the compute and storage layers, we thought we would see some engineered gains there. </p><p>What we actually saw when we ran the benchmark—ingesting almost one billion rows—was Timescale, with 8&nbsp;CPUs, ingesting 35&nbsp;% faster than Aurora. Aurora was scaled up to 16 ACUs for the entire benchmark run (including the queries in the next section). So not only was Timescale 35&nbsp;% faster, it was 35&nbsp;% faster with 50&nbsp;% of the CPU resources (assuming 1 CPU == 1 ACU).</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_ingest-speed.png" class="kg-image" alt="Ingest speed (average rows per second)" loading="lazy" width="2000" height="1222" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_ingest-speed.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_ingest-speed.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_ingest-speed.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_ingest-speed.png 2000w" sizes="(min-width: 720px) 720px"></figure><p>At this stage, some of you might be wondering why Timescale jumped in ingest speed around the 30-minute mark. 
The jump happened when the platform dynamically adapted the I/O on the instance as we saw data flooding in (thanks to our amazing <a href="https://timescale.ghost.io/blog/savings-unlocked-why-we-switched-to-a-pay-for-what-you-store-database-storage-model/" rel="noreferrer">Usage Based Storage</a> implementation).</p><h2 id="aurora-vs-postgresql-query-performance-comparison">Aurora vs. PostgreSQL Query Performance Comparison</h2><p>Query performance matters with a demanding workload because your application often needs a response in real or near real-time. While the details of the TSBS query types are basically indecipherable (<a href="https://github.com/timescale/tsbs?ref=timescale.com#appendix-i-query-types-"><u>here’s a cheat sheet</u></a>), they model some common (although quite complex) time-series patterns that an application might use. Each query was run 10 times, and the average value was compared for each of our target systems.</p><p>The results here tell another very interesting story, with Timescale winning in most query categories—we were between 1.15x and 16x faster, with two queries being slightly slower. When we did a one-off test with a Timescale instance with 16&nbsp;CPUs, some queries stretched out to 81x faster, with all categories being won by Timescale.</p><p>Why is this? Timescale is optimizing for the workload by teaching the planner how to handle these analytical queries and also using our native compression—which flips the row-based PostgreSQL data into a <a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noreferrer">columnar</a> format and speeds up analysis. For more information about how our technology works and how it can help you, check out our <a href="https://timescale.ghost.io/blog/timescale-cloud-vs-amazon-rds-postgresql-up-to-350-times-faster-queries-44-faster-ingest-95-storage-savings-for-time-series-data/"><u>Timescale vs. 
Amazon RDS benchmark blog post</u></a>.&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-we-learned-from-benchmarling-amazon-aurora-median-query-timings.png" class="kg-image" alt="Median query timings" loading="lazy" width="1464" height="1824" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-we-learned-from-benchmarling-amazon-aurora-median-query-timings.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-we-learned-from-benchmarling-amazon-aurora-median-query-timings.png 1000w, https://timescale.ghost.io/blog/content/images/2023/11/What-we-learned-from-benchmarling-amazon-aurora-median-query-timings.png 1464w" sizes="(min-width: 720px) 720px"></figure><h2 id="aurora-vs-postgresql-data-size-comparison">Aurora vs. PostgreSQL Data Size Comparison</h2><p>What about the total size of the CPU table at the end of the benchmark? There were no surprises here. Amazon Aurora (even though it’s using a different storage backend to PostgreSQL) doesn’t seem to change the total table size, with it coming in at 159&nbsp;GB (the same as RDS did). 
In contrast, Timescale compresses the time-series data by 95&nbsp;% to 8.6&nbsp;GB.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_compression-ratio.png" class="kg-image" alt="Total database size" loading="lazy" width="2000" height="824" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_compression-ratio.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_compression-ratio.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_compression-ratio.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_compression-ratio.png 2000w" sizes="(min-width: 720px) 720px"></figure><h2 id="aurora-vs-postgresql-cost-comparison">Aurora vs. PostgreSQL Cost Comparison</h2><p>There is no way to sugarcoat it: Amazon Aurora Serverless is expensive. While we were benchmarking, it used 16&nbsp;ACUs constantly. First, we tried the standard Serverless product, but it charged a prohibitive amount for I/O, which is why we don’t see anyone using it for anything even remotely resembling an always-on workload. It defeats the purpose of serverless if you can’t actually ingest or query data without breaking the bank.</p><p>So, we switched to the Serverless v2 I/O-Optimized pricing, which charges a small premium on compute and storage costs and zero rates on all I/O charges. It’s supposed to help with pricing for a workload like the one we’re simulating.&nbsp;</p><p>Let’s see how Aurora I/O-Optimized really did. The bill for running this benchmark has two main components: compute and storage costs. 
(Although Aurora actually charges for some other facets, the costs were low in this case). These are the results:&nbsp;</p><p><strong>Compute costs:</strong></p><ul><li>Aurora Serverless v2 I/O-Optimized costs $2.56 per hour for the 16 ACUs, which were used for the duration of the benchmark.</li><li>The Timescale 8vCPU instance costs $1.26 per hour <strong>(52&nbsp;% cheaper than Aurora).</strong><br></li></ul><p><strong>Storage costs:</strong></p><ul><li>Aurora Serverless v2 I/O-Optimized needed 159&nbsp;GB of storage for the CPU table and indexes, which would be billed at $34 per month.&nbsp;</li><li>Timescale needed 8.6&nbsp;GB to store the CPU table and indexes, which would be billed at $7.60 per month (<strong>78&nbsp;% cheaper than Aurora</strong>).</li></ul><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-costs.png" class="kg-image" alt="Instance costs comparison" loading="lazy" width="1870" height="1108" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-costs.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-costs.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-costs.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_instance-costs.png 1870w" sizes="(min-width: 720px) 720px"></figure><p><strong>Timescale is 52&nbsp;% cheaper to run the machines used for the benchmark (assuming a constant workload) and 78&nbsp;% cheaper to store the data created by the benchmark.&nbsp;</strong><br></p><h2 id="our-finding">Our Finding</h2><p>The main 
takeaway from this benchmark was that, although Aurora Serverless is commonly used to “scale PostgreSQL” for large workloads, when compared to Timescale, it fell (very) short of doing this.</p><p>Timescale was:</p><ul><li>35&nbsp;% faster to ingest</li><li>1.15x-16x faster to query in all but two query categories</li><li>95&nbsp;% more efficient at storing data</li><li>52&nbsp;% cheaper per hour for compute</li><li>78&nbsp;% cheaper per month to store the data created</li></ul><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_summary-3.png" class="kg-image" alt="A summary of the benchmark results: Timescale vs. Amazon Aurora Serverless" loading="lazy" width="1952" height="1110" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_summary-3.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_summary-3.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_summary-3.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/What-We-Learned-From-Benchmarking-Amazon-Aurora-PostgreSQL-Serverless_summary-3.png 1952w" sizes="(min-width: 720px) 720px"></figure><p>While Aurora does replace PostgreSQL’s storage backend with newer (closed-source ☹️) technology, our investigation shows that Timescale beats it for large workloads in all dimensions.&nbsp;</p><p>Looking at this data, people might conclude that “Aurora isn’t for time-series workloads” or “of course a time-series database beats Aurora (a PostgreSQL database) for a time-series workload.” Both of those statements are true, but we would like to leave you with three thoughts:</p><ol><li>Timescale is PostgreSQL—in fact, 
it’s more PostgreSQL than Amazon Aurora.&nbsp;</li><li>Timescale is tuned for time-series workloads, but that doesn’t mean it’s not also great for general-purpose workloads.</li><li>A very high proportion of the “large tables” or “large datasets” that give PostgreSQL problems (and might cause people to look at Aurora) are organized by timestamp or an incrementing primary key (perhaps bigint)—both of which Timescale is optimized for, regardless of whether you call your data time-series or not.</li></ol><p><a href="https://console.cloud.timescale.com/signup?ref=timescale.com"><u>Create a free Timescale account</u></a> to get started with Timescale today.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Teaching Postgres New Tricks: SIMD Vectorization for Faster Analytical Queries]]></title>
            <description><![CDATA[Read how we supercharged Postgres with vectorization and Single Instruction, Multiple Data (SIMD) to set your analytical queries on fire.]]></description>
            <link>https://www.tigerdata.com/blog/teaching-postgres-new-tricks-simd-vectorization-for-faster-analytical-queries</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/teaching-postgres-new-tricks-simd-vectorization-for-faster-analytical-queries</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Announcements & Releases]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Wed, 15 Nov 2023 13:28:58 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_cover.png">
            </media:content>
            <content:encoded><![CDATA[<p>After more than a year in the works, we’re proud to announce that the latest release of TimescaleDB (TimescaleDB 2.12) has added a vectorized query pipeline that makes Single Instruction, Multiple Data (SIMD) vectorization on our hybrid row columnar storage a reality for PostgreSQL. Our goal is to make common analytics queries an order of magnitude faster, making the <a href="https://survey.stackoverflow.co/2023/#section-most-popular-technologies-databases"><u>world’s most loved database</u></a> even better.</p><p>We’ve already built a mechanism for <a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/"><u>transforming your PostgreSQL tables into hybrid row columnar stores</u></a> with our native columnar compression. When you compress data you get the immediate benefit of significantly reducing storage size, and you get the secondary benefit of spending less CPU time waiting for disk reads. But there is another avenue for optimization that comes from columnar storage, and we are now focused on unlocking its potential to set analytical queries on fire.</p><p>(Here's a sneak preview so you can see what we're talking about.)</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_summarized-table.png" class="kg-image" alt="" loading="lazy" width="2000" height="448" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_summarized-table.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_summarized-table.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_summarized-table.png 1600w, 
https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_summarized-table.png 2400w" sizes="(min-width: 720px) 720px"></figure><p>If you thought you had to turn to a specialized “analytics” columnar database to serve your queries, think again. In this article, we walk you through how we’ve supercharged PostgreSQL with vectorization, or to be more precise, implemented a vectorized query execution pipeline that lets us transparently unlock the power of SIMD, so you can start on Postgres, scale with Postgres, and stay with Postgres—even for your analytical workloads.&nbsp;</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">This work began shipping in TimescaleDB 2.12, which is available today, and continues in TimescaleDB 2.13, which will ship in November to all time-series services in the Timescale platform. <a href="https://console.cloud.timescale.com/signup"><u>Create an account here and try it out for 30 days.</u></a></div></div><h2 id="from-postgres-scaling-issues-to-vectorization">From <a href="https://www.tigerdata.com/learn/guide-to-postgresql-scaling" rel="noreferrer">Postgres Scaling</a> Issues to Vectorization</h2><p>The decision to implement vectorized query execution in TimescaleDB comes from a long line of initiatives aimed at improving PostgreSQL’s experience and scalability. Before we get into the technical details, let’s start by discussing where developers reach the limits of Postgres and how vectorization can help.</p><p>You love Postgres (doesn’t everyone?) and chose it to power your new application because using a rock-solid, widely-used database with an incredibly diverse ecosystem that supports full SQL just makes sense. </p><p>Things are going really well: development is easy, and the application launches.
<a href="https://www.timescale.com/learn/types-of-data-supported-by-postgresql-and-timescale"><u>You might be working with IoT devices, sensors, event data, or financial instruments</u></a>—but whatever the use case, as time moves on, data starts piling up. All of a sudden, some of the queries that power your application mysteriously begin to get slower. Panic starts to settle in. 😱</p><p>Fast forward a few weeks or months, and something is off. You’re spending money on adding additional resources to the database and burning precious developer time trying to work out what’s broken. It doesn’t feel like anything is wrong on the application side, and tuning PostgreSQL hasn’t helped. Before you know it, someone has proposed splitting part of the workload into a different (perhaps “purpose-built”) database.&nbsp;</p><p>Complexity and tech debt rocket as the size of your tech stack balloons, your team has to learn a new database (which comes with its own set of challenges), and your application now has to deal with data from multiple siloed systems.&nbsp;</p><h2 id="teaching-postgres-new-tricks-to-make-this-journey-smoother">Teaching Postgres new tricks to make this journey smoother&nbsp;&nbsp;</h2><p>This is the painful end-state that Timescale wants to help avoid, allowing developers to scale and stay with PostgreSQL. 
Over the years, TimescaleDB has made PostgreSQL better with many features to help you scale smoothly, like<a href="https://www.timescale.com/learn/is-postgres-partitioning-really-that-hard-introducing-hypertables"><u> hypertables with automatic partitioning</u></a>, <a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noreferrer"><u>native columnar compression</u></a>, <a href="https://www.timescale.com/learn/real-time-analytics-in-postgres"><u>improved materialized views</u></a>, <a href="https://timescale.ghost.io/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/"><u>query planner improvements</u></a>, and much more. If it holds you back in PostgreSQL, we want to tackle it.</p><p>Which brings us to today’s announcement…&nbsp;</p><p>For the past year we’ve been investigating how to extend PostgreSQL to unlock techniques used by specialized analytics databases custom-built for OnLine Analytical Processing (OLAP), even while retaining ACID transactions, full support for mutable data, and compatibility with the rest of the wonderful ecosystem. We don’t have the luxury of building a database from the ground up for raw performance (with all the trade-offs that typically entails), but we think where we have ended up offers a unique balance of performance, flexibility, and stability.</p><p>You <em>can</em> teach an old elephant new tricks and sometimes get an order of magnitude speedup when you do!</p><h2 id="columnar-storage-in-postgresql">Columnar Storage in PostgreSQL</h2><p>Before we launch into the vectorization and SIMD deep dive, we need to set the scene by explaining the other feature which makes it possible, our <a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/"><u>compressed columnar storage</u></a>.</p><p>By default, PostgreSQL stores and processes data in a way that is optimized for operating on data record by record (or row) as it’s inserted. 
The on-disk data files are organized by row, and queries use a row-based iterator to process that data. Paired with a B-tree index, a row-based layout is great for transactional workloads, which are more concerned with quickly ingesting and operating on individual records.&nbsp;</p><p>Databases that optimize for raw analytical performance take the opposite approach to PostgreSQL—they make some architectural trade-offs to organize writes with multiple values from one column grouped on disk. When a read happens, a column-based iterator is used, which means only the columns that are needed are read.&nbsp;</p><p>Column-organized, or columnar, storage performs poorly when an individual record is targeted or when all columns are requested, but amazingly well for the aggregate or single-column queries that are common in analytics or used for powering dashboards.</p><p>To clarify things, the following diagram shows how a row store and a column store would logically lay out data from devices measuring temperature.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_data-format-diagram.png" class="kg-image" alt="How a row store and a column store would logically lay out data from devices measuring temperature" loading="lazy" width="1998" height="817" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_data-format-diagram.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_data-format-diagram.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_data-format-diagram.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_data-format-diagram.png 1998w" sizes="(min-width: 720px) 720px"></figure><h2 
id="row-vs-columnar-storage-why-not-both">Row vs. Columnar Storage: Why Not Both?&nbsp;</h2><p>Traditionally, you had to choose between a database that supported a row-based format optimized for transactional workloads or one that supported a column-based format targeted towards analytical ones. But, what we saw over and over again with our customers is that, with the same dataset, they actually wanted to be able to perform transactional-style operations on recent data and analytical operations on historical data.</p><p>Timescale is built on Postgres, so we can store data using Postgres’ native row format effortlessly. We have also built out the ability to organize data by columns through our native columnar compression (check out this <a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/"><u>recent deep dive</u></a> into the technical details). You can keep recent data in a row format and convert it to columnar format as it ages. </p><p>Both formats can be queried together seamlessly, the conversion is handled automatically in the background, and we can still support transactions and modifications on our older data (albeit less performantly).</p><p>When you’re working with columnar data, the benefit for analytical queries is immense, with some aggregate queries over columnar storage coming in <strong>5x</strong>, <strong>10x</strong>, and in some cases, even up to <strong>166x faster</strong> (due to lower I/O requirements and metadata caching) compared to row-based storage, as well as taking <strong>95&nbsp;% less space to store </strong>(due to our columnar compression) when tested using the <a href="https://github.com/timescale/tsbs"><u>Time-Series Benchmark Suite</u></a>.</p><p>But can we make this faster? 
Read on!</p><h2 id="vectorization-and-simd%E2%80%94oh-my">Vectorization and SIMD—Oh My!</h2><p>Now that we have data in a columnar format, we have a new world of optimization to explore, starting with vectorization and SIMD. Current CPUs are amazing feats of engineering, supporting SIMD instruction sets that can process multiple data points with a single instruction, both working faster and giving much better memory and cache locality.&nbsp; (The exact number they can process depends on the register size of the CPU and the data size; with a 128-bit register, each vector could hold 4 x 32-bit values, resulting in a theoretical 4x speedup.)</p><p>A regular (or scalar) CPU instruction receives two values and performs an operation on them, returning a single result. A vectorized SIMD CPU instruction processes two same-sized vectors (a.k.a. arrays) of values simultaneously, executing the same operation across both vectors to create an output vector in a single step. The magic is that the SIMD instruction takes the same amount of time as its scalar equivalent, even though it’s doing more work.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_scalar-vs-vectorized-diagram.png" class="kg-image" alt="Scalar vs. 
vectorized CPU instruction" loading="lazy" width="1427" height="762" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_scalar-vs-vectorized-diagram.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_scalar-vs-vectorized-diagram.png 1000w, https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_scalar-vs-vectorized-diagram.png 1427w" sizes="(min-width: 720px) 720px"></figure><p>Implementing vectorized query execution on top of our compressed columnar storage has been a significant focus for Timescale over the last year. It quickly became evident that implementing a vectorized query pipeline is one of the most exciting areas for optimization we can tackle—with performance increases by an order of magnitude on the table.</p><h2 id="timescale%E2%80%99s-vectorized-query-execution-pipeline">Timescale’s Vectorized Query Execution Pipeline</h2><p>As of version 2.12, TimescaleDB supports a growing number of vectorized operations over compressed data, with many more coming in 2.13 and beyond. When we were starting, one of the biggest challenges was integrating the built-in PostgreSQL operators, which process data in row-based tuples, with our new vectorized pipeline, which would be triggered as the batch was decompressed and complete when the batch was aggregated.</p><p>This becomes very clear when we look at an aggregate query. 
For us to vectorize aggregation, we need to have that as part of our vectorization pipeline (and not at a higher level where PostgreSQL would normally handle it).&nbsp;</p><p>However, because a single query could be returning data from an uncompressed and a compressed chunk (our abstraction which partitions tables) at the same time, we also need to return the same type of data in both cases (even though no vectorization would take place for the uncompressed data). We did this by changing both plans' output to PostgreSQL <a href="https://www.postgresql.org/docs/current/parallel-plans.html#PARALLEL-AGGREGATION"><u>Partial Aggregate nodes</u></a> (which were actually developed for parallel aggregation) rather than raw tuples. PostgreSQL already knows how to deal with partial aggregates, so this gives us a common interface to work with that allows early aggregation.</p><p>The following diagram contains a query plan for an aggregation query and shows how an uncompressed chunk, a compressed chunk with vectorization disabled, and a compressed chunk with vectorization enabled all flow up to the same PostgreSQL Append node.&nbsp;</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_finalize-aggregate-diagram.png" class="kg-image" alt="A query plan for an aggregation query showing how an uncompressed chunk, a compressed chunk with vectorization disabled, and a compressed chunk with vectorization enabled all flow up to the same PostgreSQL Append node" loading="lazy" width="1649" height="1147" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_finalize-aggregate-diagram.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_finalize-aggregate-diagram.png 1000w, 
https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_finalize-aggregate-diagram.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_finalize-aggregate-diagram.png 1649w" sizes="(min-width: 720px) 720px"></figure><p>But doing this had an amazing side effect: we could now do early aggregation for uncompressed chunks! In fact, when we committed this in TimescaleDB 2.12, we saw a consistent 10-15&nbsp;% speedup across all aggregation queries that operated on hypertables, even before we got to implementing vectorization (interestingly, a large part of this improvement comes from working with smaller datasets when aggregating, for example, smaller hash tables).</p><p>Now that we could keep PostgreSQL happy when aggregating by using Partial Aggregates, we turned our attention to the start of the pipeline. We knew that we needed to convert the compressed representation into an in-memory format, which each of our vectorization stages could use. </p><p>We chose to update our decompression node to read compressed data and output data in the Apache Arrow format, allowing us to quickly and transparently perform SIMD operations at each stage of the execution pipeline.</p><h2 id="vectorization-stages">Vectorization Stages</h2><p>So, now that we have a vectorization pipeline, we need to find operations that can be vectorized to benefit from SIMD. Let’s start with an example: consider a typical dashboard query that shows some average metrics on a table with all data older than one hour compressed:</p><pre><code class="language-SQL">SELECT time_bucket(INTERVAL '5 minute', timestamp) as bucket,
       metric, sum(value)
FROM metrics
WHERE metric = 1 AND
      timestamp &gt; now() - INTERVAL '1 day' 
GROUP BY bucket, metric;
</code></pre>
<p>Among the many things this query does are four crucial, computationally expensive stages that can be vectorized to use SIMD:</p><ol><li>Decompressing the compressed data into the in-memory Apache Arrow format</li><li>Checking two filters in the WHERE clause, one for metric and one for time</li><li>Computing one expression using the time_bucket function</li><li>Performing aggregation using the SUM aggregate function</li></ol><p>All of these stages benefit from vectorization in a slightly different way; let’s dig into each of them.</p><h3 id="vectorized-decompression">Vectorized decompression</h3><p>We know that compression is a good thing, and when we decompress data, the CPU overhead incurred is almost always offset by the I/O savings from reading a smaller amount of data from disk. But what if we could use our CPU more efficiently to decompress data faster? In TimescaleDB 2.12, we answered that question with a <strong>3x decompression speedup</strong> when using SIMD over vectorized batches where the algorithms support it.</p><p>While we raised our decompression throughput ceiling to 1&nbsp;GB/second/core, there is more work to be done. Some parts of our modified Gorilla compression algorithm for floating-point values, as well as some custom algorithms for compressing small and repeated numbers (see <a href="https://timescale.ghost.io/blog/time-series-compression-algorithms-explained/"><u>this blog post for more algorithm details</u></a>), don’t allow full use of SIMD because of the way they lay out compressed data, with internal references or complex flow control blocking us from unlocking more performance.&nbsp;&nbsp;</p><p>Looking to the future, we have identified some new algorithms designed with SIMD in mind, which can go an order of magnitude faster, so watch this space. 
👀</p><p>On top of the speed benefits, vectorized decompression is where we convert our on-disk compression format into our in-memory Apache Arrow format that the rest of our vectorization pipeline consumes.</p><h3 id="vectorized-filters">Vectorized filters</h3><p>The next stage of query processing is applying compute-time filters from WHERE clauses. In an ideal analytical query, most of the data that doesn't match the query filters is not even read from storage. Unneeded columns are skipped, metadata is consulted to exclude entire columnar batches, and conditions are satisfied using indexes.&nbsp;</p><p>However, the real world is not ideal, and for many queries, not all conditions can be optimized like this. For example, when a filter (e.g., a WHERE clause on a time range) <em>partially</em> overlaps a compressed batch, then some of the batch (but not all of it) has to be used to calculate the result.</p><p>In this case, vectorized filters can provide another large performance boost. As Apache Arrow vectors stream out of our decompression node, we can use SIMD to check each filter condition very efficiently by comparing the stream to a vector of constants. Using the example from above (namely, <code>WHERE metric = 1 AND timestamp &gt; now() - INTERVAL '1 day'</code>), we would compare the metric column against the value 1 and then also compare vectors of the timestamp column against <code>now() - INTERVAL '1 day'</code>.</p><p>This optimization should be released in TimescaleDB 2.13, with early benchmark results against some real-world data showing up to a <strong>50&nbsp;% speedup on common queries</strong>.</p><p>But that’s not all vectorized filters can provide!
Previously, even for compressed queries, all data was read from disk before filters were applied (a hold-over from the read-the-whole-row behavior that PostgreSQL employs by default).&nbsp;</p><p>Now that we are living in a columnar world, we can optimize this using a technique called “lazy column reads,” which reads only the columns a batch requires, in the order they appear in the WHERE clause. If a filter fails, the batch is discarded with no further I/O incurred. For queries with filters that remove a large number of full batches of records, this can result in an additional <strong>25&nbsp;% – 50&nbsp;% speedup</strong>.</p><h3 id="vectorized-expressions">Vectorized expressions</h3><p>Another important part of vectorized query pipelines is computing various expressions (projections) of columns that might be present in the query. In the simplest cases, this allows the use of common vector CPU instructions for addition or multiplication, increasing throughput.&nbsp;</p><p>More complex operations can benefit from handcrafted SIMD code, such as converting the string case, validating UTF-8, or even parsing JSON. More importantly, in some cases, the vectorized computation of expressions is a prerequisite for vectorizing the subsequent stages of the pipeline, such as grouping. For example, in the dashboard query we presented at the beginning of this section, we considered the grouping to be on <code>time_bucket</code>, so the result of this function must have a columnar in-memory representation to allow us to vectorize the grouping itself.</p><p>We haven’t started vectorizing expressions yet, because aggregations will have a more immediate impact on analytics queries—but fear not, we will get to them!</p><h3 id="vectorized-aggregation">Vectorized aggregation</h3><p>Finally, the computation of most aggregate functions can also be vectorized to take advantage of SIMD.
To demonstrate that this can work inside PostgreSQL as a partial aggregate, we built a high-throughput summation function that uses SIMD when working on columnar/compressed data, targeting the basic use case of <code>SELECT sum(value) FROM readings_compressed</code> (we can currently support filters on the <a href="https://docs.timescale.com/use-timescale/latest/compression/about-compression/#segment-by-columns"><strong><u>segment_by column</u></strong></a>). Without further optimization, we saw a <strong>3x speedup on compressed data</strong>. </p><p>Obviously, SUM is only one of the large set of aggregate functions that PostgreSQL provides (and TimescaleDB extends with <a href="https://docs.timescale.com/api/latest/hyperfunctions/"><u>hyperfunctions</u></a>). So, in forthcoming versions, we will optimize our approach to aggregates and deliver vectorized functions with the eventual goal of supporting the full set of built-in and hyperfunction aggregates.</p><h2 id="adding-it-all-up">Adding It All Up</h2><p>We’ve been showing you a lot of speedups, but how do they stack up in the real world?&nbsp;</p><p>We ran two simple queries, one which uses the vectorized SUM aggregate, and one which makes use of vectorized filters (unfortunately these can’t be combined at the moment). Both the queries were run on the same data (about 30 million rows) four times to show the gains from row-based, columnar (without a segment_by in this case), vectorized decompression, and then finally adding the last vectorized stage (aggregation or filter depending on the query).</p><p>We think the numbers can speak for themselves here 🔥.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_query-table-1.png" class="kg-image" alt="Vectorization made the SUM aggregate query up to 5.8x faster, and a SELECT count(*) query up to 4x faster!" 
loading="lazy" width="2000" height="1625" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_query-table-1.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_query-table-1.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_query-table-1.png 1600w, https://timescale.ghost.io/blog/content/images/2023/11/SIMD-Vectorization-for-Faster-Analytical-Queries_query-table-1.png 2400w" sizes="(min-width: 720px) 720px"></figure><h2 id="wrap-up">Wrap-Up</h2><p>Nothing gets us more excited at Timescale than finding smart solutions to hard problems which let people get more out of PostgreSQL. Since the first code for our vectorization pipeline hit Git, our internal Slack channels have been full of developer discussion about the optimizations and possibilities that vectorization on top of our columnar compression unlocks.</p><p>Looking forward, we are projecting that we can get even orders-of-magnitude performance improvements on some queries, and we’ve only started scratching the surface of what’s possible.&nbsp;</p><p>It’s an amazing time to be using Postgres. </p><p><a href="https://console.cloud.timescale.com/signup?ref=timescale.com"><u>Create a free Timescale account</u></a> to get started quickly with vectorization in TimescaleDB today.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Pg_partman vs. Hypertables for Postgres Partitioning]]></title>
            <description><![CDATA[Are you trying to streamline your data partitioning? Check out this head-to-head comparison of pg_partman and Timescale’s hypertables.]]></description>
            <link>https://www.tigerdata.com/blog/pg_partman-vs-hypertables-for-postgres-partitioning</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/pg_partman-vs-hypertables-for-postgres-partitioning</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Engineering]]></category>
            <category><![CDATA[Benchmarks & Comparisons]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Wed, 13 Sep 2023 14:26:04 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/09/pg_partman-vs-hypertables-for-postgres-partitioning.png">
            </media:content>
            <content:encoded><![CDATA[<p>You all know the feeling: You’ve got one big table in your database, and it’s getting slower and slower. Your app gets bottlenecked; user experience takes a dive. These aren’t happy times.<br></p><p>When you have data streaming into PostgreSQL constantly, sooner or later you end up with these big, slow tables. Luckily, the PostgreSQL ecosystem offers a range of <a href="https://www.timescale.com/learn/database-partitioning-what-it-is-and-why-it-matters">partitioning</a> techniques to optimize the performance and maintenance of these datasets. Among these partitioning methodologies, there are two that stand out as the most popular: Timescale's <a href="https://docs.timescale.com/use-timescale/latest/hypertables/">hypertables</a> (optimized for time-based/range partitioning) and the <a href="https://github.com/pgpartman/pg_partman">pg_partman</a> extension. <br></p><p>While both approaches aim to simplify partitioning, this article explores why we believe Timescale's hypertables present compelling advantages over pg_partman. <br></p><p>📑 <a href="https://timescale.ghost.io/blog/when-to-consider-postgres-partitioning/">Check out our previous article on when to consider partitioning—if you haven’t already.</a></p><h2 id="partitioning-strategies-in-postgresql-quick-overview">Partitioning Strategies in PostgreSQL: Quick Overview</h2><p>A partitioned table comprises many non-overlapping partitions, each covering a part of your dataset. When you select data from a partitioned table using a <code>WHERE</code> clause with a time-based restriction, PostgreSQL is able to immediately discard all the partitions that aren’t relevant before it plans the query.<br></p><p>Because we aren’t searching through all the data, we spend less time doing I/O, and the query is faster. 
If the total table size or (even worse) the total index size of the unpartitioned table exceeds the amount of memory Postgres uses for cache, then the difference becomes even more significant.<br></p><p>As we introduced in our <a href="https://timescale.ghost.io/blog/when-to-consider-postgres-partitioning/">previous article on partitioning</a>, you can follow different strategies and techniques to partition your PostgreSQL tables. In terms of the types of partitioning, you could choose between:</p><ul><li><strong>Range partitioning</strong>: partitions are defined by a range of values (e.g., by month, year, or an incrementing sequence).</li><li><strong>List partitioning</strong>: partitions are defined by a list of values (e.g., by country).</li><li><strong>Hash partitioning</strong>: rows are partitioned based on the hash value of the partition key to distribute data across a fixed number of partitions evenly. <br></li></ul><p>Depending on which partitioning strategy you’re using, you can choose between different methodologies, the most common being the following: </p><ul><li>Using the <code>PARTITION BY</code> clause native in PostgreSQL. This supports the three types of partitioning (e.g., <code>PARTITION BY RANGE</code>, <code>BY LIST</code>, or <code>BY HASH</code>).</li><li>Using pg_partman, an extension that automates time-based partitioning in PostgreSQL.</li><li>Using Timescale, which goes one step further than pg_partman to automate partitioning by time via the concept of hypertables. <br></li></ul><p>Here, we’ll focus particularly on range partitioning (by far the most common), comparing the last two methods: pg_partman and hypertables.</p><h2 id="pgpartman-making-postgresql-partitioning-simpler">Pg_partman: Making PostgreSQL Partitioning Simpler </h2><p>The <a href="https://github.com/pgpartman/pg_partman">pg_partman</a> extension for PostgreSQL is built on the native PostgreSQL declarative approach to partitioning tables. 
Declarative partitioning, introduced in PostgreSQL 10, has replaced the older method of table inheritance, introducing a more intuitive and simpler approach by providing built-in support for partitioning without triggers or rules. <br></p><p>With declarative partitioning, much of the partition management is automated, but some tasks, such as creating new partitions, still require manual intervention—unless you're using tools like pg_partman. Pg_partman helps to automate the creation and management of partitioned tables and partitions through a SQL API. Although new partitions aren’t added and removed automatically, this can be managed by adding another extension like <a href="https://github.com/citusdata/pg_cron">pg_cron</a> to schedule jobs.<br></p><p>Without pg_partman, declarative partitioning is a lot more complicated. Pg_partman intends to simplify this process, and indeed, it does, but there are still important tasks and nuances that will require manual intervention. A few examples:</p><ul><li>It’s essential to ensure that the necessary partitions have been created when ingesting data to avoid a <a href="https://timescale.ghost.io/blog/how-to-fix-no-partition-of-relation-found-for-row/">No Partition of Relation Found for Row</a> error, which may block your writes.</li><li>If your workload involves sporadic or irregular data ingestion, you’ll need to ensure you aren't creating excessive, unnecessary partitions, as they could degrade query performance and lead to table bloat.</li><li>You must ensure that there are no gaps or overlaps between partitions, especially when dealing with manual partition modifications.</li><li>If you want to implement a retention policy to regularly drop old partitions, you'll need to set this up yourself.</li><li>If you need to alter the schema of your tables, such as adding or dropping columns, you'll often have to handle these changes manually to ensure they propagate correctly to all partitions.</li></ul><h2 
id="hypertables-making-postgresql-partitioning-seamless">Hypertables: Making PostgreSQL Partitioning Seamless </h2><p>If pg_partman simplifies partition management, hypertables take this simplification to the next level: they completely automate the process. <strong>If pg_partman is the general toolkit, hypertables are the product.</strong><br></p><p><a href="https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/">Hypertables</a> are an abstraction layer that allows you to automatically create and manage partitions (which in Timescale are called chunks) without losing the ability to query as normal with SQL. Hypertables are optimized for time-based partitioning, although they also work for tables that aren’t based on time but have something similar, for example, a BIGINT primary key. <br></p><p>Hypertables rely on inheritance-based partitioning (which you’ll recall was the older method PostgreSQL used). While this method is harder to implement manually, it’s also more flexible, giving more granular control over the partitions. This is definitely not something that you (as an end user of partitioning) want to set up and manage, but this flexibility allows us (Timescale) to introduce some improvements over native PostgreSQL partitioning that you can directly benefit from. </p><p>What are these improvements? Let’s cover them.</p><h3 id="dynamic-partition-management-forget-about-the-%E2%80%9Cno-partition-of-relation-found-for-row%E2%80%9D-error">Dynamic partition management: Forget about the “no partition of relation found for row” error    <br></h3><p>A normal table is transformed into a Timescale hypertable using a single command (<code>create_hypertable</code>): <br></p><pre><code>CREATE TABLE conditions (
time        TIMESTAMPTZ       NOT NULL,
location    TEXT              NOT NULL,
device      TEXT              NOT NULL,
temperature DOUBLE PRECISION  NULL,
humidity    DOUBLE PRECISION  NULL
);
SELECT create_hypertable('conditions', 'time');
</code></pre>
<p>This sets up the partition column, the partition interval (seven days by default), and the unique index to support partitioning. Once the hypertable is created, new partitions (chunks) will be created on the fly as data flows into the hypertable. <br></p><p>As we said earlier, pg_partman can automate much of the partition creation process, but to routinely schedule this automation, you will need to integrate it with pg_cron—and you’ll have to ensure the necessary partitions are in place proactively. Without a predefined partition to host incoming data, you'll encounter the <a href="https://timescale.ghost.io/blog/how-to-fix-no-partition-of-relation-found-for-row/">No Partition of Relation Found for Row</a> error. (This is a common one.)  <br></p><p>Using Timescale eliminates the risk of partitions not existing, completely removing partition management from the list of things the database owner needs to consider. You get exactly the right number of partitions when you need them.<br></p><p>Another hidden gem of hypertables is that they’ll never create an unnecessary partition. Partitions are generated on the fly, meaning if there's no data to fit a potential partition, that partition simply won't be created. This is a good thing since each active partition adds a slight overhead during query planning.</p><h3 id="reduced-table-locking-no-need-to-worry-about-data-integrity">Reduced table locking: No need to worry about data integrity</h3><p><a href="https://timescale.ghost.io/blog/how-timescaledb-solves-common-postgresql-problems-in-database-operations-with-data-retention-management/">As we covered extensively in this post,</a> DDL operations in PostgreSQL, such as adding a new partition, inherently require locks on the table. 
This means that during the brief period the operation is being performed, other transactions trying to write (insert, update, delete) to the table can be blocked until the operation completes.</p><p>In PostgreSQL, there are two methods of adding partitions: from the <code>CREATE TABLE</code> statement and from the <code>ALTER TABLE</code> statement. The first will block writes, while the second will not. The same two methods can be used to drop partitions, although in this case both will block writes.</p><p>When pg_partman creates these partitions for its maintenance job, it performs DDL operations on the table. These operations take the same locks—which can completely block writes. Other problems may also arise: the waiting time for transactions can increase, leading to unpredictably longer response times, and in systems where operations have a strict timeout, the waiting caused by locks can lead to operation failures.</p><p>Hypertables are designed to ensure that your application’s read or write operations are not interrupted. Timescale maintains its own partition catalogs and implements its own minimized locking strategy that allows reads and writes without interfering with adding or dropping partitions.</p><h3 id="easily-configurable-data-retention">Easily configurable data retention</h3><p>One of the amazing things about partitioning your data is that you can drop individual partitions instantly, which isn’t the case when writing large <code>DELETE</code> statements. </p><p>When using pg_partman, you would need to create the custom logic for removing old partitions yourself, and removing a partition will lock the parent table. Also, you would need to schedule this with pg_cron or an external scheduler.<br></p><p>In contrast, setting up automatic data retention policies for hypertables is straightforward: you don’t need extra code or additional extensions. 
It only takes one command, <a href="https://docs.timescale.com/use-timescale/latest/data-retention/create-a-retention-policy/"><code>add_retention_policy</code></a>. You can define retention periods for specific time intervals, and Timescale will automatically drop outdated partitions when it needs to:<br></p><pre><code>SELECT add_retention_policy('conditions', INTERVAL '24 hours');
</code></pre>
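<p>For comparison, here’s roughly what the equivalent retention setup looks like with pg_partman plus pg_cron (a sketch only; the exact configuration columns and the maintenance call vary between pg_partman versions):</p><pre><code>-- Tell pg_partman to drop partitions older than 24 hours
UPDATE partman.part_config
SET retention = '24 hours', retention_keep_table = false
WHERE parent_table = 'public.conditions';

-- Schedule the maintenance job with pg_cron so retention is enforced hourly
SELECT cron.schedule('0 * * * *', $$SELECT partman.run_maintenance()$$);
</code></pre>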
<h3 id="query-performance-optimizations">Query performance optimizations</h3><p>Hypertables also unlock some extra features that Timescale enables for your query plans. For example, queries that reference <code>now()</code> when pruning partitions will perform better due to <a href="https://timescale.ghost.io/blog/how-we-fixed-long-running-postgresql-now-queries/"><code>now()</code></a><a href="https://timescale.ghost.io/blog/how-we-fixed-long-running-postgresql-now-queries/"> being turned into a constant,</a> and your ordered <code>DISTINCT</code> queries will benefit from <a href="https://timescale.ghost.io/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/">SkipScan</a>.</p><h2 id="going-beyond-partitioning">Going Beyond Partitioning  </h2><p>It's worth noting that while pg_partman is more of a general-purpose partition manager for PostgreSQL, hypertables unlock a wealth of features specifically tailored for time-based (or time series) data that can come in very handy for scaling your large PostgreSQL tables: </p><ul><li><a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/">Timescale compression</a> takes a hypertable and changes it from row-oriented to column-oriented. This can reduce storage utilization by up to 95%, unlock blazing-fast analytical queries, <a href="https://timescale.ghost.io/blog/compressing-immutable-data-changing-time-series-management/">and still allow the data to be updated in place.</a></li><li><a href="https://timescale.ghost.io/blog/how-we-made-data-aggregation-better-and-faster-on-postgresql-with-timescaledb-2-7/">Continuous aggregates</a> take hypertables and let you create incrementally updated materialized views for aggregate queries. 
You define your query and get an aggregate table that is updated as historical data changes while also keeping up with your real-time data as it flows in.</li><li><a href="https://docs.timescale.com/api/latest/hyperfunctions/">Hyperfunctions</a> give you a full set of blazing-fast functions, procedures, and data types optimized for querying, aggregating, and <a href="https://timescale.ghost.io/blog/time-series-analysis/">analyzing time-series data</a>.</li><li>The <a href="https://timescale.ghost.io/blog/the-postgresql-job-scheduler-you-always-wanted-but-be-careful-what-you-ask-for/">Timescale job scheduler</a> lets you schedule any SQL or function-based job within PostgreSQL, meaning you don’t need an external scheduler or to load another extension like pg_cron.</li></ul><h2 id="conclusion">Conclusion</h2><p>Pg_partman is an amazing toolkit that greatly simplifies the management of declarative partitioning in PostgreSQL, but it is only that—a toolkit. </p><p>We believe hypertables are a complete product that makes partitioning much more streamlined. The dynamic partition management, reduced locking overhead, and automated retention policies make hypertables a better choice for applications dealing with large datasets. You will save time and worry, and you’ll unlock many other amazing features that will make it even easier to work with your large PostgreSQL tables. </p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Making PostgreSQL Backups 100x Faster via EBS Snapshots and pgBackRest]]></title>
            <description><![CDATA[pgBackRest is an awesome tool for backup creation/restore in Postgres, but it can get slow for large databases. We mitigated this problem by incorporating EBS snapshots into our backup strategy.]]></description>
            <link>https://www.tigerdata.com/blog/making-postgresql-backups-100x-faster-via-ebs-snapshots-and-pgbackrest</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/making-postgresql-backups-100x-faster-via-ebs-snapshots-and-pgbackrest</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Engineering]]></category>
            <dc:creator><![CDATA[Grant Godeke]]></dc:creator>
            <pubDate>Thu, 31 Aug 2023 14:16:35 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/08/elephant-armor.png">
            </media:content>
            <content:encoded><![CDATA[<p>If you have experience running PostgreSQL in a production environment, you know that maintaining database backups is a daunting task. In the event of a catastrophic failure, data corruption, or other forms of data loss, the ability to quickly restore from these backups will be vital for minimizing downtime. If you’re managing a database, maintaining your backups and getting your recovery strategy in order is probably the first item on your checklist.</p><p>Perhaps this has already given you a headache or two, because<strong> creating and restoring backups for large PostgreSQL databases can be a very slow process.</strong></p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">🗒️</div><div class="kg-callout-text">A refresher on your basic backup and restore Postgres commands:<br><br><a href="https://www.timescale.com/learn/backup" rel="noreferrer">Postgres Backup Cheat Sheet</a><br><a href="https://www.timescale.com/learn/postgres-cheat-sheet/restore" rel="noreferrer">Postgres Restore Cheat Sheet</a></div></div><p><br></p><p>The most widely used external tool for backup operations in PostgreSQL is <a href="https://pgbackrest.org/?ref=timescale.com">pgBackRest</a>, which is very powerful and reliable. But pgBackRest can also be very time-consuming, especially for databases well over 1 TB. </p><p>The problem is exacerbated when restoring backups from production databases that continue to ingest data, thus creating more WAL (write-ahead log) that must be applied. In this case, a full backup and restore can take hours or even days, which can be a nightmare in production databases.</p><p>When operating our platform (<a href="https://www.timescale.com/">Timescale</a>, a cloud database platform built on PostgreSQL), we struggled with this very thing. 
At Timescale, we pride ourselves on making PostgreSQL faster and more scalable for large volumes of time-series data—therefore, our customers’ databases are often large (many TBs). At first, we were basing our backup and restore operations entirely on pgBackRest, and we were experiencing some pain:</p><ul><li>Creating full backups was very slow. This was a problem, for example, when our customers were trying to upgrade their PostgreSQL major version within our platform, as we took a fresh, full backup after the upgrade in case there was a failure shortly after. Upgrades are already stressful, and adding a very slow backup experience was not helping. </li><li>Restoring from backups was also too slow, both restoring from the backups themselves and replaying any WAL that had accrued since the last backup. (<a href="https://docs.timescale.com/use-timescale/latest/backup-restore/backup-restore-cloud/" rel="noreferrer">In Timescale, we automatically take full and incremental backups of all our customers’ databases.</a>)</li></ul><p>In this blog post, we’re sharing how we solved this problem by combining pgBackRest with EBS snapshots. Timescale runs in AWS, so we had the advantage of cloud-native infrastructure. If you're running PostgreSQL in AWS, you can perhaps benefit from a similar approach.</p><p><strong>After introducing EBS snapshots, our backup creation and restore process got 100x faster. </strong>This significantly improved the experience for our customers and made things much easier for our team.</p><h2 id="quick-introduction-to-database-backups-in-postgresql-and-why-we-used-pgbackrest">Quick Introduction to Database Backups in PostgreSQL (And Why We Used pgBackRest)</h2><p>If you asked 100 engineers if they thought backups were important for production databases, they would all say "yes"—but if you then took those same 100 engineers and gave them a grade on their backups, most wouldn’t hit a pass mark. 
</p><p>We all collectively understand the need for backups, but it’s still hard to create an effective backup strategy, implement it, run it, and test that it’s working appropriately.</p><p>In PostgreSQL specifically, there are two ways to implement backups: <strong>logical database dumps</strong>, which contain the SQL commands needed to recreate (not restore) your database from scratch, and <strong>physical backups</strong>, which capture the files that store your database state.  </p><p>Physical backups are usually paired with a mechanism to store the constant stream of write-ahead logs (WALs), which describe all data mutations on the system. A physical backup can then be restored to get PostgreSQL to the exact same state as it was when that backup was taken, and the WAL files rolled forward to get to a specific point in time, maybe just before someone (accidentally?) dropped all your data or your disk ate itself.</p><p>Logical backups are useful to recreate databases (potentially on other architectures), but maintaining physical backups is imperative for any production workload where uptime is valued. Physical backups are exact: they can be restored quickly and provide point-in-time recovery. In the rest of this article, we’ll discuss physical backups.</p><h3 id="how-are-physical-backups-usually-created-in-postgresql">How are physical backups usually created in PostgreSQL?</h3><ul><li>The first option is using the <a href="https://www.postgresql.org/docs/current/app-pgbasebackup.html">pg_basebackup</a> command. <code>pg_basebackup</code> copies the data directory and optionally includes the WAL files, but it doesn’t support incremental backups and has limited parallelization capabilities. The whole process is very manual, too. If you’re using <code>pg_basebackup</code>, you’ll instantly get the files you need to bootstrap a new database in a tarball or directory, but not much else. 
</li><li>Tools like <a href="https://pgbackrest.org/?ref=timescale.com">pgBackRest</a> were designed to overcome the limitations of <code>pg_basebackup</code>. pgBackRest allows for full and incremental backups, multi-threaded operations, and point-in-time recovery. It ensures data integrity by validating checksums during the backup process, supports different types of storage, and much more. In other words, pgBackRest is a robust and feature-rich tool, making it our choice for PostgreSQL backup operations.</li></ul><h2 id="the-problem-with-pgbackrest">The Problem With pgBackRest</h2><p>But pgBackRest is not perfect: it reads and backs up files, causing additional load on your system. This can cause performance bottlenecks that can complicate your backup and restore strategy, especially if you’re dealing with large databases.</p><p>Even though pgBackRest offers incremental backups and parallelization, it often gets slow when executing full backups over large data volumes or on an I/O-saturated system.  </p><p>While you can sometimes rely on differential or incremental backups to minimize data (<a href="https://docs.timescale.com/use-timescale/latest/backup-restore/backup-restore-cloud/" rel="noreferrer">like we do in Timescale</a>), there are situations in which creating full backups is unavoidable. Backups could also be taken on a standby, but at the end of the day, you’re limited by how fast you can get data off your volumes. </p><p>We shared the example of full database upgrades earlier, but the same applies to any other kind of migration, integrity checks, archival operations, etc. 
In Timescale, some of our most popular platform features (like <a href="https://timescale.ghost.io/blog/introducing-one-click-database-forking-in-timescale-cloud/">forks,</a> <a href="https://timescale.ghost.io/blog/high-availability-for-your-production-environments-introducing-database-replication-in-timescale-cloud/">high-availability replicas</a>, and <a href="https://timescale.ghost.io/blog/high-availability-for-your-production-environments-introducing-database-replication-in-timescale-cloud/">read replicas</a>) imply a data restore from a full backup.</p><p>Having a long-running full backup operation in your production database is not only inconvenient, it can also conflict with other high-priority DB tasks, affecting your overall performance. This was problematic for us.</p><p>The slowness of pgBackRest was also problematic when it was time to restore from these backups. It’s very good at CPU parallelization, but when you’re trying to write terabytes of data as fast as possible, I/O will be the bottleneck. When it comes to recovery time objective or RTO, every minute counts. In case of major failure, you want to get that database up as soon as possible.</p><h2 id="using-ebs-snapshots-to-speed-up-the-creation-of-backups">Using EBS Snapshots to Speed Up the Creation of Backups</h2><p>To speed up the process of creating fresh full backups, we decided to replace standard pgBackRest full backups with on-demand <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSSnapshots.html?ref=timescale.com">EBS snapshots</a>.</p><p>Our platform runs in AWS, which comes with some advantages. Using snapshots is a much more cloud-native approach to the problem of backups compared to what’s been traditionally used in PostgreSQL management. </p><p>EBS snapshots create a point-in-time copy of a particular database: this snapshot can be restored, effectively making it a backup. 
The key is that <strong>taking a snapshot is significantly faster than the traditional approach with pgBackRest</strong>: in our case, our p90 snapshot time decreased by over 100x. This gap gets wider the larger your database is!</p><p>How did we implement this? Basically, we did a one-to-one replacement of pgBackRest. Instead of waiting for the pgBackRest fresh full backup to complete, we now take a snapshot. We still wait for the backup to complete, but the process is significantly faster via snapshots. This way, we get the quick snapshot but also the full data copy and checksumming for datafile integrity, which pgBackRest performs.</p><p>If a user experiences a failure shortly after an upgrade, we have a fresh backup—the snapshot—that we can quickly restore (we’ll cover how we handle restores next). We still take a fresh full backup using pgBackRest (yay for redundancy), but the key difference is that this happens after the upgrade process has been fully completed. </p><p>If a failure has happened, the service is available to our customer quickly: we don’t have to force them to wait for the lengthy pgBackRest process to finish before being able to use their service again.</p><p>The trade-offs for adopting this approach were minimal. The only downside to consider is that, by taking snapshots, we now have redundant backups (both snapshots and full backups), so we incur additional storage costs. But what we’ve gained (both in terms of customer satisfaction and our own peace of mind) is worth the price.</p><h2 id="combining-ebs-snapshots-and-pgbackrest-for-quick-data-restore-taking-partial-snapshots-replaying-wal">Combining EBS Snapshots and pgBackRest for Quick Data Restore: Taking Partial Snapshots, Replaying WAL</h2><p>Solving the first problem we encountered with pgBackRest (i.e., slow creation of full backups) was relatively simple. 
We knew exactly when we needed an EBS snapshot to be created, as this process is always tied to a very specific workflow (e.g., performing a major version upgrade).</p><p>But we also wanted to explore using EBS snapshots to improve our data restore functionality. As we mentioned earlier, some popular features in the Timescale platform rely heavily on restores, including <a href="https://timescale.ghost.io/blog/introducing-one-click-database-forking-in-timescale-cloud/">creating forks,</a> <a href="https://timescale.ghost.io/blog/high-availability-for-your-production-environments-introducing-database-replication-in-timescale-cloud/">high-availability replicas</a>, and <a href="https://timescale.ghost.io/blog/high-availability-for-your-production-environments-introducing-database-replication-in-timescale-cloud/">read replicas,</a> all of which imply a data restore from a full backup.</p><p>This use case posed a slightly different and more difficult challenge since, to restore from a full backup, such a backup needs to exist first, reflecting the latest state of the service. </p><p>To implement this, the first option we explored was taking an EBS snapshot when the user clicked “Create” for a fork, read replica, or high-availability replica, and then restoring from that snapshot. However, this process was still too slow for the end user. To get the performance we wanted, we had to think a bit beyond the naive approach and determine a way to take semi-regular snapshots across our fleet.</p><p>Fortunately, we already had a backup strategy for pgBackRest in place that we chose to mirror. Now, all Timescale services have EBS snapshots taken daily. For redundancy reasons and to verify file checksums, we still take our standard pgBackRest incremental backups, but we don’t depend on them.</p><p>With that strategy in place, restoring data from an EBS snapshot mirrors a restore from pgBackRest very closely. 
We simply choose the corresponding EBS snapshot we want to restore—in the cases mentioned above, always the most recent—and then replay any WAL that has accumulated since that restore point. Here, it is important to note that <a href="https://timescale.ghost.io/blog/how-high-availability-works-in-our-cloud-database/#what-if-theres-a-failure-affecting-your-storage">we still rely on pgBackRest to do our WAL management</a>. pgBackRest works great for us here; nothing comes close in terms of parallel WAL streaming.</p><p>This EBS snapshotting and pgBackRest approach has given us great results so far. Using snapshots for restores has helped improve our product experience, also providing our customers with an even higher level of reliability. Keeping pgBackRest in parallel has given us peace of mind that we still have a traditional backup approach that validates our data as well as snapshots.</p><p>We’re continually improving our strategy though, for example, by being smarter about when we snapshot—e.g., by looking at the accumulated WAL since the last snapshot to determine if we need to snapshot certain services more frequently. This practice helps improve restore times by reducing the amount of WAL that would need to be replayed, which is often the bottleneck in this process.</p><h2 id="on-snapshot-prewarming">On Snapshot Prewarming</h2><p>One important trade-off with this EBS snapshot approach is the balance between deployment time and initial performance. One limitation of a snapshot restore is that not all blocks are necessarily prewarmed and <a href="https://timescale.ghost.io/blog/scaling-postgresql-with-amazon-s3-an-object-storage-for-low-cost-infinite-database-scalability/">may need to be fetched from S3</a> the first time they are used, which is a slow process.</p><p>To give props to pgBackRest, its restore process does not have this issue. 
For our platform features, our trade-off was between getting the user a running read replica (or fork or high-availability replica) as quickly as possible and making sure it was as performant as possible.</p><p>After some back and forth, we decided on our current approach to prewarming: we’re reading as much as we can for five minutes, prioritizing the most recently modified files first. The idea here is that we will warm the data the user is actively engaging with first. After five minutes, we then hand the process off to PostgreSQL to continue reading the rest of the volume at a slower pace until it is complete. For the initial warming, we use a custom <a href="https://www.educative.io/answers/what-is-a-goroutine?ref=timescale.com">goroutine</a> that reads concurrently from files.</p><h2 id="backing-it-up">Backing It Up</h2><p>We are not completely replacing our pgBackRest backup infrastructure with EBS snapshots anytime soon: it is hard to give up on the effectiveness and reliability of pgBackRest. </p><p>But by combining EBS snapshots with pgBackRest across our infrastructure, we’ve been able to mitigate pgBackRest’s performance problems significantly, speeding up our backup creation and restore process. This allows us to build a better product and provide a smoother experience to our customers.</p><p>If you’re experiencing the same pains we were with pgBackRest, think about experimenting with something similar! It may cost you a little extra money, but it can be very much worth it.</p><p>We still have work to do on our end: we will continue to iterate on the ideal snapshotting strategy across the fleet to minimize deployment times as much as possible. We are also looking at smarter ways to prewarm the snapshots and more applications for snapshots in general.</p><p>If any of these problems interest you, <a href="https://www.timescale.com/careers/?ref=timescale.com">check out our open engineering roles</a> (we’re hiring!). 
And if you are a PostgreSQL user yourself, <a href="https://console.cloud.timescale.com/signup?ref=timescale.com">sign up for a free Timescale trial</a> and experience the result of EBS snapshots in action.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Implementing ASOF Joins in PostgreSQL and Timescale]]></title>
            <description><![CDATA[Read our step-by-step guide to implement ASOF joins in PostgreSQL and Timescale, and learn how to supercharge your queries with some Timescale magic.]]></description>
            <link>https://www.tigerdata.com/blog/implementing-asof-joins-in-timescale</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/implementing-asof-joins-in-timescale</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Thu, 15 Jun 2023 14:10:00 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/06/caspar-camille-rubin-fPkvU7RDmCo-unsplash--1-.jpg">
            </media:content>
            <content:encoded><![CDATA[<h2 id="what-is-an-asof-join">What Is an <code>ASOF</code> Join?</h2><p>An <code>ASOF</code> (or "as of") join is a type of join operation used when analyzing two sets of time-series data. It essentially matches each record from one table with the nearest—but not necessarily equal—value from another table based on a chosen column. Some databases (kdb+ and QuestDB, for example) support this out of the box using non-standard SQL syntax, but unfortunately, PostgreSQL does not provide a built-in <code>ASOF</code> keyword.</p><p>The chosen column needs to have some concept of order for the <code>ASOF</code> operation to work. You may think of it as the "closest value" that does not exceed the comparison value. It works for string (alphabetical), integer (ordinal), float (decimal), and any other data type that has a notion of order. Because timestamps are near and dear to our hearts at Timescale, we will demonstrate with time and date columns.</p><div class="kg-card kg-callout-card kg-callout-card-grey"><div class="kg-callout-emoji">✨</div><div class="kg-callout-text">Want to understand <a href="https://www.timescale.com/learn/postgresql-join-type-theory">how the PostgreSQL parser picks a join method or join types</a>? Check out this article!</div></div><p><br></p><p>Performing this operation in PostgreSQL takes a bit of effort. This article aims to delve deeper into <code>ASOF</code>-style joins and how to implement similar functionality in PostgreSQL using sub-selects or other join types.</p><h2 id="understanding-asof-joins">Understanding <code>ASOF</code> Joins</h2><p><code>ASOF</code> joins are a powerful tool when dealing with time-series data. 
In simple terms, an ASOF join will, for each row in the left table, find the single row in the right table with the greatest key value that is less than or equal to the key in the left table.</p><p>This is a common operation when dealing with financial data, sensor readings, or <a href="https://timescale.ghost.io/blog/time-series-data/" rel="nofollow noopener noreferrer ugc">other types of time-series data where readings might not align perfectly by timestamp</a>.</p><p>For a simple example, consider the real-world question, "What was the temperature yesterday at this time?" It is very unlikely that a temperature reading was taken yesterday at exactly the millisecond that the question is asked today. What we really want is "What was the latest temperature reading taken yesterday, up to today's timestamp?"</p><p>This simple example becomes a lot more complex when we start comparing temperatures day over day, week over week, etc.</p><h2 id="implementing-asof-joins-in-timescale">Implementing <code>ASOF</code> Joins in Timescale</h2><p>Even though PostgreSQL does not directly support <code>ASOF</code> joins, you can achieve similar functionality using a combination of SQL operations. Here's a simplified step-by-step guide:</p><h3 id="step-1-prepare-your-data">Step 1: Prepare your data</h3><p>Ensure your data is in the correct format for the <code>ASOF</code> join. You'll need a timestamp or other monotonically increasing column to use as a key for the join.</p><p>Suppose you have two tables, <code>bids</code> and <code>asks</code>, each containing a timestamp column, and you want to join them by instrument and the nearest timestamp.</p><pre><code class="language-sql">CREATE TABLE bids (
    instrument text,
    ts TIMESTAMPTZ,
    value NUMERIC
);
--
CREATE INDEX bids_instrument_ts_idx ON bids (instrument, ts DESC);
CREATE INDEX bids_ts_idx ON bids (ts);
--
CREATE TABLE asks (
    instrument text,
    ts TIMESTAMPTZ,
    value NUMERIC
);
CREATE INDEX asks_instrument_ts_idx ON asks (instrument, ts DESC);
CREATE INDEX asks_ts_idx ON asks (ts);
--
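-- (Sketch only: with TimescaleDB loaded, you would normally convert
-- both tables to hypertables at this point; this walkthrough
-- deliberately keeps plain tables.)
-- SELECT create_hypertable('bids', 'ts');
-- SELECT create_hypertable('asks', 'ts');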
</code></pre><p>Normally you'd make both these tables into hypertables with the <code>create_hypertable</code> function (because you're a super-educated Timescale user), but in this case, we aren't going to, as we won't be inserting much data (and we also have some Timescale magic to show off 🪄).</p><h3 id="step-2-insert-some-test-data">Step 2: Insert some test data</h3><p>Next, we'll create data for four instruments, <code>AAA, BBB, NZD,</code> and <code>USD</code>.</p><pre><code class="language-sql">INSERT INTO bids (instrument, ts, value)
SELECT 
   -- random 1 of 4 instruments
  (array['AAA', 'BBB', 'NZD', 'USD'])[floor(random() * 4 + 1)], 
   -- timestamp of last month plus some seconds
  now() - interval '1 month' + g.s, 
   -- random value
  random()* 100 +1
FROM (
  -- 2,592,000 seconds in a 30-day month
  SELECT ((random() * 2592000 + 1)::text || ' s')::interval s 
  FROM generate_series(1,3000000)) g;
INSERT INTO asks (instrument, ts, value)
SELECT 
   -- random 1 of 4 instruments
  (array['AAA', 'BBB', 'NZD', 'USD'])[floor(random() * 4 + 1)], 
   -- timestamp of last month plus some seconds
  now() - interval '1 month' + g.s, 
   -- random value
  random()* 100 +1
FROM (
  -- 2,592,000 seconds in a 30-day month
  SELECT ((random() * 2592000 + 1)::text || ' s')::interval s 
  FROM generate_series(1,2000000)) g;
</code></pre><h3 id="step-3-query-the-data-using-a-sub-select">Step 3: Query the data using a sub-select</h3><p>To mimic the behavior of an <code>ASOF</code> join, use a correlated sub-select along with conditions to match rows based on your criteria. This will run the sub-query once per row returned from the target table. We need to use the <code>DISTINCT ON</code> clause to limit the number of rows returned to one.</p><p>This will work in vanilla Postgres, but when we are using Timescale (even though we aren't using hypertables yet), we get the benefits of a Skip Scan, which will supercharge the query (for more information on this, check our <a href="https://docs.timescale.com/use-timescale/latest/query-data/skipscan/" rel="nofollow noopener noreferrer ugc">docs</a> or <a href="https://timescale.ghost.io/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/">blog post about how Skip Scan can give you an 8,000x speed-up</a>).</p><pre><code class="language-sql">SELECT bids.ts timebid, bids.value bid,
    (SELECT DISTINCT ON (asks.instrument) value ask
    FROM asks
    WHERE asks.instrument = bids.instrument
    AND asks.ts &lt;= bids.ts
    ORDER BY instrument, ts DESC) ask
FROM bids
WHERE bids.ts &gt; now() - interval '1 week'
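;
-- An equivalent way to express this correlated lookup is a LATERAL
-- join (a sketch; the plan shown next was captured for the query
-- above, not for this variant, but both should return the same rows):
-- SELECT bids.ts timebid, bids.value bid, a.value ask
-- FROM bids
-- LEFT JOIN LATERAL (
--     SELECT value
--     FROM asks
--     WHERE asks.instrument = bids.instrument
--       AND asks.ts &lt;= bids.ts
--     ORDER BY asks.ts DESC
--     LIMIT 1
-- ) a ON true
-- WHERE bids.ts &gt; now() - interval '1 week';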
</code></pre><pre><code class="language-sql">                              QUERY PLAN                                                                               
-------------------------------------------------------------------------
 Index Scan using bids_ts_idx on public.bids  
    (cost=0.43..188132.58 rows=62180 width=56) 
    (actual time=0.067..1700.957 rows=57303 loops=1)
   Output: bids.instrument, bids.ts, bids.value, (SubPlan 1)
   Index Cond: (bids.ts &gt; (now() - '7 days'::interval))
   SubPlan 1
     -&gt;  Unique  (cost=0.43..2.71 rows=5 width=24) 
                (actual time=0.027..0.029 rows=1 loops=57303)
           Output: asks.value, asks.instrument, asks.ts
           -&gt;  Custom Scan (SkipScan) on public.asks  
                  (cost=0.43..2.71 rows=5 width=24) 
                  (actual time=0.027..0.027 rows=1 loops=57303)
                 Output: asks.value, asks.instrument, asks.ts
                 -&gt;  Index Scan using asks_instrument_ts_idx on public.asks  
                        (cost=0.43..15996.56 rows=143152 width=24) 
                        (actual time=0.027..0.027 rows=1 loops=57303)
                       Output: asks.value, asks.instrument, asks.ts
                       Index Cond: ((asks.instrument = bids.instrument) 
                          AND (asks.ts &lt;= bids.ts))
 Planning Time: 1.231 ms
 Execution Time: 1703.821 ms

</code></pre><h3 id="conclusion">Conclusion</h3><p>While PostgreSQL does not have an <code>ASOF</code> keyword, it does offer the flexibility and functionality to perform similar operations. When you're using Timescale, things only get better with enhancements like Skip Scan.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Fix No Partition of Relation Found for Row in Postgres Databases]]></title>
            <description><![CDATA[Learn more about the “no partition of relation found for row” error and how to avoid it in PostgreSQL databases.]]></description>
            <link>https://www.tigerdata.com/blog/how-to-fix-no-partition-of-relation-found-for-row</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/how-to-fix-no-partition-of-relation-found-for-row</guid>
            <category><![CDATA[AWS]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[PostgreSQL Tips]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Thu, 06 Apr 2023 13:30:44 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/04/AWS-RDS-Error-message-no-partition-of-relation-found.jpg">
            </media:content>
            <content:encoded><![CDATA[<h2 id="error-no-partition-of-relation-found-for-row"><code>ERROR</code>: No Partition of Relation Found for Row</h2><p>The error message <code>ERROR: no partition of relation {table-name} found for row</code> is reported by PostgreSQL (and will appear in the console and the log) when a table has been configured with declarative partitioning, and data is <code>INSERTed</code> before a child table has been defined with constraints that match the data. This will cause the insert to fail, potentially losing the data that was in flight.</p><p>You will find this error message in other PostgreSQL-based databases, such as Amazon RDS for PostgreSQL and Amazon Aurora. But it can be avoided in Timescale when you use our <a href="https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/">hypertable abstraction</a>. In this blog post, we’ll explain this database error in more detail to show why hypertables avoid it.</p><h2 id="explanation">Explanation</h2><p>Let’s dive deeper into what causes a <code>no partition of relation found for row</code> error. When a table is partitioned using PostgreSQL declarative partitioning, it becomes a parent to which multiple child partitions can be attached. Each of these children can handle a specific non-overlapping subset of data. When partitioning by time (the most common use case), each partition would be attached for a particular date range. For example, seven daily partitions could be attached, representing the upcoming week.<br></p><p>When inserts are made into the parent table, these are transparently routed to the child table matching the partitioning criteria. So an insert of a row that referenced tomorrow would be sent automatically to tomorrow’s partition. If this partition doesn’t exist, then there is a problem—there is no logical place to store this data. 
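</p><p>A minimal sketch of how this happens (table and partition names here are illustrative):</p><pre><code class="language-sql">-- parent table, partitioned by day
CREATE TABLE readings (
    ts  TIMESTAMPTZ NOT NULL,
    val NUMERIC
) PARTITION BY RANGE (ts);
-- a single child partition covering one day
CREATE TABLE readings_2023_04_06 PARTITION OF readings
    FOR VALUES FROM ('2023-04-06') TO ('2023-04-07');
-- this row fits the child partition, so it succeeds
INSERT INTO readings VALUES ('2023-04-06 12:00+00', 21.5);
-- this row has no matching partition, so it fails with
-- ERROR: no partition of relation "readings" found for row
INSERT INTO readings VALUES ('2023-04-07 12:00+00', 22.1);
</code></pre><p>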
PostgreSQL will fail the <code>INSERT</code> and report <code>no partition of relation {table-name} found for row</code>.</p><h2 id="how-to-resolve">How to Resolve</h2><p>There are two ways around this problem, although neither is perfect. Keep reading to see the Timescale approach with <a href="https://www.tigerdata.com/blog/database-indexes-in-postgresql-and-timescale-cloud-your-questions-answered" rel="noreferrer">hypertables</a> that avoids these pitfalls.</p><p>Partitions can be made ahead of time—perhaps a scheduler could be used to create a month's worth of partitions automatically in advance. This works in theory (as long as that scheduler keeps running!) but will cause locking issues while the partitions are being created. Plus, it doesn’t account for data in the past or the far future. </p><p>A default partition can also be added that automatically catches all data that doesn’t have a home, but this is problematic, too, as it collects data that needs to eventually be moved into freshly created partitions. As the amount of orphaned data in the default partition grows, it will also slow down query times.</p><h2 id="documentation-and-resources">Documentation and Resources</h2><ul><li>Timescale <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/hypertables/">hypertables work like regular PostgreSQL tables</a> but provide a superior user experience when handling time-series data.</li><li>Need some advice on how to model your time-series data using hypertables? 
Read our best practices about choosing between a <a href="https://timescale.ghost.io/blog/best-practices-for-time-series-data-modeling-narrow-medium-or-wide-table-layout-2/">narrow, medium, or wide hypertable layout</a> and learn when to use <a href="https://timescale.ghost.io/blog/best-practices-for-time-series-data-modeling-single-or-multiple-partitioned-table-s-a-k-a-hypertables/">single or multiple hypertables</a>.<br></li></ul><h2 id="how-timescale-can-help">How Timescale Can Help</h2><p>As mentioned earlier, another solution is enabling the TimescaleDB extension and converting the table into a hypertable instead of using PostgreSQL declarative partitioning. This removes the need to worry about partitions (which in Timescale jargon are called chunks), as they are transparently made when inserts happen with no locking issues. </p><p>You’ll never have to see this error, worry about scheduling potentially disruptive partition creation, or think about default partitions ever again! </p><p>New to Timescale? <a href="https://console.cloud.timescale.com/signup">Sign up for Timescale</a> (30-day free trial, no credit card required) for fast performance, seamless user experience, and the best compression ratios.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Timescale vs. Amazon RDS PostgreSQL: Up to 350x Faster Queries, 44 % Faster Ingest, 95 % Storage Savings for Time-Series Data]]></title>
            <description><![CDATA[Why are developers migrating from Amazon RDS for PostgreSQL to Timescale to handle their time-series data workloads? Our benchmark answers the question: faster queries, faster ingest, and 95 % storage savings for time-series data.]]></description>
            <link>https://www.tigerdata.com/blog/timescale-cloud-vs-amazon-rds-postgresql-up-to-350-times-faster-queries-44-faster-ingest-95-storage-savings-for-time-series-data</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/timescale-cloud-vs-amazon-rds-postgresql-up-to-350-times-faster-queries-44-faster-ingest-95-storage-savings-for-time-series-data</guid>
            <category><![CDATA[Cloud]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Benchmarks & Comparisons]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Tue, 15 Nov 2022 14:19:00 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-23-Amazon-RDS-timescale-hero.png">
            </media:content>
            <content:encoded><![CDATA[<p>Since we launched Timescale, our cloud-hosted PostgreSQL service for time-series data and event and analytics workloads, we have seen large numbers of customers migrating onto it from the general-purpose Amazon RDS for PostgreSQL. These developers usually struggle with performance issues on ingest, sluggish real-time or historical queries, and spiraling storage costs.</p><p>They need a solution that will let them keep using PostgreSQL while not blocking them from getting value out of their time-series data. Timescale fits them perfectly, and this article will present benchmarks that help explain why.</p><p>When we talk to these customers, we often see a pattern: </p><ol><li>At the start of a project, developers choose PostgreSQL because it’s <a href="https://survey.stackoverflow.co/2022/#section-most-loved-dreaded-and-wanted-databases">a database they know and love</a>. The team is focused on shipping features, so they choose the path of least resistance—Amazon RDS for PostgreSQL.</li><li>Amazon RDS for PostgreSQL works well at first, but as the volume of time-series data in their database grows, they notice slower ingestion, sluggish query performance, and growing storage costs.</li><li>As the database becomes a bottleneck, it becomes a target for optimization. Partitioning is implemented, materialized views are configured (destroying the ability to get real-time results), and schedules are created for view refreshes and partition maintenance. Operational complexity grows, and more points of failure are introduced.</li><li>Eventually, in an effort to keep up, instance sizes are increased, and larger, faster volumes are created. Bills skyrocket, while the improvements are only temporary.</li><li>The database is now holding the application hostage regarding performance and AWS spending. 
A time-series database is discussed, but the developers and the application still rely on PostgreSQL features.<br></li></ol><p>Does it sound familiar? It’s usually at this stage when developers realize that Amazon RDS for PostgreSQL is no longer a good choice for their applications, start seeking alternatives, and come across Timescale. </p><p>Timescale runs on AWS, offering hosted PostgreSQL with added time-series superpowers. Since Timescale is still PostgreSQL and already in AWS, the transition from RDS is swift: Timescale integrates with your PostgreSQL-based application directly and plays nicely <a href="https://timescale.ghost.io/blog/do-more-on-aws-with-timescale-cloud-8-services-to-build-time-series-apps-faster/">with your AWS infrastructure</a>. </p><p>Timescale has always strived to enhance PostgreSQL with the ingestion, query performance, and cost-efficiency boosts that developers need to run their data-intensive applications, all while providing a seamless developer experience with advanced features to ease working with time-series data.</p><p>But don’t take our word for it—let the numbers speak for themselves. In this blog post, we share a benchmark comparing the performance of Timescale to Amazon RDS for PostgreSQL. 
You will find all the details of our comparison and all the information required to run the benchmark yourself using the <a href="https://github.com/timescale/tsbs">Time-Series Benchmarking Suite</a> (TSBS).</p><h2 id="time-series-data-benchmarking-a-sneak-preview">Time-Series Data Benchmarking: A Sneak Preview</h2><p>For those who can’t wait, here’s a summary: <strong>for a 160 GB dataset with almost 1 billion rows stored on a 1 TB volume, Timescale outperforms Amazon RDS for PostgreSQL with up to 44 % higher ingest rates, queries running up to 350x faster, and a 95 % smaller data footprint.</strong><br></p><p>When we ingested data in both Timescale and Amazon RDS for PostgreSQL (using gp3 EBS volumes for both), Timescale was <strong>34 % faster than RDS for 4 vCPU</strong> and <strong>44 %  for 8 vCPU</strong> configurations.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-time-to-run-table.png" class="kg-image" alt="A diagram of our time to run ingest benchmark between Timescale Cloud and RDS for time-series data" loading="lazy" width="1176" height="580" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-time-to-run-table.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-time-to-run-table.png 1000w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-time-to-run-table.png 1176w" sizes="(min-width: 720px) 720px"></figure><p></p><p>When we ran a variety of time-based queries on both databases, ranging from simple aggregates to more complex rollups through to last-point queries, <strong>Timescale consistently outperformed Amazon RDS for PostgreSQL in every query category, sometimes by as much as 350x</strong> (you can see all of the results in the <a href="#benchmarking-configuration" rel="noreferrer">Benchmarking section</a>).</p><figure class="kg-card kg-image-card"><img 
src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-median-time-to-run-table.png" class="kg-image" alt="A diagram of our median time to run CPU benchmark between Timescale Cloud and RDS for time-series data" loading="lazy" width="1176" height="580" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-median-time-to-run-table.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-median-time-to-run-table.png 1000w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-median-time-to-run-table.png 1176w" sizes="(min-width: 720px) 720px"></figure><p><strong>Timescale used 95 % less disk</strong> than Amazon RDS for PostgreSQL, thanks to Timescale’s <a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/">native columnar compression</a>, which reduced the size of the test database from 159 GB to 8.6 GB. Timescale's compression uses <a href="https://timescale.ghost.io/blog/time-series-compression-algorithms-explained/">best-in-class algorithms</a>, including Gorilla and delta-of-delta, to dramatically reduce the storage footprint.<br></p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-total-database-size.png" class="kg-image" alt="A diagram of our storage savings benchmark between Timescale Cloud and RDS for time-series data" loading="lazy" width="1176" height="580" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-total-database-size.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-total-database-size.png 1000w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-total-database-size.png 1176w" sizes="(min-width: 720px) 720px"></figure><p>And the storage savings above don’t even consider the effect of the <a 
href="https://timescale.ghost.io/blog/expanding-the-boundaries-of-postgresql-announcing-a-bottomless-consumption-based-object-storage-layer-built-on-amazon-s3/">object store built on Amazon S3</a> that we just announced for Timescale. This feature is available for testing via private beta at the time of writing but is not yet ready for production use. </p><p>Still, by running one SQL command, this novel functionality will allow you to tier an unlimited amount of data to the S3 object storage layer that’s now an integral part of Timescale. This layer is columnar (it’s based on Apache Parquet), elastic (you can increase and reduce your usage), consumption-based (you pay only for what you store), and one order of magnitude cheaper than our EBS storage, with no extra charges for queries or usage. This feature will make scalability even more cost-efficient in Timescale, so stay tuned for some exciting benchmarks!</p><p>In the remainder of this post, we’ll deep dive into our performance benchmark comparing Amazon RDS for PostgreSQL with Timescale, detailing our methods and results for comparing ingest rates, query speed, and storage footprint. 
We’ll also offer insight into <em>why</em> Timescale puts up the numbers it does, with a short introduction to its vital advantages for handling time-series, events, and analytics data.</p><p>If you’d like to see how Timescale performs for your workload, <a href="https://console.cloud.timescale.com/signup">sign up for Timescale</a> today— it’s free for 30 days, there’s no credit card required to sign up, and you can spin up your first database in minutes.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">More on RDS:<br>- <a href="https://timescale.ghost.io/blog/estimating-rds-costs/" rel="noreferrer">Estimating RDS Costs</a><br>- <a href="https://timescale.ghost.io/blog/understanding-rds-pricing-and-costs/" rel="noreferrer">Why Is RDS so Expensive?</a><br>- <a href="https://timescale.ghost.io/blog/alternatives-to-rds/" rel="noreferrer">Alternatives to RDS</a><br>- <a href="https://timescale.ghost.io/blog/amazon-aurora-vs-rds-understanding-the-difference/" rel="noreferrer">Amazon Aurora vs. RDS</a></div></div><h2 id="benchmarking-configuration">Benchmarking Configuration</h2><p><a href="https://timescale.ghost.io/blog/timescaledb-vs-amazon-timestream-6000x-higher-inserts-175x-faster-queries-220x-cheaper/">As for our previous Timescale benchmarks</a>, we used the <a href="https://github.com/timescale/tsbs">open-source Time-series Benchmarking Suite</a> to run our tests. Feel free to download and run it for yourself using the settings below. Suggestions for improvements are also welcome: comment on <a href="https://twitter.com/TimescaleDB">Twitter</a> or <a href="https://slack.timescale.com/">Timescale Slack</a> to join the conversation.</p><p>We used the following TSBS configuration across all runs:<br></p>
<!--kg-card-begin: html-->
<table style="border:none;border-collapse:collapse;"><colgroup><col width="213"><col width="235"><col width="248"></colgroup><tbody><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><br></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Timescale</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Amazon RDS for PostgreSQL</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span 
style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">PostgreSQL version</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">14.5</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">14.4 (latest available)</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">&nbsp;</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 
1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">No changes</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">synchronous_commit=off</span></p><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">(to match Timescale)</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Partitioning system</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid 
#000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">TimescaleDB (partitions automatically configured)</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">pg_partman (partitions manually configured)</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Compression into columnar</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span 
style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Yes, for older partitions</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Not supported</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Partition size</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">4h (each system ended up with 26 non-default 
partitions)</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Scale (number of devices)</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">25,000</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Ingest workers&nbsp;</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p 
dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">16</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Rows ingested</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">868,000,000</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span 
style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">TSBS profile</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">DevOps</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Instance type</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">M5 series&nbsp; (4 vCPU+16 GB memory and 8 vCPU+32 GB 
memory)</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Disk type</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">gp3 (16 K IOPs, 1000 MiBps throughput)</span></p></td></tr><tr style="height:21pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Volume size</span></p></td><td colspan="2" style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:5pt 5pt 5pt 
5pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:11pt;font-family:Arial;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">1 TB</span></p></td></tr></tbody></table>
<!--kg-card-end: html-->
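<p>The 4-hour partition size in the table above corresponds to a single TimescaleDB call on the Timescale side (a sketch; <code>cpu</code> is the TSBS DevOps table):</p><pre><code class="language-SQL">SELECT create_hypertable('cpu', 'time', chunk_time_interval => INTERVAL '4 hours');
</code></pre>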
<p><br><a href="https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/" rel="noreferrer">Hypertables</a> are the base abstraction of Timescale's time-series magic. While they work just like regular PostgreSQL tables, they boost performance and the user experience with time-series data by automatically partitioning it (large tables become smaller chunks or data partitions within a table) and allowing it to be queried more efficiently.</p><p>If you’re familiar with PostgreSQL, you may be asking questions about partitioning in RDS. In the past, we have benchmarked TimescaleDB against unpartitioned PostgreSQL simply because that’s the journey most of our customers follow. However, we inevitably get questions about why we didn’t compare against <a href="https://github.com/pgpartman/pg_partman">pg_partman</a>.</p><p>Pg_partman is another <a href="https://www.tigerdata.com/blog/top-8-postgresql-extensions" rel="noreferrer">PostgreSQL extension</a> that provides partition creation but doesn’t seamlessly create partitions on the fly: if someone inserted data outside of the currently created partitions, it would either go into a catch-all partition, degrading performance, or, worse still, fail. It also doesn’t provide any additional time-series functionality, planner enhancements, or compression.</p><p>We listen to these comments, so we decided to highlight Timescale's performance (and convenience) by enabling pg_partman on the RDS systems in this benchmark. After all, the extension is considered a <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_Partitions.html">best practice for partitioned tables</a> in Amazon RDS for PostgreSQL, so it was only fair we’d use it.<br></p><p>On our end, we enabled native compression on Timescale, compressing everything but the most recent chunk data. To do so, we segmented by the <code>tags_id</code> and ordered by time descending and <code>usage_user</code> columns. 
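</p><p>In TimescaleDB terms, that compression setup looks roughly like this (a sketch; <code>cpu</code> is the TSBS table, and the policy interval is illustrative):</p><pre><code class="language-SQL">ALTER TABLE cpu SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tags_id',
    timescaledb.compress_orderby = 'time DESC, usage_user'
);

-- Compress everything except the most recent chunks
SELECT add_compression_policy('cpu', INTERVAL '8 hours');
</code></pre><p>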
This is something we couldn’t reproduce in RDS since it doesn’t offer any equivalent functionality. </p><p>Almost everything else was exactly the same for both databases. We used the same data, indexes, and queries: almost one billion rows of data in which we ran a set of queries 100 times each using 16 threads. The only difference is that the Timescale queries use the <code>time_bucket()</code> function for arbitrary interval bucketing, whereas the PostgreSQL queries use extract (which performs equally well but is much less flexible).</p><p>We have split the performance data extracted from the benchmark into three sections: ingest, query, and storage footprint.</p><h2 id="ingest-performance-comparison">Ingest Performance Comparison</h2><p>As we started to run Timescale and RDS through our 16-thread ingestion benchmark to insert almost 1 billion rows of data, we began to see some amazing wins. Timescale beat RDS by 32 % with 4 vCPUs and 44 % with 8 vCPUs. Both systems had the same I/O performance configured on their gp3 disk, so we kept looking to get to the bottom of why we were winning on busy systems.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-ingest-performance-graph.png" class="kg-image" alt="A diagram of our ingest performance during benchmark run between Timescale Cloud and RDS for time-series data" loading="lazy" width="1992" height="1362" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-ingest-performance-graph.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-ingest-performance-graph.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/06/2023-06-08-ingest-performance-graph.png 1600w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-ingest-performance-graph.png 1992w" sizes="(min-width: 720px) 720px"></figure><p></p><p>To test the outcome without any disk I/O involvement, we 
used <a href="https://www.postgresql.org/docs/current/pgbench.html">pgbench</a> to run the following CPU-hungry SQL statement on 8 vCPU machines (using a scale of 1,000 and 16 jobs) and had some more interesting results straight away. </p><pre><code class="language-SQL">SELECT count(*) FROM (SELECT generate_series(1,10000000)) a
</code></pre>
<p>Timescale was almost twice as fast, returning an average query latency of <strong>518 ms</strong>, while RDS returned <strong>904 ms</strong>. This 50 % difference was consistent on both 4 vCPU and 8 vCPU instances.</p><p>Unfortunately, we can’t look inside the black box that is RDS to see what’s happening here. One hypothesis is that a large part of this difference is because Timescale gives you the exact amount of vCPU you provision <strong>for PostgreSQL</strong> (thanks, Kubernetes!), while Amazon RDS provides you a <strong>host with that many vCPUs</strong>. </p><p>This means that we (Timescale) pay for the operating overhead on Timescale, while on RDS, you (as the user) pay for this. As instances get very busy and processes fight with the operating system for CPU (like for an ingest benchmark or when you’re crunching a lot of data), this becomes a much bigger advantage for Timescale than we had anticipated. As usual, if anybody has any other possible reasons for this difference, please reach out, we’d love to hear from you.</p><p>Our benchmark shows Timescale not only ingests data faster across the board but also provides more predictable and faster results under heavy CPU load. Not a bad feature when you want to get the most out of your instances.</p><h2 id="query-performance-comparison">Query Performance Comparison</h2><p>Query performance is something that needs to be optimized in a time-series database. When you ask for data, you often need to have it as quickly as possible—especially when you’re powering a real-time dashboard. TSBS has a wide range of queries, each with its own somewhat hard-to-decode description (you can find a <a href="https://github.com/timescale/tsbs#appendix-i-query-types-">quick primer here</a>). We ran each query 100 times on the 4 vCPU instance types (which wasn’t quick in some cases) and recorded the results. </p><p>When we look at the table of query runtimes, we can see a clear story. 
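</p><p>As a reminder, the one query difference was bucketing: Timescale uses <code>time_bucket()</code>, while the stock PostgreSQL queries get the same result with extract-based epoch arithmetic. A minimal illustration (column names follow the TSBS <code>cpu</code> schema; the exact RDS rewrite is illustrative):</p><pre><code class="language-SQL">-- Timescale: arbitrary-width buckets
SELECT time_bucket('5 minutes', time) AS bucket, avg(usage_user)
FROM cpu GROUP BY bucket;

-- Stock PostgreSQL: the same five-minute buckets via epoch arithmetic
SELECT to_timestamp(floor(extract(epoch FROM time) / 300) * 300) AS bucket,
       avg(usage_user)
FROM cpu GROUP BY bucket;
</code></pre><p>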
Timescale is consistently faster than Amazon RDS, often by more than 100x. In some cases, Timescale performs over 350x better, and it doesn’t perform worse for any query type. The table below shows the data for 4 vCPU instances, but results are similar across all the CPU types we tested (and of course, if your instance is very busy, you could get even better results).</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-table.png" class="kg-image" alt="A table of the median query times in the benchmark between Timescale Cloud and RDS for time-series data" loading="lazy" width="1352" height="1824" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-table.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-table.png 1000w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-table.png 1352w" sizes="(min-width: 720px) 720px"></figure><p>When we examine the amount of data loaded and processed by some of the queries with the larger differences, the reason behind these improvements becomes clear. Timescale compresses data into a <a href="https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database" rel="noreferrer">columnar</a> format, which has several impacts on performance:</p><ol><li>Timescale’s compressed chunks group data by column, not by row. When only a subset of a table’s columns is required, they can be loaded individually, reducing the amount of data processed (especially for the <code>single-groupby-</code> query types).</li><li>When compressed data is loaded from disk, it takes less time, as there is simply less data to read. 
This is traded off against additional compute cycles to uncompress the data—a compromise that works in our favor, as you can see in the results above.</li><li>As compressed data is smaller, more of it can be cached in shared memory, meaning even fewer reads from disk (for a great introduction to this, check out <a href="https://timescale.ghost.io/blog/database-scaling-postgresql-caching-explained/">Database Scaling: PostgreSQL Caching Explained</a> by our own Kirk Roybal).</li></ol><p>And just as a reminder, RDS had pg_partman configured for this test. This shows that while Timescale provides efficient partitioning via hypertables, we also provide a lot more than that (353x more in some instances).</p><h2 id="storage-usage-comparison">Storage Usage Comparison</h2><p>Total storage size is measured at the end of the TSBS ingest cycle, looking at the size of the database into which TSBS has been ingesting data. For this benchmark on Timescale, all but the most recent partition of data is compressed into our <a href="https://timescale.ghost.io/blog/building-columnar-compression-in-a-row-oriented-database/">native columnar format</a>, which uses best-in-class algorithms, including Gorilla and delta-of-delta, to reduce the storage footprint for the CPU table dramatically. </p><p>After compression, you can still access the data as usual, but you get the benefits of it being both smaller and columnar.</p><p>Using less storage can mean smaller volumes, lower cost, and faster access (as we saw in the query results above). In the case of this benchmark, we saved 95 %, reducing our database from 159 GB to 8.6 GB. 
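</p><p>If you want to verify numbers like these on your own hypertables, TimescaleDB exposes the before-and-after sizes directly (the hypertable name here is illustrative):</p><pre><code class="language-SQL">SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes) AS after
FROM hypertable_compression_stats('cpu');
</code></pre><p>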
And this isn’t an outlier; we often see these numbers for production workloads from real customers.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-total-cpu-table-size.png" class="kg-image" alt="A diagram of the total database size in the benchmark between Timescale Cloud and RDS for time-series data" loading="lazy" width="1666" height="1090" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-total-cpu-table-size.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-total-cpu-table-size.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/06/2023-06-08-total-cpu-table-size.png 1600w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-total-cpu-table-size.png 1666w" sizes="(min-width: 720px) 720px"></figure><h2 id="beyond-benchmarks-a-closer-look-at-timescale">Beyond Benchmarks: A Closer Look at Timescale</h2><p>Now that we’ve examined the results of the benchmark, let’s briefly explore some of the features that make these results possible. 
This section aims to offer insight into the performance comparison above and highlight some other aspects of Timescale that will improve your developer experience when working with time-series data.</p><div class="kg-card kg-callout-card kg-callout-card-grey"><div class="kg-callout-emoji">✨</div><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">If you’re new to Timescale, you can also </em></i><a href="https://console.cloud.timescale.com/signup"><i><em class="italic" style="white-space: pre-wrap;">sign up for free </em></i></a><i><em class="italic" style="white-space: pre-wrap;">and follow our </em></i><a href="https://docs.timescale.com/getting-started/latest/create-hypertable/"><i><em class="italic" style="white-space: pre-wrap;">Getting Started guide</em></i></a><i><em class="italic" style="white-space: pre-wrap;">, which will introduce you to our main features in a hands-on way.</em></i></div></div><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-chart.png" class="kg-image" alt="A diagram with the benefits of Timescale vs. 
Amazon RDS for time-series data" loading="lazy" width="1952" height="948" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-chart.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-chart.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2023/06/2023-06-08-chart.png 1600w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-chart.png 1952w" sizes="(min-width: 720px) 720px"></figure><h2 id="hypertables-continuous-aggregates-and-query-planner-improvements-for-performance-at-scale">Hypertables, continuous aggregates, and query planner improvements for performance at scale</h2><p>Timescale is purpose-built to provide features that handle the unique demands of time-series, analytics, and event workloads—and as we’ve seen earlier in this post, performance at scale is one of the most challenging aspects to achieve with a vanilla PostgreSQL solution. </p><p>To make PostgreSQL more scalable, we built features like <a href="https://docs.timescale.com/use-timescale/latest/hypertables/about-hypertables/" rel="noreferrer">hypertables</a> and added query planner improvements allowing you to seamlessly partition tables into high-performance chunks, ensuring that you can load and query data quickly. </p><p>While some other solutions force you to think about creating and maintaining data partitions, Timescale does this for you under the hood as data comes in, with no performance impact. In fact, some of Timescale’s improvements work on tables that don’t even hold time-series data, like SkipScan, which <a href="https://timescale.ghost.io/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/">dramatically improves <code>DISTINCT</code> queries on any PostgreSQL table</a> with a matching B-tree index.</p><p>Another problem that comes with time-series data at scale is slow aggregate queries as you analyze or present data. 
<a href="https://timescale.ghost.io/blog/how-we-made-data-aggregation-better-and-faster-on-postgresql-with-timescaledb-2-7/">Continuous aggregates</a> let you take an often run or costly time-series query and incrementally materialize it in the background, providing real-time, up-to-date results in seconds or milliseconds rather than minutes or hours. </p><p>While this might sound similar to a materialized view, it not only reduces the load on your database but also takes into account the most recent inserts and doesn’t require any management once it’s configured.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2024/07/Timescale-vs.-Amazon-RDS-for-PostgreSQL_comparison-chart.png" class="kg-image" alt="A table of PostgreSQL vs. Timescale Cloud for our time-series data benchmark" loading="lazy" width="1800" height="2066" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2024/07/Timescale-vs.-Amazon-RDS-for-PostgreSQL_comparison-chart.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2024/07/Timescale-vs.-Amazon-RDS-for-PostgreSQL_comparison-chart.png 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2024/07/Timescale-vs.-Amazon-RDS-for-PostgreSQL_comparison-chart.png 1600w, https://timescale.ghost.io/blog/content/images/2024/07/Timescale-vs.-Amazon-RDS-for-PostgreSQL_comparison-chart.png 1800w" sizes="(min-width: 720px) 720px"></figure><h2 id="hyperfunctions-job-scheduling-and-user-defined-functions-to-build-faster">Hyperfunctions, job scheduling, and user-defined functions to build faster</h2><p>Once you have time-series data loaded, Timescale also gives you the tools to work with it, offering over 100 built-in <a href="https://timescale.ghost.io/blog/how-to-write-better-queries-for-time-series-data-analysis-using-custom-sql-functions/">hyperfunctions</a>—custom SQL functions that simplify complex time-series analysis, such as <a 
href="https://docs.timescale.com/api/latest/hyperfunctions/time-weighted-averages/">time-weighted averages</a>, <a href="https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/locf/">last observation carried forward</a>, <a href="https://docs.timescale.com/api/latest/hyperfunctions/downsample/">downsampling with LTTB or ASAP algorithms</a>, and bucketing by hour, minute, month, or timezone with <a href="https://docs.timescale.com/api/latest/hyperfunctions/time_bucket/">time_bucket()</a> and <a href="https://docs.timescale.com/api/latest/hyperfunctions/gapfilling-interpolation/time_bucket_gapfill/">time_bucket_gapfill()</a>.</p><p>We also provide a <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/user-defined-actions/">built-in job scheduler</a>, which saves the effort of installing and managing another PostgreSQL extension and lets you schedule and monitor any SQL snippet or database function.</p><h2 id="direct-access-to-a-global-expert-support-team-to-assist-you-in-production">Direct access to a global, expert support team to assist you in production</h2><p>If you’re running your database in production, having direct access to a team of database experts will lift a heavy weight off your shoulders. Timescale gives all customers access to a <a href="https://timescale.ghost.io/blog/how-were-raising-the-bar-on-hosted-database-support/">world-class team of technical support</a> engineers at no extra cost, encouraging discussion on any time-series topic, even if it’s not directly related to Timescale operations. You might want some help with ingest performance, tuning advice for a tricky SQL query, or best practices on setting up your schema—we are here to help.</p><p>As a comparison, <a href="https://aws.amazon.com/premiumsupport/pricing/">deeply consultative support, general guidance, and best practices start at over $5,000 per month</a> in Amazon RDS for PostgreSQL. 
Lower tiers get only a community forum or general guidance. In other words, comparable support costs an extra $60,000 a year on AWS, while Timescale includes it at no extra cost.</p><h2 id="native-columnar-compression-and-object-storage-for-cost-efficiency">Native columnar compression and object storage for cost efficiency</h2><p>Cost is one of the major factors when choosing any cloud database platform, and Timescale provides multiple ways to keep your spending under control.</p><p>Timescale's best-in-class <a href="https://docs.timescale.com/timescaledb/latest/overview/core-concepts/compression/">native compression</a> allows you to compress time-series data in place while still retaining the ability to query it as normal. Compressing data in Timescale often results in savings of 90 % or more (take another look at our benchmark results, which actually saw a 95 % storage footprint reduction).</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-compression-chart.png" class="kg-image" alt="A diagram of the Timescale Cloud compression advantages for time-series data in our benchmark vs. RDS" loading="lazy" width="1346" height="332" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-compression-chart.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-compression-chart.png 1000w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-compression-chart.png 1346w" sizes="(min-width: 720px) 720px"></figure><p>Timescale also includes built-in features to manage <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/data-retention/">data retention</a>, making it easy to implement data lifecycle policies, which remove data you don’t care about quickly, easily, and without impacting your application. 
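</p><p>Setting up such a policy is a one-liner (the retention window here is illustrative):</p><pre><code class="language-SQL">-- Automatically drop chunks once all of their data is older than 90 days
SELECT add_retention_policy('cpu', INTERVAL '90 days');
</code></pre><p>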
You can combine data retention policies with continuous aggregates to automatically downsample your data according to a schedule.</p><p>To help reduce costs even further, Timescale offers <a href="https://timescale.ghost.io/blog/expanding-the-boundaries-of-postgresql-announcing-a-bottomless-consumption-based-object-storage-layer-built-on-amazon-s3/">bottomless, consumption-based object storage</a> built on Amazon S3 (currently in private beta). Providing access to an object storage layer from within the database itself enables you to seamlessly tier data from the database to S3, store an unlimited amount of data, and pay only for what you store. All the while you retain the ability to query data in S3 from within the database via standard SQL.</p><h2 id="it%E2%80%99s-just-postgresql">It’s just PostgreSQL</h2><p>Last but not least, Timescale is just PostgreSQL under the hood. Timescale supports full SQL (not SQL-like or SQL-ish). You can leverage the full breadth of drivers, connectors, and extensions in the vibrant PostgreSQL ecosystem—if it works with PostgreSQL, it works with Timescale! 
</p><p>If you switch from Amazon RDS for PostgreSQL to Timescale, you won’t lose any compatibility: your application will operate the same as before (but it will probably be faster, as we’ve shown).</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-timescale-and-postgres.png" class="kg-image" alt="The PostgreSQL and Timescale logos together: Timescale Cloud is just PostgreSQL for time-series data" loading="lazy" width="1346" height="386" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2023/06/2023-06-08-timescale-and-postgres.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2023/06/2023-06-08-timescale-and-postgres.png 1000w, https://timescale.ghost.io/blog/content/images/2023/06/2023-06-08-timescale-and-postgres.png 1346w" sizes="(min-width: 720px) 720px"></figure><h2 id="conclusion">Conclusion</h2><p>When you have time-series data, you need a database that can handle time-series workloads. While Amazon RDS for PostgreSQL provides a great cloud PostgreSQL experience, our benchmarks have shown that even when paired with the pg_partman extension to provide partition management, it can’t compete with Timescale. According to our tests, Timescale can be over 40% faster to ingest data, up to 350x faster for queries, and takes 95% less space to store data when compressed. </p><p>On top of these findings, we offer a rich collection of time-series features that weren’t used in the benchmark. You can speed queries up even further by incrementally pre-computing responses with continuous aggregates, benefit from our job scheduler, configure retention policies, use analytical hyperfunctions, speed up your non-time-series queries with features like Skip Scan, and so much more. </p><p>If you have time-series data, don’t wait until you hit that performance wall to give us a go. 
Spin up an account now: <a href="https://console.cloud.timescale.com/signup">you can use it for free for 30 days; no credit card required</a>.</p><h3 id="further-reading">Further reading</h3><ul><li><a href="https://timescale.ghost.io/blog/estimating-rds-costs/" rel="noreferrer">Estimating RDS Costs</a></li><li><a href="https://timescale.ghost.io/blog/understanding-rds-pricing-and-costs/" rel="noreferrer">Why Is RDS so Expensive?</a></li><li><a href="https://timescale.ghost.io/blog/alternatives-to-rds/" rel="noreferrer">Alternatives to RDS</a></li><li><a href="https://timescale.ghost.io/blog/amazon-aurora-vs-rds-understanding-the-difference/" rel="noreferrer">Amazon Aurora vs. RDS</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[OpenTelemetry: Where the SQL Is Better Than the Original]]></title>
            <description><![CDATA[How does OpenTelemetry differ from previous observability tools? And can these differences open a new path to an old friend as a unified query language for telemetry data? ]]></description>
            <link>https://www.tigerdata.com/blog/opentelemetry-where-sql-is-better-than-the-original</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/opentelemetry-where-sql-is-better-than-the-original</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Monitoring & Alerting]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Wed, 25 May 2022 11:14:00 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2022/05/jorgen-haland-4yOgRb_b_i4-unsplash--1-.jpg">
            </media:content>
            <content:encoded><![CDATA[<p><em>This blog post was originally published at TFiR on May 2, 2022.</em></p><p>OpenTelemetry is by now a familiar term to those who work in the cloud-native landscape. Two years after the first beta was released, it still maintains a large and incredibly active community, second only to Kubernetes among Cloud Native Computing Foundation (CNCF) projects. </p><p>For those who aren’t so familiar, OpenTelemetry was born out of the need to provide a unified front for instrumenting code and collecting observability data—a framework that can be used to handle metrics, logs, and traces in a consistent manner, while still retaining enough flexibility to model and interact with other popular approaches (such as Prometheus and StatsD).</p><p>This article explores how OpenTelemetry differs from previous observability tools and how that point of difference opens up the potential for bringing back an old friend as the query language across all telemetry data.</p><h2 id="observability%E2%80%94then-and-now">Observability—Then and Now</h2><p>At a high level, the primary difference between OpenTelemetry and the previous generation of open-source observability tooling is one of scope. OpenTelemetry doesn’t focus on one particular signal type, and it doesn’t offer any storage or query capabilities. Instead, it spans the entire area that an application needing instrumentation cares about—the creation and transmission of signals. The benefit of this change in approach is that OpenTelemetry can offer developers a complete experience: one API and one SDK per language, offering common concepts across metrics, logs, and traces. 
When developers need to instrument an app, they only need to use OpenTelemetry.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/05/20220422_OpenTelemetry-Kubecon-article-v2.0-2.jpg" class="kg-image" alt="" loading="lazy" width="1800" height="1176" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/05/20220422_OpenTelemetry-Kubecon-article-v2.0-2.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/05/20220422_OpenTelemetry-Kubecon-article-v2.0-2.jpg 1000w, https://timescale.ghost.io/blog/content/images/size/w1600/2022/05/20220422_OpenTelemetry-Kubecon-article-v2.0-2.jpg 1600w, https://timescale.ghost.io/blog/content/images/2022/05/20220422_OpenTelemetry-Kubecon-article-v2.0-2.jpg 1800w" sizes="(min-width: 720px) 720px"></figure><p>On top of that promise, OpenTelemetry can take streams of signals and transform them, enrich them, aggregate them, or route them, interfacing with any backend that implements the OpenTelemetry specification. This opens up a host of new deployment possibilities—a pluggable storage provider per signal (Prometheus, Jaeger, and Loki, maybe), a unified storage provider for all of them, two subsets of metrics to two different backends, or everything being sent out of a Kubernetes cluster to an external endpoint.</p><p>Personally, the appeal of OpenTelemetry is very real to me—gathering telemetry data from a Kubernetes cluster using a single interface feels much more natural than maintaining multiple signal flows, and potentially operators and custom resource definitions (CRDs) along with them. 
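</p><p>As a sketch of that routing flexibility, a minimal OpenTelemetry Collector configuration might fan metrics and traces out to two different backends (the endpoints below are placeholders, not real services):</p><pre><code>receivers:
  otlp:
    protocols:
      grpc:

exporters:
  prometheusremotewrite:   # metrics to a Prometheus-compatible store
    endpoint: https://metrics.example.com/api/v1/write
  otlp/traces:             # traces to an OTLP-speaking backend
    endpoint: traces.example.com:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheusremotewrite]
    traces:
      receivers: [otlp]
      exporters: [otlp/traces]
</code></pre><p>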
When I think back to the pain points of getting signals out of applications and into dashboards, one of my main issues was consistently the fractured landscape of creating, discovering, and consuming telemetry data.</p><h2 id="opentelemetry-and-the-query-babel-tower">OpenTelemetry and the Query Babel Tower</h2><p>When discussing OpenTelemetry, the question of querying signals soon comes up. It’s amazing we now have the ability to provide applications with a single interface for instrumentation, but what about when the time comes to use that information?</p><p>If we store our data in multiple silos with separate query languages, all the value we gained from shared context, linking, and common attributes is lost. Because these languages have been developed (and are still being developed) for a single signal, they reinforce the silo approach. PromQL can query metrics, but it can’t reach out to logging or tracing data. It becomes clear that a solution to this problem is needed if the promise of OpenTelemetry is to be realized from a consumption perspective.</p><p>As it stands today, open-source solutions to this problem have mostly been offered via a user interface. For example, Grafana allows you to click between traces and metrics that have been manually linked, correlating them via time—but this soon starts to feel a bit limited.</p><h2 id="a-new-promise">A New Promise</h2><p>OpenTelemetry promises tagged attributes that could be used to join instrumentation, and rich linkages between all signals. So what is the query equivalent of what OpenTelemetry promises? A unified language that can take inputs from systems that provide storage for OpenTelemetry data and allow rich joins between different signal types. </p><p>This language would need to be multi-purpose, as it needs to be able to express common queries for metrics, traces, and logs. 
Ideally, it could also express one type of signal as another when required—the rate of entries with a type of ERROR showing up in a log stream, or a trace based on the time between metric increments. </p><p>So, what would this language look like? It needs to be a well-structured query language that can support multiple types of signal data; it needs to be able to express domain-specific functionality for each signal; it needs to support both complex and straightforward joins between data, and it needs to return data that the visualization layer can present. Other tools need to support it, too. And hopefully, not just observability tools—integration with programming languages and business intelligence solutions would be perfect. </p><p>Designing such a language is not easy. While the simplicity of PromQL is great for most metric use cases, adding on trace and log features would almost certainly make that experience worse. Having three similar languages (one for each signal) that could be linked together by time and attributes at query time is a possibility, but while PromQL is a de facto standard, it seems unlikely that LogQL (Grafana Loki’s PromQL-inspired query language for logs) will show up in other products. And, at the time of writing, traces don’t have a common language. Sure, we could develop those three interfacing languages, but do we need to?</p><h2 id="why-sql">Why SQL?</h2><p>Before working with observability data, I was in the open-source database world. I think we can learn something from databases here by adopting the lingua franca of data analytics: SQL. 
Somehow, it has been pushed to the bottom of our programming-language toolkit but is coming back strong due to the increasing importance of data for decision-making.</p><p>SQL is truly a language that has stood the test of time:</p><ul><li>It’s a well-defined standard built for modeling relationships and then analyzing data.</li><li>It allows easy joins between relations and is used in many, many data products.</li><li>It is supported in all major programming languages, and if tooling supports external query languages, it’s a good bet it will support SQL as one of them.</li><li>And finally, developers <em>understand</em> SQL. While it can be a bit more verbose than something like PromQL, it won’t need any language updates to support traces and metrics in addition to logs—it just needs a schema defined that models those relationships.</li></ul><p>Despite all this, SQL is a language choice that often raises eyebrows. It’s not typically a language favored by cloud technologies and DevOps, and with the rise in the use of object-relational mapping libraries (ORMs), which abstract SQL away from developers, it’s often ignored. But, if you need to analyze different sets of data that have something in common—so they can be joined, correlated, and compared together—you use SQL. </p><p>Where before we dealt with metrics, logs, and traces in different (and usually intentionally simple) systems with no commonalities, today’s systems are becoming progressively more complex and require correlation. SQL is a perfect choice for this; in fact, this is what SQL was designed to do. 
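</p><p>To make that concrete, here’s an illustrative query (the <code>spans</code> and <code>logs</code> tables and their columns are hypothetical, not a real schema) that correlates the slowest recent traces with their error logs in a single join:</p><pre><code>-- Hypothetical schema:
--   spans(trace_id, name, start_time, duration_ms)
--   logs(trace_id, time, level, message)
SELECT s.trace_id, s.name, s.duration_ms, l.time, l.message
FROM spans s
JOIN logs l ON l.trace_id = s.trace_id
WHERE l.level = 'ERROR'
  AND s.start_time > now() - INTERVAL '1 hour'
ORDER BY s.duration_ms DESC
LIMIT 10;
</code></pre><p>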
It even lets us be sure that we can correlate data from outside our observability domain with our telemetry—all of a sudden, we would have the ability to pull in reference data and enrich our signals beyond the labels we attach at creation time.</p><p>At Timescale, we are convinced that a single, consistent query layer is the correct approach—and are investing in developing <a href="https://timescale.ghost.io/blog/important-news-about-promscale/" rel="noreferrer">Promscale</a>, a scalable backend for all signal data that supports SQL as its native language. Whatever the solution is, we are looking forward to being able to query seamlessly across all our telemetry data, unlocking the full potential of OpenTelemetry.</p>]]></content:encoded>
        </item>
    </channel>
</rss>