<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[Tiger Data Blog]]></title>
        <description><![CDATA[Insights, product updates, and tips from TigerData (Creators of TimescaleDB) engineers on Postgres, time series & AI. IoT, crypto, and analytics tutorials & use cases.]]></description>
        <link>https://www.tigerdata.com/blog</link>
        <image>
            <url>https://www.tigerdata.com/icon.ico</url>
            <title>Tiger Data Blog</title>
            <link>https://www.tigerdata.com/blog</link>
        </image>
        <generator>RSS for Node</generator>
        <lastBuildDate>Tue, 07 Apr 2026 10:14:26 GMT</lastBuildDate>
        <atom:link href="https://www.tigerdata.com/blog" rel="self" type="application/rss+xml"/>
        <ttl>60</ttl>
        <item>
            <title><![CDATA[Implementing ASOF Joins in PostgreSQL and Timescale]]></title>
            <description><![CDATA[Read our step-by-step guide to implement ASOF joins in PostgreSQL and Timescale, and learn how to supercharge your queries with some Timescale magic.]]></description>
            <link>https://www.tigerdata.com/blog/implementing-asof-joins-in-timescale</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/implementing-asof-joins-in-timescale</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[James Blackwood-Sewell]]></dc:creator>
            <pubDate>Thu, 15 Jun 2023 14:10:00 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2023/06/caspar-camille-rubin-fPkvU7RDmCo-unsplash--1-.jpg">
            </media:content>
            <content:encoded><![CDATA[<h2 id="what-is-an-asof-join">What Is an <code>ASOF</code> Join?</h2><p>An <code>ASOF</code> (or "as of") join is a type of join operation used when analyzing two sets of time-series data. It essentially matches each record from one table with the nearest—but not necessarily equal—value from another table based on a chosen column. Oracle supports this out of the box using a non-standard SQL syntax, but unfortunately, PostgreSQL does not provide a built-in <code>ASOF</code> keyword.</p><p>The chosen column needs to have some concept of range for the <code>ASOF</code> operation to work. You may think of it as being the "closest value," but not exceeding the comparison. It works for string (alphabetical), integer (ordinal), float (decimal), and any other data type that has an idea of ORDER. Because timestamps are near and dear to our hearts at Timescale, we will demonstrate with time and date columns.</p><div class="kg-card kg-callout-card kg-callout-card-grey"><div class="kg-callout-emoji">✨</div><div class="kg-callout-text">Want to understand <a href="https://www.timescale.com/learn/postgresql-join-type-theory">how the PostgreSQL parser picks a join method or join types</a>? Check out this article!</div></div><p><br></p><p>Performing this operation in PostgreSQL takes a bit of effort. This article aims to delve deeper into <code>ASOF</code>-style joins and how to implement similar functionality in PostgreSQL by subselecting data or other join types.</p><h2 id="understanding-asof-joins">Understanding <code>ASOF</code> Joins</h2><p><code>ASOF</code> joins are a powerful tool when dealing with time-series data. 
In simple terms, an ASOF join will, for each row in the left table, find a corresponding single row in the right table where the key value is less than or equal to the key in the left table.</p><p>This is a common operation when dealing with financial data, sensor readings, or <a href="https://timescale.ghost.io/blog/time-series-data/" rel="nofollow noopener noreferrer ugc">other types of time-series data where readings might not align perfectly by timestamp</a>.</p><p>For a simple example, consider the real-world question, "What was the temperature yesterday at this time?" It is very unlikely that a temperature reading was taken yesterday at exactly the millisecond that the question is asked today. What we really want is "What was the temperature taken yesterday up to today's time stamp?"</p><p>This simple example becomes a lot more complex when we start comparing temperatures day over day, week over week, etc.</p><h2 id="implementing-asof-joins-in-timescale">Implementing <code>ASOF</code> Joins in Timescale</h2><p>Even though PostgreSQL does not directly support <code>ASOF</code> joins, you can achieve similar functionality using a combination of SQL operations. Here's a simplified step-by-step guide:</p><h3 id="step-1-prepare-your-data">Step 1: Prepare your data</h3><p>Ensure your data is in the correct format for the <code>ASOF</code> join. You'll need a timestamp or other monotonically increasing column to use as a key for the join.</p><p>Suppose you have two tables, <code>bids</code> and <code>asks</code>, each containing a timestamp column, and you want to join them by instrument and the nearest timestamp.</p><pre><code class="language-sql">CREATE TABLE bids (
    instrument text,
    ts TIMESTAMPTZ,
    value NUMERIC
);
--
CREATE INDEX bids_instrument_ts_idx ON bids (instrument, ts DESC);
CREATE INDEX bids_ts_idx ON bids (ts);
--
CREATE TABLE asks (
    instrument text,
    ts TIMESTAMPTZ,
    value NUMERIC
);
CREATE INDEX asks_instrument_ts_idx ON asks (instrument, ts DESC);
CREATE INDEX asks_ts_idx ON asks (ts);
--
</code></pre><p>Normally you'd make both these tables into hypertables with the <code>create_hypertable</code> function (because you're a super educated Timescale user), but in this case, we aren't going to, as we won't be inserting much data (and we also have some Timescale magic to show off 🪄).</p><h3 id="step-2-insert-some-test-data">Step 2: Insert some test data</h3><p>Next, we'll create data for four instruments, <code>AAA</code>, <code>BBB</code>, <code>NZD</code>, and <code>USD</code>.</p><pre><code class="language-sql">INSERT INTO bids (instrument, ts, value)
SELECT 
   -- random 1 of 4 instruments
  (array['AAA', 'BBB', 'NZD', 'USD'])[floor(random() * 4 + 1)::int], 
   -- timestamp of last month plus some seconds
  now() - interval '1 month' + g.s, 
   -- random value
  random()* 100 +1
FROM (
  -- 2,592,000 seconds (30 days) in a month
  SELECT ((random() * 2592000 + 1)::text || ' s')::interval s 
  FROM generate_series(1,3000000)) g;
INSERT INTO asks (instrument, ts, value)
SELECT 
   -- random 1 of 4 instruments
  (array['AAA', 'BBB', 'NZD', 'USD'])[floor(random() * 4 + 1)::int], 
   -- timestamp of last month plus some seconds
  now() - interval '1 month' + g.s, 
   -- random value
  random()* 100 +1
FROM (
  -- 2,592,000 seconds (30 days) in a month
  SELECT ((random() * 2592000 + 1)::text || ' s')::interval s 
  FROM generate_series(1,2000000)) g;
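
-- Optional sanity check: confirm row counts and timestamp spread
-- per instrument before moving on.
SELECT instrument, count(*) AS n, min(ts), max(ts)
FROM bids
GROUP BY instrument;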
</code></pre><h3 id="step-3-query-the-data-using-a-sub-select">Step 3: Query the data using a sub-select</h3><p>To mimic the behavior of an <code>ASOF</code> join, use a correlated sub-select along with conditions to match rows based on your criteria. This will run the sub-query once per row returned from the outer query. We need to use the <code>DISTINCT ON</code> clause to limit the number of rows returned to one.</p><p>This will work in vanilla Postgres, but when we are using Timescale (even though we aren't using hypertables yet), we get the benefits of a Skip Scan, which will supercharge the query (for more information on this, check our <a href="https://docs.timescale.com/use-timescale/latest/query-data/skipscan/" rel="nofollow noopener noreferrer ugc">docs</a> or <a href="https://timescale.ghost.io/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/">blog post about how Skip Scan can give you an 8,000x speed-up</a>).</p><pre><code class="language-sql">SELECT bids.ts timebid, bids.value bid,
    (SELECT DISTINCT ON (asks.instrument) value ask
    FROM asks
    WHERE asks.instrument = bids.instrument
    AND asks.ts &lt;= bids.ts
    ORDER BY instrument, ts DESC) ask
FROM bids
WHERE bids.ts &gt; now() - interval '1 week'
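--
-- An alternative formulation (shown commented out; the plan below is
-- for the query above): a LATERAL join can replace the correlated
-- sub-select, answering each lookup with a single scan of
-- asks_instrument_ts_idx.
--
-- SELECT bids.ts AS timebid, bids.value AS bid, a.value AS ask
-- FROM bids
-- LEFT JOIN LATERAL (
--     SELECT value
--     FROM asks
--     WHERE asks.instrument = bids.instrument
--       AND asks.ts &lt;= bids.ts
--     ORDER BY asks.ts DESC
--     LIMIT 1) a ON true
-- WHERE bids.ts &gt; now() - interval '1 week';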
</code></pre><pre><code class="language-sql">                              QUERY PLAN                                                                               
-------------------------------------------------------------------------
 Index Scan using bids_ts_idx on public.bids  
    (cost=0.43..188132.58 rows=62180 width=56) 
    (actual time=0.067..1700.957 rows=57303 loops=1)
   Output: bids.instrument, bids.ts, bids.value, (SubPlan 1)
   Index Cond: (bids.ts &gt; (now() - '7 days'::interval))
   SubPlan 1
     -&gt;  Unique  (cost=0.43..2.71 rows=5 width=24) 
                (actual time=0.027..0.029 rows=1 loops=57303)
           Output: asks.value, asks.instrument, asks.ts
           -&gt;  Custom Scan (SkipScan) on public.asks  
                  (cost=0.43..2.71 rows=5 width=24) 
                  (actual time=0.027..0.027 rows=1 loops=57303)
                 Output: asks.value, asks.instrument, asks.ts
                 -&gt;  Index Scan using asks_instrument_ts_idx on public.asks  
                        (cost=0.43..15996.56 rows=143152 width=24) 
                        (actual time=0.027..0.027 rows=1 loops=57303)
                       Output: asks.value, asks.instrument, asks.ts
                       Index Cond: ((asks.instrument = bids.instrument) 
                          AND (asks.ts &lt;= bids.ts))
 Planning Time: 1.231 ms
 Execution Time: 1703.821 ms

</code></pre><h3 id="conclusion">Conclusion</h3><p>While PostgreSQL does not have an <code>ASOF</code> keyword, it does offer the flexibility and functionality to perform similar operations. When you're using Timescale, things only get better with enhancements like Skip Scan.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Fix Transaction ID Wraparound Exhaustion]]></title>
            <description><![CDATA[Learn more about transaction ID wraparound failure and how to avoid it in PostgreSQL databases. It involves treating your database as a house: turn on your Roomba, a.k.a. autovacuum.]]></description>
            <link>https://www.tigerdata.com/blog/how-to-fix-transaction-id-wraparound</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/how-to-fix-transaction-id-wraparound</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[PostgreSQL Tips]]></category>
            <dc:creator><![CDATA[Kirk Laurence Roybal]]></dc:creator>
            <pubDate>Wed, 10 May 2023 16:45:49 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2023/05/How-to-fix-transaction-ID-wraparound.jpg">
            </media:content>
            <content:encoded><![CDATA[<p>Timescale employs <a href="https://timescale.ghost.io/blog/how-to-manage-a-commitfest/">PostgreSQL contributors</a> who are working feverishly to mitigate a problem with PostgreSQL. That problem is commonly referred to as “transaction ID wraparound” and stems from design decisions in the PostgreSQL project that have been around for decades.<br></p><p>Because this design decision was made so early in the project history, it affects all branches and forks of PostgreSQL, with Amazon RDS PostgreSQL, Greenplum, Netezza, Amazon Aurora, and many others suffering from transaction ID wraparound failures.<br></p><p>In this article, the second in a <a href="https://timescale.ghost.io/blog/how-to-fix-no-partition-of-relation-found-for-row/">series of posts tackling PostgreSQL errors or issues</a>, we’ll explain what transaction ID wraparound is, why it fails, and how you can mitigate or resolve it. But let’s start with a bit of PostgreSQL history.</p><h2 id="transaction-id-wraparound-xid-wraparound">Transaction ID Wraparound (XID Wraparound)</h2><p>To fully understand the problem of transaction ID wraparound (or XID wraparound), a bit of history is in order. The idea of a transaction counter in PostgreSQL originated as a very simple answer to transaction tracking. We need to know the order in which transactions are committed to a PostgreSQL database, so let's enumerate them. What is the simplest way to give transactions a concept of order? That would be a counter. What is the simplest counter? An integer. Tada!   <br></p><p>So, form follows function, and we have an integer counter. Seems like an obvious, elegant, and simple solution to the problem, doesn't it?<br></p><p>At first glance (and second and third, honestly), this rather simple solution stood up very well. Who would ever need more than 2<sup>31</sup> (just over 2 billion) transactions in flight? 
That was an astronomical number for 1985.</p><p>Since this is such a huge number, we should only need a single counter for the entire database cluster. That will keep the design simple, prevent the need to coordinate multiple transaction counters, and allow for efficient (just four bytes!) storage. We simply add this small counter to each row, and we know exactly what the high watermark is for every row version in the entire cluster.</p><p>This simple method is row-centric and yet cluster-wide. So, our backups are easy (we know exactly where the pointer is for the entire cluster), and the data snapshot at the beginning and end of our transaction is stable. We can easily tell within the transaction if the data has changed underneath us from another transaction.   <br></p><p>We can even play peek-a-boo with other transaction data in flight. That lets us ensure that transactions settle more reasonably, even if we are wiggling the loose electrical connectors of our transaction context a bit.<br></p><p>We can stretch that counter quite a bit by making it a ring buffer. That is, instead of overflowing at the top, the counter wraps back to the beginning, so the value after 2<sup>31</sup> is 1 again. So, our counter can wrap around the top (2<sup>31</sup>) and only becomes problematic when it reaches the oldest open transaction at the bottom.   </p><p>This "oldest" transaction is an upwardly moving number also, which then wraps around the top. So, we have the head (current transaction) chasing the tail (oldest transaction) around the integer, with 2,147,483,648 spaces from the bottom to the top. 
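</p><p>The wraparound-aware comparison can be sketched in SQL (an illustration of the signed circular distance idea, not PostgreSQL's internal code): an ID just below the top of the range counts as <em>older</em> than a small ID that has already wrapped.</p><pre><code class="language-sql">-- "a precedes b" iff the signed distance from b to a, wrapped into
-- the range +/- 2^31, is negative.
SELECT CASE WHEN diff &gt;= 2147483648 THEN diff - 4294967296
            WHEN diff &lt; -2147483648 THEN diff + 4294967296
            ELSE diff
       END &lt; 0 AS a_precedes_b   -- true: 4294967290 is "older" than 5
FROM (VALUES (4294967290::bigint - 5)) AS d(diff);
</code></pre><p>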
This makes our solution even look like a puppy dog, so now it's cute as well as elegant.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2023/05/dachshund-cricle.gif" class="kg-image" alt="A dog chasing his tail on top of a record player—the perfect representation of transaction ID wraparound" loading="lazy" width="480" height="360"></figure><p>The idea is that this would make the counter <strong>almost</strong> infinite, as the head should never catch the tail. At that point, who could possibly need more transactions than that? Brilliant!</p><p>Transaction counters are obviously the way to go here. They just make everything work so elegantly.</p><h2 id="explanation-the-plan-in-action">Explanation: The Plan in Action</h2><p>For many years, PostgreSQL raged forward with the XID wraparound transaction design. Quite a few features were added along the way that were based on this simple counter. Backups (<code>pg_basebackup</code> and its cousins), replication (both physical and logical), indexes, visibility maps, autovacuum, and defragmentation utilities all sprouted up to enhance and support this central concept.<br></p><p>All of these things worked together brilliantly for quite some time. We didn't start seeing the stress marks in the fuselage until the hardware caught up with us. As much as PostgreSQL wants to turn a blind eye to the reality of the hardware universe, the time came upon us when systems had the capacity to create more than 2<sup>31</sup> transactions at a time.   
<br></p><p>High-speed ETL, queuing systems, IoT, and other machine-generated data could actually keep the system busy long enough that the counter could be pushed to its inherent limit.</p><div class="kg-card kg-callout-card kg-callout-card-grey"><div class="kg-callout-text"><i><em class="italic" style="white-space: pre-wrap;">"Everybody has a plan until I punch them in the face."</em></i> —Mike Tyson</div></div><p></p><p></p><p>These processes weren't exactly showstoppers, though. We came up with band-aids for much of it.   <br></p><p><code>COPY</code> got its own transaction context, reducing the load significantly. So did <code>VACUUM</code>.  <code>VACUUM</code> sprouted the ability to just freeze transactions without having to do a full row cleanup. That made the tail move forward a bit more quickly. External utilities gained features, individual tables gained <code>VACUUM</code> settings so they could be targeted separately.<br></p><p>Okay, that helped. But did it help enough? These features were never designed to fundamentally fix the issue. The issue is that size matters. But to be a bit more descriptive...</p><h2 id="possible-causes">Possible Causes</h2><h3 id="how-big-is-big">How big is big?</h3><p>In the early aughts, I was involved in building a data center for a private company. We spent some $5M creating a full-featured data center, complete with Halon, a 4K generator, Unisys ES7000, and a Clarion array. For the sake of our XID wraparound article, I'll focus on the Clarion array. It cost just a bit over $2M and could hold 96 drives for a whopping total of 1.6 TB! In 2002, that was incredible.<br></p><p>It doesn't seem so incredible now, does it? Kinda disappointing even. A few weeks ago, I needed an additional drive for my home backup unit. As I was walking through a Costco, I absent-mindedly threw a 2 TB drive into my cart that retailed for $69. 
It wasn't until I got home and was in the middle of installing it that it dawned on me how far we've come in the storage industry.<br></p><p>Some of the young whippersnappers don't even care about storage anymore. They think the "cloud" storage is effectively infinite. <a href="https://timescale.ghost.io/blog/scaling-postgresql-with-amazon-s3-an-object-storage-for-low-cost-infinite-database-scalability/">They're not wrong</a>.<br></p><p>To bring this around to PostgreSQL, tables with 2M rows were a big deal in 2002. Now that's not even on the radar of "big data." A VLDB (very large database) at the time was 2 TB. Now it's approaching 1 PB.<br></p><p>"A lot" of transactions in 2002 was 2M. Now, I would place that number at somewhere around 2B. Oops. Did I just say 2B? Isn't that close to the same number I said a few paragraphs ago was the limit of our transaction space? Let me see, that was 2<sup>31</sup>, which is 2,147,483,648. </p><p>Ouch.</p><h2 id="how-to-resolve-transaction-id-wraparound-failure">How to Resolve Transaction ID Wraparound Failure</h2><p>To be fair, not everybody has this problem. 2,147,483,648 is still a really big number, so a fairly small number of systems will ever reach this limit, even in the transaction environment of 2023.  </p><p>It also represents the number of transactions that are currently in flight, as the autovacuum process will latently brush away transaction counters that are no longer visible to the postmaster (<code>pg_stat_activity</code>).  But if the number of phone calls to consultants is any indication, this limitation is nonetheless becoming quite an issue. It certainly isn't going away any time soon.<br></p><p>Everybody in the PostgreSQL ecosystem is painfully aware of the limitation. This problem affects more than just the core of PostgreSQL, it affects all of the systems that have grown around it also. Do you know what it also affects? 
All the PostgreSQL-based databases, such as Amazon RDS and Aurora.<br></p><p>To make any changes to the core of PostgreSQL, all of the ramifications of those changes have to be thought out in advance. Fortunately, we have a whole community of people (some of them proudly part of our own organization) that are really, really good at thinking things out in advance.</p><p><strong>Query to show your current transaction ages:</strong></p><pre><code class="language-sql">with overridden_tables as (
  select
    pc.oid as table_id,
    pn.nspname as scheme_name,
    pc.relname as table_name,
    pc.reloptions as options
  from pg_class pc
  join pg_namespace pn on pn.oid = pc.relnamespace
  where reloptions::text ~ 'autovacuum'
), per_database as (
  select
    coalesce(nullif(n.nspname || '.', 'public.'), '') || c.relname as relation,
    greatest(age(c.relfrozenxid), age(t.relfrozenxid)) as age,
    round(
      (greatest(age(c.relfrozenxid), age(t.relfrozenxid))::numeric *
      100 / (2 * 10^9 - current_setting('vacuum_freeze_min_age')::numeric)::numeric),
      2
    ) as capacity_used,
    c.relfrozenxid as rel_relfrozenxid,
    t.relfrozenxid as toast_relfrozenxid,
    (greatest(age(c.relfrozenxid), age(t.relfrozenxid)) &gt; 1200000000)::int as warning,
    case when ot.table_id is not null then true else false end as overridden_settings
  from pg_class c
  join pg_namespace n on c.relnamespace = n.oid
  left join pg_class t ON c.reltoastrelid = t.oid
  left join overridden_tables ot on ot.table_id = c.oid
  where c.relkind IN ('r', 'm') and not (n.nspname = 'pg_catalog' and c.relname &lt;&gt; 'pg_class')
    and n.nspname &lt;&gt; 'information_schema'
  order by 3 desc)
SELECT *
FROM per_database;
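--
-- A quicker cluster-wide check: the age of each database's datfrozenxid
-- shows how far that database is from the 2^31 wraparound horizon.
select datname,
       age(datfrozenxid) as xid_age,
       round(age(datfrozenxid) * 100.0 / 2147483648, 2) as pct_of_limit
from pg_database
order by xid_age desc;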
</code></pre>
<p><a href="https://gitlab.com/postgres-ai/postgres-checkup/-/blob/master/resources/checks/F002_autovacuum_wraparound.sh"><em>Adapted from Postgres-Checkup</em> </a></p><p>Many enhancements have already been made to PostgreSQL to mitigate the transaction ID wraparound problem and solve it permanently. Here are the steps on the way to the solution.</p><ul><li>The PostgreSQL system catalogs have already been enhanced to a 64-bit (eight-byte) transaction ID.</li><li>The functions and procedures of PostgreSQL have been expanded to 64-bit transaction ID parameters and outputs.</li><li>The backends (query worker processes) can deal with 64-bit transaction IDs.</li><li>Work has been done on the utilities of PostgreSQL (such as <code>pg_basebackup</code>) that previously assumed 32-bit integer transactions.</li><li>Replication, <code>VACUUM</code>, and other processes have been enhanced for 64-bit transactions.</li><li>A lot of other "stuff." Many smaller incidental fixes that were based on 32-bit assumptions needed modification.</li></ul><p>The goal of all of these changes is to eventually move to a 64-bit transaction counter for the entire system.</p><h3 id="where-do-we-go-from-here">Where do we go from here?</h3><p>There's a bit of bad news. I'm going to close my eyes while I write this, so I won't have to look at your face while you read it.<br></p><p>Updating the user tables in your database to use 64-bit transaction counters will require rewriting all of your data. Remember at the beginning, where I said the transaction counter was a per-row solution? Oh, yeah.   <br></p><p>That means that its limitations are also per row. There are only four bytes reserved for <code>xmin</code> and four bytes for <code>xmax</code> in the row header. 
So, every single row of data in the database is affected.<br></p><p>At some point, there will be a major version of PostgreSQL that requires a data dump, replication, <code>pg_upgrade</code> or another such process to re-create every row in the database in the new format. It is true that every major version of PostgreSQL <em>could</em> change the format of data on disk. <br></p><p>The <code>pg_upgrade</code> utility will not be able to use symlinks or hardlinks for the upgrade. These links usually allow for some efficiency while upgrading. There will be no such shortcuts when the "fix" for transaction ID wraparound is put into place.<br></p><p>Okay, now for the good news. We will all be in retirement (if not taking a dirt nap) when the next bunch of <s>suckers</s> engineers has to deal with this issue again. 2<sup>63</sup> is not double the number of transactions. It is 9,223,372,034,707,292,160 (nine quintillion) more.</p><h3 id="what-to-do-while-youre-waiting-for-infinity">What to do while you're waiting for infinity</h3><p>You can still make use of some basic mitigation strategies for transaction ID wraparound failures:</p><ul><li>Make the autovacuum process more aggressive to keep up with maintaining the database.</li><li>Use custom settings to make the autovacuum process more aggressive for the most active tables.</li><li><a href="https://www.postgresql.org/docs/current/app-vacuumdb.html">Schedule vacuumdb</a> to do additional vacuuming tasks for PostgreSQL to catch up faster.</li><li>Vacuum the <code>TOAST</code> tables separately so the autovacuum has a better chance of catching up.</li><li><code>REINDEX CONCURRENTLY</code> more frequently so that the autovacuum has less work to do.</li><li><code>CLUSTER ON INDEX</code> will re-order the data in the table to the same order as an index, thus "vacuuming" the table along the way.</li><li><code>VACUUM FULL</code>, which blocks updates while vacuuming but will finish without interruption. Let me say that again. 
There will be no writes while <code>VACUUM FULL</code> is running, and you can't interrupt it. 😠</li><li>Switch over to a secondary. The transaction counter will be reset to one when the system is restarted. (There are no transactions in flight, are there? 😄)</li><li>Use batching for  <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code> operations.  The counter is issued per transaction (not row), so grouping operations helps reserve counters.</li></ul><p>All of these strategies are basically the same thing. The objective is to ensure the tail number (oldest transaction) moves forward as quickly as possible. This will prevent you from ending up in a "transaction ID wraparound" scenario. <strong>🙂 ♥️ 👍</strong></p><h2 id="documentation-and-resources">Documentation and Resources</h2><ul><li>Check out the PostgreSQL documentation on <a href="https://www.postgresql.org/docs/15/routine-vacuuming.html">routine vacuuming</a> to prevent transaction ID wraparound failures.</li><li>The Timescale Docs also <a href="https://docs.timescale.com/mst/latest/troubleshooting/">troubleshoot transaction ID wraparound exhaustion</a>.</li></ul><h2 id="how-timescale-can-help">How Timescale Can Help</h2><p>While Timescale—also built on the rock-solid foundation of PostgreSQL— does not solve transaction ID wraparound failure, it can help you prevent it since our ingestion inherently batches the data by design after you<a href="https://docs.timescale.com/use-timescale/latest/ingest-data/about-timescaledb-parallel-copy/"> install <code>timescaledb-parallel-copy</code></a>.<br></p><p>Of course, you can do this for yourself with transaction blocks, but our tools will do the right thing automatically.<br></p><p>We also provide a <a href="https://timescale.ghost.io/blog/the-postgresql-job-scheduler-you-always-wanted-but-be-careful-what-you-ask-for/">general-purpose job scheduler</a> that can be useful for adding <code>VACUUM</code> and <code>CLUSTER</code> operations.<br></p><p>So, 
if you want to mitigate the chances of ever dealing with XID wraparound problems while enjoying <a href="https://timescale.ghost.io/blog/postgresql-timescaledb-1000x-faster-queries-90-data-compression-and-much-more/">superior query performance and storage savings compared to vanilla PostgreSQL</a> or <a href="https://timescale.ghost.io/blog/timescale-cloud-vs-amazon-rds-postgresql-up-to-350-times-faster-queries-44-faster-ingest-95-storage-savings-for-time-series-data/">Amazon RDS for PostgreSQL</a>, try Timescale. <a href="https://console.cloud.timescale.com/signup">Sign up now </a>(30-day free trial, no credit card required) for fast performance, seamless user experience, and the best compression ratios.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The PostgreSQL Job Scheduler You Always Wanted (Use it With Caution)]]></title>
            <description><![CDATA[We created a job scheduler built into PostgreSQL with no external dependencies. This is the power you always wanted, but with a few caveats.]]></description>
            <link>https://www.tigerdata.com/blog/the-postgresql-job-scheduler-you-always-wanted-but-be-careful-what-you-ask-for</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/the-postgresql-job-scheduler-you-always-wanted-but-be-careful-what-you-ask-for</guid>
            <category><![CDATA[Cloud]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Job Scheduler]]></category>
            <category><![CDATA[Announcements & Releases]]></category>
            <dc:creator><![CDATA[Kirk Laurence Roybal]]></dc:creator>
            <pubDate>Thu, 19 Jan 2023 16:28:52 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2023/10/Screenshot-2023-10-12-at-6.22.47-PM.png">
            </media:content>
            <content:encoded><![CDATA[<p>As a PostgreSQL guy, I really have to wonder why a built-in job scheduler is not a part of the core PostgreSQL project. It is one of the most requested features in the history of ever. Yet, somehow, it just isn’t there.</p><p>Essentially, a job scheduler is a process that kicks off in-database functions and procedures at specified times and runs them independently of user sessions. The benefits of having a scheduler built into the database are obvious: no dependencies, no inherent security leaks, fits in your existing high availability plan, and takes part in your data recovery plan, too.</p><p>The <a href="https://www.postgresql.org/">PostgreSQL Global Development Group</a> has been debating for years about including a built-in job scheduler. Even after the addition of background processes that would support the feature (<a href="https://www.postgresql.org/about/news/postgresql-96-released-1703/">all the way back in 9.6</a>), background job scheduling is unfortunately not a part of core PostgreSQL.</p><p>So being the PostgreSQL lovers we are at <a href="https://www.timescale.com" rel="noreferrer">Timescale</a>, <strong>we decided to build such a scheduler</strong> so that our users and customers can benefit from a job scheduler in PostgreSQL. In TimescaleDB 2.9.1, we extended it to allow you to schedule jobs with flexible intervals and <a href="https://docs.timescale.com/api/latest/informational-views/job_errors/">provide you with better visibility of error logs</a>.</p><p>The flexible intervals enable you to determine whether the next run of the job occurs based on the scheduled clock time or the end of the last job run. And by “better visibility” of the job logs, we mean that they are also being logged to a table where they can be queried internally. 
These were extended to prevent overlapping job executions, provide predictable job timing, and provide better forensics.</p><p>We extensively use the advantage of this internal scheduler for our core features, enabling us to defer <a href="https://timescale.ghost.io/blog/allowing-dml-operations-in-highly-compressed-time-series-data-in-postgresql/" rel="noreferrer">compression</a>, <a href="https://docs.timescale.com/use-timescale/latest/data-retention/about-data-retention/" rel="noreferrer">data retention</a>, and refreshing of continuous aggregates to a background process (among other things).</p><p><em>📝 Editor's note: </em><a href="https://www.timescale.com/learn/is-postgres-partitioning-really-that-hard-introducing-hypertables" rel="noreferrer"><em>Learn more about how TimescaleDB's hypertables enable all these features above as a PostgreSQL extension, plus other awesome things like automatic partitioning. </em></a><em>  </em></p><p>This scheduler makes Timescale much more responsive to the caller and results in more efficient processing of these tasks. For our own benefit, the job scheduler needs to be internal to the database. It also needs to be efficient, controllable, and scale with the installation.</p><p>We made all this power available to you as a PostgreSQL end user. If you're running PostgreSQL on your own hardware, you can <a href="https://docs.timescale.com/self-hosted/latest/install/" rel="noreferrer">install the TimescaleDB extension</a>. If you're running in AWS, <a href="https://console.cloud.timescale.com" rel="noreferrer">you can try our platform for free</a>. </p><h2 id="the-postgresql-job-scheduler-debate">The PostgreSQL Job Scheduler Debate</h2><p>But not so fast. Before you start rejoicing, let’s review the reasons that the PostgreSQL Global Development Group chose not to include a scheduler in the database—it'll be educational and serve as a word of caution. 
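</p><p>Before we get to that debate, here is roughly what using the scheduler looks like once TimescaleDB is installed (a sketch based on the <code>add_job</code> API; the procedure name and body are invented for illustration):</p><pre><code class="language-sql">-- A job is a procedure taking (job_id int, config jsonb):
CREATE OR REPLACE PROCEDURE note_the_time(job_id int, config jsonb)
LANGUAGE plpgsql AS $$
BEGIN
  RAISE NOTICE 'job % ran at %', job_id, now();
END;
$$;

-- Ask the scheduler to run it every hour:
SELECT add_job('note_the_time', '1 hour');
</code></pre><p>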
</p><p>Rather than rehashing the whole discussion on the subject, let's summarize the obstacles that came up in the <a href="https://www.postgresql.org/list/pgsql-hackers/">mailing list</a>: </p><p><strong>PostgreSQL is multi-process, not multi-thread.</strong> This simple fact makes having a one-to-one relationship of processes to user-defined tasks a fairly heavy implementation issue. Under normal circumstances, PostgreSQL expects to lay a process onto a CPU (affinity), load the memory through the closest non-uniform memory access (NUMA) controller, and do some fairly heavy data processing. </p><p>This works great when the expectation is that the process will be very busy the majority of the time. Schedulers do not work like that. They sit around with some cheap threads waiting to do something for the majority of the life of the thread. Just the context switching alone would make using a full-blown process very expensive.</p><p><strong>Background worker processes are a relatively small pool by design.</strong> This has a lot to do with the previous paragraph, but also with the fact that each process allocates its prescribed memory at startup. So, these processes compete with SQL query workers for CPU and memory. And the background processes have priority over both resources since they are allocated at system startup.</p><p><strong>The next issue is more semantic.</strong> There are quite a few external schedulers available. Each one of them has a different implementation of the time management system. That is, there is a question about just how exactly the job should be invoked. Should it be invoked again if it is still running from the last time? Should the job be started again based on clock time or relative to the previous job run? From the beginning or the end of the last run? </p><p>There are quite a few more questions of this nature, but you get the idea. 
No matter how the community answers these questions, somebody will complain that the implementation is the wrong answer because <code>&lt;insert silly mathematician answer here&gt;</code>.</p><h2 id="why-we-still-need-a-postgresql-job-scheduler">Why We Still Need a PostgreSQL Job Scheduler</h2><p>Timescale doesn't have the luxury of debating how many angels can dance on the head of a pin. As a database service working with large volumes of data in PostgreSQL, we face a hard requirement of background maintenance for archival, compression, and general storage management. Timescale's core features, excluding <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/hyperfunctions/">hyperfunctions</a>, depend on the job scheduler.</p><p>But rather than create a bespoke scheduler for our own purposes, we built a general-purpose scheduler with a public application programming interface.</p><p>This scheduler is generally available as part of TimescaleDB. You may use it to set a schedule for anything you can express as a procedure or function. In PostgreSQL, that's a huge advantage because you have the full power of the <a href="https://www.tigerdata.com/blog/top-8-postgresql-extensions" rel="noreferrer">PostgreSQL extension</a> system at your disposal. This list includes plug-in languages, which allow you to do anything the operating system can do.</p><p>Timescale assumes that the developer/administrator is a sane and reasonable person who can deal with a balance of complexity. That is longhand for "we trust you to do the right thing."</p><h2 id="with-great-power-comes-great-responsibility">With Great Power Comes Great Responsibility</h2><p>So, let's talk first about a few best design practices for using the Timescale (PostgreSQL) built-in job scheduler.</p><ol><li><strong>Keep it short.</strong> A job that dwells in its background process for a long time ties up a scarce worker and invites concurrent pile-ups. 
You are also using a process shared by other system tasks such as sorting and sequential scans.</li><li><strong>Keep it unlocked.</strong> Try to minimize the number of exclusive locks you create while your job runs.</li><li><strong>Keep it down.</strong> The processes that you are using are shared by the system, and you are competing for resources with SQL query worker processes. Keep that in mind before you kick off hundreds or thousands of scheduled jobs.</li></ol><p>Now, assuming we are using the product fairly and judiciously, we can move on to the features and benefits of having an internal scheduler.</p><h2 id="built-in-postgresql-job-scheduler-all-the-nice-stuff">Built-In PostgreSQL Job Scheduler: All the Nice Stuff</h2><p>Now that we've covered the things that demand caution, here's a list of some of the benefits of using this scheduler: </p><ul><li>Physical streaming replication will also replicate the job schedule. When you go to switch over to your replica, everything will already be there.</li><li>You don't need a separate high-availability plan for your scheduler. If the system is alive, so are your scheduled jobs.</li><li>The jobs can report on their own success or failure to internal tables and the <a href="https://www.tigerdata.com/learn/what-is-audit-logging-and-how-to-enable-it-in-postgresql" rel="noreferrer">PostgreSQL log</a> file.</li><li>The jobs can do administrative functions like dropping tables and changing table structure by monitoring the existing needs and structures.</li><li>When you install Timescale, it's already there.</li></ul><p>📝<em> Editor's note: Quick reminder that you can </em><a href="https://docs.timescale.com/self-hosted/latest/install/" rel="noreferrer"><em>install the TimescaleDB extension</em></a><em> if you're running your own PostgreSQL database, or </em><a href="https://console.cloud.timescale.com" rel="noreferrer"><em>sign up for the Timescale platform</em></a><em> (free for 30 days). 
</em></p><h2 id="how-the-job-scheduler-works">How The Job Scheduler Works</h2><p>There is <a href="https://docs.timescale.com/use-timescale/latest/jobs/" rel="noreferrer">a quick introductory article in the Timescale documentation</a>. Click that link if you want more detailed information.</p><p>The TL;DR version is that you make a <a href="https://www.tigerdata.com/learn/understanding-postgresql-user-defined-functions" rel="noreferrer">PostgreSQL function</a> or procedure and then call the <code>add_job()</code> function to schedule it. Of course, you can remove it from the schedule using… Wait for it... <code>delete_job()</code>.</p><p>That's it. Really. All that power is at your fingertips, and all you need to know is two function signatures.</p><p>Something to be aware of while you're using the scheduler is that the job may be scheduled to repeat from the end of the last run or from the scheduled clock time (in TimescaleDB 2.9.1 and beyond). This allows you to ensure that the previous job has completed (by picking from the end of the run) or that the job executes at a prescribed time (making job completion your responsibility). </p><p>If you feel a bit homesick and just want to look at your adorable job, there's also:</p><pre><code class="language-SQL">SELECT * FROM timescaledb_information.jobs;
</code></pre>
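<p>To make this concrete, here is a minimal sketch of scheduling a job of your own (the procedure name and schedule are illustrative, not from the docs). A user-defined action takes the job ID and an optional JSONB config:</p><pre><code class="language-SQL">-- A hypothetical maintenance action. Every TimescaleDB job action
-- receives the job ID and an optional JSONB config.
CREATE OR REPLACE PROCEDURE purge_stale_sessions(job_id INT, config JSONB)
LANGUAGE plpgsql AS $$
BEGIN
  DELETE FROM sessions WHERE last_seen &lt; now() - INTERVAL '30 days';
END
$$;

-- Schedule it to run every hour; add_job() returns the new job's ID.
SELECT add_job('purge_stale_sessions', '1h');

-- Unschedule it again (1234 stands in for the ID returned above).
SELECT delete_job(1234);
</code></pre>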
<p>And, of course, for completeness, there's always <code>alter_job()</code> for rescheduling, renaming, etc.</p><p>Once your job has been created, it becomes the responsibility of the job scheduler to invoke it at the proper time. The job scheduler is a PostgreSQL background process. It wakes up every 10 seconds and checks to see if any job is scheduled in the near future.</p><p>If such a job is queued up, it will request another background process from the PostgreSQL master process. The database system will supply one (provided any are available). The provided process becomes responsible for the execution of your job.</p><p>This basic operation has some ramifications. We have already mentioned that we need to use these background processes sparingly for resource allocation reasons. Also, there are only a few of them available. The maximum parallel count of background processes is determined by <a href="https://www.tigerdata.com/blog/timescale-parameters-you-should-know-about-and-tune-to-maximize-your-performance" rel="noreferrer"><code>max_worker_processes</code></a>. If you need help configuring TimescaleDB background workers, <a href="https://docs.timescale.com/use-timescale/latest/configuration/advanced-parameters/#timescaledbmax_background_workers-int" rel="noreferrer">check out our documentation</a>.</p><p><a href="https://timescale.ghost.io/blog/timescale-parameters-you-should-know-about-and-tune-to-maximize-your-performance/" rel="noreferrer"><em>📝 You can also check out this blog post on tuning TimescaleDB parameters. </em></a></p><p>On my system (Kubuntu 22.04.1, PostgreSQL 14.6), the default is 43. That number is just an example, as the package manager for each distribution of PostgreSQL has discretion about the initial setting. Your mileage <strong>will</strong> vary.</p><p>Changing this parameter requires a restart, so you will need to make a judgment call about how many concurrent processes you expect to kick off. 
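</p><p>You can check the relevant ceilings on your own system (the values you see will be whatever your installation was configured with):</p><pre><code class="language-SQL">-- Total background worker slots available to PostgreSQL.
SHOW max_worker_processes;

-- Workers TimescaleDB may use for scheduled jobs and policies.
SHOW timescaledb.max_background_workers;
</code></pre><p>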
Add that to this base number and restart your system. Of course, a reasonable number has been added for you in Timescale. Remember the CPU and memory limitations while you are making this adjustment.</p><h2 id="what-to-do-with-a-postgresql-job-scheduler-a-few-ideas">What to Do With A PostgreSQL Job Scheduler: A Few Ideas </h2><p>The original reasons for creating this scheduler involve building out-of-the-box features for data management. That includes <a href="https://docs.timescale.com/timescaledb/latest/overview/core-concepts/compression/">compression</a>, <a href="https://docs.timescale.com/timescaledb/latest/overview/core-concepts/continuous-aggregates/">continuous aggregates</a>, <a href="https://docs.timescale.com/timescaledb/latest/overview/core-concepts/data-retention/">retention policy implementation</a>, <a href="https://docs.timescale.com/api/latest/hyperfunctions/downsample/">downsampling</a>, and backfilling.</p><p>You may want to use this for event notifications, sending an email, clustered index maintenance, partition creation, pruning, archiving, refreshing materialized views, or summarizing data somewhere to avoid the need for triggers. These are just a few of the obvious ideas that jump into my consciousness. You can literally do anything that the operating system allows.</p><h2 id="what-not-to-do">What <strong>Not</strong> to Do </h2><p>This would be a bad place to gum up the lock table. That is, be sure that whatever you do here is done in a concurrent manner.</p><p><code>REINDEX INDEX CONCURRENTLY</code> is better than <code>DROP</code> / <code>CREATE INDEX</code>. <code>REFRESH MATERIALIZED VIEW CONCURRENTLY</code> is better than <code>REFRESH MATERIALIZED VIEW</code>. You get it. Use <code>CONCURRENTLY</code>, or design concurrently. 
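</p><p>As a sketch, the lock-friendly variants look like this (the index and view names are hypothetical):</p><pre><code class="language-SQL">-- Rebuilds the index without blocking concurrent writes (PostgreSQL 12+).
REINDEX INDEX CONCURRENTLY conditions_time_idx;

-- Needs a unique index on the materialized view, but readers are not blocked.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_rollup;
</code></pre><p>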
Better yet, do things in a tiny atomic way that takes little time anyway.</p><p>Long-running transactions that create a lot of locks will interfere with the background writer, the planner, and the vacuum processes. If you crank up too many concurrent processes, you may also run out of memory. Please try to schedule everything to run in series. You’ll thank me later.</p><h2 id="well-wishes-to-the-newly-crowned-emperor">Well Wishes to the Newly Crowned Emperor</h2><p>Now you have the power to do anything your little heart desires in the background of PostgreSQL without having any external dependencies. We hope you feel empowered, awed, and a little bit special. We also hope you will use your new powers for good! </p><h2 id="try-the-updated-job-scheduler">Try the Updated Job Scheduler</h2><p>The job scheduler is available in TimescaleDB 2.9.1 and beyond. If you’re self-hosting TimescaleDB, follow the <a href="https://docs.timescale.com/timescaledb/latest/how-to-guides/upgrades/#upgrade-timescaledb">upgrade instructions</a> in our documentation. If you are using the <a href="https://www.timescale.com/cloud" rel="noreferrer">Timescale platform</a>, upgrades are automatic, meaning that you already have the scheduler at your fingertips. </p><h2 id="keep-learning">Keep Learning </h2><p>If this article has inspired you to keep going with your PostgreSQL hacking, <a href="https://www.timescale.com/learn/postgresql-performance-tuning-key-parameters" rel="noreferrer">check out our collection of articles on PostgreSQL fine tuning.</a> </p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Read Before You Upgrade: Best Practices for Choosing Your PostgreSQL Version]]></title>
            <description><![CDATA[PostgreSQL upgrades have been known to be a bit of a controversial issue in the community. In this article, we will take the mystery out of the question of when an upgrade is appropriate and how Timescale allows you to do it as swiftly as possible.]]></description>
            <link>https://www.tigerdata.com/blog/read-before-you-upgrade-best-practices-for-choosing-your-postgresql-version</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/read-before-you-upgrade-best-practices-for-choosing-your-postgresql-version</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <category><![CDATA[Cloud]]></category>
            <dc:creator><![CDATA[Kirk Laurence Roybal]]></dc:creator>
            <pubDate>Fri, 11 Nov 2022 18:35:31 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2022/11/Best-Practices-PostgreSQL-version_Hero--1-.png">
            </media:content>
<content:encoded><![CDATA[<p>PostgreSQL has a long-standing reputation for having a miserable upgrade process. So, when the community heartily recommends that you should upgrade as soon as possible to the latest and greatest PostgreSQL version, it's not really surprising that your heart sinks, your mouth goes dry, and the outright dread of another laborious job takes over.</p><p>It's almost like finishing a long hike or trying to convince somebody that Betamax was better than VHS. Eventually, you just want it to be over so you can take a nap. There's not even any joy about all the new features and speed. It's just too exhausting to generate emotion anymore.</p><p>This blog post will hopefully serve as a guide for when to pull off the old band-aid. That is, when you should upgrade and what PostgreSQL version you should select as a target. By the end of this post, we will introduce you to our best practices for upgrading your PostgreSQL version in Timescale, so you can get over this process of upgrading as quickly and safely as possible.</p><h2 id="when-to-upgrade-postgresql-common-myths">When to Upgrade PostgreSQL: Common Myths</h2><p>The <a href="https://www.linkedin.com/company/postgresql-global-development-group">PostgreSQL Global Development Group</a> has simplified the upgrade process quite a bit with more explicit version numbering. Since there are only two external stimuli, there are only two choices: upgrade the binaries (minor version change) or upgrade the data on disk (major version change).</p><p>The developers of PostgreSQL never really had a plan in mind for when and how to upgrade. This may seem like a harsh statement when tools like <code>pg_upgrade</code> exist, but bear with me. These tools were meant to make upgrades <strong>possible</strong>, not to imply any particular schedule or recommendations for an upgrade plan. 
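</p><p>As a quick orientation, you can read both components of your running version straight from a session (the values in the comments are examples):</p><pre><code class="language-SQL">-- Human-readable version string, e.g., 14.6.
SHOW server_version;

-- Machine-readable: major * 10000 + minor, e.g., 140006.
SHOW server_version_num;
</code></pre><p>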
The actual upgrade implementation was always left as an exercise for the administrator.</p><p>Let's start with some of the community's conventional wisdom and pretend that those ideas were actually a plan of sorts.</p><h3 id="myth-1-%E2%80%9Cupgrade-as-fast-as-possible-every-time%E2%80%9D">Myth 1: “Upgrade as fast as possible, every time”</h3><p>This "plan" is based on the fear of existing bugs. It is a very Rumsfeldian plan that assumes you don't know what the bugs are, but you're certainly better off if they're fixed. This makes for a very aggressive upgrade pace and hopes for a better tomorrow rather than a stable today.</p><h3 id="myth-2-upgrade-when-you-have-to">Myth 2: "Upgrade when you have to"</h3><p>The complete opposite fear-based pseudo-plan is to stick to the existing version—come hell or high water—unless you run into an otherwise unfixable bug that affects your installation. This is based on the idea that the bugs we know are better than the bugs we don't know. Unfortunately, it ignores the bugs you don't even know exist.</p><h3 id="myth-3-%E2%80%9Cupgrade-for-every-minor-version%E2%80%9D">Myth 3: “Upgrade for every minor version”</h3><p>This is the <a href="https://www.postgresql.org/support/versioning/">general recommendation</a> of the PostgreSQL Global Development Group. The general idea is that all software has bugs, and upgrading is better than not upgrading. That is a bit optimistic about how few new bugs get introduced, and it kind of ignores that new features you don’t care about still have to be configured—or else.</p><p>This comes a bit closer to planning than guessing for minor versions, as the minor versions of PostgreSQL do not change the on-disk format; they only change the binaries. These upgrades tend to be super heavy on bug fixes and very low on new features, which is where bugs tend to get introduced. 
It doesn't say anything about bugs you have actually encountered, nor does it say anything about any improvements from which you might be able to benefit.</p><h3 id="myth-4-%E2%80%9Cupgrade-when-you-have-time-to-kill%E2%80%9D">Myth 4: “Upgrade when you have time to kill”</h3><p>Probably the most dangerous plan since you will never have more time in the future and will probably never upgrade. Experience says that this is a completely silly plan that never gets implemented.</p><h3 id="myth-5-%E2%80%9Cupgrade-when-there-are-security-fixes%E2%80%9D">Myth 5: “Upgrade when there are security fixes”</h3><p>Okay, this makes some kind of sense. Unfortunately, it ignores the rest of your installation and puts the application development team into tailspin mode for your DevOps enjoyment. It is the kind of policy you end up with when the DevOps team doesn’t really care about the Apps team.</p><h2 id="when-to-upgrade-postgresql">When to Upgrade PostgreSQL </h2><p>Much of this guide is based on personal experience with PostgreSQL upgrades over the years. In some cases, the old was better than the new, and in others, the other way around. In some cases, the fixes worked immediately. In others, well, not so much.</p><p>Very few hard and fast rules can be drawn when coming up with a plan of this nature, but I'll try to bring the experience to bear in a way that helps to make a decision in the future. That being said, this is a "best practice" based on experience, not a "sure-fire thing."</p><p>As a way to reduce the amount of just sheer subjectivity and opinion around choosing the moment to upgrade, I've taken a look through the release notes of PostgreSQL. In this lookie-look, I've attempted to note where bug fixes occurred and mentally move them back to the version where they were discovered. Unfortunately, this task is also somewhat subjective, as I was not a part of the bug fix development or the bug discovery. 
So these are just educated guesses, but I hope rather good ones.</p><p>Then I looked at the mental list that I had made and thought about whether it matched my personal experience with successful versus unsuccessful upgrades. It (again) seemed a subjectively good indicator of when an upgrade succeeded or failed.</p><p>So, on to the findings.  </p><p>The first thing I noticed in my research is that the biggest upgrade failures were with a new major version containing updates to the <a href="https://www.postgresql.org/docs/current/wal-intro.html">write-ahead log (WAL)</a>. These were most notable for versions 10 and 12.</p><p>Version 10 would make a book by itself. It was a major undertaking, with quite a few subsystem rewrites. In these version upgrades, there were numerous additions to items (like WAL for hash indexes), as well as improvements and changes to the background writer to support structural changes on disk. These major updates introduced the largest number of unintended behaviors, which lasted the longest before being detected and fixed.</p><p>The next most striking failures came from logical replication between 10 and 11.  Of course, logical replication was invented for 10, so there had never been an attempt to use it for production upgrades before. This first use in the field was—how should I put it?—interesting.</p><p>After that, the bugs died down a lot but were never quite gone.</p><h2 id="upgrade-plan">Upgrade Plan</h2><p>Here is my list of questions to ask before an upgrade.</p><p>1. How big is the change? 
Was it a major refactor, and did it involve any of the following?</p><ul><li><strong>Query planner:</strong> minor.</li><li><strong>WAL:</strong> major.</li><li><strong>Background writer: </strong>major.</li><li><strong>Memory, caching, locks, or anything else managed by the parent process:</strong> minor.</li><li><strong>Index engine</strong>: major (or just rebuild all your indexes anyway).</li><li><strong>Replication</strong>: major.</li><li><strong>Logging</strong>: minor.</li><li><strong>Vacuuming</strong>: minor.</li></ul><p>2. Were there any huge performance gains?</p><p>3. Does it include major security fixes?</p><p>4. Are there major built-in function() improvements/enhancements?</p><p>5. Do all of my extensions exist for the new version?</p><p>These are my rules of thumb for whether a new PostgreSQL version is compelling for upgrade. Unfortunately, this still requires some subjective evaluation and a bit of professional knowledge. For instance, just because vacuum is a major feature, it doesn't mean it has ever been a problem with an upgrade. It <strong>could</strong> be, though, and we should look at its major changes with a bit of a wary eye.</p><p>This brings me to my personal procedure that has (so far) followed the above guidelines.</p><ol><li><strong>Upgrade major versions when they reach the minor version .2.</strong> That is, 10.2, 11.2, 12.2, etc. This technique avoids the most egregious bugs introduced in major versions but still allows for staying reasonably close to current.</li><li><strong>Upgrade minor versions as they are available.</strong> Minor upgrades have not created major issues thus far in my personal experience. The speed increases, bug fixes, security patches, and internationalization have been worth the minor risk.</li><li><strong>Upgrade immediately if your version is nearing the five-year mark</strong>. 
<a href="https://www.postgresql.org/support/versioning/" rel="noreferrer">The PostgreSQL Global Development Group releases a new major version every year</a> and supports it for five years after its release. You don't want to be left with an unsupported version.</li><li><strong>Upgrade when the security team tells you to</strong>. It doesn't happen very often, but when it does, it's a major event.</li><li><strong>Upgrade because you need functionality</strong>. Things to upgrade for: <code>CONCURRENTLY</code>, <code>SYSTEM</code>, and performance. Things not to upgrade for: functions(), operators, and libraries.</li></ol><p>That's all there is to it.  </p><p>I hope this blog post has helped you to make a decision for when PostgreSQL has compelling new features for you.</p><p>Of course, this is only a general rule of thumb. If you feel compelled to upgrade for some other reason, don't let my guide tell you what <strong>not</strong> to do. It only intends to help in the absence of any other stimuli for upgrade. You do you.</p><h2 id="i-am-ready-to-upgrade-now-what">I Am Ready to Upgrade. Now, What?</h2><p>So now you have followed the checklist above and determined that it’s time for you to upgrade your PostgreSQL version. If you’re running a production database, this may be easier said than done, especially if we are talking about upgrading your major version (e.g., from PostgreSQL 13 to PostgreSQL 14): </p><ul><li>Minor versions of PostgreSQL (e.g., from PostgreSQL 13 to PostgreSQL 13.2) are always backward compatible with the major version. That means that if you upgrade your production database, it is unlikely that anything is going to break due to the upgrade.</li><li>However, major versions of PostgreSQL are not backward compatible. 
That means that when you upgrade the PostgreSQL version of a database behind a mission-critical application, this may introduce user-facing incompatibilities that might require code changes in your application to avoid breakage.</li></ul><p>Practical example: if you are upgrading from PostgreSQL 13 to 14, in PostgreSQL 14, the factorial operators <code>!</code> and <code>!!</code> are no longer supported, nor is running the factorial function on negative numbers. What may seem like a silly example is, in fact, illustrative of how assumptions made about how certain functions (or even operators) work between versions may break once you update. </p><p>Fortunately, PostgreSQL is awesome enough to provide clear <a href="https://www.postgresql.org/docs/current/release.html">Release Notes</a> stating the changes between versions. But this doesn’t solve our problem: how to upgrade production databases safely? </p><h2 id="timescale-to-the-rescue">Timescale to the Rescue</h2><p>This is one of the many areas in which choosing a cloud database will help. If you are self-hosting your mission-critical PostgreSQL database and want to run a major upgrade, you would first have to create a copy of your database manually, dumping your production data and restoring it in another database with the same config as your production database. </p><p>Then, you would have to upgrade this database and run your testing there. This process can take a while depending on your database's size (and if we’re talking about a time-series application, it’s probably pretty big). </p><p>Timescale makes the upgrading process way more approachable. Timescale is a database cloud for time-series applications built on TimescaleDB and PostgreSQL. In other words, this is PostgreSQL under the hood—with a sprinkle of TimescaleDB as the time-series secret sauce. 
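</p><p>To see how a major-version incompatibility can bite in practice, recall the factorial change mentioned above. A sketch of what you would observe in psql:</p><pre><code class="language-SQL">-- PostgreSQL 13: returns 120.
-- PostgreSQL 14: fails, because postfix operators were removed.
SELECT 5!;

-- The function spelling works on both versions.
SELECT factorial(5);
</code></pre><p>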
</p><p>Timescale databases (which are called “services”) run on a particular version of TimescaleDB and PostgreSQL:</p><ul><li>As a user of Timescale, you don’t have to worry about the TimescaleDB upgrades: they will be handled automatically by the platform during a maintenance window picked by you. These upgrades are backward compatible and nothing you should worry about. They require no downtime.</li><li>The upgrades between minor versions of PostgreSQL are also automatically handled by the platform during your maintenance window. As we mentioned, these upgrades are also backward compatible. However, they require a service restart, which could cause a small amount of downtime (30 seconds to a few minutes) if you do not have a replica. We always alert users of these in advance.</li></ul><div class="kg-card kg-callout-card kg-callout-card-grey"><div class="kg-callout-emoji">✨</div><div class="kg-callout-text"><b><strong style="white-space: pre-wrap;">Editor's Note: </strong></b><i><em class="italic" style="white-space: pre-wrap;">For security reasons, we always run the latest available minor version within a major version on PostgreSQL in Timescale. These minor updates may contain security patches, fixes for data corruption problems, and fixes for frequent bugs—as a managed service provider, we have to store our customers’ data as safely as possible.</em></i></div></div><p>But what about upgrades between major versions of PostgreSQL? Since these are often not backward compatible, we cannot automatically upgrade your service in Timescale from, let’s say, PostgreSQL 13 to 14, which may introduce problems in your code and cause major issues! </p><p>Also, upgrading between major versions of PostgreSQL can (unfortunately but unavoidably) introduce some downtime. If you are running a mission-critical application, you want complete control over <em>when</em> that unavoidable downtime will occur. And you certainly want to test that upgrade first. 
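</p><p>One way to gather a representative workload for that test is PostgreSQL's <code>pg_stat_statements</code> extension (assuming it is installed and preloaded); the query below lists your costliest statements so you can replay them against a test copy:</p><pre><code class="language-SQL">-- Top 10 statements by total execution time
-- (the column is named total_exec_time as of PostgreSQL 13).
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
</code></pre><p>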
</p><p>A database platform like Timescale can certainly help solve this issue. Upgrading your major version of Postgres will always be a decent lift—but a hosted database platform can make this process way smoother, helping you automate what can be automated and also facilitating your testing:</p><ul><li>In Timescale, you can upgrade the PostgreSQL version that’s running on your service by simply clicking a button.</li><li>You can use database forks to test your upgrade safely. Also, by clicking a button, Timescale allows you to create a database fork (a.k.a. an exact copy of your database) which you can then upgrade to estimate the required downtime to upgrade your production instance.</li><li>You can also use forks to test your application changes. Once your fork is upgraded, you can run some of your production queries—you can find some of these using <a href="https://timescale.ghost.io/blog/identify-postgresql-performance-bottlenecks-with-pg_stat_statements/"><code>pg_stat_statements</code></a>—on the fork to ensure they don’t contain any breaking changes to the new major version. </li></ul><p>Let’s explore this more in the next section. If you’re not using Timescale, you can create a <a href="https://console.cloud.timescale.com/signup">free account here</a>—you’ll have free access for 30 days, no credit card required.  </p><h2 id="safely-upgrading-major-postgresql-versions-in-timescale">Safely Upgrading Major PostgreSQL Versions in Timescale </h2><p>Here’s how you can safely upgrade your Timescale service:</p><ul><li>First, fork your service. Timescale allows you to fork (a.k.a. copy) your databases in one click—a fast and cost-effective process. 
You will only be charged when your fork runs, and you can immediately delete it after your testing is complete.</li><li>Now that you have a perfect copy of your production database ready for testing (with the click of a button), it’s time to click another button to tell the platform to upgrade your major PostgreSQL version automatically. You can do this in Timescale—we’ll tell you exactly how in a minute.</li><li>Once the upgrade is complete in your fork, run your tests.</li><li>In order to see how long the upgrade took on the fork, you can go to your metrics tab and check how long your service was unavailable (the grey zone in your CPU and RAM graphs). This will give you an estimate as to how long your primary service will be down when you choose to upgrade it.</li><li>When you’re sure that nothing breaks, you can upgrade your primary service. Make sure to plan accordingly! Upgrading will cause downtime, so make sure you have accounted for that as a part of your upgrade plan. </li></ul><p>Let’s see how this looks in the console. </p><p>First, check which TimescaleDB and PostgreSQL version your database is running on your service Overview page.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/11/Best-practices-upgrade-PostgreSQL_img-1.png" class="kg-image" alt="" loading="lazy" width="1095" height="635" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/11/Best-practices-upgrade-PostgreSQL_img-1.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/11/Best-practices-upgrade-PostgreSQL_img-1.png 1000w, https://timescale.ghost.io/blog/content/images/2022/11/Best-practices-upgrade-PostgreSQL_img-1.png 1095w" sizes="(min-width: 720px) 720px"></figure><p>To fork your service is as easy as going to the Operations tab and clicking on the Fork service option. 
This will automatically create an exact snapshot of your database.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/11/Best-practices-upgrade-PostgreSQL_img2.png" class="kg-image" alt="" loading="lazy" width="1170" height="553" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/11/Best-practices-upgrade-PostgreSQL_img2.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/11/Best-practices-upgrade-PostgreSQL_img2.png 1000w, https://timescale.ghost.io/blog/content/images/2022/11/Best-practices-upgrade-PostgreSQL_img2.png 1170w" sizes="(min-width: 720px) 720px"></figure><p>To upgrade your major version of PostgreSQL, go to your Maintenance tab. Under Service upgrades, you will see a Service upgrades button. If you click that button, your service will be updated to the next major version of Postgres (in the example below, the service would be upgraded from PostgreSQL 13.7 to PostgreSQL 14).</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2022/11/Best-practices-upgrade-PostgreSQL_img3.png" class="kg-image" alt="" loading="lazy" width="964" height="623" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/11/Best-practices-upgrade-PostgreSQL_img3.png 600w, https://timescale.ghost.io/blog/content/images/2022/11/Best-practices-upgrade-PostgreSQL_img3.png 964w" sizes="(min-width: 720px) 720px"></figure><h2 id="your-upgrade-is-complete">Your Upgrade Is Complete</h2><p>That’s it! You can now use the latest and greatest that PostgreSQL has to offer. That said, choosing to upgrade is no small feat. Before going through the upgrade process, there is a lot to consider, and it is important to have a plan to account for the downtime you will experience. </p><p>While the upgrade process can be a bit painful, you can at least rely on Timescale to handle the technical orchestration of the upgrade. 
In the future, we hope to offer even better tooling to make the upgrade process entirely pain-free (but we have to walk before we can run, right?).</p><p><br><br>If you’d like to see what Timescale has to offer, <a href="https://www.timescale.com/timescale-signup">start a free trial if you haven’t already. There’s no credit card required!</a></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Database Scaling: PostgreSQL Caching Explained]]></title>
            <description><![CDATA[Caching is integral to improving PostgreSQL performance. A look at how caching works in PostgreSQL—and how to make it work even better.]]></description>
            <link>https://www.tigerdata.com/blog/database-scaling-postgresql-caching-explained</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/database-scaling-postgresql-caching-explained</guid>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Kirk Laurence Roybal]]></dc:creator>
            <pubDate>Tue, 13 Sep 2022 14:34:51 GMT</pubDate>
            <media:content medium="image" href="https://timescale.ghost.io/blog/content/images/2022/09/caching-explained-timescale.png">
            </media:content>
            <content:encoded><![CDATA[<p>Follow us, friends, as we take a journey backward in time. We're going back to 1990, when soft rock was cool and fanny packs were still okay. But we're not going there to enjoy the music and hang out at the mall. We’re going there to talk about database scaling and PostgreSQL caching. We’re going there because that was the last time PostgreSQL made simple sense—at least when it comes to resource management.</p><p>It was a time when the network was slower than a hard drive, the hard drives were slower than memory, and memory was slower than CPU. Back then, there was no such thing as a file system cache, hard drive cache, or operating system cache. Stuff like the Linux kernel was just a gleam in Linus Torvalds' eye.</p><p>Why are we going there, you might ask? To be honest, because your poor author is a bit lazy. And because it's less likely that you'll be overwhelmed by the description we're about to give you.</p><p>PostgreSQL implemented a strategy to speed up access to data in those special years of clarity and simplicity. The basic idea was simple. Memory is faster than disk, so why not keep some of the most used stuff in memory to speed up retrieval? (Cue the sinister laughter from the future.)</p><p>This improvement has proven far more effective and valuable than the original authors probably envisioned. As PostgreSQL has matured over the years, the shared memory system has matured with it. This most basic idea, what we commonly know as caching, continues to be very useful—in fact, the second most useful thing in PostgreSQL after the working memory for each query. However, the original model is becoming less and less accurate over time, and other factors are becoming more prominent.</p><p>We are going to start at the beginning and then introduce the pesky truth—much in the same way that it blindsided the developers of PostgreSQL caching. As we go along, I can hear the advanced users of PostgreSQL. 
They are saying things like "except when," "unless," and "on the such-and-such platform." Yes, yes. We may or may not get around to your favorite exception to the rule. If we don't, apologies in advance for skipping it in the name of clarity and simplicity. This is not the last article we will ever write (in fact, there are already two more planned in the series, so stay tuned!). Please share your thoughts in the <a href="https://www.timescale.com/forum/c/conversation-community/events-blogs-and-live-streams/12"><u>Timescale Forum blog channel</u></a> and we'll try to get there in the next few go-arounds.</p><p>The good news is that I hope to introduce you to the concepts in a digestible format and pace. The bad news is that caching is a huge problem domain, and it will take a while to introduce you to all those concepts if you want to learn more about database scaling. Keep reading, and the information will get more useful and accurate over time.</p><h2 id="scaling-your-database-a-trip-down-shared-memory-lane">Scaling Your Database: A Trip Down Shared Memory Lane</h2><p>Back to 1990. There were basically two problems to solve to have a practical design for shared memory. The first one is that PostgreSQL is a multi-process system by design, so things happening in parallel processes can (and do) affect each other. The other is that the overhead of managing the memory system can't take more time than it would have just to retrieve the data anyway.</p><p>The good news for the first problem is that we already had a similar problem in the form of file system access. To solve that problem, we used an in-memory locking table.&nbsp;&nbsp;</p><p>Access to the file system is doled out by the postmaster process (the main one that creates all the other processes). Any other PostgreSQL process that wants to access a file has to ask the postmaster "pretty please" first. These locks are maintained in memory associated with the postmaster process. 
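</p><p>Incidentally, you can peek at this shared-memory lock table from any session through the pg_locks system view:</p><pre><code>-- Show currently held and awaited locks across all sessions
SELECT locktype, relation::regclass, mode, granted
FROM pg_locks;</code></pre><p>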
There isn't much reason to maintain them elsewhere because if the main process dies, the database will be unlocked. In other words, all the other processes working on files will be closed, and all locks released.</p><p>These requirements are suspiciously close to the same requirements for shared memory access. For the file system, we call this "locking" or, for memory, "latching."&nbsp; For the Postgres shared memory system, we call them "pins." Pins are very similar to locks but much simpler. All we have to care about is reading or writing to memory. So there are only two types of pins.</p><p>Now that we have the cooperation system down to two actions and a bit of memory, the next issue to solve is finding what you want when you need it. This is a simple matter of a memory scan. In PostgreSQL, the files on disk that store the actual table data are managed already with a page and leaf descriptor.&nbsp;</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://timescale.ghost.io/blog/content/images/2022/09/Caching.jpg" class="kg-image" alt="" loading="lazy" width="1200" height="1240" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2022/09/Caching.jpg 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2022/09/Caching.jpg 1000w, https://timescale.ghost.io/blog/content/images/2022/09/Caching.jpg 1200w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Table files in PostgreSQL</span></figcaption></figure><p>These descriptors are simply an indicator of the location of a row within a database file. 
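</p><p>You can see these descriptors for yourself through the ctid system column, which every table exposes (the table name here is just an example):</p><pre><code>-- Each ctid is a (page, line pointer) pair, e.g. (0,1)
SELECT ctid, * FROM my_table LIMIT 3;</code></pre><p>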
The format of a page is <a href="https://www.postgresql.org/docs/current/storage-page-layout.html"><u>described in the manual</u></a>.</p><p>Curiously, that description also says:</p><blockquote><p><a href="https://github.com/postgres/postgres/blob/master/src/include/storage/bufpage.h"><u>All the details can be found</u></a> in src/include/storage/bufpage.h.</p></blockquote><p>That is a reference to the shared memory code. It turns out that every disk write operation is handled in memory first. The ctid (the page and leaf location the row will eventually occupy in the table's data files) is assigned <strong>before</strong> the data is written to disk.</p><p>That allows the pages in memory to be "looked up" using the same description the file system uses, even if the data hasn't yet been written to the file system. Clever, eh?</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">We could go off on a tangent here about how the journaling system works and why a page could be in the journal (known as the write-ahead log) and memory but not written to the data file system yet. That is a topic for another day. Suffice it to say for now that durability is guaranteed by writing to the journal, so this is fine and dandy. In a future article, we'll also talk about how this buffering of writes acts as a backup for the journal. Again, that's getting ahead of ourselves.</div></div><h3 id="accessing-shared-memory">Accessing shared memory</h3><p>Each connection to the database is handled by a process that the PostgreSQL developers affectionately call a “backend.” This process is responsible for interpreting a query and providing the result. In some cases, it can retrieve that result from the shared memory held by the postmaster process. To access shared memory, we have to ask if the buffer system in the postmaster keeps a copy. 
The postmaster responds with one of two options: </p><ol><li>No, these aren’t the pages you’re looking for.</li><li>Yes, and this is what it might look like.</li></ol><p>"Might" in this case, because we are now beginning to see the effects of the first issue mentioned above. No, don't look back there; we'll repeat it here. The issue is that the processes are also affecting each other. A buffer may change based on any process still in flight acting on it. So, if we want to know that the buffer is valid, we have to read it while we "pin" it.</p><p>The semantics of this are much the same as the file system. Any number of processes may access the buffers for reading purposes. The postmaster simply keeps a running list of these processes. When any process comes along with a write operation and makes a change to the buffer, all of the ones that were reading it get a notice that the contents changed in flight. It is up to each "backend" (process handling connections to the user) to reread the buffer and validate that the data continues to be "interesting."&nbsp; That is, the row still matches the criteria of the query.</p><p>Since the data in shared memory is managed in pages, not rows, the particular row that a query was accessing may or may not have actually changed at all. It may have just had the misfortune of being in roughly the same place as something else that changed. It may have changed, but none of the columns that are a part of the query criteria were affected. It may have changed those columns, but in a way that still matches. Or the row may now no longer be a part of what the query was searching for. This is all up to the parallel processes handling the user query to decide.&nbsp;&nbsp;</p><p>Assuming that the data has made it through this gauntlet, it can be returned to the caller. We can reasonably assume that the row looks exactly like what would have been returned had we looked it up in the file system instead of memory. 
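</p><p>You can watch this cache-versus-disk distinction in practice with EXPLAIN. In the output, "shared hit" counts pages served from shared memory, while "read" counts pages that had to come from the file system (the table name here is illustrative):</p><pre><code>EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE id = 42;

-- Buffers: shared hit=4          every page came from the cache
-- Buffers: shared hit=1 read=3   three pages had to come from disk</code></pre><p>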
Despite having to work with a form of locking and lookup, we also presume that this was cheaper than spinning up a disk and finding and reading the data.</p><h2 id="postgresql%E2%80%99s-shared-memory-the-design-principles">PostgreSQL’s Shared Memory: The Design Principles</h2><p>Now that we know how the basic process of accessing shared data works, let's have a few words about why it was originally designed this way. PostgreSQL is an MVCC (multi-version concurrency control) system—another topic beyond this article's scope to explain. For the moment, we'll condense this to the point of libel. INSERT, DROP, TRUNCATE and SELECT are cheap. UPDATE, DELETE and MERGE are expensive. This is largely due to the tracking system for row updates. And yes, DELETE is considered an UPDATE for tracking purposes.</p><div class="kg-card kg-callout-card kg-callout-card-purple"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">PostgreSQL actually doesn’t UPDATE or DELETE rows. For tracking purposes, it maintains a copy of every version of a row that existed. On UPDATE, it creates a new row, makes the changes in the new row, and marks the new row as current for any future transactions. For DELETE, it just marks the existing row as no longer, well, uh, existing. This is called a tombstone record. It allows all transactions in flight (and future transactions) to know that the row is dead, creating yet another cleanup problem, which we’ll (hopefully) talk about in future articles.</div></div><p>The caching system follows the same coding paradigm to have the same performance characteristics. It is possible in an alternate universe that there is a cheaper solution for caching that provides "better" concurrency. That being said, the overall system is as fast as the slowest part. 
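</p><p>The row-versioning behavior described in the callout above is easy to observe for yourself with the ctid system column (a throwaway table, purely for illustration):</p><pre><code>CREATE TABLE mvcc_demo (id int, val text);
INSERT INTO mvcc_demo VALUES (1, 'original');
SELECT ctid FROM mvcc_demo;   -- (0,1)

UPDATE mvcc_demo SET val = 'changed' WHERE id = 1;
SELECT ctid FROM mvcc_demo;   -- (0,2): a brand-new row version</code></pre><p>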
If the design of the caching system wildly diverged from the file system, the total response to the caller would suffer at the worst-performing points of both systems.</p><p>Also, this system is tightly integrated into the PostgreSQL query planner. A secondary system (of nearly any kind) would likely introduce inefficiencies that are far greater than any benefits would be likely to cover.</p><p>And lastly, the system acts not only as a way to return the most-sought data to the caller but also as a change buffer for data modifying queries. Multiple changes may be made to the same row before being written to the file system. The background writer (responsible for making the journal changes permanent in the data files) is intelligent enough to save the final condition of the row to disk. This added efficiency alone pays for a lot of the complexity of caching.</p><h3 id="cache-eviction">Cache eviction</h3><p>There are a few outstanding things to consider in the design of PostgreSQL caching. The first is that, eventually, <em>something</em> has to come along and evict the pages out of the cache. It can't just grow indefinitely. That <em>something</em> is called the background writer. We know from the design above that PostgreSQL keeps track of which processes find the data interesting. When that list of processes is at zero entries, and the data hasn't been accessed in a while, the background writer will mark the block as "reusable" by putting it in a list called the "free space map." (Yes, this is much the same as the autovacuum process for the file system).&nbsp;&nbsp;</p><p>In the future, the memory space will be overwritten by some other (presumably more active) data. It's the circle of life. Buffer eviction is a garbage collection process with no understanding of what queries are in flight or why the data was put into the buffer. 
It just comes along on a timer and kicks things out that haven't been active in a while.</p><h3 id="forced-eviction-considered-evil">Forced eviction considered evil</h3><p>Also, the backend processes we have already mentioned may decide that they need a lot of memory to do some huge operation and request the postmaster commit everything currently staged to disk. This is called a buffer flush, and it is immediate. It is miserable for performance to get into a position where a buffer flush is necessary. All concurrent processes will halt until the flush is completed and verified. (A horrible statement to make about a concurrent database.)</p><p>The postmaster may decide to flush the buffer cache to respond to some backend process. This is just as horrible as the previous paragraph for the same reasons.</p><h2 id="hey-i-was-using-that">Hey, I Was Using That</h2><p>PostgreSQL is paying attention (by the autovacuum process) to which data blocks are being accessed in the file system. If these accesses reach a threshold, PostgreSQL will read the block from the disk and stick it back in the cache because it seems to answer many questions. This process is blind to the queries that access the data, the eviction process, and anything else, for that matter. It's the old late-night kung-fu flick version of sticking the commercials in there anywhere. There is no rhyme or reason to where these blocks will end up in the buffer memory space, but they seem interesting, so in you go.</p><p>In fact, the blocks in memory are effectively unordered. Because of the process of eviction and restoration, no spatial order is guaranteed (or even implied). This means that a "cache lookup" is effectively reading the page and leaf location for each block every time the cache is accessed. 
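</p><p>If you're curious which blocks are sitting in shared memory right now, the pg_buffercache extension (shipped in contrib) lets you inspect the cache directly. This is a simplified version of the query in the PostgreSQL docs:</p><pre><code>CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Relations with the most pages currently in the cache
SELECT c.relname, count(*) AS cached_pages
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
GROUP BY c.relname
ORDER BY cached_pages DESC
LIMIT 10;</code></pre><p>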
The postmaster holds a list of pages in the cache in order—much like a hash index—with the memory location of each block.&nbsp; As the size of the cache increases, additional lookups in the "cache index" are implied.</p><h2 id="more-postgresql-scaling-and-caching-coming-your-way">More PostgreSQL Scaling and Caching Coming Your Way</h2><p>Now that we have the PostgreSQL caching basics behind us, in the next few articles we can fast forward and explain some of the caveats that have come up along the way:</p><ul><li>We'll explain how improvements in disk and memory have affected the assumptions made in the original design.&nbsp;</li><li>We'll look at the expense of each of the caching operations. We can look at the journaling system and see how it interacts with the caching system.&nbsp;</li><li>We'll examine how valuable caching is today and how it could benefit from improvements already under development.&nbsp;</li><li>We'll look at how to tune caching for better performance and determine how much more performance you're likely to get based on your hardware and software choices.</li></ul><p>Stay tuned; there's a lot more where this article came from.</p><p>In the meantime, if you’re looking to expand your <a href="https://www.tigerdata.com/learn/building-a-scalable-database" rel="noreferrer">database scalability</a>, try our hosted service, <a href="https://www.timescale.com/cloud"><u>Timescale</u></a>. You will get all the PostgreSQL juice with extra features for <a href="https://www.tigerdata.com/blog/time-series-introduction" rel="noreferrer">time series</a> (continuous aggregation, compression, automatic retention policies, hyperfunctions). Plus, a platform with automated backups, high availability, automatic upgrades, and much more. <a href="http://tsdb.co/cloud-signup"><u>You can use it for free for 30 days; no credit card required.</u></a></p>]]></content:encoded>
        </item>
    </channel>
</rss>