<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[Tiger Data Blog]]></title>
        <description><![CDATA[Insights, product updates, and tips from TigerData (Creators of TimescaleDB) engineers on Postgres, time series & AI. IoT, crypto, and analytics tutorials & use cases.]]></description>
        <link>https://www.tigerdata.com/blog</link>
        <image>
            <url>https://www.tigerdata.com/icon.ico</url>
            <title>Tiger Data Blog</title>
            <link>https://www.tigerdata.com/blog</link>
        </image>
        <generator>RSS for Node</generator>
        <lastBuildDate>Tue, 07 Apr 2026 14:08:48 GMT</lastBuildDate>
        <atom:link href="https://www.tigerdata.com/blog" rel="self" type="application/rss+xml"/>
        <ttl>60</ttl>
        <item>
            <title><![CDATA[How to Measure Your IIoT PostgreSQL Table]]></title>
            <description><![CDATA[Learn how to measure your IIoT PostgreSQL table's size, ingest capacity, and query speed with practical SQL queries as your data grows over time.]]></description>
            <link>https://www.tigerdata.com/blog/measure-your-iiot-postgresql-table</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/measure-your-iiot-postgresql-table</guid>
            <category><![CDATA[IoT]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Doug Pagnutti]]></dc:creator>
            <pubDate>Thu, 12 Mar 2026 18:50:42 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2026/03/How_to_measure_banner.png">
            </media:content>
            <content:encoded><![CDATA[<p>I was doing some validation tests for an essay about <a href="https://www.tigerdata.com/blog/the-iiot-postgresql-performance-envelope"><u>the performance envelope for an IIoT PostgreSQL database</u></a> and realized that measuring a database table is not as straightforward as I assumed it would be.</p><p>The general idea was that I would insert IIoT data into a table and then measure the size and performance of the table as it grows. But how do you actually read the size of a table? What is performance? How can we quantify these values in a way that’s useful for us engineers?</p><p>Here’s what I did.</p><h2 id="table-size">Table Size</h2><p>There are two key measurements that define a table’s size: how many rows it has and how much disk space it occupies.</p><h3 id="row-count">Row Count</h3><p>For small tables, this is straightforward:</p><pre><code class="language-SQL">SELECT COUNT(*) FROM &lt;table_name&gt;;</code></pre><p>However, that query requires scanning every row in the table. For typical IIoT tables, like the ones I was testing, that might be billions of rows and might take minutes to execute.</p><p>Instead, there’s a much faster query:</p><pre><code class="language-SQL">SELECT reltuples::bigint AS row_count
FROM pg_class
WHERE relname = '&lt;table_name&gt;';</code></pre><p>This is the row count that PostgreSQL uses for the query planner. It’s not guaranteed to match the row count exactly, because it’s not continuously updated, but it’s close enough and returns almost instantly.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">⚠️</div><div class="kg-callout-text">For hypertables created with TimescaleDB, reltuples should not be used. Instead, use <a href="https://www.tigerdata.com/docs/api/latest/hyperfunctions/approximate_row_count"><u>approximate_row_count()</u></a>.</div></div><h3 id="size-on-disk">Size on Disk</h3><p>PostgreSQL stores table data across several components: the heap (the main table data), the indices, and TOAST storage (where large values get stashed). All three contribute to the table size and overall storage requirements, as shown in the following image.</p><figure class="kg-card kg-image-card"><img src="https://timescale.ghost.io/blog/content/images/2026/03/data-src-image-1d878c42-0846-4e35-aac5-83bfe8f9dfca.png" class="kg-image" alt="" loading="lazy" width="1600" height="510" srcset="https://timescale.ghost.io/blog/content/images/size/w600/2026/03/data-src-image-1d878c42-0846-4e35-aac5-83bfe8f9dfca.png 600w, https://timescale.ghost.io/blog/content/images/size/w1000/2026/03/data-src-image-1d878c42-0846-4e35-aac5-83bfe8f9dfca.png 1000w, https://timescale.ghost.io/blog/content/images/2026/03/data-src-image-1d878c42-0846-4e35-aac5-83bfe8f9dfca.png 1600w" sizes="(min-width: 720px) 720px"></figure><p>Here’s the query I used to get the three separate components:</p><pre><code class="language-SQL">SELECT pg_relation_size('&lt;table_name&gt;') AS heap_size,
pg_indexes_size('&lt;table_name&gt;') AS indexes_size,
pg_table_size('&lt;table_name&gt;')
  - pg_relation_size('&lt;table_name&gt;') AS toast_size;</code></pre><p>This will return the sizes in bytes, but you can also use the function <code>pg_size_pretty()</code> to get a more human-readable output.</p><h2 id="ingest-capacity">Ingest Capacity</h2><p>Ingest capacity is critical to IIoT workflows, and it’s where a lot of systems run into serious trouble. How do you measure capacity? You can either get an approximate value from current ingest speeds, or push your database to the limit.</p><h3 id="if-you-already-have-a-data-source-connected">If You Already Have a Data Source Connected</h3><p>If your data stream is already connected, you can look at how long ingests are taking and figure out the capacity from that.</p><p>This requires the <code>pg_stat_statements</code> extension, which is essential for any serious database. It ships with PostgreSQL, but you need to add it to <code>shared_preload_libraries</code> and restart the server before you can enable it with the following query:</p><pre><code class="language-SQL">CREATE EXTENSION IF NOT EXISTS pg_stat_statements;</code></pre><p>Once it’s running, it exposes a view called <code>pg_stat_statements</code> that you can query for your INSERT performance:</p><pre><code class="language-SQL">SELECT 
    query,
    calls,
    rows,
    total_exec_time / 1000 AS total_time_sec,
    mean_exec_time AS avg_ms_per_call,
    rows / NULLIF(calls, 0) AS avg_rows_per_call,
    rows / NULLIF(total_exec_time / 1000, 0) AS rows_per_sec
FROM pg_stat_statements
WHERE query ILIKE '%INSERT%&lt;table_name&gt;%'
ORDER BY total_exec_time DESC;</code></pre><p>This gives you a picture of real ingest performance based on what your application is actually doing. You'll see how many rows each call inserts (obviously <a href="https://www.tigerdata.com/blog/mqtt-sql-practical-guide-sensor-data-ingestion"><u>you’re batching</u></a>), the average time per call, and a rough rows-per-second figure. The time it takes to insert a batch divided by the period of your desired insertions gives you a rough estimate of how much ingest capacity you’re using.</p><p>You can reset the stats whenever you want for a fresh baseline:</p><pre><code class="language-SQL">SELECT pg_stat_statements_reset();</code></pre><p>By measuring this as your table grows, you’ll get a good sense of how your ingest capacity is evolving and you’ll be able to deal with it well before it becomes an issue.</p><h3 id="the-actual-ingest-capacity">The actual ingest capacity</h3><p>If you don’t mind really pushing your table to its limits (and maybe breaking it), you can try to ingest as much as possible and see if the database keeps up. I wrote a full walkthrough for this, including the SQL for generating realistic IIoT data and a scripted test loop, in <a href="https://www.tigerdata.com/blog/how-to-break-postgresql-iiot-database-learn-something-in-process"><u>How to Break Your PostgreSQL IIoT Database and Learn Something in the Process</u></a>.&nbsp;</p><h2 id="query-speed">Query Speed</h2><p>Query speed is the most obvious metric for a database, as it affects everyone using the data. However, I found it to be one of the most difficult to generalize. 
Every application will have specific queries that are important, and different definitions of what is ‘fast enough’. It’s also something that tends to degrade over time and only becomes an issue well into the life of the table.</p><h3 id="for-queries-you%E2%80%99re-already-running">For queries you’re already running</h3><p>If you already have dashboards running, or your analysis workflow in place, you can again use <code>pg_stat_statements</code>. Here's how to pull information for the 20 slowest queries:</p><pre><code class="language-SQL">SELECT
    query,
    calls,
    rows,
    mean_exec_time AS avg_ms,
    total_exec_time / 1000 AS total_time_sec,
    stddev_exec_time AS stddev_ms,
    rows / NULLIF(calls, 0) AS avg_rows_returned
FROM pg_stat_statements
WHERE query ILIKE '%SELECT%&lt;table_name&gt;%'
ORDER BY total_exec_time DESC
LIMIT 20;</code></pre><h3 id="for-more-general-queries">For more general queries</h3><p>IIoT queries tend to fall into two categories: wide (what is the state of all devices at a specific time?) and deep (what is the history of a particular device?). By running at least one example from each type, you’ll get a sense of how quickly these types of queries will return.</p><p>Generic Wide Query</p><pre><code class="language-SQL">SELECT DISTINCT ON (tag_id) 
  tag_id, 
  time, 
  value
FROM &lt;table_name&gt;
ORDER BY tag_id, time DESC
LIMIT 100
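;  -- (terminates the query above)

-- Optional sketch: prefix the same wide query with EXPLAIN ANALYZE (a
-- standard PostgreSQL command) to see whether the planner satisfies the
-- DISTINCT ON with an index scan or has to sort the whole table:
EXPLAIN ANALYZE
SELECT DISTINCT ON (tag_id)
  tag_id,
  time,
  value
FROM &lt;table_name&gt;
ORDER BY tag_id, time DESC
LIMIT 100;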
</code></pre><p>This returns the most recent value from 100 tags.</p><p>Generic Deep Query</p><pre><code class="language-SQL">SELECT 
    tag_id,
    DATE_TRUNC('hour',time) as hour,
    AVG(value) as hourly_average
FROM &lt;table_name&gt;
WHERE tag_id = &lt;specific tag_id&gt;
GROUP BY tag_id, DATE_TRUNC('hour',time)
ORDER BY hour DESC
LIMIT 100;</code></pre><p>This returns the past 100 hourly averages from one specific tag.</p><div class="kg-card kg-callout-card kg-callout-card-blue"><div class="kg-callout-emoji">💡</div><div class="kg-callout-text">It's important to run these queries multiple times to get a robust measurement. PostgreSQL uses a lot of internal optimizations to speed up common queries, so they are likely to run faster after a few executions.</div></div><h2 id="putting-it-all-together">Putting It All Together</h2><p>The real value comes from combining these measurements as the table grows. Here's the general approach I followed for my essay:</p><ol><li>Create a simple IIoT table schema and common index.</li><li>Measure table size (rows + disk space), query times, and ingest time for a couple of standard batches.</li><li>Insert many batches as fast as possible so the table grows quickly.</li><li>Repeat steps 2 and 3 until some predefined limit is reached (usually disk space or query time).</li></ol><p>If I were instead using a real production system, I would rely more on <code>pg_stat_statements</code> to track query and ingest rates. Checking daily while the system is new, and weekly thereafter, will ensure you know exactly how your table is evolving.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Break Your PostgreSQL IIoT Database and Learn Something in the Process]]></title>
            <description><![CDATA[Stress test your PostgreSQL IIoT database to identify bottlenecks, optimize performance, and prevent failure. Learn how to break it safely and design with margin.]]></description>
            <link>https://www.tigerdata.com/blog/how-to-break-postgresql-iiot-database-learn-something-in-process</link>
            <guid isPermaLink="true">https://www.tigerdata.com/blog/how-to-break-postgresql-iiot-database-learn-something-in-process</guid>
            <category><![CDATA[IoT]]></category>
            <category><![CDATA[PostgreSQL]]></category>
            <dc:creator><![CDATA[Doug Pagnutti]]></dc:creator>
            <pubDate>Wed, 18 Feb 2026 19:43:56 GMT</pubDate>
            <media:content medium="image" url="https://timescale.ghost.io/blog/content/images/2026/02/thumbnail--4-.png">
            </media:content>
            <content:encoded><![CDATA[<p>As engineers, we're taught to design for reliability. We do design calculations, run simulations, build and test prototypes, and even then we recognize that these are imperfect, so we include safety factors. When it comes to the Industrial Internet of Things (IIoT) though, we rarely give the same level of scrutiny to the components that we rely on.</p><p>What if we treated our IIoT database the same way we treated the physical things we produce? We build and design a prototype database, and then <a href="https://timescale.ghost.io/blog/postgres-optimization-treadmill/" rel="noreferrer">put it through some serious testing</a>, even to failure.</p><h2 id="the-value-and-perils-of-stress-testing">The Value (and Perils) of Stress Testing</h2><p>Think of database stress testing as a destructive materials test for your data storage. You wouldn't trust a bridge made of untested steel, so don’t trust your database until you know its limits.</p><p><strong>The Value:</strong></p><ul><li><strong>Identify Bottlenecks:</strong> Stress testing reveals the weak links—what is likely to fail first? Will you run out of storage? Will your queries get bogged down? Or will you hit the dreaded ingest wall (when data comes in faster than it can be stored)?</li><li><strong>Determine Real-World Behaviour:</strong> You'll find out exactly how your database performance changes as the amount of data increases. What issues are future-you going to struggle with?</li><li><a href="https://timescale.ghost.io/blog/postgres-optimization-treadmill/" rel="noreferrer"><strong>Optimize Configuration</strong></a><strong>:</strong> Just like you might build a few different prototypes and see how it affects failure modes, changing your database configuration, especially when it comes to indices, can dramatically affect how it behaves. 
Building a rigorous stress testing framework provides a safe way to optimize your design.</li></ul><p>I hope it goes without saying, but please, please don’t run this on your production environment. Even if it’s technically a different database but the same hardware, this test can wreak havoc on your resources and crash your system. You’ve been warned.</p><h2 id="what-to-measure">What to Measure?</h2><p>There’s no point going through all the effort to break your system if you don’t learn anything. Assuming you’re using a PostgreSQL database (<a href="https://www.tigerdata.com/blog/its-2026-just-use-postgres"><u>It’s 2026, Just Use PostgreSQL</u></a>), here is a decent set of metrics to keep track of while you’re putting your database through its paces.</p><h3 id="table-size">Table Size</h3><p>The size of a PostgreSQL table is generally measured by number of rows, but the actual space on disk that it occupies is a sum of the heap (the main relational table), the indices, and the TOAST (storage for large objects).</p><p>The following query will give the number of rows as well as the size of each component of the table in bytes:</p><pre><code class="language-SQL">SELECT
      reltuples::bigint AS row_count,
      pg_relation_size('iiot_history') AS heap_size,
      pg_indexes_size('iiot_history') AS indices_size,
      pg_table_size('iiot_history') -
            pg_relation_size('iiot_history') AS toast_size
FROM pg_class WHERE relname = 'iiot_history';</code></pre><p>The reason for the odd row_count is that counting rows the standard way, with COUNT(*), requires scanning the whole table, which is going to be painfully slow when we’re building a table big enough to break things.</p><h3 id="table-performance">Table Performance</h3><p>The best way to measure table performance is to use the actual queries that your production system will use. At a minimum, this should include your batched INSERT (you always batch, right?) and at least one common SELECT. Keep in mind that for a table with N rows, the timing for queries tends to be constant, log(N), N, or worse, depending on how the indices are structured.</p><p>You can get very accurate timing info from running your queries with the prefix EXPLAIN ANALYZE, and it’s worth doing this at least once to see what the database is doing under the hood. However, I recommend running the whole test with a scripting language and then just timing the execution of that particular step.</p><h3 id="server-performance">Server Performance</h3><p>Don’t forget the engine that’s driving all this machinery. You’ll need to watch the CPU, memory, storage, and network bandwidth. People in the IT world tend to talk about headroom for a server, and that’s what you’re really looking at: how much spare capacity do you have? Your CPU and memory usage might spike at times, but the important thing is that it’s not always running at max capacity.</p><p>There are a lot of free and paid tools to monitor these variables. I almost always do this type of test in a VM (easier to clean up the mess when it all breaks) and I like to use <a href="https://prometheus.io/"><u>Prometheus</u></a>, but honestly Perfmon in Windows or top in Linux gives you all you really need.</p><h3 id="setting-limits">Setting Limits</h3><p>It’s helpful to set some limits on these parameters so you know when to stop the test. 
For database size, it might be some measurement like a year's worth of data, or when the drive is 80% full. For ingest timing, I suggest stopping when inserting takes longer than the desired ingest interval: this is the ingest bottleneck and something you really want to avoid in production. Scan times can be limited by the time it takes for a specific query. Maybe calculating the average value from one tag over the past hour must take less than 10 seconds.</p><h2 id="how-to-simulate-data">How to Simulate Data?</h2><p>There are lots of ways to insert data, but it’s usually a tradeoff between how well the data represents real scenarios and how long it takes to run the test.</p><p>The following is one of my favourite methods for injecting large amounts of data into an IIoT database:</p><p>Say you have a classic IIoT history table like the following:</p><pre><code class="language-SQL">CREATE TABLE iiot_history(
	time TIMESTAMPTZ NOT NULL,
	tag_id INT NOT NULL,
	value DOUBLE PRECISION,
	PRIMARY KEY (tag_id, time)
);</code></pre><p>If you expect to ingest 10,000 tags at 1s intervals, you can use the following INSERT query to add a day’s worth of history before the earliest row in your table.</p><pre><code class="language-SQL">INSERT INTO iiot_history(time, tag_id, value)
	SELECT time, tag_id, random() AS value
	FROM (
		SELECT generate_series(
			min_date - INTERVAL '1 day',
			min_date - INTERVAL '1 second',
			INTERVAL '1 second') AS time
		FROM (
			SELECT LEAST(NOW(), MIN(time)) AS min_date
			FROM iiot_history) AS bounds
	) AS times,
		generate_series(1, 10000) AS tag_id;</code></pre><p>This will generate random data values for every second during a day and for every tag_id from 1 to 10,000. Not exactly as interesting as real data, but enough to fill up your table.</p><p>The nice thing about this query is that you should be able to run it in parallel with your real-time data pipeline and it won’t mess with your data (aside from potentially locking your table while it runs). It’s also easy to modify this query to inject more or fewer tags, as well as change the time interval, if you’re playing around with different configurations.</p><p>If you use this query, or whichever one you prefer, in a script (I usually use Python), then you can automate the whole test. Something along the lines of:</p><ol><li>Get database size</li><li>Run select queries, measure execution time</li><li>Run insert queries several times, measure and average execution time</li><li>Artificially grow database size</li><li>Repeat steps 1-4 until one of the failure conditions is reached.</li></ol><h2 id="how-to-interpret-results-and-what-to-expect-in-the-real-world">How to Interpret Results and What to Expect in the Real World?</h2><p>Your test results will give you some clear data points, but you still need to do some interpreting.</p><ul><li><strong>Identify the Limiting Component:</strong> Where did the database fail? If it’s a query that took too long, you might be able to speed things up with a clever index. If it’s an insert that took too long, you might be able to speed things up by removing that clever index you added earlier.</li><li><strong>Optimize:</strong> There’s a lot you can do to improve table performance before throwing the whole thing out in frustration:<ol><li><strong>Proper Indexing:</strong> Choosing an index is almost always a tradeoff. For example, indexing the tag_id column before the time column will speed up most queries, at the cost of slower inserts as the table grows. 
Indexing the time column first will avoid the ‘ingest wall’ at the cost of slower queries. Figure out which solution is best.</li><li><strong>Plan for the future:</strong> Will you need more hardware in a few months or a few years? Being able to estimate the life of your existing architecture means you won’t be caught unawares when it no longer suffices.</li><li><strong>Partitioning/Chunking:</strong> For very large tables, you may need to partition appropriately (see PostgreSQL extensions like <a href="https://www.tigerdata.com/timescaledb"><u>TimescaleDB</u></a>). How great would it be to learn you’ll need this before you actually need it?</li></ol></li><li><strong>Add a Safety Factor:</strong> If your test showed a maximum reliable throughput of 15,000 rows/sec, set your operational limit to 10,000 rows/sec. The real world has peaks, unexpected queries, and background maintenance tasks that will steal resources. Like we do with all engineering products, design with margin.</li></ul><p>If you treat your database like a prototype and really put it through its paces, you’ll get a preview of how it’ll behave in the future and make good, proactive design decisions instead of scrambling when problems hit. Now, go break something (and learn).</p>]]></content:encoded>
        </item>
    </channel>
</rss>