---
title: "Balance Relational Purity and Speed in High Frequency Systems"
published: 2026-04-24T08:41:52.000-04:00
updated: 2026-04-24T08:41:52.000-04:00
excerpt: "Normalized schemas create latency at scale. This guide shows when to flatten your tables and use columnar compression to cut join overhead and reclaim query speed."
tags: Database, Compression, PostgreSQL
authors: NanoHertz Solutions - Jake Hertz
---

> **TimescaleDB is now Tiger Data.**

Standard [relational normalization](https://www.tigerdata.com/learn/how-to-use-postgresql-for-data-normalization) is the bedrock of database design. It prevents data duplication and maintains integrity by splitting information into specialized tables. However, as your tables approach the 500 million row mark, the very joins that keep your data clean begin to degrade performance. You may notice [p95 latencies](https://www.tigerdata.com/blog/postgres-optimization-treadmill) creeping up even though your queries remain logically sound. This guide explains how to identify when relational purity is holding you back and how to use data flattening and columnar compression to reclaim your speed.

## What You Will Learn

This guide breaks down the mechanics of read amplification and join overhead in high-volume systems. You will learn:

-   How normalized schemas create latency through expensive nested loop joins.
-   When to flatten data into wide tables to reduce query-time computation.
-   How columnar compression turns denormalized tables into high-performance assets.
-   Techniques to reduce read amplification by minimizing the data your database must touch.

## Why It Matters

Standard normalization relies on joins to reconstruct data at query time. On small datasets, this is efficient. On tables with billions of rows, every join adds a layer of I/O and CPU overhead that compounds as your data volume grows. Row-based storage forces the database to read every column in a row, even if your query only needs two, leading to massive read amplification.

Flattening your schema, moving frequently joined metadata directly into your main ingestion table, removes the need for these expensive joins. While this usually increases storage size in a row-based system, specialized [columnar storage](https://www.tigerdata.com/blog/building-columnar-compression-in-a-row-oriented-database) allows for aggressive compression of this redundant data. By choosing the right schema architecture, you can significantly boost the responsiveness of your real-time dashboards and analytics.

## Audit Your p95 Latency

Start by identifying the queries that drive your latency spikes. Use the [pg\_stat\_statements extension](https://www.tigerdata.com/blog/using-pg-stat-statements-to-optimize-queries) to find queries with high total execution time. Run `EXPLAIN (ANALYZE, BUFFERS)` on these queries to see how the database interacts with the storage layer.
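As a starting point, a query like the following surfaces the heaviest statements. It assumes `pg_stat_statements` is installed and uses the column names from PostgreSQL 13 and later, where `total_exec_time` replaced `total_time`:

```SQL
-- Top 10 statements by cumulative execution time
SELECT
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2)  AS mean_ms,
    query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Statements that combine a high `total_ms` with many `calls` are usually the ones worth profiling first.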

Look for "[Nested Loop](https://www.tigerdata.com/learn/strategies-for-improving-postgres-join-performance)" or "Hash Join" operations where the "Shared Hit Blocks" are high. If the database spends 80% of its time matching keys between a 500M row metrics table and a 10k row device table, your architecture is likely CPU-bound by join coordination. A high number of "read" buffers in the output indicates that the database is hunting through indexes and table pages to find metadata that isn't stored locally with the record.
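To see these join and buffer figures for a suspect query, wrap it in `EXPLAIN (ANALYZE, BUFFERS)`; the table and column names below are illustrative:

```SQL
EXPLAIN (ANALYZE, BUFFERS)
SELECT avg(m.value)
FROM device_metrics m
JOIN devices d ON m.device_id = d.id
WHERE m.ts > now() - interval '1 hour'
  AND d.region = 'north_east';
```

In the output, compare the time spent in the join node against the underlying scans, and note the `shared hit` and `read` buffer counts on each node.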

When you see a query plan where the join cost increases linearly with the size of the primary table, you have identified a "join tax" that cannot be solved with more memory or faster disks. At this stage, the overhead of managing the relationship between tables is outweighing the benefits of relational purity.

## Migrate High-Access Metadata

Identify the metadata columns that appear most often in your WHERE and JOIN clauses.

-   Good Candidates: Low-cardinality strings like region, site\_id, or device\_type. These compress effectively in a columnar format because the engine only needs to store a single value and a list of row counts.
-   Poor Candidates: High-cardinality unique identifiers like session\_uuid or frequently updated timestamps. These values vary for almost every row, which prevents the compression engine from reducing the storage footprint and can lead to table bloat.
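A quick way to vet a candidate is to compare its distinct-value count against the total row count in the reference table; the `devices` table and `region` column here are illustrative:

```SQL
-- A low distinct count relative to total rows marks a good
-- compression candidate; a ratio near 1.0 marks a poor one
SELECT
    count(DISTINCT region) AS distinct_values,
    count(*)               AS total_rows
FROM devices;
```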

Then, add the new denormalized column to your high-frequency ingestion table:

```SQL
ALTER TABLE device_metrics ADD COLUMN region TEXT;
```

Next, backfill the existing rows. On tables with hundreds of millions of rows, a single UPDATE statement can cause massive transaction log bloat and table locks. Instead, backfill in batches or use a join-based update to pull metadata from your reference table:

```SQL
UPDATE device_metrics m
SET region = d.region
FROM devices d
WHERE m.device_id = d.id
  AND m.region IS NULL;
```
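On tables where even that join-based update is too heavy for one transaction, the same statement can be applied in slices. This sketch bounds each batch by a range of device IDs (the range values are illustrative; advance them between runs):

```SQL
-- Backfill one slice of devices at a time to keep each
-- transaction, and its WAL footprint, small
UPDATE device_metrics m
SET region = d.region
FROM devices d
WHERE m.device_id = d.id
  AND m.region IS NULL
  AND m.device_id BETWEEN 1 AND 1000;  -- advance this range per batch
```

Committing between batches lets autovacuum reclaim dead tuples as you go instead of accumulating bloat across the whole table.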

This migration eliminates the join at query time, allowing the database to filter and aggregate in a single pass. By localizing the data, you ensure the database engine no longer needs to load secondary table pages into the buffer pool just to check a filter condition.

## Configure Columnar Compression

After flattening your table, you must enable columnar storage to handle the redundant metadata. In Tiger Data, this is achieved through a hybrid storage engine that partitions data into "chunks" and then compresses those chunks into a columnar format. This storage engine groups data by column rather than row, allowing the database to ignore unused columns during a scan and reducing the physical I/O required for every query.

To implement this, you first define your compression policy. You must choose a `segmentby` column, typically a device or series identifier, and an `orderby` column, usually a timestamp. The `segmentby` column is the most important for performance; it determines how data is grouped within the compressed segments, so it should match the columns you filter and group by most often.

```SQL
-- 1. Enable compression on the hypertable
ALTER TABLE device_metrics SET (
  timescaledb.compress = true,
  timescaledb.compress_orderby = 'ts DESC',
  timescaledb.compress_segmentby = 'device_id, region'
);

-- 2. Add a policy to compress data older than 7 days
SELECT add_compression_policy('device_metrics', INTERVAL '7 days');
```

When the compression policy runs, the database transforms the row-based data into compressed columnar batches. For a flattened table, this is where the performance "magic" happens. If you have 100 million rows where the region is 'north\_east', a row-based engine stores that string 100 million times. The columnar engine stores it once along with a metadata bitmask, reducing the storage footprint of that column by up to 99%.
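If you want to see the effect immediately rather than waiting for the scheduled policy, you can compress eligible chunks by hand using TimescaleDB's `show_chunks` and `compress_chunk` functions:

```SQL
-- Compress every chunk older than 7 days right away
SELECT compress_chunk(c)
FROM show_chunks('device_metrics', older_than => INTERVAL '7 days') AS c;
```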

You can verify the effectiveness of your compression and the reduction in [read amplification](https://www.tigerdata.com/blog/hidden-performance-cost-wildcard-queries) by querying the compression statistics:

```SQL
SELECT
    total_chunks,
    number_compressed_chunks,
    pg_size_pretty(before_compression_total_bytes) AS before_size,
    pg_size_pretty(after_compression_total_bytes) AS after_size
FROM hypertable_columnstore_stats('device_metrics');
```

This architectural shift moves your database from being I/O bound to being CPU efficient. By reducing the "Shared Hit Blocks" (the number of 8KB pages the database must load into memory), you free up the buffer pool for other critical operations and effectively raise the performance ceiling of your entire system.

## Measuring the Join Tax

You can measure the performance gap between a normalized join and a flattened table by comparing their execution costs. Consider a system tracking [millions of IoT sensors](https://www.tigerdata.com/blog/timescaledb-manufacturing-iot-building-data-pipeline) that needs to find the average reading for a specific region.

### The Normalized Approach (Slow)

This query must join two tables, forcing the database to match foreign keys for every row in the time range.

```SQL
SELECT avg(m.value), d.region
FROM device_metrics m
JOIN devices d ON m.device_id = d.id
WHERE m.ts > now() - interval '1 hour'
  AND d.region = 'north_east'
GROUP BY d.region;
```

### The Flattened and Compressed Approach (Fast)

By moving the region column into the device\_metrics table and using columnar compression, the join disappears. The database only reads the value and region columns from disk, ignoring the rest of the row.

```SQL
SELECT avg(value), region
FROM device_metrics
WHERE ts > now() - interval '1 hour'
  AND region = 'north_east'
GROUP BY region;
```

## Next Step

Identify your most expensive join query using pg\_stat\_statements. Create a denormalized version of that table that includes the necessary metadata, and test query performance against a compressed columnar version. This test will help you determine if your system is hitting a structural ceiling that only an architectural change can fix.

To try this on a fully managed instance, [start a free Tiger Cloud trial](https://console.cloud.tigerdata.com/signup). You can compare and contrast query performance on real data without impacting your production database.