Sep 09, 2025
Posted by Sven Klemm
Time-series and analytical data continues to grow at an unprecedented pace, and with it comes the challenge of efficiently storing and querying massive datasets. Traditionally, compressing this data required background jobs and additional tuning, which slowed ingestion, added operational overhead, and delayed storage savings.
That’s why today, we're excited to announce Direct Compress, a new feature coming to TimescaleDB that compresses data during ingestion in memory, eliminating the need for traditional compression policies and improving insert performance by up to 40x.
Note: Direct Compress is currently available as a tech preview in TimescaleDB 2.21 for COPY operations, with full support for INSERT operations coming in a later version.
TimescaleDB has long been recognized for its industry-leading compression capabilities. With hypercore, TimescaleDB's hybrid row-columnar storage engine, users can achieve compression ratios of over 90% while maintaining fast query performance. Traditionally, the system would:
1. Ingest data uncompressed into row-based chunks, writing WAL records for each tuple.
2. Compress those chunks later, either by calling compress_chunk() manually or through a background compression policy.
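For reference, the traditional two-step workflow looks roughly like this (the table name and interval are illustrative):
-- Step 2, run after ingestion: compress chunks once they are old enough
SELECT compress_chunk(c)
FROM show_chunks('sensor_data', older_than => INTERVAL '7 days') c;
-- Or schedule a background policy to do it automatically:
SELECT add_compression_policy('sensor_data', compress_after => INTERVAL '7 days');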
Now, Direct Compress fundamentally changes this approach by compressing data during the ingestion process itself.
Direct Compress is a feature that allows TimescaleDB to compress data in memory as it's being ingested. Instead of writing WAL records for individual tuples, the system writes compressed batches directly to disk. This approach addresses several key challenges that developers and database administrators face when working with high-volume time-series data: insert throughput limited by per-tuple WAL writes, storage savings that only arrive after a background compression job runs, and the operational overhead of tuning compression policies.
To measure the per-tuple overhead, we used a narrow table with a single integer column. Direct compression provided considerable performance improvements: the single-integer table reached 148.8 million tuples per second with a 10k compression batch size, a 37x improvement over uncompressed insertion. For a table with a timestamp column and two integer columns, we achieved an insert rate of 66 million tuples per second with compression.
The schema has a big impact on the achievable insert rate: more complex data types such as jsonb, and wider rows, ingest more slowly. Parsing integer columns had the least overhead of the data types we tested, and in these benchmarks more than half of the CPU time was spent parsing input even with the binary input format. Performance scaled linearly across all thread counts until we hit the storage I/O bottleneck. The tests ran on a Tiger Cloud instance with 64 cores and EBS storage; with faster storage, higher numbers are likely achievable. For the uncompressed tests, no indexes were present on the hypertables. The 1k and 10k batch sizes refer to the batch size used internally during compression, not the batch size used by the client sending the data.
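For reference, a schema along the lines of the "timestamp plus two integer columns" benchmark could look like this (an illustrative sketch, not the exact benchmark definition):
-- Benchmark-style hypertable: one timestamp and two integer columns
CREATE TABLE bench_data(
    time timestamptz,
    val1 int,
    val2 int
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='time'
);
-- No indexes were added for the uncompressed comparison runs.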
By compressing data in memory before writing to disk, Direct Compress eliminates the need to write individual WAL records for each tuple. Instead, only compressed batches are written, dramatically reducing I/O overhead.
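To get a rough sense of this effect on your own workload, you can snapshot WAL volume around a bulk load using PostgreSQL's pg_stat_wal view (PostgreSQL 14 or later). This is an illustrative sketch, not part of the feature itself:
-- Record cumulative WAL bytes, run the load, then record them again
SELECT wal_bytes FROM pg_stat_wal;  -- before
SET timescaledb.enable_direct_compress_copy = on;
COPY sensor_data FROM '/tmp/sensor_data.binary' WITH (format binary);
SELECT wal_bytes FROM pg_stat_wal;  -- after
-- The difference approximates WAL written by the load; repeat with the GUC off to compare.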
With Direct Compress, your INSERT operations already produce compressed chunks. This means the compress_chunk() function and compression policies become less critical to your workflow, simplifying your database maintenance.
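To confirm that newly ingested chunks already arrive compressed, you can inspect the chunk catalog view (a quick sketch using the sensor_data table from the example further below):
-- List chunks and whether they are stored compressed
SELECT chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';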
Unlike traditional compression that happens after ingestion, Direct Compress provides storage benefits immediately, reducing your storage footprint from the moment data arrives.
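A quick size check shows this effect right after a load (a sketch using TimescaleDB's hypertable_size() function):
-- Total on-disk size of the hypertable, including compressed chunks
SELECT pg_size_pretty(hypertable_size('sensor_data'));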
Direct Compress operates by intercepting data during the ingestion process and compressing it in memory before writing to disk. The process involves:
Buffering incoming tuples in memory during COPY or INSERT operations.
Compressing the buffered tuples into columnar batches in memory.
Writing the compressed batches directly to disk instead of individual WAL records for each tuple.
This approach differs from traditional compression methods because it eliminates the two-step process of "ingest then compress," instead performing both operations simultaneously. Importantly, Direct Compress requires batched operations on the client side to achieve these performance benefits. With Direct Compress, data ingestion becomes limited by CPU processing rather than I/O speed.
Before using Direct Compress, ensure you have: TimescaleDB 2.21 or later (where the tech preview is available), a hypertable with the columnstore enabled, and a client that sends data in batches.
Direct Compress requires batching on the client side to function effectively. It cannot be used with unbatched, row-by-row inserts, or on hypertables that have unique constraints, triggers, or continuous aggregates.
Direct Compress is controlled through several GUCs (Grand Unified Configuration parameters):
timescaledb.enable_direct_compress_copy (default: off)
Enables the core Direct Compress feature for COPY operations. When enabled, chunks will be marked as unordered, so presorting is not required.
timescaledb.enable_direct_compress_copy_sort_batches (default: on)
Enables per-batch sorting before writing compressed data, which can improve query performance.
timescaledb.enable_direct_compress_copy_client_sorted (default: off)
⚠️ DANGER: When enabled, chunks will not be marked as unordered. Only enable this if your data is globally sorted; queries that rely on ordering will return incorrect results if it is not. In the context of this feature we distinguish between local and global sorting: local sorting means the data within the current batch is sorted, while global sorting means no other batch overlaps with the current batch.
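Putting the settings together for a session in which the client guarantees globally sorted, batched input (only enable the last GUC if that guarantee truly holds):
-- Session-level configuration for a bulk load of globally sorted data
SET timescaledb.enable_direct_compress_copy = on;
SET timescaledb.enable_direct_compress_copy_sort_batches = on;  -- default
-- Only safe when no two batches overlap; otherwise leave this off
SET timescaledb.enable_direct_compress_copy_client_sorted = on;
A complete end-to-end example then looks like this: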
-- Create a hypertable with compression
CREATE TABLE sensor_data(
time timestamptz,
device text,
value float
) WITH (
tsdb.hypertable,
tsdb.partition_column='time'
);
-- Enable Direct Compress
SET timescaledb.enable_direct_compress_copy = on;
-- Use binary format for maximum performance
COPY sensor_data FROM '/tmp/sensor_data.binary' WITH (format binary);
Binary format achieves the highest insert rates; CSV and text formats are supported but deliver lower performance.
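If your data is not already in binary format, standard PostgreSQL COPY can produce it, and psql's \copy can stream a file from the client machine (an illustrative sketch; paths are placeholders):
-- Export existing rows to PostgreSQL binary format (server-side path)
COPY sensor_data TO '/tmp/sensor_data.binary' WITH (format binary);
-- Load a binary file that lives on the client rather than the server
\copy sensor_data FROM 'sensor_data.binary' WITH (format binary)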
The default orderby configuration is time DESC, which is optimized for queries. However, for maximum Direct Compress benefit, consider changing it to time to optimize for insert performance:
ALTER TABLE sensor_data SET (timescaledb.orderby = 'time');
This represents a trade-off between insert performance and query performance—choose based on your primary use case.
While TimescaleDB can sort data as part of Direct Compress, doing so takes CPU resources away from other tasks.
The benchmark results show significant benefits from parallel ingestion. Consider using multiple threads for large data imports.
Direct Compress works with any existing hypertable that has the columnstore enabled, provided the limitations (no unique constraints, triggers, or continuous aggregates) are met.
Direct Compress is fully compatible with existing TimescaleDB compression features. You can use both traditional columnstore policies and Direct Compress simultaneously, though Direct Compress reduces the need for background compression jobs.
Direct Compress represents a significant milestone in TimescaleDB's ongoing evolution toward real-time analytics at scale. This feature is part of our broader commitment to eliminating the traditional trade-offs between ingestion speed and storage efficiency.
Future enhancements to Direct Compress will include full support for INSERT operations, which is planned for a later release.
Direct Compress brings considerable performance improvements to TimescaleDB users by eliminating the traditional ingestion bottleneck. With up to 40x faster ingestion rates and immediate storage benefits, this feature is a game-changer for high-volume time-series applications.
Whether you're managing IoT sensor data, financial market feeds, or application monitoring metrics, Direct Compress can help you achieve unprecedented ingestion performance while reducing storage costs from day one.
We encourage you to try the tech preview of Direct Compress in your development environment and share your experiences with the community. Your feedback will help us refine this feature as we move toward full release. As always, our team is available to help you optimize your TimescaleDB deployment for your specific use case.
Ready to get started? Check out our documentation or contact our team for personalized assistance with Direct Compress implementation.
Have questions about Direct Compress or want to share your results? Join the conversation in our community forum or reach out to us on GitHub.
Sven is the tech lead for TimescaleDB, but his journey with databases started a long time ago. For over 25 years, he has been a huge fan of PostgreSQL, and it's that deep-seated passion that led him to where he is today. His work on planner optimizations and diving into the columnstore to squeeze out every bit of performance is a direct extension of his goal: to make the Postgres ecosystem even more powerful and efficient for everyone who uses it.
That long history with Postgres also informs his work on the security front. One of the projects he is most passionate about is pgspot, where he gets to help build a more secure future for the database. After all these years, he has seen firsthand how a strong, trustworthy foundation is essential. To him, a great database isn't just about speed; it's about protecting the data with unwavering reliability. This blend of performance and security is what truly excites him every day.
When he's not in the weeds of database code, you can find him thinking about the bigger picture—how to make the community and product stronger, safer, and more user-friendly. He loves the challenge of taking a complex problem and finding a simple, elegant solution. His journey with Postgres has taught him that the best technology is built on a foundation of trust and a commitment to continuous improvement.