Sep 09, 2025
Posted by Sven Klemm
Time-series and analytical data continues to grow at an unprecedented pace, and with it comes the challenge of efficiently storing and querying massive datasets. Traditionally, compressing this data required background jobs and additional tuning, which slowed ingestion, added operational overhead, and delayed storage savings.
That’s why today, we're excited to announce Direct Compress, a new feature coming to TimescaleDB that compresses data during ingestion in memory, eliminating the need for traditional compression policies and improving insert performance by up to 40x.
Note: Direct Compress is currently available as a tech preview in TimescaleDB 2.21 for COPY operations, with full support for INSERT operations coming in a later version.
TimescaleDB has long been recognized for its industry-leading compression capabilities. With hypercore, TimescaleDB's hybrid row-columnar storage engine, users can achieve compression ratios of over 90% while maintaining fast query performance. Traditionally, the system would:
1. Ingest data uncompressed into row-based chunks, writing WAL records for each tuple.
2. Compress those chunks later, either by calling compress_chunk() manually or through a background compression policy.
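For reference, the traditional two-step workflow looks roughly like this (the table name and interval are illustrative):
-- Step 2, run after ingestion: compress chunks once they are old enough
SELECT compress_chunk(c)
FROM show_chunks('sensor_data', older_than => INTERVAL '7 days') c;
-- Or schedule a background policy to do it automatically:
SELECT add_compression_policy('sensor_data', compress_after => INTERVAL '7 days');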
Now, Direct Compress fundamentally changes this approach by compressing data during the ingestion process itself.
Direct Compress is a feature that allows TimescaleDB to compress data in memory as it's being ingested. Instead of writing WAL records for individual tuples, the system writes compressed batches directly to disk. This approach addresses several key challenges that developers and database administrators face when working with high-volume time-series data: insert throughput limited by per-tuple WAL writes, storage savings that only arrive after a background compression job runs, and the operational overhead of tuning compression policies.
To measure the per-tuple overhead, we used a narrow table with a single integer column. Direct compression provided considerable performance improvements: the single-integer table reached 148.8 million tuples per second with a 10k compression batch size, a 37x improvement over uncompressed insertion. For a table with a timestamp column and two integer columns, we achieved an insert rate of 66 million tuples per second with compression.
The schema has a big impact on the achievable insert rate: more complex data types such as jsonb, and wider rows, ingest more slowly. Parsing integer columns had the least overhead of the data types we tested, and in these benchmarks more than half of the CPU time was spent parsing input even with the binary input format. Performance scaled linearly across all thread counts until we hit the storage I/O bottleneck. The tests ran on a Tiger Cloud instance with 64 cores and EBS storage; with faster storage, higher numbers are likely achievable. For the uncompressed tests, no indexes were present on the hypertables. The 1k and 10k batch sizes refer to the batch size used internally during compression, not the batch size used by the client sending the data.
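For reference, a schema along the lines of the "timestamp plus two integer columns" benchmark could look like this (an illustrative sketch, not the exact benchmark definition):
-- Benchmark-style hypertable: one timestamp and two integer columns
CREATE TABLE bench_data(
    time timestamptz,
    val1 int,
    val2 int
) WITH (
    tsdb.hypertable,
    tsdb.partition_column='time'
);
-- No indexes were added for the uncompressed comparison runs.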
By compressing data in memory before writing to disk, Direct Compress eliminates the need to write individual WAL records for each tuple. Instead, only compressed batches are written, dramatically reducing I/O overhead.
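To get a rough sense of this effect on your own workload, you can snapshot WAL volume around a bulk load using PostgreSQL's pg_stat_wal view (PostgreSQL 14 or later). This is an illustrative sketch, not part of the feature itself:
-- Record cumulative WAL bytes, run the load, then record them again
SELECT wal_bytes FROM pg_stat_wal;  -- before
SET timescaledb.enable_direct_compress_copy = on;
COPY sensor_data FROM '/tmp/sensor_data.binary' WITH (format binary);
SELECT wal_bytes FROM pg_stat_wal;  -- after
-- The difference approximates WAL written by the load; repeat with the GUC off to compare.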
With Direct Compress, your INSERT operations already produce compressed chunks. This means the compress_chunk() function and compression policies become less critical to your workflow, simplifying your database maintenance.
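To confirm that newly ingested chunks already arrive compressed, you can inspect the chunk catalog view (a quick sketch using the sensor_data table from the example further below):
-- List chunks and whether they are stored compressed
SELECT chunk_name, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';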
Unlike traditional compression that happens after ingestion, Direct Compress provides storage benefits immediately, reducing your storage footprint from the moment data arrives.
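A quick size check shows this effect right after a load (a sketch using TimescaleDB's hypertable_size() function):
-- Total on-disk size of the hypertable, including compressed chunks
SELECT pg_size_pretty(hypertable_size('sensor_data'));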
Direct Compress operates by intercepting data during the ingestion process and compressing it in memory before writing to disk. The process involves:
Buffering incoming tuples in memory during COPY or INSERT operations.
Compressing the buffered tuples into columnar batches in memory.
Writing the compressed batches directly to disk instead of individual WAL records for each tuple.
This approach differs from traditional compression methods because it eliminates the two-step process of "ingest then compress," instead performing both operations simultaneously. Importantly, Direct Compress requires batched operations on the client side to achieve these performance benefits. With Direct Compress, data ingestion becomes limited by CPU processing rather than I/O speed.
Before using Direct Compress, ensure you have: TimescaleDB 2.21 or later (where the tech preview is available), a hypertable with the columnstore enabled, and a client that sends data in batches.
Direct Compress requires batching on the client side to function effectively. It cannot be used with unbatched, row-by-row inserts, or on hypertables that have unique constraints, triggers, or continuous aggregates.
Direct Compress is controlled through several GUCs (Grand Unified Configuration parameters):
timescaledb.enable_direct_compress_copy (default: off)
Enables the core Direct Compress feature for COPY operations. When enabled, chunks will be marked as unordered, so presorting is not required.
timescaledb.enable_direct_compress_copy_sort_batches (default: on)
Enables per-batch sorting before writing compressed data, which can improve query performance.
timescaledb.enable_direct_compress_copy_client_sorted (default: off)
⚠️ DANGER: When enabled, chunks will not be marked as unordered. Only enable this if your data is globally sorted; queries that rely on ordering will return incorrect results if it is not. In the context of this feature we distinguish between local and global sorting: local sorting means the data within the current batch is sorted, while global sorting means no other batch overlaps with the current batch.
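Putting the settings together for a session in which the client guarantees globally sorted, batched input (only enable the last GUC if that guarantee truly holds):
-- Session-level configuration for a bulk load of globally sorted data
SET timescaledb.enable_direct_compress_copy = on;
SET timescaledb.enable_direct_compress_copy_sort_batches = on;  -- default
-- Only safe when no two batches overlap; otherwise leave this off
SET timescaledb.enable_direct_compress_copy_client_sorted = on;
A complete end-to-end example then looks like this: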
-- Create a hypertable with compression
CREATE TABLE sensor_data(
time timestamptz,
device text,
value float
) WITH (
tsdb.hypertable,
tsdb.partition_column='time'
);
-- Enable Direct Compress
SET timescaledb.enable_direct_compress_copy = on;
-- Use binary format for maximum performance
COPY sensor_data FROM '/tmp/sensor_data.binary' WITH (format binary);
Binary format achieves the highest insert rates; CSV and text formats are supported but deliver lower performance.
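If your data is not already in binary format, standard PostgreSQL COPY can produce it, and psql's \copy can stream a file from the client machine (an illustrative sketch; paths are placeholders):
-- Export existing rows to PostgreSQL binary format (server-side path)
COPY sensor_data TO '/tmp/sensor_data.binary' WITH (format binary);
-- Load a binary file that lives on the client rather than the server
\copy sensor_data FROM 'sensor_data.binary' WITH (format binary)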
The default orderby configuration is time DESC, which is optimized for queries. However, for maximum Direct Compress benefit, consider changing it to time to optimize for insert performance:
ALTER TABLE sensor_data SET (timescaledb.orderby = 'time');
This represents a trade-off between insert performance and query performance—choose based on your primary use case.
While TimescaleDB can sort data as part of Direct Compress, doing so takes CPU resources away from other tasks.
The benchmark results show significant benefits from parallel ingestion. Consider using multiple threads for large data imports.
Direct Compress works with any existing hypertable that has the columnstore enabled, provided the limitations (no unique constraints, triggers, or continuous aggregates) are met.
Direct Compress is fully compatible with existing TimescaleDB compression features. You can use both traditional columnstore policies and Direct Compress simultaneously, though Direct Compress reduces the need for background compression jobs.
Direct Compress represents a significant milestone in TimescaleDB's ongoing evolution toward real-time analytics at scale. This feature is part of our broader commitment to eliminating the traditional trade-offs between ingestion speed and storage efficiency.
Future enhancements to Direct Compress will include full support for INSERT operations, which is planned for a later release.
Direct Compress brings considerable performance improvements to TimescaleDB users by eliminating the traditional ingestion bottleneck. With up to 40x faster ingestion rates and immediate storage benefits, this feature is a game-changer for high-volume time-series applications.
Whether you're managing IoT sensor data, financial market feeds, or application monitoring metrics, Direct Compress can help you achieve unprecedented ingestion performance while reducing storage costs from day one.
We encourage you to try the tech preview of Direct Compress in your development environment and share your experiences with the community. Your feedback will help us refine this feature as we move toward full release. As always, our team is available to help you optimize your TimescaleDB deployment for your specific use case.
Ready to get started? Check out our documentation or contact our team for personalized assistance with Direct Compress implementation.
Have questions about Direct Compress or want to share your results? Join the conversation in our community forum or reach out to us on GitHub.
Sven is the tech lead for TimescaleDB, but his journey with databases started a long time ago. For over 25 years, he has been a huge fan of PostgreSQL, and it's that deep-seated passion that led him to where he is today. His work on planner optimizations and diving into the columnstore to squeeze out every bit of performance is a direct extension of his goal: to make the Postgres ecosystem even more powerful and efficient for everyone who uses it.
That long history with Postgres also informs his work on the security front. One of the projects he is most passionate about is pgspot, where he gets to help build a more secure future for the database. After all these years, he has seen firsthand how a strong, trustworthy foundation is essential. To him, a great database isn't just about speed; it's about protecting the data with unwavering reliability. This blend of performance and security is what truly excites him every day.
When he's not in the weeds of database code, you can find him thinking about the bigger picture—how to make the community and product stronger, safer, and more user-friendly. He loves the challenge of taking a complex problem and finding a simple, elegant solution. His journey with Postgres has taught him that the best technology is built on a foundation of trust and a commitment to continuous improvement.