Aug 13, 2025
This is an installment of our Community Member Spotlight series, in which we invite our customers to share their work, spotlight their success, and inspire others with new ways to use technology to solve problems.
The European Laboratory for Particle Physics (CERN) stands as a global leader in fundamental physics research, conducting large-scale physics experiments such as the Large Hadron Collider (LHC), which produces high volumes of time-series data. Control systems monitor parameters like voltage, pressure, and temperature to support physics analysis and ensure experimental reproducibility.
More than 800 SCADA systems built on SIMATIC WinCC Open Architecture generate hundreds of gigabytes of data daily. This volume demands efficient storage and fast, consistent querying. For the NextGen Archiver module of WinCC OA, developed by CERN and Siemens/ETM, managing growing data volumes and throughput while delivering responsive data visualizations has been a persistent challenge.
I’m Rafal Kulaga, Staff Software Engineer at CERN with over a decade of experience developing software for large and distributed control systems. Alongside me, software engineers Antonin Kveton and Martin Zemko contributed to the TimescaleDB backend of the Next Generation Archiver for WinCC OA, in collaboration with Siemens/ETM through the CERN openlab programme.
For over two decades, CERN has used WinCC OA to build and operate large-scale SCADA systems, now standard across the organization. More than 800 systems span CERN's key operational domains.
In the Large Hadron Collider experiments alone, each detector uses over 100 WinCC OA systems and generates millions of data points—signals that produce time-stamped values essential for both real-time operations and post-run physics analysis.
A core function of WinCC OA is archiving time-series data, including events and alarms. Archiving is not only about storage, but also enabling efficient querying—both from external tools and directly within WinCC OA. Without historical access, operators see only a live snapshot of system states.
Since 2008, the RDB Archiver has fulfilled this role at CERN. Though still active in about 30% of systems, it suffers from two main issues: tight coupling to Oracle Database, and the accumulation of technical debt due to rigid, complex schemas that limit flexibility and performance.
To overcome the limitations of its legacy system, CERN and Siemens/ETM launched the NextGen Archiver (NGA) project in 2017. NGA is now deployed in roughly 500 systems.
At its core, NGA features a pluggable backend architecture that supports multiple databases, including PostgreSQL and TimescaleDB, enabling a more open, flexible system. Oracle support is maintained for compatibility with existing RDB Archiver schemas, while the new TimescaleDB solution currently runs in parallel.
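The post does not describe the NGA backend interface itself, but as a rough, hypothetical illustration of the pluggable pattern, a backend plugin might expose a small write/query contract that each database implementation fulfils. All names below are invented for this sketch and are not NGA's actual API:

```python
# Hypothetical sketch of a pluggable archiver backend (illustrative names only;
# this is not the actual NGA interface).
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Iterable, Sequence, Tuple

Event = Tuple[datetime, int, float]  # (timestamp, element id, value)

class ArchiverBackend(ABC):
    """Contract each storage backend (Oracle, PostgreSQL, TimescaleDB, ...) implements."""

    @abstractmethod
    def write_events(self, events: Sequence[Event]) -> None:
        """Persist a batch of archived values."""

    @abstractmethod
    def query_range(self, element_id: int, start: datetime, end: datetime) -> Iterable[Event]:
        """Return archived values for one signal over a time range."""

class TimescaleBackend(ArchiverBackend):
    """A TimescaleDB/PostgreSQL implementation would live behind the same interface."""
    def write_events(self, events):
        ...  # e.g. batched INSERT or COPY into a hypertable
    def query_range(self, element_id, start, end):
        ...  # e.g. SELECT ... WHERE ts BETWEEN %s AND %s
        return []
```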
CERN evaluated candidate database backends against several selection criteria.
As a PostgreSQL extension, TimescaleDB aligned closely with CERN’s existing expertise and offered specialized support for time-series workloads. Benchmark testing against PostgreSQL confirmed its suitability as the backend for the NextGen Archiver.
In benchmark tests, the write throughput achieved with TimescaleDB and PostgreSQL has comfortably exceeded the requirements of even the largest WinCC OA systems used at CERN (up to 20,000 rows per second per connection). TimescaleDB delivered the highest performance, even with large datasets, high time-series counts, and concurrent writers. Additionally, TimescaleDB has significantly simplified schema design by automatically and efficiently partitioning hypertables and enforcing data retention policies.
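As a minimal sketch of what that setup looks like in practice, assuming a simplified events table (the table, column names, and intervals here are illustrative, not the NGA schema), a hypertable with a retention policy can be created like this, shown with the psycopg driver:

```python
# Minimal sketch: create a hypertable and a retention policy in TimescaleDB.
# Table/column names and intervals are illustrative, not the NGA schema.
import psycopg

with psycopg.connect("dbname=scada", autocommit=True) as conn:
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            ts          TIMESTAMPTZ      NOT NULL,
            element_id  BIGINT           NOT NULL,
            value       DOUBLE PRECISION
        )
    """)
    # Partition the table into time-based chunks automatically.
    conn.execute("SELECT create_hypertable('events', 'ts', if_not_exists => TRUE)")
    # Drop chunks older than the retention window automatically.
    conn.execute("SELECT add_retention_policy('events', INTERVAL '1 year', if_not_exists => TRUE)")
```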
If needed, ingestion rates could be improved further by switching from text-based to binary COPY for data ingestion and by tuning timing parameters to favor fewer, larger writes, at the cost of a slight increase in latency.
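As a hedged sketch of what that binary COPY path could look like, again using the illustrative events table from above rather than the actual NGA code, psycopg 3 lets a client stream rows in binary format:

```python
# Sketch: batched ingestion with binary COPY (illustrative names, not the NGA code).
from datetime import datetime, timezone
import psycopg

rows = [
    (datetime.now(tz=timezone.utc), 1001, 3.30),   # (ts, element_id, value)
    (datetime.now(tz=timezone.utc), 1002, 7.21),
]

with psycopg.connect("dbname=scada") as conn, conn.cursor() as cur:
    # FORMAT BINARY skips the text parsing/formatting overhead of the default COPY.
    with cur.copy("COPY events (ts, element_id, value) FROM STDIN (FORMAT BINARY)") as copy:
        copy.set_types(["timestamptz", "int8", "float8"])  # explicit types for binary mode
        for row in rows:
            copy.write_row(row)
```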
“Our largest systems write at a rate well below 20k rows per second per connection, so the measured throughput of 40k rows per second means that TimescaleDB will cope with them easily,” said Rafal Kulaga.
| Metric | Required writing rate (per system connection) | PostgreSQL (DB only) | TimescaleDB (DB only) | TimescaleDB (DB + client) |
|---|---|---|---|---|
| Writing rate | 20k rows/sec | 67k rows/sec | 77k rows/sec | 40k rows/sec |
In their tests, CERN confirmed that TimescaleDB’s columnar compression delivers at least a 7x reduction in storage requirements. Each column is compressed individually, yielding significant savings.
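The exact compression settings used at CERN are not given in the post, but a minimal sketch of enabling columnar compression on the illustrative events hypertable from above, segmenting by signal, would look like this:

```python
# Sketch: enable columnar compression on the illustrative hypertable and compress
# chunks automatically once they are older than a week (interval chosen arbitrarily).
import psycopg

with psycopg.connect("dbname=scada", autocommit=True) as conn:
    conn.execute("""
        ALTER TABLE events SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'element_id',
            timescaledb.compress_orderby   = 'ts DESC'
        )
    """)
    conn.execute("SELECT add_compression_policy('events', INTERVAL '7 days', if_not_exists => TRUE)")
```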
Query speeds improved substantially when reading compressed data, due to lower I/O. The gains increase with query size and range.
“For the read results using compression, we have observed pretty significant speed-ups, from 10x to 40x, depending on your query, range, and data frequency,” said Martin Zemko.
Continuous Aggregates (CAggs), which automatically materialize and refresh summaries, were another key advantage.
“For continuous aggregates the performance improvements are impressive,” said Martin Zemko, “because the volume of the queried data is significantly reduced. We also observed strong caching effects because the amount of data is relatively tiny compared to the original data, fitting your hypertables into memory quite easily. We expect Continuous Aggregates to be a game-changer especially for plotting high-frequency signals.”
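As a sketch of the kind of continuous aggregate that could back such plots (the bucket size, view name, and aggregate functions are illustrative choices, not CERN’s configuration), a per-minute rollup with an automatic refresh policy looks like this:

```python
# Sketch: a per-minute continuous aggregate over the illustrative events hypertable,
# refreshed automatically. Names and intervals are illustrative choices.
import psycopg

# Continuous aggregates must be created outside an explicit transaction block,
# hence autocommit=True.
with psycopg.connect("dbname=scada", autocommit=True) as conn:
    conn.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS events_1m
        WITH (timescaledb.continuous) AS
        SELECT time_bucket(INTERVAL '1 minute', ts) AS bucket,
               element_id,
               avg(value) AS avg_value,
               min(value) AS min_value,
               max(value) AS max_value
        FROM events
        GROUP BY bucket, element_id
        WITH NO DATA
    """)
    conn.execute("""
        SELECT add_continuous_aggregate_policy('events_1m',
            start_offset      => INTERVAL '1 hour',
            end_offset        => INTERVAL '1 minute',
            schedule_interval => INTERVAL '1 minute')
    """)
```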
With its demanding scale, critical uptime requirements, and strict reproducibility standards, CERN provides a rigorous testbed for any database. Success here signals confidence in the solution’s ability to perform in other complex enterprise environments.
Based on current results, CERN plans to standardize on TimescaleDB as an alternative to Oracle for storage of historical data from WinCC OA systems, leveraging compression and continuous aggregates. Full-scale benchmarks are underway, with CERN-wide production deployment targeted for 2027.
The TimescaleDB databases powering the NextGen Archiver are provisioned via the Database on Demand (DBOD) service offered by the CERN IT Department, enabling streamlined, scalable, and fully managed deployments aligned with CERN’s internal infrastructure standards.