---
title: "SCADA Data Management at Scale: Architecture, Historians, and the Modern Database"
description: "SCADA data management at scale: schema design, historian limits, Ignition integration, and migration patterns—with three oil and gas deployments as proof."
section: "General"
---

> **TimescaleDB is now Tiger Data.**

The SCADA database problem is not a storage problem. SCADA systems were built to capture and serve operational data reliably (pressure readings, flow rates, valve states) for operators watching control room dashboards. That job they do well. The job they were not designed for is what industrial teams now need: SQL access to years of tag history, cross-site aggregation, ML pipelines on live sensor data, and cloud-native deployment across distributed operations.

IIoT expansion is widening this gap. Sensor counts grow, analytical demands rise, and the mismatch between what proprietary historians do and what engineering teams need gets more expensive to ignore.

This guide covers schema design for SCADA tag data, how purpose-built historians work and where they break, Ignition integration paths, and a migration framework, with three oil and gas customer deployments as proof. 

## What Makes SCADA Data Different from General IoT Data

General IoT data is diverse. SCADA data has a specific structure that shapes every downstream decision about how to store and query it.

**Tag-based architecture.** Every sensor reading in a SCADA system gets a named tag: `PUMP_01.PRESSURE`, `WELL_42.FLOW_RATE`, `COMP_07.DISCHARGE_TEMP`. A single facility may track thousands to hundreds of thousands of tags at once. This is not a metric-per-series model. It's a named-entity model where the tag name encodes the asset hierarchy and measurement type in a single identifier.

**Write frequency.** Typical SCADA systems write at 1-second intervals per tag. High-frequency installations (compressor monitoring, subsea systems) write at sub-second intervals. Oil and gas facilities routinely track 50,000 to 250,000 data points per second across a fleet.

**Long retention requirements.** FERC and PHMSA regulations require a minimum of 7 years of process data retention in US oil and gas. Many operators retain 20 to 30 years for production analytics and compliance audits. This is not a use case for rotating 90-day hot storage windows.

**Mixed data types.** A single tag records numeric readings (pressure, temperature, flow rate), binary status values (valve open/closed), alarm events, and quality flags. The quality code is mandatory for data validity audits in regulated environments.

**Exception reporting.** Many SCADA historians use deadband filtering, writing a new value only when it changes beyond a threshold. This reduces storage volume but introduces time-series gaps that any downstream database needs to handle correctly.
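
Gap handling is a query-time concern as much as a storage one. As a minimal sketch of what this looks like in SQL (using TimescaleDB's `time_bucket_gapfill` and `locf` functions against the `tag_readings` schema defined later in this guide; the tag ID and time window are illustrative), a deadband-filtered series can be re-expanded into a regular interval:

```sql
-- Re-expand a deadband-filtered tag into a regular 1-minute series,
-- carrying the last recorded value forward across the gaps.
-- Assumes the tag_readings schema shown later in this guide; tag_id 42
-- and the 1-day window are illustrative.
SELECT
    time_bucket_gapfill('1 minute', time) AS bucket,
    locf(avg(value)) AS value   -- last observation carried forward
FROM tag_readings
WHERE tag_id = 42
  AND time >= now() - INTERVAL '1 day'
  AND time < now()
GROUP BY bucket
ORDER BY bucket;
```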

### What is a historian in a SCADA system?

A historian is the software layer in a SCADA system responsible for storing and serving time-stamped tag data from PLCs and RTUs. Products like AVEVA PI (formerly OSIsoft PI), GE Proficy, and Inductive Automation Ignition include historian functionality. For a full breakdown of how historians are built and where they are deployed, see the guide to [<u>what is a data historian</u>](https://www.tigerdata.com/learn/what-is-a-data-historian).

## SCADA Historian Architecture: How It Works and Where It Breaks

Purpose-built historians do several things well. Native OPC-UA and OPC-DA connectivity means a historian plugs directly into PLCs and RTUs without custom integration work. Built-in deadband compression reduces storage volume at the edge before data hits disk. Operator tools (trend views, alarm management, report templates) integrate tightly with the storage layer. These are real advantages built up over decades of industrial deployment.

The break points emerge when operators push historians beyond their original design envelope.

**Tag-based licensing.** AVEVA PI licenses by aggregate tag count, with tiered overage rates as tag counts grow. As IIoT expands monitoring scope (vibration sensors, edge devices, new wells), licensing costs scale with sensor count. At tens of thousands of tags, that math gets expensive fast.

**Analytics limitations.** Proprietary historians use closed query interfaces (PI AF, PI DataLink) that do not support standard SQL. Running ad-hoc analysis requires exporting data to a separate analytics environment, adding latency, data movement cost, and tooling complexity. The historian was designed as a data store and operator interface, not as an analytical platform.

**No cloud-native path.** Most incumbent historians were designed for on-premises deployment with optional cloud bridges. They were not built for multi-site aggregation across cloud regions, or for feeding ML pipelines and dashboards from a single SQL endpoint. The cloud integration story is typically a separately licensed add-on with significant configuration overhead.

**Vendor lock-in.** Proprietary data formats and query APIs mean that migrating away from a historian requires rewriting all downstream queries and reports. This switching cost is real, and it keeps many operators on legacy systems long past the point where the economics would otherwise favor a change.

| **Historian** | **Query Language** | **Cloud Path** | **Licensing Model** | **Strengths** | **Limitations** |
| --- | --- | --- | --- | --- | --- |
| AVEVA PI System | PI AF / PIQL | PI Cloud Connect (bridge) | Aggregate tag (tiered) | Deep OT integration, industry standard | Closed API, tag-count cost, no SQL |
| GE Proficy (iFIX/CIMPLICITY) | Historian SDK | Limited | Per-tag / server | Broad PLC support | Legacy architecture, complex migration |
| Canary Labs | Canary Views | Canary Cloud | Per-server | Lower cost than PI | Proprietary, no SQL |
| Ignition 8.3 Core Historian (QuestDB) | QuestDB SQL subset | Via Ignition Cloud | Included in Ignition | Native Ignition 8.3+, up to 2M points/sec | Not PostgreSQL; limited managed cloud options |

For a detailed comparison of historians and time-series databases and guidance on when to switch, see the [<u>data historian vs. time-series database</u>](https://www.tigerdata.com/learn/moving-past-legacy-systems-data-historian-vs-time-series-database) decision guide.

---

## Time-Series Database Architecture for SCADA: Schema Design and Performance

The case for a PostgreSQL-native time-series database is not that it replaces OPC-UA or handles edge ingest better than a historian. It does not do those things. The case is that it handles the downstream analytical workload historians were never designed for. And it handles it in standard SQL, with tooling that teams already use.

### Schema design for SCADA tag data

The standard schema pattern for SCADA data in a relational time-series database is the narrow model: one row per observation, with tag identity stored as a foreign key rather than duplicated in every row. This scales to high-cardinality environments without the storage and indexing overhead of wide tables.

```sql
-- Asset metadata table
CREATE TABLE assets (
    asset_id    SERIAL PRIMARY KEY,
    asset_name  TEXT NOT NULL,
    location    TEXT,
    asset_type  TEXT
);

-- Tag definitions table
CREATE TABLE tags (
    tag_id      SERIAL PRIMARY KEY,
    asset_id    INTEGER REFERENCES assets(asset_id),
    tag_name    TEXT NOT NULL,
    unit        TEXT,
    description TEXT
);

-- Hypertable for SCADA readings
CREATE TABLE tag_readings (
    time        TIMESTAMPTZ NOT NULL,
    tag_id      INTEGER REFERENCES tags(tag_id),
    value       DOUBLE PRECISION,
    quality     SMALLINT  -- 0 = good, non-zero = quality code
);

SELECT create_hypertable('tag_readings', by_range('time'));

-- Optional: space partition by tag_id for very high-cardinality deployments
SELECT add_dimension('tag_readings', by_hash('tag_id', 4));
```

Separating asset and tag metadata from readings enables relational joins without duplicating tag names in every row. Queries like "all pressure readings for compressors in location X above threshold" are standard SQL joins. `create_hypertable` partitions readings by time automatically, so range queries scan only the relevant chunks. This is what keeps dashboard latency stable at billions of rows. The `quality` column handles the mixed-status reality of SCADA data without a separate alarm table.
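
As an illustration, the "pressure readings for compressors in location X above threshold" query maps to plain SQL against this schema. The filter values and threshold below are placeholders:

```sql
-- Hourly average pressure for compressors at one site, above a threshold,
-- good-quality readings only. Filter values are illustrative.
SELECT
    t.tag_name,
    time_bucket('1 hour', r.time) AS hour,
    avg(r.value) AS avg_pressure
FROM tag_readings r
JOIN tags t   ON t.tag_id = r.tag_id
JOIN assets a ON a.asset_id = t.asset_id
WHERE a.asset_type = 'compressor'
  AND a.location = 'site_x'
  AND t.tag_name LIKE '%PRESSURE%'
  AND r.quality = 0
  AND r.value > 250            -- threshold in the tag's engineering units
  AND r.time > now() - INTERVAL '7 days'
GROUP BY t.tag_name, hour
ORDER BY t.tag_name, hour;
```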

### Compression for long-term retention

TimescaleDB's native compression via Hypercore uses a columnar format for time-series chunks, achieving 90-98% compression ratios in documented deployments. In production, Flogistix by Flowco achieved 84% compression on gas compressor telemetry deployed on Tiger Data. See [<u>how Flogistix by Flowco reduced infrastructure management costs by 66%</u>](https://www.tigerdata.com/blog/how-flogistix-by-flowco-reduced-infrastructure-management-costs-by-66-with-tiger-data) for the full deployment context. For SCADA environments requiring 7 to 30 years of retention, this compression differential has direct infrastructure cost implications.
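
A minimal sketch of enabling this on the schema above (the segmenting column and the 7-day hot window are assumptions, not settings from the cited deployments):

```sql
-- Enable columnar compression, segmenting by tag so per-tag history
-- scans stay fast on compressed chunks.
ALTER TABLE tag_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'tag_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Compress chunks automatically once they age past the hot window.
SELECT add_compression_policy('tag_readings', INTERVAL '7 days');
```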

### Continuous aggregates for downsampling

SCADA systems generate raw 1-second readings. Dashboards and compliance reports typically need 1-minute, 1-hour, and 1-day summaries. Writing custom aggregation jobs to maintain those rollups is the standard approach. It's also a maintenance burden that grows as the system scales.

TimescaleDB continuous aggregates maintain these rollups automatically in the background, updating only the time buckets that changed since the last refresh. This is what historian downsampling does, but in standard SQL, without a proprietary scheduling system. The ColumnarIndexScan optimization introduced in [<u>TimescaleDB 2.26</u>](https://www.tigerdata.com/blog/timescaledb-2-26) delivers up to 70x speedups on summary queries, directly relevant for historian-style aggregated reads.
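
Against the schema above, a 1-minute rollup looks like the following sketch (the aggregate columns and refresh offsets are illustrative):

```sql
-- Incrementally maintained 1-minute rollup of raw readings.
CREATE MATERIALIZED VIEW tag_readings_1min
WITH (timescaledb.continuous) AS
SELECT
    tag_id,
    time_bucket('1 minute', time) AS bucket,
    avg(value) AS avg_value,
    min(value) AS min_value,
    max(value) AS max_value,
    count(*)   AS sample_count
FROM tag_readings
GROUP BY tag_id, bucket;

-- Refresh only recently changed buckets on a fixed schedule.
SELECT add_continuous_aggregate_policy('tag_readings_1min',
    start_offset      => INTERVAL '1 hour',
    end_offset        => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');
```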

### Write throughput in production

WaterBridge ingests 10,000 data points per second using TimescaleDB on commodity infrastructure in a production historian replacement deployment. The [<u>how WaterBridge uses TimescaleDB for real-time data consistency</u>](https://www.tigerdata.com/blog/how-waterbridge-uses-timescaledb-for-real-time-data-consistency) case study covers the architecture. This is a production workload figure, not a synthetic benchmark. The actual ceiling on optimized hardware is higher.

TimescaleDB's architectural advantage is PostgreSQL compatibility, full analytical SQL, continuous aggregates, and managed cloud, not raw ingestion speed. QuestDB's ingestion benchmarks hold up for pure-ingest workloads inside the Ignition native historian module. For broader context on PostgreSQL for industrial workloads, see [<u>PostgreSQL for industrial IoT data</u>](https://www.tigerdata.com/blog/postgresql-for-everything-industrial-iot-data) and [<u>best practices for IIoT energy monitoring applications</u>](https://www.tigerdata.com/blog/best-practices-for-building-iiot-energy-monitoring-applications).

## Connecting Your SCADA System to Tiger Data

The integration path depends on whether the team is already in the Ignition ecosystem or connecting from another SCADA platform.

### Ignition integration: two paths, both endorsed

On April 20, 2026, Tiger Data was named a Gold Technology Provider in the Inductive Automation Ignition Technology Ecosystem. Inductive Automation has formally endorsed Tiger Data as a recommended time-series backend for the Ignition platform. Joint technical content (deployment guides, webinars) is planned for 2026. See the [<u>Inductive Automation and Tiger Data partnership announcement</u>](https://www.tigerdata.com/newsroom/inductive-automation-and-tiger-data-collaborate-to-modernize-the-industrial-historian-market) for details.

Colby Clegg, CEO of Inductive Automation, stated: "Our users run facilities generating millions of data points daily. They need time-series performance at scale, full SQL access, and site-to-enterprise connectivity."

For Ignition users, there are two backend options, both supported by Inductive Automation:

**Path 1. Ignition Core Historian (QuestDB, native Ignition 8.3).** If the primary requirement is highest-throughput raw ingest tightly coupled to Ignition's native historian module, the QuestDB-powered Core Historian in Ignition 8.3 is purpose-built for that workload, capable of up to 2 million data points per second. This is the right choice when ingest performance is the dominant constraint and complex analytical SQL isn't an immediate need.

**Path 2. Tiger Data via Ignition SQL Historian module (Gold-tier option).** If the team needs full PostgreSQL compatibility, complex analytical SQL, continuous aggregates for automatic downsampling, pgvector for AI-driven anomaly detection on historical data, and Tiger Cloud for managed multi-site deployment, TimescaleDB connects via Ignition's SQL Historian module using standard JDBC. Ignition writes tag data directly to a TimescaleDB hypertable. The integration path is standard PostgreSQL connectivity, which Ignition's SQL Historian module already supports.

For the technical implementation walkthrough, see the [<u>Ignition and TimescaleDB: the technical integration guide</u>](https://www.tigerdata.com/blog/ignition-and-timescaledb-perfect-pairing).

### Non-Ignition integration paths

Teams not using Ignition have direct options:

**OPC-UA / MQTT via Telegraf.** Telegraf's OPC-UA input plugin reads tags from PLCs and RTUs and writes to TimescaleDB via the PostgreSQL output plugin. This works for teams using any OPC-UA-compatible SCADA system without Ignition as middleware.

**Direct PostgreSQL JDBC.** Any SCADA system or middleware that can write to a PostgreSQL database can write to TimescaleDB with no proprietary connector required. The connection string is a standard PostgreSQL JDBC connection.
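
As a sketch, the URL follows standard PostgreSQL JDBC syntax; host, database, and credentials are placeholders:

```
jdbc:postgresql://<host>:5432/<database>?user=<user>&password=<password>&sslmode=require
```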

**Grafana.** SCADA dashboards can be built directly on TimescaleDB using Grafana's native PostgreSQL data source. No separate visualization layer or data export pipeline required.

## When to Augment, When to Replace: Migration Patterns

Engineers with existing historians face a real decision. Three patterns cover most situations.

### Pattern 1: Tiger Data as Analytics Layer (Keep Your Historian)

Best fit for teams whose historian handles OT data collection and alarm management well, but who need SQL-accessible historical data for BI reporting, ML pipelines, or cross-site aggregation.

The historian stays as the operational layer. TimescaleDB becomes the analytical layer: a sync process reads from the existing historian API (PI AF, Canary SDK) and writes into a TimescaleDB hypertable. No historian data migrates. No OT workflows change. Downstream analytics tools get full SQL access without touching the historian configuration.

### Pattern 2: Parallel Historian (Dual-Write Validation)

Best fit for teams evaluating a full historian replacement but unwilling to cut over without validation.

The SCADA system writes to both the existing historian and Tiger Data for 30 to 90 days. Teams compare query results, validate completeness, and benchmark against their actual workload. Once analytics tools move over, teams decommission the legacy historian.

WaterBridge deployed this architecture for historian replacement in oil and gas water handling, ingesting 10,000 data points per second. See [<u>how WaterBridge uses TimescaleDB for real-time data consistency</u>](https://www.tigerdata.com/blog/how-waterbridge-uses-timescaledb-for-real-time-data-consistency).

### Pattern 3: Full Replacement (Greenfield or End-of-Life)

Best fit for new IIoT deployments, systems approaching historian end-of-life, or operators whose per-tag licensing costs have become prohibitive.

Flogistix by Flowco deployed Tiger Data for gas compressor telemetry monitoring and achieved 66% infrastructure cost reduction, 84% compression, and 99% reliability. See [<u>how Flogistix by Flowco reduced infrastructure management costs by 66%</u>](https://www.tigerdata.com/blog/how-flogistix-by-flowco-reduced-infrastructure-management-costs-by-66-with-tiger-data).

Mechademy migrated from MongoDB to Tiger Data for industrial digital twin infrastructure (predictive analytics across rotating equipment), achieving 87% infrastructure cost reduction and 50x scale improvement. Mechademy is an adjacent use case (hybrid digital twin) rather than a direct historian replacement, but the infrastructure economics apply. See [<u>how Mechademy cut hybrid digital twin infrastructure costs by 87%</u>](https://www.tigerdata.com/blog/how-mechademy-cut-hybrid-digital-twin-infrastructure-costs).

### Choosing the right path

Choose Tiger Data if:

- The team needs full SQL access to historical SCADA data: joins with asset metadata, production schedules, or maintenance records
- Sensor count is scaling and per-tag historian licensing is getting expensive
- A managed cloud path is required for multi-site data aggregation or cloud analytics
- The existing system needs AI/ML capabilities on historical sensor data (pgvector for anomaly detection)
- The team is already in the Ignition ecosystem and wants the Gold-tier full-SQL backend option

Consider the Ignition Core Historian (QuestDB) instead if:

- The primary requirement is maximum-throughput raw ingest tightly integrated with Ignition 8.3's native historian module
- Complex analytical SQL and managed cloud aren't near-term needs

For oil and gas context and the operational case for modernizing industrial data infrastructure, see the overview of [<u>real-time analytics in oil and gas</u>](https://www.tigerdata.com/blog/how-real-time-analytics-in-oil-gas-prevents-millions-in-losses-unlocks-efficiency).

## FAQ: SCADA Database and Historian Questions

### What is a historian in a SCADA system?

A historian is the software layer that stores and serves time-stamped tag data (pressure, flow, valve states, alarm events) from PLCs and RTUs. AVEVA PI, GE Proficy, and Inductive Automation Ignition all include historian functionality. For a full breakdown, see [<u>what is a data historian</u>](https://www.tigerdata.com/learn/what-is-a-data-historian).

### What's the best database for SCADA data?

For SCADA deployments that need SQL access, cloud connectivity, and long-term analytical capability, a PostgreSQL-native time-series database like TimescaleDB is the strongest fit. For raw ingestion performance tightly coupled to Ignition 8.3's native historian, QuestDB's Core Historian is purpose-built for that workload. The choice depends on whether ingestion speed or analytical flexibility dominates.

### Can PostgreSQL handle SCADA write throughput?

Yes, when configured as a TimescaleDB hypertable. TimescaleDB partitions data by time, enabling sustained high write throughput with predictable query performance. WaterBridge ingests 10,000 data points per second in production on commodity infrastructure.
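
One practical note, as a sketch: batching many readings into each INSERT amortizes round-trips and transaction overhead, which is how collectors typically sustain these rates (the values below are illustrative):

```sql
-- Multi-row inserts batch many readings into one statement.
INSERT INTO tag_readings (time, tag_id, value, quality) VALUES
    (now(), 101, 512.7, 0),
    (now(), 102, 88.4,  0),
    (now(), 103, 1.0,   0);
```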

### What are database design best practices for SCADA instrument data?

The narrow model (asset ID, tag ID, timestamp, value, quality code) scales better than wide tables for high-cardinality SCADA environments. Use hypertable time partitioning and continuous aggregates for 1-minute and 1-hour rollups. Add native compression once data ages past the hot window. [<u>pgvector</u>](https://github.com/pgvector/pgvector) enables anomaly detection on historical sensor data without a separate analytics platform. The schema example in the architecture section above is a concrete starting point.

### How long should SCADA data be retained?

US oil and gas operators retain process data for 7 years minimum (PHMSA and FERC); many keep 20 to 30 years for production analytics. TimescaleDB compression (84% in the Flogistix deployment) makes long-term retention economically viable. Retention policies can automatically expire or downsample older data via continuous aggregates.
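
A sketch of such a policy on the earlier schema (the interval is illustrative; a real deployment would align it with the applicable regulation):

```sql
-- Drop raw chunks after 7 years; downsampled continuous aggregates
-- are separate objects and are retained independently.
SELECT add_retention_policy('tag_readings', INTERVAL '7 years');
```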

### Can I connect Ignition SCADA to Tiger Data?

Yes. Tiger Data is a Gold Technology Provider in the Inductive Automation Ignition Technology Ecosystem (April 2026). The integration uses Ignition's SQL Historian module with standard PostgreSQL/JDBC. Ignition writes tag data directly to a TimescaleDB hypertable. For the technical walkthrough, see [<u>Ignition and TimescaleDB: the Perfect Pairing</u>](https://www.tigerdata.com/blog/ignition-and-timescaledb-perfect-pairing).

### Why use TimescaleDB instead of InfluxDB for SCADA?

PostgreSQL compatibility. InfluxDB 3.x adopted a new SQL model in its 2025 rewrite. It isn't PostgreSQL, and teams on InfluxDB 2.x face migration considerations (Flux deprecated, changed data model). TimescaleDB runs on PostgreSQL, so existing tools, drivers, and dashboards work without modification. PostgreSQL's 30-year stability record matters for environments requiring 10 to 30-year data retention.

### What is the difference between a SCADA database and a time-series database?

In most industrial contexts, "SCADA database" means a historian: software purpose-built to store tag-based OT data. A time-series database (TSDB) is a general-purpose database optimized for time-stamped data. The distinction is narrowing. Ignition 8.3 ships QuestDB (a TSDB) as its native historian, and TimescaleDB (a TSDB on PostgreSQL) replaces or augments traditional historians.

### How much does it cost to store SCADA data in the cloud?

Costs depend on ingestion rate, retention period, compression, and cloud provider. TimescaleDB's 84% compression in the Flogistix deployment substantially reduces storage costs. Flogistix achieved 66% overall infrastructure cost reduction. Tiger Cloud pricing is at [<u>tigerdata.com</u>](https://www.tigerdata.com/pricing); actual costs depend on volume and configuration.

### Can Tiger Data replace AVEVA PI?

TimescaleDB can serve as an analytics layer alongside PI, as a parallel historian for validation, or as a full replacement. The migration patterns section above covers each scenario. The [<u>data historian vs. time-series database</u>](https://www.tigerdata.com/learn/moving-past-legacy-systems-data-historian-vs-time-series-database) guide covers the when-to-switch decision in detail.

### How does Tiger Data handle SCADA data compression?

TimescaleDB's native compression (Hypercore) uses columnar storage for time-series chunks, typically achieving 90-98% compression on repetitive sensor data. Flogistix achieved 84% compression in production on gas compressor telemetry. Continuous aggregates handle downsampling automatically. Raw 1-second readings stay in the hot window while hourly summaries remain queryable indefinitely.

### What's the best database for oil and gas SCADA data and pipeline monitoring?

For oil and gas SCADA workloads requiring long retention, regulatory compliance (PHMSA, FERC), and analytical SQL access, a PostgreSQL-native time-series database is the strongest fit. [<u>Tiger Data customers</u>](https://www.tigerdata.com/case-studies) in this vertical include Flogistix (gas compressor telemetry, 66% cost reduction) and WaterBridge (water handling operations, 10,000 data points per second). For pure Ignition-native ingest at high volume, QuestDB's Core Historian in Ignition 8.3 is purpose-built for that workload.