---
title: "MQTT to PostgreSQL: Architecture, Database Options, and Implementation Paths"
description: "A broker-agnostic guide to connecting MQTT to PostgreSQL: architecture, database selection, Telegraf, HiveMQ, and custom Python paths, with IIoT production examples."
section: "General"
---

> **TimescaleDB is now Tiger Data.**

MQTT is a lightweight, TCP-based messaging protocol designed for low-bandwidth, high-latency, and unreliable networks, which makes it ideal for IoT and machine-to-machine communication. Every production MQTT deployment needs a separate database, and that choice determines query flexibility, cost at scale, and operational complexity for the life of the system.

This guide covers the decision upstream of implementation: which database fits MQTT workloads, why, and how to connect them. For a broader IoT database comparison, see [<u>The Best Databases for IoT in 2026</u>](https://www.tigerdata.com/learn/how-to-choose-an-iot-database).

A note on bias: Tiger Data makes [<u>Tiger Cloud</u>](https://www.tigerdata.com/cloud), a managed database built on TimescaleDB. We cover the alternatives, including the cases where Tiger Cloud is not the right fit.

## What is an MQTT database, and why your choice matters

An MQTT database is a persistent data store that captures messages from an MQTT broker and makes them queryable over time. Because MQTT brokers do not persist messages, every production MQTT deployment requires a database layer to store, retain, and analyze the telemetry data flowing through the broker. Choosing the right MQTT broker/database combination is the central architectural decision for any IoT pipeline.

MQTT (Message Queuing Telemetry Transport) is a lightweight pub/sub protocol developed by IBM in the 1990s and now the de facto standard for IoT telemetry. Devices publish messages on named topics; a broker (Mosquitto, HiveMQ, EMQX, AWS IoT Core) routes them to subscribers. The broker does not persist data.

Three QoS levels govern delivery guarantees: at-most-once (QoS 0, fire and forget), at-least-once (QoS 1, possible duplicates), and exactly-once (QoS 2, highest overhead). The database pipeline handles whichever QoS the broker uses.

Ten devices writing temperature readings once per second is a trivially small workload. Ten thousand devices reporting at 1 Hz produce about 36 million rows per hour in a wide schema (10,000 devices × 3,600 seconds), and several times that in a narrow schema with one row per metric. At that scale, a database without time-series partitioning slows down, and the wrong choice is expensive to undo once sensor data accumulates.

See [<u>Storing IoT Data: 8 Reasons Why You Should Use PostgreSQL</u>](https://www.tigerdata.com/learn/storing-iot-data-8-reasons-why-you-should-use-postgresql) for the foundational case. For the IIoT context, see [<u>A Beginner's Guide to IIoT and Industry 4.0</u>](https://www.tigerdata.com/learn/a-beginners-guide-to-iiot-and-industry-4-0).

## How MQTT data reaches a database: the three-layer architecture

Most MQTT pipeline problems come from confusion about which layer owns which responsibility. The architecture has three layers.

**Device layer.** Sensors, PLCs, and edge gateways publish messages on MQTT topics. Payloads are JSON or binary. The device has no knowledge of the downstream database.

**Broker layer.** The broker receives messages, routes them to subscribers, and provides QoS guarantees. It does not store data beyond in-flight queuing. Common options: Mosquitto (self-hosted), HiveMQ (enterprise), EMQX (high-throughput with rule engine), AWS IoT Core (fully managed).

**Storage layer.** A subscriber process, connector, or agent receives messages from the broker and writes them to the database. This is where most implementation decisions are made.

The storage layer can be replaced without changing the device layer or the broker, a point broker vendors tend to obscure since their product is the middleware they're selling.

| **Bridge pattern** | **How it works** | **Best for** |
| --- | --- | --- |
| Telegraf MQTT consumer | Telegraf subscribes to broker topics; `outputs.postgresql` writes to Tiger Cloud | Teams that want no custom code; most common production pattern |
| Broker native connector | HiveMQ PostgreSQL extension; EMQX rule engine and data bridge | Teams already running enterprise HiveMQ or EMQX |
| Custom Python/Node client | paho-mqtt subscriber writes batched inserts to PostgreSQL | Teams needing full control over payload parsing and error handling |
| Cloud-managed pipeline | AWS IoT Core to Lambda to Tiger Cloud | Teams on AWS who want fully managed ingestion |

## Database options for MQTT data: a comparison

### Vanilla PostgreSQL

Standard PostgreSQL handles MQTT data well at small scale. SQL JOINs let you combine sensor readings with device metadata, location tables, and maintenance records. Grafana, dbt, pgAdmin, and the rest of the PostgreSQL ecosystem work without modification.

The limitation is time-series scale. Without time-series partitioning, query performance degrades on tables with hundreds of millions of rows. Without connection pooling and batching, INSERT throughput becomes a bottleneck at high device counts. Retention and archival require manual management.

Choose vanilla PostgreSQL if you have fewer than 100 devices, ingestion rate below 1,000 messages per second, no long-term retention requirement, and you want the simplest possible setup with no additional tooling.
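
If you do take this route, batching is the one habit that delays the scaling wall. A minimal sketch of the pattern, assuming a psycopg2 driver and an illustrative `sensor_readings` table (the DSN, table, and columns are placeholders, not part of any specific setup):

```python
# Minimal batching sketch for vanilla PostgreSQL (all names are illustrative).
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("postgresql://user:password@localhost:5432/iot")  # hypothetical DSN


def write_batch(rows):
    """rows: list of (ts, device_id, temperature) tuples drained from the MQTT subscriber."""
    with conn, conn.cursor() as cur:
        execute_values(
            cur,
            "INSERT INTO sensor_readings (ts, device_id, temperature) VALUES %s",
            rows,
            page_size=1000,  # one round trip per 1,000 rows instead of one per row
        )
```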

### TimescaleDB / Tiger Cloud (recommended for scale)

[<u>TimescaleDB</u>](https://www.tigerdata.com/timescaledb) is an open-source extension to PostgreSQL that adds time-series-native partitioning ([<u>hypertables</u>](https://www.tigerdata.com/docs/learn/hypertables/understand-hypertables)), columnar compression ([<u>Hypercore</u>](https://www.tigerdata.com/docs/learn/columnar-storage/understand-hypercore)), and [<u>continuous aggregates</u>](https://www.tigerdata.com/docs/learn/continuous-aggregates). Tiger Cloud is the managed service. The full SQL interface stays intact: every PostgreSQL tool, driver, and query works without modification.

Hypertables auto-partition data by time, keeping queries bounded as data grows into billions of rows. Hypercore compresses historical chunks with 90-98% storage reduction in typical deployments. Continuous aggregates maintain pre-computed hourly and daily rollups that update automatically as new data arrives.

Choose Tiger Cloud if you have more than 100 devices, ingestion above 1,000 messages per second, need long-term retention with compression, need to JOIN sensor data with relational metadata, or want a managed service with no infrastructure operations.

For a comparison across managed services, see [<u>Best Managed Time-Series Databases in 2026</u>](https://www.tigerdata.com/learn/best-managed-time-series-databases-in-2026).

### InfluxDB

InfluxDB is a purpose-built time-series database with its own line protocol and Flux/InfluxQL query languages. It handles homogeneous, high-cardinality sensor streams efficiently when all reads are time-range queries on a single measurement type.

Two limitations matter for MQTT workloads. First, InfluxDB has no SQL JOINs. Combining sensor data with device metadata or maintenance history requires application-layer logic or a second relational database. Second, the version sprawl creates long-term maintenance risk. Teams who built pipelines on InfluxDB 1.x have found migration to 2.x or 3.0 disruptive, with API breaks documented across r/influxdb threads and the project's own GitHub discussions, and the Cloud Serverless and Cloud Dedicated variants have different APIs again. For homogeneous sensor streams where queries never cross measurement boundaries and the team is already committed to the InfluxDB 3.x stack, InfluxDB is a reasonable fit.

### SQLite

SQLite is appropriate for edge devices and local prototyping. It is not a candidate for production multi-device MQTT ingestion; write concurrency, network access, and scale are all limited by design.

### NoSQL (MongoDB, Cassandra)

MongoDB's flexible JSON document model helps with heterogeneous payloads, but time-range queries and joins require application-layer logic. Cassandra offers high write throughput with poor analytical query support and significant operational complexity. Both suit schema-flexible use cases with very high write volume and no analytical requirements. For most IIoT deployments, they are not the right fit.

### Database selection framework

| **Criterion** | **Vanilla PostgreSQL** | **Tiger Cloud (TimescaleDB)** | **InfluxDB** |
| --- | --- | --- | --- |
| Device count | Fewer than 100 | 100 to millions | 100 to millions |
| Ingestion rate | Fewer than 1,000 msg/sec | 1,000+ msg/sec | 1,000+ msg/sec |
| SQL JOINs with metadata | Yes | Yes | No |
| Long-term retention and compression | Manual | Automatic (Hypercore) | Yes |
| Managed service | No (self-hosted) | Yes (Tiger Cloud) | Yes (Cloud Serverless/Dedicated) |
| Query tooling | Standard SQL | Standard SQL | Flux / InfluxQL / Non-native SQL via Apache Arrow |
| Version stability | Stable | Stable | Fragmented (1.x / 2.x / 3.0) |

## Why Tiger Cloud handles MQTT workloads at industrial scale

MQTT sensor streams have a predictable shape: time-ordered rows, one per device per reading, arriving continuously. TimescaleDB is designed for exactly this pattern.

**Hypertable auto-partitioning.** Tiger Cloud automatically creates time-based chunks (default 7-day intervals). Queries that filter by time range scan only the relevant chunks, keeping latency stable at sub-second on billion-row tables.
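
A minimal sketch of that setup, assuming psycopg2 and the classic `create_hypertable` call; the table name, columns, and DSN are illustrative:

```python
# Minimal sketch: create a table and convert it to a hypertable (illustrative names).
import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical Tiger Cloud DSN
with conn, conn.cursor() as cur:
    # Plain PostgreSQL table, one row per device reading (wide schema).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS sensor_readings (
            ts          TIMESTAMPTZ NOT NULL,
            device_id   TEXT        NOT NULL,
            temperature DOUBLE PRECISION,
            pressure    DOUBLE PRECISION
        );
    """)
    # Partition by ts; chunks default to 7 days (chunk_time_interval overrides this).
    cur.execute("SELECT create_hypertable('sensor_readings', 'ts', if_not_exists => TRUE);")
conn.close()
```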

**Hypercore compression.** Sensor readings from the same device at consecutive timestamps are highly similar, which makes them compressible. Hypercore converts older chunks from row to columnar storage, achieving 90-98% compression in typical deployments. Less storage means lower cloud costs and more history online at the same budget.

**Continuous aggregates.** Production dashboards query aggregates, not raw rows. Continuous aggregates maintain pre-computed hourly and daily rollups that update automatically as new data arrives, so a query over 30 days runs in milliseconds.
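
A hedged sketch of an hourly rollup over the illustrative table above; the view name, aggregated columns, and refresh windows are examples, not prescriptions:

```python
# Hedged sketch: an hourly rollup of the illustrative sensor_readings hypertable.
import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical DSN
conn.autocommit = True  # continuous aggregates cannot be created inside a transaction block
with conn.cursor() as cur:
    cur.execute("""
        CREATE MATERIALIZED VIEW IF NOT EXISTS sensor_hourly
        WITH (timescaledb.continuous) AS
        SELECT time_bucket('1 hour', ts) AS bucket,
               device_id,
               avg(temperature) AS avg_temp,
               max(pressure)    AS max_pressure
        FROM sensor_readings
        GROUP BY bucket, device_id;
    """)
    # Keep the last 30 days of buckets refreshed, lagging an hour behind real time.
    cur.execute("""
        SELECT add_continuous_aggregate_policy('sensor_hourly',
            start_offset      => INTERVAL '30 days',
            end_offset        => INTERVAL '1 hour',
            schedule_interval => INTERVAL '30 minutes');
    """)
conn.close()
```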

**Standard SQL tooling.** Grafana connects via the native PostgreSQL data source. dbt models work without modification. There is no proprietary query language to learn.

**JOINs across data types.** A complete IIoT picture requires joining sensor readings with device registry, maintenance history, and ownership tables. That is standard SQL in Tiger Cloud. It requires application-layer logic in InfluxDB.
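
For example, a sketch of such a query against a hypothetical `devices` registry table (all names are placeholders):

```python
# Illustrative query: correlate the last 24 hours of readings with a device registry.
import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT r.ts, r.device_id, r.temperature, d.site, d.last_maintenance
        FROM sensor_readings r
        JOIN devices d USING (device_id)          -- devices is a hypothetical registry table
        WHERE r.ts > now() - INTERVAL '24 hours'
          AND d.site = 'compressor-station-7'
        ORDER BY r.ts;
    """)
    rows = cur.fetchall()
conn.close()
```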

## Four ways to connect MQTT to PostgreSQL

These are decision paths, not ranked recommendations. The right choice depends on your broker, team, and payload structure.

### Path A: Telegraf (no custom code, most teams start here)

Telegraf is an open-source metrics collection agent. Despite being maintained by InfluxData, it has a first-class PostgreSQL output plugin and is the most widely deployed MQTT-to-PostgreSQL bridge in production.

The architecture: Telegraf subscribes to the broker using the `inputs.mqtt_consumer` plugin, parses incoming messages, and writes batched rows to a Tiger Cloud hypertable using the `outputs.postgresql` plugin. No custom code. Built-in error handling, retry logic, and configurable batch size.

The limitation: Telegraf requires careful schema alignment between MQTT topic structure and table columns. Teams with highly heterogeneous payloads may need the custom client path. For configuration details, see the [<u>Tiger Data documentation</u>](https://www.tigerdata.com/docs/integrate/observability-alerting/telegraf).

### Path B: HiveMQ Enterprise PostgreSQL extension

HiveMQ offers a [<u>native PostgreSQL extension</u>](https://www.hivemq.com/products/extensions/postgresql-extension/) (part of the HiveMQ Enterprise Extensions suite) that writes MQTT messages directly to a PostgreSQL-compatible database, including Tiger Cloud, without a separate bridge process.

Architecture: HiveMQ broker routes messages through the extension directly to a Tiger Cloud hypertable with no additional agent. Best for teams already running HiveMQ Enterprise who want minimal moving parts. For a practitioner's perspective, see [<u>HiveMQ and TimescaleDB: It Just Works</u>](https://www.tigerdata.com/blog/hivemq-and-timescaledb-it-just-works).

Limitation: requires a HiveMQ Enterprise license. Not available for Mosquitto or EMQX users.

### Path C: Custom Python client (maximum control)

For teams with complex payload schemas, custom error recovery requirements, or non-standard MQTT topic structures.

Architecture: a paho-mqtt subscriber processes messages in a callback, buffers them in a thread-safe queue, and a background thread writes batches to Tiger Cloud via the PostgreSQL COPY protocol.
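
A condensed sketch of that architecture, assuming paho-mqtt 1.x-style callbacks, psycopg2, JSON payloads carrying `ts`, `device_id`, and `temperature`, and an illustrative topic tree; the linked guide below covers the production details this sketch omits:

```python
# Condensed sketch of the subscriber -> queue -> batch-writer pattern.
# Assumptions (illustrative, not prescriptive): paho-mqtt 1.x-style callbacks
# (paho-mqtt 2.x needs mqtt.Client(mqtt.CallbackAPIVersion.VERSION1)), psycopg2,
# and JSON payloads like {"ts": "2026-01-01T00:00:00Z", "device_id": "pump-42", "temperature": 71.3}.
import io
import json
import queue
import threading

import paho.mqtt.client as mqtt
import psycopg2

BATCH_SIZE = 500
buffer = queue.Queue()  # thread-safe handoff between the MQTT callback and the writer


def on_message(client, userdata, msg):
    """MQTT callback: parse and enqueue; never do database work here."""
    try:
        p = json.loads(msg.payload)
        buffer.put((p["ts"], p["device_id"], p["temperature"]))
    except (ValueError, KeyError):
        pass  # production code would count and log rejected payloads


def writer_loop():
    """Background thread: drain the queue in batches and COPY them into PostgreSQL."""
    conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical DSN
    while True:
        rows = [buffer.get()]  # block until at least one row is waiting
        while len(rows) < BATCH_SIZE and not buffer.empty():
            rows.append(buffer.get_nowait())
        data = io.StringIO("".join(f"{ts}\t{dev}\t{val}\n" for ts, dev, val in rows))
        with conn, conn.cursor() as cur:
            cur.copy_from(data, "sensor_readings", columns=("ts", "device_id", "temperature"))


threading.Thread(target=writer_loop, daemon=True).start()

client = mqtt.Client()
client.on_message = on_message
client.connect("broker.example.com", 1883)      # hypothetical broker address
client.subscribe("sensors/+/telemetry", qos=1)  # QoS 1: duplicates possible, dedup in the database
client.loop_forever()
```

Keeping the MQTT callback non-blocking and pushing all database work to a separate thread is what lets the subscriber keep up with bursts.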

Choose this path when payloads are structurally heterogeneous, you need custom deduplication, or you need business logic before writing. The trade-off is maintaining custom application code.

For the full implementation guide including batching strategy, deduplication with `ON CONFLICT`, and production edge cases, see [<u>From MQTT to SQL: A Practical Guide to Sensor Data Ingestion</u>](https://www.tigerdata.com/blog/mqtt-sql-practical-guide-sensor-data-ingestion).

### Path D: AWS IoT Core (cloud-managed pipeline)

For teams on AWS: AWS IoT Core receives MQTT messages and routes them via rules to Lambda, Kinesis, or directly to databases.

The architecture: MQTT device to AWS IoT Core to Lambda function to Tiger Cloud via the PostgreSQL driver.
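
A hedged sketch of the Lambda stage, assuming the IoT Core rule forwards the decoded JSON payload as the event and that a PostgreSQL driver such as pg8000 is packaged with the function; all names and environment variables are illustrative:

```python
# Hedged sketch of the Lambda stage (illustrative names; assumes the IoT Core rule
# forwards the decoded JSON payload as the event).
import os

import pg8000.native


def handler(event, context):
    conn = pg8000.native.Connection(
        user=os.environ["DB_USER"],        # hypothetical environment variables
        password=os.environ["DB_PASSWORD"],
        host=os.environ["DB_HOST"],
        database=os.environ["DB_NAME"],
        port=5432,
    )
    try:
        conn.run(
            "INSERT INTO sensor_readings (ts, device_id, temperature) "
            "VALUES (:ts, :device_id, :temperature)",
            ts=event["ts"],
            device_id=event["device_id"],
            temperature=event["temperature"],
        )
    finally:
        conn.close()
```

In a real deployment you would reuse the connection across invocations and add pooling, since Lambda concurrency can exhaust PostgreSQL connection limits quickly.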

Note: Amazon Timestream for LiveAnalytics was deprecated and closed to new customers in June 2025. Teams migrating from Timestream should treat AWS IoT Core as the retained ingestion layer and Tiger Cloud as the storage replacement. For the full implementation walkthrough, see [<u>Building IoT Pipelines for Faster Analytics With IoT Core</u>](https://www.tigerdata.com/blog/do-more-with-timescale-on-aws-iot-core).

## Schema design for MQTT payloads

How you model MQTT data determines query performance and maintainability. The core tension: MQTT devices often publish different payloads from the same broker. A compressor, a flow meter, and a temperature sensor all share a broker but have different fields.

**Narrow schema (one row per measurement):** Columns: `ts TIMESTAMPTZ, device_id TEXT, metric_name TEXT, value FLOAT8`

Pros: handles any new metric type without schema migration; simple to insert. Cons: requires filtering on `metric_name` in every query; high row count; aggregations over multiple metrics are slower.

**Wide schema (one row per device reading, all fields as columns):** Columns: `ts TIMESTAMPTZ, device_id TEXT, temperature FLOAT8, pressure FLOAT8, flow_rate FLOAT8`

Pros: faster queries; natural fit for Grafana time-series panels; easier to read. Cons: requires schema migration to add new fields; sparse rows when devices report different subsets.

**JSONB hybrid:** store the raw payload in a JSONB column alongside normalized columns. Useful during development when the schema is still evolving, but slower for production queries than fully normalized columns.
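
For concreteness, illustrative DDL for the three options (table and column names are examples only):

```python
# Illustrative DDL for the three payload-modeling options above.
import psycopg2

DDL = {
    "narrow": """
        CREATE TABLE IF NOT EXISTS readings_narrow (
            ts          TIMESTAMPTZ NOT NULL,
            device_id   TEXT        NOT NULL,
            metric_name TEXT        NOT NULL,
            value       FLOAT8
        );""",
    "wide": """
        CREATE TABLE IF NOT EXISTS readings_wide (
            ts          TIMESTAMPTZ NOT NULL,
            device_id   TEXT        NOT NULL,
            temperature FLOAT8,
            pressure    FLOAT8,
            flow_rate   FLOAT8
        );""",
    "hybrid": """
        CREATE TABLE IF NOT EXISTS readings_hybrid (
            ts          TIMESTAMPTZ NOT NULL,
            device_id   TEXT        NOT NULL,
            temperature FLOAT8,   -- hot field promoted to a real column
            payload     JSONB     -- raw MQTT payload kept while the schema evolves
        );""",
}

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    for stmt in DDL.values():
        cur.execute(stmt)
conn.close()
```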

**Timestamps:** always use `TIMESTAMPTZ`. For hypertables, the timestamp column (`ts` here) is the required partition dimension. Add `device_id` as a secondary space dimension when cardinality exceeds 10,000 devices to avoid hot chunks.

For the full narrow-vs-wide treatment, see [<u>wide vs. narrow schema design for PostgreSQL tables</u>](https://www.tigerdata.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables).

## Production considerations

**QoS levels and write guarantees.** QoS 0 (at most once) treats writes as best-effort. QoS 1 (at least once) delivers messages but may duplicate them; the ingestion pipeline must handle duplicates via `ON CONFLICT DO NOTHING` or a unique constraint on `(ts, device_id, metric_name)`. QoS 2 (exactly once) eliminates duplicates but adds broker round-trip overhead.
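
A minimal sketch of database-side deduplication, assuming the narrow-schema table from the previous section; the index name and sample batch are illustrative:

```python
# Hedged sketch: idempotent inserts so QoS 1 redeliveries do not create duplicate rows.
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # One-time setup: a unique index on the natural key of a reading.
    cur.execute("""
        CREATE UNIQUE INDEX IF NOT EXISTS readings_dedup
        ON readings_narrow (ts, device_id, metric_name);
    """)
    # Redelivered messages hit the index and are silently skipped.
    execute_values(
        cur,
        """INSERT INTO readings_narrow (ts, device_id, metric_name, value)
           VALUES %s
           ON CONFLICT (ts, device_id, metric_name) DO NOTHING""",
        [("2026-01-01T00:00:00Z", "pump-42", "temperature", 71.3)],  # example batch
    )
```

Note that on a hypertable, a unique index must include the time partitioning column, which `(ts, device_id, metric_name)` satisfies.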

**Late-arriving messages.** Tiger Cloud hypertables accept out-of-order inserts. Rows land in the correct time chunk based on the message timestamp; no special handling is required unless the delay exceeds the chunk interval (default 7 days).

**Retention and compression.** Configure retention policies to drop chunks after a defined period. A common production setup: compress chunks older than 7 days, drop raw chunks after 90 days, retain aggregated data for 2 years via continuous aggregates.
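
A hedged sketch of that setup on the illustrative `sensor_readings` hypertable, using TimescaleDB's policy functions:

```python
# Hedged sketch of the compression and retention setup described above (illustrative names).
import psycopg2

conn = psycopg2.connect("postgresql://user:password@host:5432/tsdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Enable columnar compression, segmenting by device so per-device scans stay cheap.
    cur.execute("""
        ALTER TABLE sensor_readings SET (
            timescaledb.compress,
            timescaledb.compress_segmentby = 'device_id',
            timescaledb.compress_orderby   = 'ts DESC'
        );
    """)
    # Compress chunks older than 7 days; drop raw chunks older than 90 days.
    cur.execute("SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');")
    cur.execute("SELECT add_retention_policy('sensor_readings', INTERVAL '90 days');")
conn.close()
```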

**Monitoring.** Watch buffer queue depth, flush latency, and rejected rows (duplicates or schema mismatches).

For the broader context of industrial data management and historian migration, see [<u>Data Historian vs. Time-Series Database</u>](https://www.tigerdata.com/learn/moving-past-legacy-systems-data-historian-vs-time-series-database) and [<u>SCADA Data Management at Scale</u>](https://www.tigerdata.com/learn/scada-data-management-at-scale-architecture-historians-and-the-modern-database).

## IIoT use cases: MQTT to PostgreSQL in production

The MQTT to PostgreSQL pattern is in production across [<u>oil and gas telemetry</u>](https://www.tigerdata.com/oil-and-gas), [<u>manufacturing telemetry</u>](https://www.tigerdata.com/manufacturing), and [<u>energy telemetry</u>](https://www.tigerdata.com/energy-telemetry) at industrial scale. Here are two confirmed Tiger Data deployments, followed by a common pattern built on the same architecture.

### Oil and gas: gas compressor telemetry (Flogistix)

Flogistix by Flowco collects continuous telemetry from gas compression equipment across oil and gas fields, monitoring pressure, temperature, flow rate, and operational state. Field equipment publishes sensor readings to an edge MQTT broker, Telegraf collects and batches the data to Tiger Cloud, and operations teams query real-time anomaly detection dashboards in Grafana.

After consolidating on Tiger Data, Flogistix achieved a 66% reduction in infrastructure management costs and 84% data compression on stored telemetry. For the full story, see [<u>How Flogistix by Flowco Reduced Infrastructure Costs by 66%</u>](https://www.tigerdata.com/blog/how-flogistix-by-flowco-reduced-infrastructure-management-costs-by-66-with-tiger-data).

### Industrial asset management: diagnostic telemetry (Mechademy)

Mechademy provides hybrid digital twin infrastructure for oil, gas, and energy companies, including assets representing 6% of world LNG production. Tiger Data processes 10 million diagnostic tests every 30 minutes for Mechademy. MQTT is the protocol layer for sensor data flowing from monitored assets into the diagnostic engine. The relational model, joining sensor readings with asset registry and maintenance history, is a key reason Mechademy chose TimescaleDB over MongoDB, achieving an 87% infrastructure cost reduction. For the full story, see [<u>How Mechademy Cut Hybrid Digital Twin Infrastructure Costs</u>](https://www.tigerdata.com/blog/how-mechademy-cut-hybrid-digital-twin-infrastructure-costs).

### Smart buildings: HVAC and energy monitoring

Building management systems use MQTT to stream occupancy, HVAC state, and energy consumption from distributed sensors. Tiger Cloud's continuous aggregates handle hourly rollups, time-range queries support peak demand analysis, and JOINs connect sensor readings to floor-plan and zone metadata. The same architecture that handles oil field compressor telemetry scales to building sensor networks.

## FAQ: MQTT to PostgreSQL

### Do I need a time-series database for MQTT data, or will regular PostgreSQL work?

Regular PostgreSQL works at small scale (fewer than 100 devices, fewer than 1,000 messages per second). At industrial scale, without time-series partitioning, query performance degrades on tables with hundreds of millions of rows and INSERT throughput becomes a bottleneck.

TimescaleDB adds hypertables (automatic time-based partitioning) on top of PostgreSQL with no change to SQL syntax. For most MQTT deployments beyond prototype stage, the TimescaleDB extension is the right choice.

### Can I use Telegraf to connect MQTT to PostgreSQL without writing Python?

Yes. Telegraf's `inputs.mqtt_consumer` plugin subscribes to MQTT topics and its `outputs.postgresql` plugin writes to any PostgreSQL-compatible database including Tiger Cloud. No custom code required. This is the most common production pattern for teams that want a low-maintenance bridge. For configuration details, see the [<u>Tiger Data documentation</u>](https://www.tigerdata.com/docs/integrate/observability-alerting/telegraf).

### Should I use InfluxDB or TimescaleDB for MQTT data?

Both handle high-ingest time-series data. TimescaleDB uses standard SQL, so Grafana, dbt, and BI tools work without additional drivers; InfluxDB uses Flux or InfluxQL (InfluxDB 3 uses non-native SQL via Apache Arrow). TimescaleDB supports JOINs for correlating sensor data with device metadata and maintenance records; InfluxDB does not. The InfluxDB version sprawl (1.x, 2.x, 3.0, Cloud Serverless, Cloud Dedicated) also introduces migration risk for long-running pipelines.

### What happens to my MQTT data if the broker goes down?

MQTT QoS 1 and 2 provide delivery guarantees that survive broker restarts. Messages published while the broker is unreachable can be buffered on the publishing device (if the client library supports offline queuing) and retransmitted once the connection is restored. The database is unaffected. Configure the bridge process to buffer messages during database unavailability and retry on reconnection.

### What is EMQX Tables, and does it replace a separate database?

EMQX Tables is a native time-series storage layer introduced by EMQX in 2025. For simple dashboarding of recent data it may be sufficient. It does not replace a general-purpose database for workloads that require SQL JOINs with device metadata, long-term retention with compression, BI tool integration, or data ownership and compliance guarantees.

### What's the difference between storing MQTT data in TimescaleDB vs. vanilla PostgreSQL?

Both store data in standard PostgreSQL tables. The difference is operational: TimescaleDB adds automatic time-based partitioning (hypertables), Hypercore compression for historical data, and continuous aggregates that maintain pre-computed rollups. Vanilla PostgreSQL requires manual partitioning setup, custom archival scripts, and expensive re-computation on every dashboard load. TimescaleDB is a PostgreSQL extension and does not change the query interface or driver compatibility.

### I was using AWS Timestream for MQTT data. What should I migrate to?

Amazon Timestream for LiveAnalytics was deprecated and closed to new customers in June 2025. The MQTT ingestion layer (AWS IoT Core) remains fully supported and does not need to change. Replace the storage layer with Tiger Cloud (TimescaleDB-powered managed PostgreSQL). See [<u>Building IoT Pipelines for Faster Analytics With IoT Core</u>](https://www.tigerdata.com/blog/do-more-with-timescale-on-aws-iot-core) for the implementation walkthrough.

### How does MQTT QoS interact with database write guarantees?

QoS is a broker-level guarantee, not a database guarantee. QoS 1 (at least once) can produce duplicates at the subscriber. The database bridge must handle them via `ON CONFLICT DO NOTHING` or a unique constraint on `(ts, device_id)`. QoS 2 (exactly once) eliminates broker-level duplicates but adds round-trip overhead. Most production pipelines use QoS 1 with database-side deduplication.