
By Tiger Data Team
Updated at May 8, 2026
MQTT is a lightweight, TCP-based messaging protocol designed for low-bandwidth, high-latency, and unreliable networks. This makes it ideal for IoT and Machine-to-Machine communication. Every MQTT deployment needs a separate database, and that choice determines query flexibility, cost at scale, and operational complexity for the life of the system.
This guide covers the decision upstream of implementation: which database fits MQTT workloads, why, and how to connect them. For a broader IoT database comparison, see The Best Databases for IoT in 2026.
A note on bias: Tiger Data makes Tiger Cloud, a managed database built on TimescaleDB. We cover the alternatives including the cases where Tiger Cloud is not the right fit.
An MQTT database is a persistent data store that captures messages from an MQTT broker and makes them queryable over time. Because MQTT brokers do not persist messages, every production MQTT deployment requires a database layer to store, retain, and analyze the telemetry data flowing through the broker. Choosing the right MQTT broker/database combination is the central architectural decision for any IoT pipeline.
MQTT (Message Queuing Telemetry Transport) is a lightweight pub/sub protocol developed by IBM in the 1990s and now the de facto standard for IoT telemetry. Devices publish messages on named topics; a broker (Mosquitto, HiveMQ, EMQX, AWS IoT Core) routes them to subscribers. The broker does not persist data.
Three QoS levels govern delivery guarantees: at-most-once (QoS 0, fire and forget), at-least-once (QoS 1, possible duplicates), and exactly-once (QoS 2, highest overhead). The database pipeline handles whichever QoS the broker uses.
Ten devices writing temperature readings once per second is a trivially small workload. Ten thousand devices writing multiple metrics at 1 Hz produce roughly 30 million rows per hour. At that scale, a database without time-series partitioning slows down, and the wrong choice is expensive to undo once sensor data accumulates.
See Storing IoT Data: 8 Reasons Why You Should Use PostgreSQL for the foundational case. For the IIoT context, see A Beginner's Guide to IIoT and Industry 4.0.
Most MQTT pipeline problems come from confusion about which layer owns which responsibility. The architecture has three layers.
Device layer. Sensors, PLCs, and edge gateways publish messages on MQTT topics. Payloads are JSON or binary. The device has no knowledge of the downstream database.
Broker layer. The broker receives messages, routes them to subscribers, and provides QoS guarantees. It does not store data beyond in-flight queuing. Common options: Mosquitto (self-hosted), HiveMQ (enterprise), EMQX (high-throughput with rule engine), AWS IoT Core (fully managed).
Storage layer. A subscriber process, connector, or agent receives messages from the broker and writes them to the database. This is where most implementation decisions are made.
The storage layer can be replaced without changing the device layer or the broker, a point broker vendors tend to obscure since their product is the middleware they're selling.
Bridge pattern | How it works | Best for |
Telegraf MQTT consumer | Telegraf subscribes to broker topics; | Teams that want no custom code; most common production pattern |
Broker native connector | HiveMQ PostgreSQL extension; EMQX rule engine and data bridge | Teams already running enterprise HiveMQ or EMQX |
Custom Python/Node client | paho-mqtt subscriber writes batched inserts to PostgreSQL | Teams needing full control over payload parsing and error handling |
Cloud-managed pipeline | AWS IoT Core to Lambda to Tiger Cloud | Teams on AWS who want fully managed ingestion |
Standard PostgreSQL handles MQTT data well at small scale. SQL JOINs let you combine sensor readings with device metadata, location tables, and maintenance records. Grafana, dbt, pgAdmin, and the rest of the PostgreSQL ecosystem work without modification.
The limitation is time-series scale. Without time-series partitioning, query performance degrades on tables with hundreds of millions of rows. INSERT throughput without connection pooling and batching becomes a bottleneck at high device counts. Retention and archival require manual management.
Choose vanilla PostgreSQL if you have fewer than 100 devices, ingestion rate below 1,000 messages per second, no long-term retention requirement, and you want the simplest possible setup with no additional tooling.
TimescaleDB is an open-source extension to PostgreSQL that adds time-series-native partitioning (hypertables), columnar compression (Hypercore), and continuous aggregates. Tiger Cloud is the managed service. The full SQL interface stays intact: every PostgreSQL tool, driver, and query works without modification.
Hypertables auto-partition data by time, keeping queries bounded as data grows into billions of rows. Hypercore compresses historical chunks with 90-98% storage reduction in typical deployments. Continuous aggregates maintain pre-computed hourly and daily rollups that update automatically as new data arrives.
Choose Tiger Cloud if you have more than 100 devices, ingestion above 1,000 messages per second, need long-term retention with compression, need to JOIN sensor data with relational metadata, or want a managed service with no infrastructure operations.
For a comparison across managed services, see Best Managed Time-Series Databases in 2026.
InfluxDB is a purpose-built time-series database with its own line protocol and Flux/InfluxQL query languages. It handles homogeneous, high-cardinality sensor streams efficiently when all reads are time-range queries on a single measurement type.
Two limitations matter for MQTT workloads. First, InfluxDB has no SQL JOINs. Combining sensor data with device metadata or maintenance history requires application-layer logic or a second relational database. Second, the version sprawl creates long-term maintenance risk. Teams who built pipelines on InfluxDB 1.x have found migration to 2.x or 3.0 disruptive, with API breaks documented across r/influxdb threads and the project's own GitHub discussions, and the Cloud Serverless and Cloud Dedicated variants have different APIs again. For homogeneous sensor streams where queries never cross measurement boundaries and the team is already committed to the InfluxDB 3.x stack, InfluxDB is a reasonable fit.
SQLite is appropriate for edge devices and local prototyping. It is not a candidate for production multi-device MQTT ingestion; write concurrency, network access, and scale are all limited by design.
MongoDB's JSONB flexibility helps with heterogeneous payloads, but time-range queries and joins require application-layer logic. Cassandra offers high write throughput with poor analytical query support and significant operational complexity. Both suit schema-flexible use cases with very high write volume and no analytical requirements. For most IIoT deployments, they are not the right fit.
Criterion | Vanilla PostgreSQL | Tiger Cloud (TimescaleDB) | InfluxDB |
Device count | Fewer than 100 | 100 to millions | 100 to millions |
Ingestion rate | Fewer than 1,000 msg/sec | 1,000+ msg/sec | 1,000+ msg/sec |
SQL JOINs with metadata | Yes | Yes | No |
Long-term retention and compression | Manual | Automatic (Hypercore) | Yes |
Managed service | No (self-hosted) | Yes (Tiger Cloud) | Yes (Cloud Serverless/Dedicated) |
Query tooling | Standard SQL | Standard SQL | Flux / InfluxQL / Non-native SQL via Apache Arrow |
Version stability | Stable | Stable | Fragmented (1.x / 2.x / 3.0) |
MQTT sensor streams have a predictable shape: time-ordered rows, one per device per reading, arriving continuously. TimescaleDB is designed for exactly this pattern.
Hypertable auto-partitioning. Tiger Cloud automatically creates time-based chunks (default 7-day intervals). Queries that filter by time range scan only the relevant chunks, keeping latency stable at sub-second on billion-row tables.
Hypercore compression. Sensor readings from the same device at consecutive timestamps are highly similar, which makes them compressible. Hypercore converts older chunks from row to columnar storage, achieving 90-98% compression in typical deployments. Less storage means lower cloud costs and more history online at the same budget.
Continuous aggregates. Production dashboards query aggregates, not raw rows. Continuous aggregates maintain pre-computed hourly and daily rollups that update automatically as new data arrives, so a query over 30 days runs in milliseconds.
Standard SQL tooling. Grafana connects via the native PostgreSQL data source. dbt models work without modification. There is no proprietary query language to learn.
JOINs across data types. A complete IIoT picture requires joining sensor readings with device registry, maintenance history, and ownership tables. That is standard SQL in Tiger Cloud. It requires application-layer logic in InfluxDB.
The below are decision paths, not ranked recommendations. The right choice depends on your broker, team, and payload structure.
Telegraf is an open-source metrics collection agent. Despite being maintained by InfluxData, it has a first-class PostgreSQL output plugin and is the most widely deployed MQTT-to-PostgreSQL bridge in production.
The architecture: Telegraf subscribes to the broker using the inputs.mqtt_consumer plugin, parses incoming messages, and writes batched rows to a Tiger Cloud hypertable using the outputs.postgresql plugin. No custom code. Built-in error handling, retry logic, and configurable batch size.
The limitation: Telegraf requires careful schema alignment between MQTT topic structure and table columns. Teams with highly heterogeneous payloads may need the custom client path. For configuration details, see the Tiger Data documentation.
HiveMQ offers a native PostgreSQL extension (part of the HiveMQ Enterprise Extensions suite) that writes MQTT messages directly to a PostgreSQL-compatible database including Tiger Cloud without a separate bridge process.
Architecture: HiveMQ broker routes messages through the extension directly to a Tiger Cloud hypertable with no additional agent. Best for teams already running HiveMQ Enterprise who want minimal moving parts. For a practitioner's perspective, see HiveMQ and TimescaleDB: It Just Works.
Limitation: requires a HiveMQ Enterprise license. Not available for Mosquitto or EMQX users.
For teams with complex payload schemas, custom error recovery requirements, or non-standard MQTT topic structures.
Architecture: a paho-mqtt subscriber processes messages in a callback, buffers them in a thread-safe queue, and a background thread writes batches to Tiger Cloud via the PostgreSQL COPY protocol.
Choose this path when payloads are structurally heterogeneous, you need custom deduplication, or you need business logic before writing. The trade-off is maintaining custom application code.
For the full implementation guide including batching strategy, deduplication with ON CONFLICT, and production edge cases, see From MQTT to SQL: A Practical Guide to Sensor Data Ingestion.
For teams on AWS: AWS IoT Core receives MQTT messages and routes them via rules to Lambda, Kinesis, or directly to databases.
The architecture: MQTT device to AWS IoT Core to Lambda function to Tiger Cloud via the PostgreSQL driver.
Note: AWS Timestream LiveAnalytics was deprecated and closed to new customers in June 2025. Teams migrating from Timestream should treat AWS IoT Core as the retained ingestion layer and Tiger Cloud as the storage replacement. For the full implementation walkthrough, see Building IoT Pipelines for Faster Analytics With IoT Core.
How you model MQTT data determines query performance and maintainability. The core tension: MQTT devices often publish different payloads from the same broker. A compressor, a flow meter, and a temperature sensor all share a broker but have different fields.
Narrow schema (one row per measurement): Columns: ts TIMESTAMPTZ, device_id TEXT, metric_name TEXT, value FLOAT8
Pros: handles any new metric type without schema migration; simple to insert. Cons: requires filtering on metric_name in every query; high row count; aggregations over multiple metrics are slower.
Wide schema (one row per device reading, all fields as columns): Columns: ts TIMESTAMPTZ, device_id TEXT, temperature FLOAT8, pressure FLOAT8, flow_rate FLOAT8
Pros: faster queries; natural fit for Grafana time-series panels; easier to read. Cons: requires schema migration to add new fields; sparse rows when devices report different subsets.
JSONB hybrid: store the raw payload in a JSONB column alongside normalized columns. Useful during development when the schema is still evolving, but slower for production queries than fully normalized columns.
Timestamps: always use TIMESTAMPTZ. For hypertables, ts is the required partition dimension. Add device_id as a secondary space dimension when cardinality exceeds 10,000 devices to avoid hot chunks.
For the full narrow-vs-wide treatment, see wide vs. narrow schema design for PostgreSQL tables.
QoS levels and write guarantees. QoS 0 (at most once) treats writes as best-effort. QoS 1 (at least once) delivers messages but may duplicate them; the ingestion pipeline must handle duplicates via ON CONFLICT DO NOTHING or a unique constraint on (ts, device_id, metric_name). QoS 2 (exactly once) eliminates duplicates but adds broker round-trip overhead.
Late-arriving messages. Tiger Cloud hypertables accept out-of-order inserts. Rows land in the correct time chunk based on the message timestamp; no special handling is required unless the delay exceeds the chunk interval (default 7 days).
Retention and compression. Configure retention policies to drop chunks after a defined period. A common production setup: compress chunks older than 7 days, drop raw chunks after 90 days, retain aggregated data for 2 years via continuous aggregates.
Monitoring. Watch buffer queue depth, flush latency, and rejected rows (duplicates or schema mismatches).
For the broader context of industrial data management and historian migration, see Data Historian vs. Time-Series Database and SCADA Data Management at Scale.
The MQTT to PostgreSQL pattern is in production across oil and gas telemetry, manufacturing telemetry, and energy telemetry at industrial scale. Here are two confirmed Tiger Data deployments.
Flogistix by Flowco collects continuous telemetry from gas compression equipment across oil and gas fields, monitoring pressure, temperature, flow rate, and operational state. Field equipment publishes sensor readings to an edge MQTT broker, Telegraf collects and batches the data to Tiger Cloud, and operations teams query real-time anomaly detection dashboards in Grafana.
After consolidating on Tiger Data, Flogistix achieved a 66% reduction in infrastructure management costs and 84% data compression on stored telemetry. For the full story, see How Flogistix by Flowco Reduced Infrastructure Costs by 66%.
Mechademy provides hybrid digital twin infrastructure for oil, gas, and energy companies, including assets representing 6% of world LNG production. Tiger Data processes 10 million diagnostic tests every 30 minutes for Mechademy. MQTT is the protocol layer for sensor data flowing from monitored assets into the diagnostic engine. The relational model, joining sensor readings with asset registry and maintenance history, is a key reason Mechademy chose TimescaleDB over MongoDB, achieving an 87% infrastructure cost reduction. For the full story, see How Mechademy Cut Hybrid Digital Twin Infrastructure Costs.
Building management systems use MQTT to stream occupancy, HVAC state, and energy consumption from distributed sensors. Tiger Cloud's continuous aggregates handle hourly rollups, time-range queries support peak demand analysis, and JOINs connect sensor readings to floor-plan and zone metadata. The same architecture that handles oil field compressor telemetry scales to building sensor networks.
Regular PostgreSQL works at small scale (fewer than 100 devices, fewer than 1,000 messages per second). At industrial scale, without time-series partitioning, query performance degrades on tables with hundreds of millions of rows and INSERT throughput becomes a bottleneck.
TimescaleDB adds hypertables (automatic time-based partitioning) on top of PostgreSQL with no change to SQL syntax. For most MQTT deployments beyond prototype stage, the TimescaleDB extension is the right choice.
Yes. Telegraf's inputs.mqtt_consumer plugin subscribes to MQTT topics and its outputs.postgresql plugin writes to any PostgreSQL-compatible database including Tiger Cloud. No custom code required. This is the most common production pattern for teams that want a low-maintenance bridge. For configuration details, see the Tiger Data documentation.
Both handle high-ingest time-series data. TimescaleDB uses standard SQL, so Grafana, dbt, and BI tools work without additional drivers; InfluxDB uses Flux or InfluxQL (InfluxDB 3 uses non-native SQL via Apache Arrow). TimescaleDB supports JOINs for correlating sensor data with device metadata and maintenance records; InfluxDB does not. The InfluxDB version sprawl (1.x, 2.x, 3.0, Cloud Serverless, Cloud Dedicated) also introduces migration risk for long-running pipelines.
MQTT QoS 1 and 2 provide delivery guarantees that survive broker restarts. Messages published during an outage are queued on the publishing device (if the client supports persistent sessions) and delivered when the connection restores. The database is unaffected. Configure the bridge process to buffer messages during database unavailability and retry on reconnection.
EMQX Tables is a native time-series storage layer introduced by EMQX in 2025. For simple dashboarding of recent data it may be sufficient. It does not replace a general-purpose database for workloads that require SQL JOINs with device metadata, long-term retention with compression, BI tool integration, or data ownership and compliance guarantees.
Both store data in standard PostgreSQL tables. The difference is operational: TimescaleDB adds automatic time-based partitioning (hypertables), Hypercore compression for historical data, and continuous aggregates that maintain pre-computed rollups. Vanilla PostgreSQL requires manual partitioning setup, custom archival scripts, and expensive re-computation on every dashboard load. TimescaleDB is a PostgreSQL extension and does not change the query interface or driver compatibility.
AWS Timestream LiveAnalytics was deprecated and closed to new customers in June 2025. The MQTT ingestion layer (AWS IoT Core) remains fully supported and does not need to change. Replace the storage layer with Tiger Cloud (TimescaleDB-powered managed PostgreSQL). See Building IoT Pipelines for Faster Analytics With IoT Core for the implementation walkthrough.
QoS is a broker-level guarantee, not a database guarantee. QoS 1 (at least once) can produce duplicates at the subscriber. The database bridge must handle them via ON CONFLICT DO NOTHING or a unique constraint on (ts, device_id). QoS 2 (exactly once) eliminates broker-level duplicates but adds round-trip overhead. Most production pipelines use QoS 1 with database-side deduplication.