MQTT-Based IoT Data Pipeline
Case study: High-throughput MQTT data pipeline processing real-time telemetry from thousands of connected devices. Built for reliability, low latency, and horizontal scale.

Project Overview
The client shipped hardware to end customers and needed visibility into how those devices were performing in the field. Their existing telemetry setup was a single MQTT broker writing directly to a relational database — fine at a few hundred devices, but unstable under the burst traffic that came with fleet growth. Devices reconnecting after a network outage would flood the broker simultaneously, the database write queue would back up, and telemetry gaps would appear in monitoring dashboards at exactly the moments they were most needed.
We redesigned the telemetry pipeline from broker to storage, introducing a durable message bus between ingestion and persistence and replacing the relational database with a purpose-built time-series store. The result handles burst traffic without shedding messages, scales horizontally as the fleet grows, and queries efficiently against months of high-frequency device data.
Technical Architecture
The pipeline is designed around the principle that no single component should be a bottleneck or single point of failure:
- MQTT Broker Cluster: Clustered EMQX deployment with QoS 1 (at-least-once) delivery. Devices connect with persistent sessions, so messages queued during an outage are delivered when the device reconnects rather than lost
- Kafka Bridge: MQTT-to-Kafka bridge decouples ingestion rate from processing rate; the broker acknowledges messages immediately and Kafka provides the durable buffer that absorbs burst traffic without back-pressure to devices
- Stream Processor: Python consumers read from Kafka, validate and enrich telemetry records (device metadata, unit conversions, anomaly flags), and write to storage — horizontally scalable by adding consumer instances
- TimescaleDB: PostgreSQL extension optimized for time-series data; continuous aggregates pre-compute common queries (hourly averages, daily min/max) so dashboard queries stay fast as the dataset grows
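The validate-and-enrich step in the stream processor can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names (`device_id`, `ts`, `temp_f`), the `DEVICE_METADATA` lookup, and the `MAX_TEMP_C` threshold are all assumptions made for the example, and the Kafka read and database write around it are omitted.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical device metadata; a real deployment would more likely read
# this from a device registry or a cache.
DEVICE_METADATA = {
    "dev-001": {"model": "sensor-a", "site": "plant-1"},
}

# Hypothetical anomaly threshold, in degrees Celsius.
MAX_TEMP_C = 85.0


@dataclass
class EnrichedRecord:
    device_id: str
    ts: float        # Unix timestamp in seconds
    temp_c: float    # temperature after unit conversion
    model: str       # from device metadata
    site: str        # from device metadata
    anomaly: bool    # True when temp_c exceeds MAX_TEMP_C


def fahrenheit_to_celsius(temp_f: float) -> float:
    return (temp_f - 32.0) * 5.0 / 9.0


def process(raw: dict) -> Optional[EnrichedRecord]:
    """Validate and enrich one decoded telemetry message.

    Returns None when validation fails, so the caller can decide whether
    to drop the record or route it elsewhere.
    """
    device_id = raw.get("device_id")
    ts = raw.get("ts")
    temp_f = raw.get("temp_f")

    # Validation: the device must be known and the readings numeric.
    if not isinstance(device_id, str) or device_id not in DEVICE_METADATA:
        return None
    if not isinstance(ts, (int, float)) or not isinstance(temp_f, (int, float)):
        return None

    # Enrichment: unit conversion, metadata join, anomaly flag.
    meta = DEVICE_METADATA[device_id]
    temp_c = fahrenheit_to_celsius(float(temp_f))
    return EnrichedRecord(
        device_id=device_id,
        ts=float(ts),
        temp_c=temp_c,
        model=meta["model"],
        site=meta["site"],
        anomaly=temp_c > MAX_TEMP_C,
    )
```

Because `process` holds no state between calls, any number of consumer instances can run it in parallel, which is what makes adding instances a workable way to scale this stage.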
Scale and Reliability
The redesigned pipeline removed the failure modes that caused telemetry gaps and gave the team the operational headroom to grow the fleet without rearchitecting again:
- Burst absorption: Devices reconnecting after a network event now flood Kafka, not the database — the broker stays healthy and message delivery guarantees hold regardless of downstream processing speed
- No data loss on failure: If a stream processor instance fails, Kafka consumer group rebalancing redistributes its partitions to healthy instances; messages are replayed from the last committed offset with no gaps
- Efficient historical queries: TimescaleDB’s chunk-based storage and continuous aggregates keep query times consistent even as months of per-device, per-second data accumulate
- Operational visibility: Kafka consumer lag metrics give the team a leading indicator of pipeline health — a growing lag signals a problem before it manifests as missing data in a dashboard
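The replay and lag behavior described above can be illustrated with a small in-memory model. It is a sketch under stated assumptions, not a Kafka client: `Partition`, `consume`, and the dictionary `sink` are hypothetical stand-ins for a partition log, a consumer loop, and the storage layer. One consequence of replaying from the last committed offset is that a message processed just before a crash can be processed again, so the sketch assumes writes keyed by offset to keep that repeat harmless.

```python
class Partition:
    """In-memory stand-in for one Kafka partition (illustration only)."""

    def __init__(self, messages):
        self.log = list(messages)  # append-only message log
        self.committed = 0         # next offset the consumer group will read

    def lag(self) -> int:
        # Consumer lag: messages in the log not yet committed as processed.
        return len(self.log) - self.committed


def consume(partition, sink, batch_size=2, fail_at_offset=None):
    """Process from the committed offset, committing after each batch.

    `sink` maps offset -> payload, so a replayed message overwrites its
    earlier copy instead of creating a duplicate (an idempotent write).
    Raises RuntimeError at `fail_at_offset` to simulate an instance crash.
    """
    offset = partition.committed
    while offset < len(partition.log):
        batch_end = min(offset + batch_size, len(partition.log))
        for o in range(offset, batch_end):
            if o == fail_at_offset:
                raise RuntimeError(f"consumer crashed at offset {o}")
            sink[o] = partition.log[o]
        # Commit only after the whole batch has been written.
        partition.committed = batch_end
        offset = batch_end


# Usage: the first instance crashes mid-batch, a second one takes over.
partition = Partition(["m0", "m1", "m2", "m3", "m4"])
sink = {}
try:
    consume(partition, sink, fail_at_offset=3)
except RuntimeError:
    pass
print(partition.lag())  # 3: offsets 2, 3, and 4 are not yet committed

consume(partition, sink)  # resumes from committed offset 2
print(partition.lag())    # 0
print(len(sink))          # 5: offset 2 was written twice but stored once
```

The lag value is the same quantity the team watches in production: it rises while processing falls behind and returns to zero once the replacement instance catches up.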
Challenges
- Handling burst traffic from large device fleets
- Ensuring message delivery guarantees at scale
- Storing and querying high-cardinality time-series data
Solutions
- Deployed clustered MQTT broker with QoS tiering
- Bridged MQTT to Kafka for durable stream processing
- Used TimescaleDB for compressed time-series storage
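To make the TimescaleDB choice more concrete, the sketch below shows the kind of hourly rollup that a continuous aggregate keeps pre-computed. In TimescaleDB this grouping happens inside the database (typically with `time_bucket` in a materialized view that is refreshed incrementally); the Python here only illustrates the computation, and the `(device_id, unix_ts, value)` row shape is an assumption made for the example.

```python
from collections import defaultdict


def hourly_rollup(rows):
    """Group (device_id, unix_ts, value) rows into hourly buckets.

    Returns {(device_id, bucket_start): {"avg", "min", "max", "count"}},
    where bucket_start is the Unix timestamp of the start of the hour.
    """
    buckets = defaultdict(list)
    for device_id, ts, value in rows:
        bucket_start = int(ts) - int(ts) % 3600  # floor to the hour
        buckets[(device_id, bucket_start)].append(value)
    return {
        key: {
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for key, values in buckets.items()
    }
```

A dashboard that reads these rollups touches one row per device per hour instead of 3,600 per-second rows, which is why query time stays roughly flat as raw data accumulates.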
Results & Impact
- Stable ingestion across full device fleet
- Sub-second telemetry visibility in dashboards
- Horizontally scalable pipeline with no single point of failure