Building a Real-Time IIoT Data Pipeline with Edge Computing
Why Edge Computing Matters for Industrial Data
In traditional SCADA architectures, every data point travels from the sensor to a centralized server — often hundreds of miles away — before any processing occurs. This model worked when poll intervals were measured in seconds and data volumes were manageable.
Modern IIoT deployments generate millions of data points per minute. Sending all of that to the cloud is expensive, slow, and fragile. A network hiccup means lost data. A cloud outage means blind operations.
Edge computing flips this model. By processing data close to where it's generated, you get:
- Millisecond-scale response times for critical control decisions, with no cloud round trip in the loop
- 90%+ bandwidth reduction by aggregating and filtering at the source
- Resilience — edge nodes continue operating even when disconnected
- Data sovereignty — sensitive OT data never leaves your facility
Architecture Overview
A well-designed IIoT edge pipeline has four layers:
Layer 1: Data Acquisition
The foundation is connecting to your physical infrastructure. Industrial protocols like Modbus, OPC UA, MQTT, and DNP3 serve different purposes:
| Protocol | Use Case | Strengths |
|---|---|---|
| Modbus TCP | PLC communication | Simple, universal, low overhead |
| OPC UA | Information modeling | Rich metadata, security, discovery |
| MQTT | Sensor telemetry | Pub/sub, lightweight, configurable delivery guarantees (QoS 0 to 2) |
| DNP3 | Utility SCADA | Designed for unreliable networks |
Edgeo handles all four natively, with a unified configuration model:
// Define your data sources declaratively
const sources = {
  'line-1-plc': {
    protocol: 'modbus-tcp',
    host: '192.168.1.100',
    registers: [
      { address: 40001, name: 'temperature', type: 'float32' },
      { address: 40003, name: 'pressure', type: 'float32' },
      { address: 40005, name: 'flow_rate', type: 'float32' },
    ],
    pollInterval: 500, // ms
  },
};
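The register addresses in the configuration above step by two because a float32 value occupies two consecutive 16-bit Modbus registers. A minimal sketch of that decoding in plain JavaScript, independent of Edgeo, might look like the following. The function name `registersToFloat32` and its `wordOrder` parameter are hypothetical, and word order varies between devices, so check the PLC's documentation before relying on either setting.

```javascript
// Combine two 16-bit Modbus registers into one IEEE 754 float32.
// Hypothetical helper for illustration; not part of any Edgeo API.
function registersToFloat32(regA, regB, wordOrder = 'high-first') {
  const view = new DataView(new ArrayBuffer(4));
  // Some devices send the high word first, others the low word first.
  const [high, low] = wordOrder === 'high-first' ? [regA, regB] : [regB, regA];
  view.setUint16(0, high); // DataView writes big-endian by default
  view.setUint16(2, low);
  return view.getFloat32(0);
}

// 82.5 encodes as 0x42A50000, so the two registers hold 0x42A5 and 0x0000.
const temperature = registersToFloat32(0x42a5, 0x0000);
```

If decoded values look wildly wrong (very large or very small magnitudes), a swapped word order is a common cause and worth checking first.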
Layer 2: Edge Processing
Raw sensor data is rarely useful on its own. The edge processing layer transforms, enriches, and filters data before it goes anywhere:
Normalization — Convert raw register values to engineering units. A Modbus register value of 16789 becomes 82.3°C after applying the sensor's scaling factor.
Quality filtering — Discard readings that fail range checks or spike detection. A temperature sensor reading -9999 is a fault code, not a data point.
Aggregation — Compress high-frequency data into meaningful summaries. Instead of sending 1,000 readings per second to the cloud, send one min/max/mean/count summary every second.
// Edge processing pipeline
// Calibration constants are placeholders; take the real values from the sensor's datasheet
const scale = 0.01;
const offset = 0;

pipeline('temperature-monitoring')
  .source('line-1-plc', 'temperature')
  .transform(raw => raw * scale + offset) // Scale to engineering units
  .filter(val => val > -40 && val < 200)  // Physical range check
  .deadband(0.5)                          // Only forward meaningful changes
  .aggregate({
    window: '1s',
    emit: ['min', 'max', 'mean', 'count'],
  })
  .sink('cloud-historian');
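To make the stages concrete, here is a minimal sketch in plain JavaScript of what a range check, a deadband, and a windowed summary each do to a batch of readings. The function names (`rangeCheck`, `deadband`, `summarize`) are hypothetical and say nothing about how Edgeo implements these steps internally. The deadband here forwards a value when it differs from the last forwarded value by at least the threshold, which is one common convention among several.

```javascript
// Keep only readings strictly inside the sensor's physical range.
function rangeCheck(values, min, max) {
  return values.filter((v) => v > min && v < max);
}

// Forward a reading only when it differs from the last forwarded reading
// by at least `threshold`. The first reading is always forwarded.
function deadband(values, threshold) {
  const out = [];
  let last;
  for (const v of values) {
    if (last === undefined || Math.abs(v - last) >= threshold) {
      out.push(v);
      last = v;
    }
  }
  return out;
}

// Summarize one window of readings.
function summarize(values) {
  if (values.length === 0) return { count: 0 };
  const sum = values.reduce((a, b) => a + b, 0);
  return {
    min: Math.min(...values),
    max: Math.max(...values),
    mean: sum / values.length,
    count: values.length,
  };
}

// One window of already-scaled readings, including a -9999 fault code.
const readings = [82, 82.25, -9999, 83, 83.25, 85];
const kept = deadband(rangeCheck(readings, -40, 200), 0.5);
const summary = summarize(kept);
```

With these inputs the range check drops the fault code, the deadband keeps 82, 83, and 85, and the summary reports a count of three. Note that applying a deadband before aggregating changes the statistics, since the mean is computed only over forwarded values.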
Layer 3: Local Storage and Buffering
Edge nodes need local storage for two reasons: real-time queries and store-and-forward during network outages.
Edgeo uses an embedded time-series store optimized for industrial data patterns:
- Write-optimized — handles burst writes during high-frequency acquisition
- Compressed — industrial data compresses extremely well (often 10:1 or better)
- Queryable — supports time-range queries for local dashboards and alerting
- Bounded — configurable retention prevents disk exhaustion on edge devices
When connectivity to the cloud is lost, data accumulates locally. When the connection is restored, the buffer drains automatically — with backpressure to avoid overwhelming upstream systems.
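The store-and-forward behavior described above can be sketched as a bounded queue that drains in fixed-size batches. This is an illustration only: the class name `StoreAndForwardBuffer`, its options, and the `send` callback are hypothetical, and it keeps records in memory, whereas a real edge node would persist them to disk.

```javascript
class StoreAndForwardBuffer {
  constructor({ maxRecords, batchSize }) {
    this.maxRecords = maxRecords;
    this.batchSize = batchSize;
    this.records = [];
    this.dropped = 0; // how many records were evicted to stay bounded
  }

  // Evict the oldest records until the buffer is within its limit.
  trim() {
    while (this.records.length > this.maxRecords) {
      this.records.shift();
      this.dropped += 1;
    }
  }

  // Buffer one record while offline (or while a drain is in progress).
  push(record) {
    this.records.push(record);
    this.trim();
  }

  // Send buffered records oldest-first, one batch at a time. Waiting for
  // each batch to be acknowledged before sending the next is a simple
  // form of backpressure. Returns false if a batch fails to send.
  async drain(send) {
    while (this.records.length > 0) {
      const batch = this.records.splice(0, this.batchSize);
      try {
        await send(batch);
      } catch (err) {
        this.records.unshift(...batch); // put the batch back, oldest first
        this.trim();
        return false; // still offline; retry later
      }
    }
    return true;
  }
}
```

A caller would typically invoke `drain` when connectivity returns and again after a delay whenever it returns `false`. Evicting the oldest records first is a design choice; some deployments prefer to drop the newest data or to downsample instead.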
Layer 4: Cloud Integration
Not all data belongs in the cloud, but some data must get there. The cloud layer handles:
- Long-term storage — years of historical data for trend analysis
- Cross-site analytics — compare performance across facilities
- Machine learning — train models on aggregated data, deploy inference at the edge
- Compliance — centralized audit trails and regulatory reporting
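The key insight is that the cloud receives processed, meaningful data rather than raw sensor noise. Because far less data is transmitted and stored, cloud ingestion and storage costs can fall in line with the 90%+ bandwidth reduction noted earlier, and data quality improves because faults and noise are filtered out at the source.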
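As a rough check on the scale of the savings, the aggregation example from the edge processing layer (1,000 raw readings per second replaced by one summary of four values per second) can be worked through directly. The helper name `reductionPercent` is hypothetical, and the calculation counts values rather than bytes, so it ignores payload encoding and protocol overhead.

```javascript
// Percentage of values no longer sent upstream after edge aggregation.
function reductionPercent(rawPerSecond, forwardedPerSecond) {
  if (rawPerSecond <= 0) {
    throw new RangeError('rawPerSecond must be positive');
  }
  return (1 - forwardedPerSecond / rawPerSecond) * 100;
}

// 1,000 raw readings per second replaced by four summary values per second.
const reduction = reductionPercent(1000, 4);
```

For this example the reduction comes to about 99.6% of values sent. Real deployments will land lower or higher depending on how many signals are aggregated and how many must be forwarded at full resolution.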
Sizing Your Edge Hardware
One of the most common questions is "what hardware do I need?" The answer depends on your data volume:
Small Deployment (< 1,000 tags)
A Raspberry Pi 4 or equivalent ARM SBC is sufficient. Edgeo's memory footprint starts at ~128MB, and a single ARM core can handle thousands of tag updates per second.
Medium Deployment (1,000 – 50,000 tags)
An Intel NUC or industrial PC with 8GB RAM and an SSD. This handles tens of thousands of tag updates per second with room for local analytics.
Large Deployment (> 50,000 tags)
A rack-mounted edge server with 32GB+ RAM. At this scale, you're likely running multiple protocol adapters and complex processing pipelines. Edgeo scales horizontally across cores.
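As a quick rule of thumb, the tiers above can be encoded in a small lookup. The function name `suggestHardwareTier` and the returned labels are hypothetical; the thresholds come from the tiers described above, with exactly 50,000 tags treated as a medium deployment as an assumption. Actual sizing also depends on poll rates and pipeline complexity.

```javascript
// Map a tag count to one of the three hardware tiers described above.
function suggestHardwareTier(tagCount) {
  if (!Number.isInteger(tagCount) || tagCount < 0) {
    throw new RangeError('tagCount must be a non-negative integer');
  }
  if (tagCount < 1000) {
    return { tier: 'small', hardware: 'Raspberry Pi 4 or equivalent ARM SBC' };
  }
  if (tagCount <= 50000) { // assumption: 50,000 exactly counts as medium
    return { tier: 'medium', hardware: 'Intel NUC or industrial PC, 8GB RAM, SSD' };
  }
  return { tier: 'large', hardware: 'Rack-mounted edge server, 32GB+ RAM' };
}

const suggestion = suggestHardwareTier(12000);
```

Treat the result as a starting point and validate it against measured CPU and memory use on a pilot node before ordering hardware for a full rollout.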
Monitoring Your Pipeline
An edge pipeline that you can't observe is a liability. Edgeo exposes Prometheus-compatible metrics at every stage:
# Data acquisition health
edgeo_source_reads_total{source="line-1-plc"} 1284923
edgeo_source_errors_total{source="line-1-plc"} 3
edgeo_source_latency_ms{source="line-1-plc",quantile="0.99"} 12
# Processing pipeline
edgeo_pipeline_throughput{pipeline="temperature-monitoring"} 1000
edgeo_pipeline_filtered_total{pipeline="temperature-monitoring"} 42891
edgeo_pipeline_buffer_bytes{pipeline="temperature-monitoring"} 1048576
# Cloud sync
edgeo_sync_pending_bytes 0
edgeo_sync_last_success_timestamp 1706745600
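Counters like these become more useful once they are turned into ratios. The following sketch parses Prometheus-style text lines and derives a read error rate for one source. The helper names `parseMetrics` and `errorRate` are hypothetical, and the parser handles only the simple `name{labels} value` shape shown above, not the full Prometheus exposition format. In practice a Prometheus server would scrape these metrics and compute the ratio in a query.

```javascript
// Parse lines of the form `name{labels} value` or `name value`.
// Blank lines and comment lines starting with '#' are skipped.
function parseMetrics(text) {
  const samples = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const match = trimmed.match(/^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)$/);
    if (!match) continue; // ignore anything this simple parser cannot read
    samples.push({
      name: match[1],
      labels: match[2] || '',
      value: Number(match[3]),
    });
  }
  return samples;
}

// Fraction of reads that failed for one source, or null if unknown.
function errorRate(samples, source) {
  const label = `source="${source}"`;
  const find = (name) =>
    samples.find((s) => s.name === name && s.labels.includes(label));
  const reads = find('edgeo_source_reads_total');
  const errors = find('edgeo_source_errors_total');
  if (!reads || !errors || reads.value === 0) return null;
  return errors.value / reads.value;
}

const sample = [
  '# Data acquisition health',
  'edgeo_source_reads_total{source="line-1-plc"} 1284923',
  'edgeo_source_errors_total{source="line-1-plc"} 3',
].join('\n');

const rate = errorRate(parseMetrics(sample), 'line-1-plc');
```

For the sample values above, the rate works out to roughly 2.3 errors per million reads, which is a more meaningful alerting signal than either raw counter on its own.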
Getting Started
The fastest path to a working IIoT pipeline:
- Install Edgeo on your edge hardware (single binary, no dependencies)
- Configure a source — start with one PLC or MQTT broker
- Define a pipeline — even a simple pass-through gets data flowing
- Add a sink — send processed data to your cloud platform or local dashboard
- Iterate — add processing steps, more sources, and alerting rules as you learn
The entire setup takes less than an hour for a basic deployment. The Edgeo documentation includes step-by-step guides for every major PLC vendor and cloud platform.
Conclusion
Edge computing isn't just a buzzword for industrial applications — it's a fundamental architectural shift that solves real problems: latency, bandwidth, resilience, and data sovereignty. By processing data where it's generated, you build systems that are faster, cheaper, and more reliable than cloud-only architectures.
The key is starting small, proving value, and expanding. Your edge pipeline doesn't need to be perfect on day one. It needs to be better than what you have now — and with modern tools like Edgeo, that bar is easy to clear.