Architecture | 5 min read

Building a Real-Time IIoT Data Pipeline with Edge Computing

Edgeo Team

Why Edge Computing Matters for Industrial Data

In traditional SCADA architectures, every data point travels from the sensor to a centralized server — often hundreds of miles away — before any processing occurs. This model worked when poll intervals were measured in seconds and data volumes were manageable.

Modern IIoT deployments generate millions of data points per minute. Sending all of that to the cloud is expensive, slow, and fragile. A network hiccup means lost data. A cloud outage means blind operations.

Edge computing flips this model. By processing data close to where it's generated, you get:

  • Sub-millisecond response times for critical control decisions
  • 90%+ bandwidth reduction by aggregating and filtering at the source
  • Resilience — edge nodes continue operating even when disconnected
  • Data sovereignty — sensitive OT data never leaves your facility

Architecture Overview

A well-designed IIoT edge pipeline has four layers:

Layer 1: Data Acquisition

The foundation is connecting to your physical infrastructure. Industrial protocols like Modbus, OPC UA, MQTT, and DNP3 serve different purposes:

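| Protocol   | Use Case             | Strengths                               |
|------------|----------------------|-----------------------------------------|
| Modbus TCP | PLC communication    | Simple, universal, low overhead         |
| OPC UA     | Information modeling | Rich metadata, security, discovery      |
| MQTT       | Sensor telemetry     | Pub/sub, lightweight, reliable delivery |
| DNP3       | Utility SCADA        | Designed for unreliable networks        |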

Edgeo handles all four natively, with a unified configuration model:

// Define your data sources declaratively
const sources = {
  'line-1-plc': {
    protocol: 'modbus-tcp',
    host: '192.168.1.100',
    registers: [
      { address: 40001, name: 'temperature', type: 'float32' },
      { address: 40003, name: 'pressure', type: 'float32' },
      { address: 40005, name: 'flow_rate', type: 'float32' },
    ],
    pollInterval: 500, // ms
  },
};
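
The register addresses above step by two, which is consistent with each float32 value spanning two 16-bit Modbus registers. As a minimal sketch of how such a pair could be decoded, the helper below assumes the high word arrives first (some PLCs swap the words). The name `decodeFloat32` and the `wordSwap` flag are hypothetical and not part of Edgeo's API.

```javascript
// Hypothetical helper: combine two 16-bit Modbus registers into one float32.
// Assumes the high word comes first; pass wordSwap = true for devices that
// send the low word first.
function decodeFloat32(regHi, regLo, wordSwap = false) {
  const view = new DataView(new ArrayBuffer(4));
  const [first, second] = wordSwap ? [regLo, regHi] : [regHi, regLo];
  view.setUint16(0, first, false);   // false = big-endian byte order
  view.setUint16(2, second, false);
  return view.getFloat32(0, false);
}

// 82.5 encoded as IEEE 754 float32 is 0x42A50000.
console.log(decodeFloat32(0x42a5, 0x0000)); // 82.5
```

If decoded values look wildly wrong (for example, tiny denormal numbers), the word order is the first thing to check.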

Layer 2: Edge Processing

Raw sensor data is rarely useful on its own. The edge processing layer transforms, enriches, and filters data before it goes anywhere:

Normalization: Convert raw register values to engineering units. A raw Modbus register value of 16789 might become 82.3°C after applying the sensor's scale factor and offset.

Quality filtering: Discard readings that fail range checks or spike detection. A temperature sensor reading of -9999 is a fault code, not a data point.

Aggregation: Compress high-frequency data into meaningful summaries. Instead of sending 1,000 readings per second to the cloud, send one summary (min, max, mean, and count) every second.

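// Edge processing pipeline
// Calibration constants are assumed, illustrative values; take real ones from
// the sensor datasheet. With these, a raw 16789 maps to roughly 82.3.
const scale = 0.01;
const offset = -85.59;

pipeline('temperature-monitoring')
  .source('line-1-plc', 'temperature')
  .transform(raw => raw * scale + offset)  // Scale to engineering units
  .filter(val => val > -40 && val < 200)   // Physical range check
  .deadband(0.5)                           // Only forward meaningful changes
  .aggregate({
    window: '1s',
    emit: ['min', 'max', 'mean', 'count'],
  })
  .sink('cloud-historian');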
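
To make the filter, deadband, and aggregate stages concrete, here is a plain-JavaScript sketch of what they might do internally. It is an illustration under assumptions, not Edgeo's implementation: `makeDeadband` and `summarize` are hypothetical names, and the deadband here forwards a value when it differs from the last forwarded value by at least the threshold.

```javascript
// Deadband: forward a value only if it differs from the last forwarded
// value by at least `threshold`. The first value is always forwarded.
function makeDeadband(threshold) {
  let last;
  return (value) => {
    if (last === undefined || Math.abs(value - last) >= threshold) {
      last = value;
      return true;
    }
    return false;
  };
}

// Aggregate: summarize one window of readings into min/max/mean/count.
function summarize(values) {
  if (values.length === 0) return null;
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const v of values) {
    if (v < min) min = v;
    if (v > max) max = v;
    sum += v;
  }
  return { min, max, mean: sum / values.length, count: values.length };
}

const inRange = (v) => v > -40 && v < 200; // same physical range check as above
const passes = makeDeadband(0.5);
const readings = [82.0, 82.1, 82.6, -9999, 83.2, 83.3];
const kept = readings.filter(inRange).filter(passes);
console.log(kept);            // [82, 82.6, 83.2]
console.log(summarize(kept)); // min 82, max 83.2, count 3
```

The fault code -9999 is removed by the range check, and the two 0.1-degree changes are suppressed by the deadband, so only three of six readings reach the summary.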

Layer 3: Local Storage and Buffering

Edge nodes need local storage for two reasons: real-time queries and store-and-forward during network outages.

Edgeo uses an embedded time-series store optimized for industrial data patterns:

  • Write-optimized — handles burst writes during high-frequency acquisition
  • Compressed — industrial data compresses extremely well (often 10:1 or better)
  • Queryable — supports time-range queries for local dashboards and alerting
  • Bounded — configurable retention prevents disk exhaustion on edge devices

When connectivity to the cloud is lost, data accumulates locally. When the connection is restored, the buffer drains automatically — with backpressure to avoid overwhelming upstream systems.
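
The store-and-forward behavior described above can be sketched in a few lines. This is an in-memory illustration only, assuming a record-count bound and a caller-supplied `send` function; the class name `StoreAndForwardBuffer` is hypothetical, and a real edge store would persist to disk.

```javascript
// Illustrative store-and-forward buffer; not Edgeo's implementation.
class StoreAndForwardBuffer {
  constructor(maxRecords) {
    this.maxRecords = maxRecords;
    this.records = [];
    this.dropped = 0;
  }

  // Append a record; when full, evict the oldest so storage stays bounded.
  push(record) {
    if (this.records.length >= this.maxRecords) {
      this.records.shift();
      this.dropped += 1;
    }
    this.records.push(record);
  }

  // Drain in fixed-size batches. `send` resolves to true on success. Waiting
  // for each batch before sending the next is a simple form of backpressure.
  async drain(send, batchSize) {
    while (this.records.length > 0) {
      const batch = this.records.slice(0, batchSize);
      const ok = await send(batch);
      if (!ok) return false; // still offline: keep the data and retry later
      this.records.splice(0, batch.length);
    }
    return true;
  }
}

const buffer = new StoreAndForwardBuffer(3);
for (const seq of [1, 2, 3, 4]) buffer.push({ seq });
console.log(buffer.records.map((r) => r.seq)); // [2, 3, 4]; seq 1 was evicted

const sent = [];
buffer
  .drain(async (batch) => { sent.push(...batch); return true; }, 2)
  .then((done) => console.log(done, sent.length, buffer.records.length)); // true 3 0
```

Records are removed only after `send` confirms success, so a failed upload leaves the buffer intact for the next attempt.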

Layer 4: Cloud Integration

Not all data belongs in the cloud, but some data must get there. The cloud layer handles:

  • Long-term storage — years of historical data for trend analysis
  • Cross-site analytics — compare performance across facilities
  • Machine learning — train models on aggregated data, deploy inference at the edge
  • Compliance — centralized audit trails and regulatory reporting

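The key insight is that the cloud receives processed, meaningful data rather than raw sensor noise. Because far fewer values are transmitted and stored, cloud ingest and storage costs can drop by a margin similar to the bandwidth savings (90% or more), and data quality improves because fault codes and out-of-range readings are filtered out at the edge.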
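
As a rough illustration of where the reduction comes from, the sketch below uses the aggregation example from Layer 2: 1,000 raw readings per second collapsed into one four-value summary per second. It counts values only and ignores per-message protocol overhead, so treat it as a back-of-the-envelope estimate rather than a measured saving.

```javascript
// Back-of-the-envelope estimate; the rates are taken from the aggregation
// example, not from measurements.
const rawReadingsPerSecond = 1000;  // high-frequency acquisition
const summaryValuesPerSecond = 4;   // min, max, mean, count once per second
const reduction = 1 - summaryValuesPerSecond / rawReadingsPerSecond;
console.log(`${(reduction * 100).toFixed(1)}% fewer values sent upstream`);
// 99.6% fewer values sent upstream
```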

Sizing Your Edge Hardware

One of the most common questions is "what hardware do I need?" The answer depends on your data volume:

Small Deployment (< 1,000 tags)

A Raspberry Pi 4 or equivalent ARM SBC is sufficient. Edgeo's memory footprint starts at ~128MB, and a single ARM core can handle thousands of tag updates per second.

Medium Deployment (1,000 – 50,000 tags)

An Intel NUC or industrial PC with 8GB RAM and an SSD. This handles tens of thousands of tag updates per second with room for local analytics.

Large Deployment (50,000+ tags)

A rack-mounted edge server with 32 GB+ RAM. At this scale, you're likely running multiple protocol adapters and complex processing pipelines. Edgeo spreads that work across the available CPU cores.
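
The three tiers above can be captured as a small lookup, which is handy in provisioning scripts. The thresholds come from the guidance above; the function name and return strings are illustrative, and exactly 50,000 tags is assigned to the large tier here as an assumption.

```javascript
// Illustrative sizing helper based on the tag-count tiers above.
function recommendHardware(tagCount) {
  if (tagCount < 1000) return 'ARM SBC (e.g. Raspberry Pi 4)';
  if (tagCount < 50000) return 'Intel NUC or industrial PC, 8 GB RAM, SSD';
  return 'Rack-mounted edge server, 32 GB+ RAM';
}

console.log(recommendHardware(12000)); // Intel NUC or industrial PC, 8 GB RAM, SSD
```

Tag count is only a starting point. Update rate and pipeline complexity can push a deployment into the next tier.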

Monitoring Your Pipeline

An edge pipeline that you can't observe is a liability. Edgeo exposes Prometheus-compatible metrics at every stage:

# Data acquisition health
edgeo_source_reads_total{source="line-1-plc"} 1284923
edgeo_source_errors_total{source="line-1-plc"} 3
edgeo_source_latency_ms{source="line-1-plc",quantile="0.99"} 12

# Processing pipeline
edgeo_pipeline_throughput{pipeline="temperature-monitoring"} 1000
edgeo_pipeline_filtered_total{pipeline="temperature-monitoring"} 42891
edgeo_pipeline_buffer_bytes{pipeline="temperature-monitoring"} 1048576

# Cloud sync
edgeo_sync_pending_bytes 0
edgeo_sync_last_success_timestamp 1706745600
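
As one way to act on these numbers, the sketch below parses sample lines like those above and derives a read error rate. It handles only the simple `name{labels} value` and `name value` shapes shown here, not the full Prometheus exposition format, and the 0.1% alert threshold is an assumed value for illustration.

```javascript
// Minimal parser for metric lines of the form `name{labels} value`.
function parseMetrics(text) {
  const samples = [];
  for (const line of text.split('\n')) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue; // skip comments
    const match = trimmed.match(/^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)$/);
    if (!match) continue;
    samples.push({ name: match[1], labels: match[2] || '', value: Number(match[3]) });
  }
  return samples;
}

const text = `
# Data acquisition health
edgeo_source_reads_total{source="line-1-plc"} 1284923
edgeo_source_errors_total{source="line-1-plc"} 3
`;

const byName = Object.fromEntries(parseMetrics(text).map((s) => [s.name, s.value]));
const errorRate = byName.edgeo_source_errors_total / byName.edgeo_source_reads_total;
// Assumed threshold: flag the source if more than 0.1% of reads fail.
console.log(errorRate < 0.001 ? 'source healthy' : 'source degraded'); // source healthy
```

In production, an alerting rule in your monitoring stack would normally do this job. The script form is mainly useful for quick checks on the edge node itself.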

Getting Started

The fastest path to a working IIoT pipeline:

  1. Install Edgeo on your edge hardware (single binary, no dependencies)
  2. Configure a source — start with one PLC or MQTT broker
  3. Define a pipeline — even a simple pass-through gets data flowing
  4. Add a sink — send processed data to your cloud platform or local dashboard
  5. Iterate — add processing steps, more sources, and alerting rules as you learn

The entire setup takes less than an hour for a basic deployment. The Edgeo documentation includes step-by-step guides for every major PLC vendor and cloud platform.

Conclusion

Edge computing isn't just a buzzword for industrial applications — it's a fundamental architectural shift that solves real problems: latency, bandwidth, resilience, and data sovereignty. By processing data where it's generated, you build systems that are faster, cheaper, and more reliable than cloud-only architectures.

The key is starting small, proving value, and expanding. Your edge pipeline doesn't need to be perfect on day one. It needs to be better than what you have now — and with modern tools like Edgeo, that bar is easy to clear.

#edge-computing #iiot #data-pipeline #mqtt
