A link to a remote PV site drops for forty minutes, a SCADA gateway reboots after a firmware push, or a market feed publishes late. Each leaves a hole in the time series, and the instinct to paper over it with a straight line is where the damage starts. The goal is not a gap-free chart; it is a record that recovers real data where it exists and is honest about everything else.
A backfilled value and a measured one must never look identical to a model.
Capture at the edge, recover idempotently
The first defence against gaps is to stop creating them. A small buffer on the edge gateway holds readings polled from Modbus or OPC UA tags when the uplink to the central TSDB is down, then replays them once the link returns. With MQTT this is largely store-and-forward: persistent sessions and queued messages. With a polling gateway it is a local ring buffer keyed by source timestamp. Either way, the data the plant actually produced survives a network outage instead of vanishing.
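As a rough illustration of the polling-gateway case, here is a minimal sketch of a bounded buffer keyed by signal and source timestamp. The class name `EdgeBuffer`, the `send` callback and the capacity are hypothetical choices for this example, not a description of any particular gateway product.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable, Optional


class EdgeBuffer:
    """Bounded store-and-forward buffer keyed by (signal, source timestamp)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: OrderedDict[tuple[str, int], Optional[float]] = OrderedDict()

    def __len__(self) -> int:
        return len(self._items)

    def append(self, signal: str, ts: int, value: Optional[float]) -> None:
        key = (signal, ts)
        self._items[key] = value          # re-polling the same timestamp overwrites
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # ring behaviour: drop the oldest entry

    def replay(self, send: Callable[[str, int, Optional[float]], bool]) -> int:
        """Send buffered readings oldest first; stop at the first failed send."""
        sent = 0
        for signal, ts in sorted(self._items, key=lambda k: k[1]):
            if not send(signal, ts, self._items[(signal, ts)]):
                break                     # uplink is down again; keep the rest
            del self._items[(signal, ts)]
            sent += 1
        return sent
```

A reading is only removed after `send` reports success, so a link that drops mid-replay leaves the remainder in place for the next attempt. Anything older than the buffer's capacity is still lost, which is why the capacity should be sized for the longest outage you expect to ride through.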
Replay has to be safe to repeat. We make backfill idempotent by keying every write on the natural key of source, signal and timestamp, so a re-sent batch upserts rather than duplicates. A per-source watermark records the latest timestamp up to which data is contiguous and confirmed durable, so a recovering buffer knows where to resume and a crashed backfill job can re-run without double-counting energy or corrupting a cumulative meter total.
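The upsert-plus-watermark pattern can be sketched with SQLite standing in for the central TSDB. The table names, the `backfill` function and the fixed sampling interval `step_s` are assumptions made for this example. For simplicity a timestamp counts as covered once any signal from the source has a row for it.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        " source TEXT, signal TEXT, ts INTEGER, value REAL,"
        " PRIMARY KEY (source, signal, ts))"   # the natural key
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS watermarks ("
        " source TEXT PRIMARY KEY, ts INTEGER NOT NULL)"
    )
    return db


def backfill(db, source, batch, start_ts, step_s):
    """Upsert a batch of (signal, ts, value) rows and advance the watermark.

    Returns the latest timestamp up to which data is contiguous.
    """
    with db:  # one transaction: rows and watermark commit together or not at all
        db.executemany(
            "INSERT OR REPLACE INTO readings VALUES (?, ?, ?, ?)",
            [(source, signal, ts, value) for signal, ts, value in batch],
        )
        row = db.execute(
            "SELECT ts FROM watermarks WHERE source = ?", (source,)
        ).fetchone()
        mark = row[0] if row else start_ts - step_s
        stored = {
            r[0]
            for r in db.execute(
                "SELECT DISTINCT ts FROM readings WHERE source = ? AND ts > ?",
                (source, mark),
            )
        }
        while mark + step_s in stored:   # walk forward until the first hole
            mark += step_s
        db.execute(
            "INSERT OR REPLACE INTO watermarks VALUES (?, ?)", (source, mark)
        )
    return mark
```

Because the primary key is the natural key, sending the same batch twice leaves the row count unchanged. The watermark only moves past a timestamp once the row for it exists, so a late batch that fills a hole lets it jump forward over everything already stored beyond that hole.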
Label every value, do not fabricate it
The most expensive mistake in telemetry is treating a missing value as zero. A null means we do not know the output; a true zero means the inverter was producing nothing. Conflate them and a comms outage becomes phantom downtime, a clear-sky baseline drifts towards the floor, and any model trained on the history quietly inherits the bias. Keep nulls as nulls through ingestion, and assert zero only where a status flag or a known curtailment signal supports it.
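To make the distinction concrete, the sketch below keeps a missing value as `None` unless a status or curtailment signal justifies a zero. The function names and the two boolean flags are hypothetical; real inverters expose status in device-specific ways.

```python
from typing import Iterable, Optional


def resolve_power_kw(
    raw: Optional[float],
    inverter_off: bool = False,
    curtailed_to_zero: bool = False,
) -> Optional[float]:
    """Return the value to store: a reading, a supported zero, or None."""
    if raw is not None:
        return raw
    if inverter_off or curtailed_to_zero:
        return 0.0      # zero is asserted only because a signal supports it
    return None         # unknown stays unknown


def mean_known(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average over known values only; None if nothing is known."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known) if known else None


# Two readings lost to a comms outage while the plant ran at a steady 50 kW.
series = [50.0, None, None, 50.0]
honest = mean_known(series)                                   # 50.0
zero_filled = mean_known([0.0 if v is None else v for v in series])  # 25.0
```

The zero-filled average halves the apparent output even though the plant never stopped producing, which is exactly the bias a downstream baseline or model would inherit.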
Where a value is reconstructed at all, mark it as such. We carry a quality flag on every reading: measured, interpolated, backfilled or estimated. A forecast can then exclude interpolated points from training, a settlement report can footnote them, and an operator can see at a glance which figures are real. Interpolation is a presentation choice for a dashboard, never a silent edit to the stored record.
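One way to carry the flag is sketched below. The `Quality` enum mirrors the four labels above; the `Reading` type, `fill_for_display` and `training_set` are illustrative names, and the sketch assumes the stored readings are sorted by timestamp with non-null values.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    MEASURED = "measured"
    INTERPOLATED = "interpolated"
    BACKFILLED = "backfilled"
    ESTIMATED = "estimated"


@dataclass(frozen=True)
class Reading:
    ts: int            # source timestamp, epoch seconds
    value: float
    quality: Quality


def fill_for_display(stored: list[Reading], step_s: int) -> list[Reading]:
    """Return a new list with gaps linearly interpolated and flagged.

    The stored list is never modified.
    """
    out: list[Reading] = []
    for prev, nxt in zip(stored, stored[1:]):
        out.append(prev)
        for ts in range(prev.ts + step_s, nxt.ts, step_s):
            frac = (ts - prev.ts) / (nxt.ts - prev.ts)
            value = prev.value + frac * (nxt.value - prev.value)
            out.append(Reading(ts, value, Quality.INTERPOLATED))
    out.extend(stored[-1:])   # the last stored reading (none if stored is empty)
    return out


def training_set(readings: list[Reading]) -> list[Reading]:
    """Drop interpolated points before they reach a model."""
    return [r for r in readings if r.quality is not Quality.INTERPOLATED]
```

Because `fill_for_display` returns a fresh list, a dashboard can draw a continuous line while the stored record keeps its gap, and `training_set` recovers the real points from a filled series without any guesswork.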
Gap handling looks like plumbing until an unflagged interpolation poisons a forecast or inflates a curtailment claim. Getting it right is discipline applied consistently across every source and protocol: buffering, watermarks, upserts and honest quality flags. That discipline is what we build into the data layers behind our energy & renewables work, so the numbers your models and reports rely on are the numbers the plant actually produced.