March 24, 2025 telematics data quality

The Telematics Data Quality Gap Nobody Talks About

Thomas Bergmann

CEO & Co-Founder, RouteLyft

[Image: GPS telematics device mounted inside a freight truck cab]

Before we built anything at RouteLyft, we spent several months reading raw telematics feeds. Not aggregated position histories, not cleaned API outputs — the actual binary streams coming off FMS (Fleet Management System) interfaces and CAN bus telematics gateways from a range of fleet hardware. What we found was a data quality picture considerably messier than the one implied by most freight visibility marketing material.

The assumption most visibility platforms carry, at least implicitly, is that telematics data is a continuous, reliable stream of GPS coordinates that arrives in near-real-time and can be trusted as a ground truth for shipment position. In our experience across EU carrier integrations, that assumption is wrong in three specific and consequential ways.

Problem one: position report intervals vary wildly, and they matter

Fleet hardware from different manufacturers and different generations reports position at fundamentally different intervals. A modern Webfleet-based unit in a well-maintained fleet might report every 30 seconds when the vehicle is moving. An older Transics or Siemens VDO unit might report every five minutes. Some carriers, particularly smaller Eastern European operators running older Scania or DAF units with basic CAN bus integrations, report position on event triggers only — engine start, ignition off, hard brake — with no continuous position stream between events.

When you ingest all of these into a single pipeline and attempt to produce a uniform shipment timeline, the naive approach produces nonsense. A truck that has five-minute reporting intervals looks very similar to a truck that has stalled for five minutes. A truck reporting on engine-event triggers only creates huge blank windows in the timeline, not because nothing is happening, but because the hardware is not reporting. If your ETA model treats "no position update" as "no movement," you will systematically overestimate dwell times for event-trigger hardware and produce MAPE (mean absolute percentage error) values that look acceptable on average but fail badly on specific carrier/hardware combinations.

The correct approach is to track the reporting interval as a first-class attribute of the source feed and build the timeline model around expected update cadence rather than assuming continuous observation. A 45-minute gap in a feed that reports every 30 seconds is anomalous; a 45-minute gap in a feed that reports every 60 minutes is expected silence. These are not the same signal.
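As a minimal sketch of that distinction in Python: the function below classifies a gap relative to the feed's expected cadence instead of against a fixed threshold. The name classify_gap and the tolerance multiplier of 3 are illustrative assumptions, not values from any production system.

```python
from datetime import timedelta


def classify_gap(gap: timedelta, expected_interval: timedelta,
                 tolerance: float = 3.0) -> str:
    """Label a silence as 'expected' or 'anomalous' for this particular feed.

    tolerance is a hypothetical multiplier: a gap up to tolerance times the
    expected reporting interval is treated as normal silence.
    """
    if expected_interval <= timedelta(0):
        raise ValueError("expected_interval must be positive")
    return "expected" if gap <= expected_interval * tolerance else "anomalous"


# The two cases from the text: the same 45-minute gap, two different feeds.
print(classify_gap(timedelta(minutes=45), timedelta(seconds=30)))  # anomalous
print(classify_gap(timedelta(minutes=45), timedelta(minutes=60)))  # expected
```

The point of the design is that the threshold belongs to the feed, not to the pipeline. A single global "stale after N minutes" rule cannot be right for both feeds at once.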

Problem two: GPS drift and the border crossing false positive

GPS drift is a known issue, but its impact on freight visibility is particularly acute at border crossings — which are, of course, exactly the locations where you most need accurate position data. Consider the geometry: a border crossing like Kiefersfelden on the German–Austrian A93 corridor has a holding area, an inspection lane, and a checkpoint, spread over roughly 800 metres. A GPS signal with 15–20 metre accuracy under good conditions can place a truck in the wrong administrative zone — pre-checkpoint versus post-checkpoint — depending on satellite geometry at the time of the report.

We encountered this concretely on the Rotterdam–Warsaw corridor, where a carrier's GPS unit was intermittently reporting positions that placed the vehicle 200–300 metres ahead of its actual position on the A2 approach to the German–Polish crossing at Slubice. The position updates were not wrong enough to look wrong — they were wrong in a way that made the crossing look faster than it was, suppressing an alert that should have fired based on dwell time at the border approach. The driver was sitting in a queue. The system thought he was already through.

Addressing this requires not just raw position accuracy but an understanding of the physical geometry of each border crossing and ferry terminal in your routing network. Geofencing triggers that depend on a single position report hitting inside a polygon are vulnerable to this type of error. Geofencing that requires N consecutive position reports inside the polygon, or that uses a confidence interval around the reported position, is materially more reliable — at the cost of being slower to trigger.
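The N-consecutive-reports idea can be sketched as follows. This is an illustration under simplifying assumptions: the geofence is an axis-aligned bounding box (BoxFence) standing in for a real crossing polygon, and n_required=3 is an arbitrary default.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class BoxFence:
    """Hypothetical rectangular geofence; a real one would be a polygon."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.min_lat <= lat <= self.max_lat
                and self.min_lon <= lon <= self.max_lon)


def first_confirmed_entry(points: Iterable[Tuple[float, float]],
                          fence: BoxFence,
                          n_required: int = 3) -> Optional[int]:
    """Return the index of the report that completes n_required consecutive
    in-fence reports, or None if the streak is never reached."""
    streak = 0
    for i, (lat, lon) in enumerate(points):
        # Any report outside the fence resets the streak to zero.
        streak = streak + 1 if fence.contains(lat, lon) else 0
        if streak >= n_required:
            return i
    return None
```

With n_required=1 this degenerates to the single-report trigger described above. Raising it trades trigger latency (roughly n_required times the reporting interval) for resistance to a single drifted fix.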

Problem three: carrier gateway latency

The position that the telematics hardware reports and the position that arrives at your API endpoint are not always the same age. Between the hardware and your system sits the carrier's telematics gateway — typically a middleware layer that aggregates vehicle data, applies business rules, and exposes it via an API or EDI feed. This layer introduces latency that varies considerably and is almost never documented by the carrier.

In our integrations work, we observed gateway latencies ranging from under 60 seconds (modern REST-with-webhook setups at well-resourced carriers) to 45 minutes (older gateway infrastructure running batch processing cycles). The 45-minute case was not theoretical — it was a specific mid-size German regional carrier whose EDI feed we integrated, where the gateway batched position data on a 30-minute cycle and then had 15 minutes of processing time. A position report that was current at 14:00 arrived at our ingest layer stamped 14:45.

High gateway latency does not make a carrier unsuitable for integration. It means the latency profile needs to be tracked and the timeline model adjusted accordingly. A position timestamped by the source is fundamentally different from a position timestamped by when your system received it. If you're using receive-time as a proxy for event-time and the gateway latency is 45 minutes, your ETA model is working on stale data while believing it is working on current data. The fix is simple (preserve source timestamps and track gateway lag explicitly), but it requires awareness of the problem in the first place.
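To make the receive-time versus event-time difference concrete, here is a small worked example in Python using the 14:00 and 14:45 figures from the carrier described above. The function name position_age, the calendar date, and the 14:50 evaluation time are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def position_age(source_ts: datetime, received_ts: datetime, now: datetime):
    """Return (gateway_lag, true_age, apparent_age) for one position report.

    true_age is measured from the source timestamp (event-time);
    apparent_age is what you get if receive-time is used as a proxy.
    """
    gateway_lag = received_ts - source_ts
    true_age = now - source_ts
    apparent_age = now - received_ts
    return gateway_lag, true_age, apparent_age


# Hypothetical date; the times of day follow the example in the text.
source = datetime(2025, 3, 24, 14, 0, tzinfo=timezone.utc)
received = datetime(2025, 3, 24, 14, 45, tzinfo=timezone.utc)
now = datetime(2025, 3, 24, 14, 50, tzinfo=timezone.utc)

lag, true_age, apparent_age = position_age(source, received, now)
print(lag, true_age, apparent_age)  # 0:45:00 0:50:00 0:05:00
```

Five minutes after arrival, a receive-time model believes the position is 5 minutes old when it is actually 50 minutes old.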

How we normalise noisy signal into a usable timeline

The pipeline approach that produces reliable shipment timelines from heterogeneous telematics sources has three stages: ingestion with metadata preservation, feed-level characterisation, and event-driven state inference.

In the ingestion stage, every position report is stored with three timestamps: source hardware timestamp (from the telematics unit's internal clock), gateway receive timestamp (when the carrier's system logged it), and our ingest timestamp (when it hit our pipeline). The delta between source and our ingest timestamp gives the total delivery lag for that position report. We maintain per-carrier, per-hardware-type lag distributions as a feed-level attribute.
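A sketch of what such a record and lag summary could look like in Python. The field names, the PositionReport class, and the choice of median and maximum as summary statistics are assumptions for illustration; the text above only specifies the three timestamps and the per-carrier, per-hardware-type grouping.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PositionReport:
    carrier: str
    hardware_type: str
    lat: float
    lon: float
    source_ts: datetime   # telematics unit's internal clock
    gateway_ts: datetime  # when the carrier's gateway logged it
    ingest_ts: datetime   # when it reached our pipeline

    @property
    def delivery_lag(self) -> timedelta:
        """Total lag from hardware timestamp to pipeline ingest."""
        return self.ingest_ts - self.source_ts


def lag_profile(reports: Iterable[PositionReport]) -> Dict[Tuple[str, str], dict]:
    """Summarise delivery lag in seconds per (carrier, hardware_type)."""
    lags = defaultdict(list)
    for r in reports:
        lags[(r.carrier, r.hardware_type)].append(r.delivery_lag.total_seconds())
    return {
        key: {"count": len(v), "median_s": median(v), "max_s": max(v)}
        for key, v in lags.items()
    }
```

Keeping gateway_ts separately also lets you split the total lag into a hardware-to-gateway part and a gateway-to-ingest part, assuming the gateway clock can be trusted.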

In the feed-level characterisation stage, we track the expected reporting interval for each active vehicle feed based on the observed distribution over the prior 72 hours. If a feed that typically reports every 90 seconds goes silent for 20 minutes, that is a signal. If a feed that typically reports every 8 minutes goes silent for 20 minutes, that is within normal variance.
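One simple way to approximate this characterisation, shown as a Python sketch: take the median gap between reports in the trailing 72 hours as the expected interval, then compare the current silence against a multiple of it. Using the median and a factor of 5 are assumptions; the text above says only that the interval is derived from the observed distribution.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional, Sequence


def expected_interval(timestamps: Sequence[datetime], now: datetime,
                      window: timedelta = timedelta(hours=72)) -> Optional[timedelta]:
    """Median gap between consecutive reports inside the trailing window.

    Returns None when fewer than two reports fall inside the window.
    """
    recent = sorted(t for t in timestamps if now - window <= t <= now)
    if len(recent) < 2:
        return None
    gaps = [(b - a).total_seconds() for a, b in zip(recent, recent[1:])]
    return timedelta(seconds=median(gaps))


def silence_is_signal(silence: timedelta, interval: timedelta,
                      factor: float = 5.0) -> bool:
    """True when the silence exceeds factor times the expected interval.

    factor is a hypothetical tuning parameter, not a value from the text.
    """
    return silence > interval * factor


# The two cases from the text: the same 20 minutes of silence, two feeds.
print(silence_is_signal(timedelta(minutes=20), timedelta(seconds=90)))  # True
print(silence_is_signal(timedelta(minutes=20), timedelta(minutes=8)))   # False
```

The median is used here because overnight stops produce a few very long gaps that would inflate a mean. A fuller version would likely characterise moving and stationary cadence separately.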

In the state inference stage, we evaluate the characterised feed against a route-level progression model. Rather than attempting to infer position from position, we infer state from position plus expected feed behaviour plus known route waypoints. Consider a vehicle that was last confirmed inside the Brenner approach geofence 3.5 hours ago, whose feed has been silent for that duration, whose gateway lag profile is 5–10 minutes, and whose feed typically reports every 4 minutes. That combination produces a high-confidence inference of "stationary at or near the border crossing, cause unknown," not "unknown position."
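A deliberately simplified, rule-based sketch of this kind of inference in Python. A real route-level progression model would be considerably richer. The FeedSnapshot fields, the silence_factor threshold, and the state labels other than the one quoted above are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class FeedSnapshot:
    last_geofence: Optional[str]  # last confirmed waypoint geofence, if any
    silence: timedelta            # time since the last position report
    typical_interval: timedelta   # characterised reporting interval
    max_gateway_lag: timedelta    # upper end of the gateway lag profile


def infer_state(feed: FeedSnapshot, silence_factor: float = 5.0) -> str:
    """Combine feed behaviour with the last known waypoint into a state label."""
    # Silence that gateway lag alone could explain is not evidence of anything.
    unexplained = feed.silence - feed.max_gateway_lag
    if unexplained <= feed.typical_interval * silence_factor:
        return "within expected silence"
    if feed.last_geofence is not None:
        return f"stationary at or near {feed.last_geofence}, cause unknown"
    return "unknown position"


brenner = FeedSnapshot(
    last_geofence="the border crossing",
    silence=timedelta(hours=3, minutes=30),
    typical_interval=timedelta(minutes=4),
    max_gateway_lag=timedelta(minutes=10),
)
print(infer_state(brenner))  # stationary at or near the border crossing, cause unknown
```

The sketch returns only a label. Attaching a confidence value to each state would need the lag and interval distributions, not just their summary figures.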

The signal you should not throw away

There is an instinct in data engineering to filter aggressively: to discard "bad" position reports and work only with "clean" data. With telematics feeds, this approach discards useful signal. A GPS report that is geometrically inconsistent with the vehicle's physical movement trajectory (a position jump of 15 km between two reports separated by 30 seconds, for instance, which implies a speed of about 1,800 km/h) is not random noise. It is a GPS integrity failure that carries its own information value. The timestamp, the carrier feed source, and the location of the jump can be useful in characterising which hardware and which network segment produces integrity failures and at what rate.
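A sketch of tagging instead of discarding, in Python: compute the speed implied by consecutive reports with the haversine formula and flag reports that exceed a plausibility limit, while keeping every report in the output. The 130 km/h limit and the "compare against the last trusted report" rule are assumptions for illustration.

```python
import math
from datetime import datetime
from typing import List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0
Report = Tuple[float, float, datetime]  # (lat, lon, source timestamp)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def implied_speed_kmh(prev: Report, curr: Report) -> float:
    """Speed needed to travel from prev to curr in the elapsed time."""
    hours = (curr[2] - prev[2]).total_seconds() / 3600
    if hours <= 0:
        return math.inf  # non-increasing timestamps are themselves suspect
    return haversine_km(prev[0], prev[1], curr[0], curr[1]) / hours


def tag_integrity(reports: Sequence[Report],
                  max_kmh: float = 130.0) -> List[Tuple[Report, str]]:
    """Return every report paired with 'ok' or 'integrity_failure'.

    Nothing is dropped: flagged reports stay available for per-hardware
    and per-network failure-rate analysis.
    """
    tagged = []
    last_trusted = None
    for report in reports:
        if last_trusted is not None and implied_speed_kmh(last_trusted, report) > max_kmh:
            tagged.append((report, "integrity_failure"))
        else:
            tagged.append((report, "ok"))
            last_trusted = report
    return tagged
```

Comparing against the last trusted report, not the immediately preceding one, means a single bad fix does not also condemn the good report that follows it.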

Similarly, gateway silence is a signal. Event-trigger reports that fire on engine start and ignition-off only are not gaps in the timeline. They are position anchors at known state transitions, and they constrain the inference problem considerably. A vehicle whose ignition-off event was recorded at a known truck stop on the A7 north of Hamburg at 22:14 did not teleport. Its next ignition-on event will tell you the rest stop duration, which feeds directly into remaining drive time calculations under Regulation (EC) No 561/2006 on driver hours.
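As a small Python illustration of treating ignition events as anchors: the function below pairs each ignition-off with the next ignition-on and returns the stop durations. The event labels and the 07:20 restart time are hypothetical. Ignition state is only a proxy for rest, and mapping these durations onto the regulation's break and rest categories is out of scope here.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

Event = Tuple[datetime, str]  # (source timestamp, "ignition_off" | "ignition_on")


def stop_durations(events: Iterable[Event]) -> List[timedelta]:
    """Pair each ignition_off with the next ignition_on, in time order.

    Assumes events are already sorted by source timestamp.
    """
    durations = []
    off_at = None
    for ts, kind in events:
        if kind == "ignition_off":
            off_at = ts
        elif kind == "ignition_on" and off_at is not None:
            durations.append(ts - off_at)
            off_at = None
    return durations


# Ignition-off at 22:14 as in the text; the restart time is made up.
events = [
    (datetime(2025, 3, 24, 22, 14), "ignition_off"),
    (datetime(2025, 3, 25, 7, 20), "ignition_on"),
]
print(stop_durations(events))  # [datetime.timedelta(seconds=32760)]
```

Here the stop lasted 9 hours 6 minutes, a figure a drive-time model can use even though no position was reported in between.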

Building a freight visibility platform on top of real telematics data means accepting that the signal is noisy, heterogeneous, and sometimes structurally limited — and designing the normalisation layer to extract maximum information from that reality rather than pretending it is cleaner than it is. The teams that get ETA accuracy right are the ones that model the feed, not just the freight.