How Accurate Are Freight ETA Predictions? A Data-Driven Breakdown


When freight visibility vendors talk about ETA accuracy, they tend to talk about it in terms that obscure more than they reveal. "High-accuracy ETA predictions." "Reliable delivery estimates." These claims are difficult to evaluate without knowing what is being measured, over which lane type, and — crucially — what kind of delay the prediction model is actually trying to predict.

We analysed shipment data across four major EU road freight corridors to understand where ETA estimates break down. The findings are not what most people in the industry expect.

How ETA MAPE is typically calculated — and why it misleads

Mean Absolute Percentage Error (MAPE) is the standard metric for ETA prediction accuracy in freight. For each shipment, you take a prediction made at some point during the journey, calculate the absolute difference between the predicted and actual arrival time, divide by the actual time remaining from that point, and average across all shipments, expressed as a percentage. Lower MAPE is better.
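Here is a minimal sketch of that calculation. It assumes one prediction per shipment, taken at a chosen checkpoint, and follows the conventional MAPE definition: each absolute error is divided by the actual (observed) remaining time. The function name and its inputs are illustrative.

```python
def eta_mape(predicted_remaining, actual_remaining):
    """Mean Absolute Percentage Error of ETA predictions, in percent.

    Both arguments hold remaining journey time (e.g. minutes) from the
    checkpoint at which each prediction was made: one entry per shipment,
    predicted at the time and observed in hindsight.
    """
    if len(predicted_remaining) != len(actual_remaining) or not actual_remaining:
        raise ValueError("need matching, non-empty inputs")
    total = 0.0
    for predicted, actual in zip(predicted_remaining, actual_remaining):
        if actual <= 0:
            raise ValueError("actual remaining time must be positive")
        # Absolute error as a fraction of the time that actually remained.
        total += abs(predicted - actual) / actual
    return 100.0 * total / len(actual_remaining)
```

For example, `eta_mape([600, 300], [500, 300])` gives 10.0. The first shipment is off by 100 of 500 actual minutes (20%), the second is exact, and the mean is 10%.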

The problem is how the calculation is typically applied. Most visibility vendors calculate MAPE against the estimated arrival at destination, averaged across all shipments, including those that arrived on time. For a well-planned intra-country lane with no border crossings and low congestion variability, MAPE might run at 3–5%, which sounds impressive. That single figure conceals performance on the shipments that actually matter: cross-border shipments with customs exposure, shipments on high-variability corridors, and shipments that experienced exceptions.

A visibility platform that accurately predicts 95% of shipments and wildly mispredicts the 5% of shipments that have exceptions is not, in operational terms, providing good ETA accuracy. It is providing good ETA accuracy for shipments that did not need an ETA correction and failing exactly where intervention would have been valuable.
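The sketch below makes the dilution effect concrete by reporting MAPE both in aggregate and per segment. The records are synthetic and purely illustrative: 95 routine shipments predicted within 2%, and 5 exception shipments where a 600-minute forecast met a 900-minute outcome. The `segment` labels are a hypothetical tagging scheme, not data from our analysis.

```python
from collections import defaultdict


def mape(pairs):
    """MAPE in percent over (predicted, actual) remaining-time pairs."""
    return 100.0 * sum(abs(p - a) / a for p, a in pairs) / len(pairs)


def mape_by_segment(shipments):
    """Return (overall MAPE, {segment: MAPE}).

    Each shipment is a (segment, predicted_min, actual_min) tuple.
    """
    groups = defaultdict(list)
    for segment, predicted, actual in shipments:
        groups[segment].append((predicted, actual))
    overall = mape([(p, a) for _, p, a in shipments])
    per_segment = {seg: mape(pairs) for seg, pairs in groups.items()}
    return overall, per_segment


# Synthetic, illustrative data only (not from the corridor analysis).
synthetic = [("routine", 490, 500)] * 95 + [("exception", 600, 900)] * 5
overall, per_segment = mape_by_segment(synthetic)
print(f"aggregate MAPE: {overall:.1f}%")
for segment, value in per_segment.items():
    print(f"{segment} MAPE: {value:.1f}%")
```

With these synthetic inputs, the aggregate MAPE is about 3.6%, while the exception segment sits at about 33%.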

The driving time prediction problem is essentially solved

Driving time prediction on European road freight corridors has become a reasonably well-understood problem. The combination of GPS trace data, historical traffic patterns by day-of-week and hour-of-day, and EU Regulation No. 561/2006 driver hours constraints gives a model enough inputs to produce driving time estimates within 8–12% MAPE for most major EU motorway corridors — the A1, A2, A7, A8, E40 class roads.
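The driver-hours constraint is what turns pure driving time into elapsed transit time. The sketch below makes several simplifying assumptions. It models a single driver starting with full hours and applies only the core Regulation (EC) No 561/2006 limits: a 45-minute break after 4.5 hours of driving, a 9-hour daily driving limit, and an 11-hour regular daily rest. It ignores the twice-weekly 10-hour extension, reduced and split rests, weekly caps, and double-manned crews. The function name is illustrative.

```python
import math

# Core single-driver limits from Regulation (EC) No 561/2006 (simplified).
MAX_BLOCK_H = 4.5       # driving allowed before a break is due (Art. 7)
BREAK_H = 0.75          # minimum break after each block (Art. 7)
MAX_DAILY_DRIVE_H = 9   # standard daily driving limit (Art. 6)
DAILY_REST_H = 11       # regular daily rest period (Art. 8)


def elapsed_with_driver_hours(driving_h: float) -> float:
    """Convert pure driving hours into elapsed hours including mandatory
    breaks and daily rests, for a single driver starting with fresh hours."""
    if driving_h < 0:
        raise ValueError("driving time must be non-negative")
    elapsed = 0.0
    remaining = driving_h
    while remaining > 0:
        today = min(remaining, MAX_DAILY_DRIVE_H)
        # A break is needed between consecutive 4.5 h blocks within the day.
        breaks = math.ceil(today / MAX_BLOCK_H) - 1
        elapsed += today + breaks * BREAK_H
        remaining -= today
        if remaining > 0:
            elapsed += DAILY_REST_H  # rest before the next driving day
    return elapsed
```

Under these simplified rules, `elapsed_with_driver_hours(13.0)` returns 24.75. That is 9 hours of driving plus one 45-minute break on day one, an 11-hour rest, then the remaining 4 hours.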

Brenner corridor congestion patterns are well documented in the traffic modelling literature. The A22 Autostrada del Brennero between Verona and the Brenner Pass sees predictable summer peaks, and historical data from ASFINAG and Italian Autostrade monitoring infrastructure is publicly accessible and well integrated into commercial routing services. A Hamburg→Milan estimate based on driving time alone is not hard to get within 60–90 minutes of actual for well-behaved transits.

The corollary: if your ETA prediction is based primarily on driving time with some traffic adjustment, and it looks accurate in aggregate, that accuracy is coming from the easy cases. The hard cases, the ones that drive customer escalations, are diluted to near-invisibility in the average.

Where predictions actually fail: dwell time at fixed points

In our shipment data analysis across the Hamburg→Prague (via Dresden/Bad Brambach), Rotterdam→Warsaw (via Frankfurt an der Oder/Słubice), Antwerp→Milan (via Gotthard or Brenner), and Hamburg→Bucharest (via Vienna/Budapest) corridors, we categorised the source of ETA errors greater than 3 hours. The breakdown was consistent across corridors:

  • Customs dwell at border crossings: 52–61% of large ETA errors (depending on corridor)
  • Ferry/port terminal waiting time: 8–14% (most relevant on routes via Rostock or for Baltic routes)
  • Unplanned vehicle breakdown or driver hours exception: 11–17%
  • Warehouse/consignee dwell at destination (delayed unloading): 9–14%
  • Traffic congestion on motorway network: 7–12%
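The categorisation step can be sketched as follows. The sketch assumes each shipment record already carries a signed final ETA error and a single dominant cause label. It does not show how causes were attributed in our analysis. The record fields and cause labels are hypothetical, and the toy records are not our corridor data.

```python
from collections import Counter

LARGE_ERROR_MIN = 180  # "ETA errors greater than 3 hours", in minutes


def large_error_shares(shipments):
    """Percentage of large ETA errors attributed to each delay cause.

    Each shipment is a dict with hypothetical keys:
      'eta_error_min': signed ETA error in minutes (actual - predicted)
      'cause': the dominant delay source assigned during review
    """
    large = [s for s in shipments if abs(s["eta_error_min"]) > LARGE_ERROR_MIN]
    if not large:
        return {}
    counts = Counter(s["cause"] for s in large)
    # most_common() orders the result from largest to smallest share.
    return {cause: 100.0 * n / len(large) for cause, n in counts.most_common()}


sample = [
    {"eta_error_min": 240, "cause": "customs_dwell"},
    {"eta_error_min": 420, "cause": "customs_dwell"},
    {"eta_error_min": 300, "cause": "traffic"},
    {"eta_error_min": 90, "cause": "traffic"},  # under 3 h: excluded
]
print(large_error_shares(sample))
```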

The dominant source of large ETA errors is customs dwell. And customs dwell is precisely the category that most ETA models handle worst, for the reason discussed in our previous post on border delay detection: the inputs that would allow accurate dwell time estimation (current queue length at the customs post, declaration-error hold rates on the day, staffing levels at the post) are not consistently available in real time through carrier APIs.

Historical average dwell times by border crossing are available from aggregate datasets and can serve as a baseline. The Brenner Pass in summer historically averages 2.5–3.5 hours southbound on Fridays. The Bad Brambach crossing averages 45–90 minutes on normal weekdays. Using these averages improves ETA accuracy on average, but average performance at border crossings masks high variance — a crossing that averages 90 minutes might range from 20 minutes to 7 hours depending on day, time, and whether there is any inspection activity. An ETA model using the average dwell figure as a point estimate produces predictions that are accurate on the average day and badly wrong on the bad day.
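A small sample shows the gap between a point estimate and the distribution. This sketch summarises historical dwell at one crossing with the mean and empirical deciles, using Python's standard `statistics` module. The ten dwell values are hypothetical, chosen to span the 20-minute to 7-hour range described above.

```python
import statistics


def dwell_summary(samples_min):
    """Summarise historical dwell samples (minutes) for one crossing.

    Returns the mean alongside empirical P10/P50/P90 so that a downstream
    model can carry an interval rather than a single point estimate.
    """
    deciles = statistics.quantiles(samples_min, n=10, method="inclusive")
    return {
        "mean": statistics.fmean(samples_min),
        "p10": deciles[0],
        "p50": statistics.median(samples_min),
        "p90": deciles[8],
    }


# Hypothetical sample: mostly routine crossings plus one 7-hour inspection day.
history = [20, 45, 50, 55, 60, 60, 65, 70, 80, 420]
print(dwell_summary(history))
```

On this sample, the mean is 92.5 minutes. That is above the median of 60 on a typical day, yet far below the 420-minute bad day. The P10–P90 band (42.5 to 114 minutes) is what a model should carry forward, and even that understates the worst tail.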

The Gotthard scenario: why fixed-point variance dominates

Consider a concrete example. Alpenweg Logistics AG, a Swiss-based freight forwarder, routes 40-foot reefer loads from Antwerp to Milan on a regular weekly schedule via the Gotthard road tunnel. The transit includes the Swiss customs crossing at Basel/Weil am Rhein and the Italian customs entry at Chiasso.

On a normal Tuesday, the Basel crossing takes 35–50 minutes. Chiasso, with goods pre-cleared under Swiss customs procedures, takes another 25–40 minutes. The ETA deviation due to fixed-point dwell on a normal transit: 60–90 minutes total, manageable within a standard delivery window.

On a Friday before a Swiss public holiday, the Basel approach backs up. Trucks queuing on the A2 approach can wait 4–5 hours before reaching the customs hall. This is not random variance: it is a predictable event with leading indicators (holiday calendar, day of week, time of day, and historical queue length data from ASTRA, the Swiss Federal Roads Office). An ETA model that incorporates these leading indicators can widen the confidence interval appropriately and alert earlier. A model using only average dwell produces an ETA that is off by 3–4 hours with no warning.
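One way to act on those leading indicators is to condition the dwell distribution on them. The sketch below buckets historical dwell by weekday, hour band, and holiday-eve flag. For the matching bucket, it returns an empirical P10–P90 interval, falling back to pooled history when the bucket is thin. It raises an alert when the P90 exceeds the planned dwell by more than a tolerance. `DwellKey`, the bucket definitions, the thresholds, and the synthetic history are all hypothetical. A production model would also take queue-length data, such as ASTRA's, as an input.

```python
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class DwellKey:
    """Hypothetical leading-indicator bucket for one border crossing."""
    weekday: int        # 0 = Monday ... 6 = Sunday
    hour_band: str      # e.g. "06-12", "12-18"
    holiday_eve: bool   # True on the day before a public holiday


def dwell_interval(history, key, min_samples=10):
    """Return (p10, p90) dwell in minutes for the bucket matching `key`.

    `history` maps DwellKey -> list of observed dwell minutes. When the
    matching bucket is too thin, fall back to all observations pooled.
    """
    samples = history.get(key, [])
    if len(samples) < min_samples:
        samples = [m for obs in history.values() for m in obs]
    cuts = statistics.quantiles(samples, n=10, method="inclusive")
    return cuts[0], cuts[8]


def should_alert(p90_min, planned_dwell_min, tolerance_min=60):
    """Alert early when the bad-day tail exceeds the plan by more than the slack."""
    return p90_min - planned_dwell_min > tolerance_min


# Synthetic history: a normal Tuesday morning vs a Friday afternoon before a holiday.
tuesday = DwellKey(weekday=1, hour_band="06-12", holiday_eve=False)
holiday_friday = DwellKey(weekday=4, hour_band="12-18", holiday_eve=True)
history = {
    tuesday: list(range(35, 51)),             # 35-50 minutes
    holiday_friday: list(range(240, 301, 4)), # 4-5 hour queues
}
for key in (tuesday, holiday_friday):
    p10, p90 = dwell_interval(history, key)
    print(key, (p10, p90), "alert" if should_alert(p90, planned_dwell_min=45) else "ok")
```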

What better ETA prediction looks like in the EU cross-border context

Accurate ETA prediction for EU cross-border freight is not impossible: it is a solvable problem, provided uncertainty is represented honestly. The architectural requirements are specific.

First: the model must treat fixed-point dwell as a distinct prediction sub-problem with its own inputs, not as a component of "road time" or "transit time." Driving time and customs dwell have fundamentally different variance structures and different available predictors. Merging them into a single estimate loses predictive information.

Second: the confidence interval matters as much as the point estimate. A prediction of "arrival at 16:00 ± 4 hours" is more operationally useful than "arrival at 16:00" when crossing a high-variance border checkpoint. Coordinators who understand the confidence interval can make better decisions about downstream scheduling than coordinators who receive a point estimate that is wrong 30% of the time.
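Confidence intervals only help if they are calibrated, so they deserve the same scrutiny as point accuracy. The minimal sketch below assumes each prediction is stored as a lower and upper arrival bound. It measures coverage (how often the actual arrival fell inside the interval) and mean width. The arrival times are hypothetical.

```python
def interval_coverage(intervals, actuals):
    """Share of shipments whose actual arrival fell inside its predicted interval.

    intervals: (lower, upper) arrival bounds, e.g. minutes after midnight.
    actuals: observed arrivals in the same units. A calibrated 80% interval
    should cover roughly 80% of arrivals over a large sample.
    """
    if not intervals or len(intervals) != len(actuals):
        raise ValueError("need matching, non-empty inputs")
    hits = sum(lo <= a <= hi for (lo, hi), a in zip(intervals, actuals))
    return hits / len(intervals)


def mean_width(intervals):
    """Average interval width: narrower is better, but only at the target coverage."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


# "Arrival at 16:00 ± 4 hours" vs "± 30 minutes", as minutes after midnight.
wide = [(12 * 60, 20 * 60)] * 3
narrow = [(16 * 60 - 30, 16 * 60 + 30)] * 3
actual = [15 * 60 + 40, 16 * 60 + 10, 19 * 60 + 5]  # hypothetical arrivals
print(interval_coverage(wide, actual), mean_width(wide))
print(interval_coverage(narrow, actual), mean_width(narrow))
```

Here the ±4-hour interval covers all three arrivals at a width of 480 minutes. The ±30-minute interval is eight times narrower but misses the late arrival, covering two of three. Over a real sample, the target is the narrowest interval that still achieves its stated coverage.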

Third: the ETA should update on customs event signals, not only on position signals. When an NCTS CC007C arrival message fires (the vehicle has formally presented to customs), the ETA model should shift from historical average dwell time to real-time dwell tracking. When the CC025C release message fires, the model should recalculate departure time and recompute the destination ETA accordingly. This requires treating customs feed events as first-class inputs to the ETA model rather than supplementary information appended after the estimate is already generated.
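A minimal sketch of that event handling follows, assuming customs messages arrive as a message type plus timestamp. The mapping of CC007C to "presented" and CC025C to "released" follows the description above. The class and field names are hypothetical. While the vehicle is held, the sketch expects release at the later of the historical dwell or now. That rule is a deliberately naive placeholder for a proper conditional dwell model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BorderEtaState:
    """Hypothetical per-shipment ETA state for a single customs crossing."""
    historical_dwell: timedelta              # baseline dwell used before presentation
    onward_drive: timedelta                  # predicted driving time, border to destination
    presented_at: Optional[datetime] = None  # set when CC007C is received
    released_at: Optional[datetime] = None   # set when CC025C is received

    def on_customs_event(self, message_type: str, timestamp: datetime) -> None:
        """Record a customs message; other message types are ignored here."""
        if message_type == "CC007C":    # vehicle formally presented to customs
            self.presented_at = timestamp
        elif message_type == "CC025C":  # goods released
            self.released_at = timestamp

    def destination_eta(self, now: datetime, border_eta: datetime) -> datetime:
        if self.released_at is not None:
            # Dwell is now known: project forward from the actual release.
            return self.released_at + self.onward_drive
        if self.presented_at is not None:
            # Real-time dwell tracking: never assume release earlier than now.
            elapsed = now - self.presented_at
            expected_release = self.presented_at + max(self.historical_dwell, elapsed)
            return expected_release + self.onward_drive
        # Not yet presented: position-based border ETA plus historical dwell.
        return border_eta + self.historical_dwell + self.onward_drive


t0 = datetime(2024, 1, 9, 10, 0)
state = BorderEtaState(historical_dwell=timedelta(minutes=45), onward_drive=timedelta(hours=3))
print(state.destination_eta(now=t0, border_eta=t0))
state.on_customs_event("CC007C", t0)
print(state.destination_eta(now=t0 + timedelta(hours=2), border_eta=t0))
state.on_customs_event("CC025C", t0 + timedelta(hours=2, minutes=30))
print(state.destination_eta(now=t0 + timedelta(hours=3), border_eta=t0))
```

In the example, the destination ETA holds at 13:45 for the first 45 minutes after presentation. It slides to 15:00 once the vehicle has been held two hours, and settles at 15:30 when release at 12:30 is confirmed.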

The freight visibility platforms that produce genuinely useful ETA predictions for EU cross-border operations are the ones that treat the problem as three distinct sub-problems: driving time, border dwell, and destination dwell. Each has separate inputs and a separate uncertainty model, and they are combined into a composite estimate with explicit confidence bounds. The platforms that report impressively low average MAPE scores are often the ones that have optimised for the easy lanes and the on-time shipments. That is not the same thing.
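Three separate sub-models can be combined into one estimate with explicit bounds, and a Monte Carlo sketch illustrates how: sample each component, sum the samples, and read quantiles off the total. Monte Carlo simulation is one possible combination technique, not necessarily what any particular platform uses. The component distributions and their parameters are hypothetical. The components are also assumed independent, which real lanes may violate: a late border release can push destination arrival into a busier unloading slot, for example.

```python
import random
import statistics


def composite_interval(drive, border, destination, n=10_000, seed=0):
    """Sample total minutes from three separately modelled components
    and return (p10, p50, p90) of the total.

    Each component is a function taking a random.Random and returning minutes.
    Treating the components as independent is a simplifying assumption.
    """
    rng = random.Random(seed)
    totals = [drive(rng) + border(rng) + destination(rng) for _ in range(n)]
    cuts = statistics.quantiles(totals, n=10)
    return cuts[0], statistics.median(totals), cuts[8]


# Hypothetical component models, chosen only to illustrate the shape:
def drive(rng):
    return rng.gauss(14 * 60, 45)          # driving: roughly symmetric, low variance


def border(rng):
    return rng.lognormvariate(4.0, 0.9)    # customs dwell: right-skewed, long tail


def destination(rng):
    return rng.uniform(30, 120)            # unloading wait at the consignee


p10, p50, p90 = composite_interval(drive, border, destination)
print(f"P10 {p10:.0f} min, P50 {p50:.0f} min, P90 {p90:.0f} min")
```

Because each component keeps its own distribution, the border-dwell model can be replaced without touching the driving-time model. One replacement option is a model conditioned on holiday calendars.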