A Suezmax tanker, laden, burning VLSFO at 14 knots. Fujairah to Djibouti, 2,222 nautical miles through the Gulf of Aden with Bab el-Mandeb disrupted and 369 active NAVWARNs scrolling across the ticker. The deterministic model says: about 6.6 days.
That number is wrong. Not because the math is bad — the speed model is fine. It's wrong because it pretends the weather at each waypoint is known exactly. It's wrong because a single forecast collapses a range of possible sea states into one value. It's wrong because an underwriter who writes a policy against a point estimate is underpricing the tail.
So we don't give you one number. We give you 500.
The problem with deterministic ETAs
Every voyage ETA calculator works the same way. Take the route distance. Divide by service speed. Adjust for sea state using some Beaufort-to-speed-loss table. Output a number with false precision: "159.4 hours."
The issue isn't the adjustment — it's that the weather input is a single forecast. Open-Meteo's marine API (what we use) gives you wave height, wind speed, and swell height at a given lat/lon for a given time. Good data. But it's one scenario out of many possible ones.
How wrong can a single forecast be? Here's the empirical uncertainty by forecast lead time:
| Lead Time | Relative Error | Wave Height Sigma (1.5m base) | Wind Speed Sigma (15kt base) |
|---|---|---|---|
| 0h (nowcast) | 15% | 0.30m | 3.0 kt |
| 24h | 20% | 0.30m | 3.0 kt |
| 48h | 25% | 0.38m | 3.2 kt |
| 72h | 30% | 0.45m | 3.8 kt |
| 5d | 40% | 0.60m | 5.1 kt |
| 7d+ | 50% | 0.75m | 6.4 kt |
For a 2,222nm voyage at 14 knots, segment 1 departs with a 15% uncertainty window. By segment 5 or 6 — days into the future — that window has opened to 40% or more. The tail segments of a voyage are fundamentally uncertain. A deterministic model ignores this. Monte Carlo doesn't.
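The lead-time column maps to a relative-error factor with a simple linear ramp, capped at 7 days. A minimal Python sketch that reproduces the table's Relative Error column (the function name is ours; the formula matches the perturbation step described later):

```python
def uncertainty_scale(lead_time_days: float) -> float:
    """Relative forecast error: 15% nowcast, +5% per day, capped at 7 days."""
    return 0.15 + 0.05 * min(lead_time_days, 7)

# Reproduces the table's Relative Error column: 15%, 20%, 25%, 30%, 40%, 50%
for d in (0, 1, 2, 3, 5, 7):
    print(f"{d}d: {uncertainty_scale(d):.0%}")
```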
What Monte Carlo actually does
Monte Carlo simulation is a brute-force way to answer the question: "If I sailed this voyage 500 times with slightly different weather each time, what would the distribution of arrival times look like?"
Here's the algorithm, stripped to its core:
for each simulation (1 to 500):
1. Generate a correlated random perturbation for each route segment
2. Apply the perturbation to the base weather forecast
3. Compute effective speed per segment using the perturbed weather
4. Sum all segment transit times → total voyage hours
Sort all 500 total-hours values.
Extract percentiles: P10, P25, P50, P75, P90.
The P50 is your median: half of the simulated voyages finish faster, half slower. P10 is the lucky scenario (calm weather, favorable conditions). P90 is the bad case — rough seas across multiple segments. The gap between P10 and P90 is the honest answer about how uncertain this voyage really is.
Percentile estimates stabilize quickly. At 100 simulations, the P50 is stable to within ~1 hour for a typical 6-day voyage. At 500, the P90 — which is the trickier tail to pin down — stabilizes to within ~30 minutes. Going to 10,000 would give marginal improvement but take 20x longer. 500 is the pragmatic sweet spot: runs in under 2 seconds, percentiles you can trust.
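Extracting percentiles from the sorted sample is the easy part. A minimal sketch using nearest-rank indexing (one of several defensible conventions; production code may interpolate instead):

```python
def percentiles(samples: list[float], ps=(10, 25, 50, 75, 90)) -> dict[int, float]:
    """Nearest-rank percentiles from a list of simulated voyage hours."""
    s = sorted(samples)
    n = len(s)
    return {p: s[min(n - 1, int(p / 100 * n))] for p in ps}

hours = [150 + i * 0.5 for i in range(500)]  # stand-in for 500 simulated totals
print(percentiles(hours)[50])  # → 275.0
```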
Step 1: Correlated perturbations via AR(1)
The naive approach is to generate independent random weather at each segment. Roll the dice six times, get six random sea states. This is physically wrong.
Weather systems span hundreds of miles. If segment 3 hits a storm, segment 4 almost certainly hits the same storm. Treating them as independent events would make the distribution unrealistically narrow — bad weather in one segment would be "cancelled out" by good weather in the next, averaging away the extremes.
We use an AR(1) (first-order autoregressive) process with a correlation coefficient of 0.85:
perturbation[0] = randn()                     // standard normal
perturbation[i] = 0.85 * perturbation[i-1]
                + sqrt(1 - 0.85²) * randn()   // innovation term
The 0.85 means: if segment 3 draws a +2 sigma perturbation (bad weather), segment 4 will center around +1.7 sigma. Still bad, but decaying. By segment 7 or 8, the correlation has washed out and conditions are effectively independent of that original storm.
This matches physical reality. A 500nm weather system affects 2-3 consecutive route segments, then its influence fades. The sqrt(1 - rho²) innovation term (about 0.527 when rho=0.85) injects fresh randomness at each step, preventing the series from being completely deterministic.
Step 2: Perturbing the forecast
Each route segment has a base weather forecast from Open-Meteo: wave height, wind speed, swell height. The perturbation (a z-score from the AR(1) process) gets applied to each:
// Lead-time-dependent uncertainty scaling
uncertaintyScale = 0.15 + 0.05 * min(leadTimeDays, 7)
// Per-variable sigma with minimum floors
waveSigma = max(0.3m, baseWaveHeight * uncertaintyScale)
windSigma = max(3 kt, baseWindSpeed * uncertaintyScale * 0.85)
swellSigma = max(0.2m, baseSwellHeight * (uncertaintyScale + 0.05))
// Apply the z-score
perturbedWave = max(0, baseWave + zScore * waveSigma)
perturbedWind = max(0, baseWind + zScore * windSigma)
perturbedSwell = max(0, baseSwell + zScore * swellSigma)
Two details matter here.
Lead-time scaling. The first segment's weather is a nowcast — 15% uncertainty. Segment 5, three days from departure, gets 30%. This widening cone of uncertainty is the single biggest reason the P10-P90 spread exists.
Minimum sigma floors. The formula baseWaveHeight * uncertaintyScale would give zero sigma in calm seas (wave height = 0). But "calm right now" doesn't mean "guaranteed calm" — light conditions can shift to moderate ones faster than heavy weather dissipates. The floors (0.3m wave, 3kt wind, 0.2m swell) prevent unrealistically tight distributions in fair weather.
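Putting the lead-time scaling, the floors, and the shared z-score together — a direct transcription of the pseudocode above into Python (function and variable names are ours):

```python
def perturb_weather(base_wave: float, base_wind: float, base_swell: float,
                    lead_time_days: float, z: float) -> tuple[float, float, float]:
    """Apply one AR(1) z-score to all three variables, with sigma floors."""
    scale = 0.15 + 0.05 * min(lead_time_days, 7)
    wave_sigma = max(0.3, base_wave * scale)           # metres
    wind_sigma = max(3.0, base_wind * scale * 0.85)    # knots
    swell_sigma = max(0.2, base_swell * (scale + 0.05))
    return (max(0.0, base_wave + z * wave_sigma),
            max(0.0, base_wind + z * wind_sigma),
            max(0.0, base_swell + z * swell_sigma))

# A +1 sigma draw on a 1.5 m / 15 kt / 1.0 m forecast at 72 h lead time
print(perturb_weather(1.5, 15.0, 1.0, 3.0, 1.0))
```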
Open-Meteo gives solid deterministic forecasts but does not provide ensemble spread data. Our uncertainty model is synthetic — we estimate how much a forecast might be wrong based on historical verification statistics, not from running 50 atmospheric model members. This matters most in the P90 tail: our synthetic uncertainty may underestimate the true probability of extreme weather events. We flag this with a disclaimer in every stochastic result. ECMWF ensemble integration is planned — when it arrives, the P90 values will get more honest.
Step 3: Beaufort speed curves
This is where the perturbed weather becomes a transit time. For each segment, we:
- Convert wave height, wind speed, and swell into an effective Beaufort number
- Look up the speed reduction factor for the vessel type and load condition
- Apply it to the service speed
The speed curves come from IMO-documented voluntary speed reduction data (Kwon 2008, Lu et al. 2015). They're not linear. A laden tanker at Beaufort 4 barely notices — maybe 3% speed loss. At Beaufort 6, it's 12-15%. At Beaufort 8, it's 30%+. The relationship is roughly cubic: small increases in sea state cause disproportionate speed loss.
Laden tanker effective speed at 14kt service speed across Beaufort conditions
This non-linearity is why Monte Carlo matters. If you average the weather and compute one ETA, you get the wrong answer. A 50/50 chance of Beaufort 4 or Beaufort 8 doesn't average to Beaufort 6 performance — the Beaufort 8 half costs disproportionately more time. Jensen's inequality at sea.
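The effect can be seen with made-up but shape-plausible numbers (the retention factors below are illustrative, not the Kwon/Lu curves; the distance is a hypothetical single segment):

```python
# Illustrative speed-retention factors for a laden tanker (not the real curve)
retention = {4: 0.97, 6: 0.86, 8: 0.68}
service_kt, distance_nm = 14.0, 370.0  # one hypothetical segment

def hours(beaufort: int) -> float:
    return distance_nm / (service_kt * retention[beaufort])

avg_of_times = 0.5 * hours(4) + 0.5 * hours(8)  # right: average the outcomes
time_of_avg = hours(6)                          # wrong: average the weather first
print(round(avg_of_times, 1), round(time_of_avg, 1))
```

The 50/50 mix of Beaufort 4 and 8 comes out slower than the Beaufort 6 "average weather" run: with a convex speed-loss curve, averaging before the transit-time calculation systematically understates the expected duration.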
Step 4: Summing segment times
Each simulation produces a transit time for each segment. Sum them. That's one simulated voyage duration.
Do it 500 times. Sort the results. Read off the percentiles.
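The whole pipeline fits in one short function. This sketch wires the AR(1) draw, the perturbation, a speed model, and the percentile extraction together — with simplifying assumptions: wave height stands in for the full wave/wind/swell triple, and the speed-loss curve is a toy cubic, not the production Kwon/Lu tables:

```python
import math
import random

def simulate_voyage(segments, n_sims: int = 500, rho: float = 0.85, seed: int = 7):
    """segments: list of (distance_nm, base_wave_m, lead_time_days) tuples."""
    rng = random.Random(seed)
    service_kt = 14.0
    innovation = math.sqrt(1 - rho ** 2)
    totals = []
    for _ in range(n_sims):
        z, total_h = rng.gauss(0, 1), 0.0
        for dist_nm, wave_m, lead_d in segments:
            sigma = max(0.3, wave_m * (0.15 + 0.05 * min(lead_d, 7)))
            wave_p = max(0.0, wave_m + z * sigma)       # perturbed forecast
            loss = min(0.6, 0.004 * wave_p ** 3)        # toy cubic speed loss
            total_h += dist_nm / (service_kt * (1 - loss))
            z = rho * z + innovation * rng.gauss(0, 1)  # AR(1) step
        totals.append(total_h)
    totals.sort()
    n = len(totals)
    return {p: totals[min(n - 1, int(p / 100 * n))] for p in (10, 50, 90)}

# Six roughly equal segments approximating a 2,222 nm route
pct = simulate_voyage([(370.0, 1.5, i * 0.46) for i in range(6)])
print({p: round(h, 1) for p, h in pct.items()})
```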
For the Fujairah→Djibouti route in the screenshot — 2,222nm, tanker, laden, 14kt — the AI briefing generated from the stochastic model reports a 10% probability that the voyage takes 157 hours or more. That P90 didn't come from someone's guess. It came from 500 simulations, each one drawing weather from a correlated perturbation field, running it through Beaufort speed curves, and summing segment times.
What the output looks like
The stochastic model returns five percentiles plus summary statistics:
| Percentile | Meaning | What it tells an underwriter |
|---|---|---|
| P10 | 10% of simulations arrived faster | Best-case scenario. Don't plan around this. |
| P25 | 25th percentile | Optimistic but plausible. |
| P50 | Median outcome | Your best single estimate. Use this for scheduling. |
| P75 | 75th percentile | Build contingency around this for fuel planning. |
| P90 | Only 10% of simulations were worse | Your stress test. If the policy still works at P90, it's sound. |
The P10-P90 range is the information product. A tight range (say, P10=155h, P90=165h) means the voyage is weather-predictable — probably a short hop in a stable ocean region. A wide range (P10=130h, P90=200h) means significant uncertainty, usually because the route crosses storm-prone regions or extends past the reliable forecast horizon.
The distribution is the honest answer. A single ETA is just the middle of a range you're choosing not to show.
Per-segment uncertainty breakdown
The model doesn't just give you a voyage total — it breaks uncertainty down by segment. This tells you where the risk lives. For a Gulf of Aden transit, the uncertainty is often concentrated in the open-water segments approaching the Bab el-Mandeb, where monsoon conditions drive significant wave variability, while the final approach to Djibouti (shorter segment, calmer coastal waters) has a much tighter distribution.
Each segment reports its own P10-P90 transit time range, mean effective speed, and mean weather impact factor. If one segment is dominating the voyage uncertainty, you know exactly where.
Why this matters for underwriting
Marine underwriters price delay risk. A cargo policy with a specific arrival window has a quantifiable exposure once you know the P90, not just the 6.6-day point estimate. But the current state of the art at most desks is: "takes about a week, maybe a bit more if weather's bad." That's not risk assessment. That's hope.
The Monte Carlo distribution gives you:
- A defensible P90 for stress testing — "there's a 10% chance this voyage takes more than X hours" is a statement you can use in a peer review
- Fuel consumption ranges — the 366.6 MT fuel estimate for our Fujairah→Djibouti tanker is the deterministic value; the P90 fuel burn is higher because the vessel spends more time at sea
- CII compliance margins — a CII Rating of C under deterministic conditions might slip to D at P90 weather if the vessel burns more fuel over more hours
- A basis for comparing routes — Route A might have a lower P50 but a wider P10-P90 spread than Route B; which one you prefer depends on your risk appetite
The stochastic model incorporates seasonal climatological baselines. We maintain multiplier tables for 11 ocean regions across all 12 months — data from IMO Circular MSC.1/Circ.1228, NOAA hurricane tracks, JMA typhoon climatology, and Indian Met Dept monsoon dates. The Red Sea/Gulf of Aden region has a multiplier of 1.4 in July-August (SW monsoon peak) and 0.8 in April-May (calm inter-monsoon). These multipliers inflate or deflate the perturbation sigmas, so the Monte Carlo distribution properly reflects seasonal risk — even when the 72h forecast for next Tuesday happens to look calm.
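Mechanically, the seasonal adjustment is just a multiplier on the sigmas before sampling. A sketch using only the two values named in the text (the region key is ours; the production table covers 11 regions × 12 months):

```python
# Illustrative excerpt of the seasonal multiplier table, keyed by (region, month)
SEASONAL = {
    ("red_sea_gulf_of_aden", 7): 1.4,  # SW monsoon peak (Jul)
    ("red_sea_gulf_of_aden", 8): 1.4,  # SW monsoon peak (Aug)
    ("red_sea_gulf_of_aden", 4): 0.8,  # calm inter-monsoon (Apr)
    ("red_sea_gulf_of_aden", 5): 0.8,  # calm inter-monsoon (May)
}

def seasonal_sigma(base_sigma: float, region: str, month: int) -> float:
    """Inflate or deflate a perturbation sigma by the regional climatology."""
    return base_sigma * SEASONAL.get((region, month), 1.0)

print(round(seasonal_sigma(0.45, "red_sea_gulf_of_aden", 8), 2))  # → 0.63
```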
The code, briefly
The core simulation is surprisingly compact. The AR(1) perturbation generator is 8 lines. The weather perturbation function is 10. The simulation loop itself is a for-loop that calls the speed model 500 × numSegments times, collects an array of total hours, sorts it, and reads percentiles.
The real engineering is in the details that make it trustworthy:
- Box-Muller transform for generating standard normal variates from uniform random numbers
- Minimum sigma floors that prevent zero-uncertainty in calm conditions
- Lead-time-dependent scaling that widens the cone as segments project further into the future
- Single z-score per segment applied to all three weather variables (wave, wind, swell) — because high waves and high winds co-occur; treating them independently would allow simulations with 4m waves and 5kt wind, which isn't physical
- Results cached in Postgres with a 2-hour TTL — stochastic runs are expensive, weather doesn't change that fast
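The Box-Muller transform from the first bullet, as a standalone sketch (the clamp on u1 is our addition, guarding against log(0)):

```python
import math
import random

def box_muller(rng: random.Random) -> float:
    """Standard normal variate from two uniform draws."""
    u1 = max(rng.random(), 1e-12)  # avoid log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
```

Many runtimes ship a Gaussian sampler, but an explicit transform keeps the simulation dependency-free in environments that only provide a uniform generator.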
There's no machine learning here. No neural networks. Just a well-structured sampling process on top of a physics-based speed model. That's a feature, not a limitation — the result is interpretable, auditable, and doesn't drift with retraining.
The Stochastic tab in the Voyage Scorer is available for any scored route. Pick your origin, destination, vessel type, load condition, and speed. Hit "Score Route." The deterministic result appears immediately; the stochastic distribution follows a second or two later, with per-segment breakdowns and percentile ranges you can actually use in a peer review.
If you're pricing marine risk and your current tool gives you a single ETA number with no uncertainty band — I'm curious how you think about the tail. What margin do you add? Do you have a rule of thumb, or is it gut feel?
Try the Stochastic Model
Score any port-to-port voyage with 500 Monte Carlo simulations. Fujairah to Djibouti. Singapore to Hamburg. Pick a route.
Open the Platform