A Suezmax tanker, laden, burning VLSFO at 14 knots. Fujairah to Djibouti, 2,222 nautical miles through the Gulf of Aden with Bab el-Mandeb disrupted and 369 active NAVWARNs scrolling across the ticker. The deterministic model says: about 6.6 days.
That number is wrong. Not because the math is bad — the speed model is fine. It's wrong because it pretends the weather at each waypoint is known exactly. It's wrong because a single forecast collapses a range of possible sea states into one value. It's wrong because an underwriter who writes a policy against a point estimate is underpricing the tail.
So we don't give you one number. We give you 500.
The problem with deterministic ETAs
Every voyage ETA calculator works the same way. Take the route distance. Divide by service speed. Adjust for sea state using some Beaufort-to-speed-loss table. Output a number with false precision: "159.4 hours."
The issue isn't the adjustment — it's that the weather input is a single forecast. Open-Meteo's marine API (what we use) gives you wave height, wind speed, and swell height at a given lat/lon for a given time. Good data. But it's one scenario out of many possible ones.
How wrong can a single forecast be? Here's the empirical uncertainty by forecast lead time:
| Lead Time | Relative Error | Wave Height Sigma (1.5m base) | Wind Speed Sigma (15kt base) |
|---|---|---|---|
| 0h (nowcast) | 15% | 0.30m | 3.0 kt |
| 24h | 20% | 0.30m | 3.0 kt |
| 48h | 25% | 0.38m | 3.2 kt |
| 72h | 30% | 0.45m | 3.8 kt |
| 5d | 40% | 0.60m | 5.1 kt |
| 7d+ | 50% | 0.75m | 6.4 kt |
For a 2,222nm voyage at 14 knots, segment 1 departs with a 15% uncertainty window. By segment 5 or 6 — days into the future — that window has opened to 40% or more. The tail segments of a voyage are fundamentally uncertain. A deterministic model ignores this. Monte Carlo doesn't.
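The lead-time column maps to a relative-error factor with a simple linear ramp, capped at 7 days. A minimal Python sketch that reproduces the table's Relative Error column (the function name is ours; the formula matches the perturbation step described later):

```python
def uncertainty_scale(lead_time_days: float) -> float:
    """Relative forecast error: 15% nowcast, +5% per day, capped at 7 days."""
    return 0.15 + 0.05 * min(lead_time_days, 7)

# Reproduces the table's Relative Error column: 15%, 20%, 25%, 30%, 40%, 50%
for d in (0, 1, 2, 3, 5, 7):
    print(f"{d}d: {uncertainty_scale(d):.0%}")
```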
What Monte Carlo actually does
Monte Carlo simulation is a brute-force way to answer the question: "If I sailed this voyage 500 times with slightly different weather each time, what would the distribution of arrival times look like?"
Here's the algorithm, stripped to its core:
for each simulation (1 to 500):
1. Generate a correlated random perturbation for each route segment
2. Apply the perturbation to the base weather forecast
3. Compute effective speed per segment using the perturbed weather
4. Sum all segment transit times → total voyage hours
Sort all 500 total-hours values.
Extract percentiles: P10, P25, P50, P75, P90.
The P50 is your median: half of the simulated voyages finish faster, half slower. P10 is the lucky scenario (calm weather, favorable conditions). P90 is the bad case — rough seas across multiple segments. The gap between P10 and P90 is the honest answer about how uncertain this voyage really is.
Percentile estimates stabilize quickly. At 100 simulations, the P50 is stable to within ~1 hour for a typical 6-day voyage. At 500, the P90 — which is the trickier tail to pin down — stabilizes to within ~30 minutes. Going to 10,000 would give marginal improvement but take 20x longer. 500 is the pragmatic sweet spot: runs in under 2 seconds, percentiles you can trust.
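Extracting percentiles from the sorted sample is the easy part. A minimal sketch using nearest-rank indexing (one of several defensible conventions; production code may interpolate instead):

```python
def percentiles(samples: list[float], ps=(10, 25, 50, 75, 90)) -> dict[int, float]:
    """Nearest-rank percentiles from a list of simulated voyage hours."""
    s = sorted(samples)
    n = len(s)
    return {p: s[min(n - 1, int(p / 100 * n))] for p in ps}

hours = [150 + i * 0.5 for i in range(500)]  # stand-in for 500 simulated totals
print(percentiles(hours)[50])  # → 275.0
```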
Step 1: Correlated perturbations via AR(1)
The naive approach is to generate independent random weather at each segment. Roll the dice six times, get six random sea states. This is physically wrong.
Weather systems span hundreds of miles. If segment 3 hits a storm, segment 4 almost certainly hits the same storm. Treating them as independent events would make the distribution unrealistically narrow — bad weather in one segment would be "cancelled out" by good weather in the next, averaging away the extremes.
We use an AR(1) (first-order autoregressive) process with a correlation coefficient of 0.85:
perturbation[0] = randn()                     // standard normal
perturbation[i] = 0.85 * perturbation[i-1]
                + sqrt(1 - 0.85²) * randn()   // innovation term
The 0.85 means: if segment 3 draws a +2 sigma perturbation (bad weather), segment 4 will center around +1.7 sigma. Still bad, but decaying. By segment 7 or 8, the correlation has washed out and conditions are effectively independent of that original storm.
This matches physical reality. A 500nm weather system affects 2-3 consecutive route segments, then its influence fades. The sqrt(1 - rho²) innovation term (about 0.527 when rho=0.85) injects fresh randomness at each step, preventing the series from being completely deterministic.
Step 2: Perturbing the forecast
Each route segment has a base weather forecast from Open-Meteo: wave height, wind speed, swell height. The perturbation (a z-score from the AR(1) process) gets applied to each:
// Lead-time-dependent uncertainty scaling
uncertaintyScale = 0.15 + 0.05 * min(leadTimeDays, 7)
// Per-variable sigma with minimum floors
waveSigma = max(0.3m, baseWaveHeight * uncertaintyScale)
windSigma = max(3 kt, baseWindSpeed * uncertaintyScale * 0.85)
swellSigma = max(0.2m, baseSwellHeight * (uncertaintyScale + 0.05))
// Apply the z-score
perturbedWave = max(0, baseWave + zScore * waveSigma)
perturbedWind = max(0, baseWind + zScore * windSigma)
perturbedSwell = max(0, baseSwell + zScore * swellSigma)
Two details matter here.
Lead-time scaling. The first segment's weather is a nowcast — 15% uncertainty. Segment 5, three days from departure, gets 30%. This widening cone of uncertainty is the single biggest reason the P10-P90 spread exists.
Minimum sigma floors. The formula baseWaveHeight * uncertaintyScale would give zero sigma in calm seas (wave height = 0). But "calm right now" doesn't mean "guaranteed calm" — light conditions can shift to moderate ones faster than heavy weather dissipates. The floors (0.3m wave, 3kt wind, 0.2m swell) prevent unrealistically tight distributions in fair weather.
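Putting the lead-time scaling, the floors, and the shared z-score together — a direct transcription of the pseudocode above into Python (function and variable names are ours):

```python
def perturb_weather(base_wave: float, base_wind: float, base_swell: float,
                    lead_time_days: float, z: float) -> tuple[float, float, float]:
    """Apply one AR(1) z-score to all three variables, with sigma floors."""
    scale = 0.15 + 0.05 * min(lead_time_days, 7)
    wave_sigma = max(0.3, base_wave * scale)           # metres
    wind_sigma = max(3.0, base_wind * scale * 0.85)    # knots
    swell_sigma = max(0.2, base_swell * (scale + 0.05))
    return (max(0.0, base_wave + z * wave_sigma),
            max(0.0, base_wind + z * wind_sigma),
            max(0.0, base_swell + z * swell_sigma))

# A +1 sigma draw on a 1.5 m / 15 kt / 1.0 m forecast at 72 h lead time
print(perturb_weather(1.5, 15.0, 1.0, 3.0, 1.0))
```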
Open-Meteo gives solid deterministic forecasts but does not provide ensemble spread data. Our uncertainty model is synthetic — we estimate how much a forecast might be wrong based on historical verification statistics, not from running 50 atmospheric model members. This matters most in the P90 tail: our synthetic uncertainty may underestimate the true probability of extreme weather events. We flag this with a disclaimer in every stochastic result. ECMWF ensemble integration is planned — when it arrives, the P90 values will get more honest.
Step 3: Beaufort speed curves
This is where the perturbed weather becomes a transit time. For each segment, we:
- Convert wave height, wind speed, and swell into an effective Beaufort number
- Look up the speed reduction factor for the vessel type and load condition
- Apply it to the service speed
The speed curves come from IMO-documented voluntary speed reduction data (Kwon 2008, Lu et al. 2015). They're not linear. A laden tanker at Beaufort 4 barely notices — maybe 3% speed loss. At Beaufort 6, it's 12-15%. At Beaufort 8, it's 30%+. The relationship is roughly cubic: small increases in sea state cause disproportionate speed loss.
Laden tanker effective speed at 14kt service speed across Beaufort conditions
This non-linearity is why Monte Carlo matters. If you average the weather and compute one ETA, you get the wrong answer. A 50/50 chance of Beaufort 4 or Beaufort 8 doesn't average to Beaufort 6 performance — the Beaufort 8 half costs disproportionately more time. Jensen's inequality at sea.
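The effect can be seen with made-up but shape-plausible numbers (the retention factors below are illustrative, not the Kwon/Lu curves; the distance is a hypothetical single segment):

```python
# Illustrative speed-retention factors for a laden tanker (not the real curve)
retention = {4: 0.97, 6: 0.86, 8: 0.68}
service_kt, distance_nm = 14.0, 370.0  # one hypothetical segment

def hours(beaufort: int) -> float:
    return distance_nm / (service_kt * retention[beaufort])

avg_of_times = 0.5 * hours(4) + 0.5 * hours(8)  # right: average the outcomes
time_of_avg = hours(6)                          # wrong: average the weather first
print(round(avg_of_times, 1), round(time_of_avg, 1))
```

The 50/50 mix of Beaufort 4 and 8 comes out slower than the Beaufort 6 "average weather" run: with a convex speed-loss curve, averaging before the transit-time calculation systematically understates the expected duration.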
Step 4: Summing segment times
Each simulation produces a transit time for each segment. Sum them. That's one simulated voyage duration.
Do it 500 times. Sort the results. Read off the percentiles.
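The whole pipeline fits in one short function. This sketch wires the AR(1) draw, the perturbation, a speed model, and the percentile extraction together — with simplifying assumptions: wave height stands in for the full wave/wind/swell triple, and the speed-loss curve is a toy cubic, not the production Kwon/Lu tables:

```python
import math
import random

def simulate_voyage(segments, n_sims: int = 500, rho: float = 0.85, seed: int = 7):
    """segments: list of (distance_nm, base_wave_m, lead_time_days) tuples."""
    rng = random.Random(seed)
    service_kt = 14.0
    innovation = math.sqrt(1 - rho ** 2)
    totals = []
    for _ in range(n_sims):
        z, total_h = rng.gauss(0, 1), 0.0
        for dist_nm, wave_m, lead_d in segments:
            sigma = max(0.3, wave_m * (0.15 + 0.05 * min(lead_d, 7)))
            wave_p = max(0.0, wave_m + z * sigma)       # perturbed forecast
            loss = min(0.6, 0.004 * wave_p ** 3)        # toy cubic speed loss
            total_h += dist_nm / (service_kt * (1 - loss))
            z = rho * z + innovation * rng.gauss(0, 1)  # AR(1) step
        totals.append(total_h)
    totals.sort()
    n = len(totals)
    return {p: totals[min(n - 1, int(p / 100 * n))] for p in (10, 50, 90)}

# Six roughly equal segments approximating a 2,222 nm route
pct = simulate_voyage([(370.0, 1.5, i * 0.46) for i in range(6)])
print({p: round(h, 1) for p, h in pct.items()})
```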
For the Fujairah→Djibouti route in the screenshot — 2,222nm, tanker, laden, 14kt — the AI briefing generated from the stochastic model reports a 10% probability that the voyage takes 157 hours or more. That P90 didn't come from someone's guess. It came from 500 simulations, each one drawing weather from a correlated perturbation field, running it through Beaufort speed curves, and summing segment times.
What the output looks like
The stochastic model returns five percentiles plus summary statistics:
| Percentile | Meaning | What it tells an underwriter |
|---|---|---|
| P10 | 10% of simulations arrived faster | Best-case scenario. Don't plan around this. |
| P25 | 25th percentile | Optimistic but plausible. |
| P50 | Median outcome | Your best single estimate. Use this for scheduling. |
| P75 | 75th percentile | Build contingency around this for fuel planning. |
| P90 | Only 10% of simulations were worse | Your stress test. If the policy still works at P90, it's sound. |
The P10-P90 range is the information product. A tight range (say, P10=155h, P90=165h) means the voyage is weather-predictable — probably a short hop in a stable ocean region. A wide range (P10=130h, P90=200h) means significant uncertainty, usually because the route crosses storm-prone regions or extends past the reliable forecast horizon.
The distribution is the honest answer. A single ETA is just the middle of a range you're choosing not to show.
Per-segment uncertainty breakdown
The model doesn't just give you a voyage total — it breaks uncertainty down by segment. This tells you where the risk lives. For a Gulf of Aden transit, the uncertainty is often concentrated in the open-water segments approaching the Bab el-Mandeb, where monsoon conditions drive significant wave variability, while the final approach to Djibouti (shorter segment, calmer coastal waters) has a much tighter distribution.
Each segment reports its own P10-P90 transit time range, mean effective speed, and mean weather impact factor. If one segment is dominating the voyage uncertainty, you know exactly where.
Why this matters for underwriting
Marine underwriters price delay risk. A cargo policy with a specific arrival window has a quantifiable exposure once you know the P90, not just the 6.6-day point estimate. But the current state of the art at most desks is: "takes about a week, maybe a bit more if weather's bad." That's not risk assessment. That's hope.
The Monte Carlo distribution gives you:
- A defensible P90 for stress testing — "there's a 10% chance this voyage takes more than X hours" is a statement you can use in a peer review
- Fuel consumption ranges — the 366.6 MT fuel estimate for our Fujairah→Djibouti tanker is the deterministic value; the P90 fuel burn is higher because the vessel spends more time at sea
- CII compliance margins — a CII Rating of C under deterministic conditions might slip to D at P90 weather if the vessel burns more fuel over more hours
- A basis for comparing routes — Route A might have a lower P50 but a wider P10-P90 spread than Route B; which one you prefer depends on your risk appetite
The stochastic model incorporates seasonal climatological baselines. We maintain multiplier tables for 11 ocean regions across all 12 months — data from IMO Circular MSC.1/Circ.1228, NOAA hurricane tracks, JMA typhoon climatology, and Indian Met Dept monsoon dates. The Red Sea/Gulf of Aden region has a multiplier of 1.4 in July-August (SW monsoon peak) and 0.8 in April-May (calm inter-monsoon). These multipliers inflate or deflate the perturbation sigmas, so the Monte Carlo distribution properly reflects seasonal risk — even when the 72h forecast for next Tuesday happens to look calm.
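Mechanically, the seasonal adjustment is just a multiplier on the sigmas before sampling. A sketch using only the two values named in the text (the region key is ours; the production table covers 11 regions × 12 months):

```python
# Illustrative excerpt of the seasonal multiplier table, keyed by (region, month)
SEASONAL = {
    ("red_sea_gulf_of_aden", 7): 1.4,  # SW monsoon peak (Jul)
    ("red_sea_gulf_of_aden", 8): 1.4,  # SW monsoon peak (Aug)
    ("red_sea_gulf_of_aden", 4): 0.8,  # calm inter-monsoon (Apr)
    ("red_sea_gulf_of_aden", 5): 0.8,  # calm inter-monsoon (May)
}

def seasonal_sigma(base_sigma: float, region: str, month: int) -> float:
    """Inflate or deflate a perturbation sigma by the regional climatology."""
    return base_sigma * SEASONAL.get((region, month), 1.0)

print(round(seasonal_sigma(0.45, "red_sea_gulf_of_aden", 8), 2))  # → 0.63
```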
The code, briefly
The core simulation is surprisingly compact. The AR(1) perturbation generator is 8 lines. The weather perturbation function is 10. The simulation loop itself is a for-loop that calls the speed model 500 × numSegments times, collects an array of total hours, sorts it, and reads percentiles.
The real engineering is in the details that make it trustworthy:
- Box-Muller transform for generating standard normal variates from uniform random numbers
- Minimum sigma floors that prevent zero-uncertainty in calm conditions
- Lead-time-dependent scaling that widens the cone as segments project further into the future
- Single z-score per segment applied to all three weather variables (wave, wind, swell) — because high waves and high winds co-occur; treating them independently would allow simulations with 4m waves and 5kt wind, which isn't physical
- Results cached in Postgres with a 2-hour TTL — stochastic runs are expensive, weather doesn't change that fast
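The Box-Muller transform from the first bullet, as a standalone sketch (the clamp on u1 is our addition, guarding against log(0)):

```python
import math
import random

def box_muller(rng: random.Random) -> float:
    """Standard normal variate from two uniform draws."""
    u1 = max(rng.random(), 1e-12)  # avoid log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)
```

Many runtimes ship a Gaussian sampler, but an explicit transform keeps the simulation dependency-free in environments that only provide a uniform generator.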
There's no machine learning here. No neural networks. Just a well-structured sampling process on top of a physics-based speed model. That's a feature, not a limitation — the result is interpretable, auditable, and doesn't drift with retraining.
The Stochastic tab in the Voyage Scorer is available for any scored route. Pick your origin, destination, vessel type, load condition, and speed. Hit "Score Route." The deterministic result appears immediately; the stochastic distribution follows a second or two later, with per-segment breakdowns and percentile ranges you can actually use in a peer review.
If you're pricing marine risk and your current tool gives you a single ETA number with no uncertainty band — I'm curious how you think about the tail. What margin do you add? Do you have a rule of thumb, or is it gut feel?
Try the Stochastic Model
Score any port-to-port voyage with 500 Monte Carlo simulations. Fujairah to Djibouti. Singapore to Hamburg. Pick a route.
Open the Platform