The theory behind DeepMind's weather AI models
A message-passing layer updates each node's embedding from its neighbors:

x_i^(k+1) = γ( x_i^(k), ⊕_{j ∈ N(i)} φ(x_i^(k), x_j^(k), e_ji) )

where:

x_i^(k) = embedding of node i at layer k
N(i) = neighbors of node i
φ = message function (typically MLP)
⊕ = permutation-invariant aggregation (sum, mean, max)
γ = update function (typically MLP)
e_ji = edge features from j to i

Graph nodes have no natural ordering. The same graph with nodes labeled [A, B, C] or [C, A, B] should produce the same output.
Sum/mean aggregation guarantees this:
sum([mA, mB, mC]) = sum([mC, mA, mB])
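To make the invariance concrete, here is a minimal sketch of one message-passing layer with sum aggregation. It is purely illustrative, not DeepMind's code: the tiny fixed-weight MLPs, the embedding size, and the toy three-node graph are all assumptions. Feeding the edges in a different order leaves the output unchanged.

```python
# Minimal message-passing layer with sum aggregation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim, hidden=16):
    """Tiny fixed-weight MLP: x -> W2 @ relu(W1 @ x + b1) + b2."""
    W1, b1 = rng.normal(size=(hidden, in_dim)), np.zeros(hidden)
    W2, b2 = rng.normal(size=(out_dim, hidden)), np.zeros(out_dim)
    return lambda x: W2 @ np.maximum(W1 @ x + b1, 0) + b2

D = 8                                # node/edge embedding size (assumed)
phi = mlp(3 * D, D)                  # message function φ(x_i, x_j, e_ji)
gamma = mlp(2 * D, D)                # update function γ(x_i, aggregated messages)

def mp_layer(x, edges, edge_feat):
    """x: node -> embedding; edges: list of (j, i) pairs; edge_feat: (j, i) -> features."""
    out = {}
    for i in x:
        msgs = [phi(np.concatenate([x[i], x[j], edge_feat[(j, dst)]]))
                for (j, dst) in edges if dst == i]            # messages from N(i)
        agg = np.sum(msgs, axis=0) if msgs else np.zeros(D)   # ⊕ = sum
        out[i] = gamma(np.concatenate([x[i], agg]))
    return out

# Same graph, edges listed in two different orders -> identical outputs.
x = {n: rng.normal(size=D) for n in "ABC"}
edges = [("A", "B"), ("C", "B"), ("B", "A")]
e = {edge: rng.normal(size=D) for edge in edges}

out1 = mp_layer(x, edges, e)
out2 = mp_layer(x, list(reversed(edges)), e)   # messages summed in a different order
print(np.allclose(out1["B"], out2["B"]))       # True: sum aggregation is order-invariant
```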
ᾱ_t = cumulative noise schedule (decreases with t)
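As a hedged illustration of what a cumulative schedule ᾱ_t looks like, here is a standard DDPM-style construction with a linear β schedule; the specific schedule, the number of steps T, and the field shape are assumptions for demonstration, not GenCast's actual choices.

```python
# DDPM-style cumulative noise schedule (illustrative assumptions, not GenCast's).
import numpy as np

T = 1000
beta = np.linspace(1e-4, 0.02, T)        # per-step noise variances β_t (assumed linear)
alpha_bar = np.cumprod(1.0 - beta)       # ᾱ_t = Π_{s≤t} (1 - β_s), decreases with t

def noise(x0, t, rng=np.random.default_rng(0)):
    """Forward process: x_t = sqrt(ᾱ_t) * x_0 + sqrt(1 - ᾱ_t) * ε."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.zeros((181, 360))                # e.g. one gridded field (toy shape)
print(alpha_bar[0], alpha_bar[-1])       # ~0.9999 ... ~4e-5: signal fades, noise dominates
```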
The weather has structure! Not all 87M outputs are independent.
The 32D manifold captures the modes of variation in plausible weather states.
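To see why a low-dimensional latent is enough, here is a toy sketch, purely illustrative: a random, untrained linear decoder stands in for the learned network, and the sizes are made up. Because every sample is a function of the same 32 numbers, the ensemble's variation has rank at most 32 no matter how many grid points you decode.

```python
# Toy illustration: a shared 32-D latent confines variation to a 32-D manifold.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_grid = 32, 10_000          # 32-D noise, 10k "grid points" (toy sizes)

decoder = rng.normal(size=(n_grid, latent_dim)) / np.sqrt(latent_dim)  # stand-in decoder

# 200 samples, each generated from its own 32-D latent draw.
samples = np.stack([decoder @ rng.normal(size=latent_dim) for _ in range(200)])

# Despite living in 10,000 dimensions, the ensemble has rank at most 32.
rank = np.linalg.matrix_rank(samples - samples.mean(axis=0))
print(rank)                              # 32: all variation lies on a 32-D manifold
```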
FGN is trained only on per-location CRPS (marginals). Yet it learns realistic joint distributions!
Why? The shared global noise z means the only way to minimize CRPS everywhere simultaneously is to output spatially coherent fields.
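For concreteness, here is a hedged sketch of a per-location ensemble CRPS loss, CRPS(y) ≈ E|X − y| − ½·E|X − X′|; the fair K(K−1) normalisation of the spread term and the toy shapes are assumptions, not FGN's exact implementation. Each ensemble member would come from its own draw of the shared noise z, and the training loss is just the average of the per-location CRPS values.

```python
# Per-location ensemble CRPS estimator (sketch; fair normalisation assumed).
import numpy as np

def crps_ensemble(forecasts, obs):
    """forecasts: (K, ...) ensemble members; obs: (...) verifying field.
    Returns per-location CRPS using the fair (unbiased) spread term."""
    K = forecasts.shape[0]
    skill = np.abs(forecasts - obs).mean(axis=0)              # E|X - y|
    spread = np.abs(forecasts[:, None] - forecasts[None, :])  # |X_k - X_k'|
    spread = spread.sum(axis=(0, 1)) / (2 * K * (K - 1))      # ½·E|X - X'|, fair estimator
    return skill - spread

rng = np.random.default_rng(0)
K, grid = 8, (181, 360)                  # toy ensemble size and grid
ens = rng.normal(size=(K, *grid))        # K samples, each from its own noise draw z
y = rng.normal(size=grid)                # verifying analysis (toy data)
loss = crps_ensemble(ens, y).mean()      # average over locations -> scalar training loss
print(loss)
```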