Building neural networks from first principles using geometry and symmetry
The core insight of Geometric Deep Learning is that successful neural network architectures can be derived from first principles using three foundational concepts:

- The domain: the geometric structure underlying the data (grids, graphs, manifolds)
- The symmetry group: the transformations that should not change the output (translations, rotations, permutations)
- The signal space: the space of functions/features defined on the domain

Together, these provide a unified framework for understanding CNNs, GNNs, and Transformers as special cases of the same geometric blueprint.
Neural network layers should be designed to be equivariant to the symmetry group: a layer f is equivariant if f(g · x) = g · f(x) for every transformation g in the group, so that transforming the input transforms the output in the corresponding way (if the output is unchanged entirely, f(g · x) = f(x), the layer is invariant).
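To make the condition concrete, here is a minimal sketch (the helper conv1d_circular and the toy signal are illustrative assumptions) checking numerically that a circular 1-D convolution commutes with translation, which is exactly the equivariance that makes convolutions the natural layer for grid data:

```python
# Minimal sketch: translation equivariance of a circular 1-D convolution.
import numpy as np

def conv1d_circular(x, w):
    """Circular cross-correlation of signal x with kernel w (illustrative helper)."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k)) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.normal(size=16)                  # a signal living on a 1-D grid
w = rng.normal(size=3)                   # stand-in for a learned filter

shift = lambda v, s: np.roll(v, s)       # the group action: translation by s steps
lhs = conv1d_circular(shift(x, 5), w)    # transform the input, then apply the layer
rhs = shift(conv1d_circular(x, w), 5)    # apply the layer, then transform the output
assert np.allclose(lhs, rhs)             # equivariance: both orders give the same result
```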
Geometric Deep Learning organizes data domains into five categories (the "5Gs"), each with a distinct symmetry group and corresponding architectures:
| Domain | Symmetry Group | Architecture | Applications |
|---|---|---|---|
| Grids | Translation | CNNs | Images, video |
| Groups | Group elements | Group-equivariant CNNs | Rotational data |
| Graphs | Permutation | GNNs, Message Passing | Molecules, social networks |
| Geodesics | Isometries | Geometric CNNs | 3D shapes, meshes |
| Gauges | Gauge transformations | Gauge-equivariant networks | Particle physics, manifolds |
The general message-passing update builds each node's new representation from its own state and an aggregate of messages from its neighbors:

h'_u = φ(h_u, ⊕_{v ∈ N(u)} ψ(h_u, h_v))

- ψ = message function (computes messages between node pairs)
- ⊕ = aggregation function (sum, mean, max, or attention)
- φ = update function (MLP that combines a node's state with the aggregated messages)

GNN flavors differ in how neighbor contributions are weighted:

- Convolutional: fixed, pre-computed weights determined by the graph topology. Best for homophilous graphs (similar nodes connect). Most scalable via sparse matrix multiplication.
- Attentional: learned, feature-dependent attention weights. Can handle heterophilous graphs. Examples: GAT, Transformers.
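As a rough sketch of how ψ, ⊕, and φ compose, the NumPy function below implements one round of message passing with sum aggregation; the names message_passing_layer, W_msg, and W_upd are illustrative assumptions rather than any particular library's API:

```python
# Illustrative message-passing layer (not a library API): ⊕ = sum aggregation.
import numpy as np

def message_passing_layer(H, edges, W_msg, W_upd):
    """H: (n, d) node features; edges: directed (src, dst) pairs, messages flow src -> dst;
    W_msg: (2d, d) weights for ψ; W_upd: (2d, d) weights for φ."""
    n, d = H.shape
    agg = np.zeros((n, d))
    for src, dst in edges:
        msg = np.tanh(np.concatenate([H[dst], H[src]]) @ W_msg)  # ψ(h_u, h_v)
        agg[dst] += msg                                          # ⊕: sum at the receiving node
    return np.tanh(np.concatenate([H, agg], axis=1) @ W_upd)     # φ(h_u, aggregated messages)

# Toy usage on a 3-node path graph with edges in both directions
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
W_msg, W_upd = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
H_new = message_passing_layer(H, edges, W_msg, W_upd)            # new features, shape (3, 4)
```

The choice of the aggregation operator ⊕ comes with trade-offs: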
| Operation | Trade-offs |
|---|---|
| Sum | Default choice, preserves multiset info. Sensitive to outliers. |
| Mean | Normalized view, variable neighborhoods. Loses count info. |
| Max | Highlights salient features. Loses multiset info. |
| Attention | Learns importance dynamically. More parameters, slower. |
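A quick illustration of these trade-offs (the toy neighborhoods below are made-up values): the fixed aggregators are all permutation-invariant, but only sum distinguishes a neighborhood from a copy with doubled multiplicity, while mean and max collapse the two:

```python
# Illustrative comparison of aggregation operators on toy neighborhoods.
import numpy as np

def aggregate(neighbors, op):
    return {"sum": np.sum, "mean": np.mean, "max": np.max}[op](neighbors)

a = np.array([1.0, 2.0])               # a node with two neighbors
b = np.array([1.0, 1.0, 2.0, 2.0])     # same values, doubled multiplicity

for op in ("sum", "mean", "max"):
    print(op, aggregate(a, op), aggregate(b, op))
# sum:  3.0 vs 6.0  -> keeps multiset (count) information
# mean: 1.5 vs 1.5  -> loses the neighbor count
# max:  2.0 vs 2.0  -> keeps only the most salient value

# All three are invariant to permuting the neighbors:
perm = np.array([2.0, 1.0])
assert all(aggregate(a, op) == aggregate(perm, op) for op in ("sum", "mean", "max"))
```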
Equivariant architectures pay off in several ways:

- The network doesn't need to learn the same function separately for every transformed version of an input
- Built-in invariances prevent overfitting to spurious correlations
- The architecture respects the known symmetries of the problem domain
| Problem | Solutions |
|---|---|
| Over-smoothing | Skip connections, DropEdge, normalization |
| Over-squashing | Graph rewiring, virtual nodes |
| Limited expressivity | Higher-order WL tests, subgraph methods |
| Long-range dependencies | Graph Transformers, virtual edges |
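To illustrate one of these mitigations, the sketch below wraps a simple mean-aggregation update in a skip connection, a common way to counter over-smoothing; the function gnn_layer_with_skip and the toy graph are illustrative assumptions, not a specific published architecture:

```python
# Illustrative sketch: a skip connection around a mean-aggregation GNN layer.
import numpy as np

def gnn_layer_with_skip(H, A, W):
    """H: (n, d) node features, A: (n, n) adjacency matrix, W: (d, d) weights."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)   # node degrees (avoid division by zero)
    neighbor_mean = (A @ H) / deg                      # mean over each node's neighbors
    update = np.tanh(neighbor_mean @ W)                # plain GNN update
    return H + update                                  # skip connection keeps each node's own state

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)                 # 3-node path graph
H = rng.normal(size=(3, 4))
W = 0.1 * rng.normal(size=(4, 4))
for _ in range(8):                                     # stacking layers: the residual term
    H = gnn_layer_with_skip(H, A, W)                   # preserves per-node information
```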
| Application | Key Architectural Features |
|---|---|
| Molecular property prediction | Invariant to atom permutation, rotation-equivariant for 3D |
| Protein structure (AlphaFold) | SE(3)-equivariant attention, multi-scale |
| Drug discovery | Message passing on molecular graphs |
| Traffic prediction | Spatio-temporal GNNs |
| Weather forecasting | Icosahedral mesh GNNs (GraphCast), diffusion models (GenCast) |
| Physics simulation | Equivariant to physical symmetries |
| Library | Notes |
|---|---|
| PyTorch Geometric (PyG) | Most comprehensive, production-ready |
| DGL | Framework-agnostic |
| Jraph | JAX-based, good for research |
| e3nn | For E(3)-equivariant networks |
"The most successful deep learning architectures (CNNs, GNNs, Transformers) are all special cases of the same geometric blueprint."