The Math Breakthrough

Renormalized Diffusion on Graphs

Old Cleora computed all walks exactly. New Cleora computes them — and then refuses to let them collapse the representation.

The Core Idea in One Sentence

New Cleora turns plain graph diffusion into a renormalized dynamical system. After each sparse propagation step, it re-centers and whitens the entire node-embedding cloud, counteracting the spectral collapse that usually makes deep message passing unusable.

Classical message passing mixes information. That is its strength, and also its failure mode. Repeated averaging pushes node representations toward a small number of dominant graph modes, causing oversmoothing, loss of rank, and eventual collapse toward trivial directions.

New Cleora introduces a mathematically decisive correction: interleaved whitening. Instead of merely propagating and row-normalizing, it periodically re-centers the node cloud and equalizes variance across all feature directions. The effect is striking: much deeper propagation becomes not only possible but genuinely informative, without backpropagation, trainable parameters, or GPU hardware.

Propagation explores the graph. Whitening preserves the subspace.

Not more parameters. Better geometry.

Figure: Singular value spectrum comparison. Classical diffusion compresses the representation into a few dominant directions. Interleaved whitening repeatedly restores isotropy, keeping a larger informative subspace alive across iterations.

The Original Cleora Operator

Let $G = (V, E)$ be a graph with $n = |V|$ nodes. Let $d$ be the embedding dimension, $X_t \in \mathbb{R}^{n \times d}$ the node embedding matrix at iteration $t$, and $M$ a graph propagation operator — typically either:

Left Markov

$$M = D^{-1}A$$

Row-stochastic. Each row sums to 1. Equivalent to a random walk transition matrix.

Symmetric Markov

$$M = D^{-1/2} A\, D^{-1/2}$$

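Symmetric, so its eigenvalues are real and its eigenvectors can be chosen orthonormal. It is similar to $D^{-1}A$ and shares its spectrum, although its rows do not generally sum to 1. Its leading eigenvectors form the graph's spectral embedding.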
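As a minimal sketch, both operators can be built from an adjacency matrix as follows. The dense NumPy arrays, the helper name `markov_operators`, and the four-node toy graph are illustrative assumptions; a production system would use sparse storage.

```python
import numpy as np

def markov_operators(A):
    """Return (D^{-1} A, D^{-1/2} A D^{-1/2}) for a dense adjacency matrix A."""
    deg = A.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("isolated nodes have no well-defined normalization")
    left = A / deg[:, None]                           # D^{-1} A, row-stochastic
    inv_sqrt = 1.0 / np.sqrt(deg)
    sym = inv_sqrt[:, None] * A * inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    return left, sym

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3 on node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M_left, M_sym = markov_operators(A)

print(M_left.sum(axis=1))   # each row of the left Markov operator sums to 1
```

The two matrices are related by a similarity transform, so they have the same eigenvalues even though only `M_sym` is symmetric.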

The original Cleora iteration is:

$$X_{t+1} = \mathcal{N}_{\text{row}}(M\, X_t)$$

where $\mathcal{N}_{\text{row}}$ performs row-wise $\ell_2$ normalization.

This is already powerful. A single sparse matrix multiplication aggregates all weighted one-step neighbor information simultaneously; repeating it accumulates multi-hop structure without random-walk sampling or gradient-based training. It is deterministic, parameter-free, and scales with sparse linear algebra rather than stochastic training.
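The iteration above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's implementation: the graph is a dense toy example, and a seeded random matrix stands in for Cleora's deterministic initialization, which is not reproduced here.

```python
import numpy as np

def row_normalize(X, eps=1e-12):
    """Scale each row to unit l2 norm (eps guards against all-zero rows)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)

def cleora_step(M, X):
    """One step of X_{t+1} = N_row(M X_t)."""
    return row_normalize(M @ X)

# Hypothetical toy graph: a triangle with one pendant node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
M = A / A.sum(axis=1, keepdims=True)          # left Markov operator D^{-1} A

rng = np.random.default_rng(0)                # placeholder initialization
X = row_normalize(rng.uniform(-1.0, 1.0, size=(4, 3)))
for _ in range(3):
    X = cleora_step(M, X)
```

No gradients or sampling appear anywhere: each step is one matrix product followed by a per-row rescaling.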

But mathematically, the original iteration inherits the generic failure mode of repeated diffusion.

Why Classical Diffusion Collapses

If $M$ is the row-stochastic operator of a connected, non-bipartite graph (so the random walk is irreducible and aperiodic), then:

$$M^t \;\longrightarrow\; \mathbf{1}\,\pi^\top$$

where $\pi$ is the stationary distribution. Therefore:

$$M^t X_0 \;\longrightarrow\; \mathbf{1}\,(\pi^\top X_0)$$

This is a rank-one object: every row converges to the same vector $\pi^\top X_0$, an average of the initial embeddings weighted by the stationary distribution. In graph ML language, this is the familiar oversmoothing phenomenon. The more one propagates, the more one loses node-level discrimination.
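The limit can be checked numerically on a small example. The sketch below assumes an undirected toy graph (a triangle with a pendant node, so the walk is aperiodic), for which the stationary distribution is proportional to node degree.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
M = A / deg[:, None]                      # D^{-1} A
pi = deg / deg.sum()                      # stationary distribution: pi^T M = pi^T

Mt = np.linalg.matrix_power(M, 200)       # M^t for a large t
limit = np.outer(np.ones(len(pi)), pi)    # the rank-one matrix 1 pi^T

rng = np.random.default_rng(0)
X0 = rng.standard_normal((4, 3))
diffused = Mt @ X0                        # every row approaches pi^T X0

print(np.abs(Mt - limit).max())
print(np.linalg.matrix_rank(diffused, tol=1e-8))
```

After enough steps every row of `diffused` carries the same vector, which is exactly the loss of node-level discrimination described above.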

Row-wise $\ell_2$ normalization helps stabilize magnitudes and cosine geometry, but it does not fully solve the spectral problem. It rescales each row independently; it does not restore lost rank, reopen collapsed singular directions, or decorrelate features globally.

The classical tradeoff: more iterations bring more global structural context — but also more collapse toward the dominant graph modes. This is not a software bug. It is a mathematical inevitability of the diffusion operator itself.
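One way to watch this tradeoff is to track an effective-rank score of $M^t X_0$ as $t$ grows. The definition used here (the exponential of the entropy of the normalized singular values) is an assumption chosen for illustration; the text does not specify one. The circulant toy graph and the helper names are likewise hypothetical.

```python
import numpy as np

def effective_rank(X):
    """exp(entropy) of the normalized singular values; ranges from 1 to min(n, d)."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                           # skip exact zeros to avoid log(0)
    return float(np.exp(-(p * np.log(p)).sum()))

def circulant_adjacency(n, offsets):
    """Undirected graph on a ring: node i is linked to i +/- s for each offset s."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    for s in offsets:
        A[idx, (idx + s) % n] = 1.0
        A[idx, (idx - s) % n] = 1.0
    return A

A = circulant_adjacency(20, offsets=(1, 2, 5))   # connected and non-bipartite
M = A / A.sum(axis=1, keepdims=True)
X0 = np.random.default_rng(0).standard_normal((20, 4))

ranks = {t: effective_rank(np.linalg.matrix_power(M, t) @ X0) for t in (0, 1, 5, 100)}
for t, r in ranks.items():
    print(t, round(r, 3))
```

The score starts near the embedding dimension and falls toward 1 as plain diffusion concentrates the representation on a single direction.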

New Cleora attacks this tradeoff at the source.

Figure: Effective rank vs iteration. In ordinary propagation, usable dimensionality collapses rapidly with depth. With interleaved whitening, the representation stays high-rank far longer, turning iteration count from a fragile hyperparameter into a meaningful scale parameter.

The New Recurrence: Propagation + Whitening

The new operator inserts a full-covariance whitening step between propagation stages. The recurrence becomes:

$$X_{t+1} = \mathcal{W}\!\left(\mathcal{N}_{\text{row}}\!\left(\big((1-\rho)\,M + \rho\, I\big)\, X_t\right)\right)$$

where $\rho \in [0,1)$ is an optional residual weight, $\mathcal{N}_{\text{row}}$ keeps each node vector on the unit sphere, and $\mathcal{W}$ is the whitening operator.

Sparse Markov Propagation

Multiply by the graph operator. Aggregate all walks of the current length in parallel.

Exact graph walk aggregation

↓

Row $\ell_2$ Normalization

Project each node vector back onto the unit sphere $S^{d-1}$. Preserve cosine geometry.

Local angular stabilization

↓

Full-Covariance Whitening

Center the node cloud. Compute the $d \times d$ covariance. Apply inverse square root. Return to isotropic position.

Global isotropy restoration

↺ repeat

The whitening operator, precisely

For a matrix $Y \in \mathbb{R}^{n \times d}$, define its centered version and feature covariance:

$$\mu = \frac{1}{n}\,Y^\top \mathbf{1} \in \mathbb{R}^{d}, \qquad \widetilde{Y} = Y - \mathbf{1}\mu^\top, \qquad \Sigma = \frac{1}{n-1}\,\widetilde{Y}^\top \widetilde{Y}$$

If $\Sigma = V \Lambda V^\top$ is the eigendecomposition of the feature covariance, then PCA whitening is:

$$\mathcal{W}(Y) = \widetilde{Y}\, V\, \Lambda^{-1/2}$$

This makes the output centered and, whenever $\Sigma$ has full rank, isotropic up to numerical error:

$$\frac{1}{n-1}\,\mathcal{W}(Y)^\top\, \mathcal{W}(Y) \;\approx\; I$$
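A minimal NumPy sketch of this operator follows. The eigenvalue floor `eps` is an added numerical safeguard that does not appear in the formula above, and the function name `whiten` is illustrative.

```python
import numpy as np

def whiten(Y, eps=1e-12):
    """PCA whitening: center, eigendecompose the d x d covariance, rescale."""
    n = Y.shape[0]
    mu = Y.mean(axis=0)                     # feature-wise mean over nodes
    Yc = Y - mu                             # centered cloud
    cov = (Yc.T @ Yc) / (n - 1)             # d x d feature covariance
    evals, evecs = np.linalg.eigh(cov)      # cov = V diag(evals) V^T
    evals = np.maximum(evals, eps)          # assumption: floor tiny eigenvalues
    return (Yc @ evecs) / np.sqrt(evals)    # Yc V Lambda^{-1/2}

# Correlated, off-center test cloud.
rng = np.random.default_rng(0)
Y = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4)) + 3.0
Z = whiten(Y)

print(np.round(Z.T @ Z / (len(Z) - 1), 6))  # close to the identity matrix
```

Only a $d \times d$ matrix is ever decomposed, regardless of how many rows `Y` has.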

This is not cosmetic normalization. It is a global geometric intervention that couples every feature dimension to every other feature dimension.

The One-Equation Punchline

Suppose the centered matrix has singular value decomposition:

$$\widetilde{Y} = U\, S\, V^\top$$

where $U \in \mathbb{R}^{n \times d}$ has orthonormal columns, $S$ is diagonal with singular values, and $V$ is orthogonal. Then:

$$\Sigma = \frac{1}{n-1}\, V S^2 V^\top$$

Therefore PCA whitening gives:

$$\mathcal{W}(Y) = U\, S\, V^\top \cdot V \left(\frac{S^2}{n-1}\right)^{-1/2} = \sqrt{n-1}\;\, U$$

The whitening step discards the singular values and keeps the orthonormal basis $U$ of the centered cloud's column space (assuming all singular values are nonzero). Diffusion tries to collapse the spectrum; whitening throws that collapse away and retains the geometry of the informative subspace itself.
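The identity is easy to confirm numerically. The sketch below uses a random anisotropic cloud as a stand-in for a diffused embedding. It also whitens through an eigendecomposition, where eigenvector signs and ordering may differ from the SVD, and checks that the spanned subspace is still the same.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
Y = rng.standard_normal((n, d)) * np.array([5.0, 2.0, 1.0, 0.5, 0.1])
Yc = Y - Y.mean(axis=0)

# Thin SVD of the centered cloud: Yc = U diag(S) Vt.
U, S, Vt = np.linalg.svd(Yc, full_matrices=False)

# Whitening with V from the SVD and Lambda = S^2 / (n - 1).
W = (Yc @ Vt.T) / (S / np.sqrt(n - 1))

# The same operator through the covariance eigendecomposition.
evals, evecs = np.linalg.eigh(Yc.T @ Yc / (n - 1))
W_eigh = (Yc @ evecs) / np.sqrt(evals)

print(np.linalg.svd(W, compute_uv=False))   # all singular values equal sqrt(n - 1)
```

Both routes flatten a spectrum that originally spanned a factor of about 50 between its largest and smallest directions.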

This identity is mathematically beautiful and conceptually decisive. A plain diffusion step says: keep multiplying by $M$, let dominant modes win, let weak modes die. A whitened diffusion step says:

Propagate through the graph
Remove the common-mode drift
Equalize the surviving directions
Continue

So the new Cleora does not merely smooth more. It repeatedly asks: "After this diffusion step, which directions of variation across nodes still matter?" — and then rescales those directions so they remain usable in the next step.

Why This Is More Than “Just Another Normalization”

It is tempting to think of whitening as a cosmetic post-processing step. It is not. The distinction is fundamental.

Row Normalization (Local)

Treats each node independently:

$$X_{i,:} \leftarrow \frac{X_{i,:}}{\|X_{i,:}\|_2}$$

Preserves angular geometry at the node level. Does not couple feature dimensions. If two columns become nearly redundant, row normalization will not restore diversity.

Whitening (Global, Second-Order)

Uses the full covariance matrix $\Sigma$:

$$\mathcal{W}(Y) = \widetilde{Y}\, V\, \Lambda^{-1/2}$$

Every feature dimension is coupled to every other. Global anisotropy is measured and corrected. Collapsed directions are reopened. Redundant directions are decorrelated.
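The contrast between the two normalizations can be illustrated on a synthetic cloud in which two feature columns are nearly redundant. The data and helper names below are assumptions made for the example.

```python
import numpy as np

def row_normalize(X, eps=1e-12):
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)

def whiten(Y, eps=1e-12):
    Yc = Y - Y.mean(axis=0)
    evals, evecs = np.linalg.eigh(Yc.T @ Yc / (Y.shape[0] - 1))
    return (Yc @ evecs) / np.sqrt(np.maximum(evals, eps))

def corr01(X):
    """Pearson correlation between feature columns 0 and 1."""
    return float(np.corrcoef(X, rowvar=False)[0, 1])

rng = np.random.default_rng(0)
n = 2000
a = rng.standard_normal(n)
# Column 1 is column 0 plus a little noise; column 2 is independent.
Y = np.column_stack([a, a + 0.3 * rng.standard_normal(n), rng.standard_normal(n)])

before = corr01(Y)
after_row_norm = corr01(row_normalize(Y))
after_whitening = corr01(whiten(Y))
print(before, after_row_norm, after_whitening)
```

Row normalization leaves the two columns strongly correlated, because it never looks across features, while whitening drives their correlation to zero.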

The original Cleora diffuses each coordinate mostly on its own. The new Cleora measures the geometry of the whole node cloud, then globally rebalances that geometry before the next diffusion step. This is the decisive mathematical innovation.

What centering removes

In repeated averaging systems, one of the dominant failure modes is drift toward a shared, nearly constant direction across all nodes. Centering explicitly removes that common-mode component before re-scaling:

$$\widetilde{Y} = Y - \mathbf{1}\mu^\top$$

New Cleora does not merely slow collapse; it actively projects away the most trivial part of the collapse.
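Centering is the same as applying the projector $I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top$, which annihilates any component shared by all nodes. The sketch below builds a cloud dominated by a hypothetical shared vector and shows how little is left once that component is removed. The dense $n \times n$ projector is formed only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
shared = np.array([2.0, -1.0, 0.5])                  # common-mode vector (assumed)
Y = np.ones((n, 1)) * shared + 0.01 * rng.standard_normal((n, d))

H = np.eye(n) - np.ones((n, n)) / n                  # centering projector
Yc = Y - Y.mean(axis=0)                              # what whitening computes in practice

print(np.linalg.norm(Y), np.linalg.norm(Yc))         # almost all of the norm was common-mode
```

In practice only the column means are needed, so the subtraction costs $O(n\,d)$.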

Key insight: Whitening does not invent signal. It prevents existing signal from being buried under spectral imbalance.

Figure: Feature covariance matrix before and after whitening. Whitening is not cosmetic rescaling. It actively removes anisotropy and feature redundancy across the entire embedding cloud, transforming a strongly anisotropic, highly correlated covariance into near-identity structure.

Spectral View: From Power Iteration to Orthogonal Iteration

Without whitening, repeated application of $M$ behaves like a power method:

$$X_t \;\approx\; M^t\, X_0$$

Whenever $M$ has a spectral gap, repeated multiplication increasingly favors dominant eigenmodes. This is the basic mathematical reason why deep diffusion oversmooths.

Interleaving whitening changes that picture profoundly. The new iteration becomes much closer to block power iteration / orthogonal iteration:

Power Iteration (Old Regime)

Keep pushing everything into the single easiest direction
Dominant modes win; weak modes die
Effective rank shrinks as depth grows
Depth is self-defeating

Orthogonal Iteration (New Regime)

Track a whole informative subspace
Diffusion identifies graph-coherent directions
Whitening removes anisotropy, resets the singular spectrum
Next step explores from a full-rank, isotropic state

In numerical linear algebra, this distinction is well understood: power iteration extracts the dominant mode and collapses everything else. Orthogonal iteration preserves a whole subspace. Interleaved whitening upgrades Cleora from one to the other.
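The textbook distinction can be reproduced on a small symmetric matrix. This sketch uses QR re-orthonormalization, the standard form of orthogonal iteration, as a stand-in for whitening; it is an analogy for the mechanism, not the Cleora recurrence itself. The matrix and its eigenvalues are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
eigenvalues = np.array([1.0, 0.9, 0.5, 0.3, 0.1, 0.05])
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))     # random orthonormal eigenbasis
A = Q @ np.diag(eigenvalues) @ Q.T                   # symmetric stand-in operator

X0 = rng.standard_normal((6, 2))
X_power, X_orth = X0.copy(), X0.copy()
for _ in range(200):
    X_power = A @ X_power
    X_power /= np.linalg.norm(X_power, axis=0)       # rescale columns only
    X_orth, _ = np.linalg.qr(A @ X_orth)             # re-orthonormalize the block

s_power = np.linalg.svd(X_power, compute_uv=False)
top2 = Q[:, :2]                                      # true top-2 eigenvectors

print(s_power)                                       # second singular value is ~0
print(np.abs(X_orth @ X_orth.T - top2 @ top2.T).max())
```

With column rescaling alone, both columns end up parallel to the dominant eigenvector. With re-orthonormalization, the block converges to the full two-dimensional dominant subspace.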

Geometric View: A Product of Spheres Plus Isotropic Position

There is a clean geometric story underlying the new recurrence. The method alternates between two entirely different kinds of geometric constraints.

Local Geometry

Row normalization constrains each node embedding to unit norm. After each row-normalization step, and before whitening is applied, every row lives on the sphere $S^{d-1}$. Across all nodes, the state lives on a product manifold:

$$(S^{d-1})^n$$

This is a node-local constraint: each individual vector is retracted back to the sphere.

Global Geometry

Whitening acts on the entire cloud of node embeddings at once. By centering and applying the inverse square root of the covariance, it places the cloud into isotropic position:

$$\frac{1}{n-1}\, X^\top X \;\approx\; I$$

This is a cloud-global constraint: the ensemble is shaped as a whole.

One full iteration of new Cleora can therefore be read as an alternating procedure between:

A graph-local averaging operator (propagation)
A node-local norm constraint (row normalization)
A global feature-space isotropy constraint (whitening)

That is why the method feels qualitatively different from a standard MPNN layer stack. It is not just aggregating messages. It is alternating between propagation and geometric correction.

Figure: Geometric transformation of the embedding cloud. Row normalization is a local spherical constraint. Whitening is a global isotropy constraint. The elongated ellipsoid of a diffused embedding cloud is reshaped into isotropic position before the next propagation step.

Physics View: Renormalized Diffusion

There is a useful physics metaphor that makes the dynamics immediately intuitive.

Diffusion is a coarse-graining operation: it smooths away high-frequency detail, suppresses local noise, amplifies low-frequency global modes.
Whitening is a renormalization step: it rescales the surviving degrees of freedom so they remain comparable and usable.

In ordinary diffusion, the flow drifts toward a trivial fixed point. In new Cleora, each whitening step says: not yet.

It subtracts the collapsed mean mode, rescales the remaining directions, and sends the system back into the next diffusion step with restored representational balance. This is why the method can be described as renormalized diffusion on graphs.

The innovation is not more message passing. The innovation is what happens between message-passing steps.

Why Deep Iteration Becomes Meaningful

The original Cleora already made a key observation: a few propagation steps can capture local and medium-range structure extremely well. But eventually classical diffusion saturates. Interleaved whitening changes what "more iterations" means.

Old Regime

More iterations mostly increase smoothness
Eventually all nodes start to look alike
Depth becomes self-defeating
Iteration count is a fragile hyperparameter

New Regime

Each iteration still aggregates higher-order walk information
Whitening repeatedly removes drift and restores isotropy
Later steps keep revealing new higher-order structure
Iteration count becomes a meaningful scale parameter

Whitening frequency as a new control axis

Interleaved whitening introduces a genuinely new degree of algorithmic control:

Whiten every step: strongest anti-collapse behavior
Whiten every $q$ steps: allows some diffusion accumulation between reconditioning steps
Final whitening only: improves output conditioning but does not fundamentally change the trajectory

The user can tune not just propagation depth, but the rhythm of renormalization. Combined with the optional residual weight $\rho$, this creates a rich family of graph embedding dynamics with a few hyperparameters but no trainable parameters:

$$((1-\rho)\,M + \rho\, I)\, X_t$$

The residual term lets Cleora interpolate between pure graph diffusion and persistence of the current representation, further stabilizing very deep iteration regimes.
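Putting the pieces together, the whole family can be sketched as one loop with two knobs. The function name `renormalized_diffusion`, the argument `whiten_every` (playing the role of $q$), the eigenvalue floor, the random initialization, and the circulant toy graph are all assumptions made for illustration rather than the library's actual interface.

```python
import numpy as np

def row_normalize(X, eps=1e-12):
    return X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), eps)

def whiten(Y, eps=1e-12):
    Yc = Y - Y.mean(axis=0)
    evals, evecs = np.linalg.eigh(Yc.T @ Yc / (Y.shape[0] - 1))
    return (Yc @ evecs) / np.sqrt(np.maximum(evals, eps))

def renormalized_diffusion(M, X0, steps, whiten_every=1, rho=0.0):
    """Propagate `steps` times; whiten after every `whiten_every`-th step (0 disables)."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    X = X0
    for t in range(1, steps + 1):
        X = row_normalize((1.0 - rho) * (M @ X) + rho * X)   # ((1-rho) M + rho I) X
        if whiten_every and t % whiten_every == 0:
            X = whiten(X)
    return X

# Hypothetical toy graph: a 20-node ring with extra chords.
n = 20
A = np.zeros((n, n))
idx = np.arange(n)
for s in (1, 2, 5):
    A[idx, (idx + s) % n] = 1.0
    A[idx, (idx - s) % n] = 1.0
M = A / A.sum(axis=1, keepdims=True)
X0 = np.random.default_rng(0).standard_normal((n, 4))

X_every = renormalized_diffusion(M, X0, steps=12, whiten_every=1, rho=0.1)
X_every3 = renormalized_diffusion(M, X0, steps=12, whiten_every=3, rho=0.1)
```

Setting `whiten_every=0` recovers the original row-normalized iteration, and whitening only once at the end corresponds to the "final whitening only" option.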

Figure: Mean pairwise cosine similarity vs iteration. Oversmoothing appears as rising pairwise similarity, with all nodes becoming indistinguishable. Classical diffusion rapidly approaches 1.0 (total collapse). Interleaved whitening keeps similarity low and embeddings diverse across far more iterations.

Why This Remains Scalable

A critical practical point: whitening happens in feature space, not graph space.

Propagation Cost

$$O(|E|\, d)$$

Sparse matrix multiplication. Scales with the number of edges, not nodes squared.

Whitening Cost

$$O(n\, d^2 + d^3)$$

Works through a $d \times d$ covariance matrix. Forming it costs $O(n\, d^2)$, which is linear in the node count; the eigendecomposition costs $O(d^3)$, which is independent of it.

The graph itself may have millions of nodes, but the covariance is only $d \times d$. The difficult dense linear algebra happens over the embedding dimension, not over the number of nodes. Conceptually, the algorithm is still sparse-graph-first. It adds a small dense correction in feature space to guide a very large sparse propagation in node space.
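A quick back-of-envelope calculation makes the scaling concrete. The graph size below is hypothetical, and the counts are leading-order terms of the complexity formulas with constants ignored, not measured runtimes.

```python
def propagation_cost(num_edges, dim):
    """Leading-order operation count of one sparse propagation: |E| * d."""
    return num_edges * dim

def whitening_cost(num_nodes, dim):
    """Leading-order operation count of one whitening: n * d^2 + d^3."""
    return num_nodes * dim**2 + dim**3

# Hypothetical sizes: 10 million nodes, 100 million edges, 128 dimensions.
n, num_edges, d = 10_000_000, 100_000_000, 128

prop = propagation_cost(num_edges, d)
white = whitening_cost(n, d)
dense_entries = n * n          # size of an n x n matrix, which is never formed

print(f"propagation : {prop:.3e}")
print(f"whitening   : {white:.3e}")
print(f"n x n dense : {dense_entries:.3e}")
```

Both terms stay many orders of magnitude below anything quadratic in the node count. In this particular example the $n\,d^2$ covariance term is larger than the propagation term, which happens whenever $d$ exceeds the average number of edges per node, so whitening less often than every step is one way to trade cost against conditioning.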

The new Cleora preserves every property of the original philosophy:

Sparse linear algebra over huge graphs
Fully deterministic computation
Strong CPU efficiency
No training loop, no negative sampling, no GPU dependency

But it adds a much richer geometry to the iteration itself.

Permutation Equivariance Is Preserved

A global operation often raises an obvious concern: does whitening break the symmetry that graph models should respect? It does not.

If the rows of $Y$ are permuted by a permutation matrix $P$, then $Y \mapsto PY$. The mean $\mu$ over rows is unchanged (averaging over rows is permutation-invariant), so the centered matrix is simply permuted, $\widetilde{PY} = P\widetilde{Y}$, and the covariance is also unchanged:

$$(P\widetilde{Y})^\top (P\widetilde{Y}) = \widetilde{Y}^\top P^\top P\, \widetilde{Y} = \widetilde{Y}^\top \widetilde{Y}$$

Therefore the whitening transform is the same, and:

$$\mathcal{W}(PY) = P\,\mathcal{W}(Y)$$

So whitening is global but still permutation-equivariant. The method gains a genuinely nonlocal correction without sacrificing one of the central symmetries of graph representation learning.
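The equivariance can be checked numerically. Because eigenvectors are only determined up to sign, an implementation may flip individual output columns between runs, so this sketch compares the matrices of pairwise inner products, which are unaffected by such flips. The `whiten` helper and test data are illustrative assumptions.

```python
import numpy as np

def whiten(Y, eps=1e-12):
    Yc = Y - Y.mean(axis=0)
    evals, evecs = np.linalg.eigh(Yc.T @ Yc / (Y.shape[0] - 1))
    return (Yc @ evecs) / np.sqrt(np.maximum(evals, eps))

rng = np.random.default_rng(0)
n, d = 30, 4
Y = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))
perm = rng.permutation(n)                  # Y[perm] is the same as P @ Y

Z = whiten(Y)
Z_perm = whiten(Y[perm])

gram = Z @ Z.T                             # pairwise inner products between nodes
gram_perm = Z_perm @ Z_perm.T

print(np.abs(gram_perm - gram[np.ix_(perm, perm)]).max())
```

Relabeling the nodes relabels the pairwise similarities in the same way and changes nothing else.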

What This Means for Graph ML

The standard deep-learning narrative says message passing has a depth problem: too few iterations means not enough context; too many means oversmoothing and rank collapse. New Cleora suggests a different path entirely.

The problem is not message passing itself. The problem is letting contraction accumulate without any global geometric correction. Interleaved whitening introduces precisely that correction.

What becomes newly plausible

Deeper parameter-free propagation without immediate collapse
Global covariance steering without attention mechanisms
Spectral balancing without trainable filters
Subspace-preserving diffusion instead of blunt smoothing

The broader principle: After every operator that mixes information, restore isotropy before mixing again. This principle is immediately relevant to MPNNs, deep graph diffusion, graph transformers, and even transformer language models, where the attention matrix itself is a graph-like propagation operator.

A scientifically careful statement about WL-style limits

New Cleora does not refute theorems about anonymous message passing. It changes the regime. Cleora uses deterministic identity-based initialization rather than anonymous all-equal node features. Interleaved whitening is a global full-covariance operator that couples feature dimensions. The whole pipeline remains permutation-equivariant, but it is no longer the usual "independent local aggregator per channel" picture.

So the correct claim is not that "WL theory is wrong." The correct claim is stronger in practice and cleaner in theory:

Interleaved whitening moves Cleora outside the narrow regime where the usual depth-collapse intuition dominates. This is not a denial of theory, but a construction that steps around the assumptions that made the old pessimism seem inevitable.

What is genuinely new

For a mathematical or ML audience, the novelty can be stated cleanly:

Cleora was already an exact, deterministic sparse propagation algorithm.
The new Cleora adds full-covariance whitening inside the recurrence, not merely as post-processing.
That converts a collapsing diffusion into an orthogonalized, renormalized multi-step graph embedding flow.
The method remains parameter-free, deterministic, sparse, and CPU-friendly.

Further Reading