Thermodynamic Machine Learning · MMXXVI
Foundations · 13.VI.MMXXVI · Read 4 min

Mixing, Expressivity, and the Barren-Plateau Bridge

Entry 12

The thermodynamic trainability quantity $Q$ dies the way a quantum loss landscape goes barren (same shape, different machine), and saying exactly how far that analogy reaches is the whole point of this entry.

This is a synthesis entry, not a new result. It assembles two artifacts — synthesis/trainability-theorem.md (the corollary block) and synthesis/parent-bridges.md (the cross-parent map) — into one statement: the Mixing–Expressivity Tradeoff (MET) is a product of independently-fatal factors, and that product is the EBM cousin of the quantum barren plateau. Here we keep the claim-status discipline visible.

The question

Given the factorization $Q \asymp (\gamma K / 2)\cdot R(\theta)$, which reads gradient SNR-squared as effective sample size times signal-to-slow-fluctuation ratio, when does $Q \to 0$? And: is "the gradient landscape went flat" the same failure that quantum machine learning calls a barren plateau, or only a rhyme?

The setup / method

We work from one reverse-process EBM layer of a Denoising Thermodynamic Model, sampled by a reversible Gibbs kernel $P_\theta$. The operational object is $Q_{op} := \|g\|^2 / E\|\hat g - g\|^2$, that is, true-gradient power over estimator MSE. The corollary reads the factorization as a Ragone-shaped expression: $Q \to 0$ (the plateau) iff any factor collapses.
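The definition of $Q_{op}$ can be made concrete with a minimal sketch. It assumes the exact gradient is available for comparison (for instance on a system small enough to enumerate) and that the estimates are independent draws. The names `q_op`, `g_true`, and `g_hats` are hypothetical and not taken from the experiment code.

```python
import numpy as np

def q_op(g_true, g_hats):
    """Operational trainability: ||g||^2 / E||g_hat - g||^2.

    g_true : shape (d,)   exact gradient (assumed known for this sketch)
    g_hats : shape (n, d) n independent gradient estimates
    """
    g_true = np.asarray(g_true, dtype=float)
    g_hats = np.asarray(g_hats, dtype=float)
    signal_power = np.sum(g_true ** 2)
    # Mean squared error of the estimator, averaged over the n estimates.
    mse = np.mean(np.sum((g_hats - g_true) ** 2, axis=1))
    return signal_power / mse

# Illustration with synthetic Gaussian estimator noise (an assumption):
# the true gradient is held fixed while the estimator gets noisier.
rng = np.random.default_rng(0)
g = np.array([0.3, -0.1, 0.2])
for noise in (0.01, 0.1, 1.0):
    estimates = g + noise * rng.standard_normal((4000, g.size))
    print(f"noise={noise:<5} Q_op={q_op(g, estimates):.3g}")
```

In this sketch the numerator never changes; only the denominator grows, which is the sense in which $Q_{op}$ measures recoverability of the gradient rather than its size.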

The result — three independently-fatal factors

The MET, stated as a theorem-shape, is the product of:

  1. $\gamma \to 0$ exponentially: spectral-gap collapse, $\sigma_2 \to 1$. This is the MET named as a mechanism: a monolithic EBM expressive enough to fit the data becomes exponentially slow to sample (the DTM paper's core motivation, §I–II + App. B). Quantum analogue: exponential $\dim(\mathfrak{g})$ blow-up (reachability/expressivity), where $\mathfrak{g}$ is the QML dynamical Lie algebra, not the gradient $g$.
  2. $R \to 0$: the gradient signal is swamped by the equilibrium fluctuations of $\partial_a E$ along the slow modes; the signal is not where the sampler can resolve it. Quantum analogue: the vanishing g-purity product $P_{\mathfrak{g}}(\rho)\cdot P_{\mathfrak{g}}(O)$.
  3. Budget starvation: too few total Gibbs steps. Small burn-in $B \lesssim 1/\gamma$ leaves an $O(1)$ bias ($\sigma_2^B = (1-\gamma)^B$); small window $K$ leaves $\mathrm{ESS} = \gamma K / (2 w_a)$ too small. Via the DTM energy model $E = T\cdot K_{mix}\cdot L^2 \cdot E_{cell}$ (App. E), this total-steps axis is literally the energy-per-sample axis. No quantum analogue.

The collapse of any one factor is independently sufficient to kill $Q$; trainability requires that none of the three collapses. The corollary's tag is conjectured.
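The arithmetic behind the three factors can be sketched as follows. This is an illustration under stated assumptions, not the corollary itself: the $\asymp$ is treated as an equality, $w_a$ is not defined in this entry and is kept as a free positive weight defaulting to 1, and the function names are hypothetical.

```python
def burn_in_bias(gamma, B):
    """Residual burn-in bias factor sigma_2^B = (1 - gamma)^B."""
    return (1.0 - gamma) ** B

def ess(gamma, K, w_a=1.0):
    """Effective sample size gamma * K / (2 * w_a); w_a is a placeholder weight."""
    return gamma * K / (2.0 * w_a)

def q_scaling(gamma, K, R):
    """Scaling form Q ~ (gamma * K / 2) * R, with the asymptotic read as equality."""
    return (gamma * K / 2.0) * R

# One healthy setting, then each factor starved in turn (illustrative numbers).
scenarios = {
    "healthy":       dict(gamma=0.1,  K=1000, R=0.2),
    "gap collapse":  dict(gamma=1e-6, K=1000, R=0.2),
    "signal swamped": dict(gamma=0.1, K=1000, R=1e-6),
    "small window":  dict(gamma=0.1,  K=2,    R=0.2),
}
for name, p in scenarios.items():
    print(f"{name:<15} Q ~ {q_scaling(**p):.3g}")

# Burn-in: with B = 1/gamma the bias factor is still about 1/e, i.e. O(1).
print(burn_in_bias(gamma=0.01, B=100))
```

Because the expression is a product, driving any single argument toward zero drives the whole quantity toward zero regardless of the other two, which is the "independently fatal" reading above.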

After exp1+exp2 (experiments/exp1-exact-diag/, experiments/exp2-thrml-smoke/) the first two factors are read in their observable-projected form: factor 1 with $\gamma \to \gamma_{eff}$ over the gradient-relevant mode set $C^*(O)$, factor 2 with $R \to R_{eff}$. The three-factor shape is unchanged; the quantities are the corrected, projected ones.

The bridge — what transfers, what does not

The cross-parent table maps factor-by-factor:

| Quantum (QML) | Classical EBM/DTM (here) |
|---|---|
| g-purity product $P_{\mathfrak{g}}(\rho)\cdot P_{\mathfrak{g}}(O)$ | $R(\theta)$: signal-to-slow-fluctuation |
| $1/\dim(\mathfrak{g})$: reachability | spectral gap $\gamma = 1 - \sigma_2$ |
| barren plateau $\mathrm{Var}[\partial C]\to 0$ | thermodynamic plateau $Q \to 0$ |

So $Q \asymp (\gamma K/2)\cdot R$ is the EBM analogue of the Ragone et al. loss-variance $\mathrm{Var}[\ell_\theta] = \sum_j P_{\mathfrak{g}_j}(\rho)\,P_{\mathfrak{g}_j}(O)/\dim(\mathfrak{g}_j)$.

What DOES transfer is exactly two things: the phenomenon (a gradient SNR that survives or collapses exponentially with system size) and the theorem shape (a product of independently-fatal factors).

Scope & caveats — the sharpest contrast

The analogy does NOT transfer literally. There is no Lie algebra in an EBM: $P_{\mathfrak{g}}(\rho)$, $\dim(\mathfrak{g})$, and the DLA decomposition are quantum-circuit objects with no EBM counterpart. Do not write EBM quantities as if they were g-purities.

The mechanism is different. The quantum barren plateau is signal extinction: the true gradient variance itself vanishes, $\mathrm{Var}[\partial C]\to 0$. The thermodynamic plateau is an estimation failure: $Q_{op}$ collapses because the estimator MSE swamps a still-nonzero $g$. What dies is the recoverability of $g$, not $g$ itself. Same phenomenology (SNR collapse), different machine. The Q-program's honesty hinges on keeping that distinction visible.
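The estimation-failure reading can be checked exactly on a toy model. The sketch below uses a stationary symmetric two-state chain on {-1, +1}, which is an assumption chosen for tractability and not the DTM sampler. Its lag-$t$ autocorrelation is $\sigma_2^t$ with $\sigma_2 = 1 - \gamma$, so the variance of a $K$-step window mean has a closed form. The signal is a fixed nonzero mean, and the unit fluctuation variance plays the role of the noise, so $R$ here is simply the squared signal.

```python
import numpy as np

def window_mean_variance(gamma, K):
    """Exact variance of the K-step window mean of a unit-variance observable
    on a stationary two-state chain with spectral gap gamma."""
    sigma2 = 1.0 - gamma
    lags = np.arange(1, K)
    # Var = (1/K^2) * [K + 2 * sum_{t=1}^{K-1} (K - t) * sigma2^t]
    return (K + 2.0 * np.sum((K - lags) * sigma2 ** lags)) / K ** 2

def q_toy(signal, gamma, K):
    """Toy Q: fixed signal power over the window-mean estimator variance."""
    return signal ** 2 / window_mean_variance(gamma, K)

signal, K = 0.5, 10_000  # the signal never changes below
for gamma in (0.5, 0.1, 0.01, 0.001):
    exact = q_toy(signal, gamma, K)
    heuristic = (gamma * K / 2.0) * signal ** 2
    print(f"gamma={gamma:<6} exact Q={exact:10.3f}  (gamma*K/2)*R={heuristic:10.3f}")
```

The signal stays at 0.5 throughout while the toy $Q$ falls with $\gamma$: the gradient-like quantity is intact and only the estimator degrades. The heuristic $(\gamma K/2)\cdot R$ tracks the exact value most closely when $\gamma$ is small and $K \gg 1/\gamma$, consistent with reading $\asymp$ as a scaling statement rather than an identity.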

This entry asserts no tag flip. The corollary is conjectured; the bridge is explicitly a shape analogy, not a literal transfer, per parent-bridges.md. Nothing here is proven-here or validated.


What this feeds: the observable-projected predictor $Q_{struct}^{\perp}$, the corrected, differentiable-in-$J$ target that turns this shape into something computable without training to convergence.

Sources

  • Jelinčič et al. 2025, Denoising Thermodynamic Models, arXiv:2510.23972 (Eq. 14, §IV, Fig. 5b, App. B/E/G).
  • Ragone et al. 2024, the loss-variance factorization (DLA dimension · state purity · observable purity) the corollary is shaped after.
— fin. —