Thermodynamic Machine Learning · MMXXVI
Experiment · 3.VI.MMXXVI · Read 5 min

Exp 5 — Gap vs Multimodality: The Mechanism

Entry 6

A zero-compute, exactly solvable RBM family shows the reversible kernel's observable gap collapsing as multimodality turns on, with reversibilization itself adding a multimodality-tracking slowdown. Together these give the mechanism behind exp4's $\tau \propto L$, computed with no chain and no burn-in.

The question

Exp4 left two readings of its $\tau \propto L$ scaling unresolved: reading-(1), a genuine $\gamma_{eff} \to 0$ plateau, versus reading-(2), merely inadequate burn-in. Both are consistent with a chain that fails to decorrelate at scale. The discriminating move is to remove the chain entirely: build a substrate small enough to diagonalize exactly, read $\gamma_{eff}$ and $\tau_{int}$ off the spectrum and resolvent, and ask whether the collapse survives when there is no burn-in left to blame.
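
A minimal sketch of that chain-free readout, assuming a finite state space with an explicit transition matrix `P` and stationary distribution `pi` (the helper names and the toy independence-Metropolis kernel are illustrative, not the experiment's code). For a reversible kernel, $\tau_{int}$ follows either from the spectrum, $\tau_{int} = \sum_{k \geq 2} w_k (1+\lambda_k)/(1-\lambda_k)$ with $w_k$ the share of $\mathrm{Var}_\pi(f)$ on mode $k$, or from the resolvent through the fundamental matrix $Z = (I - P + \mathbf{1}\pi^{\mathsf{T}})^{-1}$. Neither route samples anything, so there is no burn-in to get wrong; the truncated autocorrelation sum is included only as a brute-force cross-check.

```python
import numpy as np

def tau_int_resolvent(P, pi, f):
    """tau_int = 1 + 2 * sum_{t>=1} rho(t), computed as 2 <fbar, Z fbar>_pi / Var_pi(f) - 1
    with the fundamental matrix Z = (I - P + 1 pi^T)^{-1}. No eigendecomposition is used,
    so this also applies to non-reversible ergodic, aperiodic kernels."""
    n = pi.size
    fbar = f - pi @ f                                   # centre f under pi
    Z_fbar = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), fbar)
    return 2.0 * (pi @ (fbar * Z_fbar)) / (pi @ fbar**2) - 1.0

def tau_int_spectral(P, pi, f):
    """Same quantity for a pi-reversible P, from the eigenpairs of D^{1/2} P D^{-1/2}."""
    d = np.sqrt(pi)
    S = d[:, None] * P / d[None, :]                     # symmetric when P is pi-reversible
    lam, U = np.linalg.eigh((S + S.T) / 2)              # ascending; lam[-1] = 1 is the trivial mode
    c2 = (U.T @ (d * (f - pi @ f))) ** 2                # squared pi-overlaps with each mode
    lam, c2 = lam[:-1], c2[:-1]                         # drop the stationary mode
    return np.sum(c2 * (1 + lam) / (1 - lam)) / np.sum(c2)

def metropolis_uniform(pi):
    """Toy reversible kernel (hypothetical): propose any state uniformly, accept with min(1, pi_y/pi_x)."""
    n = pi.size
    P = np.minimum(1.0, pi[None, :] / pi[:, None]) / n
    np.fill_diagonal(P, 0.0)
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))            # self-proposals and rejections stay put
    return P

def tau_int_series(P, pi, f, T=2000):
    """Brute-force check: truncated 1 + 2 * sum_t rho(t), propagating P^t fbar exactly."""
    fbar = f - pi @ f
    var, g, total = pi @ fbar**2, fbar.copy(), 1.0
    for _ in range(T):
        g = P @ g
        total += 2.0 * (pi @ (fbar * g)) / var
    return total

rng = np.random.default_rng(0)
pi = rng.random(12)
pi /= pi.sum()
P = metropolis_uniform(pi)
f = rng.standard_normal(12)
print(tau_int_resolvent(P, pi, f), tau_int_spectral(P, pi, f), tau_int_series(P, pi, f))
```

The resolvent route needs no eigendecomposition, which is why it also covers the non-reversible kernel compared in P4.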

The setup

Substrate: a bipartite RBM with energy $E = -v^{\mathsf{T}} W h$ and $M$ planted patterns supplying the multimodality (the exp2 substrate, with the DTM-native block-Gibbs structure). The gradient observables are the couplings $f_{ij} = -v_i h_j$, which are $\mathbb{Z}_2$-even: like $E$ itself, they are invariant under the global flip $(v, h) \to (-v, -h)$.
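
A minimal sketch of such a substrate, assuming ±1 spins, $m$ visible and $m$ hidden units ($N = 2m$), and a hypothetical Hebbian-style planting $W = \frac{1}{m}\sum_{\mu=1}^{M} \xi^{\mu} (\eta^{\mu})^{\mathsf{T}}$ with random ±1 patterns. The exp2 planting rule is not given in this entry, so this is a stand-in. The sketch enumerates all $2^N$ joint states, computes the exact Gibbs measure $\pi \propto e^{-\beta E}$, and checks that both $\pi$ and $f_{ij}$ are invariant under the global flip.

```python
import itertools
import numpy as np

def planted_coupling(m, M, seed=0):
    """HYPOTHETICAL planting: W = (1/m) * sum_mu xi_mu eta_mu^T with random +-1
    visible patterns xi_mu and hidden patterns eta_mu, so rank(W) <= M."""
    rng = np.random.default_rng(seed)
    xi = rng.choice([-1.0, 1.0], size=(M, m))
    eta = rng.choice([-1.0, 1.0], size=(M, m))
    return xi.T @ eta / m

def joint_states(m):
    """All 2^(2m) joint states as rows of +-1 spins, visible units first."""
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=2 * m)))
    return S[:, :m], S[:, m:]

def gibbs_measure(W, beta):
    """Exact Boltzmann weights pi(v, h) proportional to exp(-beta * E), E = -v^T W h."""
    v, h = joint_states(W.shape[0])
    E = -np.einsum("si,ij,sj->s", v, W, h)       # energy of every joint state
    logw = -beta * E
    pi = np.exp(logw - logw.max())               # shift before exponentiating for stability
    return pi / pi.sum(), v, h

def coupling_observable(v, h, i, j):
    """Gradient observable f_ij = -v_i h_j evaluated on every joint state."""
    return -v[:, i] * h[:, j]

W = planted_coupling(m=4, M=2)
pi, v, h = gibbs_measure(W, beta=2.0)
f = coupling_observable(v, h, 0, 1)
# itertools.product lists -1 before +1, so row 2^N - 1 - k is the global flip of row k:
# reversing an array applies (v, h) -> (-v, -h).
print(np.allclose(pi, pi[::-1]), np.allclose(f, f[::-1]))
```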

The grid (frozen at b20b2cc, with the pre-run cluster-anchor correction 13d120f; ran 2026-06-03, laptop CPU, float64, 250 s, no GPU):

  • P1–P3 (primary): reversible single-site random-scan joint Gibbs, exact-diag, $m \in \{4,5,6\}$ ($N \in \{8,10,12\}$) $\times$ $\beta \in \{0.5,1,2,3\}$ $\times$ $M \in \{1,2,3,4\}$, plus a ferro control, for 60 cells in total; plus an $m=7$ ($N=14$) top-$k$ scaling check (4 cells, not pass-gated).
  • P4 (secondary): cross-kernel comparison of the non-reversible block kernel $P_{fwd}$ (resolvent $\tau_{int}$) against its additive reversibilization $P_{sym} = \tfrac{1}{2}(P_{fwd} + P_{fwd}^{*})$ (resolvent and exact-diag), where $P_{fwd}^{*}$ is the $\pi$-adjoint (time reversal) of $P_{fwd}$; $m \in \{4,6\} \times \beta \in \{1,3\} \times M \in \{1,2,3,4\}$ = 16 cells.
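
Both kernel families can be sketched as explicit transition matrices over the $2^N$ joint states. Assumptions, labeled as such: heat-bath single-site updates for the random-scan kernel, a forward sweep that resamples $h \mid v$ and then $v \mid h$ for $P_{fwd}$ (the experiment's sweep order is not stated), and a hypothetical rank-1 coupling as a stand-in for the planted RBM. The $\pi$-adjoint is $P^{*}(x,y) = \pi(y)P(y,x)/\pi(x)$; since each block step is $\pi$-self-adjoint, $P_{fwd}^{*}$ is simply the reverse sweep $P_v P_h$.

```python
import itertools
import numpy as np

def gibbs_measure(W, beta):
    """Exact pi(v, h) proportional to exp(beta * v^T W h) over all 2^(2m) joint +-1 states."""
    m = W.shape[0]
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=2 * m)))  # visible bits first
    logw = beta * np.einsum("si,ij,sj->s", S[:, :m], W, S[:, m:])    # = -beta * E
    pi = np.exp(logw - logw.max())
    return pi / pi.sum()

def random_scan_single_site(pi, N):
    """Reversible single-site random-scan heat-bath (Gibbs) kernel on 2^N states."""
    n, idx = pi.size, np.arange(pi.size)
    P = np.zeros((n, n))
    for k in range(N):                          # each site is picked with probability 1/N
        flip = idx ^ (1 << k)                   # index of the state with site k flipped
        p_move = pi[flip] / (pi + pi[flip])     # conditional probability of the flipped value
        P[idx, flip] += p_move / N
        P[idx, idx] += (1.0 - p_move) / N
    return P

def block_update(pi, labels):
    """Keep the block encoded by `labels`, resample the rest from pi( . | labels)."""
    same = labels[:, None] == labels[None, :]
    mass = np.bincount(labels, weights=pi)      # pi-mass of each conditioning value
    return same * pi[None, :] / mass[labels][:, None]

def pi_adjoint(P, pi):
    """Time reversal P*(x, y) = pi(y) P(y, x) / pi(x)."""
    return (P.T * pi[None, :]) / pi[:, None]

m, beta = 3, 1.0
W = np.outer([1.0, -1.0, 1.0], [1.0, 1.0, -1.0]) / m   # hypothetical M = 1 planted pattern
pi = gibbs_measure(W, beta)
idx = np.arange(pi.size)
P_rs = random_scan_single_site(pi, N=2 * m)
P_h = block_update(pi, idx >> m)                # visible index fixed: resample h | v
P_v = block_update(pi, idx & ((1 << m) - 1))    # hidden index fixed: resample v | h
P_fwd = P_h @ P_v                               # one forward sweep: h | v, then v | h
P_sym = 0.5 * (P_fwd + pi_adjoint(P_fwd, pi))   # additive reversibilization
```

Because $P_h$ and $P_v$ generally do not commute, $P_{fwd}$ fails detailed balance, while $P_{sym}$ satisfies it by construction.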

Sanity guards all passed: $\sigma_1 = 1$ to within $10^{-6}$; stochasticity and $\pi$-stationarity to $\sim 10^{-16}$; reversibility residual $\sim 10^{-3}$–$10^{-4}$ for $P_{fwd}$ (non-reversible, as expected) and $\sim 10^{-17}$ for $P_{sym}$ (reversible). The decisive validation: resolvent $\tau_{int}$ and exact-diag on $P_{sym}$ agree to $6 \times 10^{-14}$ (Guard b). The slow_obs_overlap of $\sim 10^{-20}$–$10^{-23}$ confirms that $\sigma_2$ is $\mathbb{Z}_2$-odd and therefore orthogonal to the $\mathbb{Z}_2$-even observables, which is why the pre-run anchor was corrected from the raw $\sigma_2$ to the observable-relevant $C^{\perp}$.
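
The guards can be sketched as plain residual checks on dense matrices. `sanity_guards` and `slow_obs_overlap` are hypothetical names, and the demo uses a mean-field ferromagnet rather than the RBM because it is a small $\mathbb{Z}_2$-symmetric chain whose slowest mode is the odd magnetization flip. Here $\sigma_1$ is the top singular value of $D^{1/2} P D^{-1/2}$ (the $L^2(\pi)$ operator norm of $P$), which equals 1 for any $\pi$-stationary kernel; the share of a $\mathbb{Z}_2$-even observable's variance on the odd second mode vanishes to round-off.

```python
import itertools
import numpy as np

def heat_bath_random_scan(pi, N):
    """Reversible single-site random-scan Gibbs kernel on 2^N states."""
    n, idx = pi.size, np.arange(pi.size)
    P = np.zeros((n, n))
    for k in range(N):
        flip = idx ^ (1 << k)
        p_move = pi[flip] / (pi + pi[flip])
        P[idx, flip] += p_move / N
        P[idx, idx] += (1.0 - p_move) / N
    return P

def sanity_guards(P, pi):
    """Residuals for: sigma_1 = 1, row-stochasticity, pi-stationarity, detailed balance."""
    d = np.sqrt(pi)
    A = d[:, None] * P / d[None, :]              # D^{1/2} P D^{-1/2}
    DP = pi[:, None] * P                         # flow matrix pi(x) P(x, y)
    return {
        "sigma1": abs(np.linalg.norm(A, 2) - 1.0),
        "stochastic": np.abs(P.sum(axis=1) - 1.0).max(),
        "stationary": np.abs(pi @ P - pi).max(),
        "reversibility": np.abs(DP - DP.T).max(),
    }

def slow_obs_overlap(P, pi, f):
    """Share of Var_pi(f) carried by the second eigenmode of a pi-reversible P."""
    d = np.sqrt(pi)
    S = d[:, None] * P / d[None, :]
    _, U = np.linalg.eigh((S + S.T) / 2)         # ascending: U[:, -2] is the second mode
    fbar = f - pi @ f
    return (U[:, -2] @ (d * fbar)) ** 2 / (pi @ fbar**2)

# Toy Z2-symmetric chain (an assumption, not the RBM): pi proportional to exp(beta * M^2 / N).
N, beta = 6, 2.0
s = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
logw = beta * s.sum(axis=1) ** 2 / N
pi = np.exp(logw - logw.max())
pi /= pi.sum()
P = heat_bath_random_scan(pi, N)
print(sanity_guards(P, pi))
print(slow_obs_overlap(P, pi, s[:, 0] * s[:, 1]))   # Z2-even observable: round-off level
print(slow_obs_overlap(P, pi, s.sum(axis=1)))       # Z2-odd magnetization: order one
```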

The result

P1 (gap collapse): NOT PASSED (strict); mechanism confirmed. $\gamma_{eff}$ collapses by 1–2 orders of magnitude as $M$ turns on at $\beta \geq 2$:

 m   β   |  γ_eff: M=1    M=2    M=3    M=4   |  Csize: M=1→2→3→4
 4   3.0 |         0.648  0.050  0.010  0.658 |  78→1→2→26
 5   3.0 |         0.648  0.006  0.025  0.629 |  80→1→2→47
 6   3.0 |         0.647  0.076  0.405  0.076 |  120→1→14→2
 6   2.0 |         0.626  0.194  0.416  0.229 |  120→2→15→3

The deepest is $\gamma_{eff} = 0.006$ at m5-$\beta$3-$M$2, a $\approx 100\times$ collapse. It is physical, not a cluster artifact: the raw observable gap $\gamma_{\perp}$ collapses identically (m5-$\beta$3, $M = 1 \to 4$: $0.577 \to 0.006 \to 0.018 \to 0.373$). Csize reads out the mechanism: $M=1$ gives a cluster spanning $\approx$ all modes (unimodal rank-1 coupling, no slow mode), while $M=2,3$ at $m=4,5$ (and $M=2$ at $m=6$) give Csize = 1–2, a clean A7-separated slow cluster.
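
One way to read a raw observable gap and a slow-cluster size off the spectrum is sketched below. The selection rules are assumptions, not the registered A7 criterion, which this entry does not spell out: $\gamma_\perp$ is taken as $1 - \lambda_k$ for the slowest mode carrying a non-negligible share of $\mathrm{Var}_\pi(f)$ (this is how a $\mathbb{Z}_2$-odd $\lambda_2$ with overlap $\sim 10^{-20}$ gets skipped), and the cluster collects the observable-relevant modes whose gap lies within a factor `sep` of $\gamma_\perp$. The thresholds `w_min` and `sep` are hypothetical.

```python
import numpy as np

def observable_spectrum(P, pi, f):
    """Non-trivial eigenvalues of a pi-reversible P (descending) and the share of
    Var_pi(f) carried by each corresponding mode."""
    d = np.sqrt(pi)
    S = d[:, None] * P / d[None, :]
    lam, U = np.linalg.eigh((S + S.T) / 2)
    lam, U = lam[::-1][1:], U[:, ::-1][:, 1:]   # descending order, stationary mode dropped
    fbar = f - pi @ f
    c2 = (U.T @ (d * fbar)) ** 2
    return lam, c2 / c2.sum()

def observable_gap_and_cluster(lam, w, w_min=1e-12, sep=5.0):
    """HYPOTHETICAL rules: gamma_perp = smallest gap among observable-relevant modes;
    cluster = relevant modes whose gap is at most sep * gamma_perp."""
    gaps = 1.0 - lam
    relevant = w > w_min                        # skip observable-orthogonal modes
    gamma_perp = gaps[relevant].min()
    cluster = relevant & (gaps <= sep * gamma_perp)
    return gamma_perp, int(cluster.sum())

# Synthetic spectrum: an observable-orthogonal slow mode, then one clean slow mode.
lam = np.array([0.99, 0.95, 0.5, 0.2])
w = np.array([1e-20, 0.6, 0.3, 0.1])
print(observable_gap_and_cluster(lam, w))
```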

Strict P1 fails for one well-understood reason: at $M=4$ in small $m$ (a rank-4 coupling in 4–6 dimensions is near full rank), the planted patterns over-saturate model capacity, basins merge, the slow mode vanishes, and $\gamma_{eff}$ recovers. This breaks the registered monotone-in-$M$ criterion ($\gamma_{eff}(M{=}4)/\gamma_{eff}(M{=}1) = 0.118 > 0.1$ at m6-$\beta$3, and not monotone). The $m=7$ check corroborates this decisively: with more capacity, $M=4$ stays collapsed ($\gamma_{eff} = 0.110$, Csize = 2) rather than recovering. The registered alternative outcome ("$\gamma_{eff}$ stays $\Omega(1)$") is also rejected.
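
The registered criterion can be checked directly against the table rows. This sketch assumes one reading of it (non-increasing in $M$, and $\gamma_{eff}(M{=}4)/\gamma_{eff}(M{=}1) \leq 0.1$). With the rounded table entries the m6-$\beta$3 ratio comes out $\approx 0.117$, matching the reported $0.118$ to table precision, while the m4 and m5 rows fail through the $M=4$ recovery.

```python
# gamma_eff at M = 1..4, copied from the table above (rounded to 3 decimals)
table = {
    (4, 3.0): [0.648, 0.050, 0.010, 0.658],
    (5, 3.0): [0.648, 0.006, 0.025, 0.629],
    (6, 3.0): [0.647, 0.076, 0.405, 0.076],
    (6, 2.0): [0.626, 0.194, 0.416, 0.229],
}

def strict_p1(gammas, max_ratio=0.1):
    """Assumed reading of the registered P1 criterion: gamma_eff non-increasing in M
    and gamma_eff(M=4) / gamma_eff(M=1) <= max_ratio."""
    monotone = all(b <= a for a, b in zip(gammas, gammas[1:]))
    ratio = gammas[-1] / gammas[0]
    return monotone and ratio <= max_ratio, ratio, monotone

for (m, beta), g in table.items():
    passed, ratio, monotone = strict_p1(g)
    print(f"m={m} beta={beta}: ratio={ratio:.3f} monotone={monotone} pass={passed}")
```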

P2 (off-cluster subdominance, O5.a): NOT PASSED (strict; 1 of 8 cells fails); confirmed in 7/8. Define $r = T_O^{full}/T_O^{C}$, so that $r \to 1$ iff the observable's $\tau_{int}$ concentrates in the slow cluster $C$. Of the 8 cells with $\gamma_{eff} \leq 0.10$, 7 confirm ($r = 1.001$–$1.094$). The one failure is again the $M=4$ saturation cell (m6-$\beta$3-$M$4, $r = 1.273$, deviation $0.27 > \text{TOL}_{P2} = 0.10$), where A7 itself degrades: the observable mass spreads over $n_{obs}^{rel} = 181$ modes rather than the 2-mode cluster. The half-Sokal convention is used throughout ($T_O = \tau_{agg}/2$).
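
A sketch of the ratio under the half-Sokal convention, assuming the per-mode eigenvalues `lam` and variance shares `w` of a single observable are already in hand (for instance from a spectral readout of a reversible kernel) and that the cluster arrives as a boolean mask; the aggregation over observables behind $\tau_{agg}$ is not reproduced. Every term $w_k(1+\lambda_k)/(1-\lambda_k)$ is non-negative for $\lambda_k \in (-1, 1)$, so $r \geq 1$, and $r = 1$ when the modes outside $C$ carry no weight.

```python
import numpy as np

def half_sokal_T(lam, w, mask=None):
    """T_O = tau / 2 = (1/2) * sum_k w_k (1 + lam_k) / (1 - lam_k), optionally restricted
    to the modes selected by `mask` (e.g. the slow cluster C)."""
    terms = w * (1.0 + lam) / (1.0 - lam)
    if mask is not None:
        terms = terms[mask]
    return 0.5 * terms.sum()

def off_cluster_ratio(lam, w, cluster_mask):
    """r = T_O^full / T_O^C; close to 1 when the observable's tau lives in C."""
    return half_sokal_T(lam, w) / half_sokal_T(lam, w, cluster_mask)

TOL_P2 = 0.10                                   # tolerance quoted in the text
lam = np.array([0.99, 0.5])                     # toy spectrum: one slow mode, one fast mode
w = np.array([0.5, 0.5])                        # equal variance shares (illustrative)
r = off_cluster_ratio(lam, w, np.array([True, False]))
print(r, r - 1.0 <= TOL_P2)
```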

P3 (sweet-spot map, no pass gate). With $\text{ESS} = \gamma_{eff} K / 2$, every primary cell has $\text{ESS}@K{=}1000 \geq 3.17$ (deepest: m5-$\beta$3-$M$2, ESS $\approx 3.2$). So even at the deepest collapse a feasible $K = 1000$ reaches ESS $\approx 3$ ($K \gtrsim \tau_{int}$, marginal), and the $K \gg \tau_{int}$ sweet spot never strictly closes in the frozen grid.
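
One way to see where the ESS convention comes from, assuming $\mathrm{ESS} = K/\tau_{int}$ and an observable carried by a single mode $\lambda = 1 - \gamma$: then $\tau_{int} = (1+\lambda)/(1-\lambda) = (2-\gamma)/\gamma \approx 2/\gamma$, so $\mathrm{ESS} \approx \gamma K/2$, with the approximation tight only for $\gamma \ll 1$. A quick check of the quoted numbers (function names are illustrative):

```python
def tau_int_single_mode(gamma):
    """tau_int of an observable carried entirely by one mode with lambda = 1 - gamma."""
    lam = 1.0 - gamma
    return (1.0 + lam) / (1.0 - lam)            # = (2 - gamma) / gamma, roughly 2 / gamma

def ess(gamma, K):
    """The entry's convention: ESS = gamma_eff * K / 2."""
    return gamma * K / 2.0

K = 1000
for gamma in (0.050, 0.006):                    # gamma_eff values taken from the table
    tau = tau_int_single_mode(gamma)
    print(f"gamma={gamma}: ESS={ess(gamma, K):.2f}  K/tau_int={K / tau:.2f}  tau_int={tau:.1f}")

print("gamma implied by the ESS floor of 3.17:", 2 * 3.17 / K)
```

At $\gamma_{eff} = 0.006$ this gives $\tau_{int} \approx 332$ and ESS $\approx 3$ at $K = 1000$; inverting the reported floor of 3.17 gives $\gamma_{eff} \approx 0.0063$, consistent with the table's rounded 0.006.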

P4 (A2 vs mixing): PASSED. The reversible $P_{sym}$ is slower than the non-reversible $P_{fwd}$ in 16/16 cells ($\tau_{int}$ ratio $1.001$–$1.284 > 1$), monotone-increasing in $M$ at $\beta=1$ ($1.018 \to 1.028 \to 1.044 \to 1.067$), and largest in the cleanest multimodal cell ($\beta=3$, $M=2$, ratio $1.284$, Csize = 1). Reversibilization adds a mixing penalty that tracks the multimodal slow-mode structure, reproducing the exp3 (fast, non-reversible) $\to$ exp4 (slow, reversible) contrast on an exactly characterizable model. Resolvent $\leftrightarrow$ diag cross-check: max relative error $6 \times 10^{-14}$.
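
A sketch of the P4 ratio on a stand-in RBM (hypothetical rank-1 coupling, forward sweep $h \mid v$ then $v \mid h$; it does not reproduce the $1.001$–$1.284$ values). One point worth making explicit: writing $I - P_{fwd} = S + K$ with $S = I - P_{sym}$ and $K$ antisymmetric in $L^2(\pi)$, one has $\langle \bar f, (S+K)^{-1}\bar f\rangle_\pi \leq \langle \bar f, S^{-1}\bar f\rangle_\pi$, so $\tau_{int}(P_{sym}) \geq \tau_{int}(P_{fwd})$ for every observable of an ergodic kernel. The sign of the 16/16 result is therefore expected; the informative part is the size of the penalty and how it tracks $M$.

```python
import itertools
import numpy as np

def rbm_measure(W, beta):
    """Exact pi over joint +-1 states (visible first) plus the v and h spin arrays."""
    m = W.shape[0]
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=2 * m)))
    v, h = S[:, :m], S[:, m:]
    logw = beta * np.einsum("si,ij,sj->s", v, W, h)    # -beta * E with E = -v^T W h
    pi = np.exp(logw - logw.max())
    return pi / pi.sum(), v, h

def block_update(pi, labels):
    """Keep the block encoded by `labels`, resample the rest from pi( . | labels)."""
    same = labels[:, None] == labels[None, :]
    mass = np.bincount(labels, weights=pi)
    return same * pi[None, :] / mass[labels][:, None]

def tau_int_resolvent(P, pi, f):
    """tau_int via the fundamental matrix; valid for non-reversible ergodic P."""
    n = pi.size
    fbar = f - pi @ f
    Zf = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), fbar)
    return 2.0 * (pi @ (fbar * Zf)) / (pi @ fbar**2) - 1.0

def p4_ratio(W, beta, i=0, j=0):
    """tau_int(P_sym) / tau_int(P_fwd) for the coupling observable f_ij = -v_i h_j."""
    m = W.shape[0]
    pi, v, h = rbm_measure(W, beta)
    idx = np.arange(pi.size)
    P_h = block_update(pi, idx >> m)                # resample h | v
    P_v = block_update(pi, idx & ((1 << m) - 1))    # resample v | h
    P_fwd = P_h @ P_v
    P_sym = 0.5 * (P_fwd + P_v @ P_h)               # P_fwd^* = P_v P_h: each block step is pi-self-adjoint
    f = -v[:, i] * h[:, j]
    return tau_int_resolvent(P_sym, pi, f) / tau_int_resolvent(P_fwd, pi, f)

W = np.outer([1.0, -1.0, 1.0], [1.0, 1.0, -1.0]) / 3   # hypothetical M = 1 coupling, m = 3
for beta in (1.0, 3.0):
    print(beta, p4_ratio(W, beta))
```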

Scope and caveats

This is mechanism-confirming, NOT a proof that the DTM checkpoint sits in the plateau. Because $\gamma_{eff}$ here is computed exactly from the spectrum, with no chain and hence no burn-in, the collapse cannot be a burn-in artifact, so the controlled substrate supports reading-(1) over reading-(2). But the honest ceiling holds: $\gamma_{eff}$ bottoms out at $\sim 0.006$ (not $0$), the sweet spot stays marginally open (ESS $\approx 3$), and the A2 penalty is 0.1–28% (ratios $1.001$–$1.284$), not the DTM's unbounded $\tau \propto L$. The substrate is small ($N \leq 14$), controlled and planted, not a trained DTM-MNIST conditional, so this establishes mechanism, not prevalence or depth. Tracking is near-tautological on small families; no $Q_{op} \approx Q_{struct}^{\perp}$ factorization claim is made. The $M$-knob is non-monotone in effective multimodality at small $m$ (capacity saturation); a clean monotone sweep needs larger $m$, a feasible follow-up not run here. Propagation is Risk 5 / Risk 1 risk-ledger sharpening only: the operational claim stays [conjectured], with no tag flip in any outcome.


What this feeds: the literal reading-(1)-vs-(2) verdict on the real DTM chain still needs the at-scale $\tau$-sweep over earlier, less-trained checkpoints (GPU, credit-gated); exp5 sharpens the risk ledger and confirms the exp4 mechanism without reaching its depth.

— fin. —