Exp 5 — Gap vs Multimodality: The Mechanism · Thermodynamic Machine Learning

A zero-compute, exactly-solvable RBM family shows the reversible kernel's observable gap collapsing as multimodality turns on, with reversibilization itself adding a multimodality-tracking slowdown — the mechanism behind exp4's $\tau \propto L$ , computed with no chain and no burn-in.

The question

Exp4 left two readings of its $\tau \propto L$ scaling unresolved: reading-(1), a genuine $\gamma_{eff} \to 0$ plateau, versus reading-(2), merely inadequate burn-in. Both are consistent with a chain that fails to decorrelate at scale. The discriminating move is to remove the chain entirely: build a substrate small enough to exact-diagonalize, read $\gamma_{eff}$ and $\tau_{int}$ off the spectrum and resolvent, and ask whether the collapse survives when there is no burn-in left to blame.

The setup

Substrate: a bipartite RBM with energy $E = -v^{\mathsf{T}} W h$ , planted multimodality of $M$ patterns (the exp2 substrate, the DTM-native block-Gibbs structure). Gradient observables are the couplings $f_{ij} = -v_i h_j$ , which are $\mathbb{Z}_2$ -even.

The grid (frozen at b20b2cc, with the pre-run cluster-anchor correction 13d120f; ran 2026-06-03, laptop CPU, float64, 250 s, no GPU):

P1–P3 (primary): reversible single-site random-scan joint Gibbs, exact-diag, $m \in \{4,5,6\}$ ( $N \in \{8,10,12\}$ ) $\times$ $\beta \in \{0.5,1,2,3\}$ $\times$ $M \in \{1,2,3,4\}$ plus a ferro control = 60 cells; an $m=7$ ( $N=14$ ) top-k scaling check (4 cells, not pass-gated).
P4 (secondary): cross-kernel — non-reversible block $P_{fwd}$ (resolvent $\tau_{int}$ ) vs reversible $P_{sym} = \tfrac{1}{2}(P_{fwd} + P_{fwd}^{*})$ (resolvent and exact-diag), $m \in \{4,6\} \times \beta \in \{1,3\} \times M \in \{1,2,3,4\}$ = 16 cells.

Sanity guards all passed: $\sigma_1 = 1$ to $<10^{-6}$ ; stochastic/ $\pi$ -stationary to $\sim 10^{-16}$ ; reversibility residual $\sim 10^{-3}$ – $10^{-4}$ for $P_{fwd}$ (non-reversible) and $\sim 10^{-17}$ for $P_{sym}$ (reversible). The decisive validation: resolvent $\tau_{int}$ vs exact-diag on $P_{sym}$ agree to $6 \times 10^{-14}$ (Guard b). The slow_obs_overlap $\sim 10^{-20}$ – $10^{-23}$ confirms $\sigma_2$ is $\mathbb{Z}_2$ -odd / observable-orthogonal — which is why the pre-run anchor was corrected from raw $\sigma_2$ to the observable-relevant $C^{\perp}$ .

The result

P1 — gap collapse: NOT PASSED (strict); mechanism confirmed. $\gamma_{eff}$ collapses 1–2 orders as $M$ turns on at $\beta \geq 2$ :

 m, β  | γ_eff(M=1)  M=2    M=3    M=4   | Csize(M=1→4)
 4,3.0 |   0.648   0.050  0.010  0.658  | 78→1→2→26
 5,3.0 |   0.648   0.006  0.025  0.629  | 80→1→2→47
 6,3.0 |   0.647   0.076  0.405  0.076  | 120→1→14→2
 6,2.0 |   0.626   0.194  0.416  0.229  | 120→2→15→3

The deepest is $\gamma_{eff} = 0.006$ at m5- $\beta$ 3- $M$ 2 — a $\approx 100\times$ collapse. It is physical, not a cluster artifact: the raw observable gap $\gamma_{\perp}$ collapses identically (m5- $\beta$ 3: $0.577 \to 0.006 \to 0.018 \to 0.373$ ). Csize reads out the mechanism: $M=1$ gives a cluster $\approx$ all modes (unimodal rank-1 coupling, no slow mode), while $M=2,3$ give Csize=1–2, a clean A7-separated slow cluster.

Strict P1 fails for one well-understood reason: at $M=4$ in small $m$ (rank-4 coupling in 4–6 dims $\approx$ near-full-rank) the planted patterns over-saturate model capacity, basins merge, the slow mode vanishes, and $\gamma_{eff}$ recovers — breaking the registered monotone-in- $M$ criterion ( $\gamma_{eff}(M{=}4)/\gamma_{eff}(M{=}1) = 0.118 > 0.1$ , not monotone). The $m=7$ check corroborates this decisively: with more capacity, $M=4$ stays collapsed ( $\gamma_{eff} = 0.110$ , Csize=2) rather than recovering. The registered alternative outcome (" $\gamma_{eff}$ stays $\Omega(1)$ ") is also rejected.

P2 — off-cluster subdominance (O5.a): NOT PASSED (strict, 1/8); confirmed 7/8. With $r = T_O^{full}/T_O^{C} \to 1$ iff the observable's $\tau_{int}$ concentrates in the slow cluster: of the 8 cells with $\gamma_{eff} \leq 0.10$ , 7 confirm ( $r = 1.001$ – $1.094$ ). The one failure is again the $M=4$ saturation cell (m6- $\beta$ 3- $M$ 4, $r = 1.273$ , dev $0.27 > \text{TOL}_{P2} = 0.10$ ), where A7 itself degrades (observable mass spread over $n_{obs}^{rel} = 181$ modes, not the 2-mode cluster). Half-Sokal convention throughout ( $T_O = \tau_{agg}/2$ ).

P3 — sweet-spot map (no pass gate). With $\text{ESS} = \gamma_{eff} K / 2$ , every primary cell has $\text{ESS}@K{=}1000 \geq 3.17$ (deepest: m5- $\beta$ 3- $M$ 2 $\to$ ESS $\approx 3.2$ ). So even at deepest collapse a feasible $K = 1000$ reaches ESS $\approx 3$ ( $K \gtrsim \tau_{int}$ , marginal); $K \gg \tau_{int}$ never strictly closes in the frozen grid.

P4 — A2-vs-mixing: PASSED. The reversible $P_{sym}$ is slower than the non-reversible $P_{fwd}$ in 16/16 cells ( $\tau_{int}$ ratio $1.001$ – $1.284 > 1$ ), monotone-increasing in $M$ at $\beta=1$ ( $1.018 \to 1.028 \to 1.044 \to 1.067$ ), and largest in the cleanest-multimodal cell ( $\beta=3$ , $M=2$ , ratio $1.284$ , Csize=1). Reversibilization adds a mixing penalty that tracks the multimodal slow-mode structure — reproducing the exp3 (fast, non-reversible) $\to$ exp4 (slow, reversible) contrast on an exactly-characterizable model. Resolvent $\leftrightarrow$ diag cross-check: max rel-err $6 \times 10^{-14}$ .

Scope and caveats

This is mechanism-confirming, NOT a proof the DTM checkpoint sits in the plateau. Because $\gamma_{eff}$ here is exact-from-spectrum with no chain and no burn-in possible, the collapse cannot be a burn-in artifact — so the controlled substrate supports reading-(1) over reading-(2). But the honest ceiling holds: $\gamma_{eff}$ bottoms at $\sim 0.006$ (not $0$ ), the sweet spot stays marginally open (ESS $\approx 3$ ), and the A2 penalty is $1$ – $28\%$ (not the DTM's unbounded $\tau \propto L$ ). The substrate is small ( $N \leq 14$ ), controlled and planted, not a trained DTM-MNIST conditional — so this is mechanism, not prevalence or depth. Tracking is near-tautological on small families; no $Q_{op} \approx Q_{struct}^{\perp}$ factorization claim is made. The $M$ -knob is non-monotone in effective multimodality at small $m$ (capacity saturation); a clean monotone sweep needs larger $m$ , a feasible follow-up not re-run here. Propagation is Risk 5 / Risk 1 risk-ledger sharpening only — the operational claim stays [conjectured], with no tag flip in any outcome.

What this feeds: the literal reading-(1)-vs-(2) verdict on the real DTM chain still needs the at-scale earlier/less-trained-checkpoint $\tau$ -sweep (GPU, credit-gated); exp5 sharpens the risk ledger and confirms the experiments/exp4 mechanism without reaching its depth.