Thermodynamic Machine Learning · MMXXVI
Experiment · 20.VI.MMXXVI · Read 4 min

Exp 19 — Hotter Top: Decorrelates, Then the Cost Wall

Entry 24

A hotter top cures the temperature obstruction exp18 left open, but the cheapest decorrelating ladder costs more rungs than we are allowed — so the wall moves from overlap to thermodynamic length.

This is the complete technical record for experiments/exp19-hotter-top-first/. Here we keep the grid, the gates, and the claim-status discipline. Outcome HOT-TOP-DECORRELATES-COSTWALL. MEASURE-ONLY — no tag moves.

The question

exp18 left two suspects for why a reversible parallel-tempering ladder would not mix the trained DTM: a hot end that did not decorrelate, and mis-placement of the rungs. exp19 asks both at once on the corrected (trained-weight-refreshed) 60_12, $t=200$ MNIST DTM at INPUT_IDX=0, single-input conditional $\pi_\theta(\cdot\,|\,x_0)$: does a hotter top ($\beta_{top}\to 0$, i.e. $\alpha_{top}\to 0$) make the single-replica reversible 4-block-Gibbs kernel decorrelate ($\tau \le \tau_{TARGET}=25$), and if so, can an equal-acceptance ladder span the decorrelating regime within the conferred budget of $R_{MAX}=96$ rungs?

The setup

Two analytic-plus-empirical frontiers over a 10-point $\alpha_{top}$ grid floored at $0.01$. (a) A decorrelation frontier: single-replica integrated autocorrelation $\tau(\alpha_{top})$ at flat $L_{LADDER}=2000$ (Sokal-resolved throughout). (b) An analytic cost frontier: $R^*(\alpha_{top}) = 1 + \mathrm{round}\big(\beta \cdot \int_{\alpha_{top}}^{1}\sqrt{C}\,d\alpha \,/\, \delta^*\big)$ with $\delta^*=1.683$, the rung count an equal-acceptance ladder needs to span $[\alpha_{top}, 1]$. Gated before any scored compute, all PASS: patch_live=True with the A2 self-adjoint cert PASS ($\Pi$-reversible to $<10^{-10}$); provenance proven (opt_counts=[12200,12200] $=200\times61$, weights_hash $\ne$ init_hash); and the trained-weight refresh guard that invalidated exp15/16/17 now PROVEN clean (constructor_was_stale=True, refreshed_vs_trained=0.0, refresh_ok=True), so this is a valid trained read, not the stale-INIT bug. $\beta_{target}$ MEASURED at runtime $=1.0$. Ran on a rented H200, SSH-driven, $1.194$ of the $4.0$ GPU-h cap.
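
The cost frontier in (b) is easy to check by hand. The following is a minimal sketch, assuming $\sqrt{C}$ has been sampled on an ascending $\alpha$ grid and is integrated with the trapezoid rule (the entry does not say which quadrature the experiment used). The names `thermo_length` and `rung_count` are illustrative and not taken from the experiment code; the defaults mirror the $\beta_{target}=1.0$ and $\delta^*=1.683$ quoted above.

```python
def thermo_length(alphas, sqrt_c):
    """Trapezoid-rule estimate of the integral of sqrt(C) d(alpha).

    alphas: strictly ascending grid from alpha_top to 1 (assumed layout).
    sqrt_c: sqrt(C) sampled at each grid point.
    """
    if len(alphas) != len(sqrt_c) or len(alphas) < 2:
        raise ValueError("need matching grids with at least two points")
    total = 0.0
    for i in range(len(alphas) - 1):
        width = alphas[i + 1] - alphas[i]
        if width <= 0:
            raise ValueError("alphas must be strictly ascending")
        total += 0.5 * (sqrt_c[i] + sqrt_c[i + 1]) * width
    return total


def rung_count(length, beta=1.0, delta_star=1.683):
    """R* = 1 + round(beta * length / delta_star), the cost-frontier formula."""
    return 1 + round(beta * length / delta_star)


# Feeding in the integral this entry reports for the span [0.02, 1]:
print(rung_count(227.5))  # 136
```

Python's `round` rounds exact halves to the nearest even integer; none of the reported integrals land on a half, so that convention does not affect the rung counts here.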

The result

A hotter top DOES cure the decorrelation: it is a temperature effect, not local-kernel non-ergodicity. $\tau(\alpha)$ is non-monotone, rising to $\sim 330$ near $\alpha=0.18$ before collapsing:

| $\alpha_{top}$ | 0.5 | 0.18 | 0.08 | 0.03 | 0.02 | 0.01 |
|---|---|---|---|---|---|---|
| $\tau$ | 305.5 | 329.5 | 250.3 | 79.3 | 15.8 | 2.2 |
| $\tau\le 25$? | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ |
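
For readers who want to reproduce a $\tau$ of this kind on their own chain, here is a minimal sketch of a Sokal-windowed integrated autocorrelation estimator. It assumes the convention $\tau = 1 + 2\sum_{k\ge 1}\rho(k)$, a scalar observable per sweep, and a window constant $c=5$; none of these choices is confirmed by the entry, and the function names are hypothetical.

```python
import random


def integrated_autocorr_time(x, c=5.0):
    """Return (tau, window) with tau = 1 + 2 * sum_{k=1..M} rho(k).

    M is the first lag satisfying M >= c * tau(M) (Sokal's automatic window).
    window is None if the chain is too short for the window to close.
    """
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    var = sum(v * v for v in d) / n
    if var == 0.0:
        raise ValueError("constant series has no defined autocorrelation")
    tau = 1.0
    for lag in range(1, n // 2):
        # Biased (divide-by-n) autocovariance estimate, normalised by the variance.
        rho = sum(d[i] * d[i + lag] for i in range(n - lag)) / (n * var)
        tau += 2.0 * rho
        if lag >= c * tau:
            return tau, lag
    return tau, None


def ar1_chain(n, phi, seed=0):
    """Synthetic AR(1) chain; its exact tau is (1 + phi) / (1 - phi)."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


tau, window = integrated_autocorr_time(ar1_chain(40000, 0.9))
print(tau, window)
```

With $\phi=0.9$ the exact value is $19$, so the estimate should land close to that. A returned window of `None` is the signal that the chain is too short to resolve $\tau$, which is one plausible reading of the "Sokal-resolved" condition.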

So the decorrelating set is $S=\{0.02, 0.01\}$ and the coldest decorrelating top (the cheapest span) is $\alpha_{top}^*=0.02$. The anchor $\tau(0.5)=305.5$ reproduces exp18's $\tau_{hot}\approx 318$, so the regression is intact. But the analytic cost frontier does not overlap it:

| $\alpha_{top}$ | 0.5 | 0.25 | 0.18 | 0.05 | 0.02 | 0.01 |
|---|---|---|---|---|---|---|
| $\int\sqrt{C}$ | 105.5 | 156.5 | 172.7 | 217.7 | 227.5 | 231.0 |
| $R^*$ | 64 | 94 | 104 | 130 | 136 | 138 |

Decorrelation requires $\alpha_{top}\le 0.02$; tractability requires $\alpha_{top}\gtrsim 0.25$ ($R^*(0.25)=94$). There is no $\alpha_{top}$ that BOTH decorrelates AND has $R^*\le 96$: $\min_{\alpha\in S} R^* = R^*(0.02)=136 > 96$. The thermodynamic length $\beta\cdot\int\sqrt{C}\approx 227$ over the span $[0.02, 1.0]$ is the wall. Cost-aware decisive STOP at Stage A: no Stage-B ladder built, the $R^2$-faithful round-trip cert never approached (budget_hit=False, clean completion).
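
The Stage-A decision reduces to intersecting two sets. A minimal sketch of that routing, using only the values tabulated in this entry; the function name `route_stage_a` and the returned dictionary layout are hypothetical and not the experiment's actual gate code:

```python
TAU_TARGET = 25.0
R_MAX = 96

# Values copied from the two result tables of this entry.
tau_by_alpha = {0.5: 305.5, 0.18: 329.5, 0.08: 250.3, 0.03: 79.3, 0.02: 15.8, 0.01: 2.2}
rstar_by_alpha = {0.5: 64, 0.25: 94, 0.18: 104, 0.05: 130, 0.02: 136, 0.01: 138}


def route_stage_a(tau_by_alpha, rstar_by_alpha, tau_target, r_max):
    """Find the decorrelating set S, its cheapest member, and whether it fits r_max."""
    decorrelating = sorted(a for a, t in tau_by_alpha.items() if t <= tau_target)
    # Only alphas that also have a tabulated R* can be costed.
    costed = [a for a in decorrelating if a in rstar_by_alpha]
    if not costed:
        return {"S": decorrelating, "alpha_star": None, "min_rstar": None, "feasible": False}
    alpha_star = min(costed, key=lambda a: rstar_by_alpha[a])
    min_rstar = rstar_by_alpha[alpha_star]
    return {
        "S": decorrelating,
        "alpha_star": alpha_star,
        "min_rstar": min_rstar,
        "feasible": min_rstar <= r_max,
    }


print(route_stage_a(tau_by_alpha, rstar_by_alpha, TAU_TARGET, R_MAX))
```

On the tabulated numbers this gives a nonempty $S$ with its cheapest member at $\alpha_{top}=0.02$, $R^*=136$, and `feasible` false, which is the cost-wall STOP described above.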

This matched the pre-commitment: $S$ nonempty $\wedge$ $\min R^*=136>R_{MAX}=96$ routed correctly to a Stage-A STOP. The prediction-precision check confirmed direction (sublinear $R^*$, $>64$, costwall) but found the magnitude under-estimated: the crossing landed deeper ($0.02$, not the forecast $[0.02,0.08]$ band edge) and on-DTM $C$ ran higher than the exp18 extrapolation, so the cost wall is worse than forecast ($R^*\approx 30\text{–}45\%$ above the predicted $\sim 90\text{–}100$).

Scope and caveats

Config-scoped: 60_12, SEED=0, $t=200$, INPUT_IDX=0, single-input conditional, the reversible 4-block-Gibbs kernel, the equal-acceptance ladder family, the 10-point grid floored at $0.01$. HOT-TOP-DECORRELATES-COSTWALL means "this kernel on this landscape needs $R^*=136$ for a single-span ladder," never "reversible PT cannot mix DTMs," and never a fundamentality verdict about $Q_{op}\approx Q_{struct}^{\perp}$. $R^*(\alpha_{top})$ is a Gaussian-overlap LOWER BOUND for a multimodal landscape, so the true rung count could be higher; it corroborates the wall, it never rules mixing in. The grid floor at $0.01$ gives "deepest tested," not "provably deepest." Burn-in init (no exact $\pi_r$) means a STOP carries no robustness asymmetry. This sharpens spine Risk 5 (A2$\leftrightarrow$A6 antagonism), quantitatively: the reversible/self-adjoint kernel the factorization theorem requires (A2) needs $\alpha\approx 0.02$ to decorrelate, but a theorem-compatible tractable ladder exists only down to $\alpha\approx 0.25$, and $R^*=136$ vs $R_{MAX}=96$ is the measured size of that gap on the real trained DTM. The conditional factorization stays [solid]; the operational claim stays [conjectured] ($K\gg\tau_{int}$, A7 open). No tag moves.


What this feeds: this is the keystone behind the A2$\leftrightarrow$A6 cost-wall framing. The obstruction is relocated, not removed, so the indicated pivot is to switch families (hierarchical PT / simulated tempering / population annealing to amortize the 136-rung length), or to fix the measurement with a $\tau_{int}$-robust $Q_{op}$ estimator, rather than build a still-larger single-span ladder or raise the cap again.

— fin. —