Thermodynamic Machine Learning · MMXXVI
Experiment · 15.VI.MMXXVI · Read 5 min

Exp 12 — PT vs P_sym: Outcome F (Measurement-Limited)

Entry 16

Parallel tempering visibly accelerated the cold replica, yet the measurement could not see far enough to turn that acceleration into a verdict — so exp12 returns Outcome F, not a win.

This is the complete technical record for experiments/exp12-pt-vs-psym/. Here we keep the gates, the windows, and the claim-status discipline. Ran 2026-06-15 on a laptop CPU, float64, ~274 s wall. Gates frozen at pre-commitment 46306c2 (gate-1) and runner a976d80 (gate-2). Reproduce: P0_MODE=full HOST_RAM_GB=8 python3 pt_calibrate.py. MEASURE-ONLY.

The question

exp5 fixed the baseline: on cell C-deep the exact slow-mode timescale of the bare reversible kernel P_sym is $T_O(P_{sym}) = 1409.42$, with $\tau_{max}(P_{sym}) = 112.48$. Does a reversible parallel-tempering mixture $K = \tfrac{1}{2}(LS + SL)$ (local sweeps $L$ composed with replica swaps $S$, symmetrized so detailed balance holds) cut that slow cluster relative to bare P_sym? This is an operational-tier feasibility precursor, asking whether PT moves the mixing-speed axis at all on this substrate.
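Why the symmetrized mixture is reversible can be checked in two lines. This is a sketch assuming that $L$ and $S$ are each self-adjoint in $L^2(\pi)$ for the same stationary distribution $\pi$; the inner-product notation $\langle\cdot,\cdot\rangle_\pi$ is introduced here for illustration and is not taken from the experiment code.

$$
\langle f,\, LS\,g\rangle_\pi = \langle Lf,\, S\,g\rangle_\pi = \langle SLf,\, g\rangle_\pi
\;\Longrightarrow\; (LS)^* = SL,
$$

$$
K^* = \tfrac{1}{2}\big((LS)^* + (SL)^*\big) = \tfrac{1}{2}(SL + LS) = K .
$$

Neither $LS$ nor $SL$ needs to be self-adjoint on its own; the equal-weight average of the two orderings is what restores detailed balance.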

The setup

A frozen 8-cell grid over C-deep (R4/R6/R8, primary and convex schedules), plus a C-uni diagnostic and an optional C-deep2 cell. P1 is the reversibility gate (selfadjoint_check_pt.py): at $m=2$, $R=3$ every reversible kernel ($L$, swaps, $S_{mix}$, $S_{mix}^{n_s}$, $K$, $K_{PT}$, and the actual $n_s = R{-}1$ super-sweep $K_{actual}$) is self-adjoint to $< 10^{-10}$; the deterministic control $K_{det} = 3.96\times10^{-2} > 10^{-4}$ shows the test has teeth; swap-formula cross-check $\sim 10^{-16}$. Formula-confirmed PASS, frozen, not re-run in the verdict.
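The shape of such a gate can be sketched on a toy chain. The following is a minimal illustration, not the contents of selfadjoint_check_pt.py: the 3-state target `pi`, the two Metropolis kernels standing in for $L$ and $S$, the cyclic-shift control, and every function name are assumptions made for the example. Only the $10^{-10}$ and $10^{-4}$ thresholds come from the text.

```python
# Toy reversibility check on 3 states (illustrative; not selfadjoint_check_pt.py).

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def metropolis(pi, proposal):
    """Metropolis kernel for target pi built from a symmetric proposal matrix."""
    n = len(pi)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i][j] = proposal[i][j] * min(1.0, pi[j] / pi[i])
        K[i][i] = 1.0 - sum(K[i])  # rejected or unproposed mass stays put
    return K

def selfadjoint_residual(K, pi):
    """Largest detailed-balance violation, max |pi_i K_ij - pi_j K_ji|."""
    n = len(pi)
    return max(abs(pi[i] * K[i][j] - pi[j] * K[j][i])
               for i in range(n) for j in range(n))

pi = [0.5, 0.3, 0.2]  # assumed toy target distribution
L = metropolis(pi, [[0, 0.5, 0.5], [0.5, 0, 0.5], [0.5, 0.5, 0]])  # stand-in local sweep
S = metropolis(pi, [[0, 1, 0], [1, 0, 0], [0, 0, 0]])              # stand-in swap move

LS, SL = matmul(L, S), matmul(S, L)
K = [[0.5 * (a + b) for a, b in zip(row_ls, row_sl)]
     for row_ls, row_sl in zip(LS, SL)]
K_det = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # deterministic cyclic shift as control

res_K = selfadjoint_residual(K, pi)        # symmetrized mixture
res_LS = selfadjoint_residual(LS, pi)      # one ordering only
res_det = selfadjoint_residual(K_det, pi)  # control that should fail

print(f"K: {res_K:.2e}  LS: {res_LS:.2e}  K_det: {res_det:.2e}")
```

On this toy chain the symmetrized $K$ passes at round-off level while the one-sided product and the deterministic control both land well above $10^{-4}$, which is the sense in which a control gives the test teeth.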

The verdict pipeline sets $\hat{\tau}_{PT}$ by a doubling rule, then estimates the operational timescale $T_O^{PT}$ independently from each short operational window at $\{5, 20, 50\}\cdot\hat{\tau}$. The A6 adequacy gate must clear before any speedup (P2) is read: in particular the F1 / no-upward-divergence guard requires $T_O^{PT}$ to have stabilized by the $50\cdot\hat{\tau}$ window.

The result

Outcome F on all 8 cells (A6_FAIL). PT did what it was supposed to on the descriptive axis: $\tau_{max}$ cut 14–22× on the primary cells (e.g. R6 21×, R8 22×), and projected bulk VAC rates $\gamma_{bulk}^{PT}$ landed at 0.23–0.45 (primary) / 0.06–0.10 (convex). But $T_O^{PT}$ kept rising across windows on every cell:

| cell | $\hat{\tau}_{PT}$ | $\tau_{max}$ cut | $T_O^{PT}$ @ $\{5,20,50\}\hat{\tau}$ | A6 |
|---|---|---|---|---|
| C-deep R4 primary | 8.0 | 14× | 32 → 68 → 86 | A6_FAIL |
| C-deep R6 primary | 5.3 | 21× | 14 → 32 → 51 | A6_FAIL |
| C-deep R8 primary | 5.0 | 22× | 14 → 28 → 44 | A6_FAIL |
| C-deep R4 convex | 21.5 | 5× | 88 → 215 → 271 | A6_FAIL |
| C-deep R6 convex | 25.5 | 4× | 77 → 206 → 264 | A6_FAIL |
| C-deep R8 convex | 31.2 | 4× | 121 → 243 → 310 | A6_FAIL |
| C-uni R4 (diag.) | 5.8 | 19× | 29 → 56 → 70 | A6_FAIL |
| C-deep2 R4 (opt.) | 4.1 | 22× | 14 → 23 → 26 | A6_FAIL |
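The window-to-window rise in the table can be tabulated with a few lines. This is a sketch over the rounded values transcribed from the table, not output from pt_calibrate.py; the dictionary and function names are hypothetical, and the multiplicative growth factor is one choice of summary among several.

```python
# T_O^PT at the {5, 20, 50}·tau_hat windows, transcribed from the table above.
T_O_PT = {
    "C-deep R4 primary": (32, 68, 86),
    "C-deep R6 primary": (14, 32, 51),
    "C-deep R8 primary": (14, 28, 44),
    "C-deep R4 convex": (88, 215, 271),
    "C-deep R6 convex": (77, 206, 264),
    "C-deep R8 convex": (121, 243, 310),
    "C-uni R4 (diag.)": (29, 56, 70),
    "C-deep2 R4 (opt.)": (14, 23, 26),
}

def window_growth(t5, t20, t50):
    """Multiplicative growth over the 5->20 and the 20->50 window steps."""
    return t20 / t5, t50 / t20

growth = {cell: window_growth(*vals) for cell, vals in T_O_PT.items()}

# Still rising at the last window: second factor above 1 on every cell.
still_rising = all(g2 > 1.0 for _, g2 in growth.values())
# Decelerating: second factor smaller than the first on every cell.
decelerating = all(g2 < g1 for g1, g2 in growth.values())

for cell, (g1, g2) in growth.items():
    print(f"{cell:20s} x{g1:.2f} then x{g2:.2f}")
print("still rising:", still_rising, "| decelerating:", decelerating)
```

Both flags come out true on the tabulated values: every cell is still climbing at $50\cdot\hat{\tau}$, and every cell climbs by a smaller factor over the second step than over the first.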

Because the binding F1 guard fired, map_outcome mapped every cell A6_FAIL → F (pt_calibrate.py:908). Crucially, A6_FAIL is not UNRESOLVED: $\hat{\tau}$ stabilized and the $\{20,50\}\cdot\hat{\tau}$ windows completed; only the verdict-$T_O$ stabilization sub-check failed.

The speedups are therefore descriptive (non-verdict): they are gated behind the failed A6 and read at the largest still-unstabilized $50\cdot\hat{\tau}$ window, so they are not valid P2 verdicts. P4 (Q_op tracks the predictor) passed within the $c=3$ band on every cell with the ratio trending toward 1, but it too is gated behind A6 and read as descriptive only. The A7 multimodal calibration (m=3, M=2, R=2; extended 4096) validated the dominant VAC gap: sampled-vs-exact $\gamma$ ratio 0.997.

Scope and caveats

This is the honest core: exp12 does not distinguish two readings, and says so. The verdict-$T_O$ is re-estimated from each short window (pt_calibrate.py:774) whose longest span is 82–400 steps, whereas the doubling probe that set $\hat{\tau}$ ran 1,000–2,000 steps. Because the longer probe trajectories are not reused for verdict $T_O$, the half-Sokal autocorrelation sum at the short windows is plausibly truncated low and still rising, rather than $K \gg \tau_{int}$ failing outright. So the open fork is (i) genuine residual long-memory in the cold-replica observables ($\gamma_{bulk}^{PT}$ reading below the bare-P_sym 0.78), versus (ii) window-dependent $T_O$ estimation. exp12 does not resolve which. The evidence that $T_O$ is approaching a plateau rather than diverging: in ratio terms the $20\to50\cdot\hat{\tau}$ growth is smaller than the $5\to20\cdot\hat{\tau}$ growth on every cell (R6 primary, for example, grows ×2.3 and then ×1.6, although its absolute increment does not shrink), the $50\cdot\hat{\tau}$ F1 checks largely pass (R6-primary excepted), and Q_op/predictor trends toward 1.
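Reading (ii) can be illustrated on a toy process where the answer is known. The sketch below assumes a single-exponential autocorrelation $\rho(t) = \rho^t$ and the convention $\tau_{int}(W) = \tfrac{1}{2} + \sum_{t=1}^{W}\rho(t)$ for the truncated sum; the value of `rho`, the window lengths, and the function name are hypothetical and are not taken from pt_calibrate.py, whose estimator may differ in detail.

```python
def truncated_tau_int(rho, window):
    """0.5 + sum_{t=1}^{window} rho**t for an autocorrelation of the form rho**t."""
    return 0.5 + sum(rho ** t for t in range(1, window + 1))

rho = 0.99                          # assumed toy lag-1 autocorrelation
tau_full = 0.5 + rho / (1.0 - rho)  # untruncated limit of the same sum
windows = (50, 200, 500)            # hypothetical truncation lengths, in steps

estimates = [truncated_tau_int(rho, w) for w in windows]
increments = [b - a for a, b in zip(estimates, estimates[1:])]

for w, est in zip(windows, estimates):
    print(f"window {w:4d}: tau_int estimate {est:6.2f} (limit {tau_full:.2f})")
```

Each truncated estimate sits below the limit and rises with the window, with shrinking increments. That is the signature a finite, short-window estimator would produce even with no long memory at all, which is why rising-but-decelerating $T_O$ alone cannot separate reading (i) from reading (ii).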

Two corrections are recorded without unfreezing the pre-commitment. First, the cache-aware FSE cost is $work_{PT} = 1.5R$ (not the §7 illustration's $\approx R$), so the raw $T_O$-reduction demand at the bar is $\approx 12\times/18\times/24\times$ at R=4/6/8; the binding bar, compute-normalized speedup $\ge 2.0$, is unchanged, and the correction does not affect the F verdict, which A6 sets first. Second, the slow-cluster acceleration ratio $\gamma_{eff}^{PT}/\gamma_{eff}(P_{sym}) \approx 113\times$ at R4 is a coarse projected proxy computed with unit VAC-mode weights (pt_calibrate.py:1031), not evidence that PT "solves the mixing-speed axis."
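The 12×/18×/24× figures follow from a one-line rearrangement. This is a sketch under the assumption, not spelled out in this entry, that the compute-normalized speedup is the raw $T_O$ reduction divided by $work_{PT}$, with the bare kernel's work normalized to 1:

$$
\frac{T_O(P_{sym}) / T_O^{PT}}{work_{PT}} \ge 2.0
\;\Longrightarrow\;
\frac{T_O(P_{sym})}{T_O^{PT}} \ge 2.0 \times 1.5R = 3R = 12,\ 18,\ 24 \quad (R = 4,\ 6,\ 8).
$$

Under the same assumption, a cost of $\approx R$ would have put the raw demand at roughly $2R$, so the correction raises the raw requirement by a factor of 1.5 while leaving the normalized bar where it was.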

No tag moves. No GPU authorization, no Route-C verdict, no fundamentality claim. The conditional factorization stays [solid], the operational tier stays [conjectured]; the central-spine A2↔A6 / Risk-ledger interpretation is HELD pending separate explicit conferral. This entry records the measured outcome only.


What this feeds: a future design must size verdict-$T_O$ from the longer probe-length trajectories (or from a $T_O$-convergence criterion rather than $\hat{\tau}$-stability) before PT's $14$–$22\times$ $\tau_{max}$ cut can be read as a clean P2 verdict; Route C remains a reasonable candidate but exp12 does not establish that the A6 obstruction has moved to a new intrinsic limit. The lineage continues from exp11's λ-sweep on the objective axis and exp5's spectral baseline on the mixing axis.

— fin. —