Exp 12 — PT vs P_sym: Outcome F (Measurement-Limited)

Parallel tempering visibly accelerated the cold replica, yet the measurement could not see far enough to turn that acceleration into a verdict — so exp12 returns Outcome F, not a win.

This is the complete technical record for experiments/exp12-pt-vs-psym/. Here we keep the gates, the windows, and the claim-status discipline. Ran 2026-06-15 on a laptop CPU, float64, ~274 s wall. Gates frozen at pre-commitment 46306c2 (gate-1) and runner a976d80 (gate-2). Reproduce: P0_MODE=full HOST_RAM_GB=8 python3 pt_calibrate.py — MEASURE-ONLY.

The question

exp5 fixed the baseline: on cell C-deep the exact slow-mode timescale of the bare reversible kernel P_sym is $T_O(P_{sym}) = 1409.42$, with $\tau_{max}(P_{sym}) = 112.48$. Does a reversible parallel-tempering mixture $K = \tfrac{1}{2}(LS + SL)$ (local sweeps $L$ composed with replica swaps $S$, symmetrized so detailed balance holds) cut that slow cluster relative to bare P_sym? This is an operational-tier feasibility precursor, asking whether PT moves the mixing-speed axis at all on this substrate.

The setup

A frozen 8-cell grid over C-deep (R4/R6/R8, primary and convex schedules), plus a C-uni diagnostic and an optional C-deep2 cell. P1 is the reversibility gate (selfadjoint_check_pt.py): at $m=2$, $R=3$ every reversible kernel ($L$, swaps, $S_{mix}$, $S_{mix}^{n_s}$, $K$, $K_{PT}$, and the actual $n_s = R{-}1$ super-sweep $K_{actual}$) is self-adjoint to $< 10^{-10}$; the deterministic control $K_{det}$ registers $3.96\times10^{-2} > 10^{-4}$, which shows the test has teeth; the swap-formula cross-check agrees to $\sim 10^{-16}$. Formula-confirmed PASS, frozen, not re-run in the verdict.
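The kind of check P1 performs can be sketched on a toy chain. This is a minimal illustration, not selfadjoint_check_pt.py: the 4-state target, the two proposals, and the helper names `metropolis_kernel` and `selfadjoint_defect` are all assumptions made here, and `S` is a generic second reversible kernel rather than an actual replica swap. The unsymmetrized product $LS$ plays the role of a control that should fail; it is not the document's $K_{det}$.

```python
import numpy as np

def metropolis_kernel(pi, proposal):
    """Metropolis kernel for target pi with a symmetric proposal matrix."""
    n = len(pi)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i, j] = proposal[i, j] * min(1.0, pi[j] / pi[i])
        P[i, i] = 1.0 - P[i].sum()  # rejected mass stays on the diagonal
    return P

def selfadjoint_defect(P, pi):
    """max |pi_i P_ij - pi_j P_ji|; zero exactly when detailed balance holds."""
    flow = pi[:, None] * P
    return np.abs(flow - flow.T).max()

# Hypothetical toy target and proposals (not from the experiment).
pi = np.array([0.1, 0.2, 0.3, 0.4])
ring = np.zeros((4, 4))
for i in range(4):
    ring[i, (i + 1) % 4] = ring[i, (i - 1) % 4] = 0.5
pair = np.zeros((4, 4))
pair[0, 2] = pair[2, 0] = pair[1, 3] = pair[3, 1] = 1.0

L = metropolis_kernel(pi, ring)   # stand-in for local sweeps
S = metropolis_kernel(pi, pair)   # stand-in for a second reversible move
K = 0.5 * (L @ S + S @ L)         # symmetrized mixture

defect_K = selfadjoint_defect(K, pi)
defect_LS = selfadjoint_defect(L @ S, pi)
print(f"K = (LS+SL)/2 defect: {defect_K:.2e}")
print(f"LS alone defect:      {defect_LS:.2e}")
```

The symmetrization works because $\pi_i (LS)_{ij} = \pi_j (SL)_{ji}$ whenever $L$ and $S$ are each reversible, so the two halves of $K$ are mirror images of each other under the $\pi$-weighted transpose.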

The verdict pipeline sets $\hat{\tau}_{PT}$ by a doubling rule, then estimates the operational timescale $T_O^{PT}$ independently from each short operational window at $\{5, 20, 50\}\cdot\hat{\tau}$. The A6 adequacy gate must clear before any speedup (P2) is read: in particular the F1 / no-upward-divergence guard requires $T_O^{PT}$ to have stabilized by the $50\cdot\hat{\tau}$ window.

The result

Outcome F on all 8 cells (A6_FAIL). PT did what it was supposed to on the descriptive axis: $\tau_{max}$ cut 14–22× on the primary cells (e.g. R6 21×, R8 22×), and projected bulk VAC rates $\gamma_{bulk}^{PT}$ landed at 0.23–0.45 (primary) / 0.06–0.10 (convex). But $T_O^{PT}$ kept rising across windows on every cell:

| cell | $\hat{\tau}_{PT}$ | $\tau_{max}$ cut | $T_O^{PT}$ @ $\{5,20,50\}\hat{\tau}$ | A6 |
|---|---|---|---|---|
| C-deep R4 primary | 8.0 | 14× | 32 → 68 → 86 | A6_FAIL |
| C-deep R6 primary | 5.3 | 21× | 14 → 32 → 51 | A6_FAIL |
| C-deep R8 primary | 5.0 | 22× | 14 → 28 → 44 | A6_FAIL |
| C-deep R4 convex | 21.5 | 5× | 88 → 215 → 271 | A6_FAIL |
| C-deep R6 convex | 25.5 | 4× | 77 → 206 → 264 | A6_FAIL |
| C-deep R8 convex | 31.2 | 4× | 121 → 243 → 310 | A6_FAIL |
| C-uni R4 (diag.) | 5.8 | 19× | 29 → 56 → 70 | A6_FAIL |
| C-deep2 R4 (opt.) | 4.1 | 22× | 14 → 23 → 26 | A6_FAIL |
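The window-to-window growth in the table can be tabulated with a few lines of Python. This is a descriptive sketch only: the values are the rounded table entries, the dictionary and the helper `growth_factors` are names introduced here for illustration, and nothing in it reproduces the F1 guard or its tolerance.

```python
# T_O^PT at the {5, 20, 50} * tau_hat windows, copied from the table (rounded).
T_O_PT = {
    "C-deep R4 primary": (32, 68, 86),
    "C-deep R6 primary": (14, 32, 51),
    "C-deep R8 primary": (14, 28, 44),
    "C-deep R4 convex": (88, 215, 271),
    "C-deep R6 convex": (77, 206, 264),
    "C-deep R8 convex": (121, 243, 310),
    "C-uni R4 (diag.)": (29, 56, 70),
    "C-deep2 R4 (opt.)": (14, 23, 26),
}

def growth_factors(windows):
    """Return (T_O@20 / T_O@5, T_O@50 / T_O@20) for one cell."""
    t5, t20, t50 = windows
    return t20 / t5, t50 / t20

for cell, windows in T_O_PT.items():
    early, late = growth_factors(windows)
    print(f"{cell:18s}  5->20: x{early:.2f}   20->50: x{late:.2f}")
```

With these rounded values every sequence is strictly increasing, and the 20→50 factor is below the 5→20 factor on every cell. Neither observation changes the A6_FAIL status.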

Because the binding F1 guard fired, map_outcome mapped every cell A6_FAIL → F (pt_calibrate.py:908). Crucially, A6_FAIL is not UNRESOLVED: $\hat{\tau}$ stabilized and the $\{20,50\}\cdot\hat{\tau}$ windows completed; only the verdict-$T_O$ stabilization sub-check failed.

The speedups are therefore descriptive (non-verdict): they are gated behind the failed A6 and read at the largest still-unstabilized $50\cdot\hat{\tau}$ window, so they are not valid P2 verdicts. P4 (Q_op tracks the predictor) passed within the $c=3$ band on every cell with the ratio trending toward 1 — but, again, gated behind A6 and read as descriptive only. The A7 multimodal calibration (m=3, M=2, R=2; extended 4096) validated the dominant VAC gap: sampled-vs-exact $\gamma$ ratio 0.997.

Scope and caveats

This is the honest core: exp12 does not distinguish two readings, and says so. The verdict-$T_O$ is re-estimated from each short window (pt_calibrate.py:774) whose longest span is 82–400 steps, whereas the doubling probe that set $\hat{\tau}$ ran 1,000–2,000 steps. Because the longer probe trajectories are not reused for verdict $T_O$, the half-Sokal autocorrelation sum at the short windows is plausibly truncated low and still rising, rather than $K \gg \tau_{int}$ failing outright. So the open fork is (i) genuine residual long-memory in the cold-replica observables ($\gamma_{bulk}^{PT}$ reading below the bare-P_sym 0.78), versus (ii) window-dependent $T_O$ estimation. exp12 does not resolve which. The evidence that $T_O$ is approaching a plateau rather than diverging: the $20\to50\cdot\hat{\tau}$ growth is smaller in relative terms than $5\to20\cdot\hat{\tau}$ on every cell, the $50\cdot\hat{\tau}$ F1 checks largely pass (R6-primary excepted), and Q_op/predictor trends toward 1.
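Reading (ii) can be illustrated with a deliberately simple model. The sketch below assumes a single-exponential autocorrelation $\rho^t$ with a hypothetical decay time of 100 steps and sums it up to a finite window; it is not the half-Sokal estimator in pt_calibrate.py, and the function name and window lengths are chosen here for illustration.

```python
import numpy as np

def truncated_tau_int(rho, window):
    """Truncated integrated autocorrelation time: 1/2 + sum_{t=1..window} rho**t."""
    lags = np.arange(1, window + 1)
    return 0.5 + float(np.sum(rho ** lags))

# Assumption: one exponential mode with decay time 100 steps (hypothetical).
rho = np.exp(-1.0 / 100.0)
plateau = 0.5 + rho / (1.0 - rho)  # infinite-window limit of the same sum

windows = (50, 200, 500, 5000)
values = [truncated_tau_int(rho, w) for w in windows]
for w, v in zip(windows, values):
    print(f"window={w:5d}  truncated sum={v:7.2f}  fraction of plateau={v / plateau:.3f}")
```

In this toy model a window shorter than the decay time recovers well under half of the plateau, and the estimate keeps rising until the window is several decay times long. A rising sequence of this shape is equally what reading (i) would produce over the same windows, which is why the fork stays open.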

Two corrections are recorded without unfreezing the pre-commitment. First, the cache-aware FSE cost is $work_{PT} = 1.5R$ (not the §7 illustration's $\approx R$), so the raw $T_O$-reduction demand at the bar is $\approx 12\times/18\times/24\times$ at R=4/6/8; the binding bar, compute-normalized speedup $\ge 2.0$, is unchanged and does not affect the F verdict, which A6 sets first. Second, the slow-cluster acceleration ratio $\gamma_{eff}^{PT}/\gamma_{eff}(P_{sym}) \approx 113\times$ at R4 is a coarse projected proxy computed with unit VAC-mode weights (pt_calibrate.py:1031), not evidence that PT "solves the mixing-speed axis."
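The demand figures follow from simple arithmetic, sketched below. The sketch assumes that compute-normalized speedup is the raw $T_O$ reduction divided by $work_{PT}$; that definition is inferred here because it reproduces the quoted 12×/18×/24×, and the function names are hypothetical.

```python
COMPUTE_NORMALIZED_BAR = 2.0  # binding bar quoted in the text

def work_pt(R):
    """Cache-aware FSE cost per PT super-sweep, as corrected in the text: 1.5 * R."""
    return 1.5 * R

def required_raw_reduction(R, bar=COMPUTE_NORMALIZED_BAR):
    """Raw T_O reduction needed to meet the bar, assuming speedup = reduction / work."""
    return bar * work_pt(R)

for R in (4, 6, 8):
    corrected = required_raw_reduction(R)
    illustrated = COMPUTE_NORMALIZED_BAR * R  # what work ~ R would have implied
    print(f"R={R}: demand {corrected:.0f}x (with work ~ R it would be ~{illustrated:.0f}x)")
```

Under that assumption the correction raises the raw demand by a factor of 1.5 at every R while leaving the bar itself at 2.0.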

No tag moves. No GPU authorization, no Route-C verdict, no fundamentality claim. The conditional factorization stays [solid], the operational tier stays [conjectured]; the central-spine A2↔A6 / Risk-ledger interpretation is HELD pending separate explicit conferral. This entry records the measured outcome only.

What this feeds: a future design must size verdict-$T_O$ from the longer probe-length trajectories (or from a $T_O$-convergence criterion rather than $\hat{\tau}$-stability) before PT's $14$–$22\times$ $\tau_{max}$ cut can be read as a clean P2 verdict; Route C remains a reasonable candidate but exp12 does not establish that the A6 obstruction has moved to a new intrinsic limit. The lineage continues from exp11's λ-sweep on the objective axis and exp5's spectral baseline on the mixing axis.