Exp 13 — Measurement Repair: T_O Is Calibratable · Thermodynamic Machine Learning

Once you estimate the operational timescale from a single converged calibration trajectory instead of from each operational window, the alleged long-memory divergence disappears — and the windows behave exactly as the CLT predicts.

This is the complete technical record for experiments/exp13-measurement-repair/. It repairs the measurement design of experiments/exp12/, which had read an apparent A6_FAIL divergence and floated Route C (residual long-memory) as the explanation.

The question

exp12 estimated the operational timescale $T_O$ (and the asymptotic stationary variance $S_a$) separately on each operational window, then declared an $A6\_FAIL$ "divergence." Two readings were possible. Either the divergence is genuine residual long-memory, meaning the operational windows really are too short, which would justify Route C; or it is an estimation artifact: $T_O$ is window-dependent because the windows are too short to estimate $T_O$ stably, not too short to average over. exp13 was built to settle exactly this by decoupling the calibration of $T_O$ from the operational read.

The setup

exp13 estimates one doubling-stable $T_O$ and stationary variance $S_a$ from a long, exact-$\pi_r$-initialized calibration trajectory, freezes the resulting $S_a^{*}$, and then reads the finite-window adequacy on independent operational windows against that fixed $S_a^{*}$. The doubling-stability criterion is two consecutive trajectory doublings with $|\Delta T_O|/T_O < 0.15$ and $\lVert\Delta S\rVert_1 / \Sigma S < 0.15$. The run reuses the frozen exp12 pt_kernel (a976d80), with the gates frozen at pre-commitment a078483 (gate-1) and runner fe94199 (gate-2). It ran on 2026-06-15 on CPU in float64 and took 309.5 s of wall time. Reproduce with P0_MODE=full HOST_RAM_GB=8 python3 lto_calibrate.py (MEASURE-ONLY, moves no tag).
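The doubling-stability criterion can be sketched as a small check over a sequence of estimates taken at doubling trajectory lengths. This is a minimal sketch, not the code in lto_calibrate.py: the function names, the choice of the newer estimate as the denominator in each relative change, the convention that the reported length is the one at which the second consecutive pass occurs, and the numbers in the example are all assumptions made for illustration.

```python
import numpy as np

def is_doubling_stable(t_o_prev, t_o_next, s_prev, s_next, tol=0.15):
    """One doubling passes if both relative changes fall below tol.

    Assumption: the newer estimate is used as the denominator.
    """
    s_prev = np.asarray(s_prev, dtype=np.float64)
    s_next = np.asarray(s_next, dtype=np.float64)
    rel_t = abs(t_o_next - t_o_prev) / t_o_next           # |dT_O| / T_O
    rel_s = np.abs(s_next - s_prev).sum() / s_next.sum()  # ||dS||_1 / sum(S)
    return bool(rel_t < tol and rel_s < tol)

def stable_length(lengths, t_o, s, tol=0.15, needed=2):
    """Return the first length reached after `needed` consecutive passing
    doublings, or None if the criterion is never met."""
    streak = 0
    for i in range(1, len(lengths)):
        if is_doubling_stable(t_o[i - 1], t_o[i], s[i - 1], s[i], tol):
            streak += 1
            if streak >= needed:
                return lengths[i]
        else:
            streak = 0  # the passes must be consecutive
    return None

# Synthetic values for illustration only (not exp13 data).
lengths = [500, 1000, 2000, 4000]
t_o = [60.0, 80.0, 90.0, 95.0]
s = [[1.0, 2.0], [1.1, 2.1], [1.15, 2.15], [1.16, 2.16]]
print(stable_length(lengths, t_o, s))
```

In this synthetic sequence the first doubling fails on $T_O$ (a 25% change) and the next two pass, so the criterion is met at the last length. Truncating the sequence by one entry leaves only a single pass, and the function returns None.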

The result — two solid findings

$T_O$ is calibratable: Cal-STABLE on all 8 cells. Under exact-$\pi_r$ initialization the doubling criterion was met at stable_L = 8000 (32000 for the R6-convex cell). Converged $T_O^{*}$: primary $R4/R6/R8$ = 97 / 74 / 63; convex = 294 / 327 / 400. So exp12's $A6\_FAIL$ "divergence" was a window-dependent estimation artifact, not genuine residual long-memory, and Route C is not materially justified because the calibration converged.

The operational windows are NOT demonstrated inadequate. With the fixed $S_a^{*}$, the finite-window check $F1 = K\cdot\mathrm{Var}[\bar f_a]/S_a^{*}$ sits in the $[0.667, 1.5]$ band at both $20\hat\tau^{*}$ and $50\hat\tau^{*}$ on all 8 cells: $F1@20\hat\tau^{*} \in [1.01, 1.36]$ and $F1@50\hat\tau^{*} \in [1.07, 1.24]$. The descriptive P4 ratios all sit in the $c=3$ band. The verdict windows behave exactly as the O2 CLT predicts.
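The F1 check can be illustrated on a toy case. The sketch below assumes that $K$ is the window length in samples and that $\mathrm{Var}[\bar f_a]$ is estimated as the sample variance of window means across independent windows; both are readings of the formula rather than details taken from the exp13 runner. The helper names and the i.i.d. Gaussian test data are hypothetical. For i.i.d. unit-variance samples the asymptotic variance is 1, so F1 should land near 1.

```python
import numpy as np

def f1_ratio(window_means, K, s_a_star):
    """F1 = K * Var[f_bar] / S_a*, with the variance taken across windows.

    Assumption: unbiased sample variance (ddof=1) over independent windows.
    """
    var = np.var(np.asarray(window_means, dtype=np.float64), ddof=1)
    return K * var / s_a_star

def in_band(f1, lo=0.667, hi=1.5):
    """Adequacy band quoted in the text."""
    return lo <= f1 <= hi

# Toy data: 2000 independent windows of K = 50 i.i.d. N(0, 1) samples,
# for which the asymptotic variance S_a* is exactly 1.
rng = np.random.default_rng(0)
K = 50
samples = rng.normal(size=(2000, K))
window_means = samples.mean(axis=1)

f1 = f1_ratio(window_means, K, s_a_star=1.0)
print(f"F1 = {f1:.3f}, in band: {in_band(f1)}")
```

The key design point carried over from the text is that `s_a_star` is a fixed input, calibrated once elsewhere, instead of being re-estimated from the same windows whose adequacy is being checked.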

Why the frozen runner still emitted S-ADQ: gate-specification diagnosis. The S-ADQ on all 8 cells reflects two mis-specified adequacy gates, not measured inadequacy. (a) The $5\hat\tau^{*}\!\to\!20\hat\tau^{*}$ no-upward-divergence guard is mis-specified: O2 predicts that $K\cdot\mathrm{Var}[\bar f_a]$ rises toward $S_a^{*}$ from below, so a guard that rejects an upward $5\hat\tau^{*}\!\to\!20\hat\tau^{*}$ movement rejects the expected convergence; the $5\hat\tau^{*}$ point is genuinely pre-asymptotic. (b) The §8 swap/round-trip/OVL AND-gate was wired too broadly: it was applied as a pre-P2 binding gate, so any sub-failure forced S-ADQ. Its intended role is to gate a negative S-C verdict, not to block a compute-normalized positive P2. The §8 sub-failure here was high swap acceptance ($> 0.60$), which is ladder redundancy already penalized by work_PT, not invalid sampling.
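Point (a) can be seen on a toy model. The sketch below uses a unit-variance stationary AR(1) process as a stand-in; it is not the exp13 kernel, and the autocorrelation `phi` and timescale unit `tau` are hypothetical values chosen for illustration. For any stationary process with non-negative autocorrelations $\rho_k$, $K\cdot\mathrm{Var}[\bar f] = 1 + 2\sum_{k=1}^{K-1}(1 - k/K)\rho_k$ increases with $K$ toward its asymptote, so an upward move between a short and a longer window is the expected behavior.

```python
import numpy as np

def scaled_mean_variance(K, phi):
    """K * Var[mean of K samples] for a unit-variance stationary AR(1)
    process with lag-k autocorrelation phi**k (exact finite-K formula)."""
    k = np.arange(1, K)
    return 1.0 + 2.0 * np.sum((1.0 - k / K) * phi**k)

phi = 0.9                          # toy lag-1 autocorrelation (assumption)
s_inf = (1.0 + phi) / (1.0 - phi)  # asymptotic variance as K -> infinity
tau = 10                           # hypothetical timescale unit for the toy

# Ratio of the finite-window value to the asymptote at 5, 20 and 50 tau.
ratios = {m: scaled_mean_variance(m * tau, phi) / s_inf for m in (5, 20, 50)}
for m, r in ratios.items():
    print(f"{m:>2} tau: K*Var / S_inf = {r:.3f}")
```

In this toy the ratio is below 1 at every window length and grows from the shortest window to the longest, which is the approach-from-below pattern that a no-upward-divergence guard would reject.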

Non-verdict pointers. Against the converged $T_O^{*}$, with $T_O(P_{sym}) = 1409.42$ (exact), the would-be compute-normalized speedup $T_O(P_{sym})/(T_O^{*}\cdot \mathrm{work}_{PT})$ for the primary kernel ($\mathrm{work}_{PT}=1.5R$) is $R4 = 2.42$ and $R6 = 2.12$ (both $\ge 2.0$) and $R8 = 1.86$; the convex cells are all $< 2.0$. These figures are non-verdict: the gate artifact blocked the operational $Q_{op}$-based P2 read, so exp13 issues no P2 verdict, though they suggest the corrected re-read could land S-A for primary $R4/R6$. The convex $K_{PT}$ mixes far slower ($T_O^{*}$ 294–400, $\gamma_{bulk}^{PT} \approx 0.074 < \Omega(1)$), confirming the primary kernel $K=\tfrac12(LS+SL)$ is the right one. A7 multimodal calibration ($m{=}3, M{=}2, R{=}2$): sampled-VAC vs exact extended-spectrum ratio 0.997.
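The primary-cell speedups can be reproduced from the numbers quoted in this record. The sketch below assumes that the cell labels R4, R6 and R8 correspond to $R = 4, 6, 8$ in $\mathrm{work}_{PT} = 1.5R$; that reading is an inference, supported by the fact that it recovers the reported values to two decimals. The function and variable names are hypothetical, and the $T_O^{*}$ inputs are the rounded values given above.

```python
T_O_PSYM = 1409.42                # T_O(P_sym), quoted as exact in the text
T_O_STAR = {4: 97, 6: 74, 8: 63}  # converged primary T_O* for R4 / R6 / R8
THRESHOLD = 2.0                   # the >= 2.0 mark used in the text

def speedup(R, t_o_star, t_o_psym=T_O_PSYM, work_per_replica=1.5):
    """Compute-normalized speedup T_O(P_sym) / (T_O* * work_PT),
    with work_PT = 1.5 * R (assumes the label R<n> means R = n)."""
    work_pt = work_per_replica * R
    return t_o_psym / (t_o_star * work_pt)

for R, t_star in T_O_STAR.items():
    s = speedup(R, t_star)
    mark = ">= 2.0" if s >= THRESHOLD else "< 2.0"
    print(f"R{R}: {s:.2f} ({mark})")
```

For example, R4 gives $1409.42 / (97 \cdot 6) \approx 2.42$. The convex cells are not reproduced here because the text does not state their $\mathrm{work}_{PT}$.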

Scope and caveats

This is a gate-specification artifact, not demonstrated inadequacy: exp13 does not show the operational windows are inadequate; it shows the opposite (F1 passes at the verdict windows). It also retracts the earlier stable_L mis-framing: stable_L is a calibration requirement (trajectory length to estimate $S_a$ stably), not a replacement mixing timescale. The ratio stable_L$/\hat\tau^{*}$ compares an estimator-calibration length with a physical autocorrelation time; it does not show that $\hat\tau^{*}$ underestimates the operational timescale, and exp13 makes no "$\hat\tau^{*}$ underestimates by $\sim 10^{3}\times$" or "$\tau_{max}$-sized windows are inadequate" claim. No GPU authorization (would-be speedups are non-verdict), no Route-C verdict, no fundamentality claim, no tag change. Conditional factorization stays [solid], operational stays [conjectured].

What this feeds: exp14 — the corrected operational read. It keeps exp13's calibrated $S_a^{*}$ , kernels, ladders, and $\{20,50\}\hat\tau^{*}$ windows on fresh held-out seeds; gates window adequacy on F1 at those windows (treating $5\hat\tau^{*}$ and the approach-from-below as diagnostics), restores the §8 AND-gate to gating a negative S-C verdict only, and retains $\gamma_{bulk}$ for S-A — giving the primary $R4/R6$ cells a clean confirmatory path without changing the kernel after seeing favorable speedups.