Thermodynamic Machine Learning · MMXXVI
Experiment · 11.VI.MMXXVI · Read 4 min

Exp 9 — The Q Objective Demo: It Doesn’t Steer

Entry 11

We took the structural trainability predictor, made it the auxiliary term of a loss, and trained against it to see whether $Q_{struct}^{\perp}$ is not just measurable but usable. The honest answer here: computable and differentiable, yes; steering, no.

The question

The companion predictor work (experiments/exp1-exact-diag/) established a differentiable, observable-projected predictor. The escalation: if it is differentiable in the couplings $J$, does adding it to the objective push a model toward higher trainability? This is the first in-loop test; exp8 only evaluated the predictor at isolated anchors. The deeper warning behind it is the flagship thesis that you can't train your way to trainability; exp9 puts a number on that claim.

The setup

Loss with an auxiliary trainability term:

$$L = L_{KL} + \lambda\,\bigl(-\log Q_{pool}\bigr),\qquad \lambda \in \{0,\,0.1,\,0.01\}.$$

Three arms (baseline $\lambda{=}0$, primary $\lambda{=}0.1$, secondary $\lambda{=}0.01$) across 4 training cells ($m\in\{4,5\}$, hidden $\in\{2,3\}$), Adam, step cap $2000$: 12 runs total. The verdict arm is the primary $\lambda{=}0.1$. The pooled $Q$ uses Change A, an $m^2$-RHS factored resolvent run at every Adam step of the 8 augmented runs (16,000 in-loop gradient-bearing $Q$ evaluations). Everything was frozen pre-run: constants, seed table, the $c50/c25$-only matched-crossing verdict basis, pair rules, and the steering bar $\rho = 1.5$. Run: 1913 s on laptop CPU, JAX 0.9.1 (x64), all under the declared 4 h cap.
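A minimal JAX sketch of how the three arms share one objective, under loud assumptions: `kl_loss` is a toy KL over four states and `q_pool` is an arbitrary positive, differentiable scalar. Both are hypothetical stand-ins, not the experiment's teacher KL or the Change A resolvent, and the optimizer is omitted. Only the way $\lambda$ enters mirrors the loss above.

```python
import jax
import jax.numpy as jnp

jax.config.update("jax_enable_x64", True)  # the entry reports an x64 run


def kl_loss(theta, target):
    # Placeholder L_KL: KL(target || softmax(theta)) on a tiny state space.
    log_p = jax.nn.log_softmax(theta)
    return jnp.sum(target * (jnp.log(target) - log_p))


def q_pool(theta):
    # Placeholder predictor: any strictly positive, differentiable scalar.
    # This is NOT the pooled Q of the experiment.
    return 1.0 + jnp.sum(theta ** 2)


def total_loss(theta, target, lam):
    # L = L_KL + lam * (-log Q_pool)
    return kl_loss(theta, target) + lam * (-jnp.log(q_pool(theta)))


target = jnp.array([0.1, 0.2, 0.3, 0.4])    # hypothetical teacher distribution
theta = jnp.array([0.5, -0.5, 0.25, 0.0])   # hypothetical parameters

# One value/gradient evaluation per arm: baseline, primary, secondary.
for lam in (0.0, 0.1, 0.01):
    value, grad = jax.value_and_grad(total_loss)(theta, target, lam)
    print(lam, float(value), grad)
```

With $\lambda = 0$ the auxiliary term drops out and the gradient is that of the KL term alone, which is what makes the baseline arm a clean reference for the paired comparisons.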

The result

Registered Outcome 3 fired: P1 PASS + P4 verified + P3 PASS + P2 FAIL. The objective is feasible but does not steer at the verdict $\lambda$.

  • P1 (feasibility + fidelity): PASS (construction/formula-level). All 12 runs finite at every step; value-agreement worst case $|Q_{JAX}-Q_{numpy}|/|Q_{numpy}| = 2.26\times10^{-14}$ over 24 comparisons (gate $10^{-10}$); autodiff-vs-FD worst best-$h$ rel-err per cell $\{2.51,2.53,1.07,1.96\}\times10^{-9}$ (gate $10^{-4}$), PASS $\times 4$.
  • P2 (steering): FAIL. Paired ratios $Q^{aug}/Q^{base}$ at equal KL progress over 8 pairs gave median $1.125 < \rho = 1.5$ (the median leg failed). Only the consistency leg passed: count(ratio $>1$) $= 7$ of $8$, meeting the $\geq 6$ bar. So the objective moves $Q_{struct}^{\perp}$ in the intended direction almost everywhere, but the matched-progress effect size (median $+12.5\%$) is far below the $+50\%$ demo bar.
  • P3 (KL-compatibility guard): PASS. Every arm crossed $\leq 0.25$ KL within the cap (primary $4/4$); augmented arms crossed earlier than baseline yet stalled at higher final $L_{KL}$. They trade away terminal convergence, not early progress.
  • P4 (R4 carry-over): verified over 192 diagnostic cells; max detailed-balance residual $4.16\times10^{-17}$, Cheeger margin $+3.68\times10^{-3}$, zero sym-check failures; the armed HALT never fired.
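The two-leg P2 verdict can be sketched in a few lines of Python. The function name, the `>=` comparison on the median leg, and the requirement that both legs pass are assumptions read off the summary above; the ratios are made-up illustrative values, not the experiment's recorded pairs.

```python
from statistics import median


def steering_verdict(ratios, rho=1.5, min_above=6):
    """Two-leg check: the median leg AND the consistency leg must both pass."""
    med = median(ratios)
    n_above = sum(r > 1.0 for r in ratios)   # pairs that moved in the intended direction
    median_leg = med >= rho                  # effect-size bar (assumed inclusive)
    consistency_leg = n_above >= min_above   # direction-consistency bar
    return {
        "median": med,
        "n_above": n_above,
        "median_leg": median_leg,
        "consistency_leg": consistency_leg,
        "passed": median_leg and consistency_leg,
    }


# Illustrative ratios only (hypothetical, NOT the 8 recorded pairs).
example_ratios = [0.95, 1.02, 1.05, 1.10, 1.20, 1.25, 1.40, 2.10]
result = steering_verdict(example_ratios)
print(result)
```

On these toy values the consistency leg passes (7 of 8 above 1) while the median leg fails, the same pass/fail pattern as the registered outcome: direction without effect size does not clear the bar.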

Decomposition (D11.i): the gains are signal-side dominated. Numerator ratios span $0.96$–$2.05$, denominator (noise) ratios only $0.87$–$1.04$. Even the one large pair ((5,2)·c25, ratio $2.37$) is num $2.05 \times$ den$^{-1}$ $1.15$. Per the pre-commitment, a signal-side-driven gain is the anti-convergence-flavored channel, the weaker steering mode. The mixing/estimability channel ($T_O$ down) was essentially unmoved.
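As a worked check of the large pair, assuming $Q$ is a quotient of a signal numerator and a noise denominator (so the paired ratio factorizes; "num" and "den" here are shorthand, not the entry's own symbols):

$$\frac{Q^{aug}}{Q^{base}} = \underbrace{\frac{\mathrm{num}^{aug}}{\mathrm{num}^{base}}}_{2.05} \times \underbrace{\left(\frac{\mathrm{den}^{aug}}{\mathrm{den}^{base}}\right)^{-1}}_{1.15} \approx 2.36,$$

which agrees with the reported $2.37$ up to rounding of the two factors. The inverse-denominator factor of $1.15$ corresponds to a noise ratio of about $0.87$, the low end of the quoted denominator range, so nearly all of the gain sits in the numerator.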

Dose-response (D11.iv): the secondary $\lambda{=}0.01$ arm is a clean null (median $1.0002$, $4/8>1$). Monotone in $\lambda$: $0\to1$, $0.01\to{\sim}1.00$, $0.1\to{\sim}1.13$.

Scope and caveats

This is a predictor result, not an estimator result: exp9 moved and measured $Q_{struct}^{\perp}$ exactly, with no sampling and no $Q_{op}$ run. Nothing here bears on whether high $Q_{struct}^{\perp}$ implies high actual SNR; the operational factorization tier stays [conjectured], gates unchanged. No tag moves; G2 untouched.

The scope is demo-level: 4 cells, $N \leq 10$, one seed table, 4 teachers at $\beta_t = 3$, two $\lambda$ points. "Does not steer here" is scoped to this family/architecture/teacher set/$\lambda$ pair; it is never a claim of fundamentality.

The headline trap is the finals. At final checkpoints the primary arm holds pooled $Q$ at $O(10$–$10^2)$ while baselines collapse to $O(10^{-8}$–$10^{-5})$, an apparent $\geq 5\times10^{6}$ "gain" at $(4,2)$, up to ${\sim}10^{9}$ at $(5,3)$. That number is pure anti-convergence: the aux term freezes $L_{KL}$ high, keeping $\lVert g\rVert$ large. D5 excludes finals from the verdict by design; the matched-crossing basis removed exactly this confound, leaving the honest $+12.5\%$.
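A small sketch of why the two bases disagree, assuming a "matched crossing" means reading $Q$ at the first step where each arm's KL falls to a shared threshold. The helper names and the toy traces are hypothetical; they only imitate the qualitative shape described above (baseline converging and collapsing, augmented arm stalling).

```python
def first_crossing(kl_trace, threshold):
    """Index of the first step with KL <= threshold, or None if never crossed."""
    for step, kl in enumerate(kl_trace):
        if kl <= threshold:
            return step
    return None


def matched_ratio(kl_aug, q_aug, kl_base, q_base, threshold):
    """Q_aug / Q_base, each read at its own first crossing of the same KL level."""
    i = first_crossing(kl_aug, threshold)
    j = first_crossing(kl_base, threshold)
    if i is None or j is None:
        return None  # pair is undefined if either arm never crosses
    return q_aug[i] / q_base[j]


# Toy traces (hypothetical): the baseline keeps converging and its Q collapses,
# while the augmented arm stalls at a higher KL and keeps Q large.
kl_base = [1.0, 0.6, 0.3, 0.2, 0.01]
q_base = [8.0, 6.0, 4.0, 3.0, 1e-6]
kl_aug = [1.0, 0.5, 0.24, 0.22, 0.21]
q_aug = [8.0, 6.5, 4.5, 4.4, 4.3]

matched = matched_ratio(kl_aug, q_aug, kl_base, q_base, threshold=0.25)
finals = q_aug[-1] / q_base[-1]
print(matched, finals)
```

In the toy traces the final-checkpoint ratio is in the millions while the matched ratio is modest, because the finals compare a stalled run against a converged one rather than two runs at equal KL progress.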


What this feeds: the dose-response monotone trend ($1.00 \to 1.13$, consistency $7/8$) points at a larger-$\lambda$ probe, but that is a new design decision for the researcher under a fresh pre-commitment, not a continuation of this frozen one-shot $\lambda$ set. Outcome 3 returns the design to the researcher; the conferred annotations (scan G1 cell, spine Risk-2 sub-bullet, HTDML-property parenthetical) record this as a demo-level negative.

— fin. —