A synthesis of the program's honest negatives: the operational trainability claim is gated not by a budget shortfall but by a structural obstruction, and each failure is sharp enough to be a result.
The Q-program proves a conditional factorization in regime A1–A8 + plateau + F4 ([solid], written proof O1–O6 over the single proven-here lemma O1.c). The operational claim — that this holds on a real DTM at the one actually runs — stays [conjectured], gated by A2 (-reversibility, which the no-leakage machinery needs) and A6 (). What follows is the record of trying to meet those gates.
The A2 ↔ A6 obstruction
The question. Can a kernel be both reversible (A2) and fast-mixing enough that (A6)? Nothing formally forbids it — but the factorization is about the multimodal plateau, exactly where the two become operationally antagonistic.
The setup. experiments/exp4-reversible-50tau/ fixed exp3's blockers: a genuinely reversible symmetrized 4-block Gibbs kernel (self-adjointness gated to ), then a non-circular doubling-stability probe for on the real 60_12 MNIST DTM at (Lightning H200).
The result. does not stabilize — it grows dead-linearly in trajectory length : over , across six doublings, and the doubling-stability rule () is never met. So is UNRESOLVED, the windows are uninstantiable, and A6 is unreachable — registered outcome (F): P0-HALT, P1–P5 unrun. exp6 reproduced at every checkpoint , pushing the reading from "deep-checkpoint effect" toward "slow from the start."
Scope and caveats. The exp3→exp4 jump is confounded (kernel and window both changed), so is not pure truncation — but the clean attribution is to the reversible kernel mixing far slower. The deeper tension: exp3's faster kernel violated A2; the A2-valid kernel is the slow one. reads either as a genuine near-zero gap or as inadequate burn-in, but both give the same operational conclusion. No tag flip: A6 is unreachable here, not the conjecture false.
The thermodynamic-length wall
The question. If a single reversible kernel mixes too slowly, can a reversible parallel-tempering ladder bridge a hot, free-mixing regime down to the cold target — cheaply?
The setup. experiments/exp19-hotter-top-first/ ran an equal-acceptance reversible-PT mixing probe on the trained 60_12 DTM (single-input conditional, INPUT_IDX=0), with a hot-top sweep ( floored at 0.01) and an analytic rung-count , . The trained-weight-refresh guard () passed, fixing the bug that invalidated the earlier exp15/16 reads.
The result. A hotter top does cure decorrelation — the single replica reaches at (; at , ): a temperature effect, not local-kernel non-ergodicity. But the cheapest decorrelating span needs rungs . The two frontiers do not overlap — decorrelation needs , tractability needs (). The gap is a thermodynamic-length wall of — decisive STOP at Stage A.
Scope and caveats. Config-scoped (60_12, seed 0, , this kernel and ladder family) — never a fundamentality verdict about reversible PT. is a Gaussian-overlap lower bound, so the true cost can only be higher. exp18 had already shown PT fails by schedule, not by overlap: a uniform- ladder over missed the acceptance band not because spacing can't move acceptance (it moves it ) but because the DTM's specific heat is non-uniform — the hot edge clears the floor only after the cold edges overshoot the ceiling (PT-MARGINAL). exp19 then resolved both of exp18's suspects and the ladder is still intractable. MEASURE-ONLY; moves no tag.
You cannot optimize the proxy directly
The question. is computable without training to convergence and differentiable in . Can you therefore use it as an in-loop objective (HTDML) to steer a model toward trainability?
The setup. experiments/exp11-htdml-objective-lambda-sweep/ swept the objective weight over 4 cells, with a matched-/held-crossing design and a KL-compatibility task guard.
The result. The computability + differentiability-in-use half is PASS — up to in-loop pooled- evaluations with gradients, agreeing with the unmodified numpy spectral pipeline to (P1). But the steering half does not steer (registered Outcome 2). As rises the matched-crossing ratio climbs ( undefined) while the task-guard pass fraction collapses (). The only verdict-eligible arm, , fails Leg 1 (median , reproducing exp9 exactly) and its genuine-channel gate ( median — the channel never engages). The large ratios at are non-verdict quantities: they buy by anti-convergence that abandons the KL task. This is Goodhart — push the proxy past the bar and you destroy what it proxies for.
Scope and caveats. Demo-level scope (, one seed table, this ladder) — a scoped negative, never "cannot steer." It is a predictor measurement, not a validation: no sampling ran. No tag moves; G2 untouched.
What this feeds: the two pre-registered pivots from exp19 — switch sampler families (hierarchical PT / simulated tempering / population annealing, at the risk of breaking A2 and trading a mixing problem for a theorem problem) or the theory route of a -robust estimator that validates the operational claim without a fast-mixing kernel. In this field, a sharp, well-instrumented negative — A6 unreachable, the wall, Goodhart on the proxy — is the contribution.