Every claim in this program eventually lands on one physical event: a single bit, deciding which way to fall on thermal noise. The hardware is where thermodynamic machine learning stops being a metaphor.
The p-bit as a physical Gibbs sampler
A network of p-bits is not a model of a Gibbs sampler; it is one. Each bit settles into +1 or −1 with probability P(s = +1) = σ(β·I), where σ is the logistic function: a thermal coin-flip, biased by its input current I and sharpened by the inverse temperature β. Cold devices (large β) threshold hard and behave like logic; hot ones (small β) flip almost at random. That one knob, β, is the temperature behind every hardware Gibbs step, the same temperature the whole trainability story is written in.
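A minimal software sketch of that hardware step, under stated assumptions. It takes the input current I of bit i to be the energy gap between its two states (twice the local field of an Ising energy), which is the convention under which σ(β·I) is the exact Gibbs conditional. The names `pbit_flip` and `gibbs_sweep`, and the two-bit coupling matrix, are illustrative and not taken from the source.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def pbit_flip(current, beta, rng):
    """Draw s in {+1, -1} with P(s = +1) = sigmoid(beta * current)."""
    return 1 if rng.random() < sigmoid(beta * current) else -1


def gibbs_sweep(s, J, h, beta, rng):
    """One in-place sweep for E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i.

    Assumption: I_i = E(s_i = -1) - E(s_i = +1) = 2 * (local field), so that
    sigmoid(beta * I_i) is the exact Gibbs conditional for bit i.
    """
    n = len(s)
    for i in range(n):
        field = h[i] + sum(J[i][j] * s[j] for j in range(n) if j != i)
        s[i] = pbit_flip(2.0 * field, beta, rng)
    return s


rng = random.Random(0)
J = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical two-bit ferromagnetic coupling
h = [0.0, 0.0]
s = [1, -1]
for _ in range(100):
    gibbs_sweep(s, J, h, beta=5.0, rng=rng)
```

At β = 0 every call to `pbit_flip` is a fair coin regardless of the current; as β grows, the flip approaches a hard threshold on the sign of I.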
Why hardware — the energy case
The appeal is not speed for its own sake; it is the cost of a sample. An sMTJ (stochastic magnetic tunnel junction) p-bit performs the σ(β·I) flip for roughly a femtojoule, against the ten-or-so picojoules the same draw costs in software. That is about four orders of magnitude per coin-flip, and a thermodynamic model spends its entire training budget on coin-flips. The catch is that cheap samples are only useful if the chain producing them also mixes fast, which is exactly where the physics stops cooperating.
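The arithmetic behind that ratio, as a small sketch. The two per-flip energies are the round figures quoted above; the flip budget `N_FLIPS` is a hypothetical number chosen only to make the totals concrete.

```python
import math

E_PBIT_J = 1e-15       # ~1 fJ per hardware flip (round figure from the text)
E_SOFTWARE_J = 10e-12  # ~10 pJ per software draw (round figure from the text)

ratio = E_SOFTWARE_J / E_PBIT_J
orders_of_magnitude = math.log10(ratio)


def sampling_energy_joules(n_flips, joules_per_flip):
    """Total energy spent on n_flips independent coin-flips."""
    return n_flips * joules_per_flip


N_FLIPS = 10**12  # hypothetical training budget, for illustration only
hardware_total = sampling_energy_joules(N_FLIPS, E_PBIT_J)
software_total = sampling_energy_joules(N_FLIPS, E_SOFTWARE_J)
```

With these assumed numbers, a trillion flips cost about a millijoule on the p-bit and about ten joules in software.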
The differentiability gate
The hybrid program rests on an assumption almost nobody states aloud: that somewhere there is a substrate where an energy model’s couplings J are differentiable functions of network parameters. If they are, gradient descent flows cleanly from a loss on the sampler’s behaviour all the way back into the encoder; if not, the hybrid is two systems bolted together that cannot be co-trained. We called this requirement the G1b gate and audited eight published substrates against it. The answer was a clean, uncomfortable zero: none passed, and no two failed for the same reason. So the missing piece had to be built, not located: a hypernetwork W = gφ(u) that emits the couplings of a bias-free, ℤ2-even RBM, with G1b constructed at a value-agreement of 7.06×10⁻¹⁵ on a small exactly-diagonalizable family.
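In symbols, the gate asks that the ordinary chain rule through the couplings be well defined. The sketch below uses this section's notation; the loss 𝓛 is a placeholder for any differentiable function of the couplings, not a specific objective from the source.

```latex
W = g_\phi(u), \qquad
\frac{\partial \mathcal{L}}{\partial \phi}
  = \sum_{i,j} \frac{\partial \mathcal{L}}{\partial W_{ij}}\,
    \frac{\partial W_{ij}}{\partial \phi}
```

If the couplings are trained directly rather than emitted by a network, the second factor has nothing to act on, and no gradient reaches the encoder.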
The substrate audit
Each row is a published substrate and the gate it fails; the last is the one we built.
| Substrate | Verdict against the gate |
| --- | --- |
| DTM clone | No encoder at all: its couplings J are trained directly, with a threshold-binarized input and no network in the energy path. |
| thrml | As shipped, exposes raw coupling tensors and sums factor energies; there is no encoder and no reversibility tooling. |
| NEAT-RN | Its only knob is analog DC voltage, and its dynamics are a measured time-reversal-breaking Brownian gyrator; it violates A2 (π-reversibility) as a matter of physics, not implementation. |
| TSMN | Does learn, but the learnable object is a temporal memory Ti, not the energy Eθ. |
| TNN | Its edge weights are sampled state variables, not encoder outputs. |
| NQS | Uses an RBM, but the RBM is the variational ansatz; its width α is raw capacity, not a net→EBM split. |
| Boltzmann-prior VAE | The closest: it clears the encoder-plus-EBM gate G1a, but fails G1b because its network does not parameterize the prior’s couplings, so the latent energy stays a separate trained object. |
| Hypernetwork W = gφ(u) | Built here. The network produces the RBM coupling matrix W from a conditioning code u; bias-free energy E = −v⊤Wh stays ℤ2-even, so the symmetry quotient is legitimate for every learned W. G1b constructed. |
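A toy sketch of the built substrate's two defining properties: the couplings come out of a network, and the bias-free energy is even under a global spin flip for any W. The linear form of `hypernet`, the layer sizes, and the random parameters are assumptions made for illustration; the source does not specify the architecture of gφ.

```python
import itertools
import random


def hypernet(phi, u):
    """Hypothetical linear hypernetwork g_phi: W[i][j] = sum_k phi[i][j][k] * u[k]."""
    return [[sum(p * x for p, x in zip(phi_ij, u)) for phi_ij in row] for row in phi]


def energy(v, h, W):
    """Bias-free RBM energy E(v, h) = -v^T W h."""
    return -sum(v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h)))


rng = random.Random(0)
n_v, n_h, n_u = 3, 2, 4  # illustrative sizes, not from the source
phi = [[[rng.gauss(0.0, 1.0) for _ in range(n_u)] for _ in range(n_h)]
       for _ in range(n_v)]
u = [rng.gauss(0.0, 1.0) for _ in range(n_u)]
W = hypernet(phi, u)

# Z2-evenness: flipping every spin leaves the energy unchanged, whatever W is.
max_violation = 0.0
for v in itertools.product((-1, 1), repeat=n_v):
    for h in itertools.product((-1, 1), repeat=n_h):
        flipped = energy([-x for x in v], [-x for x in h], W)
        max_violation = max(max_violation, abs(energy(v, h, W) - flipped))
```

Because W is linear in φ in this toy version, the derivative of W[i][j] with respect to phi[i][j][k] is simply u[k], so every coupling is a differentiable function of the network parameters. Adding visible or hidden bias terms would break the evenness check.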
The trap at 10⁻¹⁶
The natural way to build that substrate hides a silent detonation. Differentiating a mixing-time objective through an eigendecomposition carries a term that scales as 1/Δλ, the inverse gap to the adjacent eigenvalue. At the checkpoint the battery anchored on, those gaps measured 1.1×10⁻¹⁶ to 3.3×10⁻¹⁶, right at the floor of double precision, which puts the condition number of an eigh-based gradient near 10¹⁶. Such a gradient would not have crashed; it would have returned a finite, wrong number, the worst failure because nothing flags it. The fix was a deflated-resolvent route that never forms an eigendecomposition and stays smooth wherever the spectral gap γu > 0 (condition number ≈ 10²). If you ever autodiff through a spectral objective: never use eigh; use a resolvent.
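A small sketch of why a resolvent sidesteps the problem. It is not the source's objective or its specific deflated-resolvent construction. As a stand-in spectral quantity it uses Kemeny's constant of a Markov chain, which equals the sum of 1/(1 − λᵢ) over the nontrivial eigenvalues and can be computed as trace((I − P + 1πᵀ)⁻¹) − 1 with one linear solve. The rank-one term 1πᵀ deflates the stationary eigenvalue. The test chain is chosen so that its two nontrivial eigenvalues are exactly degenerate (Δλ = 0), the case where eigenvector derivatives are undefined. All function names are illustrative, and a central finite difference stands in for autodiff.

```python
def inverse(A):
    """Invert a small dense matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        M[col], M[pivot] = M[pivot], M[col]
        scale = M[col][col]
        M[col] = [x / scale for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def kemeny_constant(P, pi):
    """trace((I - P + 1 pi^T)^-1) - 1, with no eigendecomposition formed."""
    n = len(P)
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    Z = inverse(A)
    return sum(Z[i][i] for i in range(n)) - 1.0


def complete_graph_chain(a):
    """3-state chain whose nontrivial eigenvalues are both exactly 1 - 3a."""
    return [[1.0 - 2.0 * a if i == j else a for j in range(3)] for i in range(3)]


pi = [1.0 / 3.0] * 3  # stationary distribution of the symmetric chain
a, step = 0.2, 1e-6
K = kemeny_constant(complete_graph_chain(a), pi)
# Central finite difference in the rate a; smooth despite the degenerate pair.
dK_da = (kemeny_constant(complete_graph_chain(a + step), pi)
         - kemeny_constant(complete_graph_chain(a - step), pi)) / (2.0 * step)
```

For this chain the closed forms are K = 2/(3a) and dK/da = −2/(3a²), and the resolvent route reproduces both even though the gap between adjacent nontrivial eigenvalues is zero. The only gap it depends on is the one between the stationary eigenvalue 1 and the rest of the spectrum.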
The honest scope: this builds and verifies the construction gate, on a small exactly-solvable family. It says nothing about the gate that still binds — whether such a substrate mixes fast enough at useful scale (the operational conditions A6, K ≫ τint, and A7, overlapping-bulk relaxation). Those need an at-scale reversible run we have not done, so the substrate is constructed, not validated. What we can now say is narrower and firmer: the differentiable net→EBM substrate the whole hybrid program assumes into existence did not exist, it is buildable, and the most natural way to build it hides a silent numerical detonation at the precision floor.