Hardware · Thermodynamic Machine Learning

Every claim in this program eventually lands on one physical event: a single bit, deciding which way to fall on thermal noise. The hardware is where thermodynamic machine learning stops being a metaphor.

The p-bit as a physical Gibbs sampler

A network of p-bits is not a model of a Gibbs sampler — it is one. Each bit settles into +1 or −1 with probability P(s = +1) = σ(β·I): a thermal coin-flip, biased by its input current I and sharpened by the inverse temperature β. Cold devices threshold hard and behave like logic; hot ones flip almost at random. That one knob, β, is the temperature behind every hardware Gibbs step — the same temperature the whole trainability story is written in.

Why hardware — the energy case

The appeal is not speed for its own sake; it is the cost of a sample. An sMTJ p-bit performs the σ(β·I) flip for roughly a femtojoule — against the ten-or-so picojoules the same draw costs in software. That is about four orders of magnitude per coin-flip, and a thermodynamic model spends its entire training budget on coin-flips. The catch is that cheap samples are only useful if they are also fast-mixing — which is exactly where the physics stops cooperating.

The differentiability gate

The hybrid program rests on an assumption almost nobody states aloud: that somewhere there is a substrate where an energy model’s couplings J are differentiable functions of network parameters. If they are, gradient descent flows cleanly from a loss on the sampler’s behaviour all the way back into the encoder; if not, the hybrid is two systems bolted together that cannot be co-trained. We called it the G1b gate and audited eight published substrates against it. The answer was a clean, uncomfortable zero — and no two failed for the same reason. So the missing piece had to be built, not located: a hypernetwork W = g_φ(u) that emits the couplings of a bias-free, ℤ₂-even RBM, with G1b constructed at a value-agreement of 7.06×10⁻¹⁵ on a small exactly-diagonalizable family.

The substrate audit

Each row is a published substrate and the gate it fails; the last is the one we built.

DTM clone: No encoder at all — its couplings J are trained directly, with a threshold-binarized input and no network in the energy path.
thrml: As shipped, exposes raw coupling tensors and sums factor energies; there is no encoder and no reversibility tooling.
NEAT-RN: Its only knob is analog DC voltage, and its dynamics are a measured time-reversal-breaking Brownian gyrator — it violates A2 (π-reversibility) as a matter of physics, not implementation.
TSMN: Does learn, but the learnable object is a temporal memory T_i, not the energy E_θ.
TNN: Its edge weights are sampled state variables, not encoder outputs.
NQS: Uses an RBM, but the RBM is the variational ansatz — its width α is raw capacity, not a net→EBM split.
Boltzmann-prior VAE: The closest — it clears the encoder-plus-EBM gate G1a, but fails G1b: its network does not parameterize the prior’s couplings, so the latent energy stays a separate trained object.
Hypernetwork W = gφ(u): Built here. The network produces the RBM coupling matrix W from a conditioning code u; bias-free energy E = −v^⊤Wh stays ℤ₂-even, so the symmetry quotient is legitimate for every learned W. G1b constructed.

The trap at 10⁻¹⁶

The natural way to build that substrate hides a silent detonation. Differentiating a mixing-time objective through an eigendecomposition carries a term that scales as 1/Δλ — the inverse gap to the adjacent eigenvalue. At the checkpoint the battery anchored on, those gaps measured 1.1×10⁻¹⁶ to 3.3×10⁻¹⁶, right at the floor of double precision. An eigh-based gradient would not have crashed; it would have returned a finite, wrong number — the worst failure, because nothing flags it (condition number ≈ 10¹⁶). The fix was a deflated-resolvent route that never forms an eigendecomposition and stays smooth wherever the spectral gap γ_u > 0 (condition number ≈ 10²). If you ever autodiff through a spectral objective: never eigh — use a resolvent.

The honest scope: this builds and verifies the construction gate, on a small exactly-solvable family. It says nothing about the gate that still binds — whether such a substrate mixes fast enough at useful scale (the operational conditions A6, K ≫ τ_int, and A7, overlapping-bulk relaxation). Those need an at-scale reversible run we have not done, so the substrate is constructed, not validated. What we can now say is narrower and firmer: the differentiable net→EBM substrate the whole hybrid program assumes into existence did not exist, it is buildable, and the most natural way to build it hides a silent numerical detonation at the precision floor.

[What is thermodynamic trainability][The differentiability gate][Exp 8 — the build][Notebook]