// In pursuit of trainable thermodynamics — Writings & Fragments

Writings & Fragments

Field notes from a research program where the question is not "can we build a bigger model?" but "can the thing we build actually be trained?" The writings below are attempts to work that out in the open.

Negative Result · 20.VI.MMXXVI · Read 7 min

The Reversible Kernel Your Theorem Needs Is the One That Won’t Mix

Our acceleration theorem needs a reversible sampler and a flood of decorrelated samples per step. On a real trained MNIST model, those two demands are antagonistic — and we have the curve that shows it.

Research · 20.VI.MMXXVI · Read 7 min

The Spectral Gap Is the Wrong Object

The textbook way to predict whether a thermodynamic model trains, reading off its slowest sampling mode, can over-predict the gradient SNR by a factor of ten octillion. Under spin-flip symmetry the slowest mode is invisible to the gradient. Here is why, and the fix.

Research · 18.VI.MMXXVI · Read 6 min

Measure Mixing on the Gradient, Not the Order Parameter

The most convenient thing to watch in a sampler — a single scalar like the magnetization — is the wrong thing to watch. It can mistime your gradient by 72x, and the failure is exactly where you most need the reading.

Negative Result · 18.VI.MMXXVI · Read 6 min

It’s Not the Overlap Wall — It’s the Schedule

Parallel tempering is supposed to fail when adjacent replicas stop overlapping. On a trained MNIST energy model it fails differently: the specific heat is so uneven along the ladder that no uniform temperature step satisfies every rung at once.

Methodology · 18.VI.MMXXVI · Read 6 min

The Init-Weight Bug That Faked “PT Mixes a Deep EBM”

Two expensive GPU runs concluded parallel tempering mixed a trained MNIST energy model near-ideally. A third run found the sampler's local kernels were silently built from the untrained weights — the real swap-acceptance was 0.0099, and nothing mixed.

Research · 16.VI.MMXXVI · Read 7 min

A Trainability Predictor You Can Compute Before Training Converges

The primer named Q; this is the refined object behind it: an observable-projected, multi-mode, differentiable predictor read from the sampler's spectrum. It tracks the gradient SNR in 45 of 48 exact-diagonalization cells, where a single gap manages 23.

Research · 15.VI.MMXXVI · Read 6 min

Measuring Q: Gradient SNR in Thermodynamic Samplers

The primer argued that trainability comes down to a single number, Q — the squared signal-to-noise ratio of the gradient. This is how you actually measure it: the estimator and its bias, a three-rung ladder from exact diagonalization to GPU scale, and the pre-registered result that plateau onset tracks the Gibbs spectral gap.

Negative Result · 14.VI.MMXXVI · Read 7 min

You Can’t Train Your Way to Trainability

We dropped the trainability predictor into the loss as a regularizer and turned the knob. The diagnostic soared a billionfold. The model stopped learning. A clean Goodhart's-law negative result, traced across the whole dose ladder.

Research · 10.VI.MMXXVI · Read 6 min

No Existing Substrate Clears the Differentiability Gate — So We Built One

We audited eight thermodynamic-ML substrates for one property — can a network produce differentiable energy-model couplings? Every one fails a different gate. So we built the missing piece, and a numerical trap nearly sank it at an eigenvalue gap of 1e-16.

Primer · 10.VI.MMXXVI · Read 4 min

What Is Thermodynamic Trainability?

Some energy-based models learn and others stall on a plateau, and the difference is rarely the architecture — it is whether the gradient signal survives the sampling noise. A field guide to thermodynamic trainability: the quantity Q, the Gibbs spectral gap, the fight between mixing and expressivity, and the p-bit hardware where it all comes down to a single thermal coin-flip.

Announcement · 8.VI.MMXXVI · Read 1 min

Thermodynamic Machine Learning — a Research Notebook

A public notebook on whether energy-based models running on physical thermodynamic hardware can actually be trained. The bottleneck is no longer the model — it is whether the thing that runs on thermal noise emits a gradient you can still hear. This is where that gets worked out, in the open.

Research · 3.VI.MMXXVI · Read 6 min

Multimodality Collapses the Effective Spectral Gap by 100x

Plant a handful of modes in an RBM and the observable-relevant gap drops 100-fold — 0.65 to 0.006. We read it straight off the exact spectrum, so it cannot be a burn-in or finite-sample lie.
