The exact toolchain behind every run — the open libraries, pinned versions, and rented accelerators that turn a claim into a result anyone can re-run.
The program runs on a frozen pre-registration discipline: a run’s steps, constants, seeds, and stop conditions are content-hashed to a git commit before any data is seen, and no threshold is relaxed afterward. Every claim carries one of four status tags — solid, conjectured, proven-here, validated — that move only by explicit conferral. Many runs are MEASURE-ONLY: they emit numbers but issue no verdict, so the code can never self-authorize a conclusion.
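As a rough illustration of the freezing step, the sketch below hashes a run specification before any data exists and shows that a later threshold change is detectable. It is a minimal sketch under stated assumptions, not the program's actual mechanism: the source only says runs are content-hashed to a git commit, so the function name `prereg_digest`, the field names, the values, and the use of SHA-256 over canonical JSON are all hypothetical.

```python
import hashlib
import json


def prereg_digest(spec: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON encoding of the spec."""
    # sort_keys plus fixed separators make the encoding independent of key order.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical pre-registration record; every field name and value is illustrative.
spec = {
    "steps": ["sample", "measure"],
    "constants": {"beta": 1.0},
    "seeds": [0, 1, 2],
    "stop_conditions": {"max_gpu_hours": 4},
}

# Digest recorded before any data is seen.
frozen = prereg_digest(spec)

# Relaxing a threshold afterwards yields a different digest.
relaxed = {**spec, "stop_conditions": {"max_gpu_hours": 8}}
print(prereg_digest(relaxed) == frozen)
```

Committing the digest (or the spec file itself) before the run is what gives the record its ordering guarantee; the hash alone only proves that two specifications differ.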
The toolchain
- Python: Host language. Every run is `python3 <script>.py`, wrapping JAX, thrml, and the `dtm-replication` substrate.
- JAX: Autodiff + XLA, in two lines: `0.9.1` (x64 / CPU) for the exact differentiable-Q work in float64, and `0.10.1` (CUDA 12) for GPU sampling and energy on the real DTM. Also the value-agreement reference checked against NumPy (≈1e-14).
- NumPy / SciPy: Exact diagonalization (`eigh`) and resolvent baselines: the ground-truth reference that gates the JAX path.
- thrml 0.1.3: Extropic’s Ising sampling library; builds the per-replica parallel-tempering kernels (`AnnealingIsingSamplingProgram`) at DTM scale.
- Equinox: Immutable parameter-tree surgery (`eqx.tree_at`) to refresh the sampler’s interactions to trained weights; the fix for the exp15 init-weight bug.
- dtm-replication @ 7c22d19: The DTM-MNIST `60_12` substrate codebase, git-pinned, behind every GPU run.
- Weights & Biases: Every run is logged and version-tracked (offline, deliberately non-verdict-bearing instrumentation). The workspace is public on W&B.
- Rented NVIDIA GPUs: H100 80GB (exp3) → H200, all on Lightning AI Studio, gated behind a CPU small-family authorization and hard GPU-hour caps.
- Git: Frozen pre-commitment: every run is content-hashed to a commit before any data is seen, so no threshold can be relaxed after the fact.
- Laptop CPU (float64): Exact-diagonalization and small-family runs.
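One way to keep pinned versions honest is to check the installed packages against the pins at the start of a run. The sketch below does this with the standard library's `importlib.metadata`. It is an assumption-laden illustration, not part of the documented toolchain: the helper names `installed_version` and `check_pins`, the `PINS_CPU` table, and the distribution names `jax` and `thrml` are hypothetical, and only the version numbers come from the list above.

```python
from importlib import metadata

# Version numbers are from the toolchain list (CPU / x64 line);
# the distribution names used as keys are assumptions.
PINS_CPU = {"jax": "0.9.1", "thrml": "0.1.3"}


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {name: (expected, found)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_pins(PINS_CPU)` returns an empty dict when the environment matches; a run script could refuse to start otherwise. The `lookup` parameter exists only so the check can be exercised without installing anything.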
What each experiment added
The program is a ladder: each experiment introduces one new method, tool, or capability on top of the runs before it. Every entry links to its full write-up; the verdicts live there.