Markov Chains and Spin Glasses - Stochastic Dynamics in Disordered Systems

Updated: November 30, 2025 • Created: June 11, 2025

The connection between Markov chains and spin glasses runs deep, providing both a computational framework for studying spin glass dynamics and a physical interpretation of optimization algorithms. This article explores how Markov chain Monte Carlo (MCMC) methods enable the simulation of spin glass systems, and how spin glass physics informs the design of optimization algorithms for problems like the Traveling Salesman Problem.

Markov Chains: A Primer

A Markov chain is a stochastic process where the probability of transitioning to a new state depends only on the current state, not on the history of previous states. This "memoryless" property, known as the Markov property, makes Markov chains powerful tools for modeling complex systems.

Formally, a Markov chain is defined by:

State space $S$: The set of all possible states
Transition probabilities $P_{ij}$ : The probability of transitioning from state $i$ to state $j$
Initial distribution $\pi_0$ : The probability distribution over states at time $t=0$

The transition probabilities satisfy:

$\sum_{j \in S} P_{ij} = 1 \quad \forall i \in S$

Stationary Distribution

For an ergodic Markov chain (one that can reach any state from any other state), there exists a unique stationary distribution $\pi^*$ such that:

$\pi^* = \pi^* P$

where $P$ is the transition matrix. The stationary distribution represents the long-term probability of being in each state.

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is a Markov chain Monte Carlo method for sampling from a probability distribution. It constructs a Markov chain whose stationary distribution is the desired distribution.

For a target distribution $\pi(x)$ , the algorithm works as follows:

Proposal: From current state $x$, propose a new state $x'$ with probability $q(x'|x)$
Acceptance: Accept the proposal with probability:

$A(x'|x) = \min\left(1, \frac{\pi(x') q(x|x')}{\pi(x) q(x'|x)}\right)$

Update: If accepted, move to $x'$; otherwise, stay at $x$

Metropolis Algorithm for Spin Glasses

For spin glass simulations, we use the simpler Metropolis algorithm, where the proposal distribution is symmetric ( $q(x'|x) = q(x|x')$ ). The acceptance probability becomes:

$A(x'|x) = \min\left(1, \frac{\pi(x')}{\pi(x)}\right)$

For the canonical ensemble in statistical mechanics, $\pi(x) \propto \exp(-E(x)/T)$ , where $E(x)$ is the energy and $T$ is temperature. This gives:

$A(x'|x) = \min\left(1, \exp\left(-\frac{\Delta E}{T}\right)\right)$

where $\Delta E = E(x') - E(x)$ is the energy change. This is exactly the Metropolis criterion used in spin glass simulations!

Spin Glass Dynamics as a Markov Chain

The evolution of a spin glass system under Monte Carlo dynamics forms a Markov chain:

States: All possible spin configurations $\{\sigma_i\}$
Transitions: Spin flips (or other local moves)
Transition probabilities: Given by the Metropolis acceptance criterion
Stationary distribution: The Boltzmann distribution $P(\{\sigma_i\}) \propto \exp(-E(\{\sigma_i\})/T)$

The Markov chain property ensures that the system's future depends only on its current configuration, not on how it reached that configuration. This makes the dynamics mathematically tractable while still capturing the essential physics.

Ergodicity and Mixing

For the Markov chain to properly sample the equilibrium distribution, it must be ergodic: any configuration must be reachable from any other configuration in a finite number of steps. In spin glasses, ergodicity can be broken at low temperatures, where the system gets trapped in local minima—this is the spin glass phase transition!

Mixing time measures how quickly the Markov chain converges to its stationary distribution. In spin glasses, mixing times can be exponentially long, reflecting the rugged energy landscape.

The Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a classic optimization problem: given a set of cities and distances between them, find the shortest route that visits each city exactly once and returns to the starting city.

Despite its simple statement, TSP is NP-hard, meaning no known polynomial-time algorithm can solve it exactly for large instances. The problem's energy landscape shares remarkable similarities with spin glasses: both feature many local minima separated by energy barriers.

TSP as a Spin Glass

We can map TSP to a spin glass-like system:

Spins: Binary variables indicating whether each edge is in the tour
Energy: Total tour length
Constraints: Each city must have exactly two edges (degree constraint)
Couplings: Distances between cities

The Hamiltonian becomes:

$\mathcal{H} = \sum_{i,j} d_{ij} \sigma_{ij} + \lambda \sum_i \left(\sum_j \sigma_{ij} - 2\right)^2$

where $d_{ij}$ is the distance between cities $i$ and $j$, $\sigma_{ij} \in \{0,1\}$ indicates if edge $(i,j)$ is in the tour, and the second term enforces the degree constraint with penalty $\lambda$ .

Simulated Annealing for TSP

Simulated annealing, inspired by spin glass physics, provides an effective heuristic for TSP:

Initialization: Start with a random tour (or use a greedy construction)
Moves: Propose tour modifications (e.g., 2-opt, 3-opt swaps)
Acceptance: Use Metropolis criterion with temperature schedule
Cooling: Gradually decrease temperature

The algorithm explores the solution space at high temperature, then gradually focuses on promising regions as temperature decreases.

Interactive TSP Simulation

Below is an interactive simulation demonstrating simulated annealing for the Traveling Salesman Problem. The simulation shows how the algorithm explores the solution space, gradually finding shorter tours as the temperature decreases.

Interactive simulation: Watch as the algorithm explores different tours. At high temperature, the tour changes dramatically. As temperature decreases, the algorithm converges to a near-optimal solution. The displayed length is the current tour length.

Markov Chain Mixing in Spin Glasses

The mixing time of the Markov chain in spin glasses depends critically on temperature:

High temperature ( $T > T_g$ ): Fast mixing, system explores configuration space efficiently
Near critical temperature ( $T \approx T_g$ ): Critical slowing down, mixing time diverges
Low temperature ( $T < T_g$ ): Very slow mixing, system trapped in local minima

This temperature-dependent mixing reflects the phase transition: above $T_g$ , the system is ergodic; below $T_g$ , ergodicity is broken.

Autocorrelation Time

The autocorrelation time $\tau$ measures how many Monte Carlo steps are needed before configurations become statistically independent:

$C(t) = \langle \sigma_i(0) \sigma_i(t) \rangle \propto \exp(-t/\tau)$

Near the critical temperature, $\tau$ diverges as $\tau \sim |T - T_g|^{-\nu z}$ , where $\nu$ and $z$ are critical exponents. This critical slowing down makes simulations near $T_g$ computationally expensive.

Applications Beyond Spin Glasses

The Markov chain framework extends far beyond spin glasses:

Machine Learning

MC sampling: Generate samples from complex probability distributions
Bayesian inference: Sample from posterior distributions
Generative models: Train and sample from probabilistic models

Optimization

Simulated annealing: General-purpose optimization heuristic
Genetic algorithms: Population-based search with Markov chain dynamics
Particle swarm optimization: Stochastic search algorithms

Statistical Physics

Ising models: Ferromagnets, antiferromagnets, spin glasses
Lattice models: Potts models, XY models, Heisenberg models
Protein folding: Sampling conformational space

Machine Learning with Markov Chains

Markov chain Monte Carlo methods are fundamental to modern machine learning, providing powerful tools for inference, training, and understanding complex models. The connection between spin glasses and machine learning runs deep—many machine learning problems can be mapped to spin glass-like energy landscapes.

Bayesian Inference and Posterior Sampling

In Bayesian machine learning, we want to infer the posterior distribution over model parameters $\theta$ given data $D$:

$P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)} \propto P(D | \theta) P(\theta)$

For complex models, this posterior is often intractable to compute analytically. MC methods enable us to sample from the posterior, generating a Markov chain of parameter values that converges to the true posterior distribution.

Key advantages of MC for Bayesian inference:

Uncertainty quantification: Unlike point estimates (like maximum likelihood), MC provides full posterior distributions, capturing uncertainty in predictions
Model averaging: Can average over multiple parameter configurations weighted by their posterior probability
Handles complex priors: Works with any prior distribution, not just conjugate priors
No overfitting: Bayesian approach naturally incorporates regularization through priors

Example: Neural Network Training

Instead of finding a single set of weights that minimize loss, Bayesian neural networks sample weights from the posterior:

$P(\mathbf{w} | \mathcal{D}) \propto \exp\left(-\frac{1}{T} \mathcal{L}(\mathbf{w})\right) P(\mathbf{w})$

where $\mathcal{L}(\mathbf{w})$ is the loss function and $P(\mathbf{w})$ is the prior over weights. The "temperature" $T$ controls the sharpness of the posterior—analogous to temperature in spin glasses!

Boltzmann Machines: Spin Glasses in Machine Learning

Boltzmann machines are probabilistic neural networks that are directly inspired by spin glass physics. They consist of:

Visible units $v_i$ : Represent observed data
Hidden units $h_j$ : Capture latent features
Connections $W_{ij}$ : Weights between units (analogous to couplings $J_{ij}$ in spin glasses)

The energy function is:

$E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j$

where $b_i$ and $c_j$ are biases. The probability of a configuration follows the Boltzmann distribution:

$P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp(-E(\mathbf{v}, \mathbf{h}) / T)$

where $Z$ is the partition function—exactly like in spin glasses!

Training Boltzmann Machines:

Contrastive Divergence: Uses MC to approximate gradients
Persistent Contrastive Divergence: Maintains a persistent Markov chain across training iterations
Parallel Tempering: Uses multiple chains at different temperatures (like replica exchange in spin glasses)

The connection is explicit: Boltzmann machines are spin glasses with learnable couplings!

Understanding Neural Network Loss Landscapes

The loss landscape of deep neural networks shares remarkable similarities with spin glass energy landscapes:

Many local minima: Neural networks have exponentially many local minima, just like spin glasses
Barrier structure: Minima are separated by energy barriers
Flat minima: Wide, flat minima generalize better (analogous to robust spin glass configurations)
Critical points: Phase transitions in learning dynamics

MC for exploring loss landscapes:

By treating the loss function as an energy landscape and using MC to sample from:

$P(\mathbf{w}) \propto \exp(-\mathcal{L}(\mathbf{w}) / T)$

we can:

Find diverse solutions: Sample multiple good solutions, not just one
Understand landscape structure: Characterize the distribution of minima
Robust training: Average over multiple solutions for better generalization
Uncertainty estimation: Quantify confidence in predictions

Generative Models and Sampling

Many generative models rely on MC for sampling:

Restricted Boltzmann Machines (RBMs):

Used in deep belief networks
Generate samples via Gibbs sampling (a type of MC)
Each Gibbs step is a Markov chain transition

Generative Adversarial Networks (GANs):

Can use MC for better sampling
Langevin dynamics (continuous MC) for generating high-quality samples

Variational Autoencoders (VAEs):

Use variational inference, but can be improved with MC
Hybrid approaches combine variational methods with MC refinement

Training Deep Networks with MC

Traditional gradient descent can get stuck in sharp minima. MC-based training offers advantages:

Stochastic Gradient Langevin Dynamics (SGLD): Combines stochastic gradient descent with Langevin noise:

$\mathbf{w}_{t+1} = \mathbf{w}_t - \epsilon_t \nabla \mathcal{L}(\mathbf{w}_t) + \sqrt{2\epsilon_t / \beta} \boldsymbol{\eta}_t$

where $\boldsymbol{\eta}_t \sim \mathcal{N}(0, I)$ is Gaussian noise. This creates a Markov chain that samples from the posterior, finding flatter minima that generalize better.

Benefits:

Better generalization: Flatter minima generalize better
Uncertainty quantification: Provides posterior over weights
Robust to initialization: Less sensitive to starting point
Handles non-convexity: Can escape local minima

Parallel Tempering for Machine Learning

Just as parallel tempering helps spin glass simulations overcome energy barriers, it can improve machine learning:

Multi-temperature training:

Run multiple copies of the model at different "temperatures" (learning rates or noise levels)
High-temperature copies explore widely
Low-temperature copies refine solutions
Periodic swaps between temperatures help escape local minima

This is particularly effective for:

Training deep networks: Helps find better minima
Hyperparameter optimization: Explores hyperparameter space more effectively
Ensemble methods: Generates diverse models for better predictions

Variational Inference vs MC

Variational Inference (VI):

Approximates posterior with a simpler distribution
Fast but biased (may miss important modes)

MC:

Samples from true posterior (asymptotically exact)
Slower but unbiased

Best of both worlds:

Use VI for fast approximation
Refine with MC for accuracy
Hybrid methods combine both approaches

Practical Applications

1. Uncertainty in Predictions

Traditional neural networks give point predictions. Bayesian neural networks (trained with MC) provide:

Predictive distributions: Full probability distributions over outputs
Confidence intervals: Quantify uncertainty
Out-of-distribution detection: Identify when model is uncertain

2. Active Learning

Use uncertainty estimates from MC to:

Identify most informative data points to label
Reduce labeling costs
Improve model with fewer examples

3. Model Selection

MC enables:

Bayesian model comparison: Compare models using Bayes factors
Automatic relevance determination: Prune unnecessary features
Architecture search: Sample over network architectures

4. Robust Optimization

Instead of finding a single minimum, MC:

Samples from the posterior over solutions
Averages predictions (Bayesian model averaging)
Reduces overfitting and improves generalization

The Spin Glass–Machine Learning Connection

The deep connection between spin glasses and machine learning:

Spin Glass Concept	Machine Learning Analogy
Energy landscape	Loss landscape
Spin configurations	Model parameters
Couplings $J_{ij}$	Neural network weights
Temperature $T$	Learning rate / noise level
Phase transition	Training dynamics transition
Frustration	Conflicting training signals
Replica symmetry breaking	Multiple good solutions

Key insight: Understanding spin glass physics helps us understand why neural networks work, why they sometimes fail, and how to make them better.

Future Directions

Quantum Machine Learning:

Quantum annealers (like D-Wave) can solve spin glass problems
Quantum algorithms for training neural networks
Quantum MC methods

Neuromorphic Computing:

Hardware that directly implements spin glass dynamics
Energy-efficient neural network inference
Analog computing for machine learning

Understanding Generalization:

Spin glass theory explains why flat minima generalize
Phase transitions in learning dynamics
Connection between geometry of loss landscape and generalization

Advanced MC Methods

Parallel Tempering (Replica Exchange)

Parallel tempering runs multiple Markov chains at different temperatures simultaneously, periodically swapping configurations between chains. This helps overcome energy barriers:

$P(\text{swap}) = \min\left(1, \exp\left((\beta_i - \beta_j)(E_i - E_j)\right)\right)$

where $\beta = 1/T$ is the inverse temperature. High-temperature chains explore widely, while low-temperature chains refine solutions.

Cluster Algorithms

Cluster algorithms (e.g., Swendsen-Wang, Wolff) update large clusters of spins simultaneously, dramatically reducing autocorrelation times near critical points. These algorithms exploit the system's structure to achieve faster mixing.

Hybrid Monte Carlo

Hybrid Monte Carlo combines molecular dynamics (deterministic) with Monte Carlo (stochastic) updates, useful for continuous systems and quantum Monte Carlo.

Conclusion

Markov chains provide the mathematical framework for understanding and simulating spin glass dynamics. The Metropolis algorithm, born from statistical mechanics, enables computational studies of these complex systems while also inspiring optimization algorithms for problems like TSP and machine learning models.

The connection runs both ways: spin glass physics informs algorithm design (e.g., cooling schedules, mixing times), while Markov chain theory provides tools for analyzing spin glass dynamics (e.g., autocorrelation, ergodicity). In machine learning, MC methods enable Bayesian inference, uncertainty quantification, and robust training of deep networks.

As we develop more sophisticated MC methods—parallel tempering, cluster algorithms, quantum Monte Carlo—we gain new insights into both spin glass physics, optimization algorithms, and machine learning. The deep connections between statistical mechanics, probability theory, and computer science continue to yield powerful new methods for understanding complex systems and training better models.

Key Takeaways:

Markov chains provide the mathematical foundation for simulating spin glasses and training machine learning models
MC methods enable sampling from complex distributions, essential for Bayesian inference
Spin glass physics directly informs machine learning through Boltzmann machines and understanding of loss landscapes
Parallel tempering and other advanced MC methods improve both spin glass simulations and neural network training
Uncertainty quantification in machine learning benefits from MC-based Bayesian approaches

Related Articles:

Spin Glasses - Overview of spin glass physics
Simulated Annealing - Optimization algorithms inspired by spin glasses
Quantum Spin Glass - Quantum extensions and quantum Monte Carlo
Quantum Computing - Hybrid Approaches - Quantum annealing and connections to spin glass physics

References

Foundational Papers

Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087-1092. DOI: 10.1063/1.1699114

The original paper introducing the Metropolis algorithm, one of the first Monte Carlo methods for statistical mechanics.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97-109. DOI: 10.1093/biomet/57.1.97

Extended the Metropolis algorithm to the more general Metropolis-Hastings algorithm, enabling sampling from arbitrary distributions.
Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680. DOI: 10.1126/science.220.4598.671

Introduced simulated annealing as an optimization technique, drawing inspiration from statistical mechanics and spin glass physics.

Spin Glasses and Statistical Mechanics

Edwards, S. F., & Anderson, P. W. (1975). Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5), 965. DOI: 10.1088/0305-4608/5/5/017

Foundational work on the theory of spin glasses, introducing the Edwards-Anderson model.
Sherrington, D., & Kirkpatrick, S. (1975). Solvable model of a spin-glass. Physical Review Letters, 35(26), 1792. DOI: 10.1103/PhysRevLett.35.1792

Introduced the Sherrington-Kirkpatrick model, a mean-field spin glass model that became central to understanding spin glass physics.
Parisi, G. (1983). Infinite number of order parameters for spin-glasses. Physical Review Letters, 43(23), 1754. DOI: 10.1103/PhysRevLett.43.1754

Revolutionary work on replica symmetry breaking in spin glasses, for which Parisi won the 2021 Nobel Prize in Physics.

Advanced MC Methods

Swendsen, R. H., & Wang, J. S. (1987). Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters, 58(2), 86. DOI: 10.1103/PhysRevLett.58.86

Introduced the Swendsen-Wang cluster algorithm, dramatically improving MC efficiency near critical points.
Hukushima, K., & Nemoto, K. (1996). Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Society of Japan, 65(6), 1604-1608. DOI: 10.1143/JPSJ.65.1604

Introduced parallel tempering (replica exchange) for spin glass simulations, enabling efficient sampling across temperature ranges.
Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Physical Review Letters, 62(4), 361. DOI: 10.1103/PhysRevLett.62.361

Developed the Wolff cluster algorithm, another powerful cluster update method for spin systems.

Machine Learning and Neural Networks

Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147-169. DOI: 10.1016/S0364-0213(85)80012-4

Introduced Boltzmann machines and their learning algorithm, directly connecting spin glass physics to machine learning.
Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771-1800. DOI: 10.1162/089976602760128018

Introduced contrastive divergence for training Boltzmann machines, making them practical for machine learning applications.
Welling, M., & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11), 681-688.

Introduced Stochastic Gradient Langevin Dynamics (SGLD), combining stochastic gradient descent with Langevin dynamics for Bayesian neural networks.

Loss Landscapes and Generalization

Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., & LeCun, Y. (2015). The loss surfaces of multilayer networks. Artificial Intelligence and Statistics, 192-204.

Explored the connection between neural network loss landscapes and spin glass energy landscapes, showing the similarity in structure.
Baldassi, C., Ingrosso, A., Lucibello, C., Saglietti, L., & Zecchina, R. (2016). Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Physical Review Letters, 116(12), 128101. DOI: 10.1103/PhysRevLett.116.128101

Applied spin glass theory to understand learning in neural networks with discrete synapses.
Baity-Jesi, M., Sagun, L., Geiger, M., Spigler, S., Arous, G. B., Cammarota, C., ... & LeCun, Y. (2018). Comparing dynamics: deep neural networks versus glassy systems. Journal of Statistical Mechanics: Theory and Experiment, 2018(12), 124013. DOI: 10.1088/1742-5468/aaecee

Direct comparison between neural network training dynamics and spin glass dynamics, revealing deep connections.

Parallel Tempering and Optimization

Earl, D. J., & Deem, M. W. (2005). Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23), 3910-3916. DOI: 10.1039/B509983H

Comprehensive review of parallel tempering methods and their applications across physics, chemistry, and optimization.
Kofke, D. A. (2002). On the acceptance probability of replica-exchange Monte Carlo trials. The Journal of Chemical Physics, 117(15), 6911-6914. DOI: 10.1063/1.1507776

Detailed analysis of replica exchange acceptance probabilities and optimization strategies.

Traveling Salesman Problem

Lin, S., & Kernighan, B. W. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21(2), 498-516. DOI: 10.1287/opre.21.2.498

Classic paper on heuristic algorithms for TSP, including the 2-opt and 3-opt methods used in simulated annealing.
Johnson, D. S., & McGeoch, L. A. (1997). The traveling salesman problem: A case study in local optimization. In Local Search in Combinatorial Optimization (pp. 215-310). Princeton University Press.

Comprehensive study of local search methods for TSP, including simulated annealing and other metaheuristics.

Bayesian Inference and MC

Gelfand, A. E., & Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398-409. DOI: 10.1080/01621459.1990.10476213

Early work establishing MC methods for Bayesian inference, showing their practical utility.
Neal, R. M. (2011). MC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 2.

Comprehensive introduction to Hamiltonian Monte Carlo (HMC), a powerful MC method combining molecular dynamics with Monte Carlo.

Modern Perspectives

Mezard, M., & Montanari, A. (2009). Information, Physics, and Computation. Oxford University Press.

Comprehensive textbook connecting information theory, statistical physics (including spin glasses), and computer science, with extensive coverage of MC methods.
Newman, M. E., & Barkema, G. T. (1999). Monte Carlo Methods in Statistical Physics. Oxford University Press.

Classic textbook on Monte Carlo methods in statistical physics, including detailed treatment of spin glass simulations.
MacKay, D. J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.

Excellent introduction to information theory, Bayesian inference, and machine learning, with clear explanations of MC methods and their connections to statistical physics.

Markov Chains and Spin Glasses - Stochastic Dynamics in Disordered Systems

Markov Chains: A Primer

Stationary Distribution

Metropolis-Hastings Algorithm

Metropolis Algorithm for Spin Glasses

Spin Glass Dynamics as a Markov Chain

Ergodicity and Mixing

The Traveling Salesman Problem

TSP as a Spin Glass

Simulated Annealing for TSP

Interactive TSP Simulation

Markov Chain Mixing in Spin Glasses

Autocorrelation Time

Applications Beyond Spin Glasses

Machine Learning

Optimization

Statistical Physics

Machine Learning with Markov Chains

Bayesian Inference and Posterior Sampling

Boltzmann Machines: Spin Glasses in Machine Learning

Understanding Neural Network Loss Landscapes

Generative Models and Sampling

Training Deep Networks with MC

Parallel Tempering for Machine Learning

Variational Inference vs MC

Practical Applications

The Spin Glass–Machine Learning Connection

Future Directions

Advanced MC Methods

Parallel Tempering (Replica Exchange)

Cluster Algorithms

Hybrid Monte Carlo

Conclusion

References

Foundational Papers

Spin Glasses and Statistical Mechanics

Advanced MC Methods

Machine Learning and Neural Networks

Loss Landscapes and Generalization

Parallel Tempering and Optimization

Traveling Salesman Problem

Bayesian Inference and MC

Modern Perspectives

Explore Categories

Related Content

The Contemporary Landscape of Spin Glass Research

Simulated Annealing - From Spin Glasses to Optimization

Quantum Spin Glasses - Tunneling Through Energy Landscapes

Quantum Computing: Hybrid Approaches