Daniel Gray

Thoughts, Notes, Ideas, Projects

← Back to home

Markov Chains and Spin Glasses - Stochastic Dynamics in Disordered Systems

The connection between Markov chains and spin glasses runs deep, providing both a computational framework for studying spin glass dynamics and a physical interpretation of optimization algorithms. This article explores how Markov chain Monte Carlo (MCMC) methods enable the simulation of spin glass systems, and how spin glass physics informs the design of optimization algorithms for problems like the Traveling Salesman Problem.

Markov Chains: A Primer

A Markov chain is a stochastic process where the probability of transitioning to a new state depends only on the current state, not on the history of previous states. This "memoryless" property, known as the Markov property, makes Markov chains powerful tools for modeling complex systems.

Formally, a Markov chain is defined by:

  1. State space $S$: The set of all possible states
  2. Transition probabilities PijP_{ij}: The probability of transitioning from state $i$ to state $j$
  3. Initial distribution π0\pi_0: The probability distribution over states at time t=0t=0

The transition probabilities satisfy:

jSPij=1iS\sum_{j \in S} P_{ij} = 1 \quad \forall i \in S

Stationary Distribution

For an ergodic Markov chain (one that can reach any state from any other state), there exists a unique stationary distribution π\pi^* such that:

π=πP\pi^* = \pi^* P

where $P$ is the transition matrix. The stationary distribution represents the long-term probability of being in each state.

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm is a Markov chain Monte Carlo method for sampling from a probability distribution. It constructs a Markov chain whose stationary distribution is the desired distribution.

For a target distribution π(x)\pi(x), the algorithm works as follows:

  1. Proposal: From current state $x$, propose a new state $x'$ with probability q(xx)q(x'|x)
  2. Acceptance: Accept the proposal with probability:

A(xx)=min(1,π(x)q(xx)π(x)q(xx))A(x'|x) = \min\left(1, \frac{\pi(x') q(x|x')}{\pi(x) q(x'|x)}\right)

  1. Update: If accepted, move to $x'$; otherwise, stay at $x$

Metropolis Algorithm for Spin Glasses

For spin glass simulations, we use the simpler Metropolis algorithm, where the proposal distribution is symmetric (q(xx)=q(xx)q(x'|x) = q(x|x')). The acceptance probability becomes:

A(xx)=min(1,π(x)π(x))A(x'|x) = \min\left(1, \frac{\pi(x')}{\pi(x)}\right)

For the canonical ensemble in statistical mechanics, π(x)exp(E(x)/T)\pi(x) \propto \exp(-E(x)/T), where E(x)E(x) is the energy and $T$ is temperature. This gives:

A(xx)=min(1,exp(ΔET))A(x'|x) = \min\left(1, \exp\left(-\frac{\Delta E}{T}\right)\right)

where ΔE=E(x)E(x)\Delta E = E(x') - E(x) is the energy change. This is exactly the Metropolis criterion used in spin glass simulations!

Spin Glass Dynamics as a Markov Chain

The evolution of a spin glass system under Monte Carlo dynamics forms a Markov chain:

  • States: All possible spin configurations {σi}\{\sigma_i\}
  • Transitions: Spin flips (or other local moves)
  • Transition probabilities: Given by the Metropolis acceptance criterion
  • Stationary distribution: The Boltzmann distribution P({σi})exp(E({σi})/T)P(\{\sigma_i\}) \propto \exp(-E(\{\sigma_i\})/T)

The Markov chain property ensures that the system's future depends only on its current configuration, not on how it reached that configuration. This makes the dynamics mathematically tractable while still capturing the essential physics.

Ergodicity and Mixing

For the Markov chain to properly sample the equilibrium distribution, it must be ergodic: any configuration must be reachable from any other configuration in a finite number of steps. In spin glasses, ergodicity can be broken at low temperatures, where the system gets trapped in local minima—this is the spin glass phase transition!

Mixing time measures how quickly the Markov chain converges to its stationary distribution. In spin glasses, mixing times can be exponentially long, reflecting the rugged energy landscape.

The Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a classic optimization problem: given a set of cities and distances between them, find the shortest route that visits each city exactly once and returns to the starting city.

Despite its simple statement, TSP is NP-hard, meaning no known polynomial-time algorithm can solve it exactly for large instances. The problem's energy landscape shares remarkable similarities with spin glasses: both feature many local minima separated by energy barriers.

TSP as a Spin Glass

We can map TSP to a spin glass-like system:

  • Spins: Binary variables indicating whether each edge is in the tour
  • Energy: Total tour length
  • Constraints: Each city must have exactly two edges (degree constraint)
  • Couplings: Distances between cities

The Hamiltonian becomes:

H=i,jdijσij+λi(jσij2)2\mathcal{H} = \sum_{i,j} d_{ij} \sigma_{ij} + \lambda \sum_i \left(\sum_j \sigma_{ij} - 2\right)^2

where dijd_{ij} is the distance between cities $i$ and $j$, σij{0,1}\sigma_{ij} \in \{0,1\} indicates if edge (i,j)(i,j) is in the tour, and the second term enforces the degree constraint with penalty λ\lambda.

Simulated Annealing for TSP

Simulated annealing, inspired by spin glass physics, provides an effective heuristic for TSP:

  1. Initialization: Start with a random tour (or use a greedy construction)
  2. Moves: Propose tour modifications (e.g., 2-opt, 3-opt swaps)
  3. Acceptance: Use Metropolis criterion with temperature schedule
  4. Cooling: Gradually decrease temperature

The algorithm explores the solution space at high temperature, then gradually focuses on promising regions as temperature decreases.

Interactive TSP Simulation

Below is an interactive simulation demonstrating simulated annealing for the Traveling Salesman Problem. The simulation shows how the algorithm explores the solution space, gradually finding shorter tours as the temperature decreases.

Interactive simulation: Watch as the algorithm explores different tours. At high temperature, the tour changes dramatically. As temperature decreases, the algorithm converges to a near-optimal solution. The displayed length is the current tour length.

Markov Chain Mixing in Spin Glasses

The mixing time of the Markov chain in spin glasses depends critically on temperature:

  • High temperature (T>TgT > T_g): Fast mixing, system explores configuration space efficiently
  • Near critical temperature (TTgT \approx T_g): Critical slowing down, mixing time diverges
  • Low temperature (T<TgT < T_g): Very slow mixing, system trapped in local minima

This temperature-dependent mixing reflects the phase transition: above TgT_g, the system is ergodic; below TgT_g, ergodicity is broken.

Autocorrelation Time

The autocorrelation time τ\tau measures how many Monte Carlo steps are needed before configurations become statistically independent:

C(t)=σi(0)σi(t)exp(t/τ)C(t) = \langle \sigma_i(0) \sigma_i(t) \rangle \propto \exp(-t/\tau)

Near the critical temperature, τ\tau diverges as τTTgνz\tau \sim |T - T_g|^{-\nu z}, where ν\nu and $z$ are critical exponents. This critical slowing down makes simulations near TgT_g computationally expensive.

Applications Beyond Spin Glasses

The Markov chain framework extends far beyond spin glasses:

Machine Learning

  • MC sampling: Generate samples from complex probability distributions
  • Bayesian inference: Sample from posterior distributions
  • Generative models: Train and sample from probabilistic models

Optimization

  • Simulated annealing: General-purpose optimization heuristic
  • Genetic algorithms: Population-based search with Markov chain dynamics
  • Particle swarm optimization: Stochastic search algorithms

Statistical Physics

  • Ising models: Ferromagnets, antiferromagnets, spin glasses
  • Lattice models: Potts models, XY models, Heisenberg models
  • Protein folding: Sampling conformational space

Machine Learning with Markov Chains

Markov chain Monte Carlo methods are fundamental to modern machine learning, providing powerful tools for inference, training, and understanding complex models. The connection between spin glasses and machine learning runs deep—many machine learning problems can be mapped to spin glass-like energy landscapes.

Bayesian Inference and Posterior Sampling

In Bayesian machine learning, we want to infer the posterior distribution over model parameters θ\theta given data $D$:

P(θD)=P(Dθ)P(θ)P(D)P(Dθ)P(θ)P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)} \propto P(D | \theta) P(\theta)

For complex models, this posterior is often intractable to compute analytically. MC methods enable us to sample from the posterior, generating a Markov chain of parameter values that converges to the true posterior distribution.

Key advantages of MC for Bayesian inference:

  1. Uncertainty quantification: Unlike point estimates (like maximum likelihood), MC provides full posterior distributions, capturing uncertainty in predictions
  2. Model averaging: Can average over multiple parameter configurations weighted by their posterior probability
  3. Handles complex priors: Works with any prior distribution, not just conjugate priors
  4. No overfitting: Bayesian approach naturally incorporates regularization through priors

Example: Neural Network Training

Instead of finding a single set of weights that minimize loss, Bayesian neural networks sample weights from the posterior:

P(wD)exp(1TL(w))P(w)P(\mathbf{w} | \mathcal{D}) \propto \exp\left(-\frac{1}{T} \mathcal{L}(\mathbf{w})\right) P(\mathbf{w})

where L(w)\mathcal{L}(\mathbf{w}) is the loss function and P(w)P(\mathbf{w}) is the prior over weights. The "temperature" $T$ controls the sharpness of the posterior—analogous to temperature in spin glasses!

Boltzmann Machines: Spin Glasses in Machine Learning

Boltzmann machines are probabilistic neural networks that are directly inspired by spin glass physics. They consist of:

  • Visible units viv_i: Represent observed data
  • Hidden units hjh_j: Capture latent features
  • Connections WijW_{ij}: Weights between units (analogous to couplings JijJ_{ij} in spin glasses)

The energy function is:

E(v,h)=i,jWijvihjibivijcjhjE(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} W_{ij} v_i h_j - \sum_i b_i v_i - \sum_j c_j h_j

where bib_i and cjc_j are biases. The probability of a configuration follows the Boltzmann distribution:

P(v,h)=1Zexp(E(v,h)/T)P(\mathbf{v}, \mathbf{h}) = \frac{1}{Z} \exp(-E(\mathbf{v}, \mathbf{h}) / T)

where $Z$ is the partition function—exactly like in spin glasses!

Training Boltzmann Machines:

  1. Contrastive Divergence: Uses MC to approximate gradients
  2. Persistent Contrastive Divergence: Maintains a persistent Markov chain across training iterations
  3. Parallel Tempering: Uses multiple chains at different temperatures (like replica exchange in spin glasses)

The connection is explicit: Boltzmann machines are spin glasses with learnable couplings!

Understanding Neural Network Loss Landscapes

The loss landscape of deep neural networks shares remarkable similarities with spin glass energy landscapes:

  • Many local minima: Neural networks have exponentially many local minima, just like spin glasses
  • Barrier structure: Minima are separated by energy barriers
  • Flat minima: Wide, flat minima generalize better (analogous to robust spin glass configurations)
  • Critical points: Phase transitions in learning dynamics

MC for exploring loss landscapes:

By treating the loss function as an energy landscape and using MC to sample from:

P(w)exp(L(w)/T)P(\mathbf{w}) \propto \exp(-\mathcal{L}(\mathbf{w}) / T)

we can:

  1. Find diverse solutions: Sample multiple good solutions, not just one
  2. Understand landscape structure: Characterize the distribution of minima
  3. Robust training: Average over multiple solutions for better generalization
  4. Uncertainty estimation: Quantify confidence in predictions

Generative Models and Sampling

Many generative models rely on MC for sampling:

Restricted Boltzmann Machines (RBMs):

  • Used in deep belief networks
  • Generate samples via Gibbs sampling (a type of MC)
  • Each Gibbs step is a Markov chain transition

Generative Adversarial Networks (GANs):

  • Can use MC for better sampling
  • Langevin dynamics (continuous MC) for generating high-quality samples

Variational Autoencoders (VAEs):

  • Use variational inference, but can be improved with MC
  • Hybrid approaches combine variational methods with MC refinement

Training Deep Networks with MC

Traditional gradient descent can get stuck in sharp minima. MC-based training offers advantages:

Stochastic Gradient Langevin Dynamics (SGLD): Combines stochastic gradient descent with Langevin noise:

wt+1=wtϵtL(wt)+2ϵt/βηt\mathbf{w}_{t+1} = \mathbf{w}_t - \epsilon_t \nabla \mathcal{L}(\mathbf{w}_t) + \sqrt{2\epsilon_t / \beta} \boldsymbol{\eta}_t

where ηtN(0,I)\boldsymbol{\eta}_t \sim \mathcal{N}(0, I) is Gaussian noise. This creates a Markov chain that samples from the posterior, finding flatter minima that generalize better.

Benefits:

  • Better generalization: Flatter minima generalize better
  • Uncertainty quantification: Provides posterior over weights
  • Robust to initialization: Less sensitive to starting point
  • Handles non-convexity: Can escape local minima

Parallel Tempering for Machine Learning

Just as parallel tempering helps spin glass simulations overcome energy barriers, it can improve machine learning:

Multi-temperature training:

  • Run multiple copies of the model at different "temperatures" (learning rates or noise levels)
  • High-temperature copies explore widely
  • Low-temperature copies refine solutions
  • Periodic swaps between temperatures help escape local minima

This is particularly effective for:

  • Training deep networks: Helps find better minima
  • Hyperparameter optimization: Explores hyperparameter space more effectively
  • Ensemble methods: Generates diverse models for better predictions

Variational Inference vs MC

Variational Inference (VI):

  • Approximates posterior with a simpler distribution
  • Fast but biased (may miss important modes)

MC:

  • Samples from true posterior (asymptotically exact)
  • Slower but unbiased

Best of both worlds:

  • Use VI for fast approximation
  • Refine with MC for accuracy
  • Hybrid methods combine both approaches

Practical Applications

1. Uncertainty in Predictions

Traditional neural networks give point predictions. Bayesian neural networks (trained with MC) provide:

  • Predictive distributions: Full probability distributions over outputs
  • Confidence intervals: Quantify uncertainty
  • Out-of-distribution detection: Identify when model is uncertain

2. Active Learning

Use uncertainty estimates from MC to:

  • Identify most informative data points to label
  • Reduce labeling costs
  • Improve model with fewer examples

3. Model Selection

MC enables:

  • Bayesian model comparison: Compare models using Bayes factors
  • Automatic relevance determination: Prune unnecessary features
  • Architecture search: Sample over network architectures

4. Robust Optimization

Instead of finding a single minimum, MC:

  • Samples from the posterior over solutions
  • Averages predictions (Bayesian model averaging)
  • Reduces overfitting and improves generalization

The Spin Glass–Machine Learning Connection

The deep connection between spin glasses and machine learning:

Spin Glass Concept Machine Learning Analogy
Energy landscape Loss landscape
Spin configurations Model parameters
Couplings JijJ_{ij} Neural network weights
Temperature $T$ Learning rate / noise level
Phase transition Training dynamics transition
Frustration Conflicting training signals
Replica symmetry breaking Multiple good solutions

Key insight: Understanding spin glass physics helps us understand why neural networks work, why they sometimes fail, and how to make them better.

Future Directions

Quantum Machine Learning:

  • Quantum annealers (like D-Wave) can solve spin glass problems
  • Quantum algorithms for training neural networks
  • Quantum MC methods

Neuromorphic Computing:

  • Hardware that directly implements spin glass dynamics
  • Energy-efficient neural network inference
  • Analog computing for machine learning

Understanding Generalization:

  • Spin glass theory explains why flat minima generalize
  • Phase transitions in learning dynamics
  • Connection between geometry of loss landscape and generalization

Advanced MC Methods

Parallel Tempering (Replica Exchange)

Parallel tempering runs multiple Markov chains at different temperatures simultaneously, periodically swapping configurations between chains. This helps overcome energy barriers:

P(swap)=min(1,exp((βiβj)(EiEj)))P(\text{swap}) = \min\left(1, \exp\left((\beta_i - \beta_j)(E_i - E_j)\right)\right)

where β=1/T\beta = 1/T is the inverse temperature. High-temperature chains explore widely, while low-temperature chains refine solutions.

Cluster Algorithms

Cluster algorithms (e.g., Swendsen-Wang, Wolff) update large clusters of spins simultaneously, dramatically reducing autocorrelation times near critical points. These algorithms exploit the system's structure to achieve faster mixing.

Hybrid Monte Carlo

Hybrid Monte Carlo combines molecular dynamics (deterministic) with Monte Carlo (stochastic) updates, useful for continuous systems and quantum Monte Carlo.

Conclusion

Markov chains provide the mathematical framework for understanding and simulating spin glass dynamics. The Metropolis algorithm, born from statistical mechanics, enables computational studies of these complex systems while also inspiring optimization algorithms for problems like TSP and machine learning models.

The connection runs both ways: spin glass physics informs algorithm design (e.g., cooling schedules, mixing times), while Markov chain theory provides tools for analyzing spin glass dynamics (e.g., autocorrelation, ergodicity). In machine learning, MC methods enable Bayesian inference, uncertainty quantification, and robust training of deep networks.

As we develop more sophisticated MC methods—parallel tempering, cluster algorithms, quantum Monte Carlo—we gain new insights into both spin glass physics, optimization algorithms, and machine learning. The deep connections between statistical mechanics, probability theory, and computer science continue to yield powerful new methods for understanding complex systems and training better models.

Key Takeaways:

  1. Markov chains provide the mathematical foundation for simulating spin glasses and training machine learning models
  2. MC methods enable sampling from complex distributions, essential for Bayesian inference
  3. Spin glass physics directly informs machine learning through Boltzmann machines and understanding of loss landscapes
  4. Parallel tempering and other advanced MC methods improve both spin glass simulations and neural network training
  5. Uncertainty quantification in machine learning benefits from MC-based Bayesian approaches

Related Articles:

References

Foundational Papers

  1. Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6), 1087-1092. DOI: 10.1063/1.1699114

    The original paper introducing the Metropolis algorithm, one of the first Monte Carlo methods for statistical mechanics.

  2. Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), 97-109. DOI: 10.1093/biomet/57.1.97

    Extended the Metropolis algorithm to the more general Metropolis-Hastings algorithm, enabling sampling from arbitrary distributions.

  3. Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680. DOI: 10.1126/science.220.4598.671

    Introduced simulated annealing as an optimization technique, drawing inspiration from statistical mechanics and spin glass physics.

Spin Glasses and Statistical Mechanics

  1. Edwards, S. F., & Anderson, P. W. (1975). Theory of spin glasses. Journal of Physics F: Metal Physics, 5(5), 965. DOI: 10.1088/0305-4608/5/5/017

    Foundational work on the theory of spin glasses, introducing the Edwards-Anderson model.

  2. Sherrington, D., & Kirkpatrick, S. (1975). Solvable model of a spin-glass. Physical Review Letters, 35(26), 1792. DOI: 10.1103/PhysRevLett.35.1792

    Introduced the Sherrington-Kirkpatrick model, a mean-field spin glass model that became central to understanding spin glass physics.

  3. Parisi, G. (1983). Infinite number of order parameters for spin-glasses. Physical Review Letters, 43(23), 1754. DOI: 10.1103/PhysRevLett.43.1754

    Revolutionary work on replica symmetry breaking in spin glasses, for which Parisi won the 2021 Nobel Prize in Physics.

Advanced MC Methods

  1. Swendsen, R. H., & Wang, J. S. (1987). Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters, 58(2), 86. DOI: 10.1103/PhysRevLett.58.86

    Introduced the Swendsen-Wang cluster algorithm, dramatically improving MC efficiency near critical points.

  2. Hukushima, K., & Nemoto, K. (1996). Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Society of Japan, 65(6), 1604-1608. DOI: 10.1143/JPSJ.65.1604

    Introduced parallel tempering (replica exchange) for spin glass simulations, enabling efficient sampling across temperature ranges.

  3. Wolff, U. (1989). Collective Monte Carlo updating for spin systems. Physical Review Letters, 62(4), 361. DOI: 10.1103/PhysRevLett.62.361

    Developed the Wolff cluster algorithm, another powerful cluster update method for spin systems.

Machine Learning and Neural Networks

  1. Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). A learning algorithm for Boltzmann machines. Cognitive Science, 9(1), 147-169. DOI: 10.1016/S0364-0213(85)80012-4

    Introduced Boltzmann machines and their learning algorithm, directly connecting spin glass physics to machine learning.

  2. Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 1771-1800. DOI: 10.1162/089976602760128018

    Introduced contrastive divergence for training Boltzmann machines, making them practical for machine learning applications.

  3. Welling, M., & Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11), 681-688.

    Introduced Stochastic Gradient Langevin Dynamics (SGLD), combining stochastic gradient descent with Langevin dynamics for Bayesian neural networks.

Loss Landscapes and Generalization

  1. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., & LeCun, Y. (2015). The loss surfaces of multilayer networks. Artificial Intelligence and Statistics, 192-204.

    Explored the connection between neural network loss landscapes and spin glass energy landscapes, showing the similarity in structure.

  2. Baldassi, C., Ingrosso, A., Lucibello, C., Saglietti, L., & Zecchina, R. (2016). Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Physical Review Letters, 116(12), 128101. DOI: 10.1103/PhysRevLett.116.128101

    Applied spin glass theory to understand learning in neural networks with discrete synapses.

  3. Baity-Jesi, M., Sagun, L., Geiger, M., Spigler, S., Arous, G. B., Cammarota, C., ... & LeCun, Y. (2018). Comparing dynamics: deep neural networks versus glassy systems. Journal of Statistical Mechanics: Theory and Experiment, 2018(12), 124013. DOI: 10.1088/1742-5468/aaecee

    Direct comparison between neural network training dynamics and spin glass dynamics, revealing deep connections.

Parallel Tempering and Optimization

  1. Earl, D. J., & Deem, M. W. (2005). Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7(23), 3910-3916. DOI: 10.1039/B509983H

    Comprehensive review of parallel tempering methods and their applications across physics, chemistry, and optimization.

  2. Kofke, D. A. (2002). On the acceptance probability of replica-exchange Monte Carlo trials. The Journal of Chemical Physics, 117(15), 6911-6914. DOI: 10.1063/1.1507776

    Detailed analysis of replica exchange acceptance probabilities and optimization strategies.

Traveling Salesman Problem

  1. Lin, S., & Kernighan, B. W. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21(2), 498-516. DOI: 10.1287/opre.21.2.498

    Classic paper on heuristic algorithms for TSP, including the 2-opt and 3-opt methods used in simulated annealing.

  2. Johnson, D. S., & McGeoch, L. A. (1997). The traveling salesman problem: A case study in local optimization. In Local Search in Combinatorial Optimization (pp. 215-310). Princeton University Press.

    Comprehensive study of local search methods for TSP, including simulated annealing and other metaheuristics.

Bayesian Inference and MC

  1. Gelfand, A. E., & Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410), 398-409. DOI: 10.1080/01621459.1990.10476213

    Early work establishing MC methods for Bayesian inference, showing their practical utility.

  2. Neal, R. M. (2011). MC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 2.

    Comprehensive introduction to Hamiltonian Monte Carlo (HMC), a powerful MC method combining molecular dynamics with Monte Carlo.

Modern Perspectives

  1. Mezard, M., & Montanari, A. (2009). Information, Physics, and Computation. Oxford University Press.

    Comprehensive textbook connecting information theory, statistical physics (including spin glasses), and computer science, with extensive coverage of MC methods.

  2. Newman, M. E., & Barkema, G. T. (1999). Monte Carlo Methods in Statistical Physics. Oxford University Press.

    Classic textbook on Monte Carlo methods in statistical physics, including detailed treatment of spin glass simulations.

  3. MacKay, D. J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.

    Excellent introduction to information theory, Bayesian inference, and machine learning, with clear explanations of MC methods and their connections to statistical physics.

Explore Categories

Related Content