Jethro's Braindump

Monte Carlo Methods

tags
Machine Learning Algorithms, Probabilistic Graph Models

Monte Carlo methods make use of random numbers to solve the following problems:

  1. Generating samples \(\left\{x^{( r)}\right\}^{R}_{r=1}\) from a given probability distribution \(P(x)\).
  2. Estimate expectation of functions under this distribution:

\begin{equation} \Phi = \langle \phi(x) \rangle = \int d^N P(x) \phi(x) \end{equation}

This probability distribution is called the target density. The target density is often the posterior of a model’s parameters, given observed data.

If we solve the first problem of sampling, then these samples can be used to solve the second problem via the Monte Carlo estimator:

\begin{equation} \hat{\phi} = \frac{1}{R}\sum_{r} \phi(\mathbf{x}^{( r)}) \end{equation}

If the samples are generated from \(P(x)\), then the expectation of \(\hat{\phi}\) is the same as the expectation of \(\phi\). The variance of \(\hat{\phi}\) decreases as \(\sigma^2/R\), where \(\sigma^2\) is the variance of \(\phi\). This is so important that it is restated here:

The accuracy of the Monte Carlo estimate is dependent only on the variance of \(\phi\), and not on the dimensionality of the space sampled.

Why is sampling hard?

Suppose we can evaluate \(P(x)\) up to a multiplicative constant \(Z\): $P^*(x) = \(P(x) Z\). To generate samples from \(P(x)\), we need to know the normalizing constant \(Z\). Even if we knew \(Z\), there is no obvious way to sample without enumerating most or all of the possible states.