Monte Carlo Methods

tags: Machine Learning Algorithms, Probabilistic Graph Models

Monte Carlo methods make use of random numbers to solve the following problems:

Generating samples $\{ x^{(r)} \}_{r = 1}^{R}$ from a given probability distribution $P (x)$.
Estimating expectations of functions under this distribution:

$Φ = ⟨ ϕ (x) ⟩ = \int d^{N} x \, P (x) ϕ (x)$

The distribution $P (x)$ is called the target density. The target density is often the posterior of a model’s parameters, given observed data.

If we solve the first problem of sampling, then these samples can be used to solve the second problem via the Monte Carlo estimator:

$\hat{Φ} = \frac{1}{R} \sum_{r} ϕ (x^{(r)})$

If the samples are generated from $P (x)$, then the expectation of $\hat{Φ}$ is $Φ$, so the estimator is unbiased. The variance of $\hat{Φ}$ decreases as $σ^{2} / R$, where $σ^{2}$ is the variance of $ϕ$ under $P (x)$. This is so important that it is restated here:

For a given number of samples $R$, the accuracy of the Monte Carlo estimate depends only on the variance of $ϕ$, and not on the dimensionality of the space sampled.
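
The $σ^{2} / R$ scaling follows in one line, assuming the samples are independent draws from $P (x)$ (an assumption not stated above; correlated samples, such as those from a Markov chain, typically reduce the variance more slowly):

$\mathrm{Var} \left[ \frac{1}{R} \sum_{r} ϕ (x^{(r)}) \right] = \frac{1}{R^{2}} \sum_{r} \mathrm{Var} \left[ ϕ (x^{(r)}) \right] = \frac{σ^{2}}{R}$

A minimal numerical sketch of the same point, under assumptions chosen only for illustration: the target is a standard normal in `n_dims` dimensions (so it can be sampled directly with NumPy), and the test function is $ϕ (x) = x_{1}^{2}$, whose variance is 2 regardless of the dimension. The helper names `mc_estimate` and `estimator_std` are hypothetical.

```python
import numpy as np

def mc_estimate(phi, samples):
    """Monte Carlo estimate: the average of phi over the samples."""
    return np.mean(phi(samples))

def estimator_std(n_dims, n_samples, n_repeats, rng):
    """Empirical standard deviation of the estimate over repeated runs."""
    # Assumed test function: phi(x) = x_1^2, true expectation 1, variance 2.
    phi = lambda x: x[:, 0] ** 2
    estimates = [
        mc_estimate(phi, rng.standard_normal((n_samples, n_dims)))
        for _ in range(n_repeats)
    ]
    return float(np.std(estimates))

rng = np.random.default_rng(0)
R = 1000
for n_dims in (1, 100):
    # The spread of the estimate should be about the same in both cases.
    print(n_dims, estimator_std(n_dims, R, 200, rng))

# Theoretical value sqrt(sigma^2 / R) with sigma^2 = 2.
print(np.sqrt(2.0 / R))
```

The two printed spreads should both land close to the theoretical value, since only the variance of $ϕ$ and the number of samples enter the formula.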

Why is sampling hard?

Suppose we can evaluate $P (x)$ only up to a multiplicative constant $Z$: $P (x) = P^{*} (x) / Z$. Sampling from $P (x)$ is hard for two reasons. First, we typically do not know the normalizing constant $Z = \int d^{N} x \, P^{*} (x)$, and computing it is itself a high-dimensional integral. Second, even if we knew $Z$, there is no obvious way to sample without enumerating most or all of the possible states.
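
To make the enumeration cost concrete, here is a rough sketch that samples a one-dimensional density by brute force: evaluate the unnormalized density at every point of a grid, normalize, and draw grid points. The density `p_star`, the grid range, and the grid size `k` are assumptions picked for illustration, and `sample_by_enumeration` is a hypothetical helper, not a standard routine.

```python
import numpy as np

def p_star(x):
    """Unnormalized density, chosen only for illustration."""
    return np.exp(0.4 * (x - 0.4) ** 2 - 0.08 * x ** 4)

def sample_by_enumeration(n_samples, k=50, lo=-4.0, hi=4.0, rng=None):
    """Evaluate p_star at every grid point, normalize, then sample grid points."""
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(lo, hi, k)
    weights = p_star(grid)   # one evaluation per state
    z = weights.sum()        # normalizer of the discretized distribution
    probs = weights / z
    return rng.choice(grid, size=n_samples, p=probs), z

samples, z = sample_by_enumeration(5, rng=np.random.default_rng(0))
print(samples, z)

# The same scheme on an N-dimensional grid needs k**N evaluations of p_star.
for n_dims in (1, 10, 100):
    print(n_dims, 50 ** n_dims)
```

With 50 grid points per axis the approach is trivial in one dimension, but the number of evaluations grows as $50^{N}$, which is why enumeration stops being an option in the high-dimensional spaces where Monte Carlo methods are usually needed.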

Jethro's Braindump
