Energy-based Models

The energy surface is a “contrast function” that takes low values on the data manifold and higher values everywhere else. Training therefore aims to assign low energy to the observed data and high energy to all other points.

It is easy to make the energy low on seen samples; the difficult part is making it high on unseen ones. Without some mechanism for the latter, the energy surface can collapse and become low everywhere.

We can also reinterpret Principal Component Analysis (PCA) and K-means as energy-based models.
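As a minimal sketch of this reinterpretation, the snippet below takes the K-means energy to be the squared distance to the nearest prototype, and the PCA energy to be the squared reconstruction error after projecting onto the principal subspace. The function names, the toy data, and the hand-picked prototypes are assumptions made for illustration; they are not taken from the notes.

```python
import numpy as np

def kmeans_energy(y, prototypes):
    """K-means energy: squared distance from y to its nearest prototype."""
    sq_dists = np.sum((prototypes - y) ** 2, axis=1)
    return float(np.min(sq_dists))

def pca_energy(y, components, mean):
    """PCA energy: squared reconstruction error of y.

    `components` is assumed to hold orthonormal rows, shape (k, d).
    """
    centered = y - mean
    reconstruction = components.T @ (components @ centered)
    return float(np.sum((centered - reconstruction) ** 2))

# Toy data lying on the line y2 = y1 (hypothetical, for illustration only).
data = np.array([[-2.0, -2.0], [-1.0, -1.0], [1.0, 1.0], [2.0, 2.0]])
mean = data.mean(axis=0)

# Principal directions from the SVD of the centered data; keep the top one.
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
components = vt[:1]

# Hand-picked prototypes (not fitted by running K-means).
prototypes = np.array([[-1.5, -1.5], [1.5, 1.5]])

on_manifold = np.array([3.0, 3.0])    # on the line spanned by the data
off_manifold = np.array([1.0, -1.0])  # orthogonal to that line

print("PCA energy on / off manifold:",
      pca_energy(on_manifold, components, mean),
      pca_energy(off_manifold, components, mean))
print("K-means energy at a prototype / off manifold:",
      kmeans_energy(prototypes[0], prototypes),
      kmeans_energy(off_manifold, prototypes))
```

In both cases the set of zero-energy points is fixed by the architecture: a k-dimensional subspace for PCA and K isolated points for K-means. The energy therefore rises away from that set without any explicit push-up term.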

Strategies to Shape the Energy Function

Build the machine so that the volume of the low-energy region is constant:
- PCA, K-means, GMM, square ICA
Push down the energy of data points, push up everywhere else
- Maximum likelihood (requires tractable partition function, or variational approximation)
Push down the energy of data points, push up chosen locations
- Contrastive divergence, Ratio Matching, Noise Contrastive Estimation, Minimum Probability Flow, Adversarial Generator/GANs
Minimize the gradient and maximum curvature around data points
- Score matching
If \(F(Y) = \lVert Y - G(Y) \rVert^{2}\), then make \(G(Y)\) as constant as possible
- Contractive auto-encoder, saturating auto-encoder
Train a dynamical system so that its dynamics converge to the data manifold
- Denoising auto-encoder, masked auto-encoder
Use a regularizer that limits the volume of space that has low energy
- Sparse coding, Sparse auto-encoder, LISTA & PSD, Variational auto-encoder
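
To make the "push down the energy of data points, push up everywhere else" strategy concrete, here is a short derivation for the maximum-likelihood case. It assumes the energy is turned into a probability through a Gibbs distribution; the inverse temperature \(\beta\), the parameter vector \(w\), and the step size \(\eta\) are notation introduced for this sketch and do not appear in the list above.

\[
P_w(Y) = \frac{e^{-\beta F_w(Y)}}{\int e^{-\beta F_w(y')}\,dy'},
\qquad
\mathcal{L}(Y, w) = -\frac{1}{\beta}\log P_w(Y) = F_w(Y) + \frac{1}{\beta}\log \int e^{-\beta F_w(y')}\,dy'
\]

\[
\frac{\partial \mathcal{L}(Y, w)}{\partial w}
= \frac{\partial F_w(Y)}{\partial w}
- \int P_w(y')\,\frac{\partial F_w(y')}{\partial w}\,dy'
\]

A gradient step \(w \leftarrow w - \eta\,\partial \mathcal{L}/\partial w\) therefore lowers the energy at the observed point \(Y\) (first term) and raises it at all points \(y'\), weighted by their current probability under the model (second term). The integral in the second term is the part that requires a tractable partition function or an approximation. One way to read the "push up chosen locations" methods is that they replace this integral with a small set of selected points.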