In supervised learning, we are given training data \(\mathcal{D}\) and need to learn a function \(f\) that makes predictions for all possible input values. This requires assumptions, because infinitely many functions are consistent with the training data. In general, there are two options:

- Restricting the class of functions considered
- Assigning a prior probability to every possible function

Restricting the class causes problems in both directions. If the class is too restrictive, it may contain no function that fits the data well. If it is not restrictive enough, we risk overfitting the training data.
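The trade-off can be sketched with polynomial regression, where the degree controls how restrictive the function class is. This is only an illustration under assumed settings: the sine target, the noise level, and the degrees 1, 3, and 9 are hypothetical choices, not part of the notes above.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Hypothetical ground truth (an assumption for this sketch): a sine wave.
def true_f(x):
    return np.sin(2 * np.pi * x)

# Ten noisy training observations and a dense grid for evaluation.
x_train = np.linspace(0.0, 1.0, 10)
y_train = true_f(x_train) + 0.2 * rng.standard_normal(x_train.shape)
x_test = np.linspace(0.0, 1.0, 200)

def fit_and_score(degree):
    """Least-squares fit within the class of polynomials of the given degree."""
    p = Polynomial.fit(x_train, y_train, degree)
    train_mse = float(np.mean((p(x_train) - y_train) ** 2))
    test_mse = float(np.mean((p(x_test) - true_f(x_test)) ** 2))
    return train_mse, test_mse

# Degree 1 is a very restrictive class; degree 9 can interpolate all 10 points.
results = {degree: fit_and_score(degree) for degree in (1, 3, 9)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The training error can only shrink as the class grows, since each class contains the previous one. The degree-9 fit passes through every noisy training point, which is the overfitting case; the error against the true function typically tells a different story from the training error.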

Assigning a prior probability also looks problematic, because the set of possible functions is infinite and it is not obvious how to compute with it in finite time. This is where Gaussian processes come in.

A Gaussian process is a generalization of the Gaussian probability distribution. Whereas a probability distribution describes random variables that are scalars or vectors, a stochastic process governs the properties of functions. Loosely, one can think of a function as an extremely long vector, with each entry specifying the function value \(f(x)\) at a particular input \(x\). If one asks only for the properties of the function at a finite number of points, inference in the Gaussian process gives the same answer whether the infinitely many other points are ignored or taken into account.
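The "long vector" picture and the finite-points property can be sketched numerically. The example below assumes a zero-mean prior and a squared-exponential covariance; both are illustrative choices not fixed by the text above, and `rbf_kernel` is a hypothetical helper written for this sketch.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0):
    """Squared-exponential covariance k(x, x') = exp(-(x - x')^2 / (2 l^2))."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / length_scale**2)

rng = np.random.default_rng(0)

# The "long vector" view: function values on a dense grid of inputs.
x_dense = np.linspace(-3.0, 3.0, 61)
cov_dense = rbf_kernel(x_dense, x_dense)

# Draw three functions from the zero-mean prior, evaluated on the grid.
# A small diagonal term keeps the covariance numerically positive definite.
jitter = 1e-6 * np.eye(x_dense.size)
samples = rng.multivariate_normal(
    np.zeros(x_dense.size), cov_dense + jitter, size=3
)

# Ask only about three points. Their joint covariance is the matching
# sub-block of the dense covariance, so the other grid points can be ignored.
idx = np.array([0, 30, 60])
cov_from_dense = cov_dense[np.ix_(idx, idx)]
cov_direct = rbf_kernel(x_dense[idx], x_dense[idx])
print(np.allclose(cov_from_dense, cov_direct))
```

Each row of `samples` is one function drawn from the prior, seen at 61 inputs. The final comparison checks the consistency property: marginalizing a Gaussian over the unqueried points just drops the corresponding rows and columns of the covariance matrix.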

## TODO Gaussian Process, not quite for dummies - Yuge Shi

## References

- Rasmussen, C. E. and Williams, C. K. I. (2006). *Gaussian Processes for Machine Learning*. MIT Press.