Jethro's Braindump

Generalisation Error

We can define generalisation error as the discrepancy between \(E_in\) and \(E_out\). The Hoeffding Inequality characterises the generalization error with a probabilistic bound:

\begin{align} P[|E_{in}(g) - E_{out}(g)| > \epsilon] \le 2Me^{-2\epsilon^2N} \end{align}

Pick a tolerance level \(\delta\), and assert with probability \(1-\delta\) that

\begin{align} E_{out}(g) \le E_{in}(g) + \sqrt{\frac{1}{2N}\ln \frac{2M}{\delta}} \end{align}

Notice the error bound depends on \(M\), the size of the hypothesis set \(H\). Most learning models have infinite \(H\), including the simple perceptron. Hence, to study generalisation in such models, we need to derive a counterpart that deals with infinite \(H\).

Notice that the \(M\) factor was obtained by taking the disjunction of events. Let \(B_m\) be the bad event that \(|E_{in}(h_m) - E_{out}(h_m)| > \epsilon\). Notice that these bad events are often strongly overlapping, and the disjunction of these events form a much smaller area.

The mathematical theory of generalisation hinges on this observation. Upon accounting for the overlaps of different hypotheses, we will be able to replace the number of hypotheses \(M\) with an effective finite number, even while \(M\) is infinite.

Links to this note