Generalisation Error
We can define the generalisation error as the discrepancy between \(E_{in}\) and \(E_{out}\). The Hoeffding Inequality characterises the generalisation error with a probabilistic bound:
\begin{align} P[|E_{in}(g) - E_{out}(g)| > \epsilon] \le 2Me^{-2\epsilon^2N} \end{align}
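To get a feel for the bound, here is a quick evaluation with illustrative numbers (chosen for this example, not taken from the text): \(M = 1000\), \(N = 10000\) and \(\epsilon = 0.05\).
\begin{align} P[|E_{in}(g) - E_{out}(g)| > 0.05] \le 2 \cdot 1000 \cdot e^{-2(0.05)^2(10000)} = 2000\, e^{-50} \approx 3.9 \times 10^{-19} \end{align}
With this much data, even a union over a thousand hypotheses leaves only a vanishingly small probability of a large discrepancy; the bound becomes problematic only when \(M\) grows without limit, which is the case treated below.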
Pick a tolerance level \(\delta\); then with probability at least \(1-\delta\) we can assert that
\begin{align} E_{out}(g) \le E_{in}(g) + \sqrt{\frac{1}{2N}\ln \frac{2M}{\delta}} \end{align}
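This follows from the inequality above by equating the right-hand side with \(\delta\) and solving for \(\epsilon\):
\begin{align} \delta = 2Me^{-2\epsilon^2N} \quad\Longrightarrow\quad \epsilon = \sqrt{\frac{1}{2N}\ln \frac{2M}{\delta}} \end{align}
With probability at least \(1-\delta\) the bad event does not occur, i.e. \(|E_{in}(g) - E_{out}(g)| \le \epsilon\), which in particular gives \(E_{out}(g) \le E_{in}(g) + \epsilon\).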
Notice that the error bound depends on \(M\), the size of the hypothesis set \(H\). Most learning models have an infinite \(H\), including the simple perceptron. Hence, to study generalisation in such models, we need a counterpart of the bound that can handle an infinite \(H\).
Notice that the \(M\) factor was obtained by taking the disjunction of the bad events and bounding its probability by the sum of their individual probabilities (the union bound). Let \(B_m\) be the bad event that \(|E_{in}(h_m) - E_{out}(h_m)| > \epsilon\). These bad events are often strongly overlapping, so their union covers a much smaller area than the sum of their probabilities suggests, and the factor \(M\) grossly overcounts.
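In symbols, since the final hypothesis \(g\) is one of \(h_1, \ldots, h_M\), the event \(|E_{in}(g) - E_{out}(g)| > \epsilon\) implies the disjunction \(B_1 \text{ or } B_2 \text{ or } \cdots \text{ or } B_M\), and the union bound gives
\begin{align} P[B_1 \text{ or } B_2 \text{ or } \cdots \text{ or } B_M] \le \sum_{m=1}^{M} P[B_m] \le 2Me^{-2\epsilon^2N}, \end{align}
where the single-hypothesis Hoeffding bound \(P[B_m] \le 2e^{-2\epsilon^2N}\) is applied to each term. The first inequality is tight only when the \(B_m\) are disjoint; for strongly overlapping events it can be extremely loose.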
The mathematical theory of generalisation hinges on this observation. Once we account for the overlaps between the bad events of different hypotheses, we will be able to replace the number of hypotheses \(M\) with an effective finite number, even when \(M\) is infinite.
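To make the overlap concrete, here is a minimal simulation sketch (the threshold hypothesis set and all parameter values are assumptions chosen for illustration, not from the text). It uses thresholds \(h_a(x) = \mathrm{sign}(x - a)\) on \(x \sim \mathrm{Uniform}[-1, 1]\) with target \(f(x) = \mathrm{sign}(x)\), estimates the probability that any hypothesis suffers a discrepancy larger than \(\epsilon\), and compares it with the union bound \(2Me^{-2\epsilon^2N}\).

```python
# Illustrative sketch: how loose the union bound is for overlapping hypotheses.
import numpy as np

rng = np.random.default_rng(0)

N = 100        # sample size
M = 100        # number of threshold hypotheses
eps = 0.1      # tolerance
trials = 2000  # Monte Carlo repetitions

# Hypothesis set: h_a(x) = sign(x - a) for thresholds a on a grid.
thresholds = np.linspace(-0.5, 0.5, M)
# Exact out-of-sample error of h_a against f(x) = sign(x) when
# x ~ Uniform[-1, 1]: the misclassified region lies between 0 and a.
E_out = np.abs(thresholds) / 2.0

any_bad = 0
for _ in range(trials):
    x = rng.uniform(-1.0, 1.0, size=N)
    y = np.sign(x)                                     # noiseless labels
    preds = np.sign(x[None, :] - thresholds[:, None])  # shape (M, N)
    E_in = (preds != y[None, :]).mean(axis=1)          # in-sample error of each h_a
    any_bad += np.any(np.abs(E_in - E_out) > eps)      # did any bad event occur?

print("simulated P[some bad event]    :", any_bad / trials)
print("union bound 2*M*exp(-2*eps^2*N):", 2 * M * np.exp(-2 * eps**2 * N))
print("single-h bound 2*exp(-2*eps^2*N):", 2 * np.exp(-2 * eps**2 * N))
```

Because nearby thresholds make nearly identical predictions, their bad events almost coincide; the simulated probability typically comes out far below the union bound, which with these parameters even exceeds 1 and is therefore vacuous.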