Rademacher Complexity
In PAC learning, we have shown that uniform convergence is a sufficient condition for learnability. Rademacher complexity measures the rate of uniform convergence, and it can also be used to derive generalization bounds.
Definition
Let us denote by $L_{\mathcal{D}}(f) = \mathbb{E}_{z \sim \mathcal{D}}[f(z)]$ the true risk and by $L_S(f) = \frac{1}{m}\sum_{i=1}^m f(z_i)$ the empirical risk of a function $f \in \mathcal{F}$, given a sample $S = (z_1, \ldots, z_m)$ drawn i.i.d. from $\mathcal{D}$. We define the representativeness of $S$ with respect to $\mathcal{F}$ as
$$\mathrm{Rep}_{\mathcal{D}}(\mathcal{F}, S) = \sup_{f \in \mathcal{F}} \left( L_{\mathcal{D}}(f) - L_S(f) \right).$$
Suppose we would like to estimate the representativeness of $S$ using the sample itself. One approach is to split $S$ into two disjoint halves, $S_1 = (z_1, \ldots, z_{m/2})$ and $S_2 = (z_{m/2+1}, \ldots, z_m)$, and measure
$$\sup_{f \in \mathcal{F}} \left( L_{S_1}(f) - L_{S_2}(f) \right).$$
If we define $\sigma = (\sigma_1, \ldots, \sigma_m)$ by $\sigma_i = +1$ for $i \le m/2$ and $\sigma_i = -1$ for $i > m/2$, this estimate can be rewritten as
$$\frac{2}{m} \sup_{f \in \mathcal{F}} \sum_{i=1}^m \sigma_i f(z_i).$$
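The rewriting above is a finite identity that is easy to check numerically. A minimal sketch, using a hypothetical matrix of loss values for a small finite class (rows are functions, columns are sample points):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 10                      # even sample size
# hypothetical loss matrix: rows = functions in a finite class F,
# columns = the values f(z_i) on the m sample points
A = rng.random((5, m))

# direct split-sample estimate: sup_f ( L_{S1}(f) - L_{S2}(f) )
L_S1 = A[:, : m // 2].mean(axis=1)
L_S2 = A[:, m // 2 :].mean(axis=1)
direct = (L_S1 - L_S2).max()

# the same quantity written as (2/m) * sup_f sum_i sigma_i f(z_i)
# with the fixed sign pattern sigma = (+1, ..., +1, -1, ..., -1)
sigma = np.concatenate([np.ones(m // 2), -np.ones(m // 2)])
rewritten = (2.0 / m) * (A @ sigma).max()

assert np.isclose(direct, rewritten)
```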
The Rademacher complexity measure captures this idea by considering the expectation of the above with respect to a random choice of $\sigma$. Formally, let
$$\mathcal{F} \circ S = \{ (f(z_1), \ldots, f(z_m)) : f \in \mathcal{F} \} \subseteq \mathbb{R}^m.$$
Let the variables in $\sigma = (\sigma_1, \ldots, \sigma_m)$ be i.i.d. and uniformly distributed over $\{\pm 1\}$; random variables of this form are called Rademacher variables. The Rademacher complexity of $\mathcal{F}$ with respect to $S$ is
$$R(\mathcal{F} \circ S) = \frac{1}{m} \mathbb{E}_{\sigma \sim \{\pm 1\}^m} \left[ \sup_{f \in \mathcal{F}} \sum_{i=1}^m \sigma_i f(z_i) \right].$$
More generally, given a set of vectors $A \subseteq \mathbb{R}^m$, we define
$$R(A) = \frac{1}{m} \mathbb{E}_{\sigma} \left[ \sup_{a \in A} \sum_{i=1}^m \sigma_i a_i \right].$$
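For a finite set of vectors, $R(A)$ can be approximated by Monte Carlo over random sign vectors. A minimal sketch (the function name is ours, not from the text):

```python
import numpy as np

def rademacher_complexity(A, n_draws=20000, seed=0):
    """Monte Carlo estimate of R(A) = (1/m) E_sigma sup_{a in A} <sigma, a>.

    A: (N, m) array, each row one vector a in the finite set A.
    """
    N, m = A.shape
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, m))  # Rademacher signs
    # for each sign draw, take the sup over the N vectors, then average
    return (sigma @ A.T).max(axis=1).mean() / m

# sanity check: for a singleton set the sup is just <sigma, a>,
# whose expectation is 0, so R(A) is close to 0
A = np.array([[1.0, -2.0, 3.0, 0.5]])
print(rademacher_complexity(A))
```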
The following lemma bounds the expected representativeness of $S$ by twice the expected Rademacher complexity:
$$\mathbb{E}_{S \sim \mathcal{D}^m} \left[ \mathrm{Rep}_{\mathcal{D}}(\mathcal{F}, S) \right] \le 2\, \mathbb{E}_{S \sim \mathcal{D}^m}\, R(\mathcal{F} \circ S).$$
Applied to $\mathcal{F} = \ell \circ \mathcal{H}$, this lemma yields that, in expectation, the ERM rule finds a hypothesis that is close to the optimal hypothesis in $\mathcal{H}$:
$$\mathbb{E}_S \left[ L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_S(\mathrm{ERM}_{\mathcal{H}}(S)) \right] \le 2\, \mathbb{E}_S\, R(\ell \circ \mathcal{H} \circ S).$$
Furthermore, for any $h^* \in \mathcal{H}$,
$$\mathbb{E}_S \left[ L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^*) \right] \le 2\, \mathbb{E}_S\, R(\ell \circ \mathcal{H} \circ S).$$
Furthermore, if $h^* = \operatorname{argmin}_{h \in \mathcal{H}} L_{\mathcal{D}}(h)$, then for each $\delta \in (0,1)$, with probability at least $1-\delta$ over the choice of $S$,
$$L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^*) \le \frac{2\, \mathbb{E}_{S'}\, R(\ell \circ \mathcal{H} \circ S')}{\delta}.$$
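The high-probability version is a one-line application of Markov's inequality: since $h^*$ minimizes $L_{\mathcal{D}}$ over $\mathcal{H}$, the excess risk of ERM is a nonnegative random variable, so

```latex
\begin{align*}
X &:= L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^*) \ge 0,
\qquad \mathbb{E}[X] \le 2\,\mathbb{E}_{S'}\, R(\ell \circ \mathcal{H} \circ S'), \\
\Pr\!\left[ X \ge \frac{2\,\mathbb{E}_{S'}\, R(\ell \circ \mathcal{H} \circ S')}{\delta} \right]
&\le \frac{\mathbb{E}[X]\,\delta}{2\,\mathbb{E}_{S'}\, R(\ell \circ \mathcal{H} \circ S')} \le \delta.
\end{align*}
```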
Using McDiarmid’s Inequality, we can derive bounds with better dependence on the confidence parameter:
Assume that for all $z$ and all $h \in \mathcal{H}$ we have $|\ell(h,z)| \le c$. Then:
- With probability at least $1-\delta$, for all $h \in \mathcal{H}$,
  $$L_{\mathcal{D}}(h) - L_S(h) \le 2\, \mathbb{E}_{S' \sim \mathcal{D}^m}\, R(\ell \circ \mathcal{H} \circ S') + c \sqrt{\frac{2 \ln(2/\delta)}{m}}.$$
  In particular, this holds for $h = \mathrm{ERM}_{\mathcal{H}}(S)$.
- With probability at least $1-\delta$, for all $h \in \mathcal{H}$,
  $$L_{\mathcal{D}}(h) - L_S(h) \le 2\, R(\ell \circ \mathcal{H} \circ S) + 4c \sqrt{\frac{2 \ln(4/\delta)}{m}}.$$
- For any $h^* \in \mathcal{H}$, with probability at least $1-\delta$,
  $$L_{\mathcal{D}}(\mathrm{ERM}_{\mathcal{H}}(S)) - L_{\mathcal{D}}(h^*) \le 2\, R(\ell \circ \mathcal{H} \circ S) + 5c \sqrt{\frac{2 \ln(8/\delta)}{m}}.$$
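As a quick sanity check on the magnitudes involved, the following sketch evaluates the three confidence terms for the hypothetical values $c = 1$, $m = 1000$, $\delta = 0.05$ (these numbers are ours, chosen only for illustration):

```python
import math

c, m, delta = 1.0, 1000, 0.05   # hypothetical values, for illustration only

# the tail terms of the three bounds above
term1 = c * math.sqrt(2 * math.log(2 / delta) / m)
term2 = 4 * c * math.sqrt(2 * math.log(4 / delta) / m)
term3 = 5 * c * math.sqrt(2 * math.log(8 / delta) / m)

print(f"{term1:.3f} {term2:.3f} {term3:.3f}")
```

The terms grow from the first bound to the third: the later bounds replace the (unknowable) expected Rademacher complexity with the empirical one, which is computable from the sample, at the price of a larger tail term.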
Massart’s lemma states that the Rademacher complexity of a finite set grows logarithmically with the size of the set.
Let $A = \{a_1, \ldots, a_N\}$ be a finite set of vectors in $\mathbb{R}^m$ and let $\bar{a} = \frac{1}{N} \sum_{i=1}^N a_i$. Then
$$R(A) \le \max_{a \in A} \|a - \bar{a}\|_2 \cdot \frac{\sqrt{2 \log N}}{m}.$$
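Massart's bound can be checked against a Monte Carlo estimate of $R(A)$ for a random finite set (the data here is synthetic, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 50, 20
A = rng.normal(size=(N, m))          # N random vectors in R^m

# Monte Carlo estimate of R(A)
sigma = rng.choice([-1.0, 1.0], size=(20000, m))
R_est = (sigma @ A.T).max(axis=1).mean() / m

# Massart's bound: max distance to the average, times sqrt(2 log N) / m
abar = A.mean(axis=0)
bound = np.linalg.norm(A - abar, axis=1).max() * np.sqrt(2 * np.log(N)) / m

assert R_est <= bound
```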
The contraction lemma shows that composing $A$ with Lipschitz functions cannot increase its Rademacher complexity. For each $i \in [m]$, let $\phi_i : \mathbb{R} \to \mathbb{R}$ be a $\rho$-Lipschitz function, and denote $\phi \circ A = \{ (\phi_1(a_1), \ldots, \phi_m(a_m)) : a \in A \}$. Then
$$R(\phi \circ A) \le \rho\, R(A).$$
Rademacher complexity of linear classes
We define two linear classes:
$$\mathcal{H}_1 = \{ x \mapsto \langle w, x \rangle : \|w\|_1 \le 1 \}, \qquad \mathcal{H}_2 = \{ x \mapsto \langle w, x \rangle : \|w\|_2 \le 1 \}.$$
Let $S = (x_1, \ldots, x_m)$ be a sequence of vectors in a Hilbert space. The following lemma bounds the Rademacher complexity of $\mathcal{H}_2 \circ S$:
$$R(\mathcal{H}_2 \circ S) \le \frac{\max_i \|x_i\|_2}{\sqrt{m}}.$$
Let $S = (x_1, \ldots, x_m)$ be a sequence of vectors in $\mathbb{R}^n$. Then
$$R(\mathcal{H}_1 \circ S) \le \max_i \|x_i\|_\infty \cdot \sqrt{\frac{2 \log(2n)}{m}}.$$
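For linear classes the inner supremum has a closed form, which makes the bounds easy to test: $\sup_{\|w\|_2 \le 1} \langle w, v \rangle = \|v\|_2$ and $\sup_{\|w\|_1 \le 1} \langle w, v \rangle = \|v\|_\infty$ with $v = \sum_i \sigma_i x_i$. A sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 100, 5
X = rng.normal(size=(m, n))          # sample S = (x_1, ..., x_m) in R^n

sigma = rng.choice([-1.0, 1.0], size=(20000, m))
V = sigma @ X                        # each row is sum_i sigma_i x_i

# H2: sup over the l2 ball is the l2 norm of v
R_H2 = np.linalg.norm(V, axis=1).mean() / m
bound_H2 = np.linalg.norm(X, axis=1).max() / np.sqrt(m)

# H1: sup over the l1 ball is the l-infinity norm of v
R_H1 = np.abs(V).max(axis=1).mean() / m
bound_H1 = np.abs(X).max() * np.sqrt(2 * np.log(2 * n) / m)

assert R_H2 <= bound_H2 and R_H1 <= bound_H1
```

Note that both bounds depend on the norms of the data points and on $m$, but not on the dimension $n$ (for $\mathcal{H}_2$) or only logarithmically on it (for $\mathcal{H}_1$).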