The Bias-Complexity Tradeoff
Training data can mislead the learner and result in overfitting. To overcome this problem, we can restrict the search space to some hypothesis class $\mathcal{H}$; this restriction can be viewed as prior knowledge the learner has about the task.
The No-Free-Lunch Theorem
The No-Free-Lunch theorem states that for binary classification tasks, for every learner there exists a distribution on which it fails. We say that the learner fails if, upon receiving i.i.d. examples from that distribution, its output hypothesis is likely to have a large risk, whereas for the same distribution, there exists another learner that will output a hypothesis with a small risk. In other words, every learner has tasks on which it fails while others succeed.
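For reference, here is one standard formal statement of the theorem; the constants $1/7$ and $1/8$ follow the version in Shalev-Shwartz and Ben-David's Understanding Machine Learning (Thm. 5.1).

```latex
% One standard formulation of the No-Free-Lunch theorem
% (constants follow Shalev-Shwartz & Ben-David, Thm. 5.1).
\textbf{Theorem (No-Free-Lunch).}
Let $A$ be any learning algorithm for binary classification with respect to the
$0\text{--}1$ loss over a domain $\mathcal{X}$, and let $m < |\mathcal{X}|/2$ be a
training-set size. Then there exists a distribution $\mathcal{D}$ over
$\mathcal{X} \times \{0,1\}$ such that:
\begin{enumerate}
  \item there exists a function $f : \mathcal{X} \to \{0,1\}$ with $L_{\mathcal{D}}(f) = 0$;
  \item with probability at least $1/7$ over the choice of $S \sim \mathcal{D}^m$,
        we have $L_{\mathcal{D}}(A(S)) \geq 1/8$.
\end{enumerate}
```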
Therefore, when approaching a particular learning problem, defined by some distribution $\mathcal{D}$, we should have some prior knowledge about $\mathcal{D}$ and use it to choose a hypothesis class $\mathcal{H}$ that is likely to contain a low-risk predictor; there is no universal learner that succeeds on every distribution.
Error Decomposition
We can decompose the error of an $ERM_{\mathcal{H}}$ predictor into two components. Let $h_S$ be a hypothesis returned by $ERM_{\mathcal{H}}$ on a training set $S$. Then we can write

$$L_{\mathcal{D}}(h_S) = \epsilon_{\mathrm{app}} + \epsilon_{\mathrm{est}}, \quad \text{where} \quad \epsilon_{\mathrm{app}} = \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \quad \text{and} \quad \epsilon_{\mathrm{est}} = L_{\mathcal{D}}(h_S) - \epsilon_{\mathrm{app}}.$$
The approximation error $\epsilon_{\mathrm{app}}$ is the minimum risk achievable by a predictor in the hypothesis class. It measures how much risk we incur because we restrict ourselves to a specific class, i.e., how much inductive bias we have. The approximation error does not depend on the sample size; it is determined solely by the hypothesis class chosen.
Under the realizability assumption, the approximation error is zero. In the agnostic case, however, the approximation error can be large; it is always at least the error of the Bayes optimal predictor.
The estimation error $\epsilon_{\mathrm{est}}$ is the difference between the risk achieved by the ERM predictor and the approximation error. It arises because the empirical risk (training error) is only an estimate of the true risk, so the predictor minimizing the empirical risk is only an estimate of the best predictor in the class. The quality of this estimate depends on the training set size and on the size, or complexity, of the hypothesis class.
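As an illustration (not from the source text), the following sketch estimates the estimation error of ERM over an assumed finite class of threshold classifiers $h_t(x) = \mathbb{1}[x \ge t]$. The data distribution is also assumed: inputs uniform on $[0,1]$, a true threshold at $0.3$, and 10% label noise, so the true risk of each $h_t$ has a closed form and the gap $L_{\mathcal{D}}(h_S) - \epsilon_{\mathrm{app}}$ can be computed exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

THETA, NOISE = 0.3, 0.1               # assumed true threshold and label-noise rate
GRID = np.linspace(0.0, 1.0, 101)     # finite class of thresholds h_t(x) = 1[x >= t]

def true_risk(t):
    # closed-form risk of h_t under this particular (assumed) distribution
    return NOISE + (1 - 2 * NOISE) * abs(t - THETA)

def sample(m):
    x = rng.uniform(0.0, 1.0, m)
    y = (x >= THETA).astype(int)
    flip = rng.uniform(size=m) < NOISE
    return x, np.where(flip, 1 - y, y)

eps_app = min(true_risk(t) for t in GRID)   # approximation error (= Bayes error here)

for m in (10, 100, 1000, 10000):
    x, y = sample(m)
    # ERM: pick the threshold with the smallest empirical (training) error
    emp = [np.mean((x >= t).astype(int) != y) for t in GRID]
    t_hat = GRID[int(np.argmin(emp))]
    eps_est = true_risk(t_hat) - eps_app    # estimation error of the ERM predictor
    print(f"m={m:6d}  ERM threshold={t_hat:.2f}  estimation error={eps_est:.3f}")
```

With the class held fixed, the printed estimation error shrinks as the sample size $m$ grows, since the empirical risks become better estimates of the true risks.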
Since our goal is to minimize the total error, we face a tradeoff, called the bias-complexity tradeoff. Choosing a very rich hypothesis class decreases the approximation error but may increase the estimation error, since a rich class is more prone to overfitting. Choosing a small hypothesis class reduces the estimation error but may increase the approximation error, which can lead to underfitting.
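To make the tradeoff concrete, here is a minimal simulation sketch (not taken from the source text). It assumes a synthetic binary classification problem on $[0,1]$: alternating bands with 10% label noise, so the Bayes error is $0.1$. The nested hypothesis classes $H_k$ consist of classifiers that are constant on $k$ equal-width bins, and ERM over $H_k$ simply predicts the majority training label in each bin; all names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

NOISE = 0.1

def target(x):
    # assumed target concept: alternating bands on [0, 1]
    return (np.sin(4 * np.pi * x) > 0).astype(int)

def sample(m):
    x = rng.uniform(0.0, 1.0, m)
    y = target(x)
    flip = rng.uniform(size=m) < NOISE
    return x, np.where(flip, 1 - y, y)

x_train, y_train = sample(200)
x_test, y_test = sample(100000)          # large test set approximates the true risk

print("bins   train error   test error")
for k in (1, 2, 4, 8, 16, 64, 200):
    # H_k: classifiers constant on each of k equal-width bins;
    # ERM predicts the majority training label inside every bin.
    bins_train = np.minimum((x_train * k).astype(int), k - 1)
    bins_test = np.minimum((x_test * k).astype(int), k - 1)
    majority = np.zeros(k, dtype=int)
    for b in range(k):
        labels = y_train[bins_train == b]
        if labels.size:                   # empty bins default to label 0
            majority[b] = int(labels.mean() >= 0.5)
    train_err = np.mean(majority[bins_train] != y_train)
    test_err = np.mean(majority[bins_test] != y_test)
    print(f"{k:4d}   {train_err:11.3f}  {test_err:11.3f}")
```

With a fixed training set, very small $k$ leaves the test error far above the Bayes error of $0.1$ (large approximation error), while very large $k$ drives the training error toward zero yet inflates the test error (large estimation error); intermediate values of $k$ come closest to the Bayes error.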