Boosting Simple Learners
Abstract
We study boosting algorithms under the assumption that the given weak learner outputs hypotheses from a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are “rules-of-thumb” from an “easy-to-learn class”. Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions:
(i) Oracle Complexity: we show that this restriction allows us to circumvent a lower bound due to Freund and Schapire (‘95, ‘12) on the number of calls to the weak learner. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, the boosting procedure we propose circumvents the lower bound by using more complex aggregation rules, and we show that this is necessary.
(ii) Expressivity: Which tasks can be learned by boosting a base class of bounded VC dimension? Can concepts that are “far away” from the class be learned? Toward answering this question, we identify a combinatorial-geometric parameter that quantifies the expressivity of a base class when it is used within a boosting procedure. Along the way, we establish and exploit connections with Discrepancy Theory.
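For context, the conventional aggregation rule that item (i) contrasts with is a weighted majority vote over the weak hypotheses, as in classical boosting algorithms such as AdaBoost. The sketch below illustrates only this standard rule; the array names `weak_predictions` and `alphas` are illustrative placeholders, and this is not the more complex aggregation procedure proposed in the paper.

```python
import numpy as np

def weighted_majority_vote(weak_predictions, alphas):
    """Aggregate weak hypotheses by a weighted majority vote (AdaBoost-style).

    weak_predictions: array of shape (T, n), entry [t, i] is h_t(x_i) in {-1, +1}
    alphas:           array of shape (T,), the vote weight of each weak hypothesis
    returns:          array of shape (n,) with the aggregated predictions in {-1, +1}
    """
    # Weighted sum of the T weak predictions for each of the n examples.
    scores = np.tensordot(alphas, weak_predictions, axes=1)  # shape (n,)
    # Break ties toward +1 so the output is always in {-1, +1}.
    return np.where(scores >= 0, 1, -1)
```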