Abstract

Complex classifiers may exhibit ``embarassing'' failures in cases that would be easily classified and justified by a human. Avoiding such failures is obviously paramount, particularly in domains where we cannot accept such unexplained behavior. In this work we focus on one such setting, where a label is perfectly predictable if the input contains certain features, and otherwise, it is predictable by a linear classifier. We define a related hypothesis class and determine its sample complexity.
We also give evidence that efficient algorithms cannot, unfortunately, enjoy this sample complexity. We then derive a simple and efficient algorithm, and also give evidence that its sample complexity is optimal, among efficient algorithms. Experiments on sentiment analysis demonstrate the efficacy of the method, both in terms of accuracy and interpretability.