Discrete Chebyshev Classifiers
Abstract
In large-scale learning problems it is often easy
to collect simple statistics of the data, but hard
or impractical to store all the original data. A
key question in this setting is how to construct
classifiers based on such partial information. One
traditional approach to this problem has been to
use maximum entropy arguments to induce a
complete distribution over the variables from the
statistics. However, this approach essentially
makes conditional independence assumptions
about the distribution, and furthermore does not
optimize prediction loss. Here we present a
framework for
discriminative learning given a set of statistics.
Specifically, we address the case where all
variables are discrete and we have access to
various marginals. Our approach minimizes the
worst-case hinge loss in this setting, which upper
bounds the generalization error. We show that for
certain sets of statistics the problem is tractable,
and that in the general case it can be approximated
using MAP LP relaxations. Empirical results
show that the method is competitive with other
approaches that use the same input.
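A minimal sketch of the worst-case objective described above, assuming binary labels $y \in \{\pm 1\}$ and a linear classifier $w$ (the notation here is ours and does not appear in the abstract itself):

\[
\min_{w}\; \max_{p \,\in\, \mathcal{P}(\mu)}\; \mathbb{E}_{(x,y)\sim p}\!\left[\max\!\left(0,\; 1 - y\, w^{\top} x\right)\right],
\qquad
\mathcal{P}(\mu) \;=\; \left\{\, p \;:\; \mathbb{E}_{p}\!\left[s(x,y)\right] = \mu \,\right\},
\]

where $s(x,y)$ is assumed to stack the available marginal statistics and $\mu$ their observed values. Since the inner maximization ranges over every distribution consistent with those marginals, the true data distribution lies in $\mathcal{P}(\mu)$, and the minimized quantity therefore upper bounds the expected hinge loss under the true distribution.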