PAC Learning Linear Thresholds from Label Proportions

Anand Brahmbhatt
Proc. NeurIPS 2023 (to appear)


Learning from label proportions (LLP) is a generalization of supervised learning in which the training data is available as sets or bags of feature-vectors (instances) along with the average instance-label of each bag. The goal is to train a good instance classifier. While most previous works in LLP have focused on training models on such training data, computational learnability in LLP only recently been explored by [Saket21,Saket22], who showed worst case intractability of properly learning linear threshold functions (LTFs) from label proportions while not ruling out efficient algorithms for this problem under distributional assumptions. In this work we show that it is indeed possible to efficiently learn LTFs using LTFs when given access to random bags of some label proportion in which feature-vectors are independently sampled from a fixed Gaussian distribution N(mu, Sigma), conditioned on the label assigned by the target LTF. Our method estimates a matrix by sampling pairs of feature-vector from the bags with and without replacement, and we prove that the principal component of this matrix necessarily yields the normal vector of the LTF. For some special cases with N(0, I) we provide a simpler expectation based algorithm. We include an experimental evaluation of our learning algorithms along with a comparison of with those of [Saket21, Saket22] and random LTFs, demonstrating the effectiveness of our techniques.

Research Areas