Algorithms and Hardness for Learning Linear Thresholds from Label Proportions

Proc. NeurIPS'22 (2022)
Google Scholar

Abstract

We study the learnability of linear threshold functions (LTFs) in the learning from label proportions (LLP) framework. In this, the feature-vector classifier is learnt from bags of feature-vectors and their corresponding observed label proportions which are satisfied by (i.e., consistent with) some unknown LTF. This problem has been investigated in recent work (Saket21) which gave an algorithm to produce an LTF that satisfies at least (2/5)-fraction of a satisfiable collection of bags, each of size <= 2, by solving and rounding a natural SDP relaxation. However, this SDP relaxation is specific to at most 2-sized bags and does not apply to bags of larger size.

In this work we provide a fairly non-trivial SDP relaxation of a non-quadratic formulation for bags of size 3. We analyze its rounding procedure using novel matrix decomposition techniques to obtain an algorithm which outputs an LTF satisfying at least (1/12)-fraction of the bags of size <= 3. We also apply our techniques to bags of size q >= 4 to provide a Omega(1/q)-approximation guarantee for a weaker notion of satisfiability. We include comparative experiments on simulated data demonstrating the applicability of our algorithmic techniques.

From the complexity side we provide a hardness reduction to produce instances with bags of any constant size q. Our reduction proves the NP-hardness of satisfying more than 1/q + o(1) fraction of a satisfiable collection of such bags using as hypothesis any function of constantly many LTFs, showing thereby that the problem is harder to approximate as the bag size q increases. Using a strengthened analysis, for q=2 we obtain a (4/9) + o(1) hardness factor for this problem, improving upon the (1/2) + o(1) factor shown by Saket21.