# Rishi Saket

## Authored Publications


Domain-Agnostic Contrastive Representations for Learning from Label Proportions

Jay Nandy, Jatin Chauhan, Balaraman Ravindran

Proc. CIKM 2022

We study the weak-supervision problem of Learning from Label Proportions (LLP), where the goal is to learn an instance-level classifier using only the proportions of the class labels in each bag, i.e., a collection of input instances that can often be highly correlated. While representation learning has been found effective for weakly supervised tasks, existing methods often require domain knowledge. To the best of our knowledge, representation learning for tabular data (unstructured data containing both continuous and categorical features) has not been studied in this setting. In this paper, we propose to learn diverse representations of instances within the same bag so as to effectively utilize the weak bag-level supervision. We propose a domain-agnostic LLP method, "Self Contrastive Representation Learning for LLP" (SelfCLR-LLP), which incorporates a novel self-contrastive function as an auxiliary loss to learn representations on tabular data for LLP. We show that diverse representations of instances within the same bag aid efficient usage of the weak bag-level LLP supervision, and we demonstrate the effectiveness of SelfCLR-LLP through extensive experiments on real-world LLP datasets from e-commerce applications.
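As a rough illustration of the core idea, an auxiliary loss that pushes apart the representations of instances sharing a bag might look like the following numpy sketch. This is a hypothetical stand-in with illustrative names, not the paper's actual SelfCLR-LLP objective:

```python
import numpy as np

def self_contrastive_loss(embeddings, bag_ids, temperature=0.5):
    """Illustrative auxiliary loss: penalize high cosine similarity between
    instances that belong to the same bag, encouraging diverse within-bag
    representations. (Hypothetical sketch, not the paper's exact loss.)"""
    # Normalize embeddings to unit length so dot products are cosine similarities.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    # Mask selecting pairs of distinct instances from the same bag.
    same_bag = bag_ids[:, None] == bag_ids[None, :]
    np.fill_diagonal(same_bag, False)
    if not same_bag.any():
        return 0.0
    # Larger when same-bag representations are similar; minimizing it
    # pushes them apart.
    return float(np.mean(np.exp(sim[same_bag])))
```

Minimizing such a term alongside the usual bag-proportion loss would be one way to realize the "diverse within-bag representations" idea described above.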

Algorithms and Hardness for Learning Linear Thresholds from Label Proportions

Proc. NeurIPS'22 (2022)

We study the learnability of linear threshold functions (LTFs) in the learning from label proportions (LLP) framework. In this, the feature-vector classifier is learnt from bags of feature-vectors and their corresponding observed label proportions which are satisfied by (i.e., consistent with) some unknown LTF. This problem has been investigated in recent work (Saket21) which gave an algorithm to produce an LTF that satisfies at least (2/5)-fraction of a satisfiable collection of bags, each of size <= 2, by solving and rounding a natural SDP relaxation. However, this SDP relaxation is specific to at most 2-sized bags and does not apply to bags of larger size.
In this work we provide a fairly non-trivial SDP relaxation of a non-quadratic formulation for bags of size 3. We analyze its rounding procedure using novel matrix decomposition techniques to obtain an algorithm which outputs an LTF satisfying at least a (1/12)-fraction of the bags of size <= 3. We also apply our techniques to bags of size q >= 4 to provide an Ω(1/q)-approximation guarantee for a weaker notion of satisfiability. We include comparative experiments on simulated data demonstrating the applicability of our algorithmic techniques.
From the complexity side we provide a hardness reduction producing instances with bags of any constant size q. Our reduction proves the NP-hardness of satisfying more than a 1/q + o(1) fraction of a satisfiable collection of such bags using as hypothesis any function of constantly many LTFs, thereby showing that the problem becomes harder to approximate as the bag size q increases. Using a strengthened analysis, for q = 2 we obtain a (4/9) + o(1) hardness factor for this problem, improving upon the (1/2) + o(1) factor shown by Saket21.
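To make the notion of bag satisfiability used above concrete: a bag is satisfied by an LTF when the classifier's predictions reproduce the bag's observed label proportion. The following small sketch (hypothetical helper names, not code from the paper) checks this for the LTF sign(w·x − b):

```python
import numpy as np

def bag_satisfied(w, b, X_bag, pos_count):
    """A bag is satisfied by the LTF sign(w.x - b) when the classifier's
    predictions reproduce the bag's observed number of positive labels."""
    preds = (X_bag @ w - b > 0).astype(int)
    return preds.sum() == pos_count

def satisfied_fraction(w, b, bags, pos_counts):
    """Fraction of bags whose observed label proportions are consistent
    with the given LTF -- the quantity the approximation guarantees bound."""
    return float(np.mean([bag_satisfied(w, b, X, c)
                          for X, c in zip(bags, pos_counts)]))
```

The algorithmic results above guarantee lower bounds on this fraction (e.g., at least 1/12 for bags of size <= 3), while the hardness results upper-bound what any efficient algorithm can achieve.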

Learnability of Linear Thresholds from Label Proportions

Proc. NeurIPS'21 (2021)

We study the problem of properly learning linear threshold functions (LTFs) in the learning from label proportions (LLP) framework. In this, the learning is on a collection of bags of feature-vectors with only the proportion of labels available for each bag.
First, we provide an algorithm that, given a collection of such bags each of size at most two whose label proportions are consistent with (i.e., the bags are satisfied by) an unknown LTF, efficiently produces an LTF that satisfies at least (2/5)-fraction of the bags. If all the bags are non-monochromatic (i.e., bags of size two with differently labeled feature-vectors) the algorithm satisfies at least (1/2)-fraction of them. For the special case of OR over the d-dimensional boolean vectors, we give an algorithm which computes an LTF achieving an additional Ω(1/d) in accuracy for the two cases.
Our main result provides evidence that these algorithmic bounds cannot be significantly improved, even for learning monotone ORs using LTFs. We prove that it is NP-hard, given a collection of non-monochromatic bags which are all satisfied by some monotone OR, to compute any function of constantly many LTFs that satisfies (1/2 + ε)-fraction of the bags for any constant ε > 0. This bound is tight for the non-monochromatic bags case.
The above is in contrast to the usual supervised learning setup (i.e., unit-sized bags), in which LTFs are efficiently learnable to arbitrary accuracy using linear programming, and even a trivial algorithm (any LTF or its complement) achieves an accuracy of 1/2. These techniques, however, fail in the LLP setting. Indeed, we show that LLP learning of LTFs (even for the special case of monotone ORs) using LTFs dramatically increases in complexity as soon as bags of size two are allowed. Our work gives the first inapproximability result for LLP learning of LTFs, and a strong complexity separation between LLP and traditional supervised learning.
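The unit-sized-bag case mentioned above can indeed be handled by linear programming: for data consistent with some LTF, a feasibility LP recovers a separating hyperplane. This sketch (illustrative names, homogeneous LTFs through the origin for simplicity, using scipy's `linprog`) is one way to set it up:

```python
import numpy as np
from scipy.optimize import linprog

def learn_ltf_lp(X, y):
    """Supervised (unit-sized-bag) case: find w with y_i * (w . x_i) >= 1
    for every labeled example via an LP feasibility program.
    Returns w, or None if the data admit no consistent homogeneous LTF."""
    signs = np.where(y == 1, 1.0, -1.0)
    A_ub = -signs[:, None] * X            # encodes -y_i (x_i . w) <= -1
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(X.shape[1]),  # zero objective: pure feasibility
                  A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1])
    return res.x if res.success else None
```

The margin constraint y_i (w · x_i) >= 1 rules out the trivial solution w = 0; any feasible w defines an LTF consistent with all the labels. As the abstract notes, no comparably simple program is known to work once bags of size two are allowed.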