# Prateek Jain

Prateek Jain is a research scientist at Google Research India and an adjunct faculty member at IIT Kanpur. Earlier, he was a Senior Principal Researcher at Microsoft Research India. He obtained his PhD from the Computer Science department at UT Austin and his BTech degree from IIT Kanpur. He works in the areas of large-scale and non-convex optimization, high-dimensional statistics, and ML for resource-constrained devices. He wrote a monograph on Non-convex Optimization in Machine Learning summarizing many of his results in non-convex optimization. Prateek regularly serves on the senior program committees of top ML conferences and is an action editor for JMLR and an associate editor for SIMODS. His work won the best student paper awards at ICML 2007 and CVPR 2008, and more recently his work on alternating minimization was selected as the 2020 Best Paper by the IEEE Signal Processing Society.
Please visit prateekjain.org for his personal website.


### Authored Publications


Learning an Invertible Output Mapping Can Mitigate Simplicity Bias in Neural Networks

Anshul Nasery

Sravanti Addepalli

ICLR 2023 (to appear)

Deep Neural Networks (DNNs) are known to be brittle to even minor distribution shifts relative to the training distribution. Simplicity Bias (SB) of DNNs, their bias towards learning only a small number of the simplest features, has been demonstrated to be a key reason for this brittleness. Prior works have shown that the effect of Simplicity Bias is extreme: even when the learned features are diverse, retraining the classification head again selects only a few of the simplest features, leading to similarly brittle models. In this work, we introduce a Feature Reconstruction Regularizer (FRR) in the linear classification head, with the aim of reducing Simplicity Bias and thereby improving Out-Of-Distribution (OOD) robustness. The proposed regularizer, when used during linear-layer training (termed FRR-L), enforces that the features can be reconstructed from the logit layer, ensuring that diverse features participate in the classification task. We further propose to finetune the full network while freezing the weights of the linear layer trained using FRR-L. This approach, termed FRR-FLFT (Fixed Linear FineTuning), improves the quality of the learned features, making them more suitable for the classification task. Using this simple solution, we demonstrate up to 12% gain in accuracy on recently introduced synthetic datasets with extreme distribution shifts. Moreover, on the standard OOD benchmarks recommended by DomainBed, our technique provides up to 5% gains over existing SOTA methods.
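The core idea of FRR-L, a cross-entropy loss plus a penalty requiring the features to be reconstructible from the logits, can be sketched in a few lines. The following is a minimal numpy sketch, not the authors' implementation: `frr_l_loss`, `W_dec`, and `lam` are illustrative names, and the reconstruction here is a plain linear map from logits back to feature space.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def frr_l_loss(features, labels, W, b, W_dec, lam=1.0):
    """Cross-entropy on a linear head plus a penalty forcing the features
    to be reconstructible from the logits, so that no feature direction
    is simply discarded by the head (illustrative sketch)."""
    logits = features @ W + b                       # (n, classes)
    probs = softmax(logits)
    ce = -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    recon = logits @ W_dec                          # map logits back to feature space
    frr = ((features - recon) ** 2).mean()          # reconstruction error
    return ce + lam * frr

# Toy forward pass with random features and heads.
rng = np.random.default_rng(0)
n, d, c = 8, 16, 4
feats = rng.normal(size=(n, d))
labels = rng.integers(0, c, size=n)
W, b = rng.normal(size=(d, c)), np.zeros(c)
W_dec = rng.normal(size=(c, d))
loss = frr_l_loss(feats, labels, W, b, W_dec)
```

In training, `W`, `b`, and `W_dec` would be optimized jointly; the reconstruction term only helps if the decoder is learned alongside the head.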

Domain-Agnostic Contrastive Representations for Learning from Label Proportions

Jay Nandy

Jatin Chauhan

Balaraman Ravindran

Proc. CIKM 2022

We study the weak-supervision learning problem of Learning from Label Proportions (LLP), where the goal is to learn an instance-level classifier using the proportions of various class labels in a bag, i.e., a collection of input instances that can often be highly correlated. While representation learning for weakly-supervised tasks has been found to be effective, such methods often require domain knowledge. To the best of our knowledge, representation learning for tabular data (unstructured data containing both continuous and categorical features) has not been studied. In this paper, we propose to learn diverse representations of instances within the same bags to effectively utilize the weak bag-level supervision. We propose a domain-agnostic LLP method, called "Self Contrastive Representation Learning for LLP" (SelfCLR-LLP), that incorporates a novel self-contrastive function as an auxiliary loss to learn representations on tabular data for LLP. We show that diverse representations for instances within the same bags aid efficient usage of the weak bag-level LLP supervision. We evaluate the proposed method through extensive experiments on real-world LLP datasets from e-commerce applications to demonstrate the effectiveness of our proposed SelfCLR-LLP.
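An LLP objective of this kind combines two ingredients: a bag-level loss matching predicted to true label proportions, and an auxiliary term encouraging diverse representations within a bag. The numpy sketch below is illustrative, not the paper's SelfCLR loss: the diversity term here is a simple mean-cosine-similarity penalty standing in for the self-contrastive function, and `llp_loss`, `bag_props`, and `aux_weight` are hypothetical names.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def llp_loss(embeddings, logits, bag_ids, bag_props, aux_weight=0.1):
    """Bag-level proportion matching plus a penalty on intra-bag
    similarity, pushing instances in the same bag apart (sketch)."""
    probs = softmax(logits)
    prop_loss, aux = 0.0, 0.0
    for b, props in bag_props.items():
        idx = np.where(bag_ids == b)[0]
        pred = probs[idx].mean(axis=0)                 # predicted label proportions
        prop_loss += -(props * np.log(pred + 1e-12)).sum()
        z = embeddings[idx]
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        sim = z @ z.T                                  # pairwise cosine similarities
        k = len(idx)
        aux += (sim.sum() - k) / (k * (k - 1))         # mean off-diagonal similarity
    n_bags = len(bag_props)
    return prop_loss / n_bags + aux_weight * aux / n_bags

# Toy example: three bags of four instances each, uniform true proportions.
rng = np.random.default_rng(0)
n, d, c = 12, 8, 3
emb = rng.normal(size=(n, d))
logits = rng.normal(size=(n, c))
bag_ids = np.repeat(np.arange(3), 4)
bag_props = {b: np.full(c, 1.0 / c) for b in range(3)}
loss = llp_loss(emb, logits, bag_ids, bag_props)
```

Note that only proportions, never instance labels, enter the loss; that is the whole weak-supervision setting.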

Matryoshka Representation Learning

Gantavya Bhatt

Aniket Rege

Matthew Wallingford

Aditya Sinha

Vivek Ramanujan

William Howard-Snyder

Sham Kakade

Ali Farhadi

NeurIPS 2022 (2022)

Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context, rigid, fixed-capacity representations can be either over- or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL), which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offers: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.
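The nested-prefix idea can be sketched as a classification loss summed over several prefix lengths of a single embedding, so that each prefix is trained to be usable on its own. This is a minimal numpy sketch under assumed names (`mrl_loss`, `heads`, `dims`); the actual MRL training weights the granularities and plugs into full networks, and losses are equally weighted here for simplicity.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def mrl_loss(emb, labels, heads, dims=(8, 16, 32, 64)):
    """Average the classification loss over nested prefixes of one
    embedding, so every prefix is itself a usable representation."""
    loss = 0.0
    for m in dims:
        logits = emb[:, :m] @ heads[m]        # separate head for the m-dim prefix
        p = softmax(logits)
        loss += -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return loss / len(dims)

# Toy example: one 64-d embedding serves four granularities at once.
rng = np.random.default_rng(0)
n, c = 16, 5
emb = rng.normal(size=(n, 64))
labels = rng.integers(0, c, size=n)
heads = {m: rng.normal(size=(m, c)) / np.sqrt(m) for m in (8, 16, 32, 64)}
loss = mrl_loss(emb, labels, heads)
```

At deployment, one simply truncates the stored embedding to the first `m` coordinates; no re-training or re-indexing of a separate small model is needed.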

LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes

Matthew Wallingford

Vivek Ramanujan

Raghav Somani

Jae Sung Park

Krishna Pillutla

Sham Kakade

Ali Farhadi

Advances in Neural Information Processing Systems 34 (2021)

Learning binary representations of instances and classes is a classical problem with several high-potential applications. In modern settings, the compression of high-dimensional neural representations to low-dimensional binary codes is a challenging task that often requires high dimensions to be accurate. In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for instances as well as classes for any standard classification dataset. Our method does not require any metadata about the problem and learns extremely low-dimensional binary codes ($\approx 20$ bits for ImageNet-1K). The learnt codes are super-efficient while still ensuring nearly optimal classification accuracy for ResNet50 on ImageNet-1K. We demonstrate that the learnt codes capture intrinsically important features in the data by discovering an intuitive taxonomy over classes. We further quantitatively measure the quality of our codes by applying them to efficient image retrieval as well as out-of-distribution (OOD) detection. For the retrieval problem on ImageNet-100, our learnt codes outperform 16-bit HashNet by 2% and 15% on MAP@1000 using only 10 and 16 bits respectively. Finally, our learnt binary codes, without any fine-tuning, can perform effective OOD detection out of the box. Code and models will be open-sourced.
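To make the binary-code setup concrete, here is a hedged numpy sketch of how low-dimensional codes can drive classification: binarize a projection of the representation, then classify by nearest class code in Hamming distance. In the paper the projection and class codes are learnt end-to-end; in this toy example (`to_code`, `predict`, and the data construction are illustrative) they are fixed random and hand-built.

```python
import numpy as np

def to_code(x, P):
    """Binarize a projection of the representation: one k-bit code per row."""
    return (x @ P > 0).astype(np.uint8)

def predict(x, P, class_codes):
    """Classify by the nearest class code in Hamming distance."""
    codes = to_code(x, P)                                   # (n, k) instance codes
    ham = (codes[:, None, :] != class_codes[None, :, :]).sum(axis=2)
    return ham.argmin(axis=1)

# Toy example: two classes sitting on opposite sides of every hyperplane
# in P, so their 20-bit codes are exact complements of each other.
rng = np.random.default_rng(0)
d, k = 32, 20
P = rng.normal(size=(d, k))
mu = rng.normal(size=d)
x = np.vstack([mu + 0.01 * rng.normal(size=(5, d)),
               -mu + 0.01 * rng.normal(size=(5, d))])
class_codes = np.vstack([to_code(mu[None, :], P), to_code(-mu[None, :], P)])
preds = predict(x, P, class_codes)
```

The same instance codes can be reused for retrieval (rank by Hamming distance) or OOD scoring (distance to the nearest class code), which is why a single learnt code is called multi-purpose.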

Private Alternating Least Squares: (Nearly) Optimal Privacy/Utility Trade-off for Matrix Completion

Abhradeep Guha Thakurta

Li Zhang

Steffen Rendle

(2021)

We study the problem of differentially private (DP) matrix completion under user-level privacy. We design an $(\epsilon,\delta)$-joint differentially private variant of the popular Alternating Least Squares (ALS) method that achieves: i) (nearly) optimal sample complexity for matrix completion (in terms of the number of items and users), and ii) the best known privacy/utility trade-off, both theoretically as well as on benchmark data sets.
In particular, despite the non-convexity of low-rank matrix completion and ALS, we provide the first global convergence analysis of ALS with noise introduced to ensure DP. For $n$ the number of users and $m$ the number of items in the rating matrix, our analysis requires only about $\log (n+m)$ samples per user (ignoring rank and condition-number factors) and obtains a sample complexity of $n=\tilde\Omega(m/(\sqrt{\zeta}\cdot \epsilon))$ to ensure a relative Frobenius-norm error of $\zeta$. This improves significantly on the previous best result of $n=\tilde\Omega\left(m^{5/4}/(\zeta^{5}\epsilon)\right)$ for the private-FW method of Jain et al. (2018).
Furthermore, we extensively validate our method on synthetic and benchmark data sets (MovieLens 10M, MovieLens 20M), and observe that private ALS suffers only a 6 percentage point drop in accuracy compared to the non-private baseline for $\epsilon\leq 10$. Moreover, compared to the prior work of Jain et al. (2018), it is better by at least 10 percentage points for all choices of the privacy parameters.
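The "ALS with noise" mechanism can be sketched as a plain least-squares update per user whose sufficient statistics (the Gram matrix and the right-hand side) are perturbed with Gaussian noise. This is a minimal numpy sketch, not the paper's algorithm: the noise calibration for a given $(\epsilon,\delta)$ and the clipping of user contributions are omitted, and `private_als_step` and `noise_std` are illustrative names with `noise_std` left as a free parameter.

```python
import numpy as np

def private_als_step(R, V, noise_std, lam=0.1, rng=None):
    """One alternating-least-squares update of the user factors, with
    Gaussian noise added to each user's sufficient statistics (the
    mechanism that, suitably calibrated, yields joint DP)."""
    if rng is None:
        rng = np.random.default_rng(0)
    k = V.shape[1]
    U = np.zeros((R.shape[0], k))
    for i in range(R.shape[0]):
        obs = ~np.isnan(R[i])                    # items this user has rated
        Vi = V[obs]
        N = rng.normal(size=(k, k)) * noise_std
        A = Vi.T @ Vi + lam * np.eye(k) + (N + N.T) / 2   # noisy, symmetric Gram
        b = Vi.T @ R[i, obs] + rng.normal(size=k) * noise_std
        U[i] = np.linalg.solve(A, b)             # regularized least squares
    return U

# Sanity check: with zero noise this is a plain ALS step, which recovers
# a rank-2 matrix exactly when the item factors V are known.
rng = np.random.default_rng(1)
V = rng.normal(size=(6, 2))
U_true = rng.normal(size=(4, 2))
R = U_true @ V.T
U = private_als_step(R, V, noise_std=0.0, lam=1e-8)
```

Alternating this user update with a (noisy) item update gives the full method; only the user-side statistics need noise under joint DP, since item factors are shared.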
