
Aditya Srinivas Timmaraju
Aditya Srinivas Timmaraju is a Senior Staff Research Engineer at Google DeepMind, passionate about pushing the boundaries of Artificial Intelligence to create a positive impact at planet scale. His current and past work spans Large Language Models (LLMs), Stochastic Neural Networks, Reinforcement Learning, and Responsible AI. His publications have been featured as oral presentations at multiple A* conferences, and he has organized workshops at top-tier conferences such as ACM FAccT. His Erdős number is 3.
His work has been featured in major news publications including The Wall Street Journal, The New York Times, and The Verge. His work on AI fairness was recognized in U.S. Department of Justice press releases as "a groundbreaking resolution [that] sets a new standard for addressing discrimination through machine learning". His personal webpage can be accessed at adityatimm.com.
Authored Publications
Matryoshka Model Learning for Improved Elastic Student Models
Chetan Verma
Cho-Jui Hsieh
Ngot Bui
Yang Zhang
Wen Chen
Xin Liu
Inderjit Dhillon
2025
Industry-grade ML models are carefully designed to meet rapidly evolving serving constraints, which requires significant resources for model development. In this paper, we propose MatTA, a framework for training multiple accurate Student models using a novel Teacher-TA-Student recipe. TA models are larger versions of the Student models with higher capacity, and thus allow Student models to better relate to the Teacher model and also bring in more domain-specific expertise. Furthermore, multiple accurate Student models can be extracted from the TA model. Therefore, despite only one training run, our methodology provides multiple servable options to trade off accuracy for lower serving cost. We demonstrate the proposed method, MatTA, on proprietary datasets and models. Its practical efficacy is underscored by live A/B tests within a production ML system, demonstrating 20% improvement on a key metric. We also demonstrate our method on GPT-2 Medium, a public model, and achieve relative improvements of over 24% on SAT Math and over 10% on the LAMBADA benchmark.
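The Teacher-TA-Student recipe described in the abstract above can be summarized in a short sketch. This is a minimal illustration of one plausible reading, not the paper's implementation: the function name, loss weighting, temperature, and the choice to detach the TA's logits are all assumptions.

```python
# Minimal sketch of a Teacher -> TA -> Student online-distillation loss,
# as suggested by the abstract. Names and hyperparameters are illustrative
# assumptions, not the paper's actual recipe.
import torch
import torch.nn.functional as F

def matta_loss(student_logits, ta_logits, teacher_logits, labels,
               temperature=2.0, alpha=0.5):
    t = temperature
    # The TA distills from the (frozen) Teacher's soft targets...
    kl_teacher_ta = F.kl_div(
        F.log_softmax(ta_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean") * (t * t)
    # ...while the Student distills from the TA in the same training run.
    # Detaching the TA's logits here keeps the Student's gradient from
    # flowing back into the TA (an assumed design choice).
    kl_ta_student = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(ta_logits.detach() / t, dim=-1),
        reduction="batchmean") * (t * t)
    # Hard-label cross entropy keeps both trainable models grounded in the task.
    ce = F.cross_entropy(student_logits, labels) + F.cross_entropy(ta_logits, labels)
    return alpha * (kl_teacher_ta + kl_ta_student) + (1.0 - alpha) * ce
```

In this reading, the TA acts as both a student of the Teacher and a teacher for the Student, which is what allows both smaller models to be trained in a single run.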
Matryoshka Model Learning for Improved Elastic Student Models
Cho-Jui Hsieh
Chetan Verma
Inderjit Dhillon
Xin Liu
Wen Chen
Ngot Bui
Yang Zhang
2025
Production machine learning models in the industry are often developed with a primary focus on maximizing model quality. However, these models must ultimately operate within the resource constraints of their serving infrastructure, including limitations on compute, memory and bandwidth. The rapid evolution of serving hardware, particularly with advancements in accelerator technology, necessitates periodic retraining to leverage newer, more efficient infrastructure. This cyclical retraining process is resource-intensive, demanding significant model development time and incurring substantial training costs. This challenge is further amplified by the trend towards increasingly complex models, which inherently require greater computational resources for training and deployment. While prior work has explored techniques like supernet sub-model extraction to address training efficiency, a critical gap remains: the efficient generation of a spectrum of high-quality models from an existing production model, a common requirement in diverse industrial applications. To bridge this gap, we introduce a novel approach leveraging a "Teaching Assistant" (TA) model, derived from a given production model (referred to as the Student model). We demonstrate that through co-training the Student and TA models with Matryoshka structure while using online distillation, we not only enhance the Student model's performance but also enable the flexible creation of a model family offering a compelling trade-off between model quality and model size.
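For intuition about the "Matryoshka structure" mentioned in the abstract above, the sketch below shows one common way to build nested sub-models: smaller students reuse the leading hidden units of a larger layer, so several widths can be served from one set of trained weights. The layer sizes, widths, and class names are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch of a Matryoshka-structured layer: each smaller
# student shares the leading hidden units of the full-width model, so a
# single training run yields several servable widths. All sizes and names
# are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatryoshkaMLP(nn.Module):
    def __init__(self, d_in=512, d_hidden=1024, d_out=10,
                 widths=(256, 512, 1024)):
        super().__init__()
        self.widths = widths  # nested student widths; the largest plays the TA role
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x, width=None):
        # Use only the first `width` hidden units of the shared weights.
        w = width or self.widths[-1]
        h = F.relu(F.linear(x, self.fc1.weight[:w], self.fc1.bias[:w]))
        return F.linear(h, self.fc2.weight[:, :w], self.fc2.bias)

model = MatryoshkaMLP()
x = torch.randn(8, 512)
# Co-training would sum the task loss over every nested width so each
# sub-network stays accurate; at serving time, a single width is chosen
# to trade off quality against serving cost.
logits_per_width = [model(x, width=w) for w in model.widths]
```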