Nikhil Khani
Nikhil Khani is a seasoned technologist with a demonstrated history of leading impactful Machine Learning initiatives within the tech industry. Currently serving as a Staff Software Engineer at Google, Nikhil has landed critical projects related to improving video recommendation quality at YouTube. He is the lead author on multiple patents, and his expertise extends beyond recommendations. In his previous role as a Senior Machine Learning Engineer at VMware (now Broadcom), Nikhil worked on improving cloud infrastructure using graph neural networks.
In addition to his professional endeavors, Nikhil actively contributes to the broader tech community. He serves as a program chair and peer reviewer for prestigious AI and ML conferences, shaping the landscape of technological discourse. His commitment to excellence and innovation has garnered him numerous accolades, including Code Excellence Awards and multiple other awards for high-quality improvements to YouTube Recommendations.
Authored Publications
Zero-shot Cross-domain Knowledge distillation: A Case study on YouTube Music
Srivaths Ranganathan
Chieh Lo
Bernardo Cunha
Li Wei
Aniruddh Nath
Shawn Andrews
Gergo Varady
Yanwei Song
Jochen Klingenhoefer
2025
Abstract
Knowledge Distillation (KD) has been widely used to improve the quality of latency-sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but it introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can differ significantly.
We present a case study of using zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a large-scale video recommendation platform (YouTube, roughly 100X the traffic) to a music recommendation application with significantly lower traffic. We present offline and live experiment results and share learnings from evaluating different KD techniques in this setting across two ranking models on the YouTube Music application. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improve the performance of a ranking model on a low-traffic surface.
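The abstract does not spell out the distillation objective, but the general recipe it builds on can be illustrated with a minimal sketch: a student for the low-traffic domain is trained on its own labels while also matching the soft predictions of a frozen teacher from the data-rich domain. All names, loss weights, and the temperature below are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch of a knowledge-distillation loss for a binary ranking task.
# The frozen teacher comes from a different (data-rich) domain; the student is
# trained zero-shot against it, i.e. the teacher is never fine-tuned on the
# target domain. alpha and temperature are assumed hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Blend the ground-truth loss with a soft-label loss from a frozen teacher."""
    # Supervised loss on the target-domain engagement labels.
    hard_loss = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Soft-label loss: match temperature-softened teacher predictions.
    teacher_probs = torch.sigmoid(teacher_logits / temperature)
    soft_loss = F.binary_cross_entropy_with_logits(
        student_logits / temperature, teacher_probs)
    return (1 - alpha) * hard_loss + alpha * soft_loss
```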
Enhancing Online Ranking Systems via Multi-Surface Co-Training for Content Understanding
Gwendolyn Zhao
Yilin Zheng
Raghu Keshavan
Lukasz Heldt
Qian Sun
Fabio Soldo
Li Wei
Aniruddh Nath
Dapo Omidiran
Rein Zhang
Mei Chen
Lichan Hong
2025
Abstract
Content understanding is an important part of real-world recommendation systems. This paper introduces a Multi-surface Co-training (MulCo) system, designed to enhance online ranking systems by improving content understanding. The model is trained through a task-aligned co-training approach, leveraging objectives and data from multiple surfaces and various pre-trained embeddings. It separates video content understanding into an offline model, enabling scalability and efficient resource use. Experiments demonstrate that MulCo significantly outperforms non-task-aligned pre-trained embeddings and achieves substantial gains in online satisfied-engagement metrics. This system presents a practical solution for improving content understanding in multi-surface, large-scale recommender systems.
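The abstract describes task-aligned co-training across surfaces at a conceptual level. As a rough sketch of that pattern, one common layout is a shared content encoder with a prediction head per surface, trained jointly so every surface's objective shapes the shared embedding that is then exported offline to the online rankers. The real MulCo architecture, features, and objectives are not specified in the abstract; every name below is an assumption.

```python
# Illustrative co-training layout: shared encoder, one head per surface.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedContentEncoder(nn.Module):
    def __init__(self, input_dim, hidden_dim, surfaces):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        # One task head per surface (surface names are hypothetical).
        self.heads = nn.ModuleDict({s: nn.Linear(hidden_dim, 1) for s in surfaces})

    def forward(self, content_features, surface):
        embedding = self.encoder(content_features)   # reusable offline embedding
        return self.heads[surface](embedding), embedding

def co_training_loss(model, batches):
    """Sum per-surface losses so all surfaces shape the shared embedding."""
    loss = 0.0
    for surface, (features, labels) in batches.items():
        logits, _ = model(features, surface)
        loss = loss + F.binary_cross_entropy_with_logits(logits.squeeze(-1), labels)
    return loss
```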
Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
Shuo Yang
Aniruddh Nath
Yang Liu
Li Wei
Shawn Andrews
Maciej Kula
Jarrod Kahn
Zhe Zhao
Lichan Hong
2024
Abstract
Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient models, particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams.
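One of the challenges the abstract names is sharing teacher labels efficiently with multiple students. A simple way to picture that is scoring each example once with the frozen teacher and caching the soft label keyed by example id, so downstream student trainers read cached labels instead of re-running teacher inference. The production pipeline in the paper is not described at this level of detail; the class and key scheme below are assumptions for illustration only.

```python
# Sketch of amortizing teacher inference across many student trainers by
# caching per-example soft labels (one example scored at a time).
import torch

class TeacherLabelCache:
    def __init__(self, teacher):
        self.teacher = teacher.eval()   # frozen teacher model
        self.cache = {}                 # example_id -> soft label (float)

    @torch.no_grad()
    def get(self, example_id, features):
        # Score the example only on a cache miss; later students reuse the label.
        if example_id not in self.cache:
            self.cache[example_id] = torch.sigmoid(self.teacher(features)).item()
        return self.cache[example_id]
```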