Nikhil Khani

Nikhil Khani is a seasoned technologist with a demonstrated history of leading impactful Machine Learning initiatives in the tech industry. Currently a Staff Software Engineer at Google, he has landed critical projects that improve video recommendation quality at YouTube. He is the lead author on multiple patents, and his expertise extends beyond recommendations: in his previous role as a Senior Machine Learning Engineer at VMware (now Broadcom), he worked on improving cloud infrastructure using graph neural networks.

In addition to his professional work, Nikhil actively contributes to the broader tech community. He serves as a program chair and peer reviewer for prestigious AI and ML conferences, shaping the landscape of technological discourse. His commitment to excellence and innovation has earned him numerous accolades, including Code Excellence Awards and multiple other awards for high-quality improvements to YouTube Recommendations.
Authored Publications
    Zero-shot Cross-domain Knowledge Distillation: A Case Study on YouTube Music
    Srivaths Ranganathan
    Chieh Lo
    Bernardo Cunha
    Li Wei
    Aniruddh Nath
    Shawn Andrews
    Gergo Varady
    Yanwei Song
    Jochen Klingenhoefer
    2025
    Knowledge Distillation (KD) has been widely used to improve the quality of latency-sensitive models serving live traffic. However, applying KD in production recommender systems with low traffic is challenging: the limited amount of data restricts the teacher model size, and the cost of training a large dedicated teacher may not be justified. Cross-domain KD offers a cost-effective alternative by leveraging a teacher from a data-rich source domain, but it introduces unique technical difficulties, as the features, user interfaces, and prediction tasks can differ significantly. We present a case study of using zero-shot cross-domain KD for multi-task ranking models, transferring knowledge from a large-scale (100X) video recommendation platform (YouTube) to a music recommendation application with significantly lower traffic. We present offline and live experiment results and share learnings from evaluating different KD techniques in this setting across two ranking models on the YouTube Music application. Our results demonstrate that zero-shot cross-domain KD is a practical and effective approach to improving the performance of a ranking model on a low-traffic surface.
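    As a rough illustration of the recipe this abstract describes, the sketch below blends the target-domain engagement loss with a soft loss against an out-of-domain teacher's predictions, which is the essence of zero-shot cross-domain KD: the teacher is trained on the data-rich domain and applied to target-domain examples without fine-tuning. This is a minimal sketch assuming binary engagement labels and PyTorch; the function names and the alpha weighting are illustrative, not from the paper.

```python
# Hypothetical sketch of zero-shot cross-domain distillation for a
# binary ranking task. All names here are illustrative assumptions.
import torch
import torch.nn.functional as F

def cross_domain_kd_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend the hard-label loss with a soft loss against the teacher."""
    # Hard loss: student vs. observed engagement labels on the target domain.
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Soft loss: student matches the teacher's probabilities. "Zero-shot"
    # means the teacher was trained on a different, data-rich domain and
    # scores target-domain examples without any fine-tuning.
    soft = F.binary_cross_entropy_with_logits(
        student_logits, torch.sigmoid(teacher_logits))
    return alpha * hard + (1.0 - alpha) * soft
```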
    Enhancing Online Ranking Systems via Multi-Surface Co-Training for Content Understanding
    Gwendolyn Zhao
    Yilin Zheng
    Raghu Keshavan
    Lukasz Heldt
    Qian Sun
    Fabio Soldo
    Li Wei
    Aniruddh Nath
    Dapo Omidiran
    Rein Zhang
    Mei Chen
    Lichan Hong
    2025
    Content understanding is an important part of real-world recommendation systems. This paper introduces a Multi-surface Co-training (MulCo) system designed to enhance online ranking systems by improving content understanding. The model is trained through a task-aligned co-training approach, leveraging objectives and data from multiple surfaces and various pre-trained embeddings. It separates video content understanding into an offline model, enabling scalability and efficient resource use. Experiments demonstrate that MulCo significantly outperforms non-task-aligned pre-trained embeddings and achieves substantial gains in online satisfied-engagement metrics. This system presents a practical solution for improving content understanding in multi-surface, large-scale recommender systems.
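    A minimal sketch of what task-aligned multi-surface co-training can look like, assuming PyTorch: a shared content tower feeds one ranking head per surface, and the per-surface losses are summed so every surface's objective shapes the shared content representation. The module, surface, and parameter names below are illustrative assumptions; the paper's actual architecture is not reproduced here.

```python
# Hypothetical co-training sketch: shared content tower, per-surface heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiSurfaceCoTrainer(nn.Module):
    def __init__(self, content_dim, hidden_dim, surfaces=("home", "watch_next")):
        super().__init__()
        # Shared content tower (a stand-in for pre-trained content
        # embeddings plus trainable layers).
        self.content_tower = nn.Sequential(
            nn.Linear(content_dim, hidden_dim), nn.ReLU())
        # One ranking head per surface, each with its own objective.
        self.heads = nn.ModuleDict(
            {s: nn.Linear(hidden_dim, 1) for s in surfaces})

    def forward(self, content_features, surface):
        h = self.content_tower(content_features)
        return self.heads[surface](h).squeeze(-1)

def co_training_loss(model, batches):
    """batches: {surface: (features, labels)}, one batch drawn per surface.

    Summing per-surface losses is what makes the training "task-aligned":
    the shared tower is optimized for every surface's objective at once.
    """
    total = 0.0
    for surface, (x, y) in batches.items():
        logits = model(x, surface)
        total = total + F.binary_cross_entropy_with_logits(logits, y)
    return total
```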
    Bridging the Gap: Unpacking the Hidden Challenges in Knowledge Distillation for Online Ranking Systems
    Shuo Yang
    Aniruddh Nath
    Yang Liu
    Li Wei
    Shawn Andrews
    Maciej Kula
    Jarrod Kahn
    Zhe Zhao
    Lichan Hong
    2024
    Knowledge Distillation (KD) is a powerful approach for compressing large models into smaller, more efficient ones, which is particularly beneficial for latency-sensitive applications like recommender systems. However, current KD research predominantly focuses on Computer Vision (CV) and NLP tasks, overlooking the unique data characteristics and challenges inherent to recommender systems. This paper addresses these overlooked challenges, specifically: (1) mitigating data distribution shifts between teacher and student models, (2) efficiently identifying optimal teacher configurations within time and budgetary constraints, and (3) enabling computationally efficient and rapid sharing of teacher labels to support multiple students. We present a robust KD system developed and rigorously evaluated on multiple large-scale personalized video recommendation systems within Google. Our live experiment results demonstrate significant improvements in student model performance while ensuring the consistent and reliable generation of high-quality teacher labels from continuous data streams.
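    The third challenge above, sharing teacher labels across multiple students, is commonly handled by materializing the teacher's soft labels once and letting each student train against them instead of re-running the teacher per student. The sketch below illustrates that general pattern under stated assumptions (PyTorch, binary labels, and a hypothetical label_writer sink); it is not the paper's production pipeline.

```python
# Hypothetical sketch of amortized teacher-label sharing. The teacher scores
# each training example exactly once; the soft labels are persisted so any
# number of students can consume them. "label_writer" is an assumed storage
# interface, not a real API.
import torch
import torch.nn.functional as F

@torch.no_grad()
def materialize_teacher_labels(teacher, example_stream, label_writer):
    """Score the continuous example stream with the teacher exactly once."""
    teacher.eval()
    for example_id, features in example_stream:
        # Persist (example_id -> soft label) so students can later join
        # these labels into their training data without re-running the
        # teacher.
        label_writer.write(example_id, torch.sigmoid(teacher(features)).cpu())

def student_loss(student_logits, hard_labels, teacher_soft_labels, alpha=0.5):
    """Distillation loss on hard engagement labels plus stored teacher labels."""
    hard = F.binary_cross_entropy_with_logits(student_logits, hard_labels)
    soft = F.binary_cross_entropy_with_logits(student_logits, teacher_soft_labels)
    return alpha * hard + (1.0 - alpha) * soft
```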