Xinyang Yi
Authored Publications
Improving Training Stability for Multitask Ranking Models in Recommender Systems
Justin Gilmer
Li Wei
Lichan Hong
Mahesh Sathiamoorthy
KDD 2023 (2023)
Abstract
Recommender systems play an important role in YouTube, one of the largest online video platforms across the world. In this paper, we focus on a real-world multitask ranking model for YouTube recommendations.
While most of the recommendation research is dedicated to designing better models to improve user engagement and satisfaction, we found that research on stabilizing the training for such models is severely under-explored.
As recommendation models become larger and more sophisticated, they become more vulnerable to training instability issues, i.e., the loss diverges (instead of converging), which can make the model unusable, waste significant resources, and block model iteration.
In this paper, we share our understanding of, and the best practices we learned for, improving the training stability of a multitask ranking model used in production. We show some properties of the model that lead to unstable training and speculate on their cause. Furthermore, based on our observations of the training dynamics when training starts to become unstable, we propose an effective solution to improve training stability. Our experiments on a proprietary dataset show the effectiveness of the proposed method over several commonly used baselines.
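The abstract does not spell out the specific mitigation, so the following is only a minimal numpy sketch of the general idea of adaptive, per-parameter update clipping as a way to keep a large ranking model from diverging; the function name `clip_update` and the constants `sigma` and `eps` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def clip_update(update, weights, sigma=0.5, eps=1e-3):
    """Scale a proposed parameter update so that no coordinate moves a weight
    by more than a fraction of its current magnitude (plus a small floor).
    Illustrative adaptive update clipping, not the exact production algorithm."""
    # Per-coordinate allowed magnitude: sigma * |w| + eps.
    allowed = sigma * np.abs(weights) + eps
    # Single global scale factor so the update direction is preserved.
    scale = np.min(np.minimum(1.0, allowed / (np.abs(update) + 1e-12)))
    return scale * update

# Example: an unusually large gradient step gets damped instead of blowing up the weights.
w = np.array([0.1, -0.2, 0.05])
g = np.array([5.0, -3.0, 10.0])   # e.g. a spike early in training
print(clip_update(-0.1 * g, w))
```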
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Can Xu
Sriraj Badam
Trevor Potter
Daniel Li
Hao Wan
Elaine Le
Chris Berg
Eric Bencomo Dixon
(2021)
Abstract
How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combating sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable information orthogonal to the engagement/interaction data, acting as a proxy for the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting the satisfaction level for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is powerful for driving satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the imputation network learns which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, with the reward utilizing the imputed satisfaction. We use both offline analysis and live experiments on an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences.
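As a rough illustration of the setup described above (not the production implementation), the sketch below blends observed engagement with the satisfaction predicted by an imputation model into a shaped reward and feeds it to a vanilla REINFORCE update for a linear softmax policy; all names, shapes, and the mixing weight `alpha` are assumptions.

```python
import numpy as np

def shaped_reward(engagement, imputed_satisfaction, alpha=0.5):
    """Blend observed engagement with the imputation network's predicted
    satisfaction; alpha is an illustrative mixing weight, not a published value."""
    return (1.0 - alpha) * engagement + alpha * imputed_satisfaction

def reinforce_step(theta, features, action, reward, lr=0.01):
    """One REINFORCE update for a linear softmax policy over candidate items.
    features: (num_items, dim) candidate representations; action: index of the
    recommended item; reward: the shaped reward for that recommendation."""
    logits = features @ theta
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Gradient of log pi(action | state) for a linear softmax policy.
    grad_log_pi = features[action] - probs @ features
    return theta + lr * reward * grad_log_pi

# Toy usage: the shaped reward, not raw engagement alone, drives the policy update.
rng = np.random.default_rng(0)
theta = np.zeros(4)
candidates = rng.normal(size=(5, 4))
r = shaped_reward(engagement=1.0, imputed_satisfaction=0.2)
theta = reinforce_step(theta, candidates, action=3, reward=r)
```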
Learning to Embed Categorical Features without Embedding Tables for Recommendation
Tiansheng Yao
Ting Chen
Lichan Hong
KDD (2021)
Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations
Ji Yang
Lichan Hong
Yang Li
Simon Wang
Taibai Xu
WWW '20: Companion Proceedings of The Web Conference 2020, April 2020
Abstract
Learning query and item representations is important for building large-scale recommendation systems. In many real applications with a huge catalog of items to recommend, the problem of efficiently retrieving the top-k items for a user's query from a deep corpus leads to a family of factorized modeling approaches in which query and item are jointly embedded into a low-dimensional space. In this paper, we first showcase how to apply a two-tower neural network framework, also known as a dual encoder in the natural language community, to improve a large-scale, production app recommendation system. Furthermore, we offer a novel negative sampling approach called Mixed Negative Sampling (MNS). In particular, unlike commonly used batch or unigram sampling methods, MNS uses a mixture of batch and uniformly sampled negatives to tackle the selection bias of implicit user feedback. We conduct extensive offline experiments on the production dataset and show that MNS outperforms other baseline sampling methods. We also conduct online A/B testing and demonstrate that the two-tower retrieval model based on MNS significantly improves retrieval quality by encouraging more high-quality app installs.
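A minimal sketch of the Mixed Negative Sampling idea, assuming a sampled-softmax loss in which each query's in-batch item is its positive and the negatives are the remaining in-batch items plus a pool of uniformly sampled items; shapes and names are illustrative, and the frequency corrections used in practice are omitted.

```python
import numpy as np

def mns_softmax_loss(query_emb, item_emb, uniform_item_emb):
    """Mixed Negative Sampling sketch: each query's positive is its own in-batch
    item; negatives are the other in-batch items PLUS a pool of uniformly sampled
    items, which counteracts the selection bias of batch negatives.
    Shapes: query_emb (B, d), item_emb (B, d), uniform_item_emb (B', d)."""
    candidates = np.concatenate([item_emb, uniform_item_emb], axis=0)   # (B + B', d)
    logits = query_emb @ candidates.T                                   # (B, B + B')
    m = logits.max(axis=1, keepdims=True)
    log_z = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))            # stable log-partition
    pos = logits[np.arange(len(query_emb)), np.arange(len(query_emb))]  # diagonal = positives
    return np.mean(log_z - pos)                                         # mean sampled-softmax loss

# Toy usage with random embeddings: 8 in-batch pairs, 32 uniform negatives.
rng = np.random.default_rng(0)
q, v, u = rng.normal(size=(8, 16)), rng.normal(size=(8, 16)), rng.normal(size=(32, 16))
print(mns_softmax_loss(q, v, u))
```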
Recommending What Video to Watch Next: A Multitask Ranking System
Aditee Ajit Kumthekar
Aniruddh Nath
Li Wei
Lichan Hong
Mahesh Sathiamoorthy
Shawn Andrews
Zhe Zhao
RecSys 2019 (2019)
Abstract
In this paper, we introduce a large-scale multi-objective ranking system for recommending what video to watch next on an industrial video sharing platform. The system faces many real-world challenges, including the presence of multiple competing ranking objectives, as well as implicit selection biases in user feedback. To tackle these challenges, we explored a variety of soft-parameter sharing techniques such as Multi-gate Mixture-of-Experts so as to efficiently optimize for multiple ranking objectives. Additionally, we mitigated the selection biases by adopting a Wide & Deep framework. We demonstrated that our proposed techniques lead to substantial improvements in recommendation quality on one of the world's largest video sharing platforms.
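One common way to instantiate the Wide & Deep idea for selection bias, sketched below under the assumption that display position is the main bias feature, is a shallow "wide" tower whose logit is added to the main model's output during training and dropped at serving time; this is an illustration, not the exact production architecture.

```python
import numpy as np

# Illustrative: learned per-position bias terms for 10 display positions.
w_pos = np.linspace(1.0, -1.0, 10)   # higher positions carry a larger bias logit
b = 0.0

def train_logit(main_logit, position):
    """Training-time logit: main ranking model output plus a shallow 'wide' tower
    that absorbs position bias (a sketch; a real system may use more bias
    features and learned embeddings)."""
    return main_logit + w_pos[position] + b

def serve_logit(main_logit):
    """Serving-time logit: the shallow tower is dropped (position treated as
    missing), so ranking reflects only the debiased main model."""
    return main_logit
```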
Efficient Training on Very Large Corpora via Gramian Estimation
Walid Krichene
Nicolas Mayoraz
Li Zhang
Lichan Hong
John Anderson
ICLR 2019 (to appear)
Abstract
We study the problem of learning similarity functions over very large corpora using neural network embedding models. These models are typically trained using SGD with random sampling of unobserved pairs, with a sample size that grows quadratically with the corpus size, making it expensive to scale.
We propose new efficient methods to train these models without having to sample unobserved pairs. Inspired by matrix factorization, our approach relies on adding a global quadratic penalty and expressing this term as the inner-product of two generalized Gramians. We show that the gradient of this term can be efficiently computed by maintaining estimates of the Gramians, and develop variance reduction schemes to improve the quality of the estimates. We conduct large-scale experiments that show a significant improvement both in training time and generalization performance compared to sampling methods.
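The identity that makes this tractable is that the global quadratic penalty over all query-item pairs equals the Frobenius inner product of the two Gramians, so it never requires enumerating the pairs; the short numpy check below verifies the identity on random embeddings (matrix names are mine).

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(1000, 8))   # "query" embeddings, n x d
V = rng.normal(size=(2000, 8))   # "item" embeddings,  m x d

# Naive global quadratic penalty: sum over ALL pairs of (u_i . v_j)^2, O(n*m*d).
naive = ((U @ V.T) ** 2).sum()

# Gramian form: <G_U, G_V>_F with G_U = U^T U and G_V = V^T V, O((n + m) * d^2),
# which is what allows the penalty to scale to very large corpora.
gram = np.sum((U.T @ U) * (V.T @ V))

print(np.allclose(naive, gram))   # True
```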
Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations
Ji Yang
Lichan Hong
Lukasz Heldt
Aditee Ajit Kumthekar
Zhe Zhao
Li Wei
RecSys 2019
Abstract
Many recommendation systems need to retrieve and score items from a large corpus. A common approach to handling data sparsity and power-law item distributions is to learn item representations from their content features. In contrast to many content-aware systems based on matrix factorization, in this paper we consider a modeling framework with two-tower neural networks, where one network, called the item tower, is used to encode a wide variety of item features. Optimizing a loss function calculated from in-batch negatives, which are items sampled within a random batch, is a common recipe for training such two-tower models. However, the in-batch loss is subject to sampling bias, which can severely restrict model performance, particularly in the case of power-law distributions. In this work, we present a novel algorithm for estimating item frequency from streaming data. Our main idea is to sketch and estimate item occurrences via gradient descent. Through theoretical analysis and simulations, we show that the proposed algorithm works without a fixed item vocabulary, produces unbiased estimates, and adapts to changes in the item distribution. We then apply the sampling-bias-corrected modeling approach to build a large-scale retrieval system called Neural Deep Retrieval (NDR) for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus of tens of millions of videos. We demonstrate the effectiveness of sampling bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the NDR system leads to improved recommendation quality for YouTube.
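A hedged sketch of the two ingredients described above: a streaming estimate of each item's sampling probability from the average gap between its consecutive occurrences, and the corresponding log-probability correction subtracted from in-batch logits. Class and function names are assumptions, and the fixed-size hashing used in the paper is replaced by a plain dictionary for clarity.

```python
import numpy as np
from collections import defaultdict

class StreamingFrequencyEstimator:
    """Estimate how often each item appears in the training stream by keeping a
    moving average of the gap between its consecutive occurrences; 1/gap is then
    the estimated sampling probability. (A sketch: the paper hashes item ids into
    fixed-size arrays; a plain dict is used here for clarity.)"""

    def __init__(self, alpha=0.05):
        self.alpha = alpha                       # moving-average step size
        self.last_step = {}                      # item -> last step it was seen
        self.avg_gap = defaultdict(lambda: 1.0)  # item -> estimated occurrence gap

    def update(self, item, step):
        if item in self.last_step:
            gap = step - self.last_step[item]
            self.avg_gap[item] = (1 - self.alpha) * self.avg_gap[item] + self.alpha * gap
        self.last_step[item] = step

    def sampling_prob(self, item):
        return 1.0 / self.avg_gap[item]

def corrected_logits(logits, batch_items, estimator):
    """Sampling-bias correction: subtract the log of each in-batch item's estimated
    sampling probability so popular items are not over-penalized as negatives.
    logits: (B, B) query-item scores; batch_items: the B item ids in this batch."""
    log_q = np.log([estimator.sampling_prob(i) for i in batch_items])
    return logits - log_q  # broadcasts along the candidate (column) axis
```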
Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts
KDD (2018)
Abstract
Neural-based multi-task learning has been successfully used in many real-world, large-scale applications such as recommendation systems. For example, in movie recommendation, beyond suggesting movies that users tend to purchase and watch, the system might also optimize for users liking the movies afterwards. With multi-task learning, we aim to build a single model that learns these multiple goals and tasks simultaneously. However, the prediction quality of commonly used multi-task models is often sensitive to the relationships between tasks. It is therefore important to study the modeling tradeoffs between task-specific objectives and inter-task relationships.
In this work, we propose a novel multi-task learning approach, Multi-gate Mixture-of-Experts (MMoE), which explicitly learns to model task relationships from data. We adapt the Mixture-of-Experts (MoE) structure to multi-task learning by sharing the expert submodels across all tasks, while also having a gating network trained to optimize each task. To validate our approach on data with different levels of task relatedness, we first apply it to a synthetic dataset where we control the task relatedness. We show that the proposed approach performs better than baseline methods when the tasks are less related. We also show that the MMoE structure results in an additional trainability benefit, depending on different levels of randomness in the training data and model initialization. Furthermore, we demonstrate the performance improvements by MMoE on real tasks including a binary classification benchmark, and a large-scale content recommendation system at Google.
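A minimal numpy sketch of the MMoE structure described above: a set of experts shared across all tasks, with a separate softmax gate per task that mixes the expert outputs before a task-specific tower. Layer sizes, names, and the single-linear-layer experts are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mmoe_forward(x, expert_ws, gate_ws, tower_ws):
    """Multi-gate Mixture-of-Experts sketch (one linear layer per component).
    x: (B, d_in); expert_ws: list of (d_in, d_e); gate_ws and tower_ws: one per task."""
    experts = np.stack([np.maximum(x @ w, 0.0) for w in expert_ws], axis=1)  # (B, E, d_e)
    outputs = []
    for gate_w, tower_w in zip(gate_ws, tower_ws):
        gates = softmax(x @ gate_w)                       # (B, E) mixture weights for this task
        mixed = np.einsum('be,bed->bd', gates, experts)   # task-specific combination of experts
        outputs.append(mixed @ tower_w)                   # task-specific tower / head
    return outputs

# Toy usage: 3 shared experts, 2 tasks.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))
expert_ws = [rng.normal(size=(10, 8)) for _ in range(3)]
gate_ws = [rng.normal(size=(10, 3)) for _ in range(2)]
tower_ws = [rng.normal(size=(8, 1)) for _ in range(2)]
y_task1, y_task2 = mmoe_forward(x, expert_ws, gate_ws, tower_ws)
```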