Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations

Xinyang Yi

Ji Yang

Lichan Hong

Derek Zhiyuan Cheng

Lukasz Heldt

Aditee Ajit Kumthekar

Zhe Zhao

Li Wei

Ed Chi

RecSys 2019


Abstract

Many recommendation systems need to retrieve and score items from a large corpus. A common approach to handling data sparsity and a power-law item distribution is to learn item representations from their content features. Apart from many content-aware systems based on matrix factorization, in this paper we consider a modeling framework with two-tower neural networks, where one network, called the item tower, encodes a wide variety of item features. Optimizing loss functions calculated from in-batch negatives, which are items sampled in a random batch, is a general recipe for training such two-tower models. However, the batch loss is subject to sampling bias, which can severely restrict model performance, particularly when items follow a power-law distribution. In this work, we present a novel algorithm for estimating item frequency from streaming data. Our main idea is to sketch and estimate item occurrences via gradient descent. Through theoretical analysis and simulations, we show that the proposed algorithm works without a fixed item vocabulary, produces unbiased estimates, and adapts to changes in the item distribution. We then apply the sampling-bias-corrected modeling approach to build a large-scale retrieval system called Neural Deep Retrieval (NDR) for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus of tens of millions of videos. We demonstrate the effectiveness of sampling bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the NDR system leads to improved recommendation quality for YouTube.
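The abstract does not spell out the estimator or the corrected loss, so the following is only an illustrative sketch under stated assumptions. It assumes that item frequency can be tracked by hashing each item into a bucket and keeping a moving average of the number of steps between consecutive occurrences (updated with a gradient-descent-style step), and that the in-batch softmax is corrected by subtracting the log of each item's estimated sampling probability from its logit. The names `StreamingFrequencyEstimator`, `corrected_in_batch_softmax_loss`, `num_buckets`, and `learning_rate` are hypothetical and chosen for this example.

```python
import math
import zlib


class StreamingFrequencyEstimator:
    """Hypothetical sketch: estimates per-item sampling probability from a stream.

    Assumption: each item is hashed into a bucket, and the bucket keeps a
    moving average of the gap (in steps) between consecutive occurrences.
    The estimated probability of seeing the item in a step is 1 / average gap.
    """

    def __init__(self, num_buckets=1024, learning_rate=0.05):
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        self.last_step = [0] * num_buckets   # step at which each bucket was last seen
        self.avg_gap = [0.0] * num_buckets   # moving average of steps between sightings

    def _bucket(self, item_id):
        # crc32 is deterministic across runs, unlike the built-in hash() for strings.
        return zlib.crc32(str(item_id).encode("utf-8")) % self.num_buckets

    def update(self, item_id, step):
        b = self._bucket(item_id)
        gap = step - self.last_step[b]
        # Gradient-descent-style step that moves the average toward the observed gap.
        self.avg_gap[b] += self.learning_rate * (gap - self.avg_gap[b])
        self.last_step[b] = step

    def probability(self, item_id):
        gap = self.avg_gap[self._bucket(item_id)]
        # Clamping the gap to at least 1 is an arbitrary choice for this sketch;
        # it keeps unseen or warming-up buckets from producing values above 1.
        return 1.0 / max(gap, 1.0)


def corrected_in_batch_softmax_loss(logits, sampling_probs):
    """Mean softmax cross-entropy over a batch with log-probability correction.

    logits[i][j] is the score of query i against item j of the same batch;
    the positive item for query i sits on the diagonal (j == i).
    sampling_probs[j] is the estimated probability of item j landing in a batch.
    """
    n = len(logits)
    total = 0.0
    for i in range(n):
        corrected = [logits[i][j] - math.log(sampling_probs[j]) for j in range(n)]
        peak = max(corrected)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(c - peak) for c in corrected))
        total += log_z - corrected[i]
    return total / n


# Usage: an item that shows up every 4th step should get an estimate close to 0.25.
estimator = StreamingFrequencyEstimator()
for step in range(1, 1001):
    if step % 4 == 0:
        estimator.update("video_a", step)
print(estimator.probability("video_a"))

# With popular items assigned a higher sampling probability, their logits are
# reduced, so they are penalized less for being over-represented as negatives.
batch_logits = [[2.0, 0.0], [0.0, 2.0]]
print(corrected_in_batch_softmax_loss(batch_logits, [0.9, 0.1]))
```

When every item in the batch has the same estimated probability, the correction shifts all logits by the same constant and the loss reduces to the ordinary in-batch softmax loss. The correction only matters when the item distribution is skewed, which is the power-law case the abstract highlights.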
