Sampling-Bias-Corrected Neural Modeling for Large Corpus Item Recommendations

Xinyang Yi; Ji Yang; Lichan Hong; Derek Zhiyuan Cheng; Lukasz Heldt; Aditee Ajit Kumthekar; Zhe Zhao; Li Wei; Ed Chi

RecSys 2019

Abstract

Many recommendation systems need to retrieve and score items from a large corpus. A common approach to handling data sparsity and power-law item distributions is to learn item representations from their content features. Apart from the many content-aware systems based on matrix factorization, in this paper we consider a modeling framework with two-tower neural networks, where one network, called the item tower, encodes a wide variety of item features. A general recipe for training such two-tower models is to optimize loss functions calculated from in-batch negatives, which are items sampled from a random batch. However, batch loss is subject to sampling bias, which can severely restrict model performance, particularly when items follow a power-law distribution. In this work, we present a novel algorithm for estimating item frequency from streaming data. Our main idea is to sketch and estimate item occurrences via gradient descent. Through theoretical analysis and simulations, we show that the proposed algorithm works without a fixed item vocabulary, produces unbiased estimates, and adapts to changes in the item distribution. We then apply the sampling-bias-corrected modeling approach to build a large-scale retrieval system called Neural Deep Retrieval (NDR) for YouTube recommendations. The system is deployed to retrieve personalized suggestions from a corpus of tens of millions of videos. We demonstrate the effectiveness of sampling bias correction through offline experiments on two real-world datasets. We also conduct live A/B tests to show that the NDR system leads to improved recommendation quality for YouTube.
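
The two ideas in the abstract, estimating item frequency from a stream and correcting in-batch logits for sampling bias, can be illustrated with a small sketch. This is not the authors' implementation: the names `StreamingFrequencyEstimator` and `corrected_in_batch_loss`, the hash-bucket layout, the learning rate, and the specific form of the correction (subtracting the log of each item's estimated sampling probability from its logit) are assumptions made for illustration. The sketch also assumes integer item IDs and that every item in a batch has been seen by the estimator at least once.

```python
import math


class StreamingFrequencyEstimator:
    """Tracks, per hash bucket, a moving average of the gap (in steps)
    between consecutive occurrences of an item. No vocabulary is needed."""

    def __init__(self, num_buckets=1024, learning_rate=0.05):
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        self.last_step = [0] * num_buckets   # step at which each bucket was last hit
        self.avg_gap = [0.0] * num_buckets   # estimated steps between hits

    def _bucket(self, item_id):
        # Assumed hashing scheme; colliding items share one estimate.
        return hash(item_id) % self.num_buckets

    def update(self, item_id, step):
        b = self._bucket(item_id)
        gap = step - self.last_step[b]
        # One gradient step on 0.5 * (avg_gap - gap)^2 with respect to avg_gap.
        self.avg_gap[b] += self.learning_rate * (gap - self.avg_gap[b])
        self.last_step[b] = step

    def probability(self, item_id):
        # An item seen every `gap` steps has sampling probability about 1 / gap.
        gap = self.avg_gap[self._bucket(item_id)]
        return 1.0 / gap if gap > 0 else 0.0


def corrected_in_batch_loss(logits, sampling_probs):
    """Mean softmax cross-entropy over a batch where logits[i][j] scores
    query i against the item of example j, and the positive is j == i.
    Each logit is shifted by -log(p_j) before the softmax (assumed form)."""
    total = 0.0
    for i, row in enumerate(logits):
        corrected = [s - math.log(p) for s, p in zip(row, sampling_probs)]
        peak = max(corrected)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(c - peak) for c in corrected))
        total += log_z - corrected[i]
    return total / len(logits)


# Toy stream: item 0 occurs every 2 steps, items 1 and 2 every 4 steps.
estimator = StreamingFrequencyEstimator()
stream = [0, 1, 0, 2] * 500
for step, item in enumerate(stream, start=1):
    estimator.update(item, step)

batch_items = [0, 1]
probs = [estimator.probability(item) for item in batch_items]
loss = corrected_in_batch_loss([[0.0, 0.0], [0.0, 0.0]], probs)
```

In this toy stream the estimates settle near 0.5 for item 0 and 0.25 for item 1. With all raw scores equal, the corrected loss is larger for the example whose positive is the frequent item, since frequent items have their logits reduced the most.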

Research Areas

Information retrieval
