Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces

Chuan Guo; Ali Mousavi; Xiang Wu; Dan Holtmann-Rice; Satyen Kale; Sashank Reddi; Sanjiv Kumar

NeurIPS (2019) (to appear)

Abstract

In extreme classification settings, embedding-based neural network models are currently not competitive with sparse linear and tree-based methods in terms of accuracy. Most prior works attribute this poor performance to the low-dimensional bottleneck in embedding-based methods. In this paper, we demonstrate that theoretically there is no limitation to using low-dimensional embedding-based methods, and provide experimental evidence that overfitting is the root cause of the poor performance of embedding-based methods. These findings motivate us to investigate novel data augmentation and regularization techniques to mitigate overfitting. To this end, we propose GLaS, a new regularizer for embedding-based neural network approaches. It is a natural generalization of the graph Laplacian and spread-out regularizers, and empirically it addresses the drawbacks of each regularizer used alone when applied to the extreme classification setup. With the proposed techniques, we attain or improve upon the state of the art on most widely tested public extreme classification datasets with hundreds of thousands of labels.
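
The abstract names the two regularizers that GLaS generalizes but does not define GLaS itself. The sketch below is only an illustration of those two ingredients in their common textbook forms, applied to a label-embedding matrix. The function names, the matrices `V`, `A`, and `S`, and in particular `combined_penalty` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def graph_laplacian_penalty(V, A):
    """Sum over ordered label pairs of A[i, j] * ||V[i] - V[j]||^2.

    V: (num_labels, dim) label embeddings.
    A: symmetric (num_labels, num_labels) affinity matrix, e.g. label co-occurrence.
    Pulls embeddings of related labels together.
    """
    degree = np.diag(A.sum(axis=1))
    laplacian = degree - A
    # Identity for symmetric A: sum_ij A_ij ||v_i - v_j||^2 = 2 tr(V^T L V)
    return 2.0 * np.trace(V.T @ laplacian @ V)

def spread_out_penalty(V):
    """Sum of squared inner products between distinct label embeddings.

    Pushes different labels toward orthogonal directions.
    """
    gram = V @ V.T
    off_diagonal = gram - np.diag(np.diag(gram))
    return np.sum(off_diagonal ** 2)

def combined_penalty(V, S):
    """Hypothetical combined form (an assumption, not the paper's definition):
    match the Gram matrix V V^T to a target similarity matrix S."""
    num_labels = V.shape[0]
    return np.sum((V @ V.T - S) ** 2) / num_labels ** 2
```

In this illustrative combined form, choosing `S` as the identity penalizes only non-orthogonal or non-unit-norm embeddings, much like a spread-out term, while placing positive entries of `S` at related label pairs rewards similar embeddings for those pairs, much like a graph Laplacian term. Whether GLaS takes this form should be checked against the paper.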

Research Areas

Machine intelligence
