L1 and L2 Regularization for Multiclass Hinge Loss Models

Robert C. Moore; John DeNero

Symposium on Machine Learning in Speech and Natural Language Processing (2011)

Abstract

This paper investigates the relationship between the loss function, the type of regularization, and the resulting model sparsity of discriminatively-trained multiclass linear models. The effects on sparsity of optimizing log loss are straightforward: L2 regularization produces very dense models while L1 regularization produces much sparser models. However, optimizing hinge loss yields more nuanced behavior. We give experimental evidence and theoretical arguments that, for a class of problems that arises frequently in natural-language processing, both L1- and L2-regularized hinge loss lead to sparser models than L2-regularized log loss, but less sparse models than L1-regularized log loss. Furthermore, we give evidence and arguments that for models with only indicator features, there is a critical threshold on the weight of the regularizer below which L1- and L2-regularized hinge loss tend to produce models of similar sparsity.
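The quantities the abstract compares can be made concrete with a small sketch. The code below is not the authors' implementation. It assumes a common margin-based formulation of multiclass hinge loss (highest-scoring wrong class plus a margin of 1, minus the gold score, clipped at zero) and the usual softmax log loss; every function name, the dict-based weight layout, and the `strength` parameter are hypothetical choices made for illustration. It shows one standard property relevant to sparsity: hinge loss is exactly zero once the margin is met, whereas log loss stays strictly positive.

```python
import math

def scores(weights, features):
    """Linear score per class; weights[c] maps feature name -> weight (assumed layout)."""
    return [sum(w.get(f, 0.0) * v for f, v in features.items()) for w in weights]

def multiclass_hinge_loss(weights, features, gold):
    """Assumed margin-based form; requires at least two classes."""
    s = scores(weights, features)
    worst = max(s[c] + 1.0 for c in range(len(s)) if c != gold)
    return max(0.0, worst - s[gold])

def multiclass_log_loss(weights, features, gold):
    """Negative log-probability of the gold class under a softmax."""
    s = scores(weights, features)
    m = max(s)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in s))
    return log_z - s[gold]

def l1_penalty(weights):
    """Sum of absolute weight values."""
    return sum(abs(v) for w in weights for v in w.values())

def l2_penalty(weights):
    """Sum of squared weight values."""
    return sum(v * v for w in weights for v in w.values())

def objective(weights, data, loss, penalty, strength):
    """Total loss over (features, gold) pairs plus strength * penalty."""
    return sum(loss(weights, x, y) for x, y in data) + strength * penalty(weights)

def nonzero_count(weights):
    """Model size measured as the number of nonzero weights."""
    return sum(1 for w in weights for v in w.values() if v != 0.0)

# Two classes, one indicator feature "a"; class 0 is the gold label.
weights = [{"a": 2.0}, {}]
example = ({"a": 1.0}, 0)

hinge = multiclass_hinge_loss(weights, *example)  # margin already met
log = multiclass_log_loss(weights, *example)      # still positive
```

In this toy setting the hinge loss on the example is zero, so under either penalty the loss term gives no incentive to grow the weights further, while the log loss term can always be reduced by increasing the gold-class score. This is only a general illustration of how the two losses differ, not a reproduction of the paper's experiments or arguments.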

Research Areas

Machine intelligence
