Can gradient clipping mitigate label noise?

Aditya Krishna Menon; Ankit Singh Rawat; Sanjiv Kumar; Sashank Reddi


International Conference on Learning Representations (ICLR) (2020)


Abstract

Gradient clipping is a widely used technique in the training of deep networks, and is generally motivated from an optimisation lens: informally, it controls the dynamics of iterates, thus enhancing the rate of convergence to a local minimum. This intuition has been made precise in a line of recent works, which show that suitable clipping can yield significantly faster convergence than vanilla gradient descent. In this paper, we study gradient clipping from a robustness lens: informally, one expects clipping to provide robustness to noise, since one does not overly trust any single sample. Surprisingly, we prove that gradient clipping does not in general provide robustness to label noise. On the other hand, we show that robustness is achieved by a form of loss clipping. This yields a simple, noise-robust alternative to the standard cross-entropy loss which performs well empirically.
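The two ideas contrasted in the abstract can be illustrated with a minimal sketch, assuming per-example gradients are plain Python lists of floats and that the loss is written in terms of p, the model's predicted probability for the observed (possibly noisy) label. The names `clip_by_norm`, `cross_entropy` and `linearised_cross_entropy`, and the threshold parameter `tau`, are hypothetical. The linearised loss is one illustrative form of loss clipping (it replaces -log p by its tangent line below p = 1/tau) and is not claimed to be the exact loss proposed in the paper.

```python
import math


def clip_by_norm(grad, max_norm):
    """Gradient clipping: rescale grad so its Euclidean norm is at most max_norm.

    The direction of the gradient is unchanged; only its magnitude is capped.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0 or norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def cross_entropy(p):
    """Standard cross-entropy on the predicted probability p of the observed label.

    Unbounded as p -> 0, so a confidently contradicted (e.g. mislabelled)
    sample can contribute an arbitrarily large loss.
    """
    return -math.log(p)


def linearised_cross_entropy(p, tau):
    """Illustrative loss-clipped cross-entropy (hypothetical, assumes tau > 1).

    For p <= 1/tau, -log p is replaced by its tangent line at p = 1/tau:
        -log(1/tau) - tau * (p - 1/tau) = log(tau) - tau * p + 1
    The loss stays continuous, its derivative w.r.t. p never exceeds tau in
    magnitude, and its value is bounded above by log(tau) + 1.
    """
    if tau <= 1.0:
        raise ValueError("tau must be greater than 1")
    threshold = 1.0 / tau
    if p <= threshold:
        return math.log(tau) - tau * p + 1.0
    return -math.log(p)


if __name__ == "__main__":
    # A large per-example gradient is scaled down to unit norm.
    print(clip_by_norm([3.0, 4.0], 1.0))
    # A sample whose observed label gets probability 1e-6 (plausibly mislabelled):
    print(cross_entropy(1e-6))                   # large, unbounded as p -> 0
    print(linearised_cross_entropy(1e-6, 10.0))  # capped near log(10) + 1
```

The difference lies in where the bound is applied: `clip_by_norm` caps the size of an update after the gradient has been computed, whereas `linearised_cross_entropy` changes the objective itself, so that a sample with very small p contributes at most log(tau) + 1 to the loss instead of an unbounded -log p.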

Research Areas

Machine intelligence
