A unifying view on implicit bias in training linear neural networks

Chulhee Yun

Hossein Mobahi

Shankar Krishnan

ICLR (2021)

Abstract

We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size) on linear neural network training. We consider separable classification and underdetermined linear regression problems where there exist many solutions that achieve zero training error, and characterize how the network architecture and initialization affect the final solution found by gradient flow. Our results apply to a general tensor formulation of neural networks that includes linear fully-connected networks, linear diagonal networks, and linear convolutional networks as special cases, while removing convergence assumptions required by prior research. We also provide experiments that corroborate our theoretical analysis.
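
As a minimal illustration of what "implicit bias" means in the underdetermined regression setting, the sketch below covers only the simplest case, a single-layer linear model, and is not the paper's tensor formulation or its experiments. It relies on the standard fact that gradient descent on the squared loss, started at zero, converges to the minimum l2-norm interpolating solution. The function name, problem sizes, step size, and random data are all assumptions chosen for illustration, and discrete gradient descent with a small step stands in for gradient flow.

```python
import numpy as np

def gradient_descent_least_squares(X, y, steps=20000):
    """Plain gradient descent on 0.5 * ||X w - y||^2, started at w = 0.

    Hypothetical helper for illustration; the step size 1 / ||X||_2^2
    is an assumed choice that keeps the iteration stable.
    """
    lr = 1.0 / np.linalg.norm(X, 2) ** 2  # spectral norm squared
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # gradient of the squared loss
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))  # 5 samples, 20 features: many interpolating solutions
y = rng.standard_normal(5)

w_gd = gradient_descent_least_squares(X, y)
w_min_norm = np.linalg.pinv(X) @ y  # minimum l2-norm solution of X w = y

train_residual = np.linalg.norm(X @ w_gd - y)
gap_to_min_norm = np.linalg.norm(w_gd - w_min_norm)
print(train_residual, gap_to_min_norm)
```

Both printed quantities should be close to zero: among all weight vectors that fit the data exactly, this particular optimizer and initialization select one specific solution. The abstract's question is how that selection changes with architecture and initialization in deeper linear networks.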
