Tight Hardness Results for Training Depth-2 ReLU Networks

Surbhi Goel
Adam Klivans
Daniel Reichman
Innovations in Theoretical Computer Science (ITCS)(2021), 22:1-22:14
Google Scholar


We prove several hardness results for training neural networks with a single hidden layer and the ReLU activation function. Our goal is to output a one-layer neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of k ReLUs minimizing the squared error (for k > 1) even in the realizable setting (i.e., when the labels are consistent with an unknown one layer ReLU network). We are also able to obtain lower bounds on the running time in terms of 1/epsilon where epsilon is the desired additive error. To obtain our lower bounds, we use the Gap-ETH hypothesis as well as a new hypothesis regarding the hardness of approximating the well known Densest k-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best fitting ReLU must run in time exponential in 1/epsilon^2. This implies the first separation between proper and improper algorithms for learning a ReLU due to prior work by Geol et al. We also study the problem of properly learning a depth-2 network with ReLUs with bounded weights giving new (worst-case) upper bounds on the running time needed to learn such networks, that essentially matches our lower bounds in terms of the dependency on epsilon.