Learning and Recovery in the ReLU Model

Arya Mazumdar
Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, 2019
Abstract

Rectified linear units, or ReLUs, have become a preferred activation function for artificial neural networks. In this paper we consider two basic learning problems, assuming that the underlying data follow a generative model based on a simple network with ReLU activations. The first problem we study corresponds to learning a generative model in the presence of nonlinearity (modeled by the ReLU functions). Given a set of signal vectors $\mathbf{y}^i \in \mathbb{R}^d, i = 1, 2, \dots, n$, we aim to learn the network parameters, i.e., the $d\times k$ matrix $A$, under the model $\mathbf{y}^i = \mathrm{ReLU}(A\mathbf{c}^i +\mathbf{b})$, where $\mathbf{b}\in \mathbb{R}^d$ is a random bias vector. We show that it is possible to recover the column space of $A$ within an error of $O(d)$ (in Frobenius norm) under certain conditions on the distribution of $\mathbf{b}$. The second problem we consider is that of robust recovery of the signal in the presence of outliers. In this setting, we are interested in recovering the latent vector $\mathbf{c}$ from its noisy nonlinear images of the form $\mathbf{v} = \mathrm{ReLU}(A\mathbf{c}) + \mathbf{e}+\mathbf{w}$, where $\mathbf{e} \in \mathbb{R}^d$ denotes the outliers with sparsity $s$ and $\mathbf{w} \in \mathbb{R}^d$ denotes the dense but small noise. We show that the LASSO algorithm recovers $\mathbf{c} \in \mathbb{R}^k$ within an $\ell_2$-error of $O\big(\sqrt{((k+s)\log d)/d}\big)$ when $A$ is a random Gaussian matrix.
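The sketch below is a minimal numerical illustration of the robust-recovery setting, not the estimator analyzed in the paper: it generates data under $\mathbf{v} = \mathrm{ReLU}(A\mathbf{c}) + \mathbf{e} + \mathbf{w}$ with a random Gaussian $A$, and then fits a LASSO-style surrogate by alternating minimization. The dimensions, the margin threshold used to treat the ReLU as the identity on clearly positive measurements, the regularization level, and the solver are all assumptions made purely for illustration.

```python
# Illustrative sketch only: assumed dimensions, threshold, and regularization,
# and a generic LASSO-style surrogate rather than the paper's estimator.
import numpy as np

rng = np.random.default_rng(0)
d, k, s = 2000, 20, 40                        # ambient dim d, latent dim k, outlier sparsity s (assumed)

# Ground truth: random Gaussian A, latent vector c, s-sparse outliers e, small dense noise w.
A = rng.standard_normal((d, k))
c_true = rng.standard_normal(k)
e_true = np.zeros(d)
e_true[rng.choice(d, size=s, replace=False)] = 5.0 * rng.standard_normal(s)
w = 0.01 * rng.standard_normal(d)

v = np.maximum(A @ c_true, 0.0) + e_true + w  # v = ReLU(A c) + e + w

# Simplification for this sketch: keep only coordinates whose measurement clearly
# exceeds the noise level, so the ReLU can be treated as the identity there.
act = v > 0.5
A_act, v_act = A[act], v[act]

# LASSO-type objective: min_{c,e} ||v_act - A_act c - e||_2^2 + lam * ||e||_1,
# minimized by alternating a least-squares step in c with soft-thresholding in e.
lam = 1.0
e_hat = np.zeros(act.sum())
for _ in range(100):
    c_hat = np.linalg.lstsq(A_act, v_act - e_hat, rcond=None)[0]
    resid = v_act - A_act @ c_hat
    e_hat = np.sign(resid) * np.maximum(np.abs(resid) - lam / 2.0, 0.0)

print("relative l2 error in c:", np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true))
```

In this simplified linear surrogate, the sparse outliers are absorbed by the soft-thresholded variable $\mathbf{e}$, so the least-squares step for $\mathbf{c}$ is driven mainly by the uncorrupted coordinates; the abstract's guarantee concerns the actual LASSO analysis in the paper, not this toy recipe.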