Sample Efficient Linear Meta-Learning
Abstract
Meta-learning algorithms synthesizes and leverages the knowledge from a given set of tasks to rapidly learn new tasks using very little data. While methods like ANIL \cite{raghu2019rapid} have been demonstrated to be effective in practical meta-learning problems, their statistical and computational properties are ill-understood. Recent theoretical studies of meta-learning problem in a simple linear/non-linear regression setting still do not explain practical success of the meta-learning approach. For example, existing results either guarantee highly suboptimal estimation errors \cite{tripuraneni2020provable} or require relatively large number of samples per task \cite{}--$\Omega(d)$ samples where $d$ is the data dimensionality--which runs counter to practical settings. Additionally, the prescribed algorithms are inefficient and typically are not used in practice. %to achieve these sample complexity are high inefficient.
Similar to the existing studies, we consider the meta-learning problem in linear regression setting, where the regressors lie in a low-dimensional subspace \cite{tripuraneni2020provable}. We analyze two methods -- alternating minimization (MLLAM) and alternating gradient-descent minimization (MLLAM-GD) -- inspired by the popular ANIL~\cite{raghu2019rapid} method. For a constant subspace dimension both these methods obtain nearly-optimal estimation error, despite requiring only $\Omega(\mathrm{polylog}\,d)$ samples per task, which is similar to practical settings where each task has a small number of samples. But our analyses for the methods require the samples per task to grow logarithmically with number of tasks. We remedy this in the low-noise regime by augmenting the algorithms with a novel task subset selection scheme, which guarantees nearly optimal error rates even if the number of samples per task is constant with respect to (wrt) the number of tasks.