Optimal Sketching for Trace Estimation
Abstract
Matrix trace estimation is ubiquitous in machine learning applications and has traditionally relied on a simplistic Hutchinson's method, which requires $O(\log(1/\delta)/\epsilon^2)$ matrix-vector product queries to achieve an $\epsilon$ additive error with failure probability $\delta$. Recently, the Hutch++ algorithm was proposed, which reduces the number of matrix-vector queries from $O(1/\epsilon^2)$ to the optimal $O(1/\epsilon)$ on positive-semidefinite input matrices $A$, achieving a $(1\pm \epsilon)$-multiplicative approximation to the trace of $A$ with constant probability; however, in the high probability setting, the non-adaptive method suffers an extra $O(\sqrt{\log(1/\delta)})$ multiplicative factor in its query complexity. Non-adaptive methods are important, as they correspond to sketching algorithms, which are mergeable, highly parallelizable, and provide low-memory streaming algorithms as well as low-communication distributed protocols. In this work, we close the gap between non-adaptive and adaptive algorithms, showing that even non-adaptive algorithms can achieve $O(\sqrt{\log(1/\delta)}/\epsilon + \log(1/\delta))$ matrix-vector products. In addition, we prove matching lower bounds demonstrating that, up to a $\log \log(1/\delta)$ factor, no further improvement in the dependence on $\delta$ or $\epsilon$ is possible by any non-adaptive algorithm. Finally, our experiments demonstrate the superior performance of our sketch over adaptive methods, which are not parallelizable, as well as over the standard non-adaptive Hutchinson's method.