Chefs’ Random Tables: Non-Trigonometric Random Features

Valerii Likhosherstov
Frederick Liu
Adrian Weller
NeurIPS 2022 (2022) (to appear)

Abstract

We present \textit{chefs' random tables} (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels. CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on the trigonometric maps. We present variants of CRTs where RFs are positive -- a critical requirement for prominent applications in recent low-rank Transformers. Further variance reduction is possible by leveraging statistics which are simple to compute. One instantiation of CRTs, the FAVOR++ mechanism, is to our knowledge the first RF method for unbiased softmax kernel estimation with positive \& bounded RFs, resulting in strong concentration results characterized by exponentially small tails, and much lower variance. As we show, orthogonal random features applied in FAVOR++ provide additional variance reduction for any dimensionality $d$ (not only asymptotically for sufficiently large $d$, as for RKS). We exhaustively test CRTs and FAVOR++ on tasks ranging from non-parametric classification to training Transformers for text, speech and image data, obtaining new SOTA for low-rank text Transformers, while providing linear space \& time complexity.

Research Areas