Oblivious Sketching of High-Degree Polynomial Kernels
Abstract
Kernel methods are fundamental tools in machine learning that allow detection of non-linear
dependencies between data without explicitly constructing feature vectors in high dimensional
spaces. A major disadvantage of kernel methods is their poor scalability: primitives such as
kernel PCA or kernel ridge regression generally take prohibitively large quadratic space and (at
least) quadratic time, as kernel matrices are usually dense. Some methods for speeding up kernel
linear algebra are known, but they invariably take time exponential in either the dimension
of the input point set (e.g., fast multipole methods suffer from the curse of dimensionality) or
in the degree of the kernel function.
Oblivious sketching has emerged as a powerful approach to speeding up numerical linear
algebra over the past decade, but our understanding of oblivious sketching solutions for kernel
matrices has remained quite limited, suffering from the aforementioned exponential dependence
on input parameters. Our main contribution is a general method for applying sketching solutions
developed in numerical linear algebra to a tensoring of data points without
forming the tensoring explicitly. This leads to the first oblivious sketch for the polynomial
kernel with a target dimension that is only polynomially dependent on the degree of the kernel
function, as well as the first oblivious sketch for the Gaussian kernel on bounded datasets that
does not suffer from an exponential dependence on the dimensionality of input data points.
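
To make "sketching a tensoring without forming it explicitly" concrete, the following is a minimal sketch for degree 2 only. It uses the classical TensorSketch idea (CountSketch combined via FFT, due to Pham and Pagh), which is not the construction of this paper and is shown purely as an illustration. The function names, the dimensions d and m, and the use of fully random hash tables (instead of limited-independence hash families) are assumptions made for this example.

```python
import numpy as np

def count_sketch(x, h, s, m):
    """CountSketch of x: coordinate i is added to bucket h[i] with sign s[i]."""
    out = np.zeros(m)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding buckets accumulate
    return out

def tensor_sketch_deg2(x, h1, s1, h2, s2, m):
    """Sketch of the tensor product of x with itself, without forming the d*d vector.

    Circular convolution of two CountSketches (computed with the FFT) equals a
    CountSketch of the outer product with hash (h1[i] + h2[j]) mod m and
    sign s1[i] * s2[j].
    """
    c1 = count_sketch(x, h1, s1, m)
    c2 = count_sketch(x, h2, s2, m)
    return np.fft.irfft(np.fft.rfft(c1) * np.fft.rfft(c2), n=m)

def explicit_sketch_deg2(x, h1, s1, h2, s2, m):
    """Reference: form the d*d outer product explicitly, then sketch it."""
    outer = np.outer(x, x)
    idx = (h1[:, None] + h2[None, :]) % m
    sign = s1[:, None] * s2[None, :]
    out = np.zeros(m)
    np.add.at(out, idx.ravel(), (sign * outer).ravel())
    return out

rng = np.random.default_rng(0)
d, m = 50, 256  # hypothetical input and sketch dimensions
h1, h2 = rng.integers(0, m, size=d), rng.integers(0, m, size=d)
s1, s2 = rng.choice([-1.0, 1.0], size=d), rng.choice([-1.0, 1.0], size=d)
x, y = rng.standard_normal(d), rng.standard_normal(d)

# The degree-2 polynomial kernel is an inner product of tensored points.
kernel_exact = np.dot(x, y) ** 2
kernel_tensored = np.dot(np.outer(x, x).ravel(), np.outer(y, y).ravel())

# Sketch-based estimate of the same kernel value (a random approximation).
sx = tensor_sketch_deg2(x, h1, s1, h2, s2, m)
sy = tensor_sketch_deg2(y, h1, s1, h2, s2, m)
kernel_estimate = np.dot(sx, sy)
```

Here `tensor_sketch_deg2` costs O(d + m log m) per point, whereas `explicit_sketch_deg2` touches all d^2 entries of the tensored vector; the two agree up to floating-point error. The quality of `kernel_estimate` depends on m, and the example makes no claim about the target dimension needed for a given accuracy.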