Theoretical Foundations for Learning Kernels in Supervised Kernel PCA

Modern Nonparametrics 3: Automating the Learning Pipeline, Neural Information Processing Systems, Workshop(2014)


This paper presents a novel learning scenario which combines dimensionality reduction, supervised learning as well as kernel selection. We carefully define the hypothesis class that addresses this setting and provide an analysis of its Rademacher complexity and thereby provide generalization guarantees. The proposed algorithm uses KPCA to reduce the dimensionality of the feature space, i.e. by projecting data onto top eigenvectors of covariance operator in a kernel reproducing space. Moreover, it simultaneously learns a linear combination of base kernel functions, which defines a reproducing space, as well as the parameters of a supervised learning algorithm in order to minimize a regularized empirical loss. The bound on Rademacher complexity of our hypothesis is shown to be logarithmic in the number of base kernels, which encourages practitioners to combine as many base kernels as possible.