Classification using Predictive State Smoothing (PRESS): A scalable kernel classifier for high-dimensional features with variable selection
Abstract
In this work we adapt the predictive state smoothing (PRESS) framework to
classification, which leads to a fully probabilistic, non-linear classifier
that estimates the minimal sufficient statistic for predicting class
membership probabilities. It can be used for problems that are high-dimensional
in both the number of observations and the number of covariates, and allows for variable selection
using LASSO or Ridge penalties. We also establish a connection between the
metric learning aspect of PRESS kernel smoothing and an equivalent
state-dependent neural network representation. Out-of-sample prediction
performance is comparable to existing state-of-the-art classifiers on several
benchmark datasets. Yet a trained PRESS classifier provides meaningful
domain-specific insights based on regression coefficients using standard
frequentist as well as Bayesian inference. Algorithms scale linearly in the
number of observations and can be easily implemented in R, STAN, or
TensorFlow.