We introduce predictive state smoothing (PRESS), a novel semi-parametric regression technique for high-dimensional data using predictive state representations. PRESS is a fully probabilistic model for the optimal kernel smoothing matrix. We present efficient algorithms that jointly estimate the state space and the non-linear mapping of observations to predictive states, as well as alternative algorithms that minimize leave-one-out cross-validation error. The proposed estimator is straightforward to implement using (stochastic) gradient descent and scales well for large N and large p. LASSO penalty parameters as well as the optimal smoothness can be estimated as part of the optimization. Finally, we show that out-of-sample predictions are on par with or better than alternative state-of-the-art regression methods on the abalone and MNIST benchmark datasets. Yet, unlike alternative methods, PRESS gives meaningful domain-specific insights and can be used for statistical inference via regression coefficients.