Near-optimal Offline and Streaming Algorithms for Learning Non-Linear Dynamical Systems
Abstract
We study the problem of learning vector-valued non-linear dynamical systems from a single trajectory of {\em dependent or correlated} points. We assume a known link function $\phi : \mathbb{R} \to \mathbb{R}$ that satisfies a certain {\em expansivity property}. While the problem is well-studied in the linear case, with strong learning guarantees even for non-mixing systems, results in the non-linear case hold only for mixing systems, and even then the error rates are significantly sub-optimal. In this work, we bridge this gap in a variety of settings: a) we provide the first optimal offline algorithm that can learn non-linear dynamical systems without a mixing assumption, b) in the much harder one-pass, streaming setting, we study an SGD with Reverse Experience Replay ($\sgdber$) method and demonstrate that, for mixing systems, it achieves nearly optimal performance even under heavy-tailed noise, c) we justify the expansivity assumption by showing that when the link function is ReLU --- a non-expansive link function that is easy to learn from i.i.d. samples --- any method would require exponentially many samples (with respect to the dimension $d$) from the dynamical system. We then compare various algorithms via simulations and demonstrate that a naive application of SGD can be highly sub-optimal. Indeed, our work demonstrates that efficiently learning with dependent data requires specialized algorithm design that exploits knowledge of the dependency structure.
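To make the streaming method referenced above concrete, the sketch below illustrates one plausible form of SGD with Reverse Experience Replay on a generalized-linear dynamical system. The model form $x_{t+1} = \phi(A^\star x_t) + \eta_t$, the buffer/gap sizes, and the GLM-tron-style quasi-gradient (which drops the $\phi'$ factor) are all assumptions made for illustration, not a statement of the paper's exact algorithm.

\begin{verbatim}
import numpy as np

def sgd_rer(X, phi, buffer_size=50, gap=10, lr=0.01):
    """Hypothetical one-pass SGD with Reverse Experience Replay.

    X   : (T+1, d) array, a single trajectory x_0, ..., x_T, assumed to
          follow x_{t+1} = phi(A_star @ x_t) + noise.
    phi : elementwise link function, e.g. np.tanh.
    Returns an estimate of the d x d transition matrix A_star.
    """
    T, d = X.shape[0] - 1, X.shape[1]
    A = np.zeros((d, d))
    t = 0
    while t + buffer_size + gap <= T:
        # Collect a buffer of consecutive transitions, then skip a small
        # gap so successive buffers are nearly independent under mixing.
        buf = [(X[s], X[s + 1]) for s in range(t, t + buffer_size)]
        # Replay the buffer in *reverse* order to decorrelate the
        # gradient noise accumulated within the buffer.
        for x_t, x_next in reversed(buf):
            pred = phi(A @ x_t)
            # GLM-tron-style quasi-gradient of 0.5*||phi(A x_t) - x_{t+1}||^2
            # (assumed here; omits the phi' factor).
            grad = np.outer(pred - x_next, x_t)
            A -= lr * grad
        t += buffer_size + gap
    return A
\end{verbatim}

As a usage sketch, one could generate a trajectory with a known $A^\star$ and \texttt{phi = np.tanh}, run \texttt{sgd\_rer(X, np.tanh)}, and compare the recovered matrix to $A^\star$ in operator norm.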