Time Series Prediction using Non-linear Auto Regressive Approach with Recursive Least-squares Support Vector Machines

In this project, I worked with data from the Santa Fe laser. The data corresponds to power fluctuations in a far-infrared laser, and the dataset was made available by the Santa Fe Institute for a prediction competition.

Challenge: Given a training section of the Time Series consisting of the fluctuations of the laser, construct and train a model for predicting new data. Thus, the only information available for training is the Time Series itself.

Solution: I used a Non-Linear Auto Regressive (NAR) approach, where "future" points are modeled as a function of "current" data, and where this relating function is modeled using Least-Squares Support Vector Machines (LSSVM). More details coming up!

Programming Language and Packages: I used MATLAB for this project. In particular, I used the Least-Squares Support Vector Machines toolbox (LSSVMlab).

Time series prediction using NAR with LSSVM

Auto Regressive (AR): A common approach to Time Series prediction is the Auto Regressive method, where the value at a point in time $X_t$ is described as $$\begin{align}X_t = c + \sum_{i=1}^p w_i X_{t-i} + \epsilon_t\tag{1}\label{ar} ,\end{align}$$where the $X_{t-i}$ correspond to the values of the Time Series at the $p$ times prior to the point of interest $X_t$, $c$ is a constant, $\epsilon_t$ is white noise, and the $w_i$ are the parameters of the model. This model is useful for estimating $\hat{X}_t$ once the $w_i$ parameters are trained correctly.
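Although this project was done in MATLAB with LSSVMlab, the AR fit of Equation (1) can be sketched in a few lines of illustrative Python (the toy sine series, noise level, and $p=3$ below are my own assumptions, not the project's data):

```python
import numpy as np

# Illustrative sketch (not the project's MATLAB code): fit a linear AR(p)
# model by ordinary least squares on a toy noisy sine series.
rng = np.random.default_rng(0)
p = 3
x = np.sin(0.3 * np.arange(200)) + 0.01 * rng.standard_normal(200)

# Regression matrix: each row holds the p past values, target is X_t.
rows = np.array([x[t - p:t] for t in range(p, len(x))])
A = np.column_stack([np.ones(len(rows)), rows])  # constant c plus weights w_i
y = x[p:]

coef, *_ = np.linalg.lstsq(A, y, rcond=None)
c, w = coef[0], coef[1:]

# One-step-ahead estimate of the next point from the last p observations.
x_hat = c + w @ x[-p:]
```

Solving the normal equations via `lstsq` is the standard least-squares route; in practice one would also hold out data to validate the chosen $p$.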

Non-Linear Auto Regressive (NAR): However, for this project I used a different and more powerful approach. The AR model assumes the relation between the current point in time and the previous values is linear, which in many cases is not true. Thus, for this project I used a Non-Linear variant. Equation$~\eqref{ar}$ can be altered non-linearly as $$\begin{align}X_t = f\left(c + \sum_{i=1}^p w_i X_{t-i}\right) = g\left(\mathbf{X}\right) \tag{2}\label{nar},\end{align}$$ where $f(\cdot)$ is some function that relates the inputs $X_{t-i}$ with $X_{t}$, and $\mathbf{X}$ is the vector form of the inputs $X_{t-i}$. This equation can again be used for estimating a value $\hat{X}_t$.

Modeling with LSSVM: The primal space formulation of LSSVM for non-linear function estimation is defined as $$\begin{align} h(\mathbf{X}) = \mathbf{w}^T \varphi (\mathbf{X}) + c, \tag{3}\label{lssvmprim}\end{align}$$ where $\mathbf{X} \in \mathbb{R}^p$, $\varphi (\cdot)$ is a function $\varphi:\mathbb{R}^p \rightarrow \mathbb{R}^{n_h} $ mapping the inputs to a high-dimensional feature space, and $h\left(\mathbf{X}\right)$ is the hypothesis that the LSSVM provides. Thus, a parallel can be established here with Equation$~\eqref{nar}$, and the objective is to make $h\left(\mathbf{X}\right)$ approximate $g\left(\mathbf{X}\right)$ - which defines our accuracy. Now, our job is to train the LSSVM to increase this accuracy and reduce the error. Note that I chose an RBF kernel (which defines the inner product $\varphi(\mathbf{X})^T\varphi(\mathbf{Y})$), due to its flexibility.
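In practice LSSVM is trained in the dual, where only the kernel appears and $\varphi$ is never computed explicitly. The project used LSSVMlab's trained models; the sketch below is a minimal, illustrative Python stand-in for that machinery, with assumed values for the regularization constant `gamma` and RBF bandwidth `sigma2`:

```python
import numpy as np

def rbf_kernel(A, B, sigma2):
    # K(x, y) = exp(-||x - y||^2 / (2 * sigma2)) = phi(x)^T phi(y)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma2))

def lssvm_fit(X, y, gamma=100.0, sigma2=1.0):
    # Dual linear system of LS-SVM regression:
    # [ 0    1^T         ] [ b     ]   [ 0 ]
    # [ 1    K + I/gamma ] [ alpha ] = [ y ]
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]          # bias b, dual weights alpha

def lssvm_predict(Xnew, X, b, alpha, sigma2=1.0):
    # h(x) = sum_i alpha_i * K(x, x_i) + b
    return rbf_kernel(Xnew, X, sigma2) @ alpha + b

# Toy check: estimate h(x) ~ sin(x) on a 1-D input grid.
X = np.linspace(0, 2 * np.pi, 40)[:, None]
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y)
y_hat = lssvm_predict(X, X, b, alpha)
```

The key design point is that training reduces to solving one $(n{+}1)\times(n{+}1)$ linear system, which is what makes the "least-squares" variant of SVM so convenient.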

Sliding Window Algorithm: To prepare the training of the LSSVM, I employed the sliding window algorithm to generate the training data set. Here I selected a window size $p$ and created a training set as $$\bigcup_{j=0}^{N-p-1} \left(\mathbf{X}_j,X_{p+j+1}\right)\tag{4}\label{slwind},$$ where $N$ is the length of the portion of the time series available for training, and $\mathbf{X}_j$ is the vector form of the inputs $X_{1+j}$ to $X_{p+j}$. Note that in case you want to create a training set, validation set and test set (which is strongly encouraged), $N$ should be around 50% of the total available time series.
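The construction of Equation (4) is simple to express in code. Here is an illustrative Python sketch (LSSVMlab provides an equivalent `windowize` utility in MATLAB):

```python
import numpy as np

def sliding_window(series, p):
    # Each input vector X_j holds p consecutive values; its target is the
    # value immediately after the window, as in Eq. (4).
    series = np.asarray(series)
    N = len(series)
    inputs = np.array([series[j:j + p] for j in range(N - p)])
    targets = series[p:]
    return inputs, targets

x = np.arange(10)              # toy "time series" 0..9
X, y = sliding_window(x, 3)
# X[0] = [0, 1, 2] predicts y[0] = 3, and so on.
```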

Training: For the training of the model (after normalizing the data and centering it at the origin), I used Coupled Simulated Annealing (CSA) for tuning the hyper-parameters of the LSSVM, and used Simulated Annealing for selecting a proper window size $p$ hyper-parameter.
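CSA couples several annealing chains and is beyond a short snippet (LSSVMlab's `tunelssvm` handles it in MATLAB), but the single-chain Simulated Annealing used for the window size can be sketched. In this illustrative Python stand-in, `val_error` is a hypothetical placeholder for "train the LSSVM with window size $p$ and return the validation error"; here it is a toy function with its minimum at $p=30$:

```python
import math
import random

def val_error(p):
    # Hypothetical stand-in for the validation MSE of an LSSVM
    # trained with window size p; toy minimum at p = 30.
    return (p - 30) ** 2 / 100.0

def anneal_window(p0=5, T0=10.0, cooling=0.95, steps=200, seed=1):
    rng = random.Random(seed)
    p, e = p0, val_error(p0)
    best_p, best_e = p, e
    T = T0
    for _ in range(steps):
        cand = max(1, p + rng.choice([-3, -2, -1, 1, 2, 3]))  # neighbor move
        ce = val_error(cand)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability exp(-delta/T).
        if ce < e or rng.random() < math.exp(-(ce - e) / T):
            p, e = cand, ce
            if e < best_e:
                best_p, best_e = p, e
        T *= cooling  # geometric cooling schedule
    return best_p

best = anneal_window()
```

The high initial temperature lets the search escape poor local regions, while the geometric cooling gradually freezes it near the best window size found.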


Next, I show the results obtained using a window size $p$ of 30 points.


Example of LSSVM for Time Series Prediction with the Santa Fe Laser Data.
The blue line is the ground truth, while the orange line constitutes the prediction.

Notice how the model starts deteriorating significantly after approximately the $40^{\text{th}}$ point. This is because after the $30^{\text{th}}$ point, the model is using entirely predicted values for predicting the $31^{\text{st}}$ point. Thus, it is using predicted values, which are bound to have errors, for predicting new values. This causes errors to "pile up", and thus the model deteriorates as time passes.

Therefore, depending on the required accuracy, one could keep only the first predicted values and discard the rest. In an "online" approach, after the first predictions are obtained and used, one could wait until new ground-truth values become available and use them for predicting new data again, in a simple quasi-Kalman-Filter fashion.
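The recursive (free-run) prediction loop that causes this error accumulation is easy to see in code. In this illustrative Python sketch, `model` is a hypothetical one-step predictor standing in for the trained LSSVM; to keep the example self-contained it uses the exact AR(2) recurrence of a sine wave, so here the feedback loop happens to stay accurate, whereas a learned model's small one-step errors would compound:

```python
import numpy as np

def model(window):
    # Hypothetical one-step predictor (stand-in for the trained LSSVM).
    # For x_t = sin(w*t), the identity x_t = 2*cos(w)*x_{t-1} - x_{t-2}
    # holds exactly.
    w = 0.3
    return 2 * np.cos(w) * window[-1] - window[-2]

def recursive_forecast(history, horizon):
    window = list(history)
    preds = []
    for _ in range(horizon):
        x_hat = model(window)
        preds.append(x_hat)
        window.append(x_hat)   # feed the prediction back as a new input
    return preds

t = np.arange(50)
x = np.sin(0.3 * t)
preds = recursive_forecast(x[:40], horizon=10)
```

In the online variant described above, one would simply rebuild `window` from the freshly observed ground-truth values before calling `recursive_forecast` again, instead of continuing to feed back predictions.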