Switching Probabilistic Slow Feature Analysis for Time Series Data

Abstract—Slow feature analysis (SFA) is a machine learning method for extracting slowly time-varying features from multi-dimensional time series data. Recently, probabilistic SFA (PSFA), which extends SFA to a probabilistic framework, has been proposed. PSFA can be applied to stationary time series data with noise and missing values. In order to deal with nonstationary time series data including change points, we propose a switching probabilistic slow feature analysis (switching PSFA) in this paper. By introducing a switching state space model, it is possible to extract slowly varying information even when the system parameters change with time. Using the proposed method, we show that slowly time-varying components can be extracted more accurately from nonstationary time series data.


I. INTRODUCTION
With the development of information and observation technologies, large amounts of multi-dimensional time-series data, such as image data and sensor data, are now routinely collected. In recent years, it has become increasingly important to exploit such large amounts of data in order to discover useful knowledge [1]-[3].
Various machine learning studies have been conducted on feature extraction methods for high-dimensional time-series data, and slow feature analysis (SFA), originally proposed from the viewpoint of the human recognition system [4], has recently attracted attention as one of them. When a human acquires visual information such as the position and shape of an object, that information is acquired through a large number of retinal cells. Individual retinal signals are sensitive to slight environmental changes, whereas higher-level percepts such as the position and shape of objects change very slowly in comparison. It is known in neuroscience that such slowly changing components of visual information play an important role in the recognition of objects and space [4]-[6]. SFA is an unsupervised feature extraction method developed based on this knowledge of visual processing, and is an algorithm for extracting slowly time-varying features from multi-dimensional input data. Although SFA was proposed in the field of neuroscience, it has also been applied in machine learning, for example to pattern recognition and information extraction [7]-[15]. Moreover, probabilistic SFA (PSFA), which extends SFA to a probabilistic framework, has been proposed [6]. While deterministic SFA is difficult to apply to data that include noise, PSFA has the advantage that observation noise can be taken into account. The effects of observation noise on PSFA have also been discussed [16], [17].
Conventionally, PSFA has been applied to multi-dimensional time-series data that do not include change points. However, in time series data, the behavior of the observed values may change suddenly at a certain time, and it is important to detect the changes behind the time series [18]. For example, in sensor data related to body movements, it is required to detect individual actions from time-series data in which multiple actions occur in succession [19].
In this study, we propose a switching PSFA in order to extract hidden slow features from non-stationary multi-dimensional time series data. We formulate the switching PSFA using a switching state space model, which finds change points in non-stationary multi-dimensional time series data so that PSFA can be performed within each regime. The rest of this paper is organized as follows. In Section II, we describe conventional SFA. In Section III, we propose switching probabilistic slow feature analysis: we formulate the probabilistic model of the switching PSFA and derive a variational learning framework. In Section IV, the proposed method is evaluated using simulated data. Concluding remarks are given in Section V.

II. EXISTING METHODS
In this section, we first explain conventional frameworks of slow feature analysis (SFA): deterministic SFA and probabilistic SFA. Next, we describe the parameter estimation method for probabilistic SFA based on the EM algorithm.

A. Deterministic SFA
Deterministic SFA is an unsupervised algorithm for extracting slowly time-varying features from multi-dimensional time series data [4], [20]. A schematic of SFA is shown in Fig. 1. Let $x(t)$ be the given multi-dimensional input time series. The output $y(t)$ extracted by SFA is given by $y(t) = g(x(t))$ using the transformation $g(\cdot)$. The transformation minimizes the squared time derivative of each output component,

$\Delta(y_j) := \langle \dot{y}_j^2 \rangle = \langle (\tfrac{d}{dt} g_j(x(t)))^2 \rangle, \quad (1)$

where $\langle \cdot \rangle$ represents the time average. An element $y_j(t) = g_j(x(t))$ of the output that minimizes $\Delta(y_j)$ is called a slow feature. Besides, the following constraint conditions are added:

$\langle y_j \rangle = 0$ (zero mean), $\quad (2)$
$\langle y_j^2 \rangle = 1$ (unit variance), $\quad (3)$
$\langle y_i y_j \rangle = 0$ for $i < j$ (decorrelation). $\quad (4)$

Here, (2) and (3) normalize all outputs and exclude trivial solutions such as a constant zero output, and (4) makes the outputs mutually uncorrelated so that each output extracts different information. In the following, we assume that the transformation $g(\cdot)$ is a linear transformation and is expressed as $y(t) = W x(t)$ using the matrix $W$.
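As an illustration, the linear case can be sketched in a few lines of NumPy: whiten the input so that the zero-mean, unit-variance, and decorrelation constraints hold, then take the directions along which the first differences have minimal variance. This is a minimal sketch on assumed toy signals, not the exact algorithm of [4]:

```python
import numpy as np

def linear_sfa(x, n_features):
    """Minimal linear SFA sketch: whiten the input, then find the
    directions in which the temporal difference signal has minimum
    variance (the slowest features). x has shape (T, D)."""
    x = x - x.mean(axis=0)                   # enforce zero mean
    # Whitening makes every projected output have unit variance
    cov = np.cov(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    whitener = eigvec / np.sqrt(eigval)      # D x D whitening matrix
    z = x @ whitener
    # Approximate the time derivative by first differences
    zdot = np.diff(z, axis=0)
    # Slowest directions = eigenvectors of <zdot zdot^T> with the
    # smallest eigenvalues; eigh returns eigenvalues in ascending order
    dcov = np.cov(zdot, rowvar=False)
    _, dvec = np.linalg.eigh(dcov)
    w = whitener @ dvec[:, :n_features]      # overall linear map W
    return x @ w, w

# Toy usage: a slow sine mixed with a much faster oscillation
t = np.linspace(0, 2 * np.pi, 500)
slow = np.sin(t)
fast = np.sin(29 * t)
x = np.stack([slow + 0.1 * fast, fast + 0.1 * slow], axis=1)
y, w = linear_sfa(x, 1)
```

Up to sign, the extracted component closely follows the slow sine, since the fast component dominates the derivative variance.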

B. Probabilistic SFA
In recent studies, an SFA with a probabilistic framework was proposed by adding constraints to the system model of a linear Gaussian state space model [6]. We assume that $y_t$ is the $q$-dimensional latent variable including the slow features and $x_t$ is the $D$-dimensional observed variable of the state space model. Fig. 2 shows the graphical model of probabilistic SFA. The state space model of PSFA is expressed by the following system model and observation model, respectively:

$y_t = \Lambda y_{t-1} + v_t, \quad v_t \sim \mathcal{N}(0, \Sigma), \quad (5)$
$x_t = W^{-1} y_t + w_t, \quad w_t \sim \mathcal{N}(0, \sigma_x^2 I). \quad (6)$

In the system model (5), the latent variable $y_t$ depends on $y_{t-1}$, and $\Lambda$ is a parameter matrix that determines the degree of dependence; $\Lambda$ is a $q \times q$ diagonal matrix whose $j$-th diagonal element is $\lambda_j$. The covariance matrix $\Sigma$ of the system noise is a $q \times q$ diagonal matrix with diagonal elements $\sigma_j^2$. In the observation model (6), $W^{-1}$ is a $D \times q$ matrix and $I$ is the $D \times D$ identity matrix.
Here, the following restriction is imposed in order to incorporate the properties of SFA:

$\sigma_j^2 = 1 - \lambda_j^2, \quad 0 \le \lambda_j < 1. \quad (7)$

If $\lambda_j$ is large, the corresponding latent variable $y_{j,t}$ depends strongly on its value at the previous time and the noise variance $\sigma_j^2$ becomes small, so $y_{j,t}$ changes slowly. On the other hand, if $\lambda_j$ is small, $y_{j,t}$ depends only weakly on the previous value and the noise variance $\sigma_j^2$ becomes large, so $y_{j,t}$ changes quickly.
From (5) and (6), the state space model can be expressed using probability density functions as follows:

$p(y_t \mid y_{t-1}) = \mathcal{N}(\Lambda y_{t-1}, \Sigma), \quad (8)$
$p(x_t \mid y_t) = \mathcal{N}(W^{-1} y_t, \sigma_x^2 I). \quad (9)$

The likelihood function for estimating the parameters $\Theta = \{\Lambda, W^{-1}, \Sigma, \sigma_x^2\}$ of PSFA is expressed as follows:

$L(\Theta) = p(x_{1:T} \mid \Theta) = \int p(x_{1:T}, y_{1:T} \mid \Theta)\, dy_{1:T}. \quad (10)$
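To make the generative model concrete, the following sketch simulates data from the system model (5) and the observation model (6), with each system noise variance tied to the corresponding diagonal element of $\Lambda$ through $\sigma_j^2 = 1 - \lambda_j^2$. All sizes and parameter values here are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: q = 2 latent dims, D = 4 observed dims, T = 300 steps
q, D, T = 2, 4, 300
lam = np.array([0.99, 0.6])        # diagonal of the system matrix Lambda
sigma2 = 1.0 - lam**2              # SFA restriction: sigma_j^2 = 1 - lambda_j^2
W_inv = rng.normal(size=(D, q))    # observation matrix W^{-1}
sigma_x = 0.1                      # observation noise standard deviation

y = np.zeros((T, q))
x = np.zeros((T, D))
for t in range(1, T):
    # system model (5): y_t = Lambda y_{t-1} + v_t, v_t ~ N(0, diag(sigma_j^2))
    y[t] = lam * y[t - 1] + rng.normal(scale=np.sqrt(sigma2))
for t in range(T):
    # observation model (6): x_t = W^{-1} y_t + w_t, w_t ~ N(0, sigma_x^2 I)
    x[t] = W_inv @ y[t] + rng.normal(scale=sigma_x, size=D)
```

Under the restriction above, each latent component has unit stationary variance, and the component with the larger $\lambda_j$ drifts slowly while the other fluctuates quickly.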

C. Parameter Estimation Using EM Algorithm
In the PSFA proposed by Turner and Sahani [6], the parameters are estimated under the assumption that the observation noise is zero. However, recent research has shown that observation noise affects the accuracy of slow feature estimation. In this section, we describe the PSFA proposed by Takeuchi and Omori [21], which introduces an EM algorithm so that all parameters, including the observation noise, can be estimated. The EM algorithm is a maximum likelihood estimation algorithm devised by Dempster et al. [22]. When direct maximum likelihood estimation is difficult, the maximum likelihood estimate is computed for incomplete data using the likelihood that would be obtained if complete data were available [23]. The EM algorithm guarantees convergence to a local optimum by alternately repeating the E (Expectation) step and the M (Maximization) step. In the E step, the expected value of the log likelihood of the complete data $p(x_{1:T}, y_{1:T} \mid \Theta)$ is calculated with respect to the distribution of the latent variables currently estimated using the Kalman smoother:

$Q(\Theta \mid \Theta^{(k)}) = \langle \log p(x_{1:T}, y_{1:T} \mid \Theta) \rangle_{p(y_{1:T} \mid x_{1:T}, \Theta^{(k)})}. \quad (11)$

Here, $Q$ is the expected complete-data log likelihood of the PSFA model, $k$ is the iteration number of the EM algorithm, and the complete-data log likelihood is represented as

$\log p(x_{1:T}, y_{1:T} \mid \Theta) = \log p(y_1) + \sum_{t=2}^{T} \log p(y_t \mid y_{t-1}) + \sum_{t=1}^{T} \log p(x_t \mid y_t). \quad (12)$

In the M step, the expected log likelihood obtained in the E step is maximized with respect to the parameters $\Theta$:

$\Theta^{(k+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(k)}). \quad (13)$
The EM algorithm repeats the E and M steps up to a predetermined number of iterations or until the log likelihood converges. However, since the parameters estimated by the EM algorithm correspond to a local optimum, they may differ from the true values depending on the initial parameters $\Theta^{(1)}$.
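The E step above requires the posterior of $y_{1:T}$ given $x_{1:T}$, which a Kalman filter followed by a Rauch-Tung-Striebel smoother provides for the linear Gaussian model (5)-(6). The following is a minimal sketch assuming known parameters and a stationary zero-mean, unit-variance prior; it is not the authors' full EM implementation:

```python
import numpy as np

def kalman_smoother(x, lam, W_inv, sigma_x2):
    """E-step sketch for the PSFA model (5)-(6): Kalman filter followed
    by a Rauch-Tung-Striebel smoother. Returns smoothed means and
    covariances of y_t; lam is the diagonal of Lambda, and the system
    noise variances are 1 - lam^2 as in the SFA restriction."""
    T, D = x.shape
    q = lam.size
    A = np.diag(lam)               # system matrix Lambda
    Q = np.diag(1.0 - lam**2)      # system noise covariance
    R = sigma_x2 * np.eye(D)       # observation noise covariance
    C = W_inv
    # --- forward (filtering) pass ---
    mu_f = np.zeros((T, q)); P_f = np.zeros((T, q, q))
    mu_p = np.zeros((T, q)); P_p = np.zeros((T, q, q))
    mu, P = np.zeros(q), np.eye(q)  # stationary prior: zero mean, unit variance
    for t in range(T):
        mu_p[t], P_p[t] = mu, P     # store one-step-ahead prediction
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)          # Kalman gain
        mu_f[t] = mu + K @ (x[t] - C @ mu)
        P_f[t] = P - K @ C @ P
        mu = A @ mu_f[t]                        # predict the next state
        P = A @ P_f[t] @ A.T + Q
    # --- backward (RTS smoothing) pass ---
    mu_s = mu_f.copy(); P_s = P_f.copy()
    for t in range(T - 2, -1, -1):
        J = P_f[t] @ A.T @ np.linalg.inv(P_p[t + 1])
        mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - A @ mu_f[t])
        P_s[t] = P_f[t] + J @ (P_s[t + 1] - P_p[t + 1]) @ J.T
    return mu_s, P_s

# Usage on synthetic data drawn from the same model (toy parameters)
rng = np.random.default_rng(1)
lam = np.array([0.95]); W_inv = np.array([[1.0], [0.5]])
y = np.zeros((200, 1))
for t in range(1, 200):
    y[t] = lam * y[t - 1] + rng.normal(scale=np.sqrt(1 - lam**2))
x = y @ W_inv.T + rng.normal(scale=0.1, size=(200, 2))
mu_s, P_s = kalman_smoother(x, lam, W_inv, 0.01)
```

With the true parameters, the smoothed mean tracks the latent trajectory closely; inside EM these smoothed moments feed the M-step updates.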

III. PROPOSED METHOD: SWITCHING PROBABILISTIC SLOW FEATURE ANALYSIS
In this section, we formulate the switching PSFA by applying the concept of the switching state space model to PSFA. In order to extract slowly time-varying features even from time series data including change points, we estimate the latent variables and parameters using the variational Bayesian method, and derive the corresponding variational learning framework for the switching PSFA.

A. Switching Framework for Probabilistic Slow Feature Analysis
The conventional PSFA assumes a single system model, since it does not consider non-stationary systems. In the proposed method, by introducing a switching state space model [24]-[27] into PSFA, time series data including change points can be analyzed. As shown in Fig. 3, the switching state space model (SSSM) prepares $M$ state variables $y_t^{(1)}, \ldots, y_t^{(M)}$, each evolving according to its own system model,

$y_t^{(m)} = \Lambda^{(m)} y_{t-1}^{(m)} + v_t^{(m)}, \quad v_t^{(m)} \sim \mathcal{N}(0, \Sigma^{(m)}), \quad m = 1, \ldots, M, \quad (14)$

together with a switch variable $s_t$ that selects the model generating the observation,

$x_t = W^{(m)-1} y_t^{(m)} + w_t \ \text{ if } s_t^{(m)} = 1, \quad w_t \sim \mathcal{N}(0, \Sigma_x^{(m)}). \quad (15)$

In (15), if the observation is output from the $m$-th state space model, $s_t$ is a one-hot vector with unity in the $m$-th element. The model for the switch variables is as follows:

$\Phi_{m'm} = p(s_t^{(m)} = 1 \mid s_{t-1}^{(m')} = 1). \quad (16)$

This is the probability of transition from state $s_{t-1}^{(m')} = 1$ at time $t-1$ to state $s_t^{(m)} = 1$ at time $t$, and is the $(m', m)$ element of the $M \times M$ transition matrix $\Phi$. The joint probability of the proposed switching PSFA is given by

$p(x_{1:T}, y_{1:T}^{(1)}, \cdots, y_{1:T}^{(M)}, s_{1:T}) = p(s_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1}) \prod_{m=1}^{M} \Big[ p(y_1^{(m)}) \prod_{t=2}^{T} p(y_t^{(m)} \mid y_{t-1}^{(m)}) \Big] \prod_{t=1}^{T} p(x_t \mid y_t^{(1)}, \ldots, y_t^{(M)}, s_t), \quad (17)$

where $p(y_t^{(m)} \mid y_{t-1}^{(m)})$ is derived from (14). The log-likelihood of an observation generated by the $m$-th model is expressed by

$\log p(x_t \mid y_t^{(m)}, s_t^{(m)} = 1) = -\frac{D}{2} \log 2\pi - \frac{1}{2} \log |\Sigma_x^{(m)}| - \frac{1}{2} (x_t - W^{(m)-1} y_t^{(m)})^{\top} (\Sigma_x^{(m)})^{-1} (x_t - W^{(m)-1} y_t^{(m)}). \quad (18)$

Here, $D$ represents the dimensionality of the observation and $\Sigma_x^{(m)}$ is the covariance matrix of the observation noise. As shown in (18), when the switch variable has 1 in the $m$-th element ($s_t^{(m)} = 1$), the likelihood is expressed by the observation model of the $m$-th state space model.
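A generative sketch of this switching model clarifies how the pieces fit together: a Markov chain drives the one-hot switch variable, every regime's latent chain evolves in parallel, and the active regime produces the observation. Sizes, parameters, and the number of regimes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy configuration: M = 2 regimes, scalar latent, 2-D observations
M, T, D = 2, 400, 2
lam = np.array([0.99, 0.5])                         # Lambda per regime
W_inv = np.array([[[1.0], [0.5]], [[0.2], [1.0]]])  # per-regime W^{-1}, (M, D, 1)
Phi = np.array([[0.99, 0.01], [0.01, 0.99]])        # switch transition matrix
sigma_x = 0.1                                       # observation noise std

s = np.zeros(T, dtype=int)   # index of the active regime (one-hot in the model)
y = np.zeros((T, M))         # one latent chain per regime, as in the SSSM
x = np.zeros((T, D))
for t in range(1, T):
    s[t] = rng.choice(M, p=Phi[s[t - 1]])  # Markov switch dynamics
    for m in range(M):                     # every chain evolves by its own model
        y[t, m] = lam[m] * y[t - 1, m] + rng.normal(scale=np.sqrt(1 - lam[m]**2))
for t in range(T):
    m = s[t]                               # observation from the active regime
    x[t] = W_inv[m][:, 0] * y[t, m] + rng.normal(scale=sigma_x, size=D)
```

With sticky transition probabilities, the simulated data contain long segments from each regime separated by occasional change points, mimicking the setting the switching PSFA is designed for.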

B. Variational Inference
In this study, we derive a learning algorithm for the parameters of switching probabilistic slow feature analysis by generalizing the EM algorithm [22]. The log-likelihood of the observed data to be maximized is bounded from below as

$\log p(x_{1:T} \mid \Theta) = \log \sum_{s_{1:T}} \int p(x_{1:T}, y_{1:T}, s_{1:T} \mid \Theta)\, dy_{1:T} \ge \sum_{s_{1:T}} \int q(y_{1:T}, s_{1:T}) \log \frac{p(x_{1:T}, y_{1:T}, s_{1:T} \mid \Theta)}{q(y_{1:T}, s_{1:T})}\, dy_{1:T}, \quad (19)$

where we used Jensen's inequality and $y_{1:T}$ collects the state variables of all $M$ models. The purpose here is to maximize the lower bound in (19). In the E step, the posterior probability $q(y_{1:T}, s_{1:T}) = p(y_{1:T}, s_{1:T} \mid x_{1:T}, \Theta)$ would be obtained using the current parameter estimates. However, it is quite difficult to find the exact posterior probability in the SSSM. Therefore, we employ an estimation method based on the variational Bayes method.
Here we derive a method for estimating the parameters of the switching PSFA by means of a variational framework. In the SSSM, it is difficult to obtain the posterior probability distribution $p(y_{1:T}, s_{1:T} \mid x_{1:T})$ exactly, so it is approximated by a tractable distribution $q(y_{1:T}, s_{1:T})$. The Kullback-Leibler (KL) divergence

$\mathrm{KL}(q \,\|\, p) = \sum_{s_{1:T}} \int q(y_{1:T}, s_{1:T}) \log \frac{q(y_{1:T}, s_{1:T})}{p(y_{1:T}, s_{1:T} \mid x_{1:T})}\, dy_{1:T} \quad (20)$

is a measure of the discrepancy between the true posterior $p(y_{1:T}, s_{1:T} \mid x_{1:T})$ and the approximating distribution $q(y_{1:T}, s_{1:T})$, and we aim to minimize (20) in the E step. The distribution $q(y_{1:T}, s_{1:T})$ that approximates the posterior is factorized as follows:

$q(y_{1:T}, s_{1:T}) = \frac{1}{Z}\, \psi(s_{1:T}) \prod_{m=1}^{M} q(y_{1:T}^{(m)}), \quad (21)$

where $\psi$ is defined through the evidence terms

$\psi_t^{(m)} = \exp\Big( \langle \log p(x_t \mid y_t^{(m)}, s_t^{(m)} = 1) \rangle_{q(y_{1:T}^{(m)})} \Big), \quad (22)$

and $Z$ is a normalization factor. The KL divergence between the posterior distribution $p(y_{1:T}, s_{1:T} \mid x_{1:T})$ and its approximation $q(y_{1:T}, s_{1:T})$ is minimized by repeatedly updating $h_t^{(m)}$ and $q(y_{1:T}^{(m)})$, where the brackets $\langle \cdot \rangle$ represent an expectation. Here $h_t^{(m)} = \langle s_t^{(m)} \rangle$ is the probability that the $m$-th element of the switch variable is 1 and is calculated by the forward-backward algorithm [28] using $\psi_t^{(m)}$, while $q(y_{1:T}^{(m)})$ is obtained by applying the Kalman smoother [29]-[31] to the data weighted by $h_t^{(m)}$. In this way, the E step minimizes the KL divergence. In the M step, the expected log likelihood is partially differentiated with respect to each parameter and the parameters are updated; closed-form update formulas are obtained for the parameters of the proposed switching PSFA. As for the update (30) of $\lambda_j^{(m)}$, the root of the resulting cubic equation that satisfies the condition $0 < \lambda_j^{(m)} < 1$ is taken. Note that the covariance matrix of the observation noise for each $m$ can also be derived by differentiating the expected log-likelihood. Algorithm 1 summarizes the algorithm of the proposed method.
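The smoothed switch probabilities $\langle s_t^{(m)} \rangle$ are obtained by a standard forward-backward recursion over the evidence terms. A minimal sketch, with illustrative evidence values rather than ones computed from data, might look like this:

```python
import numpy as np

def forward_backward(psi, Phi, pi0):
    """Forward-backward recursion sketch for the switch posterior.
    psi[t, m] is the (unnormalized) evidence for regime m at time t,
    Phi the M x M transition matrix, pi0 the initial distribution.
    Returns h[t, m] = <s_t^(m)>, the smoothed regime probabilities."""
    T, M = psi.shape
    alpha = np.zeros((T, M)); beta = np.ones((T, M))
    alpha[0] = pi0 * psi[0]; alpha[0] /= alpha[0].sum()
    for t in range(1, T):                  # forward pass (filtering)
        alpha[t] = psi[t] * (alpha[t - 1] @ Phi)
        alpha[t] /= alpha[t].sum()         # normalize for numerical stability
    for t in range(T - 2, -1, -1):         # backward pass
        beta[t] = Phi @ (psi[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    h = alpha * beta
    return h / h.sum(axis=1, keepdims=True)

# Usage: synthetic evidence favoring regime 0 first, then regime 1
psi = np.vstack([np.tile([0.9, 0.1], (50, 1)), np.tile([0.1, 0.9], (50, 1))])
Phi = np.array([[0.95, 0.05], [0.05, 0.95]])
h = forward_backward(psi, Phi, np.array([0.5, 0.5]))
```

In the full E step these smoothed probabilities then weight the data passed to each regime's Kalman smoother, and the two updates are iterated to a fixed point.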

IV. EXPERIMENT

A. Experimental Settings
In this section, in order to show the effectiveness of the proposed switching PSFA, we estimate its parameters using simulated data and evaluate the estimated latent variables in comparison with the conventional PSFA. The observation data including change points were synthesized from three different PSFA models. The number of data points is $T = 1500$; the data consist of 500 consecutive points generated from each model ($m = 1, 2, 3$). The latent variable was set to 3 dimensions and the observation to 6 dimensions. The parameter $\Lambda^{(m)}$ of the switching PSFA has a different value in each dimension, and $\Lambda^{(m)}$ was set to different values for different $m$. The covariance matrix of the observation noise is common to all models. We use the switching PSFA to estimate the latent variables and switch variables with the Kalman smoother and the forward-backward algorithm in the E step. We estimate the parameters $\Theta^{(m)} = \{\Lambda^{(m)}, W^{(m)-1}, \Sigma_x\}$ and the state transition rates in the M step. The state transition rates are estimated as in the conventional switching state space model framework [24]. We repeat the E step and M step alternately until the log likelihood converges.

B. Result
Here, we estimate the hidden variables $y_t$ using the proposed switching PSFA. Fig. 4 shows the result of estimating the switch variable by the forward-backward algorithm. The switching of the observation model could be estimated by the switching PSFA even though no information on the change points or the true parameters was given. Fig. 5 shows a comparison between the latent variables $y_t$ obtained from the same observation data $x_t$ using the conventional PSFA and the switching PSFA, together with the true values. Here, the latent variables $y_t$ are calculated in the E step using the Kalman smoother. In Fig. 5, the proposed method extracts the slow features more accurately; the discrepancy between the true and estimated slow features is larger for the existing PSFA than for the proposed switching PSFA. The estimation accuracy is compared quantitatively between the conventional PSFA and the proposed switching PSFA using the mean squared error between the true and estimated values of the latent variables. Table I shows that the proposed switching PSFA extracts the latent variables more accurately than the conventional PSFA. Since the conventional PSFA does not assume that the model switches, it cannot estimate the latent variables well from observation data including change points, whereas the switching PSFA can. It was also confirmed that the estimates converged toward the true values as the parameters were updated.
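The quantitative metric used here is simply the mean squared error between the true and estimated latent trajectories. A sketch with hypothetical arrays (not the paper's results):

```python
import numpy as np

def latent_mse(y_true, y_est):
    """Mean squared error between true and estimated latent trajectories,
    both of shape (T, q)."""
    return np.mean((y_true - y_est) ** 2)

# Hypothetical example values, purely for illustration
y_true = np.array([[0.0], [1.0], [2.0]])
y_est = np.array([[0.0], [0.5], [2.5]])
err = latent_mse(y_true, y_est)   # (0 + 0.25 + 0.25) / 3
```

Note that when parameters are estimated from scratch, latent components are recovered only up to sign and ordering, so such a comparison is made after aligning the estimated components with the true ones.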

V. CONCLUSION
In this paper, the framework of the switching state space model was introduced into probabilistic SFA, and the parameter update formulas were derived using the EM algorithm and the variational Bayesian method. It was shown that the latent variables, switch variables, and parameters can be estimated for time-series data including change points.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
TO designed research; KT and TO performed research; KT and TO analyzed the data; KT and TO wrote the paper; all authors had approved the final version.