Tuning Parameters in Deep Belief Networks for Time Series Prediction through Harmony Search

Several studies have applied Deep Belief Networks (DBNs) to time series prediction. Most of them report that DBNs achieve better prediction accuracy than traditional Artificial Neural Networks. However, one of the main shortcomings of using DBNs in time series prediction is the difficulty of selecting their parameters properly. In this paper, we investigate the use of the Harmony Search algorithm for determining the parameters of a DBN for time series forecasting. Experimental results on several synthetic and real-world time series datasets show that the DBN with parameters selected by Harmony Search performs better than the DBN with parameters selected by Particle Swarm Optimization (PSO) or by a random method on most of the tested datasets.


I. INTRODUCTION
Time series prediction is an important area of forecasting in which past values of a variable are collected and analyzed to develop a model describing the underlying relationship. The model is then used to extrapolate the time series into the future. Well-known methods of time series prediction include ARIMA, exponential smoothing, artificial neural networks (ANNs), the k-nearest-neighbors algorithm and support vector machines (SVMs). Among these methods, ANNs are the most popular for time series prediction.
Deep neural network models, such as Deep Belief Networks (DBNs), have recently attracted the interest of many researchers in big data analysis applications. A DBN is a generative neural network model with many hidden layers, introduced by Hinton et al. [1] along with a greedy layer-wise learning algorithm. The building block of a DBN is a probabilistic model called the Restricted Boltzmann Machine (RBM). DBNs and RBMs have already been applied successfully to many problems, such as classification, dimensionality reduction and image processing.
Several studies have applied DBNs to predict time series data in finance ([2]-[7]), meteorology ([8]-[11]), industry ([12]), internet traffic ([13]) and chaotic time series ([14], [15]). Most of these works report that DBNs achieve better prediction accuracy than traditional ANNs. However, one of the main shortcomings of using DBNs in time series prediction is the difficulty of selecting their parameters properly, i.e. the number of hidden units, the learning rates, and so on. The task of model selection for deep neural networks is to find a suitable set of parameters that maximizes some fitness function, such as a classifier's accuracy. Although there are some works that use population-based meta-heuristic techniques for parameter selection in DBNs, such as ([16]-[18]), most of them target image classification.
In this work, we propose a method of parameter selection for DBNs in time series prediction based on the Harmony Search algorithm ([19], [20]). This work is inspired by Papa et al. [18], who in 2016 proposed a method of parameter selection for DBNs through Harmony Search. However, two major points distinguish our work from theirs:
i) We propose a method of parameter selection for DBNs in time series prediction rather than for DBNs in image classification.
ii) We compare the performance of the Harmony Search method with that of Particle Swarm Optimization (PSO) in parameter selection for DBNs in time series prediction.
In the experiments, we use three real-world time series datasets (Sunspots, CPI and USD/GBP) and one synthetic dataset (Lorenz). Experimental results on these four datasets show that the DBN with parameters selected by Harmony Search performs better than the DBN with parameters selected by PSO or by a random method on most of the tested datasets.
The remainder of the paper is organized as follows. Section II provides some background on DBNs and the Harmony Search algorithm. Section III introduces the DBN model for forecasting time series and a method of parameter selection for this model through Harmony Search. Section IV reports the experiments that evaluate the performance of Harmony Search in parameter selection for the DBN model in time series prediction. Finally, Section V gives some conclusions and future work.

A. Deep Belief Network
Deep Belief Networks were proposed by Hinton et al. [1] and have had remarkable success in image processing and other AI areas. DBN models are built by stacking Restricted Boltzmann Machines (RBMs) ([21]).
An RBM is a kind of stochastic artificial neural network with two connected layers: a layer of binary visible units (v, whose states are observed) and a layer of binary hidden units (h, whose states cannot be observed). The hidden units act as latent variables (features) that allow the RBM to model a probability distribution over state vectors (see Fig. 1). The hidden units are conditionally independent given the visible units. Given an energy function E(v, h) on the whole set of visible and hidden units, the joint probability is given by:
$$P(v, h) = \frac{e^{-E(v, h)}}{Z}$$
where Z is the normalizing partition function, obtained by summing $e^{-E(v, h)}$ over all possible (v, h) configurations.
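The posterior probability of one layer given the other is easy to compute by the two following equations:
$$P(h \mid v) = \prod_j P(h_j \mid v), \qquad P(v \mid h) = \prod_i P(v_i \mid h)$$
where
$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_i w_{ij} v_i\Big)$$
and
$$P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_j w_{ij} h_j\Big)$$
Here $w_{ij}$ is the weight between visible unit i and hidden unit j, $a_i$ and $b_j$ are the biases of the visible and hidden units, and $\sigma(x) = 1 / (1 + e^{-x})$ is the sigmoid function. Inference of the hidden factors h given the observed v is tractable because the hidden units are conditionally independent given v.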
A DBN is a generative model with an input layer and an output layer, separated by l layers of hidden stochastic units. This multilayer neural network can be efficiently trained by composing RBMs in such a way that the feature activations of one layer are used as the training data for the next layer.
An energy-based model such as an RBM can be trained by performing gradient ascent on the log-likelihood of the training data with respect to the RBM parameters. This gradient is difficult to compute analytically, so it is approximated by sampling; Markov Chain Monte Carlo methods are well suited to RBMs. In practice, one iteration of the Markov chain works well, and it corresponds to the following sampling procedure:
$$v_0 \xrightarrow{P(h \mid v_0)} h_0 \xrightarrow{P(v \mid h_0)} v_1 \xrightarrow{P(h \mid v_1)} h_1$$
where the sampling operations are schematically described. The rough estimate of the gradient obtained with this procedure is denoted by CD-k, where CD-k represents the Contrastive Divergence algorithm [1] performing k iterations of the Markov chain up to $v_k$.
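The weight parameter is updated with the rate of change shown in the following formula:
$$\Delta w_{ij} = \epsilon \big( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recon}} \big)$$
where $\epsilon$ represents the learning rate, and $\langle \cdot \rangle_{\text{data}}$ and $\langle \cdot \rangle_{\text{recon}}$ denote averages over the training data and over the reconstructions obtained after k steps of the Markov chain, respectively.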
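The sampling chain and weight update described above can be sketched in a few lines of plain Python. This is only an illustrative CD-1 step on a single binary training vector, not the implementation used in the experiments. The names `W`, `a` (visible biases), `b` (hidden biases) and `lr` are assumptions introduced here, and using hidden probabilities rather than sampled states in the update statistics is a common practical choice that the text does not specify.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b, rng):
    """Return (probabilities, binary samples) of the hidden units given v."""
    probs = [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
             for j in range(len(b))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def sample_visible(h, W, a, rng):
    """Return (probabilities, binary samples) of the visible units given h."""
    probs = [sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
             for i in range(len(a))]
    return probs, [1 if rng.random() < p else 0 for p in probs]

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 step (v0 -> h0 -> v1 -> h1); updates W, a, b in place."""
    ph0, h0 = sample_hidden(v0, W, b, rng)
    _, v1 = sample_visible(h0, W, a, rng)
    ph1, _ = sample_hidden(v1, W, b, rng)
    for i in range(len(a)):
        for j in range(len(b)):
            # data statistics minus reconstruction statistics
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    return v1

# Toy usage: 3 visible units, 2 hidden units (sizes chosen for illustration).
rng = random.Random(0)
W = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.2]]
a = [0.0, 0.0, 0.0]
b = [0.0, 0.0]
reconstruction = cd1_update([1, 0, 1], W, a, b, lr=0.1, rng=rng)
```

For CD-k with k > 1, the visible/hidden sampling pair would simply be repeated k times before the statistics are collected.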

B. Harmony Search
Harmony Search is a meta-heuristic algorithm inspired by the improvisation process of music players. Musicians often improvise the pitches of their instrument searching for a perfect state of harmony ( [19], [20]). The main idea is to use the same process adopted by musicians to create new harmonies to obtain a near-optimal solution according to some fitness function. Each possible solution is modeled as a harmony, and each musician corresponds to one decision variable.
Let HM = ($x_1, x_2, \ldots, x_N$) be the set of harmonies that compose the Harmony Memory of size N, such that each $x_i$ is an n-dimensional vector:
$$x_i = (x_i^1, x_i^2, \ldots, x_i^n)$$
The Harmony Search (HS) algorithm generates in each iteration a new harmony vector x' based on memory considerations, pitch adjustments and randomization (music improvisation). Then, the new harmony vector x' is evaluated in order to be accepted in the Harmony Memory: if x' is better than the worst harmony, the latter is replaced by the new harmony.
International Journal of Machine Learning and Computing, Vol. 11, No. 4, July 2021
Harmony Search needs two main parameters: the harmony memory consideration rate (HMCR) and the pitch adjustment rate (PAR). HMCR is the probability of choosing a value from the historic values stored in the Harmony Memory, and (1 − HMCR) is the probability of randomly choosing one feasible value. Notice that for each decision variable there exists a given set of feasible values. In addition, every component of the new harmony vector x' is examined to determine whether it should be pitch-adjusted, which is controlled by PAR. The pitch adjustment is often used to improve solutions and to escape from local optima. This mechanism shifts the values of some decision variables in the harmony by the formula:
$$x'_j \leftarrow x'_j \pm \text{rand}() \cdot bw$$
where bw is an arbitrary distance bandwidth and rand() is a random number between 0 and 1. The outline of Harmony Search is given as follows.

Algorithm 1 (Harmony Search)
1. Initialize the harmony memory HM of size N.
2. Evaluate the fitness of all solutions in HM.
3. repeat
   a. // improvise a new harmony x'
      for j = 1 to n do   // n: the number of components
         if rand < HMCR then   // memory consideration
            select x'_j randomly from the values of variable j stored in HM.
            if rand < PAR then   // pitch adjustment
               x'_j = x'_j ± rand() · bw
            endif
         else   // random selection
            randomly select x'_j in its domain.
         endif
      endfor
   b. Evaluate the fitness of x'.
   c. Update HM by replacing the worst HM member x_worst by x' if x' is better than x_worst.
   d. Update the best harmony vector.
   e. set t = t + 1
   until stopping criterion is satisfied.
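As a rough illustration of the Harmony Search outline, the following Python sketch minimizes a toy function over a box of continuous variables. The function name `harmony_search`, its default parameter values, the clipping of adjusted values to the bounds and the `sphere` test function are all assumptions made for this example; they are not taken from the paper.

```python
import random

def harmony_search(fitness, bounds, hm_size=10, hmcr=0.9, par=0.3,
                   bw=0.05, iterations=2000, seed=0):
    """Minimise `fitness` over the box `bounds` with basic Harmony Search."""
    rng = random.Random(seed)
    # Steps 1-2: random harmony memory and its fitness values.
    hm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hm_size)]
    scores = [fitness(x) for x in hm]
    for _ in range(iterations):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # memory consideration: reuse a stored value of variable j
                value = hm[rng.randrange(hm_size)][j]
                if rng.random() < par:
                    # pitch adjustment: shift by at most bw in either direction
                    value += rng.uniform(-1.0, 1.0) * bw
                    value = min(hi, max(lo, value))  # assumption: clip to bounds
            else:
                # random selection within the variable's domain
                value = rng.uniform(lo, hi)
            new.append(value)
        new_score = fitness(new)
        worst = max(range(hm_size), key=scores.__getitem__)
        if new_score < scores[worst]:
            hm[worst], scores[worst] = new, new_score
    best = min(range(hm_size), key=scores.__getitem__)
    return hm[best], scores[best]

def sphere(x):
    """Toy objective with its minimum at the origin."""
    return sum(v * v for v in x)

best_x, best_score = harmony_search(sphere, [(-5.0, 5.0)] * 3)
```

Only one new candidate is evaluated per iteration, and the memory changes only when that candidate beats the current worst member, so the best stored score can never get worse.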
To improve the convergence of Harmony Search and overcome some of its shortcomings, a variant of HS called Global-best Harmony Search (GHS) was proposed by Omran and Mahdavi in 2008 ([22]).
GHS dynamically updates the parameter PAR as follows:
$$PAR(t) = PAR_{min} + \frac{PAR_{max} - PAR_{min}}{NI} \times t$$
where PAR(t) is the pitch adjusting rate at iteration t, $PAR_{min}$ and $PAR_{max}$ are the minimum and the maximum adjusting rate, respectively, t is the iteration variable and NI is the number of iterations. GHS modifies the pitch adjustment step of HS in order to take advantage of the guiding information of the best harmony in the Harmony Memory. Therefore, the distance bandwidth parameter bw is removed from the improvisation step, and the decision variable j of the new harmony is computed using a random component of the best harmony. GHS has the same steps as Harmony Search, with the exception that the process of improvising a new harmony is modified as in Algorithm 2.
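Owing to the guiding information of the best harmony in HM, GHS is reported to outperform HS [22]. The outline of the improvisation step of Global-best Harmony Search is given as follows.

Algorithm 2 (Improvisation step of Global-best Harmony Search)
for j = 1 to n do
   if rand(0,1) ≤ HMCR then   // memory consideration
      x'_j = x_i^j, with i chosen randomly in {1, ..., N}
      if rand(0,1) ≤ PAR(t) then   // pitch adjustment
         x'_j = x_best^k, with k chosen randomly in {1, ..., n}
         // best is the index of the best harmony in HM
      endif
   else   // random selection
      x'_j = L_j + rand(0,1) · (U_j − L_j)
      // U_j, L_j stand for the upper and lower bounds of decision variable j
   endif
endfor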
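The GHS-specific parts (the linear PAR schedule and the modified improvisation step) might look as follows in Python. The function names and the default values of `par_min` and `par_max` are illustrative assumptions, not settings reported in this paper.

```python
import random

def par_schedule(t, ni, par_min=0.01, par_max=0.99):
    """Pitch adjusting rate at iteration t, growing linearly over ni iterations."""
    return par_min + (par_max - par_min) * t / ni

def ghs_improvise(hm, best_index, bounds, hmcr, par_t, rng):
    """Build one new harmony from memory `hm` using the global-best rule."""
    n = len(bounds)
    new = []
    for j, (lo, hi) in enumerate(bounds):
        if rng.random() <= hmcr:
            # memory consideration: value of variable j from a random harmony
            value = hm[rng.randrange(len(hm))][j]
            if rng.random() <= par_t:
                # pitch adjustment: copy a random component of the best harmony
                k = rng.randrange(n)
                value = hm[best_index][k]
        else:
            # random selection between the lower and upper bound of variable j
            value = lo + rng.random() * (hi - lo)
        new.append(value)
    return new

# Toy usage with a three-harmony memory of two variables each.
rng = random.Random(1)
memory = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
candidate = ghs_improvise(memory, best_index=0,
                          bounds=[(0.0, 1.0), (0.0, 1.0)],
                          hmcr=0.9, par_t=par_schedule(50, 100), rng=rng)
```

Note that copying component k of the best harmony into variable j presumes the variables share a comparable range. When they do not, an implementation would need to clip or rescale the copied value; how this is handled is not stated in the text.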

III. SELECTING PARAMETERS IN DBN BASED ON HARMONY SEARCH
In this work we use a deep belief network (DBN) with two RBMs, shown in Fig. 2, as the model for time series prediction. This DBN model was also used by Kuremoto et al. in 2014 [23] for time series prediction. The training process for DBNs consists of two stages: a pre-training stage, which is layer-by-layer unsupervised learning using the Contrastive Divergence (CD-k) algorithm, and a fine-tuning stage, which is global supervised learning using the back-propagation algorithm. These training algorithms enable the DBN to extract abstract features through its multiple layers.
The structure of a deep neural network should be designed to suit its processing objective. The number of layers, the number of units in each layer, the learning rate and so on need to be selected when the model is applied to a real dataset. Here we adopt Harmony Search as a meta-heuristic to find suitable values for the number of units in the input layer, the number of units in the hidden layer and the learning rates. Suppose the prediction model is the one given in Fig. 2, in which two RBMs are used. The visible layer of RBM1 has n units, the hidden layer of RBM1 has m units and the visible layer of RBM2 also has m units. The hidden layer of RBM2 has 1 unit, which is the output of the prediction model. With $\epsilon_1$ denoting the learning rate of the RBM and $\epsilon_2$ the learning rate used during back-propagation, a harmony vector is designed as a 4-dimensional vector (n, m, $\epsilon_1$, $\epsilon_2$), where n and m are integers and $\epsilon_1$, $\epsilon_2$ are in the range (0, 1).

Fig. 2. A prediction model constructed as a deep belief network with two RBMs.
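To make the role of the harmony vector concrete, the sketch below decodes a 4-dimensional harmony into model settings and builds training pairs in which n consecutive past values predict the next value. It assumes that the n visible units of RBM1 receive the n most recent observations, which the model description implies but does not spell out. The helper names and the integer bounds for n and m are hypothetical.

```python
def make_windows(series, n):
    """Pair each run of n consecutive values with the value that follows it."""
    count = len(series) - n
    inputs = [series[t:t + n] for t in range(count)]
    targets = [series[t + n] for t in range(count)]
    return inputs, targets

def decode_harmony(x, n_bounds=(2, 20), m_bounds=(2, 20)):
    """Turn a real-valued harmony (n, m, lr_rbm, lr_bp) into usable settings.

    The bounds are hypothetical; n and m are rounded and clipped to integers.
    """
    def to_int(value, bounds):
        return min(bounds[1], max(bounds[0], int(round(value))))
    return to_int(x[0], n_bounds), to_int(x[1], m_bounds), x[2], x[3]

n, m, lr_rbm, lr_bp = decode_harmony([3.4, 7.6, 0.05, 0.01])
inputs, targets = make_windows([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], n)
```

Because n changes the shape of the training set itself, each candidate harmony requires the windows to be rebuilt before the DBN is trained.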
In this study, the mean squared error (MSE) is used as the error measure:
$$MSE = \frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2 \qquad (10)$$
where n is the number of observations, $y_t$ is the actual value in time period t, and $\hat{y}_t$ is the forecast value for time period t.
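A direct transcription of the MSE definition, with a small worked example whose numbers are made up for illustration:

```python
def mean_squared_error(actual, forecast):
    """Average of squared differences between actual and forecast values."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

# Squared errors are 0, 0 and 4, so the MSE is 4 / 3.
error = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```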
Here, we propose a method for parameter selection of DBN model in time series prediction which is based on Harmony Search. The summarized Harmony Search algorithm for this purpose on a dataset is shown as follows:

Algorithm 3 (DBN parameter selection through Harmony Search)
For each time series dataset:
Step 1: Decide the size of the Harmony Memory and the maximum number of iterations I.
Step 2: Initialize the Harmony Memory randomly.
Step 3: Evaluate each harmony (a set of parameters) using the MSE obtained when applying this set of parameters to the DBN model for forecasting the time series dataset under consideration.
Step 4: Improvise a new harmony (i.e. a new set of parameters) based on the current HM and apply it to the DBN model for forecasting the time series dataset under consideration. Update the Harmony Memory if the new harmony yields an MSE better than that of the worst harmony in the HM.
Step 5: Finish the algorithm if the current iteration index k reaches the maximum number of iterations I; otherwise return to Step 4.
In this work, the Harmony Search used in Algorithm 3 is Global-Best Harmony Search.
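The five steps can be put together as the following Python sketch. Training a DBN is replaced here by a hypothetical stand-in, `toy_error`, so that the example runs on its own; in the real procedure, `evaluate` would train the DBN with the decoded parameters and return its MSE. The bounds, memory size, iteration count and the clipping of copied values are assumptions made for illustration.

```python
import random

def select_parameters(evaluate, bounds, hm_size=5, iterations=30,
                      hmcr=0.9, par_min=0.01, par_max=0.99, seed=0):
    """Global-best Harmony Search over `bounds`, minimising `evaluate`."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 2: random harmony memory. Step 3: evaluate every harmony.
    hm = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hm_size)]
    errors = [evaluate(x) for x in hm]
    for k in range(iterations):  # Steps 4-5
        par = par_min + (par_max - par_min) * k / iterations
        best = min(range(hm_size), key=errors.__getitem__)
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() <= hmcr:
                value = hm[rng.randrange(hm_size)][j]
                if rng.random() <= par:
                    value = hm[best][rng.randrange(dim)]
                    value = min(hi, max(lo, value))  # assumption: clip to range
            else:
                value = rng.uniform(lo, hi)
            new.append(value)
        err = evaluate(new)
        worst = max(range(hm_size), key=errors.__getitem__)
        if err < errors[worst]:
            hm[worst], errors[worst] = new, err
    best = min(range(hm_size), key=errors.__getitem__)
    return hm[best], errors[best]

def toy_error(x):
    """Hypothetical stand-in for 'train a DBN with (n, m, lr1, lr2), return MSE'."""
    n, m, lr1, lr2 = round(x[0]), round(x[1]), x[2], x[3]
    return (n - 8) ** 2 + (m - 12) ** 2 + (lr1 - 0.1) ** 2 + (lr2 - 0.05) ** 2

# Hypothetical search ranges for (n, m, lr1, lr2).
bounds = [(2, 20), (2, 20), (0.001, 1.0), (0.001, 1.0)]
best_params, best_error = select_parameters(toy_error, bounds)
```

Each call to `evaluate` stands for one full DBN training run, so the total cost is dominated by the number of evaluations: `hm_size` for the initial memory plus one per iteration.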

IV. EXPERIMENTAL EVALUATION
The main purpose of this study is to evaluate the robustness of the HS-based method in selecting DBN parameters for the task of time series prediction. We compare the proposed HS-based DBN model selection against random selection of parameters and against Particle Swarm Optimization-based DBN model selection in terms of prediction accuracy. We select Particle Swarm Optimization (PSO) [24] as the competing method for selecting parameters of DBNs since this meta-heuristic was applied for the same purpose by Kuremoto et al. in 2014 [14]. The HS algorithm we implement in this experiment is Global-best Harmony Search.
We implemented the DBN forecasting model with the TensorFlow framework (using the Python language) [25]. TensorFlow is an open-source software library developed by Google which is very useful in building, training, testing and visualizing deep neural networks. We conducted the experiments on a PC with a Core i7-5500U 2.4 GHz CPU and 8 GB of RAM.

A. Datasets
The tested datasets consist of three real-world time series datasets and one synthetic time series dataset. All these datasets are commonly used by the research community in time series prediction and are considered challenging for prediction. They are described as follows.
1. Monthly sunspot numbers from January of 1949 to March of 1977 (this dataset is from the web site http://sidc.oma.be). This time series is in the field of astronomy and is a widely used benchmark dataset in the evaluation of methods for time series prediction. It consists of 305 data points.
4. This chaotic time series dataset is derived from the Lorenz system, given by the three differential equations:
$$\frac{dx}{dt} = a(y - x), \qquad \frac{dy}{dt} = x(b - z) - y, \qquad \frac{dz}{dt} = xy - cz$$
where a = 10, b = 28, and c = 8/3. This time series consists of 1000 data points.
Fig. 3 shows the plots of the three real-world datasets. Fig. 4 shows the plot of the synthetic dataset (Lorenz).
Each dataset is divided into two sets: a training set and a test set. The training set and test set for each of the four datasets are given in Table I.

TABLE I: TRAINING SET AND TEST SET FOR EACH OF THE FOUR DATASETS
Dataset    Length  Training set  Test set
Sunspots   305     288           17
CPI        535     481           54
USD/GBP    295     285           10
Lorenz     1000    800           200

We have set some DBN parameters as follows: the number of epochs for RBM training to 50, the number of iterations of the CD algorithm to 2, and the batch size in the BP algorithm to 32.
The values of the parameters used in the prediction experiments for the Sunspots, CPI, USD/GBP and Lorenz datasets are shown in Table II, Table III, Table IV and Table V, respectively.
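A series of the kind described for the Lorenz dataset can be generated by integrating the three equations numerically and splitting the result 800/200 as in Table I. The sketch below uses a fourth-order Runge-Kutta step. The step size `dt`, the initial state and the choice of the x component as the observed series are assumptions, since the text does not state how the 1000 points were produced.

```python
def lorenz_derivative(state, a=10.0, b=28.0, c=8.0 / 3.0):
    """Right-hand side of the Lorenz system for state (x, y, z)."""
    x, y, z = state
    return (a * (y - x), x * (b - z) - y, x * y - c * z)

def shifted(state, slope, factor):
    """Return state + factor * slope, component by component."""
    return tuple(s + factor * k for s, k in zip(state, slope))

def rk4_step(state, dt):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    k1 = lorenz_derivative(state)
    k2 = lorenz_derivative(shifted(state, k1, dt / 2))
    k3 = lorenz_derivative(shifted(state, k2, dt / 2))
    k4 = lorenz_derivative(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (p + 2 * q + 2 * r + w)
                 for s, p, q, r, w in zip(state, k1, k2, k3, k4))

def lorenz_series(length=1000, dt=0.01, start=(1.0, 1.0, 1.0)):
    """Return `length` samples of the x component (assumed observable)."""
    state, xs = start, []
    for _ in range(length):
        xs.append(state[0])
        state = rk4_step(state, dt)
    return xs

series = lorenz_series()
train, test = series[:800], series[800:]  # split sizes from Table I
```

The split is chronological rather than random, so the test set always lies after the training set in time.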

B. Experimental Results
Over about three runs on each dataset with each method of parameter selection, the best values of the four main parameters n, m, $\epsilon_1$ and $\epsilon_2$ of the DBN model for all datasets with the three methods of selecting parameters are reported in Table VI. The prediction accuracy, measured by MSE, over the four datasets for each optimization technique is reported in Table VII. On the Lorenz dataset, our prediction error is higher than that reported in [15] (MSE = 0.00002), even though that work selected the parameters of the DBN model by random search. This is because in this work we did not combine the DBN model with chaos theory to deal with the Lorenz dataset, which is a well-known chaotic time series.
All the experimental results in Table VII imply that Harmony Search can be used as a meta-heuristic method in parameter selection for DBNs in time series prediction.
We also measured the execution time of tuning the parameters of the DBN model with the two methods, Harmony Search and PSO. The execution time of the PSO-based method is higher than that of the Harmony Search-based method. Harmony Search is substantially faster than PSO because it does not update all candidate solutions at each iteration, but only one.
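A rough count of fitness evaluations shows why this difference arises, since every evaluation means training a DBN. Assuming a harmony memory of size N, a swarm of S particles and the same number of iterations I for both methods (an assumption for this comparison, not a reported experimental setting), a standard implementation of each method needs about

$$E_{HS} = N + I, \qquad E_{PSO} = S\,(I + 1)$$

evaluations. With the hypothetical values N = S = 10 and I = 100 this gives 110 evaluations for Harmony Search against 1010 for PSO.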

V. CONCLUSION AND FUTURE WORK
In this paper, we propose and evaluate the use of Harmony Search, a meta-heuristic, in parameter selection of Deep Belief Networks for time series prediction. The parameters consist of the number of units in the visible layer of RBM1, the number of units in the hidden layer of RBM1, the learning rate for training the RBM and the learning rate of the back-propagation algorithm. The experimental results show that the DBN model with the main parameters selected by Harmony Search performs better than the DBN model with the main parameters selected by the random method or by PSO on most of the tested datasets. These results imply that Harmony Search can be used as a robust meta-heuristic method in parameter selection for DBNs in time series prediction.
As for future work, we intend to apply Harmony Search in parameter selection of the DBN model with a better architecture proposed by Kuremoto et al. in [14] for forecasting chaotic time series. In this future research direction, we will apply chaos theory ([26]-[30]) to determine the number of units in the visible layer of RBM1 rather than including it in the set of parameters selected by Harmony Search.