Ridge Polynomial Neural Network with Error Feedback for Time Series Forecasting

Time series forecasting has gained much attention due to its many practical applications. Higher-order neural networks with recurrent feedback are a powerful technique that has been used successfully for time series forecasting. They maintain fast learning and the ability to learn the dynamics of the time series over time. Network output feedback is the most common recurrent feedback for many recurrent neural network models. However, not much attention has been paid to the use of network error feedback instead of network output feedback. In this study, we propose a novel model, called Ridge Polynomial Neural Network with Error Feedback (RPNN-EF), that incorporates higher order terms, recurrence and error feedback. To evaluate the performance of RPNN-EF, we used four univariate time series with different forecasting horizons, namely star brightness, monthly smoothed sunspot numbers, daily Euro/Dollar exchange rate, and the Mackey-Glass time-delay differential equation. We compared the forecasting performance of RPNN-EF with the ordinary Ridge Polynomial Neural Network (RPNN) and the Dynamic Ridge Polynomial Neural Network (DRPNN). Simulation results showed an average 23.34% improvement in Root Mean Square Error (RMSE) with respect to RPNN and an average 10.74% improvement with respect to DRPNN. This indicates that using network errors during training helps enhance the overall forecasting performance of the network.


Introduction
A time series is a sequence of observations of a variable of interest made over time. Time series are used in many disciplines for things such as hourly air temperature, daily stock prices, weekly interest rates, monthly sales, quarterly unemployment rate, annual deaths from homicides and suicides, and electrocardiograph measurements. Time series can be categorized as continuous or discrete, linear or nonlinear, and univariate or multivariate [1].
Univariate time series are obtained by recording a single phenomenon over time. Multivariate time series are recorded for more than one phenomenon over time [1]. A recording of a single phenomenon such as annual volcanic carbon dioxide (CO2) emissions is a univariate time series, while the CO2 concentration of a gas furnace recorded together with the input gas flow rate is an example of a multivariate time series.
One of the main objectives of analysing time series is forecasting [2]. Forecasting is needed in many areas, including marketing, production planning, financial risk management, and crisis management. Forecasting is needed because it guides decisions, decreases dependence on chance, and makes dealing with the environment more scientific [3,4].
Time series forecasting is defined as an estimation of the future behaviour of a time series using current and past observations [1]. Time series forecasting finds the relationship between past, present and future observations. Mathematically, univariate time series forecasting takes a series of data such as $x_1, x_2, \ldots, x_N$ to estimate future values such as $x_{N+h}$, where the integer $h$ is the forecast horizon. The forecast horizon is defined as the time period in the future for which forecasts are calculated.
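The mapping from past observations to a value h steps ahead can be illustrated with a sliding-window transformation. The Python sketch below is only an illustration: the function name `make_pairs` and the window length `n_inputs` are hypothetical and are not taken from this study.

```python
def make_pairs(series, n_inputs, horizon):
    """Build (input window, target) pairs for h-step-ahead forecasting.

    Each window holds n_inputs consecutive observations; the target is the
    observation `horizon` steps after the last value in the window.
    """
    pairs = []
    last_start = len(series) - n_inputs - horizon
    for start in range(last_start + 1):
        window = series[start:start + n_inputs]
        target = series[start + n_inputs - 1 + horizon]
        pairs.append((window, target))
    return pairs


# Toy series used only for illustration.
series = [1, 2, 3, 4, 5, 6, 7, 8]
pairs = make_pairs(series, n_inputs=3, horizon=2)
```

With `horizon=1` the same function produces one-step-ahead pairs; larger values give the multi-step-ahead setting.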
Various methods for time series forecasting have been developed. From statistics-based to intelligence-based, there is a range of methods available to make a forecast. Conventional statistical methods such as Auto Regressive (AR), Auto Regressive Moving Average (ARMA) and exponential smoothing are linear methods that assume linear relationships between past values. The non-linear relationships found in most real time series data cannot be captured using these methods [5][6][7][8].
Intelligent methods such as Artificial Neural Networks (ANNs) have been successfully used in time series forecasting [6,7,9,10]. ANN is an intelligence-based method which is inspired by biological nervous systems. During training, ANNs use historical data (i.e., current and past observations) to build a model that has the ability to forecast future observations.
ANNs have some advantages that attract researchers to use them in forecasting [7,11]. First, they have a non-linear input-output mapping nature that allows ANNs to approximate any continuous function with an arbitrary degree of accuracy. Second, the non-linear input-output mapping is generated with little a priori knowledge about the non-linearity in the series, so ANNs are less susceptible to model misspecification than other parametric non-linear methods. Third, the generalization capabilities of ANNs in a non-stationary environment remain accurate and robust. Fourth, ANNs have the capability of tolerating the presence of chaotic components that are found in many time series.
ANNs can be grouped based on network structure into feedforward and recurrent networks [12]. In feedforward networks, the information moves in one direction only, from the input nodes to the output nodes through one-way network connections (i.e., weights). On the other hand, the connections between the units in recurrent networks can form a cycle.
One of the most used feedforward ANNs in time series forecasting is the Multilayer Perceptron (MLP) [13]. Due to their multi-layered structure, MLPs need a large number of units to solve complex nonlinear mapping problems, which results in slow learning and poor generalization [14]. To overcome these drawbacks, different types of single layer higher order feedforward neural networks have been presented. Ridge Polynomial Neural Network (RPNN) [15] is a higher order feedforward neural network that maintains fast learning and powerful mapping properties, which make it suitable for solving complex problems [13].
For time series forecasting, neural network models need an explicit treatment of the dynamics involved, because the behaviour of some time series signals is related to past inputs on which present inputs depend [16]. The explicit treatment of dynamics can be achieved using recurrent feedback. A recurrent version of the RPNN was proposed by [16]. This model is called the Dynamic Ridge Polynomial Neural Network (DRPNN). DRPNN uses the output value from the output layer as a feedback connection to the input layer. The idea behind a recurrent network is to learn the dynamics of the series over time. As a result, the trained network uses its memory when forecasting [17]. RPNN and DRPNN have been successfully applied to forecast time series [9,10,13,16], with DRPNN being the more suitable for time series forecasting.
Instead of using network output feedback, network error feedback was used effectively with different models for time series forecasting as an additional input in the network [18][19][20][21][22][23]. Using network error as a feedback connection helps reduce the overall network error and increase forecasting accuracy. Due to the success of RPNNs, DRPNNs and models that use network error feedback for time series forecasting, in this study we propose the Ridge Polynomial Neural Network with Error Feedback (RPNN-EF). This paper is an extension of work originally presented at the Second International Conference on Soft Computing and Data Mining (SCDM-2016) [24].
The contributions made by this study are as follows:
• We proposed the Ridge Polynomial Neural Network with Error Feedback (RPNN-EF) for time series forecasting.
• The novelty of the proposed approach is that we incorporated the concepts of higher order terms, recurrence and error feedback in the proposed model.
• To overcome the stability and convergence problems that could occur due to the existence of recurrent feedback in the RPNN-EF, a sufficient condition based on an approach that uses an adaptive learning rate was developed by introducing a Lyapunov function.
• A comparative analysis of the proposed model with RPNN and DRPNN was completed using four time series: star brightness, monthly smoothed sunspot numbers, daily Euro/Dollar exchange rate, and the Mackey-Glass time-delay differential equation. The proposed model was also compared with other models in the literature.
The remainder of this study is organized as follows. In Section 2, we review existing ridge polynomial neural network based models. In Section 3, we present the proposed model for time series forecasting. Section 4 describes the experimental design. Section 5 presents results and discussion. The conclusion and ideas for future works are given in Section 6.

The Existing Ridge Polynomial Neural Network Based Models
Several feedforward neural networks have mapping capabilities for approximating reasonable functions [15]. A Multilayer Perceptron (MLP) network is an example of these feedforward neural networks. MLP needs a large number of units to solve complex nonlinear mapping problems. Therefore, MLP is prone to slow learning and poor generalization [14]. To overcome the drawbacks of multilayer networks, different types of single layer higher order feedforward neural networks were presented. One of these is the Pi-Sigma Neural Network (PSNN), which is a higher order feedforward neural network that consists of a single layer of trainable weights and product units in the output layer [25]. PSNN maintains powerful mapping capability and fast learning without experiencing the free parameter explosion problem [15]. PSNN demonstrated competent performance for various problems [25][26][27]; however, PSNN is not a universal approximator.
RPNN [15] is a universal approximator that retains the fast learning and powerful mapping capability of PSNN while avoiding the free parameter explosion found in some types of higher order feedforward neural networks [15]. RPNN is constructed by adding PSNNs of different degrees during learning until a specified goal is achieved. It uses constructive learning, which is the growth of a small network structure during training until a specified goal is achieved. Fig 1 shows a generic RPNN architecture. It can be seen that RPNN consists of only a single layer of adjustable weights, which helps to speed up learning. The output of this network is as follows:
$$y = \sigma\left(\sum_{i=1}^{k} P_i(x)\right), \qquad P_i(x) = \prod_{j=1}^{i}\left(\sum_{g=1}^{m} w_{gj}\, x_g + w_{0j}\right) \qquad (1)$$
where $\sigma$ is a non-linear activation function, $k$ is the number of PSNN blocks, $m$ is the input vector dimension size, $w$ is a trainable weight (with $w_{0j}$ the bias of summing unit $j$) and $x$ is the input. RPNN has been used for different problems such as time series forecasting [13], function approximation [15], classification [15,28] and pattern recognition [29]. It shows superior performance when compared to MLP.
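As an illustration of how a ridge polynomial network combines Pi-Sigma blocks of increasing order, the following Python sketch computes a forward pass. It assumes the usual form in which each block multiplies the outputs of its linear summing units and the block outputs are summed before the activation. The weight layout (bias first), the function names and the numeric values are hypothetical choices made for this example.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def pi_sigma(x, units):
    """Output of one Pi-Sigma block: the product of its linear summing units.

    units[j] = [bias, w_1, ..., w_m] for summing unit j (assumed layout).
    """
    product = 1.0
    for w in units:
        h = w[0] + sum(wg * xg for wg, xg in zip(w[1:], x))
        product *= h
    return product


def rpnn_output(x, blocks):
    """blocks[i] holds the summing units of the Pi-Sigma block of order i + 1."""
    return sigmoid(sum(pi_sigma(x, block) for block in blocks))


# Two-input network with one first-order and one second-order block.
x = [0.5, 0.2]
blocks = [
    [[0.1, 0.4, -0.3]],                    # order 1: one summing unit
    [[0.0, 1.0, 1.0], [0.5, -1.0, 2.0]],   # order 2: two summing units
]
y = rpnn_output(x, blocks)
```

Only the weights inside the summing units are trainable, which is the single layer of adjustable weights referred to above.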

Dynamic Ridge Polynomial Neural Network (DRPNN)
DRPNN is the recurrent version of RPNN. It uses the network output from the output layer as a feedback connection to the input layer. That means that DRPNN is provided with memory that retains information for later use [16]. According to [16], explicit treatment of dynamics is needed for neural network models due to the fact that the behaviour of some time series is related to past values on which the present values depend. Therefore, DRPNN is more suitable than RPNN for time series forecasting, as found in [9,10,16].
The structure of DRPNN is shown in Fig 2. The difference between DRPNN and RPNN is the additional input node in DRPNN, which is fed by the previous network output value. The output of the DRPNN network is given by:
$$y(t+1) = \sigma\left(\sum_{i=1}^{k} P_i(t+1)\right), \qquad P_i(t+1) = \prod_{j=1}^{i}\left(\sum_{g=1}^{m} w_{gj}\, x_g(t) + w_{(m+1)j}\, y(t) + w_{0j}\right) \qquad (2)$$
where $k$ is the number of PSNN blocks, $\sigma$ is a non-linear transfer function, $m$ is the input vector dimension size, $w$ is a trainable weight (with $w_{0j}$ the bias of summing unit $j$), $x$ is the input and $y(t)$ is the network output at the previous time step.

The Proposed Ridge Polynomial Neural Network with Error Feedback Model
Feedforward and recurrent networks have been used for time series forecasting. Recurrent networks have an advantage over feedforward networks in time series forecasting [9,16]. The reason for this is the behaviour of some time series in which present inputs depend on past inputs. Therefore, an explicit treatment of dynamics is needed, and it is achieved using recurrence. By using recurrent feedback, the network takes advantage of external inputs as well as the entire history of the system inputs [9,16]. Network output and network error are feedback connections that have been used in the literature as an additional input into the network [9,16,18,23], thereby providing an explicit treatment of dynamics.

Feedback Error Learning
Learning from error is not a new concept for neural networks. Backpropagation algorithm (BP) is widely used to train neural networks and is based on the concept of learning from error. BP calculates the difference between the desired output and network output, which is called error, and uses this error to direct training. Error Minimized Extreme Learning Machine (EM-ELM) [30] takes advantage of errors to control the growth of the network. It adds random hidden nodes singly or in a group and incrementally updates the weights to minimize errors in the training set. Another way to take advantage of errors is by using network error as a feedback connection. Different variations of network error have been given by different researchers. In [18,19], the authors calculate error by taking the difference between network output at time t + 1 and the desired value at time t. They used this error with a state space Neural Network (ssNN) for short-term temperature forecasting and with MLP for forecasting hourly energy consumption in buildings. The presented results demonstrated the successful capture of the dynamical behaviour of the models.
Instead of using the difference between network output at time t + 1 and the desired value at time t, the authors in [20][21][22][23] used the difference between desired value and network output at time t + 1. In [20], the authors proposed another way to use network errors. In the beginning, they initialized the training of an MLP network using training samples to find the initial structure for MLP. Then, using the initialized network the absolute forecasting error was calculated for each training sample and stored as an additional input to each sample. This was followed by the addition of an additional layer with the same number of hidden nodes found in the first hidden layer. This new structure was trained using new training samples. This process continues until the testing set's square error becomes less than the specified goal. The experiment proved that their method is better than a traditional MLP. However, adding more hidden layers to a network leads to a large number of free parameters, and results in longer training time and could cause poor generalization.
Another simple use of error was presented in [21][22][23]. Error was calculated and inserted into the Adaptive Neuro-Fuzzy Inference System (ANFIS) [21], Recurrent Nonlinear Autoregressive Moving Average (Recurrent NARMA) [22] and Functional Neural Network (FNN) [23] in the next time step. The objective of using error as an input is to reduce the overall network error.
Adaptive control studies also take advantage of learning from error, which is called feedback error learning (FEL) [31][32][33]. FEL was proposed to establish a computational model of the cerebellum for learning motor control with internal models in the central nervous system [33].
Based on these studies, the main objective of using network error as an additional spatial dimension in the input space is to reduce the overall network error. That means showing a network the difference between the desired output and its output during training could enhance the overall forecasting performance for the network.

Network Structure and Weights Learning
RPNN-EF is constructed from a number of Pi-Sigma units (PSNNs) of increasing order, with the addition of a recurrent connection from the output layer to the input layer. This recurrent connection is fed by the network error, thus allowing the network to see the errors made on previous samples. Assume that $M$ is the number of external inputs $x(t)$ to the network and that $e(t)$ is the network error of RPNN-EF at the previous time step. The overall inputs to the network are the concatenation of $x(t)$ and $e(t)$, and are referred to as $Z(t)$, as shown below:
$$Z_g(t) = \begin{cases} x_g(t), & 1 \le g \le M \\ e(t), & g = M+1 \end{cases} \qquad (3)$$
From Eq (3), the network output at time $t+1$, which is denoted by $y(t+1)$, is calculated as follows:
$$y(t+1) = \sigma\left(\sum_{i=1}^{k-1} \widehat{P}_i(t+1) + P_k(t+1)\right) \qquad (4a)$$
$$P_k(t+1) = \prod_{j=1}^{k} h_j(t+1) \qquad (4b)$$
$$h_j(t+1) = \sum_{g=1}^{M+1} w_{gj}\, Z_g(t) + w_{0j} \qquad (4c)$$
where $k$ is the number of Pi-Sigma blocks (PSNNs), $P_k(t+1)$ is the output at time $t+1$ of the last added PSNN block, $\widehat{P}_i(t+1)$ is $P_i(t+1)$ after its weights are frozen, $\sigma$ is the sigmoid activation function, $h_j(t+1)$ is the net sum of sigma unit $j$ in the last added PSNN block, and $w_{gj}$ are the adjustable weights between inputs and sigma units (with $w_{0j}$ the bias of sigma unit $j$).
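To make the role of the error feedback concrete, the Python sketch below runs a small network over a sequence and appends the previous error to the external inputs at each step. It is a sketch under assumptions: the error is taken as desired value minus network output, the first fed-back error is set to 0.5 (the initial value mentioned in this paper), and the weight layout, function names and numbers are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rpnn_ef_forward(inputs, targets, blocks, e0=0.5):
    """Run the network over a sequence, feeding back e(t) = d(t) - y(t).

    blocks[i] is a list of summing units; each unit is laid out as
    [bias, w_1, ..., w_M, w_error] (assumed layout).
    """
    outputs, e_prev = [], e0
    for x, d in zip(inputs, targets):
        z = [1.0] + list(x) + [e_prev]      # bias input, x(t), fed-back error
        total = 0.0
        for block in blocks:
            total += math.prod(
                sum(w * zg for w, zg in zip(unit, z)) for unit in block
            )
        y = sigmoid(total)
        outputs.append(y)
        e_prev = d - y                      # error used as input at the next step
    return outputs


# One first-order block, one external input, two time steps.
blocks = [[[0.0, 1.0, 1.0]]]
outputs = rpnn_ef_forward([[0.2], [0.2]], [0.6, 0.6], blocks)
```

Although the external input is identical at both steps, the two outputs differ because the second step also sees the error made at the first step.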
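Like RPNN and DRPNN, RPNN-EF uses a constructive learning algorithm based on the asynchronous updating rule of Pi-Sigma units. RPNN-EF adds a Pi-Sigma block of increasing order to its structure when the relative difference between the current and the previous errors is less than a specified threshold value. RPNN-EF updates its weights using the Real Time Recurrent Learning algorithm [34]. A standard error measurement used for training the network is the sum squared error:
$$E(t+1) = \frac{1}{2}\sum e^{2}(t+1) \qquad (5)$$
$$e(t+1) = d(t+1) - y(t+1) \qquad (6)$$
where $d(t+1)$ is the desired output and $y(t+1)$ is the network output as shown in Eq (4a). At every time step, the weight between input $g$ and sigma unit $l$ is updated as follows:
$$\Delta w_{gl} = -\eta\, \frac{\partial E(t+1)}{\partial w_{gl}} \qquad (7)$$
where $\eta$ is the learning rate. The value of $\partial E(t+1)/\partial w_{gl}$ is determined as:
$$\frac{\partial E(t+1)}{\partial w_{gl}} = \frac{\partial E(t+1)}{\partial y(t+1)}\, \frac{\partial y(t+1)}{\partial w_{gl}} \qquad (8)$$
From Eq (5), we have:
$$\frac{\partial E(t+1)}{\partial y(t+1)} = -e(t+1) \qquad (9)$$
From Eqs (4) and (6), we have:
$$\frac{\partial y(t+1)}{\partial w_{gl}} = \frac{\partial y(t+1)}{\partial P_k(t+1)}\, \frac{\partial P_k(t+1)}{\partial w_{gl}} \qquad (10)$$
$$\frac{\partial y(t+1)}{\partial w_{gl}} = y'(t+1) \left(\prod_{j=1,\, j \neq l}^{k} h_j(t+1)\right) \left(Z_g(t) - w_{(M+1)l}\, \frac{\partial y(t)}{\partial w_{gl}}\right) \qquad (11)$$
where $Z_g(t) = 1$ for the bias weight $w_{0l}$. Assuming $D^{E}$ is the dynamic system variable, which is defined as a set of quantities that summarizes all the information about the past behaviour of the system that is needed to uniquely describe its future behaviour [12], $D^{E}$ is:
$$D^{E}_{gl}(t) = \frac{\partial y(t)}{\partial w_{gl}} \qquad (12)$$
Substituting Eq (11) into Eq (12), we have:
$$D^{E}_{gl}(t+1) = y'(t+1) \left(\prod_{j=1,\, j \neq l}^{k} h_j(t+1)\right) \left(Z_g(t) - w_{(M+1)l}\, D^{E}_{gl}(t)\right) \qquad (13)$$
For simplification, the initial values are $D^{E}_{gl}(t) = 0$ and $e(t) = 0.5$, the latter to avoid a zero value of $D^{E}_{gl}(t)$ [9,16]. The weight updating rule is derived by substituting Eqs (9) and (13) into Eq (7), as follows:
$$\Delta w_{gl} = \eta\, e(t+1)\, D^{E}_{gl}(t+1) \qquad (14)$$
Finally,
$$w_{gl}^{new} = w_{gl}^{old} + \Delta w_{gl} \qquad (15)$$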
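A single online update of this kind can be sketched in Python as follows. This is an illustrative reading of the Real Time Recurrent Learning step, not the authors' implementation. It assumes that the error is desired value minus output, that the dynamic variable D approximates the derivative of the output with respect to each weight of the most recently added block, and that frozen blocks contribute a constant `frozen_sum`. All names and numbers are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def rtrl_step(x, e_prev, d, weights, D, eta, frozen_sum=0.0):
    """One online update of the most recently added Pi-Sigma block.

    weights[l] = [bias, w_1, ..., w_M, w_error] for summing unit l.
    D[l][g] approximates dy/dw_gl from the previous step (start at zero).
    Both lists are updated in place. Returns the output y and error d - y.
    Requires Python 3.8+ for math.prod.
    """
    z = [1.0] + list(x) + [e_prev]          # bias input, x(t), fed-back error
    h = [sum(w * zg for w, zg in zip(unit, z)) for unit in weights]
    y = sigmoid(frozen_sum + math.prod(h))
    e = d - y
    dy = y * (1.0 - y)                      # derivative of the sigmoid
    for l, unit in enumerate(weights):
        others = math.prod(h[j] for j in range(len(h)) if j != l)
        w_fb = unit[-1]                     # feedback weight before this update
        for g in range(len(unit)):
            # The fed-back error depends on the weights through the previous
            # output, hence the "- w_fb * D" term (since e = d - y).
            D[l][g] = dy * others * (z[g] - w_fb * D[l][g])
            unit[g] += eta * e * D[l][g]    # gradient descent on 0.5 * e**2
    return y, e


weights = [[0.0, 0.0, 0.0]]                 # one unit: bias, one input, feedback
D = [[0.0, 0.0, 0.0]]
y1, e1 = rtrl_step([0.5], 0.5, 0.8, weights, D, eta=1.0)
after_first = [list(unit) for unit in weights]
y2, e2 = rtrl_step([0.5], e1, 0.8, weights, D, eta=1.0)
```

In this toy run the error shrinks from the first step to the second, which is the behaviour the update is meant to produce on a stationary target.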

Stability issue
The ability to model the behaviour of arbitrary dynamical systems is one of the most useful properties of recurrent networks. The presence of a recurrent connection in RPNN-EF is expected to enhance its forecasting performance. Despite the potential of feedback in RPNN-EF, problems of complexity and difficult training could occur, as found in DRPNN [9]. These problems are summarized in two main points. First, calculating the gradients and updating the weights of a recurrent network is much more difficult than in a feedforward network due to dynamic system variables that affect both the gradient and the output. Second, learning could become unstable because the learning error may not decrease monotonically, causing long convergence times.
In order to tackle these problems, a sufficient condition for the convergence of DRPNN was derived based on the stability theorem for a feedback network proposed by Atiya [35]. The aim of this theorem is to adjust the weights of the network to generate network outputs that are as close as possible to the desired output [9]. However, this solution could be too restrictive where a large network is necessary [35] or when working with constructive learning because it stops training with a small number of hidden units.
To overcome the stability and convergence problems of RPNN-EF, this study uses a sufficient condition based on an approach that uses an adaptive learning rate, developed by introducing a Lyapunov function.
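First, let us define a Lyapunov function as follows:
$$V(t) = \frac{1}{2}\, e^{2}(t) \qquad (16)$$
where $e(t)$ represents the error, which is calculated as the difference between the desired value and the predicted value. We use this error function because the RPNN-EF model is trained to minimize it. According to Eq (16), the change in the Lyapunov function is determined by:
$$\Delta V(t) = V(t+1) - V(t) = \frac{1}{2}\left[e^{2}(t+1) - e^{2}(t)\right] \qquad (17)$$
The error difference can be represented by [36,37]
$$e(t+1) = e(t) + \Delta e(t) \qquad (18)$$
$$e(t+1) \cong e(t) + \left[\frac{\partial e(t)}{\partial w}\right]^{T} \Delta w \qquad (19)$$
where $\Delta w$ represents the weight change. Based on Eqs (13) and (14), we have:
$$\frac{\partial e(t)}{\partial w} = -D^{E}(t) \qquad (20)$$
$$e(t+1) \cong e(t) - \eta\, e(t) \left\| D^{E}(t) \right\|_{F}^{2} \qquad (21)$$
From Eqs (17) and (21), $\Delta V(t)$ is represented as
$$\Delta V(t) = \frac{1}{2}\, e^{2}(t) \left[\left(1 - \eta \left\| D^{E}(t) \right\|_{F}^{2}\right)^{2} - 1\right] \qquad (22)$$
$$\Delta V(t) = -\frac{1}{2}\, \eta\, e^{2}(t) \left\| D^{E}(t) \right\|_{F}^{2} \left(2 - \eta \left\| D^{E}(t) \right\|_{F}^{2}\right) \qquad (23)$$
where $\| \cdot \|_{F}$ is the Frobenius norm, which is calculated using a trace function [38]. A sufficient condition to ensure stability is $\Delta V(t) < 0$. Therefore, Eq (23) leads to:
$$0 < \eta < \frac{2}{\left\| D^{E}(t) \right\|_{F}^{2}} \qquad (24)$$
Eq (24) suggests an upper bound of $\eta$ for a sufficient condition to ensure stability in RPNN-EF.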
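In practice, an upper bound of this kind can be enforced by clipping the learning rate at every step. The Python sketch below assumes the bound takes the form η < 2 / ‖D‖², with ‖D‖² the squared Frobenius norm of the dynamic system variables, which is what a first-order expansion gives for V(t) = ½ e²(t). The function name and the `safety` factor are hypothetical and are not part of this study.

```python
def stable_learning_rate(D, eta, safety=0.5):
    """Return a learning rate that respects 0 < eta < 2 / ||D||_F^2.

    D is a list of rows holding the dynamic system variables.
    `safety` (hypothetical) keeps the rate strictly inside the bound.
    """
    norm_sq = sum(v * v for row in D for v in row)   # squared Frobenius norm
    if norm_sq == 0.0:
        return eta                                   # bound is infinite
    upper = 2.0 / norm_sq
    return min(eta, safety * upper)


small = stable_learning_rate([[0.25, 0.125, 0.125]], eta=0.1)  # bound not active
large = stable_learning_rate([[3.0, 4.0]], eta=0.1)            # bound is 2 / 25
```

The rate is left untouched when it already satisfies the bound and is reduced only when the derivatives become large.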

Constructive Learning Algorithm for the RPNN-EF
The proposed RPNN-EF is trained by the constructive learning algorithm based on the asynchronous update rule for PSNN. That means that the network structure grows from small to large as network learning proceeds until the desired level of specified error is reached. Before presenting the algorithm, we need the following notations: $\epsilon_{threshold}$: threshold Mean Squared Error (MSE) for the training phase; $\epsilon_{c}$, $\epsilon_{p}$: the training MSEs for the current epoch and previous epoch, respectively; $r$: threshold for successive addition of a new PSNN block; $\eta$: initial learning rate; $\delta_{r}$, $\delta_{\eta}$: decreasing factors for $r$ and $\eta$, respectively; $k$: degree of PSNN; $Epoch_{ID}$ and $Epoch_{threshold}$: number of training epochs and maximum number of epochs to finish training, respectively.
The pseudo code used for RPNN-EF to update its weights is as follows:
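A minimal Python sketch of the control flow implied by the notation above is given here. It is not the authors' pseudo code: the callables `train_epoch` and `add_block`, the parameter names and the canned MSE values are hypothetical, and the sketch only shows when a new Pi-Sigma block is added and how r and η are decreased.

```python
def constructive_training(train_epoch, add_block, mse_goal, r, eta,
                          delta_r, delta_eta, max_epochs):
    """Grow the network while training (max_epochs must be at least 1).

    train_epoch(eta) -> training MSE of one epoch (supplied by the caller).
    add_block() freezes the current weights and adds a Pi-Sigma block of
    the next order (supplied by the caller).
    Returns the final order, the number of epochs run and the final eta.
    """
    order = 1
    mse_prev = None
    for epoch in range(1, max_epochs + 1):
        mse_cur = train_epoch(eta)
        if mse_cur <= mse_goal:
            break                                    # error goal reached
        stalled = (mse_prev is not None
                   and abs(mse_cur - mse_prev) / mse_prev < r)
        if stalled:
            add_block()                              # network grows by one block
            order += 1
            r *= delta_r                             # tighten the threshold
            eta *= delta_eta                         # reduce the learning rate
        mse_prev = mse_cur
    return order, epoch, eta


# Canned MSE values standing in for real training epochs.
fake_mse = iter([0.50, 0.30, 0.299, 0.20, 0.1999, 0.05])
added = []
order, epochs, eta = constructive_training(
    train_epoch=lambda lr: next(fake_mse),
    add_block=lambda: added.append(1),
    mse_goal=0.1, r=0.01, eta=0.1, delta_r=0.1, delta_eta=0.5, max_epochs=10,
)
```

In this toy run a block is added each time the relative change in MSE falls below r, and training stops once the MSE goal is met.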

Experimental Design
This section provides a step-by-step methodology describing the design of neural networks to forecast time series.

Time Series used in the Experiments
Four time series were used in this study, namely star brightness (StarBrightness), monthly smoothed sunspot numbers (Sunspot), daily Euro/Dollar exchange rate (EUR/USD), and Mackey-Glass time-delay differential equation (Mackey-Glass).
Star brightness was recorded for 600 successive nights at midnight. This series was scaled by a factor of 1/30. This series was obtained from [39]. The monthly smoothed sunspot time series was downloaded from [40]. The sunspot time series is regarded as a chaotic system with noise and is sensitive to initial conditions [41]. A sub-series of the sunspot time series from November 1834 to June 2001, consisting of 2,000 months, was selected. This interval was also selected by other researchers [42,43]. The third series is the daily Euro/Dollar (EUR/USD) exchange rate. The data set contains 781 observations covering the period from January 3, 2005 to December 31, 2007 [44]. The data was collected from [45,46]. The last time series was generated from the Mackey-Glass time-delay differential equation, which is defined as follows:
$$\frac{dx(t)}{dt} = \beta\, x(t) + \frac{\alpha\, x(t-\tau)}{1 + x^{10}(t-\tau)} \qquad (25)$$
where $t$ is a variable, $x$ is a function of $t$, and $\tau$ is the time delay. The parameter values are $\alpha = 0.2$, $\beta = -0.1$ and $\tau = 17$, with the initial value $x(0) = 1.2$. It is known that with this setting the series shows chaotic behaviour. From the generated time series, 1000 data points were extracted as explained in [47]. This series can be found in the file mgdata.dat in MATLAB [47] or at https://raw.githubusercontent.com/dodikk/neuro-mut/master/src/NetworkConverter/Samples/mgdata.dat. The settings used in this study for these series are shown in Table 1. These settings were also used in the studies of [42][43][44][48][49][50][51][52][53][54][55][56][57][58][59]. The intervals used for the training and out-of-sample sets are shown in Figs 4-7.
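For readers who want to generate a comparable series themselves, the Python sketch below integrates the standard Mackey-Glass equation with the parameter values given above. It is only an approximation of the data used in this study: the forward Euler scheme, the step size `dt` and the constant history assumed for t ≤ 0 are choices made for this example, so the values will not match mgdata.dat exactly.

```python
def mackey_glass(n_points, alpha=0.2, beta=-0.1, tau=17.0, x0=1.2, dt=0.1):
    """Integrate dx/dt = beta*x(t) + alpha*x(t-tau) / (1 + x(t-tau)**10).

    Uses forward Euler and returns samples at t = 0, 1, ..., n_points - 1.
    """
    steps_per_unit = round(1.0 / dt)
    delay_steps = round(tau / dt)
    history = [x0] * (delay_steps + 1)       # assumed: x(t) = x0 for t <= 0
    samples = [x0]
    for step in range(1, (n_points - 1) * steps_per_unit + 1):
        x_now = history[-1]
        x_delayed = history[-1 - delay_steps]
        dx = beta * x_now + alpha * x_delayed / (1.0 + x_delayed ** 10)
        history.append(x_now + dt * dx)
        if step % steps_per_unit == 0:       # keep one sample per time unit
            samples.append(history[-1])
    return samples


mg = mackey_glass(300)
```

A higher-order integrator such as fourth-order Runge-Kutta would give a more accurate trajectory; Euler is used here only to keep the sketch short.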

[Table 1. Time series settings; columns: input-output data pairs, number of training samples, number of out-of-sample samples.]

Network Topology and Training
Network model topology describes the architecture of the network models and the way that the network is organized. The selected network topology was directly trained on the training set and tested on the out-of-sample set. Training was performed by repeatedly showing the network examples of inputs, paired with the desired output. During training, the difference between the desired and actual outputs was computed in order to update network weights. The network topology and training parameters that were used in this study are shown in Table 2. Most of the settings were either based on previous works found in the literature [9,10,16] or chosen by trial and error.
Since this study used the sigmoid transfer function, similar to [9,16], the data was scaled to the range [0.2, 0.8]. This is to avoid getting network outputs too close to the two endpoints of the sigmoid function [10]. The equation to scale the data is given by:
$$\hat{x} = \left(max_{new} - min_{new}\right) \frac{x - min_{old}}{max_{old} - min_{old}} + min_{new} \qquad (26)$$
where $\hat{x}$ refers to the normalized value, $x$ refers to the observation value, and $min_{old}$ and $max_{old}$ are the minimum and maximum values of all observations, respectively. $min_{new}$ and $max_{new}$ refer to the minimum and maximum of the new scaled series.
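The scaling step and its inverse, which is needed to de-normalize forecasts before computing errors, can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the standard min-max formula with the target range [0.2, 0.8] stated above.

```python
def scale(values, new_min=0.2, new_max=0.8):
    """Min-max scale values into [new_min, new_max].

    Also returns the original minimum and maximum so the scaling can be undone.
    """
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    scaled = [(new_max - new_min) * (v - old_min) / span + new_min
              for v in values]
    return scaled, old_min, old_max


def unscale(scaled, old_min, old_max, new_min=0.2, new_max=0.8):
    """Map scaled values back to the original range (de-normalization)."""
    return [(s - new_min) * (old_max - old_min) / (new_max - new_min) + old_min
            for s in scaled]


scaled, lo, hi = scale([10.0, 20.0, 30.0])
restored = unscale(scaled, lo, hi)
```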

Performance Metrics
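In this study, network performance was evaluated using commonly used metrics for time series forecasting, namely Root Mean Squared Error (RMSE), Normalized Mean Squared Error (NMSE), Mean Absolute Error (MAE) and Signal to Noise Ratio (SNR). This study carried out t-tests with a significance level of 0.05 to highlight significant performance. The equations for these metrics are given by:
Root Mean Squared Error (RMSE):
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}} \qquad (27)$$
Normalized Mean Squared Error (NMSE):
$$NMSE = \frac{1}{N\sigma^{2}}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2} \qquad (28)$$
$$\sigma^{2} = \frac{1}{N-1}\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2} \qquad (29)$$
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i \qquad (30)$$
Mean Absolute Error (MAE):
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right| \qquad (31)$$
Signal to Noise Ratio (SNR):
$$SNR = 10 \log_{10}(Sigma) \qquad (32)$$
$$Sigma = \frac{m^{2} N}{SSE} \qquad (33)$$
$$SSE = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^{2}, \qquad m = \max(y) \qquad (34)$$
where $N$, $y$ and $\hat{y}$ represent the number of out-of-sample data, actual output and network output, respectively.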
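The four metrics can be computed together as in the Python sketch below. It assumes the common definitions in which NMSE is normalized by the sample variance of the actual values and SNR is based on the squared maximum of the actual values divided by the sum of squared errors. The function name and the toy data are hypothetical.

```python
import math


def forecast_metrics(actual, predicted):
    """Return RMSE, NMSE, MAE and SNR (in dB) for two equal-length sequences.

    Assumes at least two points, non-constant actual values and a non-zero
    sum of squared errors.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean = sum(actual) / n
    variance = sum((a - mean) ** 2 for a in actual) / (n - 1)
    return {
        "RMSE": math.sqrt(sse / n),
        "NMSE": sse / (n * variance),
        "MAE": sum(abs(e) for e in errors) / n,
        "SNR": 10.0 * math.log10(max(actual) ** 2 * n / sse),
    }


# Toy values used only to exercise the function.
metrics = forecast_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
```

Lower RMSE, NMSE and MAE indicate better forecasts, whereas a higher SNR is better.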

Results and Discussions
In this section, the simulation results for the forecasting of star brightness, monthly smoothed sunspot numbers, daily Euro/Dollar (EUR/USD) exchange rate, and Mackey-Glass time-delay differential equation are presented.

Best Average Simulation Results
Since network training is significantly influenced by its initial internal state, which involves different initial learning parameters and different sets of random weights, 30 independent simulations were performed for each neural network and their results averaged in order to obtain fair and more robust comparative evaluations. The average performance of the various neural network architectures is shown in Tables 3-6. The results shown in these tables are the de-normalized results. That is, we de-normalized the forecasted value and compared it with the original desired value. As seen from the results for the four metrics, the forecasting performance of the feedforward RPNN network is significantly better than that of the two recurrent networks for one-step ahead forecasting on the short time series (StarBrightness) (i.e., the symbol R is inside the cells that relate to DRPNN and RPNN-EF for the StarBrightness series). Similar results were found for one-step ahead forecasting with short time series in [60,61]. The two recurrent networks DRPNN and RPNN-EF are significantly better than the feedforward RPNN network for one-step ahead forecasting on the long time series (Sunspot) and for multi-step ahead forecasting on the EUR/USD and Mackey-Glass time series. This means that the memory built during training in the recurrent networks improves forecasting performance for one-step ahead forecasting on long series and for multi-step ahead forecasting. Results in Tables 3-6 show that the average improvements of RPNN-EF in all metrics with respect to RPNN are more than twice those with respect to DRPNN. These findings indicate that explicit treatment of dynamics helps to improve forecasting performance. Furthermore, using network error feedback as an input helps reduce the overall network error and improve forecasting performance.
As seen in the average improvements for the four metrics, the highest average improvement for RPNN-EF is with the NMSE metric, while the lowest average improvement is with the SNR metric. This is because the variables used in the equations for these metrics depend on the time series itself, for example the variance used with NMSE and the maximum value used with SNR. Therefore, these variables increase or decrease metric values and average improvements.
Overall, RPNN-EF outperforms other RPNN based models, which is seen in average improvements across all metrics. This performance was achieved with network orders equal to or less than that of other RPNN based models as shown in

Best Single Simulation Results
This section shows the results for the best single simulation achieved for each network model. As shown in Table 7, RPNN-EF has the smallest RMSE and NMSE values for all time series except for StarBrightness. Its performance for the StarBrightness time series is still acceptable. The forecast values were plotted with respect to observed values, as shown in Figs 9-12, which show a very strong relationship between forecasted and observed values for the Sunspot and Mackey-Glass time series. This is because the NMSE for these time series is very small when compared to other time series.
The best forecasts for RPNN-EF using out-of-sample data are shown in Figs 13-16. These figures indicate that the RPNN-EF model can follow the dynamic behaviour of the time series. The histograms of the forecasting errors of the best simulations for the time series are shown in Figs 17-20. All histograms indicate that the error distribution closely resembles a symmetric Gaussian distribution. Most of the errors are close to zero. This means that RPNN-EF is able to extract information from a time series.

Comparison of the Performance of Various Existing Models
In this section, we compare our results with other models in the literature. Based on our search results, this study did not find studies that use the same normalization range as the present study. For a fair comparison with recent studies, this study compared the de-normalized results for the RPNN-EF model with de-normalized published results in the literature or with studies that did not use any normalization method. Tables 8 and 9 show the comparison results for generalization capabilities using different methods for the Sunspot and Mackey-Glass time series, respectively. Generalization capabilities were measured by applying each model to forecast out-of-sample data. As was observed, RPNN-EF alone outperforms many hybrid methods. Therefore, hybridizing RPNN-EF with other models could produce higher forecasting accuracy.
For all time series, the learning curves for RPNN-EF are remarkably stable, and RMSE is reduced every time a Pi-Sigma block is added to the network. Each spike shown in Figs 21-24 comes from the introduction of a new Pi-Sigma block to RPNN-EF, except for a spike in the Mackey-Glass time series at epoch 54, which is due to an increase in RMSE. Such increases rarely occur under the sufficient condition and are attributed to the small values of the input signal, as found in [62].

Conclusions and future work
In this study, a new approach called Ridge Polynomial Neural Network with Error Feedback (RPNN-EF) for time series forecasting was proposed. The goal of this study is to contribute a new approach for time series forecasting that takes advantage of higher order terms, recurrence, and error feedback. This study demonstrated the effectiveness of the proposed model by testing it on four time series for one-step and multi-step ahead forecasting. This study compared RPNN-EF with the feedforward Ridge Polynomial Neural Network (RPNN) and the Dynamic Ridge Polynomial Neural Network (DRPNN). The results of the study are summarized as follows:
• Recurrent networks are more suitable than feedforward networks for multi-step ahead forecasting.
• For one-step ahead forecasting with long training data, recurrent networks are better than feedforward networks because the dynamics of the time series are captured and saved in the recurrent network's memory. For short training data, the dynamics of the time series are not captured well, possibly due to the short length of the training samples.
• Using network error feedback as an input helps reduce the overall network error more often than network output feedback, thus improving forecasting performance. Showing a network the difference between the desired output and its real output, which is known as error, during training helps enhance the overall forecasting performance of the network.
• Although RPNN-EF has the highest average performance, it uses network orders equal to or smaller than other RPNN models.
• The sufficient condition to ensure RPNN-EF stability helps RPNN-EF to become stable in most cases.
With regard to model development, the following can be considered for future investigation:
• Applying the proposed model to more time series of different lengths, for one-step and multi-step ahead forecasting, to further validate its forecasting performance.
• This study focuses on univariate time series, which is data from a single time series. The ever more global nature of some series such as the world financial markets necessitates the inclusion of more global knowledge into neural network design. Multivariate series can look at the interdependence between several time series. Therefore, the use of multivariate series would be advantageous, since some markets depend on other global markets and the inclusion of these series will potentially improve neural network forecasting performance.
• Like other ridge polynomial based models, the main difficulty of using RPNN-EF is finding suitable values for its parameters. With respect to this deficiency, it might be worthwhile to consider how evolutionary and swarm intelligence techniques can be used to automatically generate suitable parameters for the network. Furthermore, these techniques can be used to optimize network weights.
• To increase the reliability of forecasting, an ensemble system that uses RPNN-EF with other techniques can be proposed.
• The use of error feedback recurrence with other neural network models and an evaluation of their forecasting performance.