
A two-stage forecasting model using random forest subset-based feature selection and BiGRU with attention mechanism: Application to stock indices

  • Shafiqah Azman,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – original draft

    Affiliation Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia

  • Dharini Pathmanathan ,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision, Writing – review & editing

    dharini@um.edu.my

    Affiliations Institute of Mathematical Sciences, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia, Universiti Malaya Centre for Data Analytics, Universiti Malaya, Kuala Lumpur, Malaysia, Center of Research for Statistical Modelling and Methodology, Faculty of Science, Universiti Malaya, Kuala Lumpur, Malaysia

  • Vimala Balakrishnan

    Roles Resources, Writing – review & editing

    Affiliations Department of Information Systems, Faculty of Computer Science and Information Technology, Universiti Malaya, Kuala Lumpur, Malaysia, College of Informatics, Korea University, Seoul, Republic of Korea.

Abstract

The heteroscedastic and volatile characteristics of stock price data have attracted the interest of researchers from various disciplines, particularly in the realm of price forecasting. The stock market’s non-stationary and volatile nature, driven by complex interrelationships among financial assets, economic developments, and market participants, poses significant challenges for accurate forecasting. This research aims to develop a robust forecasting model to improve the accuracy and reliability of stock price predictions using machine learning. A two-stage forecasting model is introduced. First, a random forest subset-based (RFS) feature selection with repeated k-fold cross-validation selects the best subset of features from seven candidate predictors: highest price, lowest price, closing price, volume, change, price change ratio, and amplitude. These features are then used as input to a bidirectional gated recurrent unit with an attention mechanism (BiGRU-AM) model to forecast the daily opening prices of ten stock indices. The proposed model exhibits superior forecasting performance across the ten stock indices when compared to twelve benchmarks, evaluated using root mean squared error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The improved prediction accuracy enables financial professionals to make more reliable investment decisions, reducing risks and increasing profits.

1. Introduction

Analyzing stock market data is challenging due to non-stationarity, dynamic volatility, and the complex interrelationships between financial assets, economic developments, and market participants, which contribute to significant noise [1,2]. Reliable prediction of stock indices assists investors in maximizing profit while minimizing risk. Researchers often encounter difficulties when using traditional statistical models for forecasting due to the complexity, nonlinearity, and non-stationarity of stock market data [3]. The main challenge in predicting these kinds of data lies in effectively identifying the important features from a wide range of possible predictors, including historical prices, trading volumes, macroeconomic indicators, and market sentiment data. Without proper feature selection, forecasting models may suffer from high dimensionality, leading to overfitting, increased computational complexity, and reduced predictive accuracy [4].

1.1. Statistical time series models

Statistical time series models such as the autoregressive integrated moving average (ARIMA) model [5] and the generalized autoregressive conditional heteroscedasticity (GARCH) model [6] as an extension of Engle’s autoregressive conditional heteroscedasticity (ARCH) model [7], serve as important models in the analysis and forecasting of financial data. ARIMA models are designed to analyze and predict future points in a series by capturing trends, seasonality, and cyclical patterns through a combination of autoregressive, integrated, and moving average components. This makes ARIMA suitable for univariate time series data where past values and errors help predict future values. On the other hand, GARCH models specialize in modeling and forecasting volatility, addressing the heteroscedasticity present in financial time series data. By accounting for volatility clustering, where periods of high volatility tend to cluster together, GARCH models provide a robust framework for understanding and predicting the variability in stock prices. To enhance the capability of these models, hybrid statistical models with deep learning were introduced [8,9].

However, these methods are often not robust enough for dynamic financial markets due to the presence of significant noise caused by factors like economic conditions, political conflicts, large investors’ monopolization, and unpredictable trading patterns [10]. This highlights the need for advanced models capable of capturing the evolving characteristics of financial data.

1.2. Random forest for feature selection

To improve forecasting accuracy, effective feature selection is crucial. By selecting the important features and removing redundant ones, overfitting can be prevented, thus enhancing the efficiency of the forecasting model [4]. Random forest (RF) [11] is a powerful machine learning method for classification and regression problems. It is an ensemble-based algorithm constructed from decision tree predictors. The RF method leverages the law of large numbers, thereby effectively mitigating the issue of overfitting. In a regression context for feature selection, RF predictions equate to the average of predictions made by all individual decision trees. The advantage of RF for regression-based feature selection lies in its capability to effectively reduce computational complexity while capturing and modeling nonlinear and complex dependencies among features, thereby improving overall predictive performance [12,13].

In the study of feature selection for stock price prediction, RF is more commonly used as a classification model, and there is limited research on its use as a regression model. In this review, only studies where the RF regression model is employed are considered. RF with leave-one-out cross-validation was used in [14] to select 42 microeconomic variables before predicting stock prices with the LSTM model. In their study, they found that using RF for feature selection improved the model’s predictive accuracy, as it focuses on the most relevant variables, thus enhancing the overall performance of the LSTM model. RF is also used as a feature selection technique in [15] for predicting the Chinese stock market. In terms of feature selection based on importance, these studies are similar. They found that RF effectively identified key predictors from a vast dataset, which improved the accuracy of their stock market predictions. A more recent review study [16] emphasized RF’s effectiveness in handling large and noisy financial datasets, proving its importance in feature selection and predictive analysis for stock market data.

1.3. Deep learning forecasting models

Deep learning models are increasingly used for forecasting financial time series data due to their ability to address the shortcomings of traditional statistical models. Recurrent neural networks (RNNs) are particularly useful for capturing dynamic temporal features in stock market data, aiding in self-learning and sequential data analysis [17]. However, RNNs suffer from the vanishing gradient problem with long-sequenced data. The vanishing gradient problem in gradient-based training occurs when the gradients of network weights become extremely small and eventually approach zero, making it challenging for the model to update its weights effectively [18].

To address this, the long short-term memory (LSTM) model [19] and the gated recurrent unit (GRU) model [20] were developed. Numerous forecasting models based on LSTM and GRU architectures have been proposed, including a model combining LSTM with a convolutional neural network (CNN) [21] and the hybrid LSTM-GRU model [22]. A review study [23] highlighted researchers’ preference for LSTM in financial time series forecasting; however, more recent studies [24,25] found that GRU outperforms LSTM in overall performance, particularly on smaller datasets.

1.4. Bidirectional Gated Recurrent Unit (BiGRU), Attention Mechanism (AM) and BiGRU-AM

Bidirectional GRU models have shown significant potential in forecasting financial data by effectively utilizing past and future information to improve prediction accuracy and prevent information loss [26,27]. This allows the model to capture temporal dependencies throughout the sequence and, in turn, outperform its unidirectional GRU counterpart by ensuring a more comprehensive understanding of the temporal relationships within the data [28]. Recent works that successfully incorporated BiGRU to forecast stock price indices include the models by [29,30].

The attention mechanism (AM) is modelled after how the human brain selectively focuses on essential information. Its underlying concept is to prioritize essential data while minimizing less important aspects. AM improves processing efficiency by dynamically shifting focus to increase sensitivity to critical inputs [31]. The implementation of AM in deep learning was first proposed by [32] for language translation. AM assigns higher weights to specific input components before mapping them to the final output, and has since become widely used in natural language processing (NLP). The implementation of AM for time series data was first introduced by [33], which assigns importance to certain input features from the past and future, allowing different attention weights for critical temporal features at different time steps. AM has since been widely used in deep learning models for predicting time series data [34–36].

The BiGRU-AM hybrid model has been successfully applied to the classification of HTTPS traffic [37], human emotions from EEG signals [38], and air target tactical intention recognition [39]. However, it has never been used in the setting of regression problems nor on financial time series data.

1.5. Contributions of the study

This study proposes a two-stage forecasting model for financial data with the following key contributions:

  • A random forest subset-based feature selection method is developed, combined with repeated k-fold cross-validation, to identify the best subset from the seven candidate predictor features.
  • The implementation of BiGRU-AM model in a regression setting is introduced for the first time, specifically for predicting stock prices.
  • A comprehensive comparative study was conducted to evaluate the performance of the proposed two-stage forecasting model against twelve benchmark models, demonstrating the effectiveness of the proposed approach in predicting the daily opening prices of ten stock indices from January 2000 to February 2022.

The remainder of this paper is organized as follows: Section 2 describes the data used and introduces the architecture of the model proposed; Section 3 reviews the results and discussion; and Section 4 concludes.

2. Methodology

2.1. Data description

To evaluate the performance of the proposed model, ten major stock indices were chosen [40], which are Dow Jones, Nasdaq, Nikkei 225, FTSE 100, S&P 500, CAC 40, IPC, DAX, AEX, and BEL 20. The datasets range from January 2000 to February 2022 and are made up of consecutive trading days (excluding weekends and public holidays). Table 1 shows the country, period (depending on data availability), and length of each dataset. The historical data for the stock indices is retrieved from Yahoo! Finance (accessed on 31 March 2022).

Table 1. Country-specific stock indices and periods observed for model evaluation.

https://doi.org/10.1371/journal.pone.0323015.t001

Table 2 lists the eight features chosen from [41] for inclusion in this study’s multivariate analysis. The opening price (F1), highest price (F2), lowest price (F3), closing price (F4), and volume (F5) can all be imported directly from Yahoo! Finance, while the change (F6), price change ratio (F7), and amplitude (F8) are calculated from the imported series, as defined in Table 2.

Table 2. Features investigated for multivariate forecasting.

https://doi.org/10.1371/journal.pone.0323015.t002
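For illustration, the derived features can be computed from the imported series as sketched below. The exact formulas here (change relative to the previous close) are common conventions and an assumption on our part; the authoritative definitions are those in Table 2.

```python
import pandas as pd

# Toy OHLC data standing in for an imported Yahoo! Finance series.
df = pd.DataFrame({
    "Open":  [100.0, 102.0, 101.5],
    "High":  [103.0, 104.0, 102.5],
    "Low":   [ 99.0, 101.0, 100.0],
    "Close": [102.0, 101.0, 102.0],
})

prev_close = df["Close"].shift(1)
df["Change"] = df["Close"] - prev_close                  # F6 (assumed definition)
df["PriceChangeRatio"] = df["Change"] / prev_close       # F7: normalized change
df["Amplitude"] = (df["High"] - df["Low"]) / prev_close  # F8: intraday range, normalized
```

The first row has no previous close, so the three derived columns are NaN there and that row would be dropped before training.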

The stock indices data are divided into three sets: the first 80% forms the training set, the following 10% the validation set, and the remaining 10% the test set.
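Because the data are ordered in time, the split is chronological rather than random. A minimal helper (the name `chronological_split` is illustrative) looks like:

```python
def chronological_split(series, train=0.8, val=0.1):
    """Split an ordered series into train/validation/test sets, preserving time order."""
    n = len(series)
    n_train = int(n * train)
    n_val = int(n * val)
    return (series[:n_train],                      # first 80%: training set
            series[n_train:n_train + n_val],       # next 10%: validation set
            series[n_train + n_val:])              # final 10%: test set
```

Keeping the test set strictly after the training period avoids look-ahead leakage when evaluating forecasts.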

2.2. Stage 1: Random Forest Subset-based (RFS) for Feature Selection

In the first stage, the Random Forest subset-based (RFS) feature selection method identifies the best subset of features for predicting stock index prices by generating all possible feature subsets, using the RF regression model to evaluate each subset’s prediction accuracy through repeated k-fold cross-validation, and selecting the subset with the lowest Mean Absolute Error (MAE).

2.2.1. Random Forest (RF) for regression.

For RF in a regression setting, the input vector X = (X_1, …, X_p) consists of p independent variables, and the output Y is numerical. The training data is drawn from the joint distribution of (X, Y). An example of an RF regression model with B decision trees is shown in Fig 1.

Fig 1. Random Forest regression model with B decision trees.

https://doi.org/10.1371/journal.pone.0323015.g001

The RF regression procedure is as follows.

For b = 1 to B (the number of trees in the forest):

  1. Draw a bootstrap sample D_b from the training data.
  2. Train a decision tree T_b on D_b.
  3. At each node, select the best split from a random subset of m features.
  4. The final prediction for a new input x is the average of the predictions from all B individual trees: ŷ(x) = (1/B) Σ_{b=1}^{B} T_b(x).

Detailed derivations and proofs can be found in [11].
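The four steps above can be sketched with scikit-learn’s decision trees. This is an illustrative reimplementation on toy data, not the authors’ code; in practice `RandomForestRegressor` wraps this whole loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))                         # toy training data
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=200)

B = 50                                                 # number of trees
trees = []
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))         # step 1: bootstrap sample D_b
    tree = DecisionTreeRegressor(max_features="sqrt",  # step 3: random feature subset per split
                                 random_state=b)
    tree.fit(X[idx], y[idx])                           # step 2: train tree T_b on D_b
    trees.append(tree)

x_new = np.array([[0.2, 0.4, 0.9]])
y_hat = np.mean([t.predict(x_new)[0] for t in trees])  # step 4: average the tree predictions
```

Averaging the bootstrapped trees is what smooths out individual-tree variance and mitigates overfitting.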

2.2.2. Random Forest Subset-based (RFS) with Repeated k-fold Cross-Validation.

The procedure for the RFS feature selection is as follows:

  1. Generating Feature Subsets: Create all possible non-empty subsets S_1, …, S_{2^7 − 1} of the input features, such that S_j ⊆ {F2, …, F8}. In this study, the input features are features F2–F8 from Section 2.1, while feature F1 is the output feature.
  2. Repeated k-Fold Cross-Validation: Divide the data into k folds and use k − 1 of the folds as training data. Evaluate each subset of features on the held-out fold, and repeat the whole procedure r times to ensure reliable performance estimates.
  3. Selecting the Optimal Subset: Choose the subset of features with the lowest average MAE.

The RFS feature selection technique performs an exhaustive search, selecting the best subset of features from all possible combinations and hence reducing the likelihood of overfitting. Repeating the k-fold cross-validation r times mitigates the issue of a model being favored due to a specific data partitioning, ensuring more reliable performance estimates [42]. This methodology offers a reliable and more robust feature selection procedure that is essential for enhancing the prediction accuracy of the BiGRU-AM model.

A more detailed flowchart of the procedure is shown in Fig 2 and the pseudocode algorithm for the proposed RFS feature selection is provided as Algorithm 1.
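The procedure can be sketched in Python with scikit-learn. This is an illustrative reimplementation under assumed defaults (helper name `rfs_select` and hyperparameter values are ours), not the authors’ Algorithm 1.

```python
from itertools import combinations

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

def rfs_select(X, y, feature_names, k=5, repeats=3, n_trees=100, seed=0):
    """Exhaustive subset search scored by repeated k-fold CV on MAE."""
    cv = RepeatedKFold(n_splits=k, n_repeats=repeats, random_state=seed)
    best_subset, best_mae = None, np.inf
    p = X.shape[1]
    for size in range(1, p + 1):                  # all 2^p - 1 non-empty subsets
        for subset in combinations(range(p), size):
            rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
            mae = -cross_val_score(rf, X[:, subset], y, cv=cv,
                                   scoring="neg_mean_absolute_error").mean()
            if mae < best_mae:                    # keep the lowest average MAE
                best_mae, best_subset = mae, subset
    return [feature_names[i] for i in best_subset], best_mae
```

With the seven candidate predictors of this study, the loop evaluates 2^7 − 1 = 127 subsets, each scored k × r times.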

Fig 2. Flowchart of RFS feature selection with k folds and r repeats.

https://doi.org/10.1371/journal.pone.0323015.g002

2.3. Stage 2: The Bidirectional GRU with Attention Mechanism (BiGRU-AM) Model for Stock Price Forecasting

In the second stage, the BiGRU-AM model is introduced in a regression setting for stock price forecasting, using the best subset of features selected in the first stage.

2.3.1. Gated Recurrent Unit (GRU) and Bidirectional GRU.

The GRU architecture is made up of two gates: the update gate and the reset gate. The update gate determines the importance of the previous time step’s information and controls how much of the received information is passed on to future steps. The reset gate, on the other hand, governs how much past information is erased [20].

The computations for the update gate z_t, the reset gate r_t, the candidate hidden state and the current memory are as follows:

z_t = σ(W_z x_t + U_z h_{t−1} + b_z)
r_t = σ(W_r x_t + U_r h_{t−1} + b_r)
h̃_t = tanh(W_h x_t + U_h (r_t * h_{t−1}) + b_h)
h_t = (1 − z_t) * h_{t−1} + z_t * h̃_t

where x_t is the new input, h_{t−1} is the previous memory, h_t is the current memory and h̃_t is the candidate hidden state for time t. W_z, W_r, W_h, U_z, U_r, U_h are the weight matrices and b_z, b_r, b_h are the bias vectors. * denotes the Hadamard product and σ denotes the sigmoid activation function.

A bidirectional structure of GRU, specifically named Bidirectional GRU (BiGRU) processes data in two directions: forward and backward. This dual processing enables the model to fully extract relevant information from both the front and back of the sequence data, enhancing the prediction performance by considering the entire context of the data. This architecture enables the model to capture complex temporal dependencies by considering the entire sequence during training [43]. However, while the model is trained on complete sequences, during actual forecast of the time series data, the model utilizes only past data up to the current time point. The backward processing during training serves to enhance the learning of temporal patterns, leading to improved predictive performance.
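A single GRU update can be illustrated with a minimal numpy sketch following the standard Cho et al. gate formulation; the dimensions and random initialization below are arbitrary.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update: update gate z, reset gate r, candidate state, new memory."""
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])    # update gate z_t
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])    # reset gate r_t
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand                     # current memory h_t

rng = np.random.default_rng(0)
d, u = 4, 8  # input size and hidden size
p = {name: rng.normal(scale=0.1, size=shape) for name, shape in {
    "Wz": (u, d), "Uz": (u, u), "bz": (u,),
    "Wr": (u, d), "Ur": (u, u), "br": (u,),
    "Wh": (u, d), "Uh": (u, u), "bh": (u,)}.items()}
h = gru_step(rng.normal(size=d), np.zeros(u), p)
```

A BiGRU runs two such recurrences, one over the sequence forward and one backward, and concatenates the two hidden states at each time step.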

2.3.2 Attention Mechanism (AM).

As introduced by [33], the weighted feature x̃_t replaces the original input x_t as the input of the neural network for time series data. Specifically, the attention weight for the k-th input feature at time t can be computed as:

α_t^k = exp(e_t^k) / Σ_{i=1}^{n} exp(e_t^i),   x̃_t = (α_t^1 x_t^1, …, α_t^n x_t^n)

where the attention score e_t^k is computed based on the input x_t^k, the previous state and the previous attention weights for time t and feature k. Details on how the mechanism of AM works can be found in [33].
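A simplified numpy sketch of attention weighting is given below: scores are softmax-normalized over time steps and used to form a weighted context vector. This is a generic illustration of the idea, not the exact scheme of [33]; the score vector `v` stands in for learned parameters.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def attention_pool(H, v):
    """Weight each time step of H (T x d hidden states) by a score vector v (d,)."""
    scores = H @ v            # one scalar score per time step
    alpha = softmax(scores)   # attention weights, non-negative and summing to 1
    context = alpha @ H       # weighted combination of hidden states
    return context, alpha

rng = np.random.default_rng(1)
H = rng.normal(size=(10, 6))  # e.g. BiGRU outputs for 10 time steps
v = rng.normal(size=6)
context, alpha = attention_pool(H, v)
```

Time steps with larger scores contribute more to the context vector, which is what lets the model emphasize critical temporal information.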

2.4 Overall Model Architecture

The overall flow of the proposed model, which combines a first stage of RFS feature selection and a second stage of BiGRU-AM model forecasting, is shown in Fig 3. In the first stage, the features (F2–F8) are processed as the input variables to the RFS feature selection method discussed in Section 2.2, with the opening price (F1) as the target variable. The output of the RFS feature selection method is the best subset of predictor features for each stock index, selected based on the lowest MAE. In the second stage, the output from the first stage is used as the input variables for the BiGRU-AM model to forecast the target variable, F1. Specifically, the data is processed through two layers of GRU in opposite directions, and the resulting data is then processed through a layer of AM as the final step.
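Assuming a Keras-style implementation (the paper does not specify a framework, and the layer choices below are ours), the second stage — two directional GRU layers followed by an attention layer and a regression head — can be sketched as:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import Model, layers

def build_bigru_am(window, n_features, units=64):
    """Hypothetical BiGRU-AM regressor: BiGRU over the window, attention pooling, linear head."""
    inp = layers.Input(shape=(window, n_features))
    h = layers.Bidirectional(layers.GRU(units, return_sequences=True))(inp)
    scores = layers.Dense(1, activation="tanh")(h)   # one score per time step
    alpha = layers.Softmax(axis=1)(scores)           # attention weights over time
    context = layers.Lambda(
        lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alpha])
    out = layers.Dense(1)(context)                   # forecasted opening price F1
    return Model(inp, out)

model = build_bigru_am(window=10, n_features=4)
y = model.predict(np.zeros((2, 10, 4), dtype="float32"), verbose=0)
```

The model would be trained on sliding windows of the RFS-selected features against the next day’s opening price.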

2.5 Model Evaluation

The root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination, R-squared (R²), were used to evaluate the accuracy of the output from the second stage, the forecasted opening prices of the stock indices. The best model is chosen based on the lowest RMSE and MAE values, as well as the highest R². These metrics are computed as follows:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )
MAE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i|
R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²

where n is the total number of data points; y_i is the actual opening price; ȳ is the mean of the actual opening prices and ŷ_i is the forecasted opening price from the model.
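The three metrics are a direct numpy transcription:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat))

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - residual SS / total SS."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

RMSE penalizes large errors more heavily than MAE, which is why both are reported alongside R².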

Table 3 presents the proposed model (M13) alongside twelve benchmark models. The benchmark models are categorized as follows:

  • M1–M4 (Univariate models): These models use only the historical values of stock indices’ opening prices as the feature.
  • M5–M8 (Multivariate models): These models use all available features for forecasting.
  • M9–M12 (RFS-selected features): These models use features selected through the RFS (random forest subset-based) feature selection method.

Each type of model—GRU, LSTM, SVM, and MLP—has three setups:

  • Zero predictors (Univariate): M1 (GRU), M2 (LSTM), M3 (SVM), M4 (MLP)
  • All predictors (Multivariate): M5 (GRU), M6 (LSTM), M7 (SVM), M8 (MLP)
  • RFS-selected predictors: M9 (GRU), M10 (LSTM), M11 (SVM), M12 (MLP)

This structure allows for an easy and clear comparison of the impact of RFS feature selection on forecasting accuracy. Lastly, the proposed M13 model is:

  • M13 (Proposed model): This model uses a two-stage approach with RFS feature selection in the first stage and a BiGRU-AM in the second stage.

Table 4 details the machine learning and deep learning hyperparameters for the benchmark models (M1–M12), while Table 5 details the deep learning hyperparameters for the proposed model (M13). The hyperparameters for each model were fine-tuned through trial and error. The specific process of hyperparameter selection is not discussed, as it is not the primary focus of this work.

Table 5. Hyperparameters used in the RNN model in the proposed framework.

https://doi.org/10.1371/journal.pone.0323015.t005

3. Results and discussion

The performance of the proposed model (M13) is assessed and compared with the twelve benchmark models (M1–M12) using various metrics, including RMSE, MAE, and R² values. Additionally, visual representations, such as plots of actual opening prices and forecasted values from all thirteen models for all ten stock indices, are provided to illustrate the findings and underscore the effectiveness of the proposed two-stage approach in predicting the opening prices of stock indices.

3.1 Results of the feature selection

Table 6 displays the feature selection results. F2 (Highest prices), F3 (Lowest prices), F6 (Change), and F7 (Price change ratio) are the four most important features for predicting the opening price. Except for the FTSE 100 and IPC, F2 and F3 are selected for all the indices. F6 is selected for all indices except CAC 40 and AEX, while F7 is only excluded in FTSE 100 and BEL 20. Notably, F8 (amplitude) is not selected in any model for any stock.

Table 6. Selected features using the RFS feature selection method for the stock indices investigated.

https://doi.org/10.1371/journal.pone.0323015.t006

The highest price (F2) and the lowest price (F3) are critical features in predicting the opening price of stock indices because they reflect the trading range and provide insights into market volatility. The opening price of a stock reflects adjustments based on new information and captures the market sentiment and volatility from the previous trading day [45]. The highest and lowest prices show the range within which a stock has traded, indicating its potential volatility. This information is particularly valuable for day-traders and financial analysts, as it helps them gauge market sentiment and predict future opening prices. By incorporating the highest and lowest prices into predictive models, traders can make more informed decisions, aiming to maximize profit and minimize risk.

The change (F6) represents the change in price from its previous value, and the price change ratio (F7) normalizes this change, enabling direct comparison across different stocks and indices. The latter is often utilized by portfolio managers and investors to evaluate performance and risk across various asset classes. This normalization is crucial for making comparisons and evaluating performance across different stocks and indices [46]. Additionally, change is associated with volatility, underscoring the importance of considering volatility levels in predictive models [47]. The selection of these two features highlights their significance in capturing market dynamics and aiding in accurate forecasting.

Amplitude (F8) was not selected as a predictor feature for any stock index in predicting the opening price. Amplitude reflects the degree of stock activity [42]. In this study, the importance of features was examined in subsets. This indicates that while amplitude could be an informative feature on its own, its contribution in combination with the other features in predicting the opening price might not be significant. The proposed RFS feature selection was designed to assess the importance of features while considering their interactions with other features, hence producing more reliable and accurate forecasts.

3.2. Forecasting accuracy evaluation

The RMSE, MAE, and R² values of the proposed model and its benchmark models for the selected datasets are shown in Table 7. M13 (the proposed model) produces the lowest RMSE and MAE values for almost all ten indices, except for the MAE of the FTSE 100 and the DAX. For these two cases, the proposed model’s MAE values are close and comparable to the lowest values, produced by M10 for the FTSE 100 and M9 for the DAX. For all indices, the R² values produced by the M13 model are the highest. The proposed model outperforms the other twelve benchmark models.

Table 7. The forecasting performance of different models on 10 stock indices.

https://doi.org/10.1371/journal.pone.0323015.t007

According to the results in Table 7, models with all the predictor features (M5–M8) outperform models with no predictor features (M1–M4) in all cases. However, they are outperformed by their counterpart models with RFS-selected features (M9–M12) in almost all cases, except for five instances: the GRU model for the S&P 500, CAC 40, and BEL 20, the SVM model for the DAX, and the LSTM model for the BEL 20. This suggests that the predictor features in a multivariate model carry important temporal information, resulting in better predictions compared to the counterpart univariate models. Furthermore, the findings show that RFS feature selection efficiently reduces the dimensionality of the data by selecting only the best subset of features essential for predicting the opening price, which in turn prevents overfitting. This is why the models with RFS-selected features (M9–M12) perform better than the models with all features present (M5–M8).

In addition, the proposed two-stage model (M13) provides the most accurate predictions across all cases. Forecasts from M13 produce the lowest RMSE and MAE, and the highest R², in all but two cases: the MAE for the FTSE 100 and the DAX. Even so, the MAE values for M13 in these two cases are close and comparable to the lowest MAE, achieved by M10 for the FTSE 100 and by M9 for the DAX. Following the selection in the first stage through RFS feature selection, the BiGRU-AM model was used for multivariate forecasting. SVM, MLP, GRU, and LSTM are the four baseline models for the benchmarks used in this study. SVM and MLP are supervised learning techniques that were originally proposed as classification algorithms before being extended to regression problems. GRU and LSTM, on the other hand, are two examples of RNNs [48]. RNNs are well-suited for stock price forecasting due to their ability to handle sequential data and capture temporal dependencies effectively [49]. Due to its inherent advantages, GRU was selected as the foundational model to be extended. In [50], GRU was demonstrated to have a shorter training time than LSTM: GRU has fewer gates than LSTM and thus requires fewer parameters to train, so the model converges faster while still overcoming the vanishing gradient problem. As highlighted in Section 1, the bidirectional version of GRU was used in this study because of its advantage over the unidirectional counterpart, as it helps avoid the information loss that can occur in a unidirectional system [51]. Furthermore, the addition of AM aids in assigning higher weights to critical temporal information that would otherwise be missed, improving forecast accuracy even further.

Overall, the plots of actual versus predicted opening prices for all ten stock indices suggest that the proposed model (M13) provides the most accurate predictions. Figs 4 and 5 display the plots for the Dow Jones and Nasdaq, respectively, while the plots for the other eight indices are available in the S1 File. In these plots, the black line represents the actual values, and the yellow line represents the predicted values from the proposed model (M13). These figures help to visualize how closely the proposed model (M13) follows the actual market fluctuations and captures the peaks and troughs of the indices, particularly for the Dow Jones, Nasdaq, and S&P 500. This observation aligns with and supports the statistical measures in Table 7, which indicate that the proposed two-stage model M13 outperforms the other twelve benchmark models in predicting opening prices of stock indices. By accurately capturing the trends in the stock indices, the proposed model confirms its superiority in model performance, highlighting the effectiveness of the methodology presented in this study.

Fig 4. The actual Dow Jones opening price and its forecasted values from different models.

https://doi.org/10.1371/journal.pone.0323015.g004

Fig 5. The actual Nasdaq opening price and its forecasted values from different models.

https://doi.org/10.1371/journal.pone.0323015.g005

4. Conclusions, limitations, and future directions

A two-stage model for forecasting financial time series was introduced in this study. In the first stage, an RF subset-based feature selection method chooses the best subset of features by evaluating all subsets of features with the repeated k-fold cross-validation method. This ensures that the forecasting model does not incorporate redundant features. In the second stage, the selected features are fed as inputs to a bidirectional GRU model with AM. The bidirectional GRU model considers both forward and backward temporal aspects of the data, thereby enhancing model accuracy, while AM contributes by emphasizing critical temporal information. This combination outperformed the other benchmark models discussed in this paper by a wide margin. Using the proposed RF subset-based approach to select features helped identify the best predictors and improved the performance of the bidirectional GRU model with AM.

The combination of BiGRU, attention mechanisms, and feature selection techniques significantly improves the accuracy and efficiency of stock price prediction models. These methods address the inherent complexities of stock data, such as nonlinearity and temporal dependencies, leading to more precise and interpretable predictions [52,53]. This work also has practical implications, as investors can use it to inform investment decisions: a forecasting model with higher prediction accuracy for stock price trends helps lower investment risk and maximize investment profit.

It would be interesting to extend the current feature selection approach by adding a feature filtration layer based on causality tests to remove redundant or weakly related features before model training. In this way, the variables that provide the most information to the model are selected, which could reduce computational complexity while maintaining accuracy, making the model more suitable for real-time stock prediction. The computational time of the feature selection method can also be further improved, especially for larger datasets. Prior studies have demonstrated that causality-based feature selection improves the efficiency of deep learning models in financial forecasting [54]. To refine the prediction error and improve prediction accuracy, an error-corrected version of the bidirectional GRU with AM model is also of interest. Error correction models have been shown to significantly improve the performance of BiGRU models by enhancing prediction accuracy, handling nonlinear data characteristics, reducing overfitting, and increasing model efficiency [55,56].

Supporting information

S1 File. S1 Fig. The actual Nikkei 225 opening price and its forecasted values from different models.

S2 Fig. The actual FTSE 100 opening price and its forecasted values from different models. S3 Fig. The actual S&P 500 opening price and its forecasted values from different models. S4 Fig. The actual CAC 40 opening price and its forecasted values from different models. S5 Fig. The actual IPC opening price and its forecasted values from different models. S6 Fig. The actual DAX opening price and its forecasted values from different models. S7 Fig. The actual AEX opening price and its forecasted values from different models. S8 Fig. The actual BEL 20 opening price and its forecasted values from different models.

https://doi.org/10.1371/journal.pone.0323015.s001

(ZIP)

References

  1. Vogl M. Hurst exponent dynamics of S&P 500 returns: Implications for market efficiency, long memory, multifractality and financial crises predictability by application of a nonlinear dynamics analysis framework. Chaos Solitons Fractals. 2023;166:112884.
  2. Arias-Calluari K, Najafi MN, Harré MS, Tang Y, Alonso-Marroquin F. Testing stationarity of the detrended price return in stock markets. Phys A: Stat Mech Appl. 2022;587:126487.
  3. Tang L, Li J, Du H, Li L, Wu J, Wang S. Big data in forecasting research: a literature review. Big Data Res. 2022;27:100289.
  4. Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M. Benchmark for filter methods for feature selection in high-dimensional classification data. Comput Stat Data Anal. 2020;143:106839.
  5. Box GEP, Jenkins GM, Reinsel GC, Ljung GM. Time series analysis: Forecasting and control. Hoboken, NJ: John Wiley & Sons; 2015.
  6. Bollerslev T. Generalized autoregressive conditional heteroskedasticity. J Econom. 1986;31(3):307–27.
  7. Engle RF. Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica. 1982;50(4):987–1007.
  8. Hajirahimi Z, Khashei M. A novel parallel hybrid model based on series hybrid models of ARIMA and ANN models. Neural Process Lett. 2022;54(3):2319–37.
  9. Pan H, Tang Y, Wang G. A stock index futures price prediction approach based on the MULTI-GARCH-LSTM mixed model. Mathematics. 2024;12(11):1677.
  10. Li Y, Zheng W, Zheng Z. Deep robust reinforcement learning for practical algorithmic trading. IEEE Access. 2019;7:108014–22.
  11. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
  12. Wang H, Yilihamu Q, Yuan M, Bai H, Xu H, Wu J. Prediction models of soil heavy metal(loid)s concentration for agricultural land in Dongli: a comparison of regression and random forest. Ecol Indic. 2020;119:106801.
  13. Niu D, Wang K, Sun L, Wu J, Xu X. Short-term photovoltaic power generation forecasting based on random forest feature selection and CEEMD: A case study. Appl Soft Comput. 2020;93:106389.
  14. Nti IK, Adekoya AF, Weyori BA. Random forest based feature selection of macroeconomic variables for stock market prediction. Am J Appl Sci. 2019;16(7):200–12.
  15. Yuan X, Yuan J, Jiang T, Ain QU. Integrated long-term stock selection models based on feature selection and machine learning algorithms for China stock market. IEEE Access. 2020;8:22672–85.
  16. Sonkavde G, Dharrao DS, Bongale AM, Deokate ST, Doreswamy D, Bhat SK. Forecasting stock market prices using machine learning and deep learning models: A systematic review, performance analysis and discussion of implications. Int J Financial Stud. 2023;11(3):94.
  17. Jin X-B, Zheng W-Z, Kong J-L, Wang X-Y, Bai Y-T, Su T-L, et al. Deep-learning forecasting method for electric power load via attention-based encoder-decoder with Bayesian optimization. Energies. 2021;14(6):1596.
  18. Hu Z, Zhang J, Ge Y. Handling vanishing gradient problem using artificial derivative. IEEE Access. 2021;9:22371–7.
  19. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80. pmid:9377276
  20. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014.
  21. Lu W, Li J, Li Y, Sun A, Wang J. A CNN-LSTM-based model to forecast stock prices. Complexity. 2020;2020:1–10.
  22. Touzani Y, Douzi K. An LSTM and GRU based trading strategy adapted to the Moroccan market. J Big Data. 2021;8(1):126. pmid:34603936
  23. Sezer OB, Gudelek MU, Ozbayoglu AM. Financial time series forecasting with deep learning: A systematic literature review: 2005–2019. Appl Soft Comput. 2020;90:106181.
  24. Sako K, Mpinda BN, Rodrigues PC. Neural networks for financial time series forecasting. Entropy (Basel). 2022;24(5):657. pmid:35626542
  25. Sheth D, Shah M. Predicting stock market using machine learning: best and accurate way to know future stock prices. Int J Syst Assur Eng Manag. 2023;14(1):1–18.
  26. Li X, Ma X, Xiao F, Xiao C, Wang F, Zhang S. Time-series production forecasting method based on the integration of Bidirectional Gated Recurrent Unit (Bi-GRU) network and Sparrow Search Algorithm (SSA). J Pet Sci Eng. 2022;208:109309.
  27. Hu W, Xiong J, Wang N, Liu F, Kong Y, Yang C. Integrated model text classification based on multineural networks. Electronics. 2024;13(2):453.
  28. Zhou Q, Zhou C, Wang X. Stock prediction based on bidirectional gated recurrent unit with convolutional neural network and feature selection. PLoS One. 2022;17(2):e0262501. pmid:35120138
  29. Mao Z, Wu C. Stock price index prediction based on SSA-BiGRU-GSCV model from the perspective of long memory. Kybernetes. 2024;53(12):5905–31.
  30. Wang J, Cheng Q, Dong Y. An XGBoost-based multivariate deep learning framework for stock index futures price forecasting. Kybernetes. 2023;52(10):4158–77.
  31. Niu Z, Zhong G, Yu H. A review on the attention mechanism of deep learning. Neurocomputing. 2021;452:48–62.
  32. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. Adv Neural Inf Process Syst. 2017;30.
  33. Wang S, Wang X, Wang S, Wang D. Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting. Int J Electr Power Energy Syst. 2019;109:470–9.
  34. Haq AU, Zeb A, Lei Z, Zhang D. Forecasting daily stock trend using multi-filter feature selection and deep learning. Expert Syst Appl. 2021;168:114444.
  35. Wang X, Cai Z, Luo Y, Wen Z, Ying S. Long time series deep forecasting with multiscale feature extraction and Seq2seq attention mechanism. Neural Process Lett. 2022;54(4):3443–66.
  36. Qiu J, Wang B, Zhou C. Forecasting stock prices with long-short term memory neural network based on attention mechanism. PLoS One. 2020;15(1):e0227222. pmid:31899770
  37. Liu X, You J, Wu Y, Li T, Li L, Zhang Z, et al. Attention-based bidirectional GRU networks for efficient HTTPS traffic classification. Inform Sci. 2020;541:297–315.
  38. Chen JX, Jiang DM, Zhang YN. A hierarchical bidirectional GRU model with attention for EEG-based emotion classification. IEEE Access. 2019;7:118530–40.
  39. Teng F, Guo X, Song Y, Wang G. An air target tactical intention recognition model based on bidirectional GRU with attention mechanism. IEEE Access. 2021;9:169122–34.
  40. Investing.com. Major world market indices. 2022. Available online: https://www.investing.com/indices/major-indices (accessed January 5, 2022).
  41. Niu T, Wang J, Lu H, Yang W, Du P. Developing a deep learning framework with two-stage feature selection for multivariate financial time series forecasting. Expert Syst Appl. 2020;148:113237.
  42. de Rooij M, Weeda W. Cross-validation: A method every psychologist should know. Adv Methods Pract Psychol Sci. 2020;3(2):248–63.
  43. Ullah J, Li H, Soupios P, Ehsan M. Optimizing geothermal reservoir modeling: A unified Bayesian PSO and BiGRU approach for precise history matching under uncertainty. Geothermics. 2024;119:102958.
  44. He K, Zhang X, Ren S, Sun J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision. 2015:1026–34. https://doi.org/10.48550/arXiv.1502.01852
  45. Avishay A, Gil C, Vladimir G. Stocks opening price gaps and adjustments to new information. Comput Econ. 2023:1–15. pmid:37362597
  46. Muangprathub J, Intarasit A, Boongasame L, Phaphoom N. Portfolio risk and return with a new simple moving average of price change ratio. Wireless Pers Commun. 2020;115(4):3137–53.
  47. Liu F, Umair M, Gao J. Assessing oil price volatility co-movement with stock market volatility through quantile regression approach. Resour Policy. 2023;81:103375.
  48. Jeyachitra RK, Manochandar S. Machine learning and deep learning: Classification and regression problems, recurrent neural networks, convolutional neural networks. In: Multimodal Biometric and Machine Learning Technologies. 2023:173–225.
  49. Vural NM, Ilhan F, Yilmaz SF, Ergut S, Kozat SS. Achieving online regression performance of LSTMs with simple RNNs. IEEE Trans Neural Netw Learn Syst. 2022;33(12):7632–43. pmid:34138720
  50. Kisvari A, Lin Z, Liu X. Wind power forecasting – A data-driven method along with gated recurrent neural network. Renewable Energy. 2021;163:1895–909.
  51. Basiri ME, Nemati S, Abdar M, Cambria E, Acharya UR. ABCDM: An attention-based bidirectional CNN-RNN deep model for sentiment analysis. Future Gener Comput Syst. 2021;115:279–94.
  52. Duan Y, Liu Y, Wang Y, Ren S, Wang Y. Improved BIGRU model and its application in stock price forecasting. Electronics. 2023;12(12):2718.
  53. Chen S, Ge L. Exploring the attention mechanism in LSTM-based Hong Kong stock price movement prediction. Quantitative Finance. 2019;19(9):1507–15.
  54. Yu K, Guo X, Liu L, Li J, Wang H, Ling Z, et al. Causality-based feature selection. ACM Comput Surv. 2020;53(5):1–36.
  55. Li L, Jing R, Zhang Y, Wang L, Zhu L. Short-term power load forecasting based on ICEEMDAN-GRA-SVDE-BiGRU and error correction model. IEEE Access. 2023;11:110060–74.
  56. Yu H, Sun H, Li Y, Xu C, Du C. Enhanced short-term load forecasting: Error-weighted and hybrid model approach. Energies. 2024;17(21):5304.