A performance comparison of machine learning models for stock market prediction with novel investment strategy

  • Azaz Hassan Khan,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft

    Affiliation Department of Electrical Engineering and Computer Science, Jalozai Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Abdullah Shah,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft

    Affiliation Department of Electrical Engineering and Computer Science, Jalozai Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Abbas Ali,

    Roles Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft

    Affiliation Department of Electrical Engineering and Computer Science, Jalozai Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Rabia Shahid,

    Roles Investigation, Methodology, Resources, Supervision, Validation, Writing – review & editing

    Affiliation Department of Electrical Engineering and Computer Science, Jalozai Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Zaka Ullah Zahid,

    Roles Investigation, Methodology, Project administration, Resources, Supervision, Visualization, Writing – review & editing

    Affiliation Department of Electrical Engineering and Computer Science, Jalozai Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Malik Umar Sharif,

    Roles Investigation, Methodology, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliation Department of Electrical Engineering and Computer Science, Jalozai Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Tariqullah Jan,

    Roles Investigation, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliation Department of Electrical Engineering, Main Campus, University of Engineering and Technology, Peshawar, Pakistan

  • Mohammad Haseeb Zafar

    Roles Funding acquisition, Methodology, Project administration, Resources, Writing – review & editing

    mhzafar@cardiffmet.ac.uk

    Affiliation Cardiff School of Technologies, Cardiff Metropolitan University, Cardiff, United Kingdom

Abstract

Stock market forecasting is one of the most challenging problems in today’s financial markets. According to the efficient market hypothesis, it is almost impossible to predict the stock market with 100% accuracy. However, Machine Learning (ML) methods can improve stock market predictions to some extent. In this paper, a novel strategy is proposed to improve the prediction efficiency of ML models for financial markets. Nine ML models are used to predict the direction of the stock market. First, these models are trained and validated using the traditional methodology on historical data captured over a 1-day time frame. Then, the models are trained using the proposed methodology. Following the traditional methodology, Logistic Regression achieved the highest accuracy of 85.51%, followed by XG Boost and Random Forest. With the proposed strategy, the Random Forest model achieved the highest accuracy of 91.27%, followed by XG Boost, ADA Boost and ANN. In the later part of the paper, it is shown that a classification report alone is not sufficient to validate the performance of an ML model for stock market prediction. A simulation model of the financial market is used to evaluate the risk, maximum drawdown and returns associated with each ML model. The overall results demonstrate that the proposed strategy not only improves the stock market returns but also reduces the risks associated with each ML model.

Introduction

Stock markets, being one of the essential pillars of the economy, have been extensively studied and researched [1]. Forecasting the stock price is an essential objective in the stock market, since better predictions can yield higher expected returns for investors [2]. The price and uncertainty in the stock market are predicted by exploiting the patterns found in past data [3]. The nature of the stock market has always been vague for investors because predicting the performance of a stock market is very challenging. Various factors such as political disturbance, natural catastrophes and international events must be considered in predicting the stock market [4]. The challenge is so great that even a small improvement in stock market prediction can lead to large returns.

The stock market can only move in one of two directions: upwards (when stock prices rise) or downwards (when stock prices fall) [5]. Generally, there are four ways to analyze the stock market direction [6]. The most basic type is fundamental analysis, which analyzes the stock market by looking at the company’s economic conditions, reports and future projects [7]. The second and most common technique is technical analysis [8]. In this method, the direction of the stock market is anticipated by looking at the stock price charts and comparing current prices with previous prices [9]. The third and most advanced technique is Machine Learning (ML) based analysis, which analyzes the market with less human interaction [10]. ML models find patterns in historical data, based on which they try to forecast future stock market prices. The fourth technique, called sentiment-based analysis, analyzes stock market prices through the sentiments of other individuals, such as activity on social media or financial news websites [11].

The difficulty of the stock market prediction drew the attention of numerous researchers worldwide. A number of papers have been presented that could predict the stock prices based on ML models. These models include Artificial Neural Network (ANN) [12], Decision Tree (DT) [13], Support Vector Machine (SVM) [14], K-Nearest Neighbors (KNN) [15], Random Forest (RF) [16] and Long Short-Term Memory networks (LSTM) [17]. The proposed systems either used a single ML model optimized for specific stocks [18–20], or multiple ML models in order to analyze their performance on different stocks [21–24]. Many advanced techniques like hybrid models were also employed in order to improve prediction accuracy [25–27].

Different ML models like RF and stochastic gradient boosting were used to predict the prices of Gold and Silver with an accuracy of more than 85% [18]. A novel model based on SVM and Genetic Algorithm, called Genetic Algorithm Support Vector Machine (GASVM), was proposed to forecast the direction of Ghana Stock Exchange [19]. The proposed model achieved an accuracy of 93.7% for a 10-day stock price movement outperforming other traditional ML models. The Artificial Neural Network Garch (ANNG) model was used to forecast the uncertainty in oil prices [20]. In this model, first, the GARCH model is used to predict the oil price. This prediction is then used as input to ANN for improvement in the overall commodity price forecast by 30%.

Different ML models perform differently on the same historical data. Their performance depends on the type of data and the duration for which the past data is available. In many recent papers, multiple ML models were used on the same financial time series data to predict the future price of the stock to see the performance of each ML model [21–24]. A comparative analysis of nine ML and two Deep Learning (DL) models was performed on the Tehran stock market [21]. The main purpose of this analysis was to compare the accuracy of different models on continuous and binary datasets. The binary dataset was found to increase the accuracy of the models. In [22], four ML models (ANN, SVM, Subsequent Artificial Neural Network (SANN) and LSTM) were used to predict the Bitcoin prices using different time frames. The results show that SANN was able to predict the Bitcoin prices with an accuracy of 65%, whereas LSTM showed an accuracy of only 53%. In another comparative study [23], four ML models (Multi-Layer Perceptron (MLP), SVM and RF) were used to forecast the prices for different crypto-currencies like Bitcoin, Ethereum, Ripple and Litecoin using their historical prices. MLP outperformed all other models with an accuracy ranging from 64 to 72%. A similar study was performed in [24], showing the performance comparison of different ML models on the same data.

In some recent studies, hybrid models (a combination of different ML models) are used to forecast stock prices. A hybrid model combining SVM with a sentiment-based technique was proposed for Shanghai Stock Exchange prediction [25]. This hybrid model achieved an accuracy of 89.93%. A system consisting of k-means clustering and an ensemble learning technique was developed to predict the Chinese stock market [26]. The hybrid prediction model obtained the best forecasting accuracy of the stock price on the Chinese stock market. Another hybrid framework was developed in [27] for the Indian Stock Market; this model used SVM with different kernel functions and KNN to predict profit or loss. The proposed system was used to predict the future stock value. Although the accuracy of the hybrid systems is much higher, they are too complex to be implemented in real life. A comparative analysis of the prior and proposed studies is shown in Table 1.

Table 1. Comparative analysis of previous and proposed study.

https://doi.org/10.1371/journal.pone.0286362.t001

In almost all the proposed ML-based systems, a primary limitation has been observed in the empirical results: the performance of the ML models was gauged only by their classification ability. Although classification ability is an important criterion for evaluating an ML model, it is insufficient to determine the performance of the ML model for stock market prediction. The classification metrics do not take into account important factors like returns, maximum drawdown, risk-to-reward ratio, transactional cost and the risks associated with each ML model. These factors must be considered in the evaluation of ML models for stock market predictions.

Research contributions

The following are the major contributions of this paper:

  • A performance comparison of nine ML models trained using the traditional methodology for stock market prediction using both performance metrics and financial system simulations.
  • Proposing a novel strategy to train the ML models for financial markets that perform much better than the traditional methodologies.
  • Proposing a novel financial system simulation that provides financial performance metrics like returns, maximum drawdown and risk-to-reward ratio for each ML model.

Paper organization

The rest of the paper is organized as follows: The next section explains the proposed methodology used in training nine ML models for stock market prediction. Section III analyses the outcomes of simulation models in detail. This section consists of ML models simulation as well as Financial models simulations. The conclusions and future directions are discussed in Sections IV and V respectively.

Methodology

In this paper, a software approach is used to apply different ML algorithms to predict the direction of the stock market for Tesla Inc. [28]. This prediction system is implemented in Python using frameworks like Scikit-learn [29], Pandas [30], NumPy [31], Alpaca broker [32] and Plotly [33].

The flowchart of the methodology is illustrated in Fig 1. The first step is to import the stock market data from Alpaca broker and preprocess it using various techniques. The imported stock market data has some information that is not needed in the proposed system. This unwanted data, like trade counts and volume-weighted average price, is removed in the preprocessing stage. Preprocessing also involves handling missing stock prices and cleaning data from unnecessary noise. Missing values can be estimated using interpolation techniques or just by taking the mean value of the point before and after the missing point.
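The preprocessing step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' code: the field names `trade_count` and `vwap` and the helper names are hypothetical, and gaps are filled by linear interpolation, which reduces to the mean of the two neighbouring points when a single value is missing.

```python
# Hypothetical names for the fields the paper says are discarded.
UNUSED_FIELDS = {"trade_count", "vwap"}


def drop_unused_fields(bar):
    """Return a copy of one price bar (a dict) without the unused fields."""
    return {key: value for key, value in bar.items() if key not in UNUSED_FIELDS}


def fill_missing(prices):
    """Fill None entries by linear interpolation between known neighbours.

    A single missing point becomes the mean of the points before and after it.
    Gaps at the very start or end have only one neighbour and stay None.
    """
    filled = list(prices)
    known = [i for i, price in enumerate(filled) if price is not None]
    for left, right in zip(known, known[1:]):
        step = filled[right] - filled[left]
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left) / (right - left)
    return filled


bar = {"close": 104.0, "volume": 1200, "trade_count": 35, "vwap": 103.7}
print(drop_unused_fields(bar))

closes = [100.0, None, 104.0, 106.0, None, None, 112.0]
print(fill_missing(closes))
```

Leaving edge gaps untouched is a deliberate choice in this sketch: extrapolating beyond the known data would invent prices rather than estimate them.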

Traditionally, the stock price at the end of the day (EOD) is used in ML-based systems. The variation in the stock price is usually greatest in the first hour after the market opens, so the stock price within this hour is more informative than the EOD stock price. The direction of the market is set by the business done in this hour. Therefore, in this paper, the stock price 15 minutes after the market opens is also extracted. The results from the stock price at EOD are compared with the results from the proposed 15-min strategy.
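One way to picture the extraction of the early-session price is the sketch below. It rests on several assumptions that the paper does not state: the market opens at 09:30, timestamps are already in exchange-local time, intraday bars arrive sorted by time, and the first bar at or after 09:45 is taken as the "15 minutes after open" price. The function name is hypothetical.

```python
from datetime import datetime, time

# Assumption: 09:30 open in exchange-local time, so 09:45 is open + 15 minutes.
OPEN_PLUS_15_MIN = time(9, 45)


def price_15_min_after_open(bars):
    """Map each trading day to the price of its first bar at or after 09:45.

    `bars` is an iterable of (timestamp, price) pairs sorted by timestamp.
    """
    selected = {}
    for timestamp, price in bars:
        day = timestamp.date()
        if day not in selected and timestamp.time() >= OPEN_PLUS_15_MIN:
            selected[day] = price
    return selected


bars = [
    (datetime(2021, 3, 1, 9, 30), 690.0),
    (datetime(2021, 3, 1, 9, 45), 702.5),
    (datetime(2021, 3, 1, 10, 0), 698.0),
    (datetime(2021, 3, 2, 9, 30), 718.0),
    (datetime(2021, 3, 2, 9, 45), 711.0),
]
for day, price in price_15_min_after_open(bars).items():
    print(day, price)
```

The prices in the usage example are made up for illustration only.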

Once the stock price data has been extracted, the subsequent stage involves computing various input features from the technical indicators and statistical formulas. Nine input features, listed in Table 2, are selected for the prediction purposes. These calculated input features are subjected to overfitting tests. These tests are essential because overfit data can cause reduction in the accuracy of the ML models [34].

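Following the overfitting tests, the input data is divided into training and testing data. The data is then normalized using the Min-Max normalization technique to prevent the biasing phenomenon. Normalization is performed using Eq (1):

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{1}$$

where X is the original feature value, and X_{min} and X_{max} are the minimum and maximum values of that feature.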
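A minimal sketch of Min-Max normalization follows. One assumption goes beyond the text: the minimum and maximum are taken from the training data only and then reused on the test data, a common way to keep test information out of training. The paper does not say which split the scaling statistics come from, and the function names are illustrative.

```python
def fit_min_max(train_values):
    """Return the (minimum, maximum) of the training values."""
    lowest, highest = min(train_values), max(train_values)
    if highest == lowest:
        raise ValueError("constant feature: Min-Max scaling is undefined")
    return lowest, highest


def apply_min_max(values, lowest, highest):
    """Scale values with (x - min) / (max - min) using the given statistics."""
    return [(value - lowest) / (highest - lowest) for value in values]


train = [10.0, 20.0, 30.0]
test = [25.0, 35.0]

lowest, highest = fit_min_max(train)
print(apply_min_max(train, lowest, highest))  # training data lands in [0, 1]
print(apply_min_max(test, lowest, highest))   # test data may fall outside [0, 1]
```

With training-only statistics, a test value above the training maximum scales to more than 1, which is expected rather than an error.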

The input features and output variables are provided to the ML models in order to detect the patterns within the training data. Various ML models have been employed in this study. Table 3 shows the nine ML models selected to predict the direction of the stock market in this paper. The optimal parameters for each ML model are selected through GridSearchCV [35], a scikit-learn utility that helps in selecting the best-performing parameters for a particular model. After choosing the optimal parameters, the ML models are trained and tested.

In the next step, the outcome of the trained ML models is assessed using performance metrics. There are a number of classification metrics that can be used to evaluate the performance of an ML algorithm [45]. Usually, three of the most powerful measures are chosen to rank these models with respect to their performance: accuracy, F1 score and the Area Under the Receiver Operating Characteristic Curve (ROC_AUC) [46]. The equations for Accuracy and F1_score are shown below:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{2}$$

$$\text{F1\_score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{3}$$

For evaluation purposes, accuracy, ROC_AUC and F1_score are useful measures; however, they are not sufficient for all problems. Recall and precision are two additional well-known metrics for classification problems [47, 48]. The expressions for Recall and Precision are shown below:

$$\text{Recall} = \frac{TP}{TP + FN} \tag{4}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{5}$$

Additionally, a confusion matrix is used to summarize the performance of each ML model. It provides detailed insight into ML predictions by indicating False Positives (FP), True Positives (TP), False Negatives (FN) and True Negatives (TN) [49]. False Positives show that the model prediction is true while the real sample is false; True Positives show that the model prediction and the real sample both are true; False Negatives represent that the model prediction is false while the real sample is true; True Negatives show that the model prediction and real sample both are false.
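To make the link between the confusion matrix and the classification metrics concrete, the sketch below computes accuracy, precision, recall and F1 score from the four counts. The function name is illustrative. The example uses the Logistic Regression counts that the results section reports for the 1-day time frame (TP = 132, FP = 26, TN = 110, FN = 15).

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1_score = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1_score,
    }


# Counts reported for Logistic Regression on the 1-day time frame.
lr_metrics = classification_metrics(tp=132, fp=26, tn=110, fn=15)
print(f"accuracy = {lr_metrics['accuracy']:.2%}")  # 85.51%
```

The accuracy obtained this way, 242 correct predictions out of 283, agrees with the 85.51% reported for Logistic Regression. The other values depend on averaging conventions, so they are not claimed to match the paper's tables.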

In the next step, a novel financial model is developed and simulated to analyze the performance of the trained ML models. The financial performance metrics like Sharpe ratio, maximum drawdown, cumulative return and annual return [50] are used to analyze the performance of the trained ML models.

The Sharpe ratio measures the risk-adjusted return, while the maximum drawdown is the greatest peak-to-trough decline in the value of the portfolio [51]. The equations for the Sharpe ratio and maximum drawdown are shown below:

$$\text{Sharpe ratio} = \frac{R_p - R_f}{\sigma} \tag{6}$$

$$\text{Maximum drawdown} = \frac{L - P}{P} \tag{7}$$

where R_p = Return of portfolio, R_f = Risk free rate, σ = Std of portfolio excess return, P = Peak value before largest drop, and L = Lowest value before new high.
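The two risk measures can be sketched as follows, using the symbols defined above. Several choices here are assumptions rather than details from the paper: the Sharpe ratio is computed per period from a list of periodic returns with no annualization, the sample standard deviation is used for σ, and the maximum drawdown is returned as a negative fraction so that it matches the sign of the values reported later.

```python
import statistics


def sharpe_ratio(portfolio_returns, risk_free_rate=0.0):
    """Mean excess return divided by the std of excess returns (per period)."""
    excess = [r - risk_free_rate for r in portfolio_returns]
    return statistics.mean(excess) / statistics.stdev(excess)


def max_drawdown(portfolio_values):
    """Largest relative decline from a running peak, as a negative fraction."""
    peak = portfolio_values[0]
    worst = 0.0
    for value in portfolio_values:
        peak = max(peak, value)                    # P: highest value so far
        worst = min(worst, (value - peak) / peak)  # (L - P) / P at this point
    return worst


values = [100.0, 120.0, 90.0, 110.0, 130.0, 117.0]
print(f"max drawdown = {max_drawdown(values):.2%}")  # -25.00%

returns = [0.02, -0.01, 0.03, 0.00]
print(f"Sharpe ratio = {sharpe_ratio(returns):.3f}")
```

In the example, the drop from 120 to 90 (25%) outweighs the later drop from 130 to 117 (10%), so the first one defines the maximum drawdown.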

Annual return is the return gained during the period of one year, while the cumulative return is the total return on the invested capital within any specific time frame. The expressions for annual return and cumulative return are shown in Eqs (8) and (9):

$$\text{Annual return} = \left(\frac{E}{I}\right)^{1/n} - 1 \tag{8}$$

$$\text{Cumulative return} = \frac{E - I}{I} \tag{9}$$

where E = Ending value, I = Initial value and n = Number of years.
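As a worked illustration of the two return measures, the sketch below uses the standard compound-growth definitions with E, I and n as defined above. The example plugs in the figures reported later for Random Forest on the 1-day time frame (initial capital USD 10,000, ending capital USD 28,966). Taking n = 6 years for the 2016 to 2021 period is an assumption, and the function names are illustrative.

```python
def cumulative_return(ending_value, initial_value):
    """Total return over the whole period: (E - I) / I."""
    return (ending_value - initial_value) / initial_value


def annual_return(ending_value, initial_value, years):
    """Compound annual growth rate: (E / I) ** (1 / n) - 1."""
    return (ending_value / initial_value) ** (1 / years) - 1


ending, initial, years = 28_966.0, 10_000.0, 6  # n = 6 is an assumption

print(f"cumulative return = {cumulative_return(ending, initial):.2%}")  # 189.66%
print(f"annual return     = {annual_return(ending, initial, years):.2%}")
```

The cumulative return reproduces the reported 189.66%. The annual return comes out at roughly 19.4% with n = 6, close to the reported 19.48%; the small gap presumably comes from how the length of the period is counted.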

Experimental results

Dataset description and project specifications

Tesla Inc. is a major American automobile company producing technologically advanced electric vehicles. The company has recently obtained a lot of attention due to its stock prices. A drastic increase in revenue in the year 2021 made Tesla stocks very appealing for capitalists and investors around the world as shown in Table 4 [52].

Table 4 shows the annual growth of Tesla from 2016 to 2021. There was an increase of about 70.67% in the year 2021. Considering the stock volatility in the previous years and its recent growth, Tesla Inc. is an ideal candidate for this study.

The stock prices for Tesla Inc. from 2016 to 2021 are considered for experimental evaluations in this paper. Furthermore, the data is split into training and test datasets. Table 5 shows the ranges of our datasets. The stock market data for Tesla Inc., downloaded from Alpaca broker, from 2016 to 2021 is shown in Fig 2. Additionally, the project specifications can be found in Table 6.

Fig 2. Imported stock prices of Tesla Inc. from Alpaca broker.

https://doi.org/10.1371/journal.pone.0286362.g002

Table 5. Training data and test data ranges for Tesla Inc. stocks.

https://doi.org/10.1371/journal.pone.0286362.t005

Machine learning models simulation

First, the optimal parameters settings for the nine ML models are selected through GridSearchCV. The selected optimal parametric settings for each model are shown in Table 7.

The simulations for stock market prediction are performed using Python in a Jupyter notebook. ML models were evaluated using Tesla Inc. stock prices for a 1-day time frame and the 15-min time interval strategy. These models were first trained on the data from Jan 01, 2016 to Nov 15, 2020. The trained models were then validated on the test data from Nov 16, 2020 to Dec 31, 2021 as shown in Table 5.

Tables 8–10 show the classification report for nine different ML models. Tables 8 and 9 show the performance metrics for different ML models for a 1-day time frame and the 15-min time interval strategy. These tables list the accuracy, F1 score, ROC AUC, precision and recall in percentage for all of the ML models. Table 10 shows the confusion matrix for the ML models. It lists the number of correct and wrong predictions made by each ML model.

Table 8. Classification metrics for Tesla Inc. stocks for 1-day time frame data.

https://doi.org/10.1371/journal.pone.0286362.t008

Table 9. Classification metrics for Tesla Inc. stocks for the proposed 15-min time interval strategy.

https://doi.org/10.1371/journal.pone.0286362.t009

ML models simulation results for 1-day time frame.

Table 8 shows the performance metrics of nine ML models optimized for a 1-day time frame. As shown in the table, the Logistic Regression achieved the highest accuracy of 85.51% while the Naive Bayes model is found to be the least accurate model with an accuracy of 73.49%. Other classification metrics in Table 8 show a similar tendency with Logistic Regression having the best performance followed by XG Boost and Random Forest.

The confusion matrix in Table 10 shows a similar trend. For Logistic Regression, the True Positives are 132 and the False Positives are 26 for the ‘Move Up’ class. The True Negatives are 110 and the False Negatives are 15 for the ‘Move Down’ class.

Based on the discussion above, the Logistic Regression model performs better than the rest of the models for the 1-day time frame, even though its accuracy, the highest among the nine ML models, is only 85.51%.

The graphical illustration of the predictions made by the Logistic Regression model for a 1-day time frame can be seen in Fig 3. It can be seen that the trained Logistic Regression model is able to make more profits than losses. However, it is interesting to note that the predictions made by the LR model are sometimes wrong in consecutive trades, which results in a larger drawdown. For example, during the period from day 180 to day 230, a total of 6 trades are executed, of which 4 are losses and 2 are profitable.

Fig 3. Graphical illustration of Logistic Regression predictions on Tesla stocks for (1-day time frame).

https://doi.org/10.1371/journal.pone.0286362.g003

ML model simulation results for the proposed 15-min strategy.

In this paper, a novel 15-min time interval strategy has been proposed. In this strategy, the initial 15-min time interval is extracted from the 1-day time frame. The extracted 15-min data is then used to train and validate the ML models in order to make predictions for the 1-day time frame.

Table 9 shows the performance metrics of the ML models optimized for the 15-min time interval strategy. As shown in Table 9, Random Forest achieved the highest accuracy of 91.27%, followed by the XG Boost and ADA Boost models. The KNN model is found to be the least accurate model with an accuracy of 80.53%. Other classification metrics in Table 9 show a similar tendency, with Random Forest being the best-performing model.

The confusion matrix in Table 10 shows a similar trend. For Random Forest, the True Positives are 130 and the False Positives are 15 for the ‘Move Up’ class. The True Negatives are 142 and the False Negatives are 11 for the ‘Move Down’ class. When the results in Tables 8 and 9 are compared, it can be observed that by employing the proposed methodology, the performance of all the ML models has been greatly improved.

The graphical illustration of the predictions made by the Random Forest model is shown in Fig 4, which shows the loss and profit in trades. It can also be observed that the proposed strategy reduces the number of consecutive losses. As shown in Fig 4(b), there are only 2 consecutive losses, which occurred during the period from 150 to 200. Thus, the proposed methodology has not only improved the performance metrics of the ML models but also reduced the number of consecutive losses.

Fig 4. Graphical illustration of Random Forest predictions on Tesla stocks for (15-min time interval strategy).

https://doi.org/10.1371/journal.pone.0286362.g004

Financial models simulation

In this section, a novel financial simulation model is built that is able to make investment based on the decision of the ML model. Each ML model is evaluated using financial parameters to validate their performance and suitability for real-time stock market trading. The performance of ML models is gauged using cumulative return, annual return, maximum drawdown, Sharpe ratio and capital in hand at the end of the investment period.

Initially, USD 10,000 is invested. A commission fee of 0.1% (Alpaca standard commission fee) is set for each buy or sell trade. Based on the prediction of the ML model, a decision to buy, hold or short a share is taken. A single share is bought or sold on each trade to validate the performance of the ML models.
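The trading rules above can be pictured with a small simulation loop. This is a hedged sketch, not the authors' simulator: it assumes that a prediction of 1 ("move up") means holding one share long and a prediction of 0 ("move down") means being one share short, that switching sides closes the old position and opens the new one at the current price, and that the 0.1% commission is charged on the traded value. The function and parameter names are hypothetical.

```python
def simulate_portfolio(prices, predictions, initial_capital=10_000.0,
                       commission_rate=0.001):
    """Return the portfolio value after each step of a one-share strategy.

    prices[t] is the execution price at step t; predictions[t] is 1 for
    'move up' (hold one share long) or 0 for 'move down' (hold one short).
    """
    cash = initial_capital
    position = 0  # shares held: +1 long, -1 short, 0 before the first trade
    equity_curve = []
    for price, prediction in zip(prices, predictions):
        target = 1 if prediction == 1 else -1
        shares_to_trade = target - position  # > 0 buys, < 0 sells, 0 holds
        if shares_to_trade != 0:
            cash -= shares_to_trade * price                       # trade value
            cash -= abs(shares_to_trade) * price * commission_rate  # fee
            position = target
        equity_curve.append(cash + position * price)
    return equity_curve


prices = [100.0, 110.0, 105.0]
predictions = [1, 0, 0]  # go long, then switch to short and hold
for value in simulate_portfolio(prices, predictions):
    print(f"{value:.2f}")
```

In the usage example, the long position gains from the rise to 110 and the short position gains from the fall to 105, with commissions reducing both gains slightly. The prices are made up for illustration.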

Figs 5 and 6 show the portfolio performance of ML models on Tesla Inc. stocks for a 1-day time frame and 15-min time interval strategy. These figures show how initial capital is used to buy and sell shares based on the decision made by the ML models. Each box in the figure represents one full year from Jan 01 till Dec 31. The portfolio of each ML model is compared to a benchmark that serves as a reference for all models. This benchmark is obtained using the positive gains of stock prices.

Fig 5. Portfolio analysis of ML models on Tesla Inc. stocks for 1-day time frame.

https://doi.org/10.1371/journal.pone.0286362.g005

Fig 6. Portfolio analysis of ML models for Tesla Inc. stocks on the proposed 15-min time interval strategy.

https://doi.org/10.1371/journal.pone.0286362.g006

Financial simulation results for 1-Day time frame data.

The simulated outcomes of the ML models used to forecast the stock price of Tesla Inc. for a 1-day period are displayed in Table 11. In the previous section, it was shown that Logistic Regression had the highest accuracy compared to the other ML models. Therefore, this ML model would be expected to generate the highest revenue. However, the outcome of the financial simulations shows different results. It can be seen in Table 11 that Random Forest is the best ML model with an ending capital of USD 28,966. It has a cumulative return of 189.66% and an annual return of 19.48%, with the highest Sharpe ratio of 0.68. Random Forest did poorly at first, but after the 2019 financial market crisis it outperformed all other ML models. The maximum drawdown of the Random Forest model is -37.21%, which happened during the 2019 financial crisis as shown in Fig 7. This is the lowest drawdown by any ML model.

Fig 7. Maximum drawdown of Random Forest strategy for Tesla Inc. stocks on 1-day time frame.

https://doi.org/10.1371/journal.pone.0286362.g007

Table 11. Financial performance of ML models for Tesla Inc. stocks on 1-day time frame.

https://doi.org/10.1371/journal.pone.0286362.t011

The reason for better revenue generation by the Random Forest model is the quality of each True Positive and True Negative outcome. Even though the accuracy of the model is inferior to that of Logistic Regression, each of its correct predictions resulted in more profit. The annual growth of Tesla Inc. from 2020 to 2021 is more than 70% as shown in Table 4. Any correct prediction during this time results in greater revenue generation. The Random Forest model outperformed all other models during this time as shown in Fig 5. Among the ML models, the Naive Bayes model shows the worst performance. Fig 5 shows that the Naive Bayes model is negative most of the time during the simulation. It is the only model with a negative cumulative return, -19.16%, and it has the worst Sharpe ratio of 0.1.

Financial simulation results for the proposed 15-min strategy.

The portfolio performance of the ML models using the proposed approach of a 15-min time interval strategy is shown in Fig 6. This figure shows that the performance of some of the models has improved significantly when compared with a 1-day time frame. It can also be noticed that the models maintained their stability throughout the financial crisis of 2019, which indicates a significant improvement in the real-time performance of the models.

Table 12 displays the outcome of the financial model simulation of ML models trained and validated on Tesla Inc. stocks for the 15-min time interval strategy. As expected, Random Forest is the best-performing model with an ending capital of USD 25,300. It records a cumulative return of 153% and an annual return of 16.80% with a Sharpe ratio of 0.79. The maximum drawdown of the Random Forest model is -35.09% as shown in Fig 8, but it is still able to generate the highest ending capital.

Fig 8. Maximum drawdown of Random Forest strategy for Tesla Inc. stock on the proposed 15-min time interval strategy.

https://doi.org/10.1371/journal.pone.0286362.g008

Table 12. Financial performance of ML models for Tesla Inc. stock on the proposed 15-min time interval.

https://doi.org/10.1371/journal.pone.0286362.t012

The above discussion shows that KNN is the worst-performing model on the proposed strategy. Although Random Forest is the best model in terms of portfolio returns, ANN is the most rewarding model in risk-adjusted terms, with a Sharpe ratio of 0.91 on the proposed 15-min time interval strategy.

Conclusion

In this paper, nine ML models are used to predict the direction of the Tesla Inc. stock prices. The performance of this stock is first assessed for a 1-day time frame, followed by the proposed 15-min time interval strategy. Following the traditional methodology, Logistic Regression achieved the highest accuracy of 85.51%, while the Naive Bayes model is found to be the least accurate model with an accuracy of 73.49%. The proposed strategy significantly improved the classification performance of the ML models. With this strategy, the Random Forest model achieved the highest accuracy of 91.27%, followed by XG Boost and ADA Boost. Conversely, the KNN model is found to be the least accurate model with an accuracy of 80.53%.

In this paper, it was shown that classification metrics alone are not enough to justify the performance of ML models in the stock market. These metrics do not consider important factors like risk, maximum drawdown and returns associated with each ML model. A simulation model of the financial market is used to simulate the trained ML models so that their performance is gauged with actual investment strategies. The results revealed that although some models perform well in terms of portfolio returns under the traditional methodology, models trained with the proposed 15-min time interval strategy are significantly better in terms of risk-to-reward ratio and maximum drawdown. The results also show that Random Forest outperformed the other models in terms of returns in both the 1-day and 15-min time interval strategies.

Some other interesting observations are revealed by the comparison of the classification and financial results. The Logistic Regression model has the highest accuracy for the 1-day time frame data, so this ML model was expected to generate the highest revenue. However, the outcome of the financial simulations showed different results. Similarly, the accuracy of the Random Forest model for the 15-min time interval strategy was much higher than its accuracy for the 1-day time frame. But instead of generating higher revenue on the 15-min time interval strategy, it generated higher revenue on the 1-day time frame. The above discussion revealed that although the accuracy of the ML models is an important factor, the quality of each true positive and true negative outcome is an equally important factor in the performance evaluation of ML models for stock market prediction.

The overall results show that the proposed strategy has not only improved the classification metrics but also enhanced the stock market returns, risks and risk-to-reward ratio of each ML model. Additionally, the results reveal how important it is to consider both classification and financial analysis when evaluating the performance of an ML model on the stock market.

Supporting information

S1 File. GitHub file.

The data and script have been uploaded to GitHub. They can be accessed using the following link: https://github.com/AzazHassankhan/Machine-Learning-based-Trading-Techniques/.

https://doi.org/10.1371/journal.pone.0286362.s001

(IPYNB)

References

  1. Ghysels E. and Osborn D. R., “The Econometric Analysis of Seasonal Time Series,” Cambridge University Press, Cambridge, 2001.
  2. Karpe M., “An overall view of key problems in algorithmic trading and recent progress,” arXiv, June 9, 2020, Available online: 10.48550/arXiv.2006.05515
  3. Clements M. P., Franses P. H. and Swanson N. R., “Forecasting economic and financial time-series with non-linear models,” International Journal of Forecasting, vol. 20, no. 2, pp. 169–183, 2004.
  4. Khositkulporn P., “The Factors Affecting Stock Market Volatility and Contagion: Thailand and South-East Asia Evidence,” Ph.D. dissertation, Dept. Business Administration, Victoria University, Melbourne, Australia, Feb. 2013.
  5. Wang L., “Dynamical Models of Stock Prices Based on Technical Trading Rules—Part III: Application to Hong Kong Stocks,” IEEE Transactions on Fuzzy Systems, vol. 23, pp. 1680–1697, Nov. 24, 2014.
  6. Shah D., Isah H. and Zulkernine F., “Stock Market Analysis: A Review and Taxonomy of Prediction Techniques,” International Journal of Financial Studies, vol. 7, May 27, 2019.
  7. Segal T., “Fundamental Analysis,” Investopedia, Aug. 25, 2022, Available online: www.investopedia.com, Accessed on: 01-04-2022.
  8. Ayala J., Torres M. G., Noguera J. L. V., Gómez-Vela F. and Divina F., “Technical analysis strategy optimization using a machine learning approach in stock market indices,” Knowledge-Based Systems, vol. 225, Aug. 5, 2021.
  9. Oğuz R. F., Uygun Y., Aktaş M. S. and Aykurt İ., “On the Use of Technical Analysis Indicators for Stock Market Price Movement Direction Prediction,” in Signal Processing and Communications Applications Conference, Sivas, Turkey, 2019.
  10. Vijh M., Chandola D., Tikkiwal V. A. and Kumar A., “Stock Closing Price Prediction using Machine Learning Techniques,” International Conference on Computational Intelligence and Data Science, vol. 167, pp. 599–606, 2020.
  11. Jariwala G., Agarwal H. and Jadhav V., “Sentimental Analysis of News Headlines for Stock Market,” IEEE International Conference for Innovation in Technology, Bengaluru, India, pp. 1–5, 2020.
  12. Guresen E., Kayakutlu G. and Daim T. U., “Using artificial neural network models in stock market index prediction,” Expert Systems with Applications, vol. 38, no. 8, pp. 10389–10397, Aug. 2011.
  13. Wu M. C., Lin S. Y. and Lin C. H., “An effective application of decision tree to stock trading,” Expert Systems with Applications, vol. 31, no. 2, pp. 270–274, Aug. 2006.
  14. Kim K.-J., “Financial time series forecasting using support vector machines,” Neurocomputing, vol. 55, nos. 1–2, pp. 307–319, Sep. 2003.
  15. Subha M. V. and Nambi S. T., “Classification of stock index movement using k-nearest neighbours (k-NN) algorithm,” WSEAS Transactions on Information Science and Applications, vol. 9, no. 9, pp. 261–270, 2012.
  16. 16. Lohrmann C. and Luukka P., “Classification of intraday S& P500 returns with a random forest,” International Journal of Forecasting, vol. 35, no. 1, pp. 390–407, Jan. 2019.
  17. 17. Fischer T. and Krauss C., “Deep learning with long short-term memory networks for financial market predictions,” European Journal of Operational Research, vol. 270, no. 2, pp. 654–669, 2018.
  18. 18. Sadorsky P., “Forecasting solar stock prices using tree-based machine learning classification: How important are silver prices?,” The North American Journal of Economics and Finance, vol. 61, July. 4, 2020.
  19. 19. Nti I. K., Adekoya A. F. and Weyori B. A., “Efficient Stock-Market Prediction Using Ensemble Support Vector Machine,” Open Computer Science, July. 4, 2020.
  20. 20. Kristjanpoller W. and Minutolo M. C., “Forecasting volatility of oil price using an artificial neural network-garch model,” Expert Systems with Applications, vol. 65, pp. 233–241, Dec. 15, 2016.
  21. 21. Nabipour M., Nayyeri P., Jabani H., Shahab S. and Mosavi A., “Predicting Stock Market Trends Using Machine Learning and Deep Learning Algorithms Via Continuous and Binary Data; a Comparative Analysis,” IEEE Access, vol. 8, pp. 150199–150212, Aug. 12, 2020.
  22. 22. Mudassir M., Bennbaia S., Unal D. and Hammoudeh M., “Time-series forecasting of Bitcoin prices using high-dimensional features: a machine learning approach,” Neural Computing and Applications, July. 4, 2020. pmid:32836901
  23. 23. Valencia F., Espinosa A. G. and Aguirre B. V., “Price Movement Prediction of Cryptocurrencies Using Sentiment Analysis and Machine Learning,” entropy, vol. 21, June. 14, 2019. pmid:33267303
  24. 24. Lin Y., Shancun L., Haijun Y. and Harris W., “Stock Trend Prediction Using Candlestick Charting and Ensemble Machine Learning Techniques with a Novelty Feature Engineering Scheme,” IEEE Access, vol. 9, pp. 101433–101446, July. 13, 2021.
  25. 25. Ren R., Wu D. D. and Liu T., “Forecasting Stock Market Movement Direction Using Sentiment Analysis and Support Vector Machine,” IEEE Systems Journal, vol.13, pp. 760–770, Mar. 27, 2018.
  26. 26. Xu Y., Yang C., Peng S. and Nojima Y., “A hybrid two-stage financial stock forecasting algorithm based on clustering and ensemble learning,” Applied Intelligence, pp. 3852–3867, July. 4, 2020.
  27. 27. Nayaka R. K., Mishra D. and Rath A. K., “A Naïve SVM-KNN based stock market trend reversal analysis for Indian benchmark indices,” Applied Soft Computing, vol. 35, pp. 670–680, Oct, 2015.
  28. 28. Tesla Inc., Available online: www.tesla.com, Accessed on: Feb. 1, 2022.
  29. 29. Scikit-Learn, Available online: www.scikit-learn.org, Accessed on: Feb. 15, 2022.
  30. 30. Pandas, Available online: www.pandas.org, Accessed on: Feb. 16, 2022.
  31. 31. Numpy, Available online: www.numpy.org, Accessed on: Feb. 3, 2022.
  32. 32. Alpaca, Available online: alpaca.markets, Accessed on: Jan. 1, 2022.
  33. 33. Plotly, Available online: www.plotly.com, Accessed on: March. 1, 2022.
  34. 34. J. Frankle and M. Carbin, “The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks,” International Conference on Learning Representations (ICLR), 2019.
  35. 35. Ranjan G. S. K., Verma A. K. and Sudha R., “K-Nearest Neighbors and Grid Search CV Based Real Time Fault Monitoring System for Industries,” International Conference for Convergence in Technology, pp. 1-5, 2019.
  36. 36. Cao L. J. and Tay F. E. H., “Support vector machine with adaptive parameters in financial time series forecasting,” IEEE Transactions on Neural Networks, vol. 14, pp. 1506–1518, Nov, 2003. pmid:18244595
  37. 37. Patel H. and Prajapati P., “Study and Analysis of Decision Tree Based Classification Algorithms,” International Journal of Computer Sciences and Engineering, Vol.6, pp.74–78. 2018.
  38. 38. Yoon B., Jeong Y. and Kim S., “Detecting a Risk Signal in Stock Investment Through Opinion Mining and Graph-Based Semi Supervised Learning,” IEEE Access, vol. 8, pp. 161943–161957, Sept. 02, 2020.
  39. 39. Naik N. and Mohan B. R., “Novel Stock Crisis Prediction Technique—A Study on Indian Stock Market,” IEEE Access, vol. 9, pp. 86230–86242, June. 14, 2021.
  40. 40. Chen Y. and Hao Y., “A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction,” Expert Systems with Applications: An International Journal, vol. 80, pp. 340–355, Sep, 2017.
  41. 41. Yuan X., Yuan J., Jiang T. and Ain Q. U., “Integrated Long-Term Stock Selection Models Based on Feature Selection and Machine Learning Algorithms for China Stock Market,” IEEE Access, vol. 8, pp. 22672–22685, 2020.
  42. 42. Li G., Zhang A., Zhang Q., Wu D. and Zhan C., “Pearson Correlation Coefficient-Based Performance Enhancement of Broad Learning System for Stock Price Prediction,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 69, no. 5, pp. 2413–2417, May 2022.
  43. 43. Kim S., Ku S., Chang W. and Song J. W., “Predicting the Direction of US Stock Prices Using Effective Transfer Entropy and Machine Learning Techniques,” IEEE Access, vol. 8, pp. 111660–111682, 2020.
  44. 44. Chen L., Qiao Z., Wang M., Wang C., Du R. and Stanley H. E., “Which Artificial Intelligence Algorithm Better Predicts the Chinese Stock Market?,” IEEE Access, vol. 6, pp. 48625–48633, 2018.
  45. 45. Choudhary R. and Gianey H., “Comprehensive Review On Supervised Machine Learning Algorithms,” International Conference on Machine learning and Data Science, pp. 37-43, 2017.
  46. 46. Nousi P., Tsantekidis A., Passalis N., Ntakaris A., Kanniai J., Tefas A., Gabbouj M. et al., “Machine Learning for Forecasting Mid-Price Movements Using Limit Order Book Data,” IEEE Access, vol. 7, pp. 64722–64736, 2019.
  47. 47. Ntakaris A., Mirone G., Kanniainen J., Gabbouj M. and Iosifidis A., “Feature Engineering for Mid-Price Prediction With Deep Learning,” IEEE Access, vol. 7, pp. 82390–82412, 2019.
  48. 48. A. George and A. Ravindran, “Distributed Middleware for Edge Vision Systems,” 2019 IEEE 16th International Conference on Smart Cities: Improving Quality of Life Using ICT & IoT and AI (HONET-ICT), Charlotte, NC, USA, 2019, pp. 193-194.
  49. 49. Lin Y. F., Huang T. M., Chung W. H. and Ueng Y. L., “Forecasting Fluctuations in the Financial Index Using a Recurrent Neural Network Based on Price Features,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 5, no. 5, pp. 780–791, Oct. 2021.
  50. 50. Shachmurove A. and Shachmurove Y., “Annualized and cumulative returns on venture-backed public companies categorized by industry,” Journal of Entrepreneurial Finance, vol. 9, pp. 41–60, no. 3, 2004.
  51. 51. Soleymani F. and Paquet E., “Financial Portfolio Optimization with Online Deep Reinforcement Learning and Restricted Stacked Autoencoder—DeepBreath,” Expert Systems with Applications, vol.156, pp. 113456, 2020.
  52. 52. Csi Market, Available online: www.csimarket.com, Accessed on: April. 1, 2022.