
Machine learning in epidemiology: Neural networks forecasting of monkeypox cases

  • Lulah Alnaji

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematics, University of Hafr Al-Batin, Hafr Al-Batin, Saudi Arabia


This study applies advanced machine learning techniques, namely Artificial Neural Network, Long Short-Term Memory, and Gated Recurrent Unit models, to forecast monkeypox outbreaks in Canada, Spain, the USA, and Portugal. The research assesses how effectively these models predict the spread and severity of cases using training data from June 3 to December 31, 2022, and evaluates them against test data from January 1 to February 7, 2023. The study highlights the potential of neural networks in epidemiology, especially concerning recent monkeypox outbreaks, and provides a comparative analysis of the models, emphasizing their value for public health strategies. The research identifies optimal model configurations and underscores the efficiency of the Levenberg-Marquardt algorithm in training. The findings suggest that ANN models, particularly those achieving the best Root Mean Squared Error, Mean Absolute Percentage Error, and Coefficient of Determination values, are effective in infectious disease forecasting and can significantly enhance public health responses.


The Monkeypox Virus (MPXV), a member of the Orthopoxvirus genus, is the causative agent of the infectious disease known as monkeypox. This virus is predominantly found in Central and West African countries, with sporadic cases reported in other regions, including the United States and the United Kingdom [13].

Transmission of MPXV to humans often occurs through direct contact with infected animals or contaminated objects, such as body fluids, sores, or bedding [1, 2, 4]. Human-to-human transmission is also possible, mainly through close physical interaction with infected individuals or exposure to their bodily fluids [1, 4]. Symptoms of MPXV infection include fever, headache, muscle aches, and a characteristic rash that spreads across the body [1, 4]. In severe cases, complications such as pneumonia, sepsis, and encephalitis can occur [1, 2, 4].

No specific antiviral treatment for MPXV currently exists; however, supportive care can aid in symptom management and reduction of complication risks [1, 2, 4]. Vaccination against smallpox has shown some effectiveness in preventing monkeypox, but routine smallpox immunization is no longer practiced [1, 2]. Therefore, public health measures such as contact tracing, quarantine, and isolation are essential in controlling the spread of the disease [1, 2, 4].

MPXV, part of the Orthopoxvirus family, was first identified in monkeys in the Democratic Republic of the Congo in 1958 and in humans in 1970. The virus is endemic in certain areas of Central and West Africa, with occasional outbreaks. Reports of cases in countries outside Africa, including the United States, Canada, Portugal, and Spain, have increased recently [5].

The first recorded case of MPXV in the United States occurred in 2003 in a traveler from West Africa. This led to an investigation that identified 47 confirmed or probable cases across six states, primarily linked to prairie dogs infected with the virus. In Canada, a similar outbreak occurred in 2003, with two confirmed cases of MPXV in individuals who had traveled to West Africa [6, 7].

Portugal reported its first outbreak of MPXV in 2018, with nine cases linked to recent travel to Nigeria. In 2021, Spain experienced its first outbreak of MPXV, with two cases also related to travel to Nigeria. These instances highlight the increasing frequency of MPXV cases outside Africa, underscoring the need for vigilant surveillance and preparedness to manage any outbreaks [8].

The utilization of machine learning in epidemiological research represents a transformative approach to understanding and managing infectious diseases. Building on the existing state of the art in disease forecasting, particularly leveraging machine learning techniques, our study aims to enhance the predictive modeling of monkeypox spread. While previous studies like [9] have developed neural network models for forecasting monkeypox in various countries, our research focuses on employing Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models to predict monkeypox cases in Canada, Spain, the USA, and Portugal. This approach not only addresses a gap in monkeypox research but also compares the efficacy of different neural network models, contributing a comparative perspective to the field.

The recent upsurge in monkeypox cases globally highlights the need for improved disease monitoring and forecasting methods. In this context, our study introduces advanced machine learning techniques, namely ANN, LSTM, and GRU models, to predict monkeypox outbreaks. This research is significant in its focus on the latest MPXV outbreaks and its comparative evaluation of different neural network models for forecasting the disease’s spread in various countries.

Literature review

The study of infectious diseases, particularly emerging viruses like MPXV, has increasingly incorporated machine learning approaches to enhance prediction and management strategies. Key studies in this field have demonstrated the utility of various neural network models, such as ANN, LSTM, and GRU, in understanding and forecasting disease patterns [10–15]. Our work builds upon these foundations, particularly focusing on recent developments in monkeypox forecasting.

Early detection and prediction of infectious diseases like MPXV are crucial for effective management and response. ANN approaches, as utilized in forecasting COVID-19 cases in Pakistan, provide valuable insights for healthcare professionals and policymakers [16].

ANN techniques are increasingly being used to predict patient outcomes in various diseases, including COVID-19, breast cancer, and cardiovascular disease. For example, ANN models were employed in assessing breast cancer risk among Iranian women [17].

While prior research like [9] has leveraged neural networks for predicting MPXV spread in specific regions, our study extends this application to Canada, Spain, the USA, and Portugal. This expansion is crucial, given the distinct epidemiological profiles and healthcare systems in these countries. Such comparative analysis contributes novel insights into the geographical variance in MPXV outbreak dynamics.

The role of machine learning in epidemiological modeling has evolved rapidly, with recent advances highlighting its potential in real-time disease surveillance and response planning. Studies have explored various machine learning techniques, including deep learning and predictive analytics, to enhance the accuracy of disease outbreak predictions and to understand transmission dynamics [16, 18–20].

Our study contributes to this growing body of literature by employing a combination of ANN, LSTM, and GRU models, enhanced with the ADAM optimizer [21] and the Levenberg-Marquardt learning algorithm [22]. This approach not only allows for a comprehensive analysis of MPXV spread but also offers a methodological framework that can be adapted for other infectious diseases. The integration of advanced machine learning models in our research addresses a critical gap in current epidemiological studies.

The study utilizes a range of ANN models, including LSTM and GRU, to predict MPXV cases in the USA, Canada, Spain, and Portugal, based on existing datasets. The comparative analysis of these countries will assist healthcare authorities in formulating appropriate response strategies. This research is the first in-depth study using ANN on recent MPXV outbreaks, offering new insights into the epidemic’s dynamics. A time-series dataset of MPXV cases from each country, along with statistical graphs of confirmed cases, is presented [23]. The distribution and geographical representation of confirmed Monkeypox cases across the studied nations are depicted in Figs 1 and 2 (Fig 1a shows the distribution of confirmed cases, while Fig 1b provides a geographical representation on a global map). Additionally, the sequences of confirmed MPXV instances, detailed with peak intervals from June to October 2022 for Canada, Portugal, Spain, and the USA, are illustrated in Fig 2a–2d.

Fig 1. (a) Distribution of confirmed Monkeypox cases across the studied nations, (b) Geographical representation of the studied nations on a global map.

Fig 2.

(a) Sequence of confirmed MPXV instances in Canada, with a detailed view of the peak interval (June to October 2022), (b) Sequence of confirmed MPXV instances in Portugal, with a detailed view of the peak interval (June to October 2022), (c) Sequence of confirmed MPXV instances in Spain, with a detailed view of the peak interval (June to October 2022), (d) Sequence of confirmed MPXV instances in the USA, with a detailed view of the peak interval (June to October 2022).

The prediction model uses data from the “Our World in Data” website, employing neural network, LSTM, and GRU models. The model’s performance is enhanced using an Adaptive Moment Estimation (ADAM) optimizer [21]. Additionally, a Levenberg-Marquardt (LM) learning algorithm is implemented for a single-hidden-layer ANN model, optimizing the number of neurons using K-fold cross-validation with early stopping [22]. ANN-based regression models have been effective in predicting the spread of infectious diseases like MPXV. These models enable informed decision-making by healthcare professionals and policymakers in controlling disease spread and responding effectively to outbreaks. ANN models have been applied in various domains for time-series prediction, demonstrating their versatility and efficacy [10–15].

The remainder of this paper is organized as follows: the Methodology section describes the methods used in this study; the Results and Discussions section presents the findings of the research; the Forecasting Methodology section covers the approach taken for forecasting; and the Conclusion section summarizes the study’s key findings.


In the manuscript, the choice of modeling methods, including ANN, LSTM, and GRU, is justified by their proven effectiveness in time-series analysis and epidemiological forecasting. ANN is renowned for its ability to model complex nonlinear relationships, making it ideal for predicting disease spread [24]. LSTM and GRU, as advanced recurrent neural networks, effectively capture temporal dependencies in data, crucial for accurate disease trend predictions [25–27]. These methodologies are selected for their ability to handle the intricacies and variabilities in infectious disease data, making them suitable for this study’s purpose. The assumptions underlying these models are standard in the field and have been extensively validated in prior research, ensuring their applicability and reliability in this context.

  • Data Representativeness: The assumption that the datasets used are representative of the wider population and accurately reflect the trends in monkeypox cases.
  • Stationarity of Data: The presumption that the underlying characteristics of the monkeypox data, such as trends and patterns, remain consistent over the period of study.
  • Impact of External Factors: The study assumes that external factors not included in the model (like public health interventions, changes in virus transmissibility) have a negligible impact on the predictions.

This study employs a comparative approach, analyzing ANN, LSTM, and GRU models due to the lack of existing research focusing on the same countries and time period. These models were selected for their proven capabilities in time-series prediction and their adaptability to different data characteristics. The comparative analysis allows for a nuanced understanding of each model’s strengths and weaknesses in predicting monkeypox outbreaks.

  • Data Preprocessing and Normalization: The data underwent preprocessing to correct irregularities and ensure consistency. Normalization, crucial for neural network models, involved scaling input and target values to a [0, 1] range. This step minimizes biases and enhances model interpretability.
  • Model Calibration: Model calibration involved fine-tuning hyperparameters for optimal performance. This process included adjusting learning rates, batch sizes, and layer configurations to enhance model accuracy and efficiency in data prediction.
  • Validation Techniques: K-fold cross-validation was employed to ensure model robustness and avoid overfitting. This technique involved dividing the dataset into ‘K’ subsets and iteratively training and testing the model on these subsets, providing a comprehensive assessment of model performance.
  • Performance Metrics: Statistical measures such as RMSE, MAE, and R-squared were utilized to evaluate model performance. These metrics provided quantitative insights into the model’s prediction accuracy, reliability, and fit to the data.

The artificial neural network

ANNs, inspired in part by the neuronal architecture of the human brain, consist of simple processing units capable of handling scalar messages. Their extensive interconnection and adaptive interaction between units make ANNs function as a multi-processor computing system [28, 29]. ANNs offer a rapid and flexible approach to modeling, suitable for tasks such as rainfall-runoff prediction [30]. The network comprises layers of interconnected neurons, in which connection weights across one or more hidden layers link the input and output layers [31]. During training, the Back Propagation algorithm adjusts the network weights to reduce errors between the predicted and actual outputs [31]. After training with experimental data to obtain the optimal structure and weights, ANNs are evaluated on additional experimental data for validation [31]. The Multilayer Perceptron, a feed-forward ANN with one or more hidden layers, is particularly prevalent [31]. In ANNs, a node is a data structure connected in a network trained using standard methods such as gradient descent [24, 32, 33]. Each node has two states, active (on, or 1) and inactive (off, or 0), while each edge (synapse, or link between nodes) carries a weight [34–36]. Positive weights stimulate or activate the next inactive node, whereas negative weights inhibit or deactivate the subsequent active node [34, 35, 37].

Each neuron in an ANN receives input from neurons in the preceding layer, with weights denoted as w_ij. The weighted sum of each neuron’s inputs is passed through a sigmoid function, represented by:

y_j = σ( ∑_{i=1}^{n} w_ij x_i )  (1)

Here, x_i is the input from the i-th neuron in the preceding layer, j represents the current neuron, and n the number of neurons in the preceding layer. Similarly, weights w_kj from neuron j to the subsequent neuron k are computed. The output y_k of the neural network for input x and true output t is derived by applying the activation function to the weighted sum of the previous layer’s output:

y_k = σ( ∑_{j=1}^{m} w_kj y_j )  (2)

The quantity m represents the number of neurons in the preceding layer. The objective of training the neural network is to identify the weights w_ij and w_kj that minimize the error between the predicted output y_k and the true output t. This involves minimizing the cost function E(w), the average squared difference between the predicted and actual output across training samples:

E(w) = (1/2N) ∑_{n=1}^{N} ( y(x_n; w) − t_n )²  (3)

Here, x_n denotes the n-th input example, t_n the corresponding true output, and N the total number of training examples. The factor 1/2 simplifies gradient calculation of the cost function during training.
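The forward pass of Eqs (1) and (2) and the cost of Eq (3) can be sketched in a few lines of NumPy. This is an illustrative implementation, not the study’s code; the layer sizes and weight shapes are assumptions for the example.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation used in Eqs (1) and (2).
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    # Hidden activations (Eq 1): sigmoid of the weighted sum of inputs.
    h = sigmoid(W1 @ x)
    # Network output (Eq 2): sigmoid of the weighted sum of hidden activations.
    return sigmoid(W2 @ h)

def cost(X, T, W1, W2):
    # Cost E(w) of Eq (3): mean squared difference over N examples;
    # the 1/2 factor simplifies the gradient during backpropagation.
    preds = np.array([forward(x, W1, W2) for x in X])
    return 0.5 * float(np.mean(np.sum((preds - T) ** 2, axis=1)))
```

Training would then adjust W1 and W2 (the w_ij and w_kj of the text) to drive this cost down.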

The LM algorithm, a widely used optimizer for training ANNs, was employed in this study for epidemic prediction [38, 39]. The ANN was trained on the dataset using the LM technique, optimizing the network over a specific number of hidden neurons [38, 39]. Performance was evaluated using the Root Mean Square Error (RMSE) and the correlation coefficient to minimize the cost function value [38, 39].


In numerical analysis, the LM algorithm is a renowned optimization technique for addressing nonlinear least squares problems. The LM method modifies the estimated Hessian matrix JᵀJ by incorporating a positive combination coefficient μ and an identity matrix I. This adjustment ensures the invertibility of the Hessian matrix, as expressed in:

H = JᵀJ + μI  (4)

This approximation ensures that the diagonal components of the predicted Hessian matrix are greater than zero, consequently guaranteeing the invertibility of H [40, 41]. The LM algorithm employs a blend of the steepest descent and Gauss-Newton algorithms. When μ is close to zero, Eq (4) aligns with the Gauss-Newton method, while a large μ leads to the application of the steepest descent approach [42].

The update rule for the LM algorithm, represented in Eq (5), involves the weight vector V_{k+1} and the error vector e_k:

V_{k+1} = V_k − (JᵀJ + μI)⁻¹ Jᵀ e_k  (5)

With μ set to zero, Eq (5) reduces to the Gauss-Newton procedure [40].
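A single damped update of Eqs (4) and (5) can be sketched as follows, assuming a Jacobian J and residual vector e computed elsewhere; this is a minimal illustration, not the study’s implementation.

```python
import numpy as np

def lm_step(V, J, e, mu):
    # One Levenberg-Marquardt update (Eq 5):
    # V_{k+1} = V_k - (J^T J + mu*I)^(-1) J^T e_k.
    # Small mu approaches Gauss-Newton; large mu approaches steepest descent.
    H = J.T @ J + mu * np.eye(J.shape[1])   # damped Hessian approximation (Eq 4)
    return V - np.linalg.solve(H, J.T @ e)
```

For a linear residual e = J·V − y, a single step with tiny μ recovers the least-squares solution, which is a convenient sanity check on the update.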

Adaptive moment estimation optimization

ADAM is a widely adopted optimization technique in deep learning, merging aspects of gradient descent with momentum and the Root Mean Square Propagation optimizer [21]. ADAM aims to address the shortcomings of conventional optimization methods, such as sensitivity to step size and gradient noise, by adjusting the learning rate based on estimations of the gradients’ first and second moments.

The update rule for ADAM is given by:

θ_t = θ_{t−1} − α m̂_t / ( √v̂_t + ε )  (6)

where ε is a small constant to avoid division by zero, θ_t denotes the weights at time step t, α is the learning rate, and m̂_t and v̂_t are the bias-corrected first and second-moment estimates of the gradients, respectively.

The first-moment estimate, m_t, an exponential moving average of the gradients, is calculated as:

m_t = β₁ m_{t−1} + (1 − β₁) g_t  (7)

where m_{t−1} is the previous first-moment estimate, g_t is the gradient at time step t, and β₁ is the decay-rate hyperparameter for the first-moment estimation.

The second-moment estimate, v_t, involves the exponential moving average of squared gradients:

v_t = β₂ v_{t−1} + (1 − β₂) g_t²  (8)

where v_{t−1} represents the previous second-moment estimate, and β₂ controls the decay rate of the second-moment estimation.

ADAM also incorporates bias correction in the moment estimates:

m̂_t = m_t / (1 − β₁ᵗ)  (9)

v̂_t = v_t / (1 − β₂ᵗ)  (10)

with m̂_t and v̂_t being the bias-corrected first and second-moment estimates, respectively [21].
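Eqs (6)–(10) combine into a single update step, sketched below in NumPy. The defaults follow the ADAM paper [21]; the function itself is illustrative, not the study’s implementation.

```python
import numpy as np

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of gradients (Eq 7).
    m = beta1 * m + (1 - beta1) * g
    # Second moment: exponential moving average of squared gradients (Eq 8).
    v = beta2 * v + (1 - beta2) * g ** 2
    # Bias-corrected moment estimates (Eqs 9 and 10).
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update (Eq 6).
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

Iterating this step on a simple quadratic loss, for example f(θ) = θ² with gradient 2θ, drives θ toward zero, which is an easy way to verify the update.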

Gated recurrent unit

GRU networks, a type of Recurrent Neural Network (RNN) architecture, use gating mechanisms to control the flow of information. GRUs comprise three main components: the update gate, reset gate, and candidate state. The update gate determines the extent to which the previous hidden state should be maintained and how much new information from the candidate state should be included in the current hidden state. The reset gate decides the amount of the previous hidden state to be forgotten when computing the new candidate state. The candidate state represents new information derived from the input and the previous hidden state.

The equations for a GRU network’s update gate, reset gate, and candidate state are outlined in [26, 43]. The update gate equation is:

z_t = σ( W_z · [h_{t−1}, x_t] + b_z )

where σ is the sigmoid activation function, W_z the weight matrix for the update gate, b_z the bias vector, and [h_{t−1}, x_t] the concatenation of the previous hidden state and the current input.

The reset gate equation is:

r_t = σ( W_r · [h_{t−1}, x_t] + b_r )

where σ is the sigmoid activation function, W_r the reset gate’s weight matrix, b_r its bias vector, and [h_{t−1}, x_t] the combination of the previous hidden state and the current input.

The candidate state equation is:

h̃_t = tanh( W_h · [r_t ⊙ h_{t−1}, x_t] + b_h )

where ⊙ denotes element-wise multiplication, W_h the weight matrix for the candidate state, b_h its bias vector, and [r_t ⊙ h_{t−1}, x_t] the concatenation of the reset-gated previous hidden state and the current input.

GRU networks, with their selective information updating mechanism, offer enhanced efficiency and effectiveness compared to traditional RNNs.
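The three components above combine into one recurrent step. The final blend h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t is the standard GRU formulation, assumed here since the text does not spell it out; the NumPy sketch below is illustrative, not the study’s code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x_t, h_prev, Wz, Wr, Wh, bz, br, bh):
    # Concatenate previous hidden state and current input: [h_{t-1}, x_t].
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(Wz @ hx + bz)                 # update gate z_t
    r = sigmoid(Wr @ hx + br)                 # reset gate r_t
    # Candidate state uses the reset-gated hidden state: [r_t ⊙ h_{t-1}, x_t].
    h_tilde = np.tanh(Wh @ np.concatenate([r * h_prev, x_t]) + bh)
    # Standard GRU blend of previous state and candidate via the update gate.
    return (1 - z) * h_prev + z * h_tilde
```

Because h̃_t is bounded by tanh and z_t lies in (0, 1), the hidden state stays within [−1, 1] when initialized at zero.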

Long short-term memory

LSTM networks, another variant of RNNs, are adept at learning long-term dependencies by selectively retaining or forgetting information over time through gating mechanisms. An LSTM network consists of three types of gates: the forget gate, input gate, and output gate.

The forget gate determines which information from the previous cell state to retain or discard for the current time step. It generates a vector of values between 0 and 1 for each number in the previous cell state and the current input. A value of 1 implies retention, while 0 indicates discarding. The forget gate equation is given by [33]:

f_t = σ( W_f · [h_{t−1}, x_t] + b_f )

where f_t is the forget gate’s output at time t, σ the sigmoid activation function, W_f the forget gate’s weight matrix, h_{t−1} the previous hidden state, b_f the bias term, and [⋅] signifies concatenation.

The input gate decides which information from the previous cell state and current input to add to the current cell state. It too generates a vector of values between 0 and 1. Values of 1 indicate addition, while 0 suggests ignoring. The input gate equation is also provided by [33]:

i_t = σ( W_i · [h_{t−1}, x_t] + b_i )

where i_t is the input gate’s output at time t, σ the sigmoid activation function, W_i the weight matrix for the input gate, h_{t−1} the previous hidden state, b_i the bias term for the input gate, and [⋅] denotes concatenation.

The output gate determines which information from the current cell state should be output as the network’s final output. It produces a vector of values, ranging from 0 to 1, for each cell state value. The final network output for the current time step is formed by multiplying these values by the current cell state. The equation for the output gate is provided by [33]:

o_t = σ( W_o · [h_{t−1}, x_t] + b_o )

where o_t is the output gate’s output at time t, σ the sigmoid activation function, W_o the weight matrix for the output gate, h_{t−1} the previous hidden state, b_o the bias term for the output gate, and [⋅] indicates concatenation.
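The three gates combine with the standard cell-state and hidden-state updates, c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t and h_t = o_t ⊙ tanh(c_t), which the text does not spell out and are assumed here. The following NumPy sketch is illustrative, not the study’s implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, Wf, Wi, Wo, Wc, bf, bi, bo, bc):
    hx = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(Wf @ hx + bf)                 # forget gate: what to keep of c_{t-1}
    i = sigmoid(Wi @ hx + bi)                 # input gate: what new info to add
    o = sigmoid(Wo @ hx + bo)                 # output gate: what to expose
    c_tilde = np.tanh(Wc @ hx + bc)           # candidate cell state
    c = f * c_prev + i * c_tilde              # new cell state
    h = o * np.tanh(c)                        # new hidden state
    return h, c
```

The extra weight matrix W_c and bias b_c belong to the candidate cell state, a standard component of the LSTM cell alongside the three gates described above.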

Control parameters for each model

The performance of neural network models such as ANN, LSTM, and GRU networks depends on several tunable hyperparameters. These parameters are crucial for the learning process and are optimized during training.

ANN model hyperparameters

  • Weights and Biases: Weights (wij and wkj) are the core parameters adjusted during training. They determine the strength of connections between neurons in successive layers.
  • Number of Neurons in Each Layer: The size (n and m) of each layer, especially hidden layers, influences the network’s capacity to learn complex patterns.
  • Learning Algorithm: Back Propagation is used for adjusting weights, typically coupled with optimization techniques like the Levenberg-Marquardt (LM) optimizer.
  • Activation Function: The sigmoid function is used for neuron activation, transforming the weighted sum into an output.
  • Cost Function: E(w), the mean squared error between the predicted and actual outputs, is minimized during training.
  • Performance Metrics: RMSE and correlation coefficients are used for evaluating model performance.

LSTM model hyperparameters

  • Forget Gate Weights (Wf): Controls the amount of previous cell state to retain.
  • Input Gate Weights (Wi): Determines what new information is added to the cell state.
  • Output Gate Weights (Wo): Decides what information to output from the cell state.
  • Bias terms (bf, bi, bo): Offset values added to gate computations.
  • Activation Functions: Typically sigmoid (σ) for gates and tanh for cell state updates.

GRU model hyperparameters

  • Update Gate Weights (Wz): Balances the previous state and new candidate state contributions.
  • Reset Gate Weights (Wr): Determines how much past information to forget.
  • Candidate State Weights (Wh): Computes the potential new information to be added to the state.
  • Bias terms (bz, br, bh): Offset values for each gate and candidate state computation.
  • Activation Functions: Sigmoid (σ) for update and reset gates, and tanh for candidate state.

These hyperparameters are iteratively adjusted through backpropagation and optimization algorithms to minimize loss functions, thereby improving the predictive performance of the models.

K-fold cross validation

Overfitting is a common issue with ANN models, where the model tends to learn noise in the data rather than the actual signals, leading to poor performance on untested datasets. To mitigate this, K-fold cross-validation is employed as a robust method [44, 45]. In this technique, the data is randomly divided into K groups. The model undergoes training on (K-1) folds and is then evaluated on the remaining fold in each iteration, with RMSE serving as the performance metric. The learning process is monitored by plotting the number of epochs against the average RMSE on the validation folds. Training concludes when there is no significant reduction in RMSE with an increase in epochs [46].

Once model training is completed, its performance is evaluated against a separate test dataset. This involves scaling the features after loading the dataset, followed by dividing it into 10 folds for the 10-fold cross-validation. This process iterates ten times, each time splitting the dataset into training and validation sets, training the model on the former, and assessing it on the latter. The model’s performance is recorded in each iteration, and the procedure progresses through each of the 10 folds until all have been evaluated. Finally, the average performance across all 10 folds is calculated and reported.
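The fold-splitting and score-averaging logic described above can be sketched as follows. A simple ordinary-least-squares model stands in for the neural network, so the cross-validation mechanics are the point, not the model; this is an illustrative sketch, not the study’s code.

```python
import numpy as np

def kfold_rmse(X, y, k=10, seed=0):
    # Shuffle indices and split them into k roughly equal folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Stand-in model: least-squares fit on the (k-1) training folds.
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred = X[val] @ w
        scores.append(np.sqrt(np.mean((pred - y[val]) ** 2)))  # fold RMSE
    return float(np.mean(scores))          # average RMSE across all k folds
```

In the study, the equivalent loop would train the neural network on each set of nine folds and record the validation-fold RMSE, stopping training once that average stops improving.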

The method for determining the optimal number of hidden neurons in the ANN models is depicted in the flowchart in the subsection below (Flowchart of the 10-fold cross-validation process). As part of this approach, a total of 12 ANN models with varying numbers of hidden neurons were developed. The flowchart in Fig 3 illustrates the cross-validation process concisely, while the flowchart in Fig 4 presents a detailed view of the neural network model training and evaluation process utilizing 10-fold cross-validation. The process begins with ‘Start’, followed by the ‘Load dataset’ step, where the initial dataset is loaded for analysis. A ‘Preprocess’ stage then scales the features to ensure they are normalized for optimal model performance.

Fig 3. Flowchart of the 10-fold cross-validation process.

Fig 4. Flowchart depicting the 10-fold cross-validation process used in neural network model training and evaluation.

Flowchart of the 10-fold cross-validation process

Neural network modelling process

This study encompassed the training and testing phases in the neural network modeling procedure. To enhance prediction accuracy and expedite model convergence, it was imperative for the data to be normalized within a specific range. The min-max normalization strategy was employed to ensure that both input and target values resided within the [0, 1] range, which is optimal for the activation function’s performance [47, 48].
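The min-max scheme can be sketched as below. The function names are illustrative; fitting the scaler on the training data only, and reusing the same parameters for the test set and for inverting predictions, is standard practice assumed here.

```python
import numpy as np

def minmax_fit(x):
    # Learn min and max from the TRAINING data only, so the test set
    # is scaled with the same parameters as the training set.
    return float(np.min(x)), float(np.max(x))

def minmax_transform(x, lo, hi):
    # Scale to [0, 1]; test values outside the training range may fall outside it.
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def minmax_inverse(x_scaled, lo, hi):
    # Map normalized predictions back to the original case-count scale.
    return np.asarray(x_scaled, dtype=float) * (hi - lo) + lo
```

The inverse transform matters at evaluation time: model outputs in [0, 1] must be mapped back to case counts before computing the reported error metrics.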

During the training phase, adjustments were made to the model’s synaptic weights to align with the optimal number of neurons in the hidden layer. Additionally, the training dataset was subdivided into “K” subsets using the K-fold cross-validation method. This approach facilitated the determination of the appropriate number of iterations, or “epochs,” required before concluding the model’s training.

Following the training, the model’s accuracy and predictive capacity were evaluated using a testing dataset. This phase enabled the neural network model to learn from the data and predict future instances of MPXV in the selected countries.

Evaluating the performance of the neural network models

The training process repeatedly conditions the neural network models to learn the relationship between input and output. The LM learning method (refer to Eqs (4) and (5)) was employed during this phase. The model’s performance was evaluated using the Root Mean Squared Error (RMSE) and the Coefficient of Determination (R²). RMSE is the square root of the average squared difference between actual values and model output, whereas R² is a measure of how well the model fits the data. A model is considered to fit well when R² is close to 1.0 and RMSE approaches zero [49, 50].

RMSE = √( (1/n) ∑_{i=1}^{n} (Y_i − Ŷ_i)² )  (11)

R² = 1 − ∑_{i=1}^{n} (Y_i − Ŷ_i)² / ∑_{i=1}^{n} (Y_i − Ȳ)²  (12)

Here, n represents the number of values, Ŷ_i the predicted values, Y_i the actual values, and Ȳ the mean of all actual values. Additional metrics, the Mean Absolute Error (MAE) and the Mean Absolute Percentage Error (MAPE), were also utilized to assess the model’s performance [51].

MAE = (1/n) ∑_{i=1}^{n} |Y_i − Ŷ_i|  (13)

MAPE = (100/n) ∑_{i=1}^{n} |(Y_i − Ŷ_i) / Y_i|  (14)

In these equations, MAE signifies the Mean Absolute Error, and MAPE the Mean Absolute Percentage Error, providing further insight into the model’s accuracy.
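The four metrics of Eqs (11)–(14) translate directly into code; the NumPy sketch below is illustrative.

```python
import numpy as np

def rmse(y, y_hat):
    # Eq (11): root of the mean squared difference.
    return float(np.sqrt(np.mean((np.asarray(y, float) - np.asarray(y_hat, float)) ** 2)))

def r2(y, y_hat):
    # Eq (12): 1 minus the ratio of residual to total sum of squares.
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def mae(y, y_hat):
    # Eq (13): mean absolute difference.
    return float(np.mean(np.abs(np.asarray(y, float) - np.asarray(y_hat, float))))

def mape(y, y_hat):
    # Eq (14): mean absolute percentage error (undefined when any Y_i is zero).
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(100.0 * np.mean(np.abs((y - y_hat) / y)))
```

Note that MAPE is undefined whenever an actual value Y_i is zero, a practical caveat for daily case counts that include zero-case days.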

K-fold cross-validation was utilized to mitigate overfitting in our neural network models. Training was concluded when a significant reduction in RMSE was no longer observed with an increase in epochs. This method ensured effective learning without overfitting.

The Levenberg-Marquardt optimization technique was crucial in determining when to stop training. It balanced convergence speed and model accuracy, preventing excessive training iterations and ensuring optimized model performance.

For LSTM and GRU models, training stop criteria included monitoring validation loss. Training was halted if validation loss stopped decreasing or started increasing. Early stopping was implemented, where training ceased after a pre-set number of epochs without improvement in validation loss. This prevented learning noise and ensured better generalization. Other hyperparameters like learning rate and batch size were also considered. Specific thresholds for early stopping based on validation loss changes were crucial for optimizing model training.
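The early-stopping rule described above can be expressed as a small helper. The `patience` and `min_delta` parameter names are hypothetical, since the paper does not state its exact thresholds; the sketch is illustrative.

```python
def early_stopping(val_losses, patience=10, min_delta=0.0):
    # Return the epoch index at which training should stop: the first epoch
    # after `patience` consecutive epochs without an improvement of at least
    # `min_delta` over the best validation loss seen so far.
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch = loss, epoch    # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return epoch                       # no improvement for `patience` epochs
    return len(val_losses) - 1                 # budget exhausted without triggering
```

In practice one would also restore the weights from the best epoch rather than the stopping epoch, which is the usual companion to this rule.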

Peculiarities of applied methodologies

In our exploration of epidemiological forecasting, particularly in modeling the spread of monkeypox, this study introduces a novel approach through the application of Artificial Neural Networks (ANN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) models. These methodologies have been meticulously selected based on their demonstrated efficiency in capturing complex nonlinear relationships and temporal dependencies within time-series data, essential attributes for the accurate prediction of disease trends.

The distinctiveness of the methodology lies in the comprehensive adaptation and fine-tuning of these models to cater specifically to the challenges presented by infectious disease data, which is often marked by its variability and unpredictability. By employing a comparative analysis—a strategy less frequented in the existing literature for the countries and time periods under study—the approach facilitates a deeper understanding of each model’s strengths and limitations in forecasting monkeypox outbreaks.

  • Customized Data Preprocessing: The data preprocessing and normalization techniques were specifically tailored to accommodate the unique characteristics of epidemiological data, ensuring that the models are fed input that accurately reflects the dynamics of disease spread. This step is crucial in epidemiological modeling, where the quality of data directly impacts the accuracy of predictions.
  • Model Calibration and Validation: The methodological framework includes meticulous calibration of model hyperparameters, such as the number of neurons in hidden layers and learning rates, through an iterative process. This ensures the models are finely tuned to capture intricate patterns within the data. Furthermore, the use of K-fold cross-validation as a robust validation technique helps mitigate the risk of overfitting, a common challenge when dealing with time-series data in machine learning models.
  • Advanced Optimization Techniques: The adoption of advanced optimization techniques, such as the LM algorithm for ANN and ADAM for LSTM and GRU models, underlines the uniqueness of the approach. These techniques enhance the learning process, allowing for faster convergence and improved model performance by effectively navigating the complex landscape of the cost function.
  • Evaluation Metrics: The selection of comprehensive performance metrics, namely RMSE, MAPE, and R2, further ensures the rigor of the methodology. These metrics provide a multifaceted view of model performance, from prediction accuracy to goodness of fit, enabling a thorough evaluation of each model’s ability to forecast disease trends accurately.
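Two of the steps listed above, data normalization and K-fold splitting, can be sketched in plain Python. Min-max scaling is assumed here as the normalization scheme, and the fold logic is a generic index-based version rather than the study’s exact procedure:

```python
def min_max_normalize(series):
    """Rescale a case-count series to the [0, 1] range."""
    lo, hi = min(series), max(series)
    return [(v - lo) / (hi - lo) for v in series]

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) index pairs covering the data in k folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, val
        start += size
```

Each fold then serves once as the validation set while the model trains on the remaining folds, which is what allows the overfitting risk noted above to be assessed.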

Results and discussions

In this section, we delve deeper into the results and provide a more detailed discussion of the predictive performance of three neural network models: ANN, LSTM, and GRU. These models were trained using data from four countries: the USA, Canada, Spain, and Portugal. The period for training data was from June 3 to December 31, 2022, with the evaluation conducted on test data from January 1 to February 7, 2023. The outcomes of this study are illustrated in Figs 5-7.

Fig 5. Iteration-dependent evolution of the ANN model’s training performance for MPXV, evaluated using the mean squared error (MSE) metric.

Fig 6. Iteration-dependent evolution of the LSTM model’s training performance for MPXV, evaluated using the mean squared error (MSE) metric.

Fig 7. Iteration-dependent evolution of the GRU model’s training performance for MPXV, evaluated using the mean squared error (MSE) metric.

Initially, perceptron ANN models with one and two hidden layers were developed. It was observed that one or two hidden layers sufficed for training the ANN for complex nonlinear problems [18, 19]. This observation aligns with prior studies, including one forecasting dengue fever epidemics in San Juan, Puerto Rico, and the Northwest Coast of Yucatan, Mexico [19].

For network training, the LM algorithm was employed, recognized for its adaptability and efficiency. The LM method, which circumvents the computation of the Hessian matrix, is faster than traditional backpropagation methods. This technique has been successfully applied in other studies, including one that used a genetic algorithm to optimize the parameters of a COVID-19 SEIR model for US states [20].
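The damped update at the heart of the LM algorithm, which is what allows it to avoid forming the exact Hessian, can be illustrated on a toy least-squares problem; the exponential model, starting point, and damping schedule below are illustrative choices, not the network training setup itself:

```python
import numpy as np

def lm_fit(x, y, p, lam=1e-3, iters=100):
    """Fit y = a * exp(b * x) with a basic Levenberg-Marquardt loop."""
    for _ in range(iters):
        a, b = p
        r = a * np.exp(b * x) - y                     # residuals
        J = np.column_stack([np.exp(b * x),           # df/da
                             a * x * np.exp(b * x)])  # df/db
        A = J.T @ J + lam * np.eye(2)                 # damped normal matrix
        p_new = p - np.linalg.solve(A, J.T @ r)
        a2, b2 = p_new
        if np.sum((a2 * np.exp(b2 * x) - y) ** 2) < np.sum(r ** 2):
            p, lam = p_new, lam * 0.5                 # accept step, trust more
        else:
            lam *= 10.0                               # reject step, damp harder
    return p

x = np.linspace(0.0, 3.0, 8)
y = 2.0 * np.exp(0.3 * x)                             # noise-free target
a_hat, b_hat = lm_fit(x, y, np.array([1.0, 0.1]))
```

The damping parameter `lam` interpolates between gradient descent (large `lam`) and Gauss-Newton (small `lam`), which is the source of the method’s robustness and speed.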

Fig 5 shows the training performance of the ANN model for MPXV over iterations, as measured by MSE. Each line represents one of the four countries, with the MSE values plotted against the number of iterations. The training process of the neural network models is characterized by several distinct phases, as evidenced by the MSE trends for each country. Initially, there is a noticeable spike in MSE for Portugal, indicative of the model’s rapid learning and calibration to correct early inaccuracies. As the iterations progress, the MSE for all countries converges towards lower values, suggesting an improvement in the model’s predictive accuracy on the training dataset. Despite this overall trend, the MSE experiences fluctuations, potentially reflecting the model’s adjustments to diverse patterns within the data. Notably, the MSE lines for Portugal, Spain, Canada, and the United States exhibit comparative stability, with Portugal’s model showing consistently lower MSE values, hinting at better performance on the Portugal data relative to the other countries.

Fig 6 shows the LSTM model’s training performance for MPXV, with MSE used as the evaluation metric. Similar to Fig 5, the convergence of MSE values can be seen. The LSTM model for Portugal demonstrates a unique trend, with a slight increase in MSE at the later iterations. The GRU model’s training progression for MPXV is captured in Fig 7, with MSE again serving as the performance metric. All countries show a rapid decrease in MSE initially, followed by a plateau. Notably, the GRU model for Portugal shows the most consistency in MSE values across iterations. Despite these fluctuations, a convergence towards a stable MSE range is observed for all countries, indicative of effective learning. Figs 5-7 thus showcase the training performance of the ANN, LSTM, and GRU models for MPXV over iterations, as measured by MSE. The detailed dynamics of this training process, including the specific learning curves for each model across the four studied countries, are further elaborated in Figs 8-10, highlighting the reduction in loss over epochs.

Fig 8. Learning curves for ANN models across four different countries: Canada, Portugal, Spain, and the United States.

The training process is represented by the blue line and the validation process by the red line, with the reduction in loss over epochs indicating effective learning.

Fig 9. Learning curves for LSTM models in Canada, Portugal, Spain, and the United States.

Each subplot shows the training loss (blue line) decreasing over epochs, indicative of the model’s learning capacity, while the validation loss (red line) presents fluctuations, reflecting the model’s generalization to new data. Notable is the slight convergence between the two losses, suggesting a balance between learning and model complexity.

Fig 10. Learning curves for GRU models across Canada, Portugal, Spain, and the United States, displaying the evolution of model training and validation losses.

The blue line indicates the training loss, which decreases with epochs, signifying learning, while the red line denotes the validation loss, showing fluctuations that point to the challenges in model generalization. The convergence of training and validation losses is particularly evident for Canada and the United States, suggesting a more effective model fit.

The MSE trend analysis for each country revealed intrinsic differences in data characteristics and model behavior. For instance, the initial spike in MSE for Portugal suggests a phase of rapid learning, where the model aggressively adjusts its parameters to fit the complex data patterns. This phase is critical as it indicates the model’s sensitivity to the initial conditions and learning rate.

Subsequent fluctuations in MSE during the training iterations are indicative of the model’s continual adaptation process. These fluctuations may arise from various factors, such as the inherent noise in the data or the introduction of new patterns that the model attempts to learn. The stability observed in later iterations across all countries suggests that the models reach a point of equilibrium where learning is balanced with the complexity of the data. Moreover, the nuanced differences in MSE trends between the ANN, LSTM, and GRU models point to the distinct ways these architectures process temporal data.

To determine the optimal number of hidden neurons, the standard approach outlined in the above subsection (K-fold Cross Validation) was followed. For each scenario, twelve candidate configurations with varying numbers of hidden neurons were constructed, as detailed in Tables 1-20. The best model for each scenario was selected based on its evaluation using R2, MAPE, and RMSE: lower values of RMSE and MAPE and higher values of R2 indicate better model performance.
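For reference, the three selection metrics can be computed as follows, using the same percentage scaling as in the tables; this is a plain-Python sketch, not the exact evaluation code used in the study:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
                     / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, as a percentage."""
    return 100.0 * sum(abs((t - p) / t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination, as a percentage."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 100.0 * (1.0 - ss_res / ss_tot)
```

For example, with actual values [2, 4, 6] and predictions [3, 4, 5], these functions give an RMSE of about 0.816, a MAPE of about 22.2%, and an R2 of 75%.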

Table 1. Identification of the most suitable ANN configuration with a single hidden layer for the Canada dataset.

Table 2. Determination of the best ANN configuration with two hidden layers for the Canada dataset.

Table 3. Identification of the most suitable LSTM configuration with a single hidden layer for the Canada dataset.

Table 4. Determination of the best LSTM configuration with two hidden layers for the Canada dataset.

Table 5. Identification of the most suitable GRU configuration with a single hidden layer for the Canada dataset.

Table 6. Determination of the best GRU configuration with two hidden layers for the Canada dataset.

Table 7. Identification of the most suitable ANN configuration with a single hidden layer for the Portugal dataset.

Table 8. Determination of the best ANN configuration with two hidden layers for the Portugal dataset.

Table 9. Identification of the most suitable LSTM configuration with a single hidden layer for the Portugal dataset.

Table 10. Determination of the best LSTM configuration with two hidden layers for the Portugal dataset.

Table 11. Identification of the most suitable GRU configuration with a single hidden layer for the Portugal dataset.

Table 12. Determination of the best GRU configuration with two hidden layers for the Portugal dataset.

Table 13. Identification of the most suitable ANN configuration with a single hidden layer for the Spain dataset.

Table 14. Determination of the best ANN configuration with two hidden layers for the Spain dataset.

Table 15. Identification of the most suitable LSTM configuration with a single hidden layer for the Spain dataset.

Table 16. Determination of the best LSTM configuration with two hidden layers for the Spain dataset.

Table 17. Identification of the most suitable GRU configuration with a single hidden layer for the Spain dataset.

Table 18. Determination of the best GRU configuration with two hidden layers for the Spain dataset.

Table 19. Identification of the most suitable ANN configuration with a single hidden layer for the USA dataset.

Table 20. Determination of the best ANN configuration with two hidden layers for the USA dataset.

Tables 1-24 present the performance metrics of neural network models trained on data from these countries. Each table contains 11 columns:

  • Sl No: Serial number (row index) within the table.
  • Neurons: The number of neurons in the neural network’s hidden layer.
  • RMSE (Train): RMSE of the model on the training dataset, scaled by 1000.
  • R2 (Train): Coefficient of determination on the training dataset, expressed as a percentage.
  • MAPE (Train): MAPE on the training dataset, expressed as a percentage.
  • RMSE (Validation): RMSE of the model on the validation dataset, scaled by 1000.
  • R2 (Validation): Coefficient of determination on the validation dataset, expressed as a percentage.
  • MAPE (Validation): MAPE on the validation dataset, expressed as a percentage.
  • RMSE (Test): RMSE of the model on the test dataset, scaled by 1000.
  • R2 (Test): Coefficient of determination on the test dataset, expressed as a percentage.
  • MAPE (Test): MAPE on the test dataset, expressed as a percentage.

Table 21. Identification of the most suitable LSTM configuration with a single hidden layer for the USA dataset.

Table 22. Determination of the best LSTM configuration with two hidden layers for the USA dataset.

Table 23. Identification of the most suitable GRU configuration with a single hidden layer for the USA dataset.

Table 24. Determination of the best GRU configuration with two hidden layers for the USA dataset.

RMSE, MAPE, and R2 are key metrics for evaluating regression model performance. The tables for each country’s dataset cover ANN, LSTM, and GRU models with single and two hidden layers, showcasing the impact of neurons and layers on predictive accuracy and generalization. This comparative analysis aids in selecting the optimal neural network configuration for each dataset.

Each of Tables 1-24 is dedicated to a specific type of neural network model (ANN, LSTM, or GRU) and considers variations in the number of hidden layers and neurons. The performance of each model configuration is evaluated on the training, validation, and test datasets using three metrics: RMSE, R2, and MAPE. For each country’s dataset, there are tables corresponding to ANN, LSTM, and GRU models with single and two hidden layers. The tables are designed to help select the optimal configuration for each type of neural network based on performance across the different datasets, and this detailed comparison clarifies how the number of neurons and hidden layers impacts a model’s predictive accuracy and generalization capabilities for specific datasets. Where relevant, values are displayed as percentages, which is especially useful for metrics like R2 and MAPE and other ratio-based figures. Furthermore, to avoid an abundance of decimal places and to improve clarity, the RMSE values have been scaled up by a factor of 1000.

Fig 8 illustrates the learning curves for ANN models across four distinct countries: Canada, Portugal, Spain, and the United States. The blue lines, representing the training loss, generally exhibit a downward trend, suggesting a steady improvement in the models’ ability to fit the training data over the epochs. The red lines, indicating the validation loss, fluctuate and do not show a clear decreasing trend. Nevertheless, all four models show a close convergence between training and validation loss, which could imply a more robust generalization capability.

Fig 9 presents the learning curves for LSTM models across the same four countries. The training loss, depicted by the blue line, indicates a trend of learning and improvement across epochs for all countries. However, the validation loss, depicted by the red line, exhibits fluctuations that are more pronounced for Portugal and Spain, suggesting challenges in model generalization and potential overfitting. For Canada and the United States, the gap between training and validation loss is relatively smaller, indicating better generalization performance.

Fig 10 presents the learning curves for GRU models across the four countries. Each model’s training process, represented by the blue line, shows a reduction in loss over epochs, indicating effective learning. Notably, the Canadian and United States models demonstrate a pronounced decrease in training loss, whereas the validation loss for Portugal remains notably stable, suggesting consistent model performance. The Spanish model’s validation loss exhibits more variability, potentially highlighting challenges in generalization. No apparent signs of overfitting are observed within the range of epochs presented, as the validation losses do not trend upwards. Overall, the models demonstrate their potential to fit the training data well while maintaining reasonable generalization to the validation data.


The evaluation of neural network models for the different datasets, as summarized in Table 25, reveals insightful trends and performance benchmarks.

Table 25. Comprehensive performance of best models across datasets.

For the Canada dataset, the ANN model with a single hidden layer and 8 neurons and the ANN model with two hidden layers and 3 neurons show commendable performance, particularly in achieving high R2 percentages and low RMSE values. The LSTM and GRU models, both single and double-layered, also exhibit competitive performance, with the GRU single-layer model having 1 neuron demonstrating particular effectiveness in generalization across the validation and test datasets.

In the context of the Portugal dataset, the ANN single-layer model with 7 neurons stands out, especially in training performance. For the double-layer models, all three types of neural networks with 1 neuron each exhibit impressive R2 percentages, particularly in the validation and test phases, indicating strong predictive accuracy.

The Spain dataset shows a similar pattern where the ANN single-layer model with 8 neurons excels in both training and testing phases. In the two hidden layers scenario, the ANN model with 11 neurons and the LSTM model with 5 neurons are noteworthy for their high R2 values and low RMSE scores, suggesting a robust model performance.

For the USA dataset, the single-layer ANN model with 5 neurons and the double-layer ANN model with 12 neurons show superior performance, particularly in terms of R2 and RMSE metrics. This indicates their effectiveness in capturing the underlying patterns in the dataset with a balance of complexity and generalization ability.

These results underscore the importance of choosing the right architecture and neuron count in neural network models for different datasets, highlighting the effectiveness of certain configurations in optimizing predictive performance.

Forecasting methodology

In our study, we conducted a detailed forecasting analysis for Canada, Portugal, and the USA using different neural network architectures. The goal was to predict the number of MPXV cases one month ahead, based on the actual reported cases. The accuracy of these forecasts was quantified using the MAPE.

For Canada, with 43 actual cases, our models demonstrated varying levels of accuracy. ANN with a single hidden layer predicted 42 cases with a MAPE of 2.3%, showcasing its high precision (Table 26). In comparison, when employing two hidden layers, the ANN model maintained the same MAPE, predicting 42 cases (Table 26).

In Portugal, with 53 actual cases, our ANN models achieved notable accuracy. The single-layer ANN model estimated 54 cases with a MAPE of 1.9%, while the two-layer ANN model achieved perfect accuracy with a MAPE of 0.0%, predicting 53 cases. For a scenario with 51 actual cases, the two-layer ANN model showed a slight increase in MAPE to 2.0%, estimating 52 cases.

The forecasting results for the USA, with 47 actual cases, further highlighted the effectiveness of the ANN models. The single-layer ANN model estimated 50 cases with a MAPE of 6.4%, whereas the two-layer model predicted 48 cases with a reduced MAPE of 2.1%.

Across all countries, the ANN models consistently outperformed LSTM and GRU models in terms of accuracy, as reflected in their lower MAPE values. This suggests that ANN architectures, particularly with two hidden layers, are more adept at capturing the trends and nuances in the data, leading to more accurate forecasts for MPXV cases.
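The country-level MAPE values quoted above follow directly from the reported actual and predicted case counts, as a quick check confirms (the dictionary labels are shorthand for the configurations discussed in the text):

```python
def mape_pct(actual, predicted):
    """Absolute percentage error for a single one-month-ahead forecast."""
    return round(100.0 * abs(actual - predicted) / actual, 1)

results = {
    "Canada, ANN (1 layer)":   mape_pct(43, 42),  # 2.3%
    "Portugal, ANN (1 layer)": mape_pct(53, 54),  # 1.9%
    "USA, ANN (1 layer)":      mape_pct(47, 50),  # 6.4%
    "USA, ANN (2 layers)":     mape_pct(47, 48),  # 2.1%
}
```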

Discussion: Benefits of the results in the broader perspective of public health

The findings of this study have significant implications for the practical application in public health management, particularly in the context of infectious disease outbreaks like Monkeypox. The predictive models developed can be integrated into health surveillance systems, aiding healthcare authorities in early detection and response planning. This proactive approach is crucial for effective disease management, enabling timely interventions such as targeted vaccinations and public health advisories.

Moreover, the methodology and results can be adapted for forecasting other infectious diseases, demonstrating the versatility of the approach. This adaptability is particularly beneficial for regions where healthcare resources are limited, as it allows for strategic allocation of resources based on predicted outbreak patterns. Such data-driven strategies can optimize the use of medical supplies, personnel, and facilities, enhancing the overall efficiency of healthcare systems.

In addition, the study’s approach can be instrumental in guiding policy decisions, such as travel advisories or quarantine measures, by providing accurate forecasts of disease spread. This is especially relevant in the context of global health, where the mobility of populations can significantly impact the dynamics of infectious diseases.

Furthermore, the potential for collaboration with industries involved in healthcare technology cannot be overlooked. The integration of advanced neural network models into health tech solutions can pave the way for more sophisticated disease tracking and prediction tools, contributing to the larger goal of global health security.


Conclusions

This study presented a comprehensive analysis of three different neural network models (ANN, LSTM, and GRU) for predicting the spread of MPXV in the USA, Canada, Spain, and Portugal. Our findings demonstrated that while each model has its strengths, certain models outperformed others in specific scenarios.

For instance, the ANN model exhibited superior performance, achieving lower RMSE and higher R2 values than the other models, particularly in predicting short-term trends. The LSTM and GRU models also achieved strong predictive accuracy; although the ANN model outperformed them overall, both recurrent architectures still provided valuable insights.

Quantitatively, the ANN model achieved a lower average RMSE and a higher R2 than both the LSTM and GRU models when predicting cases over a one-month horizon. These results highlight the potential of utilizing advanced machine learning techniques in epidemiological forecasting.

The study’s methodology, while robust, has certain limitations. The accuracy of the neural network models, including LSTM and GRU, hinges on the quality and completeness of the epidemiological data, which may contain gaps or inaccuracies. The complexity of these models can also lead to overfitting, limiting generalizability to new data or scenarios. Moreover, the models’ predictions are based on past data and may not account for future changes in virus behavior, public health policies, or other unforeseen factors.

To address the limitation of machine learning models’ inability to extrapolate beyond the conditions of the study, one solution is to incorporate a diverse and comprehensive dataset that covers a wide range of scenarios. This can help the model learn various patterns and improve its generalizability. Additionally, employing techniques like transfer learning, where a model trained on one task is fine-tuned for another related task, can help in adapting the model to new conditions. Regular updating and retraining of the model with new data as it becomes available can also ensure the model remains relevant and accurate over time. Furthermore, combining machine learning models with domain-specific knowledge and expert insights can enhance the model’s applicability to new conditions.

The methods utilized in this study, specifically ANN, LSTM, and GRU, are not only theoretically robust but also practically applicable in scientific research. Their adaptability to analyze complex data patterns makes them invaluable tools in epidemiological studies, such as forecasting infectious disease spread. These models can handle large-scale data efficiently, identifying underlying trends and making accurate predictions. This capability is crucial for public health officials and researchers in planning interventions and making informed decisions based on predictive analytics.


  1. Gessain A, Nakoune E, Yazdanpanah Y. Monkeypox. New England Journal of Medicine. 2022;387(19):1783–1793. pmid:36286263
  2. Leggiadro RJ. Emergence of Monkeypox—West and Central Africa, 1970–2017. The Pediatric Infectious Disease Journal. 2018;37(7):721.
  3. Formenty P, Muntasir MO, Damon I, Chowdhary V, Opoka ML, Monimart C, et al. Human monkeypox outbreak caused by novel virus belonging to Congo Basin clade, Sudan, 2005. Emerging Infectious Diseases. 2010;16(10):1539. pmid:20875278
  4. Parker S, Nuara A, Buller RML, Schultz DA. Human monkeypox: an emerging zoonotic disease. Future Medicine. 2007. pmid:17661673
  5. Thornhill JP, Palich R, Ghosn J, Walmsley S, Moschese D, Cortes CP, et al. Human monkeypox virus infection in women and non-binary individuals during the 2022 outbreaks: a global case series. The Lancet. 2022;400(10367):1953–1965. pmid:36403584
  6. Centers for Disease Control and Prevention (CDC). Multistate outbreak of monkeypox–Illinois, Indiana, and Wisconsin, 2003. MMWR Morbidity and Mortality Weekly Report. 2003;52(23):537–540.
  7. Centers for Disease Control and Prevention (CDC). Update: multistate outbreak of monkeypox–Illinois, Indiana, Kansas, Missouri, Ohio, and Wisconsin, 2003. MMWR Morbidity and Mortality Weekly Report. 2003;52(24):561–564.
  8. Moore MJ, Rathish B, Zahra F. Monkeypox. StatPearls [Internet]. StatPearls Publishing. 2022.
  9. Manohar B, Das R. Artificial neural networks for the prediction of monkeypox outbreak. Tropical Medicine and Infectious Disease. 2022;7(12):424. pmid:36548679
  10. Khan MI, Qureshi H, Bae SJ, Awan UA, Saadia Z, Khattak AA. Predicting Monkeypox incidence: Fear is not over! Journal of Infection. 2023;86(3):256–308. pmid:36577479
  11. Tamang SK, Singh PD, Datta B. Forecasting of Covid-19 cases based on prediction using artificial neural network curve fitting technique. Global Journal of Environmental Science and Management. 2020;6(Special Issue (Covid-19)):53–64.
  12. Ahsan MM, Uddin MR, Farjana M, Sakib AN, Momin KA, Luna SA. Image Data Collection and Implementation of Deep Learning-Based Model in Detecting Monkeypox Disease Using Modified VGG16. arXiv preprint arXiv:2206.01862. 2022.
  13. Saba AI, Elsheikh AH. Forecasting the prevalence of COVID-19 outbreak in Egypt using nonlinear autoregressive artificial neural networks. Process Safety and Environmental Protection. 2020;141:1–8. pmid:32501368
  14. Hamadneh NN, Khan WA, Ashraf W, Atawneh SH, Khan I, Hamadneh BN. Artificial neural networks for prediction of Covid-19 in Saudi Arabia. Computational Materials Science. 2021;66:2787–2796.
  15. Wang L, Wang Z, Qu H, Liu S. Optimal forecast combination based on neural networks for time series forecasting. Applied Soft Computing. 2018;66:1–17.
  16. Ahmad I, Asad SM. Predictions of coronavirus COVID-19 distinct cases in Pakistan through an artificial neural network. Epidemiology & Infection. 2020;148:e222. pmid:32951626
  17. Saritas I. Prediction of breast cancer using artificial neural networks. Journal of Medical Systems. 2012;36:2901–2907. pmid:21837454
  18. Silitonga P, Bustamam A, Muradi H, Mangunwardoyo W, Dewi BE. Comparison of Dengue Predictive Models Developed Using Artificial Neural Network and Discriminant Analysis with Small Dataset. Applied Sciences. 2021;11(2):943.
  19. Laureano-Rosario AE, Duncan AP, Mendez-Lazaro PA, Garcia-Rejon JE, Gomez-Carro S, Farfan-Ale J, et al. Application of Artificial Neural Networks for Dengue Fever Outbreak Predictions in the Northwest Coast of Yucatan, Mexico and San Juan, Puerto Rico. Tropical Medicine and Infectious Disease. 2018;3(1):5. pmid:30274404
  20. Yarsky P. Using a genetic algorithm to fit parameters of a COVID-19 SEIR model for US states. Mathematics and Computers in Simulation. 2021;185:687–695. pmid:33612959
  21. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. 2014.
  22. Levenberg K. A method for the solution of certain non-linear problems in least squares. Quarterly of Applied Mathematics. 1944;2(2):164–168.
  23. Roser M, Ortiz-Ospina E, Ritchie H. Our World in Data. University of Oxford. 2013. Available from:
  24. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–536.
  25. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. 2014.
  26. Cho K, Van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259. 2014.
  27. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;9(8):1735–1780. pmid:9377276
  28. Geirhos R, Janssen DHJ, Schütt HH, Rauber J, Bethge M, Wichmann FA. Comparing deep neural networks against humans: object recognition when the signal gets weaker. arXiv preprint arXiv:1706.06969. 2017.
  29. Rashidi M, Ali M, Freidoonimehr N, Nazari F. Parametric analysis and optimization of entropy generation in unsteady MHD flow over a stretching rotating disk using artificial neural network and particle swarm optimization algorithm. Energy. 2013;55:497–510.
  30. Asadi S, Shahrabi J, Abbaszadeh P, Tabanmehr S. A new hybrid artificial neural networks for rainfall–runoff process modeling. Neurocomputing. 2013;121:470–480.
  31. Aichouri I, Hani A, Bougherira N, Djabri L, Chaffai H, Lallahem S. River Flow Model Using Artificial Neural Networks. Energy Procedia. 2015;74:1007–1014.
  32. Bishop CM. Neural Networks for Pattern Recognition. Oxford University Press. 1995.
  33. Goodfellow I, Bengio Y, Courville A. Deep Learning. MIT Press. 2016.
  34. McCulloch WS, Pitts W. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics. 1943;5(4):115–133.
  35. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review. 1958;65(6):386. pmid:13602029
  36. Hebb DO. The Organization of Behavior: A Neuropsychological Theory. Wiley. 1949.
  37. Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences. 1982;79(8):2554–2558. pmid:6953413
  38. Bui XN, Jaroonpattanapong P, Nguyen H, Tran QH, Long NQ. A novel hybrid model for predicting blast-induced ground vibration based on k-nearest neighbors and particle swarm optimization. Scientific Reports. 2019;9(1):13971. pmid:31562369
  39. Liu W, Bao C, Zhou Y, Ji H, Wu Y, Shi Y, et al. Forecasting incidence of hand, foot and mouth disease using BP neural networks in Jiangsu province, China. BMC Infectious Diseases. 2019;19(1):1–9. pmid:31590636
  40. Ridha HM, Hizam H, Mirjalili S, Othman ML, Ya'acob ME, Ahmadipour M, et al. On the problem formulation for parameter extraction of the photovoltaic model: Novel integration of hybrid evolutionary algorithm and Levenberg Marquardt based on adaptive damping parameter formula. Energy Conversion and Management. 2022;256:115403.
  41. Fletcher R. Practical Methods of Optimization. John Wiley & Sons. 2013.
  42. Saini LM, Soni MK. Artificial neural network based peak load forecasting using Levenberg–Marquardt and quasi-Newton methods. IEE Proceedings-Generation, Transmission and Distribution. 2002;149(5):578–584.
  43. Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. 2014.
  44. Salehi K, Lestari RAS. Predicting the performance of a desulfurizing bio-filter using an artificial neural network (ANN) model. Environmental Engineering Research. 2021;26(2):200462.
  45. Moazeni SS, Shaibani MJ, Emamgholipour S. Investigation of Robustness of Hybrid Artificial Neural Network with Artificial Bee Colony and Firefly Algorithm in Predicting COVID-19 New Cases: Case Study of Iran. Stochastic Environmental Research and Risk Assessment. 2022;36(6):2461–2476. pmid:34608374
  46. Seraj A, Mohammadi-Khanaposhtani M, Daneshfar R, Naseri M, Esmaeili M, Baghban A, et al. Cross-validation. In: Handbook of Hydroinformatics. Elsevier. 2023. p. 89–105.
  47. Aliyu AM, Choudhury R, Sohani B, Atanbori J, Ribeiro JXF, Ahmed SKB, et al. An artificial neural network model for the prediction of entrained droplet fraction in annular gas-liquid two-phase flow in vertical pipes. International Journal of Multiphase Flow. 2023;164:104452.
  48. Tabasi S, Soltani PT, Rajabi M, Wood DA, Davoodi S, Ghorbani H, et al. Optimized machine learning models for natural fractures prediction using conventional well logs. Fuel. 2022;326:124952.
  49. Choubineh A, Ghorbani H, Wood DA, Moosavi SR, Khalafi E, Sadatshojaei E. Improved predictions of wellhead choke liquid critical-flow rates: modelling based on hybrid neural network training learning based optimization. Fuel. 2017;207:547–560.
  50. Tušek AJ, Jurina T, Čulo I, Valinger D, Kljusurić JG, Benković M. Application of NIRs coupled with PLS and ANN modelling to predict average droplet size in oil-in-water emulsions prepared with different microfluidic devices. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy. 2022;270:120860.
  51. Jafarizadeh F, Rajabi M, Tabasi S, Seyedkamali R, Davoodi S, Ghorbani H, et al. Data driven models to predict pore pressure using drilling and petrophysical data. Energy Reports. 2022;8:6551–6562.