Fig 1.
Machine learning for model-free prognostication.
The time series of normalized new infections in 4 countries were used as input to a machine learning algorithm based on autocorrelation. The black and red lines represent the true and predicted numbers of new cases, respectively. All values are plotted on a log scale with an offset of 0.05. The training and testing sets comprise approximately 80% and 20% of each time series, respectively. The tables underneath the graphs display three error metrics: MAE = mean absolute error, RMSE = root mean square error, MAPE = mean absolute percentage error. Top left) Prediction of peaks for the Germany data (n = 819), with a training set (n = 634) and a testing set (n = 185). Top right) Prediction of peaks for the Great Britain data (n = 814), with a training set (n = 592) and a testing set (n = 222). Bottom left) Prediction of peaks for the Australia data (n = 819), with a training set (n = 689) and a testing set (n = 130). Bottom right) Prediction of peaks for the USA data (n = 802), with a training set (n = 635) and a testing set (n = 167).
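The caption reports MAE, RMSE, and MAPE on a train/test split of each series. The following is a minimal sketch, not the authors' code. It assumes the split preserves temporal order and reads "log plus 0.05" as log(x + 0.05); both are assumptions. All function names are hypothetical, and the fixed 80% cut is only illustrative, because the set sizes listed above vary around that fraction.

```python
import math

def train_test_split_chronological(series, train_fraction=0.8):
    """Split a series in temporal order (no shuffling): first part trains, rest tests."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]

def log_offset(values, offset=0.05):
    """Assumed reading of 'log plus 0.05': log(x + offset), which keeps zero counts finite."""
    return [math.log(v + offset) for v in values]

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error; penalizes large misses (e.g. a missed peak) more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; undefined if any true value is 0."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For example, with true values [2.0, 4.0, 5.0] and predictions [1.0, 4.0, 7.0], MAE is 1.0, RMSE is sqrt(5/3) ≈ 1.29, and MAPE is 30%. Because MAPE divides by the true values, it depends strongly on the transform applied before scoring.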
Fig 2.
Validation of the machine learning for model-free prognostication.
The figure compares the forecasting performance of the 5 methods. The MCML method performed best of all forecasting methods, yielding the lowest MAE, MAPE, and RMSE in each case. Top left) Time-series forecasts from our method, maximum correlation machine learning (MCML), compared with the autoregressive integrated moving average (ARIMA); trigonometric seasonality, Box-Cox transformation, ARMA errors, trend, and seasonal components (TBATS); threshold autoregressive (TAR); and Prophet forecasting (PROPHET) models for the Germany dataset (n = 819), with a training set (n = 634) and a testing set (n = 185). Bottom left) Comparison of our method (MCML) with the conventional time series methods for the Germany dataset. Top right) The same comparison of MCML with ARIMA, TBATS, TAR, and PROPHET for the Australia dataset (n = 819), with a training set (n = 689) and a testing set (n = 130). Bottom right) Comparison of our method (MCML) with the conventional time series methods for the Australia dataset.
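The comparison follows a common protocol. Each method is fitted on the training set, forecasts the whole test horizon, and is scored on the held-out values. The sketch below shows that protocol with two simple stand-in baselines: a least-squares AR(p) model, which is a restricted special case of ARIMA, and a naive persistence forecast. These are not the ARIMA, TBATS, TAR, or Prophet implementations used for the figure, and the function names are hypothetical.

```python
import numpy as np

def fit_ar(train, p):
    """Least-squares AR(p) fit with intercept: y_t = a_1 y_{t-1} + ... + a_p y_{t-p} + c."""
    y = np.asarray(train, dtype=float)
    n = len(y)
    # Column i holds lag (i + 1) for the targets y[p:], i.e. y[p-i-1 : n-i-1].
    lags = [y[p - i - 1:n - i - 1] for i in range(p)]
    X = np.column_stack(lags + [np.ones(n - p)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [a_1, ..., a_p, c]

def forecast_ar(coef, history, horizon):
    """Recursive multi-step forecast: each prediction is fed back as an input."""
    p = len(coef) - 1
    buf = [float(v) for v in history[-p:]]
    preds = []
    for _ in range(horizon):
        recent = buf[::-1][:p]  # most recent value first, matching fit_ar's column order
        yhat = float(np.dot(coef[:p], recent) + coef[p])
        preds.append(yhat)
        buf.append(yhat)
    return np.array(preds)

def forecast_naive(history, horizon):
    """Persistence baseline: repeat the last observed training value."""
    return np.full(horizon, float(history[-1]))

def compare_mae(train, test, p=2):
    """Test-set MAE for each baseline; lower is better."""
    test = np.asarray(test, dtype=float)
    forecasts = {
        f"AR({p})": forecast_ar(fit_ar(train, p), train, len(test)),
        "naive": forecast_naive(train, len(test)),
    }
    return {name: float(np.mean(np.abs(test - f))) for name, f in forecasts.items()}
```

Any further method, such as an MCML implementation, would be one more entry in the `forecasts` dictionary. Every method is scored on exactly the same test values.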
Fig 3.
Analyses of time series from various sources.
Non-linear time series data were obtained from financial markets and from entomological observations. Top and middle) Stock market fluctuations are classical non-linear time series. We retrieved daily data for “INDEX_US_DOW JONES GLOBAL_DJIA” (n = 1521) and applied machine learning with a training set (n = 1400) and a testing set (n = 121); the black and red lines represent the true and predicted values, respectively. The y-axis values are unitless. Top) Prediction of peaks for “Open”. Middle) Prediction of peaks for “Close”. Bottom) The Nicholson blowfly experiments were conducted in the 1950s to learn more about a sheep pest, the blowfly [31,32]. The data describe a system that is nonlinear, has time lags, and might be described as non-stationary. Prediction for “eggs” in the data set “blowfly97I” (n = 361) using machine learning with a training set (n = 289) and a testing set (n = 72). Table) The table underneath the graphs displays three error metrics: MAE = mean absolute error, RMSE = root mean square error, MAPE = mean absolute percentage error.
Fig 4.
Feature-space plots for various sliding window durations and time lags.
For the South Africa data, the three readouts, autocorrelation (ac), average mutual information (ami), and normalized new cases (nc), are plotted pairwise against each other. A) Increasing sliding window durations at each of 3 time lags. B) Increasing time lags at each of 3 sliding window durations.
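Each point in these feature-space plots combines three readouts computed over one sliding window at one time lag. The sketch below illustrates that idea and rests on several assumptions. ac is taken as the Pearson autocorrelation at the given lag. ami is taken as a histogram-based mutual-information estimate between x[t] and x[t + lag], with 8 equal-width bins chosen arbitrarily. nc is taken as the last value in the window divided by the series maximum. The captions do not specify these estimators, and the function names are hypothetical.

```python
import numpy as np

def autocorrelation(x, lag):
    """Pearson correlation between x[t] and x[t + lag] inside one window (nan if constant)."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-lag], x[lag:])[0, 1])

def average_mutual_information(x, lag, bins=8):
    """Histogram estimate of the mutual information between x[t] and x[t + lag], in nats."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of x[t], shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of x[t + lag], shape (1, bins)
    nz = pxy > 0  # empty cells contribute 0 to the sum
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def sliding_features(new_cases, window=150, lag=15):
    """One (ac, ami, nc) point per window position; nc = window's last value / series max."""
    nc = np.asarray(new_cases, dtype=float)
    nc_norm = nc / nc.max()  # assumes at least one positive value
    rows = []
    for start in range(len(nc) - window + 1):
        w = nc[start:start + window]
        rows.append((autocorrelation(w, lag),
                     average_mutual_information(w, lag),
                     nc_norm[start + window - 1]))
    return np.array(rows)
```

Plotting pairs of columns from `sliding_features` (ac against ami, ac against nc, ami against nc) gives the three pairwise panels. Varying `window` or `lag` corresponds to panels A and B.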
Fig 5.
Feature-space plots and local Lyapunov exponents.
A,B) Analysis of the data for South Africa. The left panels display the feature-space plots for the raw data. In the middle panels, ac + 1 and ami were scaled up by multiplication with the half-maximal value of new cases within the sliding window. In the right panels, the new-case values were divided by their maximum over the observation window. A) Feature-space plots. Shown are the pairwise feature-space plots for average mutual information (ami), autocorrelation (ac), and new cases (7-day moving average per million inhabitants) (nc), using string lengths of 150 days and time lags of 15 days. B) Lyapunov exponents over time. The upper panel shows the individual Lyapunov characteristic exponents for each pair of readouts (i.e., pairwise among nc = new cases, ac = autocorrelation, ami = average mutual information), compared with the suitably scaled new cases over time. The red trace indicates the normalized new cases per day. The bottom panel displays the maximum Lyapunov exponent (MLE) over time (blue line) compared with the normalized new cases per day (red line).
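The captions do not state how the Lyapunov exponents were estimated from the feature-space trajectories. The sketch below is a worked illustration of what such an exponent measures and is not the authors' method. It uses the logistic map f(x) = r x (1 − x), where the exponent is the long-run average of ln|f′(x_t)|. Averaging over a sliding window gives finite-time ("local") exponents over time, loosely analogous to the exponents-over-time panels. The function names are hypothetical.

```python
import math
from itertools import accumulate

def logistic_orbit(r, x0, n, burn_in=1000):
    """Iterate x_{t+1} = r * x_t * (1 - x_t); drop burn_in transient steps, keep n points."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        orbit.append(x)
        x = r * x * (1.0 - x)
    return orbit

def log_derivatives(r, orbit):
    """ln|f'(x_t)| with f'(x) = r * (1 - 2x); fails only if some x_t is exactly 0.5."""
    return [math.log(abs(r * (1.0 - 2.0 * x))) for x in orbit]

def global_lyapunov(r, orbit):
    """Long-run average of ln|f'(x_t)| along the orbit."""
    logs = log_derivatives(r, orbit)
    return sum(logs) / len(logs)

def local_lyapunov(r, orbit, window=150):
    """Finite-time exponents: the same average over each sliding window (via prefix sums)."""
    prefix = [0.0] + list(accumulate(log_derivatives(r, orbit)))
    return [(prefix[s + window] - prefix[s]) / window
            for s in range(len(orbit) - window + 1)]
```

A negative exponent means that nearby trajectories converge. At r = 3.2, the stable 2-cycle gives exactly ½ ln 0.16 ≈ −0.92. A positive exponent indicates sensitive dependence on initial conditions. At r = 4, the exact value is ln 2.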
Fig 6.
Return plot scaling and local Lyapunov exponents.
A) Lyapunov exponents over time in several countries. For this analysis, ac + 1 and ami were scaled up by multiplication with the half-maximal value of new cases within the sliding window. String lengths are 150 days and time lags are 15 days. MLE = maximum Lyapunov exponent. B) Scaling of feature-space plots and maximum Lyapunov exponents. All univariate wavelet analyses are based on sliding window durations of 150 days and time lags of 15 days. Each row represents a country. The left column displays unscaled data, the middle column shows the scaling up of ac and ami (as in A), and the right column adds an arbitrary scale adjustment. This adjustment, applied after the preceding scaling steps, is ami × 2 for Egypt, nc × 0.75 for South Africa, ami × 1.7 for Germany, ami × 2 for Sweden, and ac × 2.2 for Brazil.
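The two scaling steps can be written as a short sketch. It assumes that "the half-maximal value of new cases within the sliding window" means 0.5 × the maximum of nc in each window; that reading is an assumption. The per-country multipliers are copied from the caption. The function names and the dictionary layout are hypothetical.

```python
import numpy as np

# Country-specific multipliers from the caption, applied after the half-maximum scaling.
EXTRA_ADJUSTMENT = {
    "Egypt": ("ami", 2.0),
    "South Africa": ("nc", 0.75),
    "Germany": ("ami", 1.7),
    "Sweden": ("ami", 2.0),
    "Brazil": ("ac", 2.2),
}

def window_maxima(nc, window=150):
    """Maximum of new cases within each sliding window position."""
    nc = np.asarray(nc, dtype=float)
    return np.array([nc[s:s + window].max() for s in range(len(nc) - window + 1)])

def scale_readouts(ac, ami, nc, nc_window_max, country=None):
    """Return scaled (ac, ami, nc), one value per window; country=None skips step 2."""
    half_max = 0.5 * np.asarray(nc_window_max, dtype=float)
    scaled = {
        # Step 1: ac + 1 shifts the range [-1, 1] to [0, 2] before scaling.
        "ac": (np.asarray(ac, dtype=float) + 1.0) * half_max,
        "ami": np.asarray(ami, dtype=float) * half_max,
        "nc": np.asarray(nc, dtype=float),
    }
    # Step 2: the arbitrary per-country adjustment (right column).
    if country in EXTRA_ADJUSTMENT:
        key, factor = EXTRA_ADJUSTMENT[country]
        scaled[key] = scaled[key] * factor
    return scaled["ac"], scaled["ami"], scaled["nc"]
```

Shifting ac by 1 before scaling keeps every scaled value non-negative. This puts ac on the same footing as ami and nc.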
Fig 7.
Analysis of complex time series.
Conceptualization of alternative approaches to the study of non-linear data (the graph represents a time series measured in any applicable units). The top row (blue text) represents the model-building approach, which makes assumptions, expresses them as sets of differential equations, and then tests their fit to existing measured data. The bottom row (green text) depicts the analysis of noisy complex data, which does not depend on model building but instead transforms the data to gain insight. Strategies for cross-validation could improve the ability to make predictions (red text).
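One common cross-validation scheme for time series is rolling-origin (expanding-window) evaluation. This is offered as an example of the kind of strategy meant, not necessarily the one the authors have in mind. The training window grows step by step, and the model is always tested on the next block of unseen values, so no future data leak into training. The sketch below uses hypothetical names.

```python
def rolling_origin_splits(n, initial, horizon, step=None):
    """Yield (train_indices, test_indices) pairs for expanding-window time-series CV."""
    step = horizon if step is None else step
    end = initial  # training covers indices [0, end)
    while end + horizon <= n:
        yield range(0, end), range(end, end + horizon)
        end += step  # move the forecast origin forward
```

For a series of 10 points with `initial=6` and `horizon=2`, this yields two folds. The first trains on indices 0 to 5 and tests on 6 and 7. The second trains on 0 to 7 and tests on 8 and 9. Averaging the error metrics over folds gives a steadier estimate than a single 80/20 split.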