The evaluation of COVID-19 prediction precision with a Lyapunov-like exponent

In the field of machine learning, building models and measuring their performance are two equally important tasks. Currently, measures of the precision of regression models’ predictions are usually based on the notion of mean error, where by error we mean the deviation of a prediction from an observation. However, these mean-based measures of models’ performance have two drawbacks. Firstly, they ignore the length of the prediction, which is crucial when dealing with chaotic systems, where a small deviation at the beginning grows exponentially with time. Secondly, these measures are not suitable in situations where a prediction is made for a specific point in time (e.g. a date), since they average all errors from the start of the prediction to its end. Therefore, the aim of this paper is to propose a new measure of models’ prediction precision, a divergence exponent, based on the notion of the Lyapunov exponent, which overcomes the aforementioned drawbacks. The proposed approach enables the measurement and comparison of models’ prediction precision for time series of unequal length and with a given target date in the framework of chaotic phenomena. The application of the divergence exponent to the evaluation of models’ accuracy is demonstrated by two examples, and then a set of selected predictions of the COVID-19 spread from other studies is evaluated to show its potential.


Introduction
Making (successful) predictions certainly belongs among the earliest intellectual feats of modern humans. They had to predict the numbers and movements of wild animals, the places to gather fruits, herbs, or fresh water, and so on. Later, predictions of the flooding of the Nile or of solar eclipses were made by early scientists of ancient civilizations, such as Egypt or Greece. The latter civilization gave birth to determinism, a philosophical view that all events in the future could be fully determined if we had knowledge of the current state of all matter and of all laws governing that matter [1].
However, at the end of the 19th century, the French mathematicians Henri Poincaré and Jacques Hadamard discovered the first chaotic systems and found that they are highly sensitive to initial conditions. Small differences in initial conditions (due to errors in measurements or rounding errors) in such systems lead to widely diverging outcomes, rendering (precise) long-term predictions impossible in general [2]. Chaotic behavior can be observed in fluid flow, weather and climate, road and Internet traffic, stock markets, population dynamics, or a pandemic. Since absolutely precise predictions (not only of chaotic systems) are practically impossible, a prediction is always burdened by an error. The smaller this error, the more valuable and helpful the prediction, while bad predictions are not only useless, but can even be harmful [3].
The precision of a regression model prediction is usually evaluated in terms of explained variance (EV), coefficient of determination ($R^2$), mean squared error (MSE), root mean squared error (RMSE), magnitude of relative error (MRE), mean magnitude of relative error (MMRE), the mean absolute percentage error (MAPE), etc., see e.g. [4,5]. These measures are well established both in the literature and in research practice; however, they also have their limitations. The first limitation emerges in situations when a prediction of a future development has a date of interest (a target date, target time). In this case, the aforementioned mean measures of prediction precision take into account not only observed and predicted values of a given variable on the target date, but also all observed and predicted values of that variable before the target date, which are irrelevant in this context. The second limitation, even more important, is connected to the nature of chaotic systems. The longer the time scale on which such a system is observed, the larger the deviations of two initially infinitesimally close trajectories of this system. However, standard (mean) measures of prediction precision ignore this feature and treat short-term and long-term predictions equally.
Therefore, the aim of this paper is to propose an alternative approach to the evaluation of prediction precision dealing with chaotic systems, where a prediction is related to a given target date, which utilizes the notion of the Lyapunov exponent, see [6,7]. In analogy to the Lyapunov exponent, a newly proposed divergence exponent expresses how much a (numerical) prediction diverges from observed values of a given variable at a given target time, taking into account only the length of the prediction and predicted and observed values at the target time. The larger the divergence exponent, the larger the difference between the prediction and observation (prediction error), and vice versa. Thus, the presented approach avoids the shortcomings mentioned in the previous paragraph.
This new approach is demonstrated in the framework of the COVID-19 pandemic. Since its outbreak, many researchers have tried to forecast the future trajectory of the epidemic in terms of the number of infected, hospitalized, recovered, or dead. For this task, various types of prediction models have been used, such as compartmental models including SIR, SEIR, SEIRD and other modifications, see e.g. [8][9][10][11][12], artificial neural network models [13][14][15][16], Gompertz and logistic functions [17][18][19], ARIMA models [13,20], and many other approaches, see e.g. [21][22][23][24]. A survey of how deep learning and machine learning are used for COVID-19 forecasts can be found e.g. in [25,26]. A general discussion of the state of the art and open challenges in machine learning can be found e.g. in [27].
Since a pandemic spread is, to a large extent, a chaotic phenomenon, and there are many forecasts published in the literature that can be evaluated and compared, the evaluation of the COVID-19 spread predictions with the divergence exponent is demonstrated in the numerical part of the paper.
The data sources for this study included the Worldometers website [28], the Johns Hopkins University resource center [29] and the CDC (Centers for Disease Control and Prevention) database [30].
The paper is organized as follows: in Section 2 Lyapunov and divergence exponents are introduced and their application is demonstrated with examples, Section 3 provides a numerical evaluation of selected models' predictions, Section 4 is devoted to a discussion and the Conclusions section closes the article.

Lyapunov and divergence exponents
The Lyapunov exponent quantitatively characterizes the rate of separation of (formerly) infinitesimally close trajectories in dynamical systems. Formally, the Lyapunov exponent is defined as follows [6,7]:

Definition 1
Let δZ(t) be a separation vector of two trajectories in a given phase space at the time t and let δZ(0) be an initial separation vector of the two trajectories at the time t = 0. Then, the Lyapunov exponent λ is defined via the following equation:
$$|\delta Z(t)| \approx e^{\lambda t}\,|\delta Z(0)| \tag{1}$$
Since physical systems are usually multi-dimensional, Lyapunov exponents from each dimension of a phase space form a spectrum, and the predictability of a system is determined by the maximal Lyapunov exponent (MLE). The MLE is defined as follows [6,7]:
$$\lambda = \lim_{t \to \infty} \lim_{|\delta Z(0)| \to 0} \frac{1}{t} \ln \frac{|\delta Z(t)|}{|\delta Z(0)|} \tag{2}$$
The higher the Lyapunov exponent, the more chaotic the given dynamical system. Lyapunov exponents for classic physical systems are provided e.g. in [6,7,31,32].
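The relation in Definition 1 can be illustrated with a short numerical sketch. It assumes the usual form |δZ(t)| ≈ e^{λt}|δZ(0)| and estimates a finite-time exponent from two nearby trajectories; it does not take the limits used for the MLE. The doubling map x → 2x is a toy linear system chosen here only because its separation grows by exactly the factor 2 per step; it is not chaotic and is not taken from the paper, and the function names are hypothetical.

```python
import math


def finite_time_exponent(sep_0: float, sep_t: float, t: float) -> float:
    """Solve |dZ(t)| = exp(lambda * t) * |dZ(0)| for lambda."""
    if t <= 0:
        raise ValueError("t must be positive")
    if sep_0 == 0 or sep_t == 0:
        raise ValueError("separations must be non-zero")
    return math.log(abs(sep_t) / abs(sep_0)) / t


def separation_under_doubling(x0: float, eps: float, steps: int):
    """Iterate x -> 2x for two nearby starting points (toy system, an assumption).

    Returns the initial and the final separation of the two trajectories.
    """
    a, b = x0, x0 + eps
    sep_0 = abs(b - a)
    for _ in range(steps):
        a, b = 2.0 * a, 2.0 * b
    return sep_0, abs(b - a)


sep_0, sep_t = separation_under_doubling(x0=1.0, eps=1e-6, steps=10)
lam = finite_time_exponent(sep_0, sep_t, t=10)
print(f"finite-time exponent: {lam:.3f}  (ln 2 = {math.log(2):.3f})")
```

For this toy system the estimate coincides with ln 2, the same value that appears for a prediction diverging by the factor of 2 per day.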
A prediction of a pandemic spread and the real data about the spread can be considered, by analogy, as two trajectories in a one-dimensional phase space (an $\mathbb{R}^+$ space) which start at the time t = 0, when both trajectories are identical, and then inevitably diverge at some time t > 0.
Drawing upon the analogy with the Lyapunov exponent in (1-2), we introduce a "divergence exponent" λ.

Definition 2
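Let P(t) be a prediction of a pandemic spread (given as the number of infections, deaths, hospitalizations, etc.) at the time t > 0, and let N(t) be the true (observed) value of the pandemic spread at the time t > 0. Then, the divergence exponent λ is given as
$$e^{\lambda t} = \begin{cases} \dfrac{P(t)}{N(t)}, & P(t) \ge N(t), \\[2mm] \dfrac{N(t)}{P(t)}, & P(t) < N(t), \end{cases} \tag{3}$$
which, after rearrangement, gives
$$\lambda = \begin{cases} \dfrac{1}{t}\ln\dfrac{P(t)}{N(t)}, & P(t) \ge N(t), \\[2mm] \dfrac{1}{t}\ln\dfrac{N(t)}{P(t)}, & P(t) < N(t). \end{cases} \tag{4}$$
The larger the λ, the worse the prediction. In the case of an absolutely precise prediction, λ = 0.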
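As an illustration of Definition 2, the following minimal sketch computes a divergence exponent from a single predicted value, a single observed value, and the prediction length. It assumes that the exponent takes the form λ = |ln(P(t)/N(t))|/t, which treats over- and under-estimation symmetrically and gives λ = 0 for an exact prediction; the function name and the input checks are hypothetical.

```python
import math


def divergence_exponent(predicted: float, observed: float, t: float) -> float:
    """Divergence exponent, assumed here to be |ln(P(t) / N(t))| / t.

    predicted -- P(t), the predicted value at the target time
    observed  -- N(t), the observed value at the target time
    t         -- length of the prediction (e.g. in days), t > 0
    """
    if t <= 0:
        raise ValueError("t must be positive")
    if predicted <= 0 or observed <= 0:
        raise ValueError("P(t) and N(t) must be positive")
    return abs(math.log(predicted / observed)) / t


# A prediction that doubles daily against a constant observation of 1 unit,
# evaluated five days ahead, and the mirror case that halves daily.
print(round(divergence_exponent(32.0, 1.0, 5), 3))
print(round(divergence_exponent(1.0 / 32.0, 1.0, 5), 3))
```

Taking the absolute value of the logarithm is one way to cover both branches (prediction above or below the observation) with a single expression.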
For the sake of comparison with the λ, we also use one of the most common measures of prediction precision, the mean relative error (MRE):
$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\frac{|P(i) - N(i)|}{N(i)} \tag{5}$$
where P(i) denotes the predicted value and N(i) the observed value at the point i, and n is the number of points.
The following (extremely oversimplified) example shows how the λ is calculated and one of its virtues.
Consider the pandemic spread from Table 1. At the beginning (t = 0), the variable N(t), which denotes the observed number of new daily infection cases, is set to 1 unit (for example, 1,000 people). Two prediction models, $P_1$ and $P_2$, were constructed to predict future values of N(t) five days ahead. While $P_1$ predicts exponential growth by the factor of 2, $P_2$ predicts that the spread will decrease exponentially by the factor of 2. After the predictions are made, reality shows that the spread is constant in time for t ∈ {1, 2, 3, 4, 5}. Now, let's evaluate the precision of the model $P_1$ at the target time t = 5:
$$\lambda(P_1) = \frac{1}{5}\ln\frac{P_1(5)}{N(5)} = \frac{\ln 2^5}{5} = \ln 2 \approx 0.693$$
Values of the prediction $P_1(t)$ grow exponentially by the factor δ = 2. The factor δ can be easily obtained from the λ, see relation (3), as follows:
$$\delta = e^{\lambda} = e^{0.693} \approx 2$$
Therefore, from the divergence exponent λ the coefficient δ can be reconstructed, and vice versa. The coefficient δ is the base of the corresponding power series expressing the divergence of a prediction.
Now, consider the prediction $P_2(t)$. This prediction is arguably as imprecise as the prediction $P_1(t)$, since its values halve with time, while the values of $P_1(t)$ double. As can be checked by formula (4), the divergence exponent for $P_2(t)$ is 0.693 again. Therefore, over-estimating and under-estimating predictions are treated equally.
However, when we calculate the MRE of both predictions, we obtain MRE($P_1$) = 6.5 and MRE($P_2$) = 0.766, which suggests that the prediction $P_2$ is much better.
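The comparison in this example can be checked numerically. The following is a minimal sketch, not the author's code: Table 1 is not reproduced here, so the series are assumed from the description (N(t) = 1, one prediction doubling and one halving, both starting from 1 at t = 0), the divergence exponent is assumed to be |ln(P(t)/N(t))|/t, and the function names and the `horizon` parameter are hypothetical. Under these assumptions, the MRE values quoted above (6.5 and 0.766) are obtained when the mean is taken over t = 1, ..., 4, while λ equals ln 2 ≈ 0.693 for both predictions at any horizon.

```python
import math


def mre(predicted, observed):
    """Mean relative error over paired predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(p - n) / n for p, n in zip(predicted, observed)) / len(predicted)


def divergence_exponent(predicted, observed, t):
    """Assumed form of the divergence exponent: |ln(P(t) / N(t))| / t."""
    return abs(math.log(predicted / observed)) / t


def example_1(horizon: int) -> dict:
    """Evaluate the doubling and the halving prediction against N(t) = 1."""
    days = range(1, horizon + 1)
    observed = [1.0 for _ in days]
    p1 = [2.0 ** t for t in days]   # over-estimating prediction
    p2 = [0.5 ** t for t in days]   # under-estimating prediction
    return {
        "mre_p1": mre(p1, observed),
        "mre_p2": mre(p2, observed),
        "lambda_p1": divergence_exponent(p1[-1], observed[-1], horizon),
        "lambda_p2": divergence_exponent(p2[-1], observed[-1], horizon),
    }


for name, value in example_1(horizon=4).items():
    print(f"{name}: {value:.3f}")
```

The MRE separates the two predictions by almost an order of magnitude, whereas the divergence exponent assigns them the same value.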
Another virtue of the evaluation of prediction precision with a divergence exponent is that it enables a comparison of predictions with different time frames, which is demonstrated in the following example.
Consider a fictional pandemic spread from Table 2. Again, at the time t = 0, N(0) = 1, the prediction model $P_3$ is built and gives predictions for t from 1 to 8 days. We evaluate λ and MRE:
$$\lambda(P_3) = \frac{1}{8}\ln\frac{P_3(8)}{N(8)} \approx 0.693$$
$$\mathrm{MRE}(P_3) = \frac{1}{n}\left(\frac{|P_3(1) - N(1)|}{N(1)} + \dots\right)$$
According to the MRE value, the prediction model $P_3$ is much worse than $P_1$ (see Example 1), since its MRE value is much higher. However, an attentive reader may have noticed that the model $P_3$ is exactly the same as the model $P_1$, but provides a prediction for three additional days. By contrast, the divergence exponent λ provides the same value for $P_1$ and $P_3$.
The root of the problem with different values of MRE for the predictions $P_1$ and $P_3$, which are in fact identical, rests in the fact that MRE does not take into account the length of a prediction, and treats all predicted values equally (in the form of the sum in (5)). However, the length of a prediction is crucial in forecasting real chaotic phenomena, since prediction and observation naturally diverge more and more with time, and the slightest change in the initial conditions might lead to an enormous change in the future (the butterfly effect). A weather forecast one hour ahead is easy, a forecast for three days ahead is difficult, a forecast for a week ahead is extremely difficult, and a forecast for one month ahead is impossible due to the chaotic behavior of the Earth's atmosphere. Therefore, since MRE and similar measures of prediction accuracy do not take into account the length of a prediction, they are not suitable for the evaluation of chaotic systems, including a pandemic spread.
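The dependence of the MRE on the length of a prediction can be made concrete with a small sketch. It assumes the same toy setting as the examples above (a constant observation of 1 unit and a prediction that doubles each day, since the tables are not reproduced here) and the assumed form |ln(P(t)/N(t))|/t of the divergence exponent; `compare_horizons` is a hypothetical helper.

```python
import math


def mre(predicted, observed):
    """Mean relative error over paired predicted and observed values."""
    return sum(abs(p - n) / n for p, n in zip(predicted, observed)) / len(predicted)


def divergence_exponent(predicted, observed, t):
    """Assumed form of the divergence exponent: |ln(P(t) / N(t))| / t."""
    return abs(math.log(predicted / observed)) / t


def compare_horizons(horizons):
    """Return {horizon: (MRE, lambda)} for one and the same doubling prediction."""
    results = {}
    for h in horizons:
        observed = [1.0] * h
        predicted = [2.0 ** t for t in range(1, h + 1)]
        results[h] = (
            mre(predicted, observed),
            divergence_exponent(predicted[-1], observed[-1], h),
        )
    return results


results = compare_horizons([5, 8])
for h, (m, lam) in results.items():
    print(f"horizon = {h} days   MRE = {m:.2f}   lambda = {lam:.3f}")
```

Extending the horizon changes only how many terms enter the mean, yet the MRE grows several times, while λ stays the same because it rescales the final deviation by the prediction length.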

A comparison of selected COVID-19 predictions
In this section, selected predictions about the COVID-19 pandemic are evaluated and compared via the divergence exponent λ (and the relative error RE) introduced in the previous section. There have been hundreds of predictions of the COVID-19 spread published in the literature so far, hence for the evaluation and comparison of predictions only one variable was selected, namely the total number of infected people (or total cases, abbr. TC). The selected models with the corresponding studies are listed in Table 3. The selection of these studies was based on two criteria: first, only real predictions into the future with clearly stated dates $D_0$ and $D_t$ (see below) were included, and, secondly, diversity of prediction models was preferred.
The data in Table 3 include the model's number, name of the lead author, model's specification, forecasted country, date when the prediction was made ($D_0$), target date of the prediction ($D_t$), length of the prediction in days (t), predicted number of total cases on the target day (P(t)), observed number of total cases on the target day (N(t)), divergence exponent (λ) and relative error (RE). Fig 1 provides a graphical comparison of results in the form of a scatterplot, where each model is identified by its number, and models are grouped into five categories (distinguished by different colors): artificial neural network models, Gompertz models, compartmental models, Verhulst models and other models. Two models' outputs (models 13 and 24) were identified as outliers and were removed from Fig 1. As can be seen from both Table 3 and Fig 1, the most successful predictions with respect to λ were provided by models (8) and (28), while the worst prediction came from model (24). The most successful model with respect to RE was model (8), followed by model (2), while the worst predictions came from models (13) and (24).
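The derived columns of such a table (t, λ and RE) follow from the two dates and the two case counts. The sketch below shows one way to compute them; the input values are made up for illustration and are not taken from Table 3, RE is assumed to be |P(t) − N(t)|/N(t), λ is assumed to be |ln(P(t)/N(t))|/t, and `evaluate_prediction` is a hypothetical name.

```python
import math
from datetime import date


def evaluate_prediction(d0: date, dt: date, predicted: float, observed: float) -> dict:
    """Compute prediction length, divergence exponent and relative error.

    d0        -- date when the prediction was made
    dt        -- target date of the prediction
    predicted -- predicted total cases on the target date, P(t)
    observed  -- observed total cases on the target date, N(t)
    """
    t = (dt - d0).days
    if t <= 0:
        raise ValueError("the target date must be later than the prediction date")
    if predicted <= 0 or observed <= 0:
        raise ValueError("case counts must be positive")
    return {
        "t": t,
        "lambda": abs(math.log(predicted / observed)) / t,  # assumed form
        "RE": abs(predicted - observed) / observed,         # assumed form
    }


# Hypothetical inputs, not a row of Table 3.
row = evaluate_prediction(date(2020, 4, 1), date(2020, 4, 11), 1100.0, 1000.0)
print(row)
```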
Pearson's correlation coefficient between λ and RE was 0.55, indicating a moderately strong linear relationship between the two variables.
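One reason why λ and RE are correlated only partially is that two predictions with the same relative error receive different divergence exponents when their lengths differ. The following sketch computes Pearson's coefficient for a handful of made-up predictions; the data are hypothetical and unrelated to Table 3, and the forms of λ and RE are the same assumptions as before (|ln(P(t)/N(t))|/t and |P(t) − N(t)|/N(t)).

```python
import math


def pearson(xs, ys):
    """Pearson's correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / (sd_x * sd_y)


# Hypothetical predictions as (length in days, P(t), N(t)).
# The first two have the same relative error but different lengths.
cases = [(10, 1100.0, 1000.0), (40, 1100.0, 1000.0),
         (20, 1500.0, 1000.0), (60, 800.0, 1000.0)]

lambdas = [abs(math.log(p / n)) / t for t, p, n in cases]
rel_errors = [abs(p - n) / n for _, p, n in cases]

r = pearson(lambdas, rel_errors)
print(f"Pearson r between lambda and RE: {r:.2f}")
```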

Discussion and limitations of the proposed approach
The evaluation of models' prediction precision by the divergence exponent (4) was illustrated in Section 3. Twenty-eight models' predictions with target dates published recently in the literature were evaluated and compared, see Table 3 and Fig 1. The primary purpose of this evaluation was to show the application and potential of the divergence exponent, not to draw general conclusions about models' performance in predicting the COVID-19 spread. This would require significantly more data. Since the pandemic is not over, there are undoubtedly many forecasting studies yet to be published, hence a comprehensive study on models' performance with respect to the COVID-19 spread is conceivable in the near future.

PLOS ONE
Though the evaluation of models' prediction precision with the divergence exponent can be applied in many other scientific fields where chaotic phenomena emerge, it has its limitations. It should be used only under specific circumstances, namely when a (numerical) characteristic of a chaotic system is predicted over a given time-scale and a prediction at a target time is all that matters. There are many situations where these circumstances are not satisfied, hence the use of the divergence exponent would not be appropriate. Consider, for example, daily car sales to be predicted by a car dealer for the next month. Suppose that the car dealer sells from zero to three cars per day, with two cars being the average daily sale. In this case, all days of the next month matter, and it is unrealistic to assume that sales at the end of the next month may reach hundreds or thousands, thus diverging substantially from the average.
In addition, standard measures of prediction precision (or rather prediction error), such as MAPE, have a straightforward interpretation in the form of a ratio, or a percentage. If, for example, MAPE = 8%, it means that a prediction deviates from an observation by 8% on average, and an expert can conclude that the prediction was successful, since, according to the rule of thumb by Lewis [34], a prediction with MAPE under 10% is considered highly accurate. On the other hand, a prediction with MAPE over 50% is considered inaccurate [34]. Currently, there is no similar rule of thumb for the divergence exponent, so knowing its value does not provide a modeller with explicit information about the model's performance. The divergence exponent does, however, provide a way to compare the performance of different models relative to each other.
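The rule of thumb mentioned above can be written down as a short sketch. Only the two thresholds stated in the text are encoded (under 10% and over 50%); values in between are labelled generically here because the text does not characterise them. The function names and the sample values are hypothetical.

```python
def mape(predicted, observed) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(p - n) / abs(n) for p, n in zip(predicted, observed))
    return 100.0 * total / len(predicted)


def lewis_label(mape_percent: float) -> str:
    """Interpret a MAPE value using the two thresholds quoted in the text."""
    if mape_percent < 10.0:
        return "highly accurate"
    if mape_percent > 50.0:
        return "inaccurate"
    return "intermediate"  # not characterised in the text


# Hypothetical predictions that deviate by 8% from the observations.
value = mape([108.0, 92.0], [100.0, 100.0])
print(f"MAPE = {value:.1f}%  ->  {lewis_label(value)}")
```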

Conclusions
In this paper, a new measure of prediction precision for regression models and time series, a divergence exponent, was introduced. This new measure has two main advantages. Firstly, it takes into account the time-length of a prediction, since the time-scale of a prediction is crucial in the so-called chaotic systems. Secondly, it evaluates the model's prediction performance only with respect to the end time of the prediction (a target time, or a target date), and the final deviation of the prediction from the observation.
Models' performance evaluation with the divergence exponent was illustrated on predictions of the COVID-19 spread published recently in the literature. Altogether, twenty-eight different models were compared. Verhulst and Gompertz models performed among the best, but no clear pattern revealing the types of models that performed best or worst was found.
Future research can focus on a comparison of different kinds of machine learning models in different environments where chaotic systems prevail, including various fields such as epidemiology, engineering, medicine, or physics.