Abstract
Forecasting methods are notoriously difficult to interpret, particularly when the relationship between the data and the resulting forecasts is not obvious. Interpretability is an important property of a forecasting method because it allows the user to complement the forecasts with their own knowledge, a process which leads to more applicable results. In general, mechanistic methods are more interpretable than non-mechanistic methods, but they require explicit knowledge of the underlying dynamics. In this paper, we introduce EpiForecast, a tool which performs interpretable, non-mechanistic forecasts using interactive visualization and a simple, data-focused forecasting technique based on empirical dynamic modeling. EpiForecast’s primary feature is a four-plot interactive dashboard which displays a variety of information to help the user understand how the forecasts are generated. In addition to point forecasts, the tool produces distributional forecasts using a kernel density estimation method; these are visualized using color gradients to produce a quick, intuitive visual summary of the estimated future. To ensure the work is FAIR and that privacy is preserved, we have released the tool as an entirely in-browser web application.
Citation: Mason L, Berrington de Gonzalez A, Garcia-Closas M, Chanock SJ, Hicks B, Almeida JS (2023) Interpretable, non-mechanistic forecasting using empirical dynamic modeling and interactive visualization. PLoS ONE 18(4): e0277149. https://doi.org/10.1371/journal.pone.0277149
Editor: Zakariya Yahya Algamal, University of Mosul, IRAQ
Received: October 19, 2022; Accepted: March 3, 2023; Published: April 3, 2023
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: The example data used in the tool is a combination of two CDC datasets: ‘Weekly Counts of Deaths by State and Select Causes, 2014-2019’, available at https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr, and ‘Weekly Provisional Counts of Deaths by State and Select Causes, 2020-2022’, available at https://data.cdc.gov/NCHS/Weekly-Provisional-Counts-of-Deaths-by-State-and-S/muzy-jte6. The code is available at https://github.com/episphere/forecast under the MIT license. This includes the specific code for the EpiForecast website, reusable functions to perform EDM on the web, and JavaScript classes for the interactive plots. An Observable Notebook which shows how the code can be imported and used is available at https://observablehq.com/@siliconjazz/edm-interpretable-forecasting.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
It is difficult to predict the future. Many forecasts fail to beat even basic benchmarks such as not projecting any changes at all [1], despite the fact that forecasters have access to more data and methods than ever before. The main issue is that the real world is very complicated: it involves many elements interacting in complex, non-linear ways [2–4]. Leading forecasting methods are often ineffective at dealing with this complexity, in part because they impose unrealistic or over-simplistic assumptions upon the data [5, 6]. On the other hand, there is limited evidence that complex methods lead to better forecasts than simple methods [7]. One way in which people attempt to make better forecasts is by using explicit, mechanistic models of the system dynamics which they wish to predict [8]. For example, in infectious disease epidemiology, an SIR (susceptible-infected-recovered) model aims to find the rates of change between three population groups: susceptible, infected, and recovered [9]. However, a mechanistic approach requires the forecaster to choose an appropriate model and correctly parameterize it, which is only possible if they have sufficient knowledge of the dynamics. Insufficient knowledge may lead the forecaster to choose an incorrect or over-simplistic mechanistic model, resulting in poor forecasts [10–13]. Indeed, even when the correct mechanistic model is chosen, it may be outperformed by a well-tuned model-free (non-mechanistic) method [14].
Non-mechanistic methods differ from mechanistic methods in that they do not require an explicit model of the dynamics [15, 16]. As such, they can be applied to data from any domain so long as the data meets the method’s assumptions. Non-mechanistic methods include statistical modeling (e.g. ARIMA, exponential smoothing) [17], empirical dynamic modeling (e.g. simplex) [18], and deep learning (e.g. LSTM) [19, 20]. Non-mechanistic methods are powerful and flexible, but they are generally less interpretable than mechanistic methods because their parameters lack a domain-specific meaning [15]. Interpretability is useful because it makes the relationship between the data and the forecasts more transparent. If a forecasting method is complex and difficult to interpret, it can give the user inflated confidence in the quality of the forecasts, confidence which cannot easily be checked against the data [7]. When a complex method makes a surprising forecast, it is difficult for the user to assess whether that forecast is a realistic consequence of some hidden pattern in the data or merely a fit to the noise. Conversely, if a forecasting method is interpretable, with interactive visual reference to the data, it is easier for a forecaster to incorporate their knowledge into the forecasts (i.e. to “own the forecasts”), leading to more applicable results and thus more effective applications [21–23].
In this paper, we introduce EpiForecast: a web tool which produces detailed forecasts using an easy-to-understand, non-mechanistic approach. The tool uses interactive visualization to ensure interpretability, which allows the user to assess and improve the forecasts using their own knowledge. To ensure EpiForecast is accessible to a variety of users, we have released it as an open-source, in-browser web application. EpiForecast can work privately with local data or pull data from online sources, allowing users to engage with live datasets. We have also made the tool available in a reactive notebook [24], complete with additional interactive explanations of the underlying principles.
Results
Overview
We have produced a web tool which uses interactive visualization and empirical dynamic modeling to make interpretable, non-mechanistic forecasts. The tool is available at https://episphere.github.io/forecast and, with additional explanations, in an Observable Notebook at https://observablehq.com/@siliconjazz/edm-interpretable-forecasting. We encourage readers to use these resources before reading the rest of this section, as doing so will make the following explanations clearer. The tool is built on vanilla, client-side JavaScript using a small number of libraries, most notably D3. It consists of four interactive plots with some additional controls (see Fig 1). The basic principles of the tool are based on empirical dynamic modeling (EDM), a simple and intuitive non-mechanistic technique which is used for forecasting, especially of non-linear systems [18, 25]. The EDM process is relatively straightforward. It first searches for the nearest dynamic neighbors: moments in the past where the dynamics most closely resemble the recent dynamics. It then looks at what happened after each of these moments and uses that to predict what may happen next (see Methods for a detailed explanation). Each neighbor is weighted by how closely it resembles the most recent dynamics. This information can be used to generate point forecasts; the simplest way to accomplish this, used here, is to calculate a weighted mean of all the neighbors’ futures (see simplex method in Methods). However, in this tool we use visualization to deemphasize point predictions in favor of a more detailed representation of the range of potential futures which arise from the different dynamic neighbors.
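To make the neighbor search concrete, the following is a minimal JavaScript sketch of the idea (our own illustrative code, not EpiForecast’s source; the function name and parameters are hypothetical): the most recent window of E values is compared against every earlier window of the same length, and the n closest windows, together with their immediate futures, become the dynamic neighbors. The weighting and combination of these futures are sketched in Methods.

```javascript
// Minimal sketch of the nearest-dynamic-neighbor search (illustrative, not the tool's code).
function nearestDynamicNeighbors(series, E, n, tp) {
  const T = series.length;
  const recent = series.slice(T - E); // the most recent window (current embedded vector)
  const euclidean = (a, b) => Math.hypot(...a.map((v, i) => v - b[i]));

  const candidates = [];
  for (let t = E; t <= T - tp; t++) { // only windows with a full tp-step future
    candidates.push({
      t,
      distance: euclidean(recent, series.slice(t - E, t)), // similarity of past dynamics
      future: series.slice(t, t + tp),                     // what happened next
    });
  }
  // Keep the n most similar windows; overlap with the current window is not excluded here for brevity.
  return candidates.sort((a, b) => a.distance - b.distance).slice(0, n);
}
```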
Fig 1. The example data depicted here is all-cause mortality from the Centers for Disease Control and Prevention (CDC). The user can also upload their own data or point the tool to data at a URL. a) The chronological time series, which shows the data accompanied by relevant information from the method. The blue dot is the current point, which the user can configure using the time-slider below the plot. The red dots are the nearest dynamic neighbors; the intensity of their color is proportional to the respective neighbor’s weight. The purple shading to the right of the current point is the forecast area, a continuous combination of the neighbors’ futures. Darker areas in the forecast area are higher weighted, meaning they occur in more, or higher-weighted, neighbors’ futures. The purple dashed line is drawn by joining the point forecasts in the forecast horizon; the point forecasts are a weighted average of the neighbors’ futures. b) An offset time-series plot which shows each of the neighbors and their futures, all offset to lie on top of each other. This plot is useful for directly comparing the embedded neighbors and their futures. c) A phase-space plot, useful for inspecting dynamic patterns in the data (such as cycles). The phase-space plot also clearly shows when the data has entered a new dynamic space, a case in which the EDM method will be less useful. d) Hyperparameters for the methods used by the tool, as well as a couple of additional visual options. e) A bar plot showing the distance of each neighbor to the current embedded point. This is useful for seeing explicitly which neighbors are closest, as well as how that relates to their assigned color across the plots. The dashed line is the mean neighbor distance up to the selected current point in time. If a user hovers over one of the neighbors in any of the plots, it is highlighted in all the other plots. This allows the user to gain an immediate and concurrent sense of several aspects of the neighbor.
Details of the tool
The core of EpiForecast is an interactive dashboard comprised of four plots: a traditional time-series plot, an offset time-series plot, a phase-space plot, and a bar plot (Fig 1). These plots are accompanied by additional input elements which control various aspects of the tool, including the hyperparameters of the forecasting method. The tool shows CDC mortality data by default, but the user can provide their own time-series data using the data configuration element. Using the time slider beneath the traditional time-series plot, the user can select the time-step from which forecasts are made—we will refer to this as the “current” time-step. The main plot of the tool is a time-series plot marked with additional information (Fig 1A): the current time-step is marked with a purple dot, and the time-steps of the nearest neighbors are marked with red dots. The shade of red of each neighbor’s dot is proportional to the weight of that neighbor; higher-weighted neighbors are a darker shade. To the immediate right of the current time-step is a shaded forecast area which extends tp time-steps into the future (see Methods for more details about the forecasting method parameters). The purpose of this area is to summarize, at a glance, the offset futures of all nearest neighbors. In brief, the higher-opacity parts of this area are closer to the offset futures of (higher-weighted) neighbors. More detail about how this area is generated can be found in Methods. Optionally, the user can choose to display point forecasts, which are represented as a dashed purple line. Point forecasts are another way to summarize the neighbors’ futures, but they are less detailed than the visual representation provided by the shaded forecast area (see Fig 2).
Fig 2. The forecasting method appears to identify two potential futures: one in which the number of deaths rises to a peak and then begins to decrease, and another in which the number of deaths decreases slowly before beginning to rise. In this case, point forecasts will fall between these two futures, meaning information about the two possibilities is lost in favor of a summary which is not representative of the method’s more nuanced results. That nuance can be preserved by representing all neighbors’ futures together, but this could become visually overwhelming with a larger number of neighbors or in more complex cases. We instead propose the shaded forecast area (the purple gradients in the forecast horizon) as a compromise. The shaded forecast area uses Gaussian kernels to present a more detailed summary of the neighbors’ futures than the point forecasts, while combining similar trajectories for the sake of visual simplicity.
The traditional time-series plot is accompanied by an offset time-series plot (Fig 1B) which shows the most recent dynamics (the embedded vector of the current time-point), the nearest neighbors, the neighbors’ futures, the shaded forecast area, and (optionally) point forecasts. It is, essentially, a zoomed-in version of the traditional time-series plot, with each neighbor offset such that the neighbors and their futures can be directly compared. This allows the user to get a better idea of how the point forecasts and the shaded forecast area are produced from the nearest neighbors and their immediate futures. Another plot in the tool is the phase-space plot (Fig 1C), which can be used to inspect dynamic patterns such as the cyclic patterns present in the mortality data. This plot also helps highlight when the dynamics deviate from previously observed patterns—for example, the COVID-19 pandemic substantially changed the dynamics of the mortality data, and consequently a new region of the phase-space plot was populated. Finally, the tool contains a bar plot which shows the Euclidean distance between each dynamic neighbor and the recent dynamics (Fig 1E). To contextualize these values, a dashed line shows the mean distance between each previous time-step’s recent dynamics and its corresponding nearest neighbors.
Interaction is the critical feature of this tool. The central tenet argued in this report is that engaging interactively with multiple direct representations of the forecasts leads to a deeper understanding of the underlying dynamics. To this end, EpiForecast uses the popular linking-and-brushing model, in which a user can inspect a visual element in one plot and the highlight is reflected in the other plots [26]. For example, if the user hovers their mouse over a nearest-neighbor dot on the traditional time-series plot, then the coloring of the visual elements in each of the other plots is changed to highlight that same neighbor. The same is true for the other plots. The traditional time-series plot also adheres to the “details on demand” principle of interactive visualization, wherein additional details about a specific element are provided when the user requests them through interaction [27]. Specifically, in EpiForecast, if the user hovers their mouse over a nearest-neighbor dot in the traditional time-series plot, then its future is displayed next to the current time-point and a tooltip appears providing more details about the neighbor’s distance and weight. The user can also right-click on a nearest neighbor (in any of the four plots) to disable it, meaning it will no longer be considered when generating the shaded forecast area or calculating the point forecasts. The time-slider beneath the traditional time-series plot (Fig 1A) is another useful way to interact with the tool because it lets the user explore forecasts at previous time-steps. This allows the user both to gain a better understanding of the forecasting method and to assess the suitability of the method for their data.
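As an illustration of how such linking-and-brushing can be wired up in client-side JavaScript, the sketch below uses D3’s dispatch utility (our own illustrative code, not EpiForecast’s implementation; the CSS class names, the event name, and the bound datum field are assumptions): hovering a neighbor in one plot broadcasts its index, and every registered plot updates its own highlight.

```javascript
// Minimal linking-and-brushing sketch with d3.dispatch (illustrative, not the tool's code).
import { dispatch, selectAll } from "d3";

const bus = dispatch("neighborHover");

// Each plot registers a listener under its own namespace and decides how to highlight.
bus.on("neighborHover.timeSeries", i =>
  selectAll(".ts-neighbor").classed("highlighted", (d, j) => j === i));
bus.on("neighborHover.phaseSpace", i =>
  selectAll(".ps-neighbor").classed("highlighted", (d, j) => j === i));

// Any plot fires the shared event from its own mouse handlers.
selectAll(".ts-neighbor")
  .on("mouseover", (event, d) => bus.call("neighborHover", null, d.index))
  .on("mouseout", () => bus.call("neighborHover", null, -1)); // -1 clears the highlight
```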
The plots are accompanied by a panel of additional controls (Fig 1D) which allow the user to adjust the hyperparameters of the EDM method, the relative width of the Gaussian kernels used to generate the shaded forecast area, and some minor visual options. This panel also contains a button to export the results as either a series of point forecasts or as multiple parallel series, one for each neighbor. Instructions for interacting with the tool are available in a video linked on EpiForecast’s web page, and we encourage readers to experiment with the interactive forecasts themselves to see how the interactivity facilitates a better understanding of the forecasts.
It is difficult to compare the accuracy of EpiForecast with other forecasting techniques, because EpiForecast augments the forecasts with non-numeric qualities such as interaction and interpretability. However, it is useful to have a comparison with some basic benchmarks to justify certain design features, such as the shaded forecast area and the offset display of each neighbor’s future. To accomplish this, we have produced a notebook at https://observablehq.com/@siliconjazz/epiforecast-dist-accuracy, to which the user can provide their own data. A copy of the results of the notebook on the default mortality data is available in S1 File.
Example
Using Fig 1, we will illustrate how a detailed and deconstructed representation of forecasts can help an analyst generate applicable forecasts. In this example, it is 2019/09/06 and a public health official wishes to forecast US all-cause mortality (the example data for the tool). The forecaster notices that the method is drawing from several different situations in the past: the decreasing side of a peak in 2014, the increasing side of peaks in 2016 and 2017, and the bottom of a peak in 2018. The forecaster knows that unnormalized mortality counts tend to increase over time, and they therefore decide that the 2014 neighbor’s contribution to the forecasts is not relevant, so they disable that neighbor. A number of influences in 2017/18 led to a particularly deadly flu season (hence the larger peak in deaths). If the forecaster believed these influences were present again in 2019, they could disable the other neighbors in order to favor the neighbors from 2017. However, the forecaster is not sure, and thus leaves in the neighbors from 2016, 2017, and 2018. Finally, the forecaster exports the forecasts as seven separate forecast series, one for each enabled neighbor. These exported series could then be analyzed in another environment using a suitable forecasting technique of the forecaster’s choice, or they could form the basis of judgmental forecasts.
Working privately with data
By default, the application shows the all-cause US mortality dataset from the Centers for Disease Control and Prevention [28, 29], showcased in Fig 1, but users can supply their own data to the tool by uploading a CSV or JSON file, or by pointing the application to a corresponding URL. See the supplementary video or the tool’s web page for more instructions on how to upload your own data. It is important to note that all visualizations and computations take place within the safety of the user’s browser sandbox. No data or analytics circulate outside the user’s web browser, fully preserving privacy and thereby enabling the visualization of both public and sensitive data.
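As an illustration of how such in-browser loading can work (a sketch under our own assumptions, not EpiForecast’s actual loader; the column names date and value are hypothetical), a locally selected file or a remote CSV can be read and parsed entirely on the client side, so the data never leaves the browser:

```javascript
// Illustrative client-side loading sketch (not the tool's code).
import { csvParse } from "d3";

// Local file chosen via <input type="file">: read with the Blob/File API, parsed in memory.
async function readLocalCsv(file) {
  const text = await file.text();
  return csvParse(text, d => ({ date: d.date, value: +d.value }));
}

// Remote file at a user-supplied URL: fetched directly by the browser and parsed the same way.
async function readRemoteCsv(url) {
  const text = await (await fetch(url)).text();
  return csvParse(text, d => ({ date: d.date, value: +d.value }));
}
```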
Discussion
We have produced a FAIR web tool which allows users to produce interpretable forecasts with their own data and then explore them in detail. This tool seeks to illustrate the role of interpretability via simplicity and interactivity, a key but often overlooked element of forecasting. If the forecaster can understand how a forecasting method produces results, then they can better assess the relevance and reliability of those results. As such, the primary goal of EpiForecast is not to produce more accurate point forecasts, but instead to further the “explainability” of forecasts by conveying a more detailed representation of the results—a representation which helps the user understand how the resulting forecasts were generated from the input data [21]. In a sense, this work treats forecasting as an exploratory process rather than an analytic one. Concrete numeric results are replaced by a more nuanced and detailed understanding of dynamic structures in the data and of how these structures may be informative when forecasting. One way to achieve explainability is to choose a method which in some way mirrors intuitive human reasoning. EDM is suitable for this because it is based on the reasonable premise that the immediate futures of similar pasts may provide insight into the future. EDM executes steps which are analogous to how a human might perform forecasting and, in doing so, produces a lot of information which is easy for a human to interpret. EpiForecast takes this information, visualizes it in multiple different ways, and makes it explorable through interaction.
EpiForecast uses the shaded forecast area (see Fig 2) to provide a more detailed representation of the information generated by EDM than would be gained from point forecasts. Interactivity is also used for this purpose. Point forecasts and the shaded forecast area are both ways to summarize the information from the EDM method, but with interactivity there is no need to summarize; we can include all of this information and the forecaster can view it on request. This provides the forecaster with a more complete understanding of the results and could thus reduce the bias which may arise from a static summary. The idea of using interactive visualization to improve analytical insight is the core tenet of the field of visual analytics. There has been recent interest in using the principles of visual analytics to improve forecasting, and in particular to improve the forecaster’s understanding of the forecasting method [21, 30].
Our accompanying tool has, nevertheless, several notable limitations. The EDM method requires a lot of data [31] and is most useful when the embedded space is well populated in the region of the current embedded point. The tool will therefore not work as well when the data has a substantial trend, because the current embedded point will often fall in an uninhabited region of the embedded space and the neighbors may not represent meaningfully similar pasts. Likewise, the tool will be less effective for time series where the long-term trend dominates the short-term dynamics, which usually occurs for long-term time series with few points. To some extent, this can be addressed by detrending the data, and the explainability of our tool makes it easier to identify other potentially useful preprocessing steps. Furthermore, the explainability allows a user to quickly see when the method is producing inappropriate forecasts, reducing the chance that they will be misled. EDM has several parameters which must be tuned, all of which can have a substantial effect on the visualization. However, these parameters are easy to interpret and can be quickly configured in the tool. This issue could potentially be addressed further by introducing an automatic parameter selection algorithm. A potential impediment to adoption of our tool is the fact that, paradoxically, complexity is often associated with depth and accuracy, which leads users to trust complex and difficult-to-interpret methods over simple and interpretable ones [7]. Finally, at present the tool works only on univariate data, but the EDM method can be extended to handle multivariate data, so the tool could be updated to support this in the future.
We found a single example of another tool using interactive visualization and nearest neighbors for forecasting [32]. However, that tool does not find the neighbors in the embedded dynamic space, but rather in a multivariate space: the data has several variables which are relevant to forecasting energy demand, and the neighbors are single points in time which are closest to the current point along these variables. Like our tool, the energy tool uses visualization to highlight how, exactly, the neighbors resemble the current time point. But because that tool does not consider dynamics, its interactive plots differ substantially from our approach. It also suggests that approaching the multivariate dynamic space is a natural evolution of the work reported here.
To ensure that this tool is FAIR [33] and preserves the privacy of the data, we have developed it as an in-browser JavaScript application. Only the code needs to be hosted and computation is done on the client side, which means the application is easy and inexpensive to host. The web itself is a natural environment for FAIR applications due to the ubiquity of the web browser as both an interactive platform and an execution engine. Finally, due to the in-browser, client-side nature of the tool, the privacy of the user’s data is ensured by default. In order to explore the participative modularity of this tool, its basic elements, framed by interactive explanations, are also made available in an Observable Notebook at https://observablehq.com/@siliconjazz/edm-interpretable-forecasting; such reactive notebooks have important advantages over traditional notebooks [24, 34].
In conclusion, we have produced a tool which allows users to make forecasts using a non-mechanistic forecasting method and then explore those forecasts using interactive visualization. We hope that this tool will allow users to produce forecasts which are more informative, better understood, and more applicable to their respective domains. To ensure adherence to the FAIR principles, we have made the tool available as an open-source, entirely in-browser web application.
Methods
Empirical dynamic modeling (EDM)
Empirical dynamic modeling (EDM) is a time-series method which aims to empirically reconstruct the state space of a system using delay embedding [18]. EDM can be used for a number of tasks, such as estimating the non-linearity of a system, but for our purposes we are interested in its use for forecasting. EDM has proven especially effective on complex, non-linear systems, such as ecological models, but it requires a lot of data. EDM is effective at modeling univariate and multivariate data [18], but in this paper we will explore the univariate implementation. The input data is a univariate time series with evenly spaced time points:

X = \{x_1, x_2, \ldots, x_T\}    (1)
The first step is to perform the delay embedding, placing each value in a vector with the E − 1 values which precede it in time:

\mathbf{x}_t = (x_t, x_{t-1}, \ldots, x_{t-(E-1)}), \quad E \le t \le T    (2)
where E (a hyperparameter) is the embedding dimension, i.e. the number of values in each embedded vector. Next, the algorithm finds the n nearest neighbors of the most recent embedded vector x_T, where n is another hyperparameter; that is, the n embedded vectors which have the smallest Euclidean distance to x_T. Here we define t_i to be the time-step of the ith neighbor and x_{t_i} to be the embedded vector of the ith neighbor. Next, the algorithm retrieves the ‘future vector’ for each neighbor, a vector of the values which immediately succeed the neighbor:

\mathbf{z}_i = (z_{i,T+1}, z_{i,T+2}, \ldots, z_{i,T+t_p}), \quad z_{i,T+r} = x_{t_i + r}    (3)
where t_p is the forecast horizon, another hyperparameter, which indicates how far into the future the algorithm will forecast. The algorithm then calculates a weight for each neighbor:

w_i = \exp\left(-\theta \, \frac{d_i}{\bar{d}}\right)    (4)

where d_i is the Euclidean distance between x_T and the neighbor x_{t_i}, and d̄ is the mean Euclidean distance between x_T and the neighbors. The variable θ is another hyperparameter which specifies the extent to which the weight is affected by the distance; if θ = 0, then all neighbors are weighted equally at 1. This covers the information used by the tool to draw the shaded forecast area; details of how this is achieved are provided in the “Drawing the shaded forecast area” section below. However, the information can also be used to generate a vector of point forecasts, i.e. forecasts with a single numerical value. There are a few ways to accomplish this, but we have chosen the simplex method due to its simplicity [18]. The simplex method calculates a forecast vector by taking a weighted average of the neighbors’ future vectors:

\hat{\mathbf{z}} = \frac{\sum_{i=1}^{n} w_i \, \mathbf{z}_i}{\sum_{i=1}^{n} w_i}    (5)
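As a concrete illustration of Eqs (4) and (5), the following JavaScript sketch (our own illustrative code, not the tool’s source; the function name and the example inputs are hypothetical) takes the nearest dynamic neighbors, each with its distance d_i to the current embedded vector and its future vector, computes the weights, and returns the simplex point forecasts as the weighted average of the futures. The neighbors could come from a search like the one sketched in the Overview.

```javascript
// Illustrative sketch of the weighting (Eq 4) and simplex combination (Eq 5); not the tool's code.
function simplexForecast(neighbors, theta) {
  const dbar = neighbors.reduce((s, nb) => s + nb.distance, 0) / neighbors.length; // mean distance
  const weights = neighbors.map(nb => Math.exp(-theta * nb.distance / dbar));      // Eq (4)
  const wSum = weights.reduce((s, w) => s + w, 0);

  const tp = neighbors[0].future.length;
  return Array.from({ length: tp }, (_, r) =>                                      // Eq (5)
    neighbors.reduce((s, nb, i) => s + weights[i] * nb.future[r], 0) / wSum
  );
}

// Hypothetical example: three neighbors, forecast horizon tp = 2, theta = 1.
const neighbors = [
  { distance: 1.0, future: [10, 12] },
  { distance: 2.0, future: [11, 13] },
  { distance: 4.0, future: [15, 18] },
];
console.log(simplexForecast(neighbors, 1)); // point forecasts: weighted averages of the futures
```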
Drawing the shaded forecast area
The shaded forecast area is generated using a method based on weighted kernel density estimation. At t = T we have n neighbors, each with a future vector z_i and a weight w_i. The following process is repeated for each time-step in the forecast horizon, indexed by r with 1 ≤ r ≤ t_p. At t = T + r, the algorithm takes the corresponding value z_{i,T+r} from each neighbor’s future vector. At each value, the algorithm centers a Gaussian kernel with height proportional to the neighbor’s weight and width controlled by a hyperparameter α. Specifically, at t = T + r, for the ith neighbor we have the kernel function

k_i(z) = s \, w_i \, f\!\left(\frac{q \,(z - z_{i,T+r})}{\alpha}\right)    (6)

where f is the probability density function of the normal distribution with mean 0 and variance 1, α is the width of the kernel, and s and q are scaling parameters for the height and width respectively. The scaling parameter s = 1/(f(0) · n) is set to ensure that the sum of all kernel peaks would be equal to 1 if all neighbors had a weight equal to 1. The “width” α is the width of the density function f at probability density c; we set c = 0.0001, an arbitrarily low value. The scaling parameter q is set to enforce this width definition. The user can scale the width of the kernel relative to the default width using the hyperparameter kw: α = σ · kw. The overall function for the shaded area is the sum over all kernels:

g(z) = \sum_{i=1}^{n} k_i(z)    (7)
Values of g are calculated over a range which covers all of the kernels, and each value is linearly mapped to a color on a color scale. These colors are used to draw a gradient at t = T + r on the full and offset time-series plots. For a visual example of this process, see Fig 3.
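The following JavaScript sketch illustrates Eqs (6) and (7) for a single forecast time-step (our own illustrative code, not the tool’s source; the function name and parameters are hypothetical, and it simplifies the width handling by treating α directly as the kernels’ standard deviation rather than using the width-at-density-c definition above): it sums one weighted Gaussian kernel per neighbor over a grid of values and linearly maps the resulting density onto opacities for the color gradient.

```javascript
// Illustrative kernel-density sketch for one forecast time-step (not the tool's code).
function forecastDensity(values, weights, alpha, gridSize = 200) {
  const stdNormalPdf = x => Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
  const s = 1 / (stdNormalPdf(0) * values.length); // so unit-weight kernel peaks would sum to 1

  // Grid of candidate values covering all kernels.
  const lo = Math.min(...values) - 3 * alpha;
  const hi = Math.max(...values) + 3 * alpha;
  const grid = Array.from({ length: gridSize }, (_, j) => lo + (j * (hi - lo)) / (gridSize - 1));

  // Eq (7): sum one weighted Gaussian kernel (Eq 6, simplified width) per neighbor at each grid value.
  const density = grid.map(z =>
    values.reduce((sum, v, i) => sum + s * weights[i] * stdNormalPdf((z - v) / alpha), 0)
  );

  // Linearly map the density onto [0, 1] opacities for the color gradient at this time-step.
  const max = Math.max(...density);
  return grid.map((z, j) => ({ value: z, opacity: density[j] / max }));
}
```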
Fig 3. The example is from t = 82 and tp = 8 on the default mortality data, with the relative kernel width parameter (kernel_width) set to 1.0. A Gaussian kernel is placed at each neighbor’s value; higher-weighted neighbors (represented with a darker color) correspond to taller and thinner kernels. The sum of the kernels forms the forecast distribution (black line), which is then linearly mapped onto an opacity value used to draw the color gradient.
Supporting information
S1 File. Distributional accuracy benchmarks.
Comparison of the elements of the method with some basic benchmarks.
https://doi.org/10.1371/journal.pone.0277149.s001
(PDF)
References
- 1. Wright JH. Some observations on forecasting and policy. Int J Forecast. 2019 Jul;35(3):1186–92.
- 2. Rutter H, Savona N, Glonti K, Bibby J, Cummins S, Finegood DT, et al. The need for a complex systems model of evidence for public health. The Lancet. 2017 Dec;390(10112):2602–4. pmid:28622953
- 3. Mohammadi N, Taylor JE. Thinking fast and slow in disaster decision-making with Smart City Digital Twins. Nat Comput Sci. 2021 Dec;1(12):771–3.
- 4. Dolfin M, Leonida L, Outada N. Modeling human behavior in economics and social science. Phys Life Rev. 2017 Dec;22–23:1–21. pmid:28711344
- 5. Liu C, Hoi SC, Zhao P, Sun J. Online arima algorithms for time series prediction. In: Thirtieth AAAI conference on artificial intelligence. 2016.
- 6. Lo JH. A study of applying ARIMA and SVM model to software reliability prediction. In: 2011 International Conference on Uncertainty Reasoning and Knowledge Engineering [Internet]. Bali, Indonesia: IEEE; 2011 [cited 2022 Mar 3]. p. 141–4. Available from: http://ieeexplore.ieee.org/document/6007794/
- 7. Green KC, Armstrong JS. Simple versus complex forecasting: The evidence. J Bus Res. 2015 Aug;68(8):1678–85.
- 8. Kandula S, Yamana T, Pei S, Yang W, Morita H, Shaman J. Evaluation of mechanistic and statistical methods in forecasting influenza-like illness. J R Soc Interface. 2018 Jul;15(144):20180174. pmid:30045889
- 9. Weiss HH. The SIR model and the foundations of public health. Mater Mat. 2013;0001–17.
- 10. Moein S, Nickaeen N, Roointan A, Borhani N, Heidary Z, Javanmard SH, et al. Inefficiency of SIR models in forecasting COVID-19 epidemic: a case study of Isfahan. Sci Rep. 2021 Dec;11(1):4725. pmid:33633275
- 11. Urban MC, Bocedi G, Hendry AP, Mihoub JB, Pe’er G, Singer A, et al. Improving the forecast for biodiversity under climate change. Science. 2016 Sep 9;353(6304):aad8466. pmid:27609898
- 12. Funk S, Camacho A, Kucharski AJ, Lowe R, Eggo RM, Edmunds WJ. Assessing the performance of real-time epidemic forecasts: A case study of Ebola in the Western Area Region of Sierra Leone, 2014–15 [Internet]. Epidemiology; 2017 Aug [cited 2022 Mar 3]. Available from: http://biorxiv.org/lookup/doi/10.1101/177451
- 13. Holmdahl I, Buckee C. Wrong but Useful—What Covid-19 Epidemiologic Models Can and Cannot Tell Us. N Engl J Med. 2020 Jul 23;383(4):303–5. pmid:32412711
- 14. Perretti CT, Munch SB, Sugihara G. Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data. Proc Natl Acad Sci. 2013 Mar 26;110(13):5253–7. pmid:23440207
- 15. Lagergren J, Reeder A, Hamilton F, Smith RC, Flores KB. Forecasting and Uncertainty Quantification Using a Hybrid of Mechanistic and Non-mechanistic Models for an Age-Structured Population Model. Bull Math Biol. 2018 Jun;80(6):1578–95. pmid:29611108
- 16. Sundar S, Schwab P, Tan JZH, Romero-Brufau S, Celi LA, Wangmo D, et al. Forecasting the COVID-19 Pandemic: Lessons learned and future directions [Internet]. Public and Global Health; 2021 Nov [cited 2022 Mar 3]. Available from: http://medrxiv.org/lookup/doi/10.1101/2021.11.06.21266007
- 17. Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. OTexts; 2018.
- 18. Chang CW, Ushio M, Hsieh Ch. Empirical dynamic modeling for beginners. Ecol Res. 2017 Nov;32(6):785–96.
- 19. Siami-Namini S, Tavakoli N, Siami Namin A. A Comparison of ARIMA and LSTM in Forecasting Time Series. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) [Internet]. Orlando, FL: IEEE; 2018 [cited 2022 Mar 3]. p. 1394–401. Available from: https://ieeexplore.ieee.org/document/8614252/
- 20. Siami-Namini S, Tavakoli N, Namin AS. The Performance of LSTM and BiLSTM in Forecasting Time Series. In: 2019 IEEE International Conference on Big Data (Big Data) [Internet]. Los Angeles, CA, USA: IEEE; 2019 [cited 2022 Mar 3]. p. 3285–92. Available from: https://ieeexplore.ieee.org/document/9005997/
- 21. Lu Y, Steptoe M, Buchanan V, Cooke N, Maciejewski R. Evaluating Forecasting, Knowledge, and Visual Analytics. In: 2021 IEEE Workshop on TRust and EXpertise in Visual Analytics (TREX) [Internet]. New Orleans, LA, USA: IEEE; 2021 [cited 2022 Mar 3]. p. 32–9. Available from: https://ieeexplore.ieee.org/document/9619888/
- 22. Arvan M, Fahimnia B, Reisi M, Siemsen E. Integrating human judgement into quantitative forecasting methods: A review. Omega. 2019 Jul;86:237–52.
- 23. Alvarado-Valencia J, Barrero LH, Önkal D, Dennerlein JT. Expertise, credibility of system forecasts and integration methods in judgmental demand forecasting. Int J Forecast. 2017 Jan;33(1):298–313.
- 24. Perkel JM. Reactive, reproducible, collaborative: computational notebooks evolve. Nature. 2021 May 6;593(7857):156–7. pmid:33941927
- 25. Ye H, Beamish RJ, Glaser SM, Grant SCH, Hsieh Ch, Richards LJ, et al. Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proc Natl Acad Sci. 2015 Mar 31;112(13):E1569–76. pmid:25733874
- 26. Raidou RG. Visual Analytics for the Representation, Exploration, and Analysis of High-Dimensional, Multi-faceted Medical Data. In: Rea PM, editor. Biomedical Visualisation [Internet]. Cham: Springer International Publishing; 2019 [cited 2023 Feb 23]. p. 137–62. (Advances in Experimental Medicine and Biology; vol. 1138). Available from: http://link.springer.com/10.1007/978-3-030-14227-8_10
- 27. Cui W. Visual Analytics: A Comprehensive Overview. IEEE Access. 2019;7:81555–73.
- 28. Centers for Disease Control and Prevention. Weekly Provisional Counts of Deaths by State and Select Causes, 2020–2022 [Internet]. 2022 [cited 2022 Mar 3]. Available from: https://data.cdc.gov/NCHS/Weekly-Provisional-Counts-of-Deaths-by-State-and-S/muzy-jte6
- 29. Centers for Disease Control and Prevention. Weekly Counts of Deaths by State and Select Causes, 2014–2019 [Internet]. Available from: https://data.cdc.gov/NCHS/Weekly-Counts-of-Deaths-by-State-and-Select-Causes/3yf8-kanr
- 30. Nowak S, Bartram L, Haegeli P. Designing for Ambiguity: Visual Analytics in Avalanche Forecasting. In: 2020 IEEE Visualization Conference (VIS) [Internet]. Salt Lake City, UT, USA: IEEE; 2020 [cited 2022 Mar 3]. p. 81–5. Available from: https://ieeexplore.ieee.org/document/9331311/
- 31. Hsieh C, Anderson C, Sugihara G. Extending Nonlinear Analysis to Short Ecological Time Series. Am Nat. 2008 Jan;171(1):71–80. pmid:18171152
- 32. Grimaldo AI, Novak J. Combining Machine Learning with Visual Analytics for Explainable Forecasting of Energy Demand in Prosumer Scenarios. Procedia Comput Sci. 2020;175:525–32.
- 33. Wilkinson MD, Dumontier M, Aalbersberg IjJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Dec;3(1):160018. pmid:26978244
- 34. Pimentel JF, Murta L, Braganholo V, Freire J. A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks. In: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) [Internet]. Montreal, QC, Canada: IEEE; 2019 [cited 2022 Mar 3]. p. 507–17. Available from: https://ieeexplore.ieee.org/document/8816763/