Abstract
Measles is an important infectious disease system both for its burden on public health and as an opportunity for studying nonlinear spatio-temporal disease dynamics. Traditional mechanistic models often struggle to fully capture the complex nonlinear spatio-temporal dynamics inherent in measles outbreaks. In this paper, we first develop a high-dimensional feed-forward neural network model with spatial features (SFNN) to forecast endemic measles outbreaks and systematically compare its predictive power with that of a classical mechanistic model (TSIR). We illustrate the utility of our model using England and Wales measles data from 1944–1965. These data present multiple modeling challenges due to the interplay between metapopulations, seasonal trends, and nonlinear dynamics related to demographic changes. Our results show that while the TSIR model yields similarly performant short-term (1 to 2 biweeks ahead) forecasts for highly populous cities, our neural network model (SFNN) consistently achieves lower root mean squared error (RMSE) across other forecasting windows. Furthermore, we show that our spatial-feature neural network model, without imposing mechanistic assumptions a priori, can uncover a gravity-model-like spatial hierarchy of measles spread in which major cities play an important role in driving regional outbreaks. We then turn our attention to integrative approaches that combine mechanistic and machine learning models. Specifically, we investigate how the TSIR can be utilized to improve a state-of-the-art approach known as Physics-Informed Neural Networks (PINN), which explicitly combines compartmental models and neural networks. Our results show that the TSIR can facilitate the reconstruction of latent susceptible dynamics, thereby enhancing both forecasts, in terms of mean absolute error (MAE), and parameter inference of measles dynamics within the PINN.
In summary, our results show that appropriately designed neural network-based models can outperform traditional mechanistic models for short- to long-term forecasts, while simultaneously providing mechanistic interpretability. Our work also provides valuable insights into more effectively integrating machine learning models with mechanistic models to enhance public health responses to measles and similar infectious disease systems.
Author summary
Mechanistic models have been foundational in developing an understanding of the transmission dynamics of infectious diseases including measles. In contrast to their mechanistic counterparts, machine learning techniques including neural networks have primarily focused on improving forecasting accuracy without explicitly inferring transmission dynamics. Effectively integrating these two modeling approaches remains a central challenge. In this paper, we first develop a high-dimensional neural network model to forecast spatiotemporal endemic measles outbreaks and systematically compare its predictive power with that of a classical mechanistic model (TSIR). We illustrate the utility of our model using a detailed dataset describing measles outbreaks in England and Wales from 1944–1965, one of the best-documented and most-studied nonlinear infectious disease systems. Our results show that overall, our neural network model outperforms the TSIR in all forecasting windows. Furthermore, we show that our neural network model can uncover the mechanism of hierarchical spread of measles where major cities drive regional outbreaks. We then develop an integrative approach that explicitly and effectively combines mechanistic and machine learning models, improving simultaneously both forecasting and inference. In summary, our work offers valuable insights into the effective utilization of machine learning models, and integration with mechanistic models, for enhancing outbreak responses to measles and similar infectious disease systems.
Citation: Madden WG, Jin W, Lopman B, Zufle A, Dalziel B, E. Metcalf CJ, et al. (2024) Deep neural networks for endemic measles dynamics: Comparative analysis and integration with mechanistic models. PLoS Comput Biol 20(11): e1012616. https://doi.org/10.1371/journal.pcbi.1012616
Editor: Benjamin Althouse, University of Washington, UNITED STATES OF AMERICA
Received: May 27, 2024; Accepted: November 4, 2024; Published: November 21, 2024
Copyright: © 2024 Madden et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All code and data are available on the GitHub repo: https://github.com/WyattGMadden/deep_measles_dynamics.
Funding: BL, ML, AZ, WM are supported by the cooperative agreement CDC-RFA-FT-23-0069 from the CDC’s Center for Forecasting and Outbreak Analytics. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Following the COVID-19 pandemic, there has been a marked increase in machine learning research focused on enhancing the forecasting of infectious diseases. This body of work primarily sought to develop highly predictive models for real-time application during the peak of the health crisis [1, 2]. A portion of these studies has endeavored to meld classical mechanistic approaches to infectious disease with machine learning, either through the post-hoc analysis of machine learning outputs in light of established disease dynamics [3, 4], or by directly integrating mechanistic insights into the machine learning models [5–7]. Our research advances these efforts by developing neural-network-based models tailored to the complex spatiotemporal multi-year transmission dynamics of endemic measles, leveraging a well-characterized infectious disease system and a rich historical dataset describing outbreaks in pre-vaccination England and Wales.
Measles is one of the most highly transmissible and strongly immunizing pathogens. Measles incidence before and after vaccination is among the best-documented and most-studied nonlinear infectious disease systems. Measles exhibits complex spatiotemporal dynamics driven by the interplay between seasonal forcing, susceptible recruitment due to births, and spatial coupling between populations. These dynamics range from regular multiannual infection patterns in large populations [8] to coexisting attractors [9]. By contrast, measles dynamics in small, highly vaccinated populations are dominated by chaotic patterns driven by stochastic extinction [10].
For example, before widespread vaccination in the late 1960s, measles epidemics in England and Wales were dominated by highly regular periodic (often biennial) cycles in large cities whose populations were at or above the Critical Community Size (CCS) of approximately 300,000 individuals, i.e., the population size required to maintain endemic transmission [11]. Following widespread vaccination in the late 1960s, the epidemics shifted from highly regular cycles to largely irregular dynamics [12]. Due to its simple natural history and long time series of data, measles incidence in England and Wales has provided a fruitful testing ground for better understanding spatiotemporal nonlinear epidemiological dynamics and for developing semi-mechanistic statistical modeling approaches more broadly [13–18].
A suite of previous analyses has demonstrated the utility of deterministic and stochastic (semi-) mechanistic models, notably the time-series-SIR (TSIR) model [13], a discrete approximation of the S-I-R model, and other successful inferential approaches including particle filtering [19], in characterizing the dynamics in large urban populations. However, these models have generally not focused on long-term forecasting accuracy. While machine learning models, including a recent work leveraging the Least Absolute Shrinkage and Selection Operator (LASSO) [15], have shown improved forecasting skill for endemic measles dynamics, they generally lack deep mechanistic interpretability.
These models also do not explicitly consider spatial interactions between locations, which are a known driver of measles transmission, particularly between less populous locations (e.g., small towns) and population centers (e.g., core cities) [16]. To this end, we first train a high-dimensional neural network explicitly incorporating both spatial and temporal features (SFNN) to forecast measles incidence over 1,452 cities and towns from 1944 to 1965 and assess forecast performance over a range of forecast steps. We also employ explainability (XAI) methods to shed light on how the neural network reveals mechanistic relationships when making predictions.
Following this, we turn our attention to integrative approaches that have the potential to simultaneously provide high forecasting performance and mechanistic interpretability. Specifically, we focus on physics-informed neural network (PINN) methods, a class of integrative neural networks that incorporate differential equations from physics into the model fitting procedure [20, 21]. PINN methods are able to preserve high predictive performance while incorporating and inferring scientific parameters, and have only recently been extended from physical systems to infectious disease mechanistic equations [6, 22]. While previous pioneering work [6, 22–24] has demonstrated the ability of PINN methods to improve disease incidence forecasts, these applications have not focused on long-term prediction and inference of transmission dynamics in the context of endemic childhood infections. We build a PINN model that integrates a machine learning model directly with a mechanistic S-I-R model and addresses these shortcomings by augmenting the measles transmission dynamics with latent susceptible dynamics reconstructed by the TSIR model.
Our results demonstrate that appropriately designed machine learning models can outperform more traditional mechanistic modeling approaches in forecasting accuracy while effectively uncovering mechanistic infectious disease dynamics, in both a post-hoc and an integrative fashion. First, the high-dimensional neural network (SFNN) overall outperforms the TSIR model for all forecast windows and in the majority of towns and cities in England and Wales (E&W), with the most notable improvement for long-term predictions. The explainability (XAI) methods applied to our SFNN uncover the mechanism of hierarchical spread from large core cities to less populous towns without imposing such a mechanism a priori. Specifically, our results suggest that the relative role of spatial hierarchical spread increases as the population size of towns decreases, which is consistent with previous findings leveraging gravity model formulations [25]. Second, we compare the performance of a PINN model augmented with TSIR-reconstructed latent susceptible dynamics (referred to as TSIR-PINN) to a PINN model with naively constrained susceptible dynamics (referred to as Naive-PINN). We demonstrate that inclusion of the TSIR-reconstructed susceptible dynamics (in the TSIR-PINN model) improves the inference of disease parameters while simultaneously providing high forecasting accuracy. Together these findings illustrate the potential for a new suite of methods to provide improved integration between mechanistic models and machine learning approaches for infectious disease modeling, achieving high predictive performance while simultaneously ensuring accurate scientific inference of the spatiotemporal dynamics of measles and similar infectious disease systems.
Results
Neural network model (SFNN) outperforms TSIR model in forecasting endemic measles dynamics
The TSIR model estimates measles dynamics by leveraging incidence data and birth data [13, 18, 26] (see Materials and methods for full model specification). It provides a computationally inexpensive and highly tractable alternative to the continuous-time S-I-R model and has been shown to excel in short-term forecasting of measles incidence in large, populous cities [17]. However, the TSIR model generally does not perform well for long-term forecasts. Furthermore, incorporating spatial interaction among multiple locations into the TSIR is a steep statistical challenge [27], which limits its utility for characterizing and forecasting the typically less regular outbreaks in less populous towns whose population size is below the CCS.
With these deficiencies in mind, we employ a neural network explicitly incorporating spatial and temporal features (SFNN). Specifically, our SFNN considers not only measles incidence lags as features, but also potentially important spatial features, including the measles incidence lags in, and distances to, the ten nearest towns/cities and the seven most populous cities (see Materials and methods for full model specification). These seven cities were chosen because they were identified as having populations greater than the critical community size (CCS) of 300,000, an empirical threshold at which chains of infection are locally sustained [28].
Our results show that our spatiotemporally featured neural network (Fig 1) generally outperforms the TSIR for both short- and long-term predictions, across different population sizes (Fig 2) and train/test year cutoffs (S1 Table). For very short-term predictions (e.g., k = 1, where k is the number of biweeks ahead of the targeted prediction), the SFNN notably outperforms the TSIR in less populous towns, where the TSIR model has traditionally struggled. As the population size of the prediction target increases, the performance of the SFNN gradually approaches that of the TSIR (Fig 2B). As the forecasting window widens (e.g., k > 4), the SFNN's advantage over the TSIR becomes more pronounced, but with the opposite trend: the performance of the SFNN relative to the TSIR now improves with population size.
SFNN architecture, with input features grouped according to feature type, up to 3 hidden layers of dimension up to 1201, and 1 output layer of dimension 1 for incidence forecasts. The number of hidden layers and the hidden-layer dimension differ by forecasting window.
(A) Within-city SFNN RMSE versus TSIR RMSE, colored by log(population), faceted by k-step ahead forecast. (B) Difference between the within-city-standardized RMSE for TSIR and the within-city-standardized RMSE for SFNN; loess regression curves are fitted.
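The per-city comparison in Fig 2 can be illustrated with a short sketch. It is not the authors' code: the function names are hypothetical, and we assume that "within-city standardization" means dividing each RMSE by the standard deviation of that city's observed incidence, which the text does not specify.

```python
import math
from statistics import pstdev


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def standardized_rmse_gap(observed, tsir_pred, sfnn_pred):
    """TSIR RMSE minus SFNN RMSE, both scaled by the city's incidence spread.

    Assumption: standardization divides by the population standard deviation
    of the city's observed incidence. Positive values favour the SFNN.
    """
    scale = pstdev(observed)
    if scale == 0:
        raise ValueError("observed incidence is constant; cannot standardize")
    return (rmse(observed, tsir_pred) - rmse(observed, sfnn_pred)) / scale
```

Computing this gap for every city and k, then plotting it against log population, would reproduce the layout of Fig 2B; the loess smoothing is a separate step.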
We also test whether our SFNN can capture the annual-to-biennial bifurcation of measles epidemics in E&W caused by the susceptible response to the late-1940s baby boom [18]. While our results (S1 Fig) suggest that the SFNN trained only on data prior to 1948 has limited medium- and long-term predictive accuracy for outbreak size, it largely captures the bifurcation of the seasonal pattern for shorter forecasting windows (k ≤ 4).
Neural network model (SFNN) can uncover mechanism of spatial hierarchical spread
Previous work has demonstrated the presence of gravity-like dynamics in measles outbreaks [16, 25, 29]. For instance, dynamics in small towns are shown to be driven by the mechanism of spatial hierarchical spread in which infections in large cities can serve as reservoirs for seeding infections in less populous regions [25]. To assess if the neural network is learning such a mechanism, we employ feature importance methods that estimate how predictions rely on information from certain features and groups of features. We specifically use SHAP values [30] to investigate the relative importance of a core city to the measles spread in locations with different population sizes (see Materials and methods for more details).
Our results (Fig 3) show that the (lagged) incidence in large cities is relatively more important for less populous cities/towns. This suggests that our neural network model is able to reveal the mechanism of spatial hierarchical spread in endemic measles spatio-temporal dynamics. This is notable both as an indication that our SFNN is able to employ spatial features in a complex manner that reveals mechanistic dynamics, without explicitly imposing spatial hierarchy in the model a priori, and as an example of a post-hoc XAI method that reaffirms a theorized dynamic in a disease system.
The SHAP value measures the relative importance of the incidence of a core city (e.g., London) for making incidence prediction among cities/towns with different population sizes, which can be heuristically treated as the relative importance to the local transmission of measles in a particular city/town. Core city incidence lag features are shown to be more important when predicting incidence for less populous cities/towns. Specifically, the mean relative absolute SHAP value for each of the core city incidence lag features has an inverse relationship with log population. Cities and towns are categorized (on the x-axis) into 10 groups according to the quantiles of their population sizes.
Latent susceptible dynamics reconstruction using TSIR improves inference and forecasts of the integrative PINN framework
Next we turn our attention to integrative approaches that combine mechanistic and machine learning models. We consider the general conceptual framework of Physics-Informed Neural Networks (PINN), a class of integrative neural networks that incorporate differential equations from physics [20]. PINNs regularize a neural network by including a loss term that penalizes the mismatch between the differential equations and the gradients of the network's outputs obtained during the fitting process (typically via automatic differentiation). They hold the promise of preserving the high predictive capabilities and expressiveness of neural networks while integrating scientific relationships directly into the model. Although PINNs are classically employed as surrogate models for computationally intensive differential equation solvers [20], they also enable parameter inference and allow the (physical) dynamics to partially drive predictions in an integrative fashion. These latter aims have been the primary impetus for existing efforts to extend PINNs to the spread of infectious disease [22] and also motivate our improvements to the PINN framework.
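The structure of such a loss can be illustrated with a small, framework-free sketch. Two substitutions are ours and are assumptions: derivatives of the network outputs are approximated with forward differences instead of automatic differentiation, and the SIR right-hand side is a basic form without births or seasonality. The data term is a mean absolute error, matching the MAE loss used for the PINN models in Materials and methods. The function names and the weight `lam` are hypothetical.

```python
def sir_rhs(s, i, beta, gamma, n):
    """Right-hand side of a basic SIR system without births (assumed form)."""
    infection = beta * s * i / n
    return -infection, infection - gamma * i


def pinn_style_loss(times, s_hat, i_hat, i_obs, beta, gamma, n, lam=1.0):
    """Data MAE plus `lam` times the mean absolute SIR residual on a time grid.

    s_hat, i_hat: network outputs at each time point; i_obs: observed incidence.
    Derivatives use forward differences; a real PINN would use autodiff.
    """
    m = len(times)
    if not (len(s_hat) == len(i_hat) == len(i_obs) == m) or m < 2:
        raise ValueError("need at least two aligned time points")
    data_loss = sum(abs(a - b) for a, b in zip(i_hat, i_obs)) / m

    residuals = []
    for j in range(m - 1):
        dt = times[j + 1] - times[j]
        ds_dt = (s_hat[j + 1] - s_hat[j]) / dt
        di_dt = (i_hat[j + 1] - i_hat[j]) / dt
        f_s, f_i = sir_rhs(s_hat[j], i_hat[j], beta, gamma, n)
        residuals.extend([abs(ds_dt - f_s), abs(di_dt - f_i)])
    physics_loss = sum(residuals) / len(residuals)
    return data_loss + lam * physics_loss
```

In an actual PINN, β, γ, and the network weights would be optimized jointly by gradient descent on a loss of this shape, so that the fitted parameters are those that make the network's trajectories most consistent with both the data and the equations.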
Here, we investigate the utility of a customized PINN model augmented by the reconstruction of the latent susceptible dynamics leveraging the TSIR model, referred to as TSIR-PINN (see Materials and methods for full model specification). We compare the added benefit of our approach to a naive PINN model without such augmentation of the latent dynamics (referred to as Naive-PINN). We apply both models to London measles incidence data and assess a two-year-ahead forecast window, demonstrating that the Naive-PINN fails to make accurate predictions and parameter inference, while our TSIR-PINN model utilizing TSIR-reconstructed susceptible dynamics is able to capture and predict the transmission dynamics reasonably accurately (Fig 4). In particular, the TSIR-PINN model estimates an R0 value (Fig 4) which is largely consistent with previous estimates [18]. The TSIR-PINN model also outperforms the Naive-PINN with respect to test-set Mean Absolute Error (MAE) and correlation (Table 1).
(A) TSIR-PINN test-set 52-step-ahead incidence predictions for London match true incidence more closely than those of the Naive-PINN. (B) PINN parameter values differ notably between the TSIR-PINN and Naive-PINN models over 2,500 epochs. The parameter v (black lines) corresponds to R0. Convergence is achieved rapidly when fitting the TSIR-PINN model, while convergence is less clear for the Naive-PINN model. More importantly, the TSIR-PINN model estimates an R0 of 26.8, which is broadly consistent with the literature, while the Naive-PINN estimates an R0 of 5.7.
TSIR-PINN outperforms Naive-PINN by both measures.
Our results suggest that including the TSIR-reconstructed (latent) susceptible dynamics (in our TSIR-PINN) can improve parameter inference while maintaining the predictive capabilities of a PINN modeling framework. These results provide important insights into more rigorously incorporating partially observed epidemic data into a PINN model, which may facilitate future developments and applications of PINN-based epidemic models.
Discussion
Measles is one of the most well-documented infectious disease systems and is known for its complex spatio-temporal dynamics. Spatiotemporal dynamics of measles infection, driven by the interplay between seasonal forcing and susceptible recruitment dynamics [17], range from simple limit cycles to chaos, with stochastic extinction dominating in small, highly vaccinated populations [18, 31, 32]. As such, measles serves as an excellent test bed for developing modeling techniques aimed at understanding similar nonlinear infectious disease systems.
Flexible machine learning approaches hold much potential for forecasting measles dynamics. Deep neural networks in particular are known to be highly flexible in incorporating various types of data structures and capturing highly nonlinear relationships, and they can efficiently handle large datasets and numerous features. However, interpretation and inference are often difficult due to high-dimensional model parameterizations and a lack of scientific knowledge integration. Two broad classes of methods are suitable for improving the mechanistic interpretability of machine learning models for infectious disease dynamics: post-hoc explainability (XAI) methods, which analyze model outputs after fitting to understand the underlying drivers of predictions, and direct integration of mechanistic models or other scientific priors into machine learning models. Here we detail one example of each of these classes and demonstrate their effectiveness in accurately characterizing measles spatio-temporal dynamics while preserving high forecasting performance.
Our high-dimensional feed-forward SFNN overall performs well for all forecasting windows and the majority of cities. More noteworthy is its ability to outperform the TSIR in the difficult forecasting scenarios of long forecast windows (ranging from six months to two years) and less populous towns with sparse, less regular outbreaks. Neural networks are known as a “black box” method, meaning that the way in which the model uses specific covariates to arrive at a forecast is not readily apparent from parameter inspection. This is the primary downside of employing such machine learning methods when compared to mechanistic and semi-mechanistic methods such as the TSIR, which provide ample opportunities for parameter inference and assessment in relation to scientific knowledge and hypotheses. To surmount this limitation, there is a collection of post-hoc methods that allow methodical interrogation of machine learning output.
Our application of one such method, the SHAP value XAI calculation [30], provides insights into how our SFNN predictions are driven by a combination of input variables that has a scientifically meaningful interpretation. Specifically, we show that our SFNN uncovers the mechanism by which outbreaks in large cities may influence measles transmission in smaller towns/cities. This is consistent with a previously theorized mechanism of hierarchical spread of infections from large cities to smaller towns [14].
While this post-hoc method is insightful and relatively straightforward to apply because it does not interfere with model training, we push the inferential capability of neural networks further with a fully integrative PINN-based model (TSIR-PINN) that incorporates reconstructed latent susceptible dynamics from the seminal semi-mechanistic TSIR model. Previous work combining neural networks with compartmental models often requires separate ad-hoc steps for model estimation or prediction [33, 34]. Here, by fusing mechanistic compartmental models with the neural network in the loss function used during model training, we are able to jointly regularize all parameters with respect to the SIR constraints while conducting parameter inference and maintaining forecast performance. We show that by including the reconstructed latent susceptible population in our TSIR-PINN, both forecasting performance and parameter estimation are improved when compared to the Naive-PINN model (which does not utilize the TSIR-augmented latent susceptible dynamics). While PINN-based models have previously been applied to infectious disease data, our work is a step forward in terms of more rigorous inference of, and integration with, latent aspects of the transmission dynamics, which is crucial for enabling long forecast windows and mechanistic interpretability. Our results provide key insights into incorporating partially observed epidemic data into a PINN-based modeling framework, which may facilitate future developments and applications of PINN-based epidemic models. There are several potential future directions we can explore.
The PINN-based formulation introduced here provides only point estimates of disease dynamic parameters, and one area ripe for further development is the incorporation of rigorous statistical uncertainty quantification into these methods, which might enable probabilistic statements about both model parameters and predictive output. There is also potential for this work to be extended to other disease systems, incorporating mechanisms and assessing hypotheses that are specific to these areas of study. We demonstrate in this work that accurately augmenting the latent aspects of the underlying transmission dynamics enables the PINN to perform well in both prediction and estimation. The augmentation scheme employed here, the TSIR model, has been successfully applied to other disease systems, including HFMD [35], COVID-19 [36], and RSV [37], which would allow straightforward application of our TSIR-PINN method. It is worth noting that other latent dynamic augmentation schemes, such as methods incorporating Approximate Bayesian Computation [38], could be used to build models similar to TSIR-PINN. Furthermore, we have only applied the TSIR-PINN model to London measles incidence data. London measles incidence is arguably the “gold-standard” dataset for understanding and assessing measles dynamics due to its strong mechanistic signal. Our work follows many previous studies that also focus exclusively or primarily on London when studying measles incidence data and dynamics [8, 9, 11, 13, 15, 18]. Thus London is a natural test bed for comparing the performance of TSIR-PINN relative to the Naive-PINN, showing the added value of integrating a mechanistic model (TSIR) within the PINN architecture. That said, there is potential to extend the TSIR-PINN model with gravity model mechanisms that incorporate distance and population size into the unsupervised portion of the loss, which would allow us to include additional cities (or all cities, as with the SFNN).
Also, we have focused on comparing the relative performance of the TSIR-PINN to the Naive-PINN, and thus there remains room to further explore model formulations that may result in even higher prediction accuracy, such as shared embeddings or explicitly spatial or temporal architectures such as convolutional or long short-term memory neural networks (CNNs and LSTMs, respectively) [39, 40]. Finally, there is potential for applying machine learning methodology that, instead of imposing compartmental model structures a priori and inferring parameter values, focuses on hypothesis generation and model structure discovery in the context of infectious disease [41–44]. This could automate some of the fine-tuning of model structure required for these highly bespoke models and aid modelers at earlier stages of their research.
In summary, our results show that appropriately designed neural network-based models can outperform traditional mechanistic models in forecasting, while simultaneously providing mechanistic interpretability. Our work also offers valuable insights into more effectively integrating machine learning models with mechanistic models to enhance public health responses to measles and similar infectious disease systems.
Materials and methods
Study data
We train and assess our models on biweekly measles incidence counts across 1,452 cities/towns in England & Wales during the pre-vaccination period from 1944 to 1965 (Fig 5). Separate models are fitted for different k-step ahead forecasts, ranging from 1 to 52 biweekly time steps ahead.
(A) Cities/towns are colored by log measles incidence on the first biweek of 1961. The England and Wales map is made with Natural Earth vector map data. (B) The seasonal measles trend is apparent across the four most populous cities in England and Wales from 1944 to 1965.
Feed-forward spatial feature neural network model (SFNN)
For each k-step-ahead forecast we fit a separate feed-forward spatial feature neural network (SFNN) with 1–3 hidden linear layers of dimension 240–1201, linear input/output layers, and ReLU [45] activation functions (Fig 1), where the number of hidden layers and the hidden-layer dimension differ by forecasting window (S2 Table).
We include a range of features: birth counts, population size, lagged incidence counts, and lagged incidence counts and distances for the seven cities with populations above the critical community size of 300,000, which have previously been identified as “core cities” that drive epidemics in connected cities/towns [28]. We also incorporate spatial features, including the lagged incidence counts of the ten nearest cities and their distances. This potentially enables the neural network to learn spatial dynamics that the TSIR model does not capture. Birth and population features are taken from the nearest time step less than or equal to t − k that shares the same biweek of the year. Lagged features range from t − k to t − Tlag, where t is the target time step and Tlag ranges from 26 to 130, depending on the forecasting window (S2 Table). Neural networks are fitted in PyTorch [46] using the Adam [47] optimizer with a Mean Squared Error (MSE) loss, and are trained on incidence data from 1949 to 1961, with incidence data from 1961 to 1965 held out for testing. Hyperparameters (including Tlag, number of hidden layers, hidden-layer dimension, and Adam optimizer weight decay) are chosen by grid search with the Ray Tune [48] Python library. This procedure selects the combination of hyperparameters with the minimum test MSE after 10 epochs among all combinations of Tlag ∈ {26, 52, 78, 104, 130}, hidden dimension ∈ {240, 721, 1201}, number of hidden layers ∈ {1, 2, 3}, and weight decay ∈ {0.0001, …, 0.1}, for each forecasting window k (S2 Table).
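To make the feature layout concrete, the sketch below assembles one input vector from lagged incidence in the target city, in the core cities (population above the 300,000 CCS), and in the nearest cities, with distances appended. All names are hypothetical. Distances are Euclidean on planar coordinates (the text does not state the metric), birth and population features are omitted for brevity, and we read "t − k to t − Tlag" as an inclusive window, which requires Tlag ≥ k.

```python
import math

CCS = 300_000  # critical community size used to define core cities


def lag_window(series, t, k, t_lag):
    """Values from time t-k back to t-t_lag (inclusive), most recent first."""
    if t_lag < k:
        raise ValueError("t_lag must be at least k")
    start, stop = t - k, t - t_lag
    if stop < 0:
        raise ValueError("not enough history for this lag window")
    return [series[s] for s in range(start, stop - 1, -1)]


def build_features(target, t, k, t_lag, incidence, coords, populations, n_near=10):
    """Assemble a flat feature vector for one (city, target time) pair.

    incidence: city -> list of biweekly counts; coords: city -> (x, y);
    populations: city -> population size.
    """
    def dist(a, b):
        return math.dist(coords[a], coords[b])  # assumption: planar distance

    # Core cities, most populous first (the target is not excluded here).
    core = sorted((c for c, p in populations.items() if p > CCS),
                  key=lambda c: -populations[c])
    others = [c for c in incidence if c != target]
    nearest = sorted(others, key=lambda c: dist(target, c))[:n_near]

    features = lag_window(incidence[target], t, k, t_lag)
    for group in (core, nearest):
        for c in group:
            features.extend(lag_window(incidence[c], t, k, t_lag))
            features.append(dist(target, c))
    return features
```

For example, with three hypothetical cities, one of which exceeds the CCS, `build_features("A", 10, 2, 4, ..., n_near=1)` returns the target's three lags, then the core city's lags and distance, then the nearest city's lags and distance.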
Time-series SIR (TSIR) model
We compare the neural networks to the TSIR (time-series susceptible-infected-recovered) model, a popular semi-mechanistic technique that approximates the continuous-time SIR model and has been shown to accurately capture the dynamics of measles outbreaks in major cities [13]. TSIR provides a computationally inexpensive and highly tractable alternative to the classic SIR compartmental model, and is described by the following equations:
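$$I_{t+1} = \beta_{t+1} S_t I_t^{\alpha} \qquad (1)$$
$$S_{t+1} = B_{t+1} + S_t - I_{t+1} \qquad (2)$$
where $B_t$ is the number of births and $S_t$ is reconstructed as $S_t = \bar{S} + Z_t$ at each time step, with $\bar{S}$ the average number of susceptible individuals in the population. $Z_t$ is estimated from Eq 2 by regressing the cumulative births against the cumulative incidence as follows,
$$\sum_{i=1}^{t} B_i = -Z_0 + \rho \sum_{i=1}^{t} C_i + Z_t \qquad (3)$$
where $C_t$ is the reported incidence and $\rho$ is the inverse reporting rate, so that $I_t = \rho C_t$. The transmission parameters are then estimated from the log-linearized Eq 1, $\log I_{t+1} = \log \beta_{t+1} + \log(\bar{S} + Z_t) + \alpha \log I_t$.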
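The susceptible reconstruction step can be sketched in a few lines. This is a minimal illustration, not the tsiR implementation: the function name is hypothetical, the regression is ordinary least squares with an intercept (we believe tsiR also offers smoothed regression options, which may differ from this), and `reported_cases` stands for the observed biweekly incidence. The fitted slope plays the role of the inverse reporting rate, and the residuals estimate the deviations of susceptibles from their mean up to an additive constant.

```python
from itertools import accumulate


def reconstruct_susceptibles(births, reported_cases):
    """Regress cumulative births on cumulative reported cases.

    Returns (rho, residuals): rho is the fitted slope (an inverse reporting
    rate) and residuals are the estimated susceptible deviations, whose
    constant offset is absorbed by the intercept.
    """
    y = list(accumulate(births))          # cumulative births
    x = list(accumulate(reported_cases))  # cumulative reported cases
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two aligned time steps")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("cumulative cases have no variation")
    rho = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - rho * mx
    residuals = [yi - (intercept + rho * xi) for xi, yi in zip(x, y)]
    return rho, residuals
```

Because an intercept is fitted, the residuals sum to zero, which is consistent with treating them as deviations around the mean susceptible level.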
For each k-step-ahead forecast, target time step t, and city, a separate TSIR model is fit on time steps t − 130 to t − k. One-step-ahead forecasts are then made recursively with Eqs 1 and 2 until the forecast for time t is reached. We employ the tsiR R package for TSIR model fitting, and refer the reader to the package documentation for details not specified here [26].
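The recursive forecasting step can be illustrated with a deterministic sketch that iterates Eqs 1 and 2. The function and argument names are hypothetical. We assume 26 seasonal transmission rates indexed by biweek of the year, a fixed α, and known future births, and we omit the noise and reporting terms that a full tsiR fit handles. The non-negativity clamp on susceptibles is our own safeguard, not part of the TSIR specification.

```python
def tsir_forecast(s_t, i_t, births, beta, alpha, start_biweek):
    """Iterate Eqs 1 and 2 forward one biweek at a time (deterministic).

    s_t, i_t: susceptibles and incidence at the last fitted time step.
    births: births for each forecast step (B_{t+1}, ..., B_{t+k}).
    beta: seasonal transmission rates indexed by biweek of the year.
    start_biweek: biweek-of-year index of the last fitted step.
    Returns the k forecast incidences I_{t+1}, ..., I_{t+k}.
    """
    forecasts = []
    s, i = s_t, i_t
    for step, b in enumerate(births, start=1):
        season = (start_biweek + step) % len(beta)
        i = beta[season] * s * i ** alpha  # Eq 1
        # Eq 2, clamped at zero (assumption) so the power in Eq 1 stays real.
        s = max(b + s - i, 0.0)
        forecasts.append(i)
    return forecasts
```

A k-step-ahead forecast is then the last element of the returned list, which is how the recursive scheme described above reaches time t.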
Neural network interpretability methods (SHAP)
We use the SHAP (SHapley Additive exPlanations) method [30] to assess neural network feature importance, relying specifically on sampling-based approximation methods [49, 50] from the Captum [51] Python library. SHAP values are estimated by randomly permuting (input) feature groups, calculating the change in model output as each group is added in the permuted order, and averaging these changes across the sampled permutations. Features are grouped by lag type; that is, incidence lags are grouped together, high-population-city incidence lags are grouped together, and so on.
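The permutation-sampling idea behind these SHAP estimates can be sketched without any library. Captum implements it in `captum.attr.ShapleyValueSampling`, where a `feature_mask` assigns inputs to groups; the stdlib version below is only an illustration under assumed choices (a fixed baseline input, a given number of sampled permutations, and a hypothetical `groups` mapping from group name to input indices).

```python
import random


def shapley_sampling(f, x, baseline, groups, n_samples=200, seed=0):
    """Approximate group Shapley values of f at x by sampling permutations.

    f: callable on a list of floats; groups: dict name -> list of input indices.
    """
    rng = random.Random(seed)
    phi = {g: 0.0 for g in groups}
    names = list(groups)
    for _ in range(n_samples):
        rng.shuffle(names)            # one random ordering of the feature groups
        current = list(baseline)
        prev = f(current)
        for g in names:
            for i in groups[g]:       # switch this group from baseline to actual values
                current[i] = x[i]
            out = f(current)
            phi[g] += out - prev      # marginal contribution of group g
            prev = out
    return {g: total / n_samples for g, total in phi.items()}
```

For a linear model with a zero baseline every permutation gives the same marginal contributions, so the estimate equals the exact group contribution; for a neural network the estimate converges as `n_samples` grows.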
In our analysis, we first estimate the normalized absolute SHAP value of each feature group for each observation. Then, for each city or town, we average the normalized absolute SHAP values of a given feature group across all observations of that city or town. This average measures the relative importance of the feature group for the predictions made for that city or town.
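A minimal sketch of the per-city aggregation, assuming "normalized" means dividing each absolute group SHAP value by the sum of absolute values over all groups for the same observation (the normalization is not stated explicitly above); the function name is hypothetical.

```python
def group_importance(shap_rows):
    """Average normalized |SHAP| per feature group for one city or town.

    shap_rows: list of dicts (group name -> SHAP value), one per observation.
    """
    totals = {}
    for row in shap_rows:
        denom = sum(abs(v) for v in row.values())
        for g, v in row.items():
            share = abs(v) / denom if denom > 0 else 0.0  # normalized |SHAP|
            totals[g] = totals.get(g, 0.0) + share
    return {g: t / len(shap_rows) for g, t in totals.items()}
```

Under this normalization the group importances for a city sum to one, which makes them comparable across cities with very different incidence levels.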
Physics-Informed-Neural-Network model
The neural network architecture for the PINN models differs from the previously described neural network because the compartmental S-I-R equations and parameters are incorporated into the model’s loss function. We start with a feed-forward neural network with a linear input layer, 2 hidden layers of dimension 128, and a two-dimensional output layer for the TSIR-reconstructed susceptibles (STSIR) and the observed incidence (I). GELU [52] activation functions are used on the hidden layers and a softplus activation function on the output layer. Features include time, lagged incidence counts, and lagged TSIR-reconstructed susceptibles, with the time feature transformed by Gaussian random Fourier feature mappings [53]. The neural networks are again fit in PyTorch using the Adam optimizer with the same train/test split as before, although here we employ a Mean Absolute Error (MAE) loss composed of the following components:
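The Gaussian random Fourier feature mapping of the time input [53] can be sketched as follows. Frequencies are drawn once from N(0, σ²) and then held fixed; the number of frequencies, the scale σ, and the seed are placeholders rather than the values used in the paper, and in practice time would typically be rescaled before mapping.

```python
import math
import random


def make_fourier_mapping(n_frequencies=16, sigma=1.0, seed=0):
    """Return gamma(t) = [cos(2*pi*b*t)..., sin(2*pi*b*t)...] with b ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    freqs = [rng.gauss(0.0, sigma) for _ in range(n_frequencies)]  # fixed after sampling

    def gamma(t):
        angles = [2.0 * math.pi * b * t for b in freqs]
        return [math.cos(a) for a in angles] + [math.sin(a) for a in angles]

    return gamma
```

A larger σ injects higher frequencies, which lets the network fit sharper temporal variation at the risk of overfitting.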
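$$ \mathrm{MAE} = \lambda_{FF}\,\mathrm{MAE}_{FF} + \lambda_{PINN}\,\mathrm{MAE}_{PINN} \tag{4} $$
where λFF and λPINN are tunable hyperparameters, and
$$ \mathrm{MAE}_{FF} = \frac{1}{T}\sum_{t} \left( \left|S^{TSIR}_t - \hat{S}_t\right| + \left|I_t - \hat{I}_t\right| \right) \tag{5} $$
$$ \mathrm{MAE}_{PINN} = \frac{1}{T}\sum_{t} \left( \left|\frac{d\hat{S}_t}{dt} - f^{S}_t\right| + \left|\frac{d\hat{I}_t}{dt} - f^{I}_t\right| \right) \tag{6} $$
where $\hat{S}_t$ and $\hat{I}_t$ are the FF predictions at time t and T is the number of time steps. Here $f^{S}_t$ and $f^{I}_t$ are the output of the following compartmental SIR equations at time t:
$$ f^{S}_t = B_t - \beta_t \frac{\hat{S}_t \hat{I}_t}{N_t} \tag{7} $$
$$ f^{I}_t = \beta_t \frac{\hat{S}_t \hat{I}_t}{N_t} - \gamma \hat{I}_t \tag{8} $$
where Bt is the number of births at time t, βt is a seasonal transmission rate at time t, γ is the recovery rate, Nt is the population at time t, and $d\hat{S}_t/dt$ and $d\hat{I}_t/dt$ are the approximations of the relevant gradients, which are calculated at each epoch using autograd in PyTorch [46].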
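A minimal sketch of how the loss components could be computed, assuming MAE_FF averages |S^TSIR − Ŝ| + |I − Î| over time steps, MAE_PINN averages the absolute differences between the network's time derivatives and the SIR right-hand sides B_t − β_t Ŝ_t Î_t / N_t and β_t Ŝ_t Î_t / N_t − γ Î_t, and the total is λFF·MAE_FF + λPINN·MAE_PINN. The derivatives are approximated here by forward differences as a stand-in for PyTorch autograd, and all names are hypothetical.

```python
def pinn_loss(s_tsir, i_obs, s_hat, i_hat, beta, births, population,
              gamma=1.0, lam_ff=1.0, lam_pinn=1.0, dt=1.0):
    """Return (total, mae_ff, mae_pinn) for one series of FF predictions.

    All sequence arguments are equal-length lists indexed by time step.
    """
    n = len(i_obs)
    if n < 2:
        raise ValueError("need at least two time steps")
    # Data-fit component (cf. Eq 5): TSIR susceptibles and observed incidence.
    mae_ff = sum(abs(s_tsir[t] - s_hat[t]) + abs(i_obs[t] - i_hat[t])
                 for t in range(n)) / n
    # Physics component (cf. Eqs 6-8): network gradients vs. SIR right-hand sides.
    residual = 0.0
    for t in range(n - 1):
        ds_dt = (s_hat[t + 1] - s_hat[t]) / dt   # forward difference, not autograd
        di_dt = (i_hat[t + 1] - i_hat[t]) / dt
        infection = beta[t] * s_hat[t] * i_hat[t] / population[t]
        f_s = births[t] - infection
        f_i = infection - gamma * i_hat[t]
        residual += abs(ds_dt - f_s) + abs(di_dt - f_i)
    mae_pinn = residual / (n - 1)
    return lam_ff * mae_ff + lam_pinn * mae_pinn, mae_ff, mae_pinn
```

If the predicted trajectories exactly follow the discrete SIR update and match the targets, both components vanish; in the PINN, gradients of this loss flow both into the network weights and into the parameters of βt.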
We parameterize βt as follows:
$$ \beta_t = \nu\left(1 + \alpha_1 \sin\frac{2\pi t}{26} + \alpha_2 \cos\frac{2\pi t}{26}\right) \tag{9} $$
where Eq 9 implies a seasonal transmission rate with three free parameters: ν, α1, and α2. ν is the baseline transmission rate, while α1 and α2 are seasonal parameters controlling sinusoidal annual fluctuations (26 biweeks per year).
We assume γ = 1 because the measles recovery period is approximately equal to the biweekly time scale of the data [54]; thus the parameters of βt are the sole learnable parameters in the MAE_PINN component of the loss. By matching the network's gradient approximations to the outputs of the compartmental SIR equations (Eqs 7 and 8), we impose an unsupervised soft constraint that encourages the neural network to adhere to the compartmental dynamics, and vice versa.
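For reference, a seasonal transmission rate with a baseline ν and two sinusoidal parameters α1 and α2 could be coded as below. The functional form is an assumption, β_t = ν(1 + α1 sin(2πt/26) + α2 cos(2πt/26)) with t in biweeks so that the period is one year; in the PINN, ν, α1, and α2 would be learnable tensors rather than plain floats.

```python
import math


def seasonal_beta(t, nu, alpha1, alpha2, period=26):
    """Assumed form: nu * (1 + alpha1*sin(2*pi*t/period) + alpha2*cos(2*pi*t/period))."""
    phase = 2.0 * math.pi * t / period
    return nu * (1.0 + alpha1 * math.sin(phase) + alpha2 * math.cos(phase))
```

Under this form the sinusoidal terms average to zero over a full year, so ν is the annual mean transmission rate and α1, α2 set the amplitude and timing of the seasonal peak.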
One explanation for why incorporating the TSIR-reconstructed susceptibles into the PINN improves prediction and estimation is as follows. The predicted incidence, It, determines part of the loss, and incidence is structurally related to the susceptible dynamics of measles (see Eqs 1 and 2). Incorporating the TSIR-reconstructed susceptibles therefore imposes more reasonable constraints on the predicted incidence, resulting in more stable fitting and improved predictions. To assess this impact, we also fit versions of the above models with naively constrained latent susceptibles, in which every STSIR component is replaced with SNaive, whose values are fit as unconstrained parameters. To aid model stability, we fit the TSIR-PINN and Naive-PINN models 100 times each and take the final predictions as the mean of the predictions over all model runs.
Supporting information
S1 Fig. London SFNN bifurcation assessment.
Our SFNN model, trained on the limited data available prior to 1948, predicts the change in seasonality (i.e., the annual-to-biennial bifurcation in the late 1940s) in London for forecasting windows of 1–4 steps ahead. Owing to the lack of training data in this case, however, the SFNN does not capture the overall magnitude of incidence well.
https://doi.org/10.1371/journal.pcbi.1012616.s001
(TIFF)
S1 Table. Forecasting year train/test cutoff sensitivity analysis.
The average within-city standardized test-set RMSE for TSIR and SFNN for all combinations of k forecasting windows ∈ {1, 4, 12, 20, 34, 52} and train/test cutoff years ∈ {1960, 1961, 1962} demonstrates that the improvement of the SFNN model over the TSIR is stable across train/test cutoff points.
https://doi.org/10.1371/journal.pcbi.1012616.s002
(PDF)
S2 Table. Optimal SFNN hyperparameters.
Tuned hyperparameter values, as determined by Ray Tune grid search, indicate the optimal number of incidence feature time lags, hidden dimension, and weight decay value, for each forecasting window.
https://doi.org/10.1371/journal.pcbi.1012616.s003
(PDF)
References
- 1. Rustam F, Reshi AA, Mehmood A, Ullah S, On BW, Aslam W, et al. COVID-19 Future Forecasting Using Supervised Machine Learning Models. IEEE Access. 2020;8:101489–101499.
- 2. Du H, Dong E, Badr H, Petrone M, Grubaugh N, Gardner L. Incorporating variant frequencies data into short-term forecasting for COVID-19 cases and deaths in the USA: a deep learning approach. EBioMedicine. 2023;89:104482. pmid:36821889
- 3. Rodriguez A, Tabassum A, Cui J, Xie J, Ho J, Agarwal P, et al. DeepCOVID: An Operational Deep Learning-driven Framework for Explainable Real-time COVID-19 Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence. 2021;35:15393–15400.
- 4. Temenos A, Tzortzis IN, Kaselimi M, Rallis I, Doulamis A, Doulamis N. Novel Insights in Spatial Epidemiology Utilizing Explainable AI (XAI) and Remote Sensing. Remote Sensing. 2022;14(13).
- 5. Arik S, Li CL, Yoon J, Sinha R, Epshteyn A, Le L, et al. Interpretable Sequence Learning for Covid-19 Forecasting. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in Neural Information Processing Systems. vol. 33. Curran Associates, Inc.; 2020. p. 18807–18818.
- 6. Rodríguez A, Cui J, Ramakrishnan N, Adhikari B, Prakash BA. EINNs: epidemiologically-informed neural networks. In: Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence. AAAI’23/IAAI’23/EAAI’23. AAAI Press; 2023.
- 7. Nguyen DQ, Vo NQ, Nguyen TT, et al. BeCaked: An Explainable Artificial Intelligence Model for COVID-19 Forecasting. Sci Rep. 2022;12:7969. pmid:35562369
- 8. Brownlee J. An investigation into the periodicity of measles epidemics in London from 1703 to the present day by the method of the periodogram. Phil Trans R Soc Lond B. 1918;208:225–250.
- 9. Becker A, Zhou S, Wesolowski A, Grenfell B. Coexisting attractors in the context of cross-scale population dynamics: Measles in London as a case study. Proceedings Biological sciences. 2020;287:20191510. pmid:32315586
- 10. Finkenstädt B, Bjornstad O, Grenfell B. A stochastic model for extinction and recurrence of epidemics: Estimation and inference for measles outbreaks. Biostatistics (Oxford, England). 2003;3:493–510.
- 11. Bartlett MS. Measles Periodicity and Community Size. Journal of the Royal Statistical Society Series A (General). 1957;120(1):48–70.
- 12. Bolker BM, Grenfell BT. Impact of vaccination on the spatial correlation and persistence of measles dynamics. Proceedings of the National Academy of Sciences. 1996;93(22):12648–12653. pmid:8901637
- 13. Finkenstadt BF, Grenfell BT. Time Series Modelling of Childhood Diseases: A Dynamical Systems Approach. Journal of the Royal Statistical Society Series C (Applied Statistics). 2000;49(2):187–205.
- 14. Xia Y, Bjørnstad O, Grenfell B. Measles Metapopulation Dynamics: A Gravity Model for Epidemiological Coupling and Dynamics. The American Naturalist. 2004;164(2):267–281. pmid:15278849
- 15. Lau MSY, Becker A, Madden W, Waller LA, Metcalf CJE, Grenfell BT. Comparing and linking machine learning and semi-mechanistic models for the predictability of endemic measles dynamics. PLOS Computational Biology. 2022;18(9):1–14. pmid:36074763
- 16. Grenfell B, Bjørnstad O, Kappey J. Travelling waves and spatial hierarchies in measles epidemics. Nature. 2001;414:716–723. pmid:11742391
- 17. Grenfell BT, Bjørnstad ON, Finkenstädt BF. Dynamics of Measles Epidemics: Scaling Noise, Determinism, and Predictability with the TSIR Model. Ecological Monographs. 2002;72(2):185–202.
- 18. Becker AD, Wesolowski A, Bjørnstad ON, Grenfell BT. Long-term dynamics of measles in London: Titrating the impact of wars, the 1918 pandemic, and vaccination. PLOS Computational Biology. 2019;15(9):1–14. pmid:31513578
- 19. Endo A, van Leeuwen E, Baguelin M. Introduction to particle Markov-chain Monte Carlo for disease dynamics modellers. Epidemics. 2019;29:100363. pmid:31587877
- 20. Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics. 2019;378:686–707.
- 21. Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L. Physics-informed machine learning. Nature Reviews Physics. 2021;3(6):422–440.
- 22. Shaier S, Raissi M, Seshaiyer P. Data-driven approaches for predicting spread of infectious diseases through DINNs: Disease Informed Neural Networks; 2022.
- 23. Berkhahn S, Ehrhardt M. A physics-informed neural network to model COVID-19 infection and hospitalization scenarios. Adv Contin Discret Model. 2022;2022(1):61. pmid:36320680
- 24. Ning X, Guan J, Li X, Wei Y, Chen F. Physics-Informed Neural Networks Integrating Compartmental Model for Analyzing COVID-19 Transmission Dynamics. Viruses. 2023;15(8):1749. pmid:37632091
- 25. Lau M, Becker A, Korevaar H, Caudron Q, Shaw D, Metcalf CJ, et al. A competing-risks model explains hierarchical spatial coupling of measles epidemics en route to national elimination. Nature Ecology & Evolution. 2020;4:1–6. pmid:32341514
- 26. Becker A, Grenfell B. tsiR: An R package for time-series Susceptible-Infected-Recovered models of epidemics. PLoS One. 2017;12(9):e0185528. pmid:28957408
- 27. Jandarov R, Haran M, Bjørnstad O, Grenfell B. Emulating a Gravity Model to Infer the Spatiotemporal Dynamics of an Infectious Disease. Journal of the Royal Statistical Society Series C: Applied Statistics. 2013;63(3):423–444.
- 28. Keeling MJ, Grenfell BT. Disease Extinction and Community Size: Modeling the Persistence of Measles. Science. 1997;275(5296):65–67. pmid:8974392
- 29. Bharti N, Xia Y, Bjornstad ON, Grenfell BT. Measles on the Edge: Coastal Heterogeneities and Infection Dynamics. PLOS ONE. 2008;3(4):1–7. pmid:18398467
- 30. Lundberg SM, Lee SI. A Unified Approach to Interpreting Model Predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. NIPS’17. Red Hook, NY, USA: Curran Associates Inc.; 2017. p. 4768–4777.
- 31. Ferrari MJ, Grais RF, Bharti N, Conlan AJK, Bjørnstad ON, Wolfson LJ, et al. The dynamics of measles in sub-Saharan Africa. Nature. 2008;451(7179):679–684. pmid:18256664
- 32. Dalziel BD, Bjørnstad ON, van Panhuis WG, Burke DS, Metcalf CJE, Grenfell BT. Persistent Chaos of Measles Epidemics in the Prevaccination United States Caused by a Small Change in Seasonal Transmission Patterns. PLOS Computational Biology. 2016;12(2):1–12. pmid:26845437
- 33. Bousquet A, Conrad WH, Sadat SO, et al. Deep learning forecasting using time-varying parameters of the SIRD model for Covid-19. Scientific Reports. 2022;12:3030. pmid:35194090
- 34. Nadler P, Arcucci R, Guo Y. A Neural SIR Model for Global Forecasting. In: Alsentzer E, McDermott MBA, Falck F, Sarkar SK, Roy S, Hyland SL, editors. Proceedings of the Machine Learning for Health NeurIPS Workshop. vol. 136 of Proceedings of Machine Learning Research. PMLR; 2020. p. 254–266. Available from: https://proceedings.mlr.press/v136/nadler20a.html.
- 35. Takahashi S, Liao Q, Van Boeckel TP, Xing W, Sun J, Hsiao VY, et al. Hand, Foot, and Mouth Disease in China: Modeling Epidemic Dynamics of Enterovirus Serotypes and Implications for Vaccination. PLoS Medicine. 2016;13(2):e1001958. pmid:26882540
- 36. Baker RE, Park SW, Yang W, Vecchi GA, Metcalf CJE, Grenfell BT. The impact of COVID-19 nonpharmaceutical interventions on the future dynamics of endemic infections. Proceedings of the National Academy of Sciences. 2020;117(48):30547–30553. pmid:33168723
- 37. Wambua J, Munywoki PK, Coletti P, Nyawanda BO, Murunga N, Nokes DJ, et al. Drivers of respiratory syncytial virus seasonal epidemics in children under 5 years in Kilifi, coastal Kenya. PLOS ONE. 2022;17(11):1–13. pmid:36441757
- 38. Minter A, Retkute R. Approximate Bayesian Computation for infectious disease modelling. Epidemics. 2019;29:100368. pmid:31563466
- 39. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444. pmid:26017442
- 40. Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation. 1997;9(8):1735–1780. pmid:9377276
- 41. Hao Z, Liu S, Zhang Y, Ying C, Feng Y, Su H, et al. Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications; 2023.
- 42. Brunton SL, Proctor JL, Kutz JN. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences. 2016;113(15):3932–3937. pmid:27035946
- 43. Schmidt M, Lipson H. Distilling Free-Form Natural Laws from Experimental Data. Science. 2009;324(5923):81–85. pmid:19342586
- 44. Chen Z, Liu Y, Sun H. Physics-informed learning of governing equations from scarce data. Nature Communications. 2021;12:6136. pmid:34675223
- 45. Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. In: Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10. Madison, WI, USA: Omnipress; 2010. p. 807–814.
- 46. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In: Advances in Neural Information Processing Systems. vol. 32. Red Hook, NY, USA: Curran Associates Inc.; 2019.
- 47. Kingma D, Ba J. Adam: A Method for Stochastic Optimization. International Conference on Learning Representations. 2014.
- 48. Liaw R, Liang E, Nishihara R, Moritz P, Gonzalez JE, Stoica I. Tune: A Research Platform for Distributed Model Selection and Training. arXiv preprint arXiv:1807.05118. 2018.
- 49. Štrumbelj E, Kononenko I. An Efficient Explanation of Individual Classifications using Game Theory. J Mach Learn Res. 2010;11:1–18.
- 50. Castro J, Gómez D, Tejada J. Polynomial calculation of the Shapley value based on sampling. Computers & Operations Research. 2009;36(5):1726–1730.
- 51. Kokhlikyan N, Miglani V, Martin M, Wang E, Alsallakh B, Reynolds J, et al. Captum: A unified and generic model interpretability library for PyTorch; 2020.
- 52. Hendrycks D, Gimpel K. Gaussian Error Linear Units (GELUs); 2023.
- 53. Tancik M, Srinivasan P, Mildenhall B, Fridovich-Keil S, Raghavan N, Singhal U, et al. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. In: Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, editors. Advances in Neural Information Processing Systems. vol. 33. Curran Associates, Inc.; 2020. p. 7537–7547.
- 54. Black FL. Measles. In: Evans AS, editor. Viral Infections of Humans: Epidemiology and Control. New York: Plenum; 1984. p. 397–418.