Fig 1.

Data example for the three test seasons under consideration (2016/2017, 2017/2018, 2018/2019) for all 10 Health and Human Services (HHS) regions and the national level.

At any given epiweek, the national wILI (black) is a weighted sum of the regional wILI values, where the weights correspond to the population sizes of the regions. wILI is highly seasonal and varies substantially by region. Population sizes (in millions) are given next to each region in the legend.
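The weighted-sum relationship described above can be sketched as follows; the regional wILI values and population sizes here are illustrative placeholders, not the figure's actual data.

```python
# Sketch of the national wILI as a population-weighted sum of regional wILI.
# All regional values and population sizes below are illustrative, not the
# figure's data.

regional_wili = [1.8, 2.4, 3.1, 2.0, 2.7, 3.5, 2.2, 1.9, 2.6, 3.0]  # HHS1-10
populations = [14.8, 34.0, 31.6, 65.1, 52.9, 41.1, 14.5, 12.2, 53.2, 14.7]

weights = [p / sum(populations) for p in populations]
national_wili = sum(w * x for w, x in zip(weights, regional_wili))

# The national value is a convex combination, so it must lie between the
# smallest and largest regional values.
assert min(regional_wili) <= national_wili <= max(regional_wili)
print(round(national_wili, 3))
```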


Fig 2.

Mock example of independent forecasts (red) and projected forecasts (blue) for three regions: National, HHS1, and HHS2.

The red and blue points each represent a triple of ILI forecast values, one value per region. The independent forecasts are projected onto the space satisfying the coherence constraint, namely that the weighted combination of region-level forecasts equals the national-level forecast; the blue plane represents the set of points satisfying this constraint. Different projection matrices map the red point to different locations on the blue plane.
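One way to realize such a projection is the orthogonal (OLS) case. A minimal sketch, assuming hypothetical weights and forecast values for HHS1, HHS2, and National:

```python
import numpy as np

# Project an incoherent forecast triple (HHS1, HHS2, National) onto the
# coherence plane w1*x1 + w2*x2 = x_nat via the orthogonal (OLS) projector.
# The weights and forecast values are illustrative.

w = np.array([0.4, 0.6])          # hypothetical population weights (sum to 1)
a = np.array([w[0], w[1], -1.0])  # coherence constraint: a @ x == 0

x = np.array([2.0, 3.0, 2.2])     # incoherent: 0.4*2.0 + 0.6*3.0 = 2.6 != 2.2

P = np.eye(3) - np.outer(a, a) / (a @ a)  # orthogonal projector onto the plane
x_proj = P @ x

assert abs(a @ x_proj) < 1e-9     # projected point satisfies the constraint
```

A weighted variant (WOLS) would minimize a weighted distance instead of the Euclidean one, landing at a different point on the same plane; that choice is what distinguishes the methods compared here.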


Fig 3.

Graphical example, for a two-region case, of how mean squared error (MSE) can decrease while forecast skill gets worse.

A: Purple histograms represent the 10,000 realizations of the original (incoherent) forecasts, while green histograms are the corresponding projected (coherent) forecasts. The purple and green points illustrate a particular example of the projection matrix forecasting process. The solid vertical lines denote the true value for each region. B, top panel: distribution of the MSE of the original forecasts minus the corresponding MSE of the projected forecasts; the MSE of the original forecasts is greater than that of the projected forecasts for all realizations. B, bottom panel: single-bin skill score of the original forecasts minus that of the projected forecasts; the skill of the incoherent forecasts is greater than or equal to that of the coherent forecasts for all iterations, with an average improvement greater than 0. This shows that the MSE of the coherent forecasts has decreased (since the original-minus-projected difference is positive) and that the forecast skill has also decreased (since the difference is again positive). Since a decrease in MSE is an improvement while a decrease in forecast skill is not, coherence can have opposite effects on the two scores.
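A toy numeric version of this phenomenon, not the figure's simulation: shrinking samples toward their mean stands in for the projection step, and the truth, samples, and bins are made up.

```python
# Toy illustration that MSE and single-bin skill can move in opposite
# directions. Shrinkage toward the sample mean stands in for projection;
# truth, samples, and bin edges are all illustrative.

truth = 2.0                              # truth lies in the bin [2.0, 2.5)
original = [2.1, 3.9]                    # two forecast samples
mean = sum(original) / len(original)     # 3.0
projected = [mean + 0.5 * (s - mean) for s in original]   # [2.55, 3.45]

def mse(samples):
    return sum((s - truth) ** 2 for s in samples) / len(samples)

def single_bin_skill(samples, lo=2.0, hi=2.5):
    # fraction of samples falling in the bin that contains the truth
    return sum(lo <= s < hi for s in samples) / len(samples)

assert mse(projected) < mse(original)                            # MSE improves
assert single_bin_skill(projected) < single_bin_skill(original)  # skill worsens
```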


Fig 4.

Real data example of model predictive densities for the 1-week-ahead target on epiweek 201901 of the 2018/2019 season across all 11 regions.

The y-axis represents the probability density for a given wILI bin value on the x-axis. Notice that the regional forecasts change little under the coherence constraint, while the national forecasts change noticeably. Each method also produces a different degree of density "smoothing," with the greatest smoothing under the unordered weighted ordinary least squares (WOLS) method. This smoothing lowers the peak density across all HHS regions but increases it at the national level. The overall location of the forecast density, however, remains consistent across all projection methods.


Fig 5.

Unordered OLS sampling from a probabilistically coherent joint distribution given a collection of marginal distributions.

Note that the corresponding weighted ordinary least squares (WOLS) method is obtained by replacing P with Pw.
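The unordered scheme can be sketched as follows, with illustrative normal marginals and a hypothetical two-region constraint standing in for the paper's setup:

```python
import numpy as np

# Sketch of unordered sampling: draw independently from each marginal,
# stack into joint samples, then apply the OLS projector P so every draw
# is coherent. Marginals, weights, and dimensions are illustrative.

rng = np.random.default_rng(0)

w = np.array([0.4, 0.6])                  # hypothetical population weights
a = np.array([w[0], w[1], -1.0])          # coherence constraint a @ x == 0
P = np.eye(3) - np.outer(a, a) / (a @ a)  # OLS projector

means = np.array([2.0, 3.0, 2.4])         # HHS1, HHS2, National marginal means
samples = rng.normal(means, 0.3, size=(1000, 3))  # independent (unordered) draws

coherent = samples @ P.T                  # project each joint sample

assert np.allclose(coherent @ a, 0.0)     # every draw satisfies the constraint
```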


Fig 6.

Ordered OLS sampling from a probabilistically coherent joint distribution given a collection of marginal distributions.

Note that the corresponding WOLS method is obtained by replacing P with PV.
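One plausible reading of the ordered variant, assuming "ordered" means aligning equal ranks across marginals by sorting before projecting; the paper's exact ordered projection matrix is not reproduced here, and all values are illustrative:

```python
import numpy as np

# Sketch of an ordered variant: sort each margin's samples so the k-th
# joint sample pairs equal ranks across regions, then project with the
# plain OLS projector P. This is one reading of "ordered," not the
# paper's exact construction; all values are illustrative.

rng = np.random.default_rng(0)

w = np.array([0.4, 0.6])                  # hypothetical population weights
a = np.array([w[0], w[1], -1.0])          # coherence constraint a @ x == 0
P = np.eye(3) - np.outer(a, a) / (a @ a)  # OLS projector

samples = rng.normal([2.0, 3.0, 2.4], 0.3, size=(1000, 3))
ordered = np.sort(samples, axis=0)        # align ranks column by column
coherent = ordered @ P.T

assert np.allclose(coherent @ a, 0.0)     # every draw satisfies the constraint
```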


Table 1.

Experimental setup for evaluating probabilistic coherence approaches.

An evaluation point is defined as a unique region, season, epiweek, target, and model combination.


Table 2.

Percent of forecasts improved over the original forecast distribution for the four coherence methods described, in addition to the baseline bottom-up model, under both single-bin and multi-bin skill.

We omit the independent forecasts since they are the reference model for percentage of forecasts improved. Notice that unordered WOLS showed the greatest improvement under single-bin scoring but the least improvement under multi-bin scoring. This demonstrates that the choice of scoring rule influences the performance of the coherence methods.


Fig 7.

Best-performing method under single-bin (left) and multi-bin (right) scoring in terms of forecast skill, averaged over all targets (1-4 weeks ahead) and regions (HHS1-10 and National), broken down by model-season combination.

The y-axis represents a unique season-model combination, which has been anonymized to protect the participating teams' identities.


Fig 8.

Difference between the single-bin forecast skill of each projection method and that of the independent forecasts, averaged over all regions and epiweeks, broken down by target (left), season (right), and region (bottom).

Each point represents a single model-season combination. Box-and-whisker plots show the interquartile range as well as the maximum and minimum of the forecast-skill difference between the projection method and the independent forecasts. The improvements in single-bin forecast skill are consistent across seasons and targets for unordered WOLS. However, the improvements are consistent only across the HHS regions, not the national level.


Fig 9.

Variance of forecasts, averaged over season, epiweek, target, and model.

Notice that unordered WOLS increases the variance across HHS regions, which is reflected in its improvements under single-bin scoring. However, its variance decreases at the national level, which is also the only region without significant benefit under single-bin scoring. The optimal method under multi-bin scoring (ordered OLS) retains the variance of the original forecast distribution for the HHS regions but slightly increases it for the national level. This demonstrates the effect the scoring rule has on the choice of projection method.
