Recalibrating probabilistic forecasts of epidemics
Fig 8
Improvement in mean log score after recalibration, averaged over all 27 FluSight forecasters, by number of training seasons.
For each of the nine available seasons and each training-set size n ∈ {1, 2, 4, 8}, we perform three runs, where a run consists of randomly sampling n of the other seasons and training recalibration on them for each of the 27 FluSight forecasters. Each point in the plot is therefore an average over 9 × 3 = 27 runs. As expected, the parametric method is more robust to limited training data than the nonparametric method.
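The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the integer season labels, the function name `sample_runs`, and the fixed seed are hypothetical, and it assumes that the season each run is built around is the one held out from training. Training and scoring the recalibration for each forecaster is left out.

```python
import random

# Hypothetical labels for the nine available seasons.
SEASONS = list(range(9))
TRAIN_SIZES = (1, 2, 4, 8)   # values of n from the caption
RUNS_PER_SEASON = 3


def sample_runs(seasons, train_sizes, runs_per_season, seed=0):
    """Return {n: [(held_out_season, training_seasons), ...]}.

    Assumption: each run is tied to one held-out season, and its n
    training seasons are drawn without replacement from the others.
    """
    rng = random.Random(seed)
    runs = {n: [] for n in train_sizes}
    for n in train_sizes:
        for held_out in seasons:
            others = [s for s in seasons if s != held_out]
            for _ in range(runs_per_season):
                training = sorted(rng.sample(others, n))
                runs[n].append((held_out, training))
    return runs


runs = sample_runs(SEASONS, TRAIN_SIZES, RUNS_PER_SEASON)
for n in TRAIN_SIZES:
    print(n, len(runs[n]))  # number of runs behind each plotted point
```

With nine seasons, n = 8 leaves only one possible training set per held-out season, so the three runs at that size differ only if the recalibration procedure itself is stochastic.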