Recalibrating probabilistic forecasts of epidemics

doi:10.1371/journal.pcbi.1010771

Fig 3

Mean log score, averaged over all forecasters, for the different recalibration methods.

A window size of k means that recalibration is trained on forecasts made within k weeks of the given forecast week (inclusive), where such forecasts are available. Log score is averaged over 9 seasons, 11 locations, and 29 weeks (higher log score is better). The largest window sizes slightly hurt the performance of the parametric model, and the smallest window sizes significantly hurt the nonparametric model. Averaged over all forecasters, the improvement in performance due to recalibration is roughly equal to the improvement gained by shortening the forecast horizon by one week.
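As an illustration of the window definition in the caption, the following minimal sketch selects the training weeks for a given forecast week. It assumes a symmetric window that includes the forecast week itself and weeks numbered 1 to 29 within a season; the function name `training_window` and its parameters are hypothetical and do not come from the paper.

```python
def training_window(forecast_week, k, available_weeks):
    """Return the available weeks within k weeks of forecast_week, inclusive.

    Assumption: the window is symmetric around forecast_week and is simply
    truncated at the edges of the season ("where available").
    """
    return [w for w in sorted(available_weeks) if abs(w - forecast_week) <= k]


# Assumed week numbering: 29 forecast weeks per season, labeled 1..29.
weeks = range(1, 30)

print(training_window(1, 2, weeks))   # [1, 2, 3]
print(training_window(15, 2, weeks))  # [13, 14, 15, 16, 17]
```

Under these assumptions, a window near the start or end of the season contains fewer weeks than one in mid-season, so the amount of training data available for small k is limited.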

doi: https://doi.org/10.1371/journal.pcbi.1010771.g003