Aggregating soft labels from crowd annotations improves uncertainty estimation under distribution shift

doi:10.1371/journal.pone.0323064

Fig 13

Significance testing for the RTE task. We apply the Bonferroni correction across all tests (N = 56).

Green indicates the method in the row is significantly better than the method in the column. Red indicates the method in the row is significantly worse than the method in the column. Grey indicates no statistically significant difference.
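
As a rough illustration of how the colour coding could follow from a Bonferroni-corrected comparison, the sketch below divides a significance level by the number of tests and maps one p-value to a cell colour. The significance level of 0.05, the function names, and the `row_is_better` flag are assumptions made for this example; the caption states neither the significance level nor the underlying statistical test.

```python
N_TESTS = 56   # total number of tests, as stated in the caption
ALPHA = 0.05   # assumed family-wise significance level (not given in the caption)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def cell_colour(p_value: float, row_is_better: bool,
                alpha: float = ALPHA, n_tests: int = N_TESTS) -> str:
    """Map one row-vs-column comparison to a colour (hypothetical helper).

    Grey: no significant difference after correction.
    Green: row method significantly better than column method.
    Red: row method significantly worse than column method.
    """
    if p_value > bonferroni_threshold(alpha, n_tests):
        return "grey"
    return "green" if row_is_better else "red"


if __name__ == "__main__":
    print(f"per-test threshold: {bonferroni_threshold(ALPHA, N_TESTS):.6f}")
    print(cell_colour(0.0001, row_is_better=True))
    print(cell_colour(0.01, row_is_better=True))
```

With these assumed values, a comparison is only marked green or red when its p-value is at most 0.05 / 56, so a p-value of 0.01 that would pass an uncorrected 0.05 level is shown as grey.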

doi: https://doi.org/10.1371/journal.pone.0323064.g013