Aggregating soft labels from crowd annotations improves uncertainty estimation under distribution shift
Fig 6
Comparison of the average CLL and F1 score on the POS tagging task using different combinations of distributions for aggregation.
Points show the average performance across all combinations of a given number of distributions; error bars show 95% confidence intervals.
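The summary plotted in this figure can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function name `summarize_by_size`, the placeholder method names, the toy scores, and the normal-approximation interval (z = 1.96) are all assumptions, since the caption does not state how the confidence intervals were computed.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def summarize_by_size(method_names, score_fn, z=1.96):
    """Return {k: (mean_score, ci_half_width)} over all k-subsets of methods.

    score_fn is a hypothetical callable that maps a tuple of method names to
    one metric value (e.g. CLL or F1) for the aggregate of those methods.
    """
    summary = {}
    for k in range(1, len(method_names) + 1):
        scores = [score_fn(subset) for subset in combinations(method_names, k)]
        # Assumed normal-approximation interval; a single combination has no
        # spread, so its half-width is reported as 0.
        half_width = z * stdev(scores) / sqrt(len(scores)) if len(scores) > 1 else 0.0
        summary[k] = (mean(scores), half_width)
    return summary


# Toy usage with made-up per-method scores (placeholders, not results).
toy_scores = {"m1": 0.1, "m2": 0.2, "m3": 0.3, "m4": 0.4}
summary = summarize_by_size(
    list(toy_scores),
    lambda subset: mean(toy_scores[m] for m in subset),
)
for k, (avg, hw) in summary.items():
    print(f"{k} distributions: mean={avg:.3f} +/- {hw:.3f}")
```

Each key of `summary` corresponds to one point on the x-axis (the number of distributions combined), with the mean as the plotted point and the half-width as the error bar.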