Aggregating soft labels from crowd annotations improves uncertainty estimation under distribution shift

doi:10.1371/journal.pone.0323064

Table 3

F1 and calibrated log likelihood. Results are averaged over 10 random seeds; the standard deviation is given as a subscript. Tasks marked with * are subject to input data distribution shift, while datasets marked with † are subject to annotator pool distribution shift. Methods marked with ‡ estimate either worker skill or item difficulty. Aggregating the individual soft-labeling methods yields classifiers with consistently good uncertainty estimation (best on all text-based tasks) and generally good raw performance in terms of F1 across tasks.

doi: https://doi.org/10.1371/journal.pone.0323064.t003