Aggregating soft labels from crowd annotations improves uncertainty estimation under distribution shift
Table 2
Negative log-likelihood (NLL) of each annotation method with respect to the expert annotations in each dataset; lower is better. The performance of individual soft-labeling methods varies between tasks, while aggregating them maintains the best or near-best NLL.
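As a rough illustration of the metric in this caption, the sketch below computes the mean NLL of expert labels under per-item soft labels and compares two individual methods with their aggregate. It rests on several assumptions not stated here: expert annotations are treated as hard class indices, aggregation is a plain average of the per-item distributions (the aggregation rule actually used may differ), and the function names `nll` and `average_soft_labels` and all toy values are hypothetical.

```python
import math


def nll(soft_labels, expert_labels, eps=1e-12):
    """Mean negative log-likelihood of expert labels under soft labels.

    soft_labels: one probability distribution (list of floats) per item.
    expert_labels: one class index per item (assumed hard expert labels).
    """
    total = 0.0
    for dist, gold in zip(soft_labels, expert_labels):
        # Clip to eps so a zero probability does not yield log(0).
        total += -math.log(max(dist[gold], eps))
    return total / len(expert_labels)


def average_soft_labels(methods):
    """Aggregate methods by averaging their per-item distributions.

    Plain averaging is an assumption made for this sketch only.
    """
    aggregated = []
    for per_item in zip(*methods):
        n_classes = len(per_item[0])
        aggregated.append(
            [sum(d[c] for d in per_item) / len(per_item) for c in range(n_classes)]
        )
    return aggregated


# Hypothetical toy data: two soft-labeling methods, two items, two classes.
method_a = [[0.9, 0.1], [0.2, 0.8]]
method_b = [[0.6, 0.4], [0.7, 0.3]]
expert = [0, 1]

aggregated = average_soft_labels([method_a, method_b])

for name, labels in [("A", method_a), ("B", method_b), ("aggregate", aggregated)]:
    print(f"{name}: NLL = {nll(labels, expert):.4f}")
```

With plain averaging, the aggregate's NLL can never exceed the mean of the individual methods' NLLs, because the negative logarithm is convex. This gives one intuition for why an aggregate tends to stay close to the best method even when individual methods vary across tasks.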