Fig 1.
Overall schema of datasets (boxes) and processes (arrows) that led to the analyses (red).
Top row: The Expert Labeled dataset was used as a gold standard to analyze how well the different experimental groups (blue boxes) performed. Bottom row: The labeling from each experimental group was used to train an ML classifier. Each ML classifier was then tested against an expert-labeled test set.
Fig 2.
Example image used during training to demonstrate correct placement of bounding boxes around tassels.
Fig 3.
Left: Sample participant-drawn boxes. Right: The red box is the gold standard box and the black box is a participant-drawn box.
Fig 4.
Density of precision-recall pairs by group.
Density is based on a total of 61,888 participant-drawn boxes. A: Master MTurkers. B: MTurkers. C: Course Credit participants. D: Violin plots showing the distribution of F-measure per image per user. White circles mark the distribution median, black bars span the second and third quartiles, and black lines show the 95% confidence intervals.
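The captions do not state how precision, recall, and F-measure are obtained for a single box, so the following is only a minimal sketch under an assumed area-based definition: precision is the overlap area divided by the participant box area, recall is the overlap area divided by the gold standard box area, and the F-measure is their harmonic mean. The `Box` type and all function names are hypothetical and are not taken from the study.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    width = max(0.0, box.x_max - box.x_min)
    height = max(0.0, box.y_max - box.y_min)
    return width * height


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes (zero if they do not overlap)."""
    width = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    height = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    return max(0.0, width) * max(0.0, height)


def precision_recall_f(participant: Box, gold: Box) -> Tuple[float, float, float]:
    """Assumed area-based precision, recall, and F-measure for one box pair."""
    inter = overlap_area(participant, gold)
    p_area = area(participant)
    g_area = area(gold)
    precision = inter / p_area if p_area > 0 else 0.0
    recall = inter / g_area if g_area > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    gold = Box(0, 0, 10, 10)
    participant = Box(5, 0, 15, 10)  # half of this box overlaps the gold box
    print(precision_recall_f(participant, gold))
```

Under these assumptions, a participant box that covers half of the gold standard box and extends equally far outside it yields precision, recall, and F-measure of 0.5 each; a box drawn entirely inside the gold standard box has precision 1 but lower recall.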
Table 1.
Parameter estimates from the ANOVA with the Master MTurker group as baseline.
Table 2.
Parameter estimates in the linear mixed-effects regression of time spent on each image.
Fig 5.
Both accuracy and time per question change as participants progress through the task.
A: Time spent (log scale) as a function of image order. B: Mean F-measure, which decreases very slightly over the course of the survey.
Fig 6.
Best Linear Unbiased Predictors for images.
BLUPs were calculated in both analyses: for Fmean and for time (log scale). Color represents image difficulty as determined by an expert.