Removing facial features from structural MRI images biases visual quality assessment
Fig 3
Rater 4 issued visibly different, generally more optimistic, ratings than the other raters in both conditions (defaced and nondefaced).
The gray lines trace how each rating changed between a nondefaced image and its defaced counterpart. In each violin plot, the solid white line marks the median of the distribution, and the dashed white lines mark the 25% and 75% quantiles. Comparing the medians of the rating distributions for nondefaced vs. defaced images shows that the magnitude of the bias differed across raters. Our most experienced rater (Rater 1) showed the largest bias. Rater 4’s rating distribution diverged from those of the other raters, being more optimistic overall about image quality. Rater 4 also displayed a lower spread in quality assessments, which translated into the narrowest 95% limits of agreement (LoA; Fig 2). Lastly, low ratings tended to be more strongly biased by defacing, as their steeper lines show, sometimes shifting by one unit or more (equivalent to switching quality categories, e.g., from “poor” to “acceptable”). Higher ratings changed by 0.5 units or less. Bland-Altman (BA) plots support the same observation (Fig 2 and Fig B in S1 Text). The source tabular data used to generate this figure are available in S1 Data.
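The per-rater quantities summarized in this figure (medians, 25% and 75% quantiles, the median shift caused by defacing, and the 95% LoA) can be computed with the minimal Python sketch below. It assumes that each rater's ratings are available as two equal-length lists paired by image. The layout of S1 Data is not described here, so `summarize_rater`, `large_jumps`, and all field names are hypothetical. The sketch also assumes that differences are taken as defaced minus nondefaced and uses the conventional Bland-Altman limits (mean difference ± 1.96 SD of the differences). The paper's exact computation may differ.

```python
from dataclasses import dataclass
from statistics import mean, median, quantiles, stdev


@dataclass
class RaterSummary:
    """Per-rater summary of paired (nondefaced, defaced) quality ratings."""
    median_nondefaced: float
    median_defaced: float
    median_shift: float                   # defaced median minus nondefaced median
    iqr_nondefaced: tuple[float, float]   # 25% and 75% quantiles
    iqr_defaced: tuple[float, float]
    bias: float                           # mean paired difference (Bland-Altman bias)
    loa: tuple[float, float]              # 95% limits of agreement


def _quartiles(x: list[float]) -> tuple[float, float]:
    # method="inclusive" interpolates linearly between order statistics.
    q25, _, q75 = quantiles(x, n=4, method="inclusive")
    return (q25, q75)


def summarize_rater(nondefaced: list[float], defaced: list[float]) -> RaterSummary:
    # Assumption: nondefaced[i] and defaced[i] are the same rater's scores
    # for the same image (hypothetical data layout).
    if len(nondefaced) != len(defaced) or len(nondefaced) < 2:
        raise ValueError("need at least two paired ratings")
    # Positive difference = defaced image rated higher than its original.
    diffs = [d - n for n, d in zip(nondefaced, defaced)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the paired differences
    med_nd, med_d = median(nondefaced), median(defaced)
    return RaterSummary(
        median_nondefaced=med_nd,
        median_defaced=med_d,
        median_shift=med_d - med_nd,
        iqr_nondefaced=_quartiles(nondefaced),
        iqr_defaced=_quartiles(defaced),
        bias=bias,
        loa=(bias - half_width, bias + half_width),
    )


def large_jumps(nondefaced: list[float], defaced: list[float],
                threshold: float = 1.0) -> list[tuple[float, float]]:
    """Return (nondefaced, defaced) pairs whose rating moved by >= threshold units."""
    return [(n, d) for n, d in zip(nondefaced, defaced) if abs(d - n) >= threshold]
```

For example, with hypothetical ratings `[1.0, 1.5, 3.0, 3.5]` (nondefaced) and `[2.0, 2.5, 3.0, 3.5]` (defaced), `large_jumps` flags only the two low ratings. This mirrors the caption's observation that low ratings shift by a full unit while higher ratings stay put. Narrower `loa` values correspond to a rater with less spread, as described for Rater 4.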