Figure 1.
Test-retest repeatability of latent value decisions (mosaic charts).
(A) 2-way {VID, Not VID} latent value decisions: repeatability = 89.7%. (B) 3-way latent value decisions {NV, VEO, VID}, including the category “value for exclusion only”: repeatability = 84.6%. These mosaic plots depict the tabular data from Table 1, indicating for each category of initial test response (y-axis) the proportion of each category of retest response (x-axis).
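As an illustration of how figures of this kind relate to a test-retest contingency table, the following minimal sketch computes percentage agreement (the share of decisions on the diagonal) and the row proportions that a mosaic plot displays. The counts are hypothetical placeholders, not the values in Table 1, and the function names are assumptions introduced only for this example.

```python
# Hypothetical 3-way test-retest counts (NOT the data from Table 1).
# Rows: initial test response; columns: retest response.
# Category order: NV, VEO, VID.
counts = [
    [20, 5, 0],
    [4, 10, 6],
    [1, 4, 50],
]


def percent_agreement(table):
    """Percentage of paired decisions where test and retest match (diagonal)."""
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return 100.0 * agree / total


def row_proportions(table):
    """For each initial-test category, the proportion of each retest category."""
    return [[cell / sum(row) for cell in row] for row in table]


def collapse_to_2way(table):
    """Merge NV and VEO into 'Not VID', giving a {Not VID, VID} table."""
    not_vid = (0, 1)  # indices of NV and VEO
    vid = 2
    return [
        [sum(table[i][j] for i in not_vid for j in not_vid),
         sum(table[i][vid] for i in not_vid)],
        [sum(table[vid][j] for j in not_vid),
         table[vid][vid]],
    ]


three_way = percent_agreement(counts)
two_way = percent_agreement(collapse_to_2way(counts))
proportions = row_proportions(counts)
```

Collapsing categories can only preserve or raise agreement, since disagreements between NV and VEO disappear in the 2-way table; this is consistent with the 2-way figure exceeding the 3-way figure.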
Table 1.
Test-retest repeatability of latent value decisions (3-way contingency table).
Figure 2.
Repeatability and reproducibility of 2-way latent value decisions {VID vs. Not VID}.
Percentage of examiners rating each latent VID (y-axis), in rank order (x-axis), color-coded by repeatability; n = 252 latents on which at least 3 examiners were retested. Examiners were initially unanimous on 107 of these 252 latents; value decisions changed on 3 of these. Reproducibility rates were based on 53.2 mean examiners per latent (s.d. 21.7); repeatability rates were based on 5.0 mean examiners per latent (s.d. 2.3).
Table 2.
Repeatability of comparison decisions on RandomNonMates dataset.
Table 3.
Repeatability of comparison decisions on RandomMates dataset.
Figure 3.
Repeatability and reproducibility of 2-way individualization decisions {VID individualization, other}.
Percentage of examiners individualizing mated image pairs (y-axis), in rank order by VID individualization (x-axis), color-coded by repeatability. Y-axis is based on 4,006 initial decisions (excludes false negative responses; 10.3 mean examiners per image pair; s.d. 2.6). Color-coding is based on 792 retest decisions on 389 mated image pairs (RandomMates dataset; 2.0 mean examiners per image pair; s.d. 1.1). Non-repeated decisions occurred on 46 of the 389 image pairs. Examiners were initially unanimous on 257 of the 389; decisions were not repeated on 2 of these.
Figure 4.
Repeatability (A) and reproducibility (B) of individualization decisions by difficulty.
(A) Retest decisions by difficulty where the initial test decision was an individualization (269 paired decisions (test-retest) on 147 image pairs, 144 of which were mated). (B) Reproducibility of individualization decisions by difficulty (1,615 individualization decisions (15,990 paired examiner responses) by the 72 examiners on 249 image pairs, 246 of which were mated). Results for exclusion decisions were similar (Information S7).
Table 4.
Repeatability and reproducibility of individualization and exclusion decisions, by examiner assessment of difficulty.
Figure 5.
Percentage agreement on latent value.
2-way {VID, Not VID} and 3-way {NV, VEO, VID} latent value repeatability is measured within the initial test (“Days”), and between the test and retest (“Months”). Reproducibility is computed from the initial test results. All statistics are limited to the 72 retest participants; “N” indicates the number of decisions and, parenthetically, the number of distinct latents. Confidence intervals for these estimates are discussed in Information S9.
Figure 6.
Percentage agreement on comparisons of mated and nonmated image pairs.
2-way Mates {VID individualization, other}, 2-way Nonmates {exclusion, other}, 3-way {VID individualization, any exclusion, other}, and 7-way {NV, VEO inconclusive, VEO exclusion, VEO individualization, VID inconclusive, VID exclusion, VID individualization}. Repeatability is computed from the RandomMates and RandomNonMates datasets; reproducibility is computed from the initial test results. While the 2-way and 3-way decisions correspond to common operational practice, only a subset of the 7-way distinctions would correspond to any specific operational practice. All statistics are limited to the 72 retest participants; “N” indicates the number of decisions and, parenthetically, the number of distinct image pairs. Confidence intervals for these estimates are discussed in Information S9.
Figure 7.
Mosaic displays of 7-way contingency tables for repeatability and reproducibility of examiner decisions.
(A) repeatability of nonmated comparison decisions (648 test-retest decision pairs on 210 nonmated pairs); (B) repeatability on mated comparisons (1,018 test-retest decision pairs on 436 mated pairs); (C) reproducibility on nonmated comparisons (19,025 inter-examiner decision pairs derived from 2,066 decisions on 219 nonmated pairs); (D) reproducibility on mated comparisons (51,380 inter-examiner decision pairs derived from 5,134 decisions on 499 mated pairs). The corresponding contingency tables are presented in Information S4. Chart B is adjusted to correct for the disproportionate number of false negative errors that were deliberately included in the retest: the height of the exclusion rows was reweighted to correspond to the proportions occurring on the initial test (6.3% of mated pairs in the initial test were false negatives, vs. 13.6% selected for the retest).
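The reweighting described for Chart B can be sketched as follows. This is one plausible reading of the adjustment, not the authors' stated procedure: the exclusion rows are scaled so that together they occupy the initial-test share (6.3%) of the chart height instead of the oversampled retest share, and the remaining rows are scaled to fill the rest. The row totals below are hypothetical, and `reweight_rows` is a name introduced only for this example.

```python
# Row order of the 7-way table, as listed in the Figure 6 caption.
CATEGORIES = [
    "NV", "VEO inconclusive", "VEO exclusion", "VEO individualization",
    "VID inconclusive", "VID exclusion", "VID individualization",
]

# Hypothetical retest row totals (NOT the study's data).
row_totals = [100, 60, 16, 24, 200, 120, 480]

# Rows treated as exclusions (false negatives on mated pairs).
is_exclusion = ["exclusion" in name for name in CATEGORIES]


def reweight_rows(totals, exclusion_flags, target_share):
    """Return row heights (fractions summing to 1) in which the exclusion
    rows together occupy target_share, keeping relative sizes within each
    group unchanged. Assumed interpretation of the caption's adjustment."""
    excl_total = sum(t for t, e in zip(totals, exclusion_flags) if e)
    other_total = sum(totals) - excl_total
    heights = []
    for t, e in zip(totals, exclusion_flags):
        if e:
            heights.append(target_share * t / excl_total)
        else:
            heights.append((1.0 - target_share) * t / other_total)
    return heights


# 6.3% is the false negative share on the initial test, per the caption.
heights = reweight_rows(row_totals, is_exclusion, target_share=0.063)
exclusion_height = sum(h for h, e in zip(heights, is_exclusion) if e)
```

With these placeholder totals the exclusion rows make up 13.6% of the unadjusted retest decisions, and the adjusted heights shrink them to 6.3% while the ratio between the two exclusion rows stays the same.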
Table 5.
Examiner responses on the six image pairs (labeled A–F in [5]) that resulted in false positive errors.
Table 6.
Repeatability of false negative errors on (A) FalseNeg and (B) FalseNeg_M datasets.
Table 7.
Repeatability and reproducibility for mated pairs, contingent upon whether the initial decision was false negative.