Table 1.
Number of raters and mean scores across sites.
The total number of raters (nr) was 59.
Fig 1.
Synthetic agreement data across different parameter ranges, with and without asymmetry in agreement.
The x-axes all represent the ground truth value of the variable, and the y-axes represent the “observed” values. Data are depicted for different values of a uniform noise parameter (0 ≤ ω ≤ 1), which governs what proportion of the data is merely uniform noise over the interval [0, 10], and a disagreement parameter (σ ≥ 0), which governs the variance around the diagonal line. Panel A (upper three rows, shown in blue) depicts synthetic data in which the level of agreement was asymmetrical across the score domain. Panel B (lower three rows, shown in red) depicts synthetic data in which agreement was symmetrical over the score domain.
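As a rough illustration of how ω and σ interact, the following minimal sketch draws symmetric synthetic pairs. It is not the generator used for the figure: the function name `synth_agreement`, the Gaussian scatter around the diagonal, the clipping to [0, 10], and the uniform distribution of the ground truth are all assumptions made for illustration, and the asymmetric case of Panel A is not modelled.

```python
import random

def synth_agreement(n, omega, sigma, seed=None, lo=0.0, hi=10.0):
    """Hypothetical generator of (ground truth, observed) pairs.

    With probability omega the observed value is pure uniform noise on
    [lo, hi]; otherwise it is scattered around the diagonal with spread
    sigma (Gaussian scatter is an assumption, not taken from the source).
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        truth = rng.uniform(lo, hi)
        if rng.random() < omega:
            observed = rng.uniform(lo, hi)       # uniform-noise component
        else:
            observed = rng.gauss(truth, sigma)   # agreement component
            observed = min(hi, max(lo, observed))  # keep within the score range
        pairs.append((truth, observed))
    return pairs

# omega = 0 and sigma = 0 puts every point on the diagonal;
# omega = 1 gives observed values unrelated to the ground truth.
diagonal = synth_agreement(500, omega=0.0, sigma=0.0, seed=1)
noisy = synth_agreement(500, omega=0.3, sigma=1.5, seed=1)
```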
Fig 2.
Histograms of ratings for each value of the ground truth Alda score available in the first wave dataset from Manchia et al. [1].
Each histogram represents the distribution of ratings (nr = 59) for a single one of twelve assessment vignettes. The gold standard (“ground truth”) Alda score, obtained by the Halifax consensus sample, is depicted as the title for each histogram. Plots in blue are those for vignettes with gold standard Alda scores less than 7, which would be classified as “non-responders” under the dichotomized setting. Vignettes with gold standard Alda scores ≥ 7 are shown in red, and represent the dichotomized group of lithium responders.
Fig 3.
Mutual information between gold standard and observed Alda scores in relation to the observation noise (α) and whether the scale is in its raw or dichotomized form (lithium responder [Li(+)] is Alda score ≥ 7; non-responder [Li(-)] is Alda score < 7).
Panels A-C show the inferred joint distributions of the observed (xo for raw, yo for discrete) and gold standard (x* for raw, y* for discrete) values at different levels of observation noise (α ∈ {0, 10, 100}). Panel D plots the mutual information for the raw (red) and discrete (blue) settings of the Alda score across increasing values of α. Recall that here we set ξ = 11/2.
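The comparison in Panel D can be illustrated with a minimal sketch that computes mutual information from a joint probability table before and after dichotomization at an Alda score of 7. The function names, the use of bits (base-2 logarithms), and the perfect-agreement example table are assumptions for illustration; the sketch does not reproduce the inferred joint distributions or the roles of α and ξ.

```python
from math import log2

def mutual_information(joint):
    """Mutual information (in bits) of a 2-D joint table of probabilities or counts."""
    total = sum(map(sum, joint))
    px = [sum(row) / total for row in joint]        # marginal over rows
    py = [sum(col) / total for col in zip(*joint)]  # marginal over columns
    mi = 0.0
    for i, row in enumerate(joint):
        for j, v in enumerate(row):
            if v > 0:                               # 0 * log(0) is taken as 0
                p = v / total
                mi += p * log2(p / (px[i] * py[j]))
    return mi

def dichotomize_joint(joint, threshold=7):
    """Collapse a joint over scores 0..10 into a 2x2 table (score >= threshold)."""
    table = [[0.0, 0.0], [0.0, 0.0]]
    for i, row in enumerate(joint):
        for j, v in enumerate(row):
            table[i >= threshold][j >= threshold] += v
    return table

# Hypothetical example: perfect agreement over the 11 possible scores.
perfect = [[1.0 if i == j else 0.0 for j in range(11)] for i in range(11)]
raw_mi = mutual_information(perfect)                       # equals log2(11)
dich_mi = mutual_information(dichotomize_joint(perfect))   # cannot exceed raw_mi
```

Because dichotomization is a deterministic function of the raw scores, the dichotomized mutual information can never exceed the raw value, which matches the ordering of the two curves described for Panel D.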
Fig 4.
Mutual information (MI) for dichotomized (solid lines) and continuous (dashed lines) distributions on synthetic data with asymmetrical (upper row, Panel A) and symmetrical (lower row, Panel B) properties with respect to agreement.
X-axes represent the dichotomization thresholds at which we recalculate the dichotomized MI. Mutual information is depicted on the y-axes. Plot titles indicate the different diagonal spread (σ) parameters used to generate the synthetic datasets. Solid lines (for dichotomized MI) are surrounded by ribbons depicting the 95% confidence intervals over 10 runs at each combination of parameters (τ, ω, β, σ).
Fig 5.
Statistical power achieved with the Pearson coefficient (a continuous measure of association; blue lines) and Fisher’s exact test (a measure of association between dichotomized variables; red lines) for synthetic data with symmetrical (upper row) and asymmetrical (lower row) properties with respect to agreement.
Columns correspond to the level of uniform “overall” noise (ω) added to the data, representing prior uncertainty. X-axes represent the diagonal spread (σ), and the y-axes represent the test’s statistical power for the given sample size and estimated effect sizes. Data subjected to Fisher’s exact test were dichotomized at either a threshold of 5 (the “Median Split,” denoted by ‘+’ markers in red) or 3 (the “Tail Split,” denoted by the dot markers in red). For all series, dark lines denote means and the ribbons are 95% confidence intervals over 100 runs.
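For readers who want to see how power of this kind can be estimated by simulation, the sketch below covers only the dichotomized arm (Fisher’s exact test) at the Median Split (5) and Tail Split (3) thresholds. It is a sketch under stated assumptions, not the procedure used for the figure: the data generator (Gaussian scatter, uniform ground truth), the sample size, the significance level of 0.05, and the names `fisher_exact_p` and `fisher_power` are all hypothetical. The Pearson arm would follow the same loop with a correlation test in place of the 2x2 test.

```python
import random
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, c1, n = a + b, a + c, a + b + c + d
    denom = comb(n, c1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(r1, k) * comb(n - r1, c1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - (n - r1)), min(r1, c1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    tail = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, tail)

def fisher_power(n, sigma, omega, threshold, runs=200, alpha=0.05, seed=0):
    """Fraction of simulated datasets in which Fisher's exact test rejects at alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        table = [[0, 0], [0, 0]]
        for _ in range(n):
            truth = rng.uniform(0, 10)
            if rng.random() < omega:
                obs = rng.uniform(0, 10)                           # uniform noise
            else:
                obs = min(10.0, max(0.0, rng.gauss(truth, sigma)))  # diagonal scatter
            table[truth >= threshold][obs >= threshold] += 1        # dichotomize both
        (a, b), (c, d) = table
        hits += fisher_exact_p(a, b, c, d) < alpha
    return hits / runs

median_split = fisher_power(n=50, sigma=2.0, omega=0.3, threshold=5)
tail_split = fisher_power(n=50, sigma=2.0, omega=0.3, threshold=3)
```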