Heterogeneity analysis provides evidence for a genetically homogeneous subtype of bipolar-disorder

doi:10.1371/journal.pone.0314288

Fig 1.

In this figure we illustrate the absolute (right) and relative (left) SNP overlap between the studies available to us. The relative-overlap is calculated using the Szymkiewicz–Simpson coefficient (i.e., the overlap-coefficient between sets X and Y is |X ∩ Y|/min(|X|, |Y|)). Guided by the relative-overlap and the genotyping platform used, we divided the studies into four arms (shown along the coordinate axes). The first arm contains only the single ‘BDRN’ data-set, which we use as a training/discovery set to search for heterogeneity (see Methods). We reserve the remaining studies (organized into three arms) for replication. Note that the training-set overlaps strongly with arm-2, and less strongly with arm-3 and arm-4. The magnitude of this overlap constrains how faithfully any patterns of differential-expression found in arm-1 can manifest within the other arms (see Figs 3–5).
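The overlap-coefficient defined in this caption can be illustrated with a minimal sketch. The function name and the SNP identifiers below are hypothetical and chosen only for illustration; they are not taken from the studies themselves.

```python
def overlap_coefficient(x, y):
    """Szymkiewicz-Simpson coefficient |X n Y| / min(|X|, |Y|)."""
    x, y = set(x), set(y)
    if not x or not y:
        raise ValueError("both sets must be non-empty")
    return len(x & y) / min(len(x), len(y))


# Hypothetical SNP identifiers for two studies (illustration only).
study_a = {"rs1", "rs2", "rs3", "rs4"}
study_b = {"rs3", "rs4", "rs5"}

absolute_overlap = len(study_a & study_b)                  # shared SNPs
relative_overlap = overlap_coefficient(study_a, study_b)   # shared / smaller set
```

Because the denominator is the size of the smaller set, the coefficient equals 1 whenever one study's SNP set is contained in the other's, regardless of how much larger the second set is.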

Fig 2.

In this figure we show the output of the half-loop biclustering algorithm applied to the BDRN cohort in arm-1 (limited to those SNPs with MAF ≥ 0.25).

As described in the main text, the algorithm proceeds iteratively, eliminating rows and columns from the case-subject-array D until all have been removed. At each iteration i, the remaining submatrix D(i) comprises the remaining case-subjects and allele-combinations. At each iteration we record the ‘row-trace’, which is the covariate-corrected average level of differential-expression between D(i) and the control-subjects X. In the top row of subplots we show the row-trace for the data (red) as well as for 128 label-shuffled trials (black). Each of the row-traces has been transformed into an iteration-dependent z-score (estimated using the distribution of label-shuffled trials at that iteration). In the bottom row we show the corresponding empirical p-value, as estimated for each iteration using the label-shuffled trials. The dashed black line corresponds to the 95th percentile (i.e., a significance value of 0.05 if each iteration were considered independently). If the signal were homogeneous we would expect to see the red trace begin at a high value and decay relatively monotonically. By contrast, we see strong evidence for heterogeneity; the red trace is far from monotonic. The overall p-value for the data (red trace), estimated using the strategy in [64], is p ≲ 1/64. Note that the trace is significant over a range of iterations, including i ∈ [175, 350].
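The per-iteration z-score and empirical p-value described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the toy traces are made up, and the add-one correction in the p-value is an assumption rather than something specified in the caption. The overall p-value (the strategy in [64]) is not reproduced here.

```python
from statistics import mean, stdev


def trace_zscores_and_pvalues(data_trace, shuffled_traces):
    """Compare the data's trace to label-shuffled traces, iteration by iteration.

    data_trace: list of floats, one value per iteration.
    shuffled_traces: list of traces (one per label-shuffled trial).
    """
    z_scores, p_values = [], []
    for i, observed in enumerate(data_trace):
        null = [trace[i] for trace in shuffled_traces]
        mu, sigma = mean(null), stdev(null)
        z_scores.append((observed - mu) / sigma if sigma > 0 else 0.0)
        # Fraction of shuffled values at least as large as the observed one.
        # The +1 terms are an assumed convention that avoids p = 0.
        exceed = sum(1 for v in null if v >= observed)
        p_values.append((exceed + 1) / (len(null) + 1))
    return z_scores, p_values


# Toy example: 4 shuffled trials, 2 iterations (made-up numbers).
shuffled = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
data = [4.0, 2.5]
z, p = trace_zscores_and_pvalues(data, shuffled)
```

With 128 shuffled trials, the smallest p-value this per-iteration estimate can return is 1/129, which is why a separate strategy is needed to summarize significance across iterations.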

Fig 3.

In this figure we illustrate the replication of the bicluster in arm-2.

Note that the SNP-overlap between arm-1 and arm-2 is ∼ 85%. On the top we show A(i) in red and A′(i) in green. On the bottom we show the associated p-values for A(i) and A′(i), calculated with respect to H0 and H0′ for each iteration individually. Standard significance-levels 0.05 and 0.01 are shown in dashed and dotted lines, respectively. The interval i ∈ [175, 350] is highlighted in white. Note that both A(i) and A′(i) have peaks within the range over which the trace was significant (cf. Fig 2). The overall replication for arm-2 within the interval i ∈ [175, 350] is estimated at p ≲ 10⁻¹².

Fig 4.

This figure is similar to Fig 3, except that we use arm-3 instead of arm-2.

The overall replication for arm-3 within the interval i ∈ [175, 350] is estimated at p ≲ 10⁻³. Note that the SNP-overlap between arm-1 and arm-3 is only ∼ 50%.

Fig 5.

This figure is similar to Fig 3, except that we use arm-4 instead of arm-2.

The overall replication for arm-4 within the interval i ∈ [175, 350] is estimated at p ≲ 10⁻³. Note that the SNP-overlap between arm-1 and arm-4 is only ∼ 30%.

Fig 6.

This figure plots the ratio of BDI to BDII subjects within D(i) (light-green, left y-axis) as a function of the iteration i (left) and the number of removed case-subjects (right). The dark-green line corresponds to the negative-log-probability (right y-axis) of observing a ratio at least as large by chance. The dashed and dotted horizontal lines indicate 0.05 and 0.01 significance values, respectively. Note that the BDI population is over-represented across a range of iterations including i ∈ [175, 350], implying that the bicluster we observe is significantly enriched for BDI subjects.

Fig 7.

In each subplot we show in yellow the AUC of the population-wide PRS (vertical) for arm-2 as a function of the number of SNPs corresponding to each p-value threshold (horizontal, log-scale).

Additionally, we show the AUC of the bicluster-informed PRS for a particular iteration i (with i varying across subplots). The color-code used for the bicluster-informed PRS ranges from blue to pink, corresponding to the iteration index i. Note that, by using the bicluster to inform the PRS, the performance typically improves. This improvement in performance becomes marked when the number of SNPs is limited to a relatively small fraction of the total (e.g., ∼ 1% of the total, corresponding to a log₁₀(#) of ∼ 3).
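As a rough illustration of how a thresholded PRS is scored by AUC, the sketch below builds a score from the SNPs passing a threshold and then computes the rank-based AUC between cases and controls. Everything here is an assumption made for illustration: the function names, the reading of the threshold as a p-value threshold, and the toy weights, p-values and dosages. It does not reproduce the bicluster-informed PRS of the main text.

```python
def polygenic_score(dosages, weights, pvalues, threshold):
    """Weighted sum of allele dosages over SNPs with p-value <= threshold."""
    return sum(w * d for d, w, p in zip(dosages, weights, pvalues) if p <= threshold)


def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy data: 3 SNPs with made-up effect weights and association p-values.
weights = [0.5, -0.2, 0.1]
pvalues = [0.001, 0.2, 0.04]
cases = [[2, 0, 1], [1, 0, 2]]
controls = [[0, 2, 0], [1, 1, 0], [2, 0, 0]]

threshold = 0.05  # keeps the first and third SNP only
case_scores = [polygenic_score(d, weights, pvalues, threshold) for d in cases]
control_scores = [polygenic_score(d, weights, pvalues, threshold) for d in controls]
auc_value = auc(case_scores, control_scores)
```

Sweeping the threshold changes how many SNPs enter the score, which is what the horizontal axis of the figure tracks.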

Fig 8.

This figure uses circles to display the same information as Fig 7 (corresponding to replication arm-2).

In this figure we use an algebraic-scale for the horizontal-axis (rather than a log-scale) in order to better emphasize the interval where the number of SNPs used is between 1K and 10K. The results for replication arm-3 and arm-4 are shown using squares and triangles, respectively.

Fig 9.

This figure is similar to Fig 8, except that we limit ourselves only to those case-subjects in the replication-arms which are classified as BDI.

This subset corresponds to 66% (M = 3834), 84% (M = 2995) and 75% (M = 5107) of the case-population for arms 2, 3 and 4, respectively. The corresponding AUC-values are defined in the main text. For reference, the training-arm had M = 1645 BDI case-subjects, corresponding to 65% of the case-population in arm-1.

Fig 10.

This figure is similar to Fig 8, except that we limit ourselves only to those case-subjects in the replication-arms which are classified as BDII.

This subset corresponds to 19% (M = 1082), 12% (M = 435) and 16% (M = 1060) of the case-population for arms 2, 3 and 4, respectively. The corresponding AUC-values are defined in the main text. For reference, the training-arm had M = 788 BDII case-subjects, corresponding to 31% of the case-population in arm-1.

Fig 11.

This figure is similar to Fig 8, and uses the data from Figs 8, 9 and 10.

This time we combine the information across all three replication-arms, and calculate replication AUC-values for this combined data-set. We then convert these AUC-values into liability-scores (see [71]). The results for all the cases are shown with an asterisk ‘*’, whereas the results for only the BDI-cases are shown with an ‘×’, and the results for only the BDII-cases are shown with a diamond. In each case the yellow curves correspond to the liability-scores derived from the population-wide PRS, whereas the cyan-magenta curves correspond to the liability-scores derived from the bicluster-informed PRS. Note that our overall results are closely matched by the BDI-cases, but not by the BDII-cases.

Table 1.

Here we list some of the pathways from the go_bp ontology.

Shown here are only the 32 most significant pathways as determined by κ(175, l). Each pathway is listed alongside approximations to its individual over-representation p-value (estimated using the hypergeometric-distribution). The −log₁₀(p)-values are listed for iterations 175–350 (see top row). Those annotations with an individual over-representation p-value smaller than 0.05 are in bold.
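An over-representation p-value based on the hypergeometric distribution, as mentioned in this caption, can be sketched as an upper-tail probability. The function name, the argument names and the toy counts are hypothetical; the exact approximation used for the table is not specified here, so this is only one plausible form of the calculation.

```python
from math import comb, log10


def hypergeom_tail(population, annotated, drawn, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, drawn).

    population: total number of genes considered.
    annotated:  genes in the population belonging to the pathway.
    drawn:      genes selected (e.g., those implicated at a given iteration).
    hits:       selected genes that belong to the pathway.
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favorable = sum(
        comb(annotated, j) * comb(population - annotated, drawn - j)
        for j in range(hits, upper + 1)
    )
    return favorable / total


# Toy counts (illustration only): 10 genes, 4 in the pathway,
# 3 selected, of which 2 fall in the pathway.
p_value = hypergeom_tail(10, 4, 3, 2)
neg_log10_p = -log10(p_value)
```

The table reports values on the −log₁₀(p) scale, so a value above about 1.3 corresponds to an individual p-value below 0.05.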
