One for all and all for One: Improving replication of genetic studies through network diffusion

doi:10.1371/journal.pgen.1007306

Fig 1.

Summary of analysis steps.

A set of AD genes that are reproducible (RAD genes) across different genetic studies was assembled through literature curation. The RAD genes were assigned a high initial risk score, and graph theoretical diffusion was employed to derive network diffusion scores for the rest of the genes in the network. Scores obtained from genetic screens and network diffusion were integrated to derive a new prioritization.
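
The diffusion step described in this caption can be illustrated with a minimal sketch. It assumes a random walk with restart as a stand-in for the paper's graph theoretical diffusion, which the caption does not specify in detail. The toy network, the node names, and the `restart` parameter are hypothetical.

```python
def diffuse(adjacency, seeds, restart=0.5, iterations=100):
    """Random walk with restart on an undirected graph (illustrative stand-in).

    adjacency: dict mapping each node to a list of its neighbours.
    seeds: set of nodes given a high initial score (the RAD genes in the caption).
    """
    nodes = sorted(adjacency)
    # Seeds share the initial mass equally; all other nodes start at zero.
    initial = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m passes an equal share of its score to n.
            incoming = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = (1 - restart) * incoming + restart * initial[n]
        scores = updated
    return scores


# Hypothetical four-gene path network: SEED - G1 - G2 - G3.
toy_network = {
    "SEED": ["G1"],
    "G1": ["SEED", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2"],
}
scores = diffuse(toy_network, seeds={"SEED"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy example the score decays with distance from the seed, so genes closer to the seed rank higher.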

Table 1.

RAD genes and the type of study that identified them.

Table 2.

Proximity between RAD genes in PPI network.

Each RAD gene was ranked (in comparison to the other 19,972 genes in the network) based upon its degree (number of interactions in network), its ASP distance to the RAD genes, and total diffusion distance from the RAD genes. The average ranking of the RAD genes was 7,949 using ASP (60th percentile, t-test p = 0.015) and 6,959 for diffusion (65th percentile, t-test p = 0.00054).
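
As a quick check on the percentiles quoted above, the conversion from average rank to percentile can be sketched as follows. This assumes rank 1 is the gene closest to the RAD genes and uses 19,972 as the number of ranked genes; the function name is hypothetical.

```python
def rank_to_percentile(rank, total_genes):
    """Percentage of ranked genes that fall below the given rank (rank 1 = best)."""
    return 100.0 * (1.0 - rank / total_genes)


TOTAL_GENES = 19972  # assumed size of the ranked list, taken from the caption

asp_percentile = rank_to_percentile(7949, TOTAL_GENES)
diffusion_percentile = rank_to_percentile(6959, TOTAL_GENES)
```

Rounded to whole numbers, these reproduce the 60th and 65th percentiles given in the caption.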

Table 3.

Proximity of non-RAD hub genes to RAD genes.

Fig 2.

Filtering on network score improves replication rate.

The replication rate was computed for all genes surpassing the significance threshold for each GWAS. This procedure was repeated in each bootstrapped dataset and the replication rates were averaged. The process was then repeated using increasingly strict filters on the network diffusion scores. The baseline replication rate without network scores (naïve method) is represented by the purple points. The strictest network filter (red) yields a consistently higher replication rate than the naïve method.
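
The filtering procedure in this caption can be sketched for a single discovery/replication pair. The sketch assumes that a gene "replicates" when it is also significant in a second dataset; the p-values, network scores, thresholds, and gene names are all hypothetical, and the bootstrap averaging is omitted.

```python
def replication_rate(discovery_p, replication_p, network_score,
                     alpha=0.05, min_network=None):
    """Fraction of discovery hits that are also significant in the replication set.

    If min_network is given, only hits whose network score reaches that
    value are counted (the network filter). Returns None if no hits remain.
    """
    hits = [
        gene for gene, p in discovery_p.items()
        if p < alpha and (min_network is None or network_score[gene] >= min_network)
    ]
    if not hits:
        return None
    replicated = sum(1 for gene in hits if replication_p[gene] < alpha)
    return replicated / len(hits)


# Hypothetical toy data for four genes.
discovery_p = {"g1": 0.001, "g2": 0.01, "g3": 0.02, "g4": 0.5}
replication_p = {"g1": 0.01, "g2": 0.3, "g3": 0.04, "g4": 0.6}
network_score = {"g1": 2.0, "g2": -0.5, "g3": 1.0, "g4": 0.1}

naive_rate = replication_rate(discovery_p, replication_p, network_score)
filtered_rate = replication_rate(discovery_p, replication_p, network_score,
                                 min_network=0.5)
```

In this toy data the filter drops the one discovery hit that fails to replicate, so the filtered rate is higher than the naïve rate. Real data need not behave this way.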

Fig 3.

Comparison of GWAS and network Z-scores.

A. The transformed GWAS and network Z-scores are uncorrelated. B. Genes with high network scores had higher replication rates than genes with low network scores; Fig 4 visualizes this relationship and tests it statistically. Reprate = replication rate.

Fig 4.

Support vector machine training to predict GWAS and network Z-score weights.

Selection of genes with a high replication rate (> 0.7, blue points) and a low replication rate (< 0.1, red points) yielded a balanced number of genes in each replication class (high/low). A linear SVM model was trained to predict replication class using the GWAS and network Z-scores of each gene. Genes represented as X's were used as support vectors for the training of the SVM, whereas genes represented as O's were not. Both network and GWAS Z-scores contributed to the decision boundary, as demonstrated by the significance of their coefficients estimated using logistic regression (GWAS: p < 2.0×10⁻¹⁶, Network: p = 0.0016).
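
To make the idea of a linear decision boundary over two Z-scores concrete, here is a minimal sketch that fits a linear SVM by subgradient descent on the regularized hinge loss. It is not the authors' implementation: the training points, the learning rate, the regularization strength, and the number of epochs are all hypothetical, and a library solver would normally be used instead.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=20000):
    """Fit weights w and bias b by batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [lam * w[0], lam * w[1]]
        grad_b = 0.0
        for (x0, x1), y in zip(points, labels):
            # Only points inside the margin contribute to the hinge subgradient.
            if y * (w[0] * x0 + w[1] * x1 + b) < 1:
                grad_w[0] -= y * x0 / n
                grad_w[1] -= y * x1 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, point):
    """Return +1 (high replication class) or -1 (low replication class)."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Hypothetical (GWAS Z-score, network Z-score) pairs.
high_replication = [(3.0, 1.5), (2.5, 2.0), (3.5, 0.5), (2.0, 2.5)]
low_replication = [(0.5, -0.5), (1.0, -1.0), (0.0, 0.5), (1.5, -1.5)]
points = high_replication + low_replication
labels = [1] * len(high_replication) + [-1] * len(low_replication)

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
```

The fitted `w` holds one weight per Z-score, and the relative size of the two weights indicates how much each score contributes to the boundary in this toy setting.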

Table 4.

Top predicted AD genes using combination approach.

Table 5.

GSEA results after ranking genes by combined Z-scores.

Table 6.

GSEA results after ranking genes by GWAS only Z-scores.
