Host factor prioritization for pan-viral genetic perturbation screens using random intercept models and network propagation

doi:10.1371/journal.pcbi.1007587

Fig 1.

Integrated host factor prioritization from viral infection RNAi screening data using a two-stage procedure.

(A) We normalized and integrated data from RNAi perturbation screens of four different positive-sense RNA viruses. (B) Stage 1: We estimate pan-viral effects γ = {γ₁, …, γ_G} for each of the G genes from the integrated data sets using a random effects model, and rank the genes by their absolute effect sizes. The gene effects represent the impact of a gene knockdown on the life cycle of the entire group of viruses. (C) Stage 2: To account for genes that were not knocked down in the RNAi screens, and potentially for false negatives in our rankings, we use biological prior knowledge. We map the gene effects γ_g onto a protein-protein interaction network and propagate the inferred estimates over the graph using network diffusion. This yields a final ranking of genes that are predicted to have a significant impact on the pan-viral replication cycle.
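As a rough illustration of Stage 1, the sketch below fits a simplified one-way random intercept model with genes as the grouping factor, pooling normalized readouts across screens. This is a swapped-in simplification, not the authors' fitted model: it ignores the virus-specific effects ρ_vg, estimates the variance components with the method-of-moments (ANOVA) estimator, and uses the shrunken gene means (BLUP-style estimates) as γ̂_g. The function names and the input layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def random_intercept_effects(obs):
    """Shrunken gene effects from a one-way random intercept model.

    Simplified sketch (assumption): y = mu + gamma_g + eps, with
    gamma_g ~ N(0, tau2) and eps ~ N(0, sigma2); virus-specific effects
    rho_vg are ignored. `obs` is a list of (gene, normalized_readout) pairs.
    Needs at least two genes and more observations than genes.
    """
    groups = defaultdict(list)
    for gene, y in obs:
        groups[gene].append(y)
    n_total, n_genes = len(obs), len(groups)
    mu = fmean(y for _, y in obs)  # overall mean (simplification of the GLS estimate)
    means = {g: fmean(ys) for g, ys in groups.items()}

    # Method-of-moments (ANOVA) estimates of the variance components.
    ss_between = sum(len(ys) * (means[g] - mu) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    ms_between = ss_between / (n_genes - 1)
    ms_within = ss_within / (n_total - n_genes)
    n0 = (n_total - sum(len(ys) ** 2 for ys in groups.values()) / n_total) / (n_genes - 1)
    tau2 = max(0.0, (ms_between - ms_within) / n0)
    sigma2 = ms_within

    # Shrink each gene mean toward mu; genes with few readouts are shrunk more.
    effects = {}
    for g, ys in groups.items():
        shrink = tau2 / (tau2 + sigma2 / len(ys)) if tau2 > 0 else 0.0
        effects[g] = shrink * (means[g] - mu)
    return effects


def rank_by_absolute_effect(effects):
    """Genes ordered by |gamma_g|, largest first."""
    return sorted(effects, key=lambda g: abs(effects[g]), reverse=True)
```

For example, three readouts per gene for a strong positive gene, a strong negative gene and a null gene place the null gene last in `rank_by_absolute_effect`, while the two strong genes are pulled slightly toward the overall mean by the shrinkage factor.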

Table 1.

Metadata of positive-sense ssRNA viral RNAi screens.

The data sets are derived from separate screens using different cell lines, readout types or infection stages. We use six RNAi screens for Chikungunya virus, Dengue virus, Hepatitis C virus, and SARS coronavirus.

Fig 2.

Comparison of readouts for unnormalized vs. normalized data.

Each box plot shows the distribution of readouts of a single plate; plates are arranged along the x-axis. (a) Before normalization, readouts are hardly comparable between plates due to batch and spatial effects. (b) After normalization, the data are centered and scaled to unit variance, yielding comparable phenotypes.
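The caption states that normalized readouts are centered and scaled to unit variance. A minimal sketch of that step, assuming a plain per-plate z-score; the authors' pipeline may additionally correct spatial (row/column) effects, which this sketch does not do. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def zscore_per_plate(readouts):
    """Center and scale readouts within each plate.

    readouts: list of (plate_id, well_id, value) tuples (layout is an assumption).
    Returns a list of (plate_id, well_id, z) in the same order.
    Plates with zero spread get z = 0 for every well.
    """
    by_plate = defaultdict(list)
    for plate, _, value in readouts:
        by_plate[plate].append(value)
    # Per-plate mean and population standard deviation.
    stats = {p: (fmean(vs), pstdev(vs)) for p, vs in by_plate.items()}
    out = []
    for plate, well, value in readouts:
        mean, sd = stats[plate]
        out.append((plate, well, (value - mean) / sd if sd > 0 else 0.0))
    return out
```

After this step, two plates whose raw readouts differ by orders of magnitude but share the same pattern map to identical z-scores, which is the comparability shown in panel (b).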

Fig 3.

Stability analysis on simulated and biological data.

We assessed the stability of our random effects model using the Jaccard index and Spearman's correlation coefficient (y-axis) for the top i ∈ {10, 25, 50, 75, 100} ranked genes (x-axis), computed from 100 bootstrap samples. (a) For a low error variance σ² = 1, gene rankings are highly stable. Increasing the error variance leaves the correlations stable but reduces the Jaccard indices. The network diffusion is robust to increasing error variance, with similar Jaccard indices and correlations for medium and high error variances. (b) On the biological data set, increasing the number of viruses does not significantly reduce the Jaccard indices or correlations of the random effects model, except for the correlations of the top 10 genes. For the network diffusion, the Jaccard indices remain stable at around 60% as the number of viruses increases, whereas the correlations between bootstrap samples decrease with a higher number of viruses.
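One way to compute the two stability measures over a set of bootstrap rankings is sketched below. Assumptions: each bootstrap ranking is a full list of the same genes ordered by decreasing effect; the Jaccard index compares the two top-i sets; and Spearman's correlation compares the top i genes of one ranking with their relative order in the other. The paper may define the correlation differently, and all function names are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def jaccard_top(a, b, i):
    """Jaccard index of the top-i gene sets of rankings a and b."""
    sa, sb = set(a[:i]), set(b[:i])
    return len(sa & sb) / len(sa | sb)


def spearman_top(a, b, i):
    """Spearman correlation of a's top-i genes versus their order in b (i >= 2).

    Assumes every gene in a[:i] also appears in b.
    """
    genes = a[:i]
    pos_b = {g: k for k, g in enumerate(b)}
    # Re-rank the b-positions of these genes to 0..i-1, so both are permutations.
    order_b = sorted(genes, key=lambda g: pos_b[g])
    rank_b = {g: k for k, g in enumerate(order_b)}
    d2 = sum((k - rank_b[g]) ** 2 for k, g in enumerate(genes))
    return 1 - 6 * d2 / (i * (i * i - 1))


def stability(rankings, i):
    """Mean pairwise (Jaccard, Spearman) over all pairs of bootstrap rankings."""
    pairs = list(combinations(rankings, 2))
    return (fmean(jaccard_top(a, b, i) for a, b in pairs),
            fmean(spearman_top(a, b, i) for a, b in pairs))
```

Calling `stability(rankings, i)` once per i ∈ {10, 25, 50, 75, 100} gives one point per x-axis value in the figure.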

Table 2.

Top 20 host dependency and restriction factors, ranked by network diffusion with a restart probability of r = 0.35.

‘Ranking’ gives the rank after network diffusion, ‘Gene effect’ gives the effect sizes γ_g inferred by the hierarchical model, and the remaining columns give the virus-specific effects ρ_vg.
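Stage 2 propagates the gene effects over a protein-protein interaction network, and this table uses a restart probability of r = 0.35. A minimal sketch, assuming the diffusion is a random walk with restart on an unweighted, undirected network whose restart vector is the normalized absolute gene effects |γ_g|, so that genes absent from the screens start at zero. The function name, the edge-list format and the convergence settings are hypothetical.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, r=0.35, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected network (random walk with restart).

    edges: iterable of (u, v) pairs (assumed unweighted, undirected).
    seeds: dict gene -> non-negative score, e.g. |gamma_g|; the values must have
           a positive sum. Genes that occur only in the network start at zero.
    Returns a dict gene -> stationary visiting probability (sums to 1).
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(neighbors) | set(seeds)
    total = sum(seeds.values())
    p0 = {n: seeds.get(n, 0.0) / total for n in nodes}  # restart distribution
    p = dict(p0)
    for _ in range(max_iter):
        # With probability r jump back to p0, otherwise step to a random neighbor.
        nxt = {n: r * p0[n] for n in nodes}
        for u in nodes:
            deg = len(neighbors[u])
            if deg == 0:
                nxt[u] += (1 - r) * p[u]  # isolated gene keeps its mass (assumption)
                continue
            share = (1 - r) * p[u] / deg
            for v in neighbors[u]:
                nxt[v] += share
        converged = sum(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p
```

Sorting genes by the returned scores gives a diffusion-based ranking in which unscreened neighbors of strong hits can rise. A larger r keeps the scores closer to the screen-derived effects, while a smaller r lets the network structure dominate.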

Fig 4.

Effect matrix of pathogen-specific gene effect strengths ρ_vg.

The 25 strongest hits, sorted by absolute effect size |γ_g|, are shown. Each column represents one virus and each row one gene; each cell shows the effect ρ_vg of that gene's knockdown on that virus. For some genes, such as DYRK1B, PKN3, CDK6 or CSNK2B, the knockdown has either an all-positive or an all-negative effect on the viral replication cycles.
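Genes with a uniform sign across viruses, like those named above, can be flagged directly from the effect matrix. A small sketch, assuming the matrix is available as nested dictionaries of ρ_vg values; the function name and the input layout are hypothetical.

```python
def sign_consistent_genes(rho):
    """Return {gene: label} for genes whose virus-specific effects share one sign.

    rho: dict gene -> dict virus -> effect rho_vg (layout is an assumption).
    Genes with any zero effect, or with mixed signs, are not reported.
    """
    labels = {}
    for gene, by_virus in rho.items():
        values = list(by_virus.values())
        if not values:
            continue
        if all(x > 0 for x in values):
            labels[gene] = "all-positive"
        elif all(x < 0 for x in values):
            labels[gene] = "all-negative"
    return labels
```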

Fig 5.

Validation of UBC, PLCG1 and EP300 against a negative control and a positive control, PI4K, for HCV.

UBC and PLCG1 show significant p-values at the 5% level for all validated siRNAs. The positive control PI4K was also highly significant, while neither of the two siRNAs used for EP300 showed a significant trend.
