Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response

Robin van der Lee; Qian Feng; Martijn A. Langereis; Rob ter Horst; Radek Szklarczyk; Mihai G. Netea; Arno C. Andeweg; Frank J. M. van Kuppeveld; Martijn A. Huynen

doi:10.1371/journal.pcbi.1004553

Abstract

The RIG-I-like receptor (RLR) pathway is essential for detecting cytosolic viral RNA to trigger the production of type I interferons (IFNα/β) that initiate an innate antiviral response. Through systematic assessment of a wide variety of genomics data, we discovered 10 molecular signatures of known RLR pathway components that collectively predict novel members. We demonstrate that RLR pathway genes, among others, tend to evolve rapidly, interact with viral proteins, contain a limited set of protein domains, are regulated by specific transcription factors, and form a tightly connected interaction network. Using a Bayesian approach to integrate these signatures, we propose likely novel RLR regulators. RNAi knockdown experiments revealed a high prediction accuracy, identifying 94 genes among 187 candidates tested (~50%) that affected viral RNA-induced production of IFNβ. The discovered antiviral regulators may participate in a wide range of processes that highlight the complexity of antiviral defense (e.g. MAP3K11, CDK11B, PSMA3, TRIM14, HSPA9B, CDC37, NUP98, G3BP1), and include uncharacterized factors (DDX17, C6orf58, C16orf57, PKN2, SNW1). Our validated RLR pathway list (http://rlr.cmbi.umcn.nl/), obtained using a combination of integrative genomics and experiments, is a new resource for innate antiviral immunity research.

Author Summary

Viruses pose a continuous threat to human health, even though our immune systems have evolved to neutralize invading viruses. As part of the innate immune system, the RIG-I-like receptors (RLRs) are essential for detecting viruses during infection. Recognition of viral RNA by the RLRs triggers an antiviral response that inhibits viral replication, protects uninfected cells, and attracts specialized immune cells. Better understanding of the innate antiviral response may reveal novel targets for antiviral therapeutics and vaccine development. However, that requires knowledge about which genes and proteins are involved. In the present study, we systematically investigated the wealth of available genomics data (including gene expression, protein interactions, transcription regulation and genome sequences) and discovered no less than 10 distinctive properties of genes known to be part of the antiviral RLR pathway. By combining these properties in a statistical framework, we predicted 187 novel RLR pathway components. Our validation experiments showed that ~50% of the predicted candidate genes have a significant effect on antiviral signaling. These results, together with independent computational and literature-based confirmation, demonstrated the validity of our combined bioinformatics and experimental approach. Our study expands the collection of known antiviral genes, opening up new avenues for research into innate antiviral immunity.

Citation: van der Lee R, Feng Q, Langereis MA, ter Horst R, Szklarczyk R, Netea MG, et al. (2015) Integrative Genomics-Based Discovery of Novel Regulators of the Innate Antiviral Response. PLoS Comput Biol 11(10): e1004553. https://doi.org/10.1371/journal.pcbi.1004553

Editor: Bjoern Peters, La Jolla Institute for Allergy and Immunology, UNITED STATES

Received: September 3, 2015; Accepted: September 12, 2015; Published: October 20, 2015

Copyright: © 2015 van der Lee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited

Data Availability: All relevant data supporting the results of this paper are available in the Supplemental Tables and at http://rlr.cmbi.umcn.nl/.

Funding: RvdL, QF, ACA and MAH were supported by the Virgo consortium, funded by the Dutch government (FES0908), and by the Netherlands Genomics Initiative (050-060-452). QF and MAL were funded by personal grants from the Netherlands Organization for Scientific Research (NWO-017.006.043 and NWO-863.13.008, respectively). RS is supported by the Metakids Foundation. MGN was supported by an ERC Consolidator Grant (#310372). FJMvK was supported by a ECHO grant from the Netherlands Organization for Scientific Research (NWO-CW-700.59.007). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Viruses are a major cause of human disease, as highlighted by the pandemics of influenza viruses, HIV–1, and the current outbreak of the Ebola virus. Pattern recognition receptors (PRR) are among the first molecules that detect viruses during infection. The RIG-I-like receptors (RLRs, one class of PRRs) are part of the RLR pathway, which forms a crucial innate antiviral defense system [1,2]. Two RLRs, RIG-I and MDA5, reside in the cytosol where they recognize non-self 5’-triphosphate RNA molecules with short double-stranded regions and long double-stranded RNAs (dsRNA), respectively [3]. Activation of the receptors triggers a complex signaling network, key steps of which are the activation of the mitochondrial adapter MAVS, subsequent recruitment of the TBK1 and IKK complexes, phosphorylation/activation of IRF3 and NFκB, and translocation of these transcription factors to the nucleus. These steps ultimately lead to the production of type I interferons (IFNα/β) and proinflammatory cytokines, which are crucial for establishing an antiviral state in infected as well as neighboring cells, and also modulate the adaptive immune response [4]

The importance of the RLR system is further demonstrated by the observation that viruses of all types employ strategies to interfere with its activation, often at multiple steps [5,6]. Better understanding of viral interaction with the pathway has resulted in novel targets for the development of antiviral therapeutics and attenuated live vaccines, for example viruses lacking functional RLR antagonists [7]. Furthermore, mutations in RIG-I, MDA5, MAVS and other RLR pathway components are associated not only with strong susceptibility to infections, but also IFN-associated autoimmune disorders [8–10].

Previous studies into virus-host interactions and the innate antiviral pathways have used genomics approaches, often generating large data sets describing physical or genetic interactions [11–14]. Other publications have taken a comparative approach based on model organisms [15] or used over-expression screening systems [16,17]. Together, these studies have identified numerous genes with antiviral activity, including members of the RLR pathway. However, it remains important to systematically assess the quality of individual data sets as such screens report distinct sets of genes, often with limited overlap between them. Combining the many available genomics data sets in a statistical framework potentially allows for a more systematic discovery and categorization of genes involved in the RLR pathway. Indeed, Bayesian integration of large-scale data that includes weighing individual datasets for their predictive potential has been successful in other cellular systems, for example identifying novel protein interactions [18], mitochondrial disease genes [19], and small RNA pathway genes [20].

In this work we systematically exploit the wealth of available (gen)omics data, including transcriptomics and proteomics data, genome sequences, protein domain information, and functional genomics, to discover descriptive molecular signatures of the RLR pathway system. Bayesian integration of these data, together with comprehensive computational and experimental validation, confidently identifies novel genes involved in antiviral RIG-I signaling.

Results

The RIG-I-like receptor (RLR) pathway is a highly interconnected and diverse molecular system. We investigated whether available genomics data contain sufficient signal to accurately describe RLR pathway components, and whether such data could be used to prioritize novel genes for a possible role in RLR signaling.

Ten molecular signatures of RLR pathway components in genomics data

To discover molecular signatures that distinguish RLR pathway components from other genes, we explored a wide variety of genome-scale data describing different aspects of the virology and biology of the pathway. Some of these data we used directly, while other data were used as the basis for further calculations (Table 1). We quantitatively assessed the predictive power of each data set using a literature-curated standard of 49 known RLR pathway components from InnateDB [21] (‘RLR genes’, S1 Fig) and a set of 5,818 ‘non-RLR genes’ that are unlikely to be part of the pathway (i.e. genes with known functions not directly related to the innate antiviral response, such as development, housekeeping and neurological processes, see Methods). Below we describe 10 signatures for predicting novel RLR pathway components. The first five signatures are based on the relationship of RLR genes with viruses, whereas the second set of five signatures are based on properties of the RLR pathway itself.

Download:

Table 1. Ten molecular signatures from genomics data used for predicting novel RLR pathway components.

https://doi.org/10.1371/journal.pcbi.1004553.t001

Virus-based signatures

Positive selection in primates.

Viruses evade recognition or interfere with the immune systems of their hosts to achieve successful infection [7]. This involves interactions between viral and host proteins, the interfaces of which are under constant pressure to change [22]. We collected data on recurrent positive selection in the primate lineage, based on maximum likelihood analysis of sequence alignments [23]. RLR pathway components, e.g. the mitochondrial signaling adapter MAVS [24] and transcription factor IRF7 [25], are enriched for rapidly evolving genes (9% of 49 RLR genes) compared to genes that are unlikely to be part of the pathway (5% of 5,818 non-RLR genes, 1.7-fold enrichment, non-significant P = 0.38, one-tailed Fisher’s exact test, Fig 1A, Tables 1 and S1).

Download:

Fig 1. Bayesian integration of ten molecular signatures of RLR pathway components from genomics data.

(A) Distributions of the 49 known RLR pathway components (RLR genes, green) and 5,818 genes unlikely to be part of the pathway (non-RLR genes, red) across the 10 molecular signature data sets we identified as predictive of the RLR system (see also Table 1). Data sets were binned into discrete intervals and fractions of (non-)RLR genes add up to one. Arrows indicate the behavior of RIG-I across the data. The top five signatures describe the relationship of RLR signaling with viruses; the bottom five describe properties of the pathway itself. (B) Boxplots of the genome-wide integrated RLR score (Bayesian posterior probability score). Genes were grouped into one of five classes: known RLR genes (green, see [A]), components of other PRR signaling pathways (‘TLR, CLR, NLR, cytDNA’; purple), genes functioning in other aspects of the innate immune response (‘innate immunity’; blue), and non-RLR genes (red, see [A]). The remaining genes are classified as ‘other’ (gray). (C) The 50 genes with the highest RLR scores. Representative RLR and other innate antiviral response genes are indicated. The pie chart shows the occurrences of the different gene classes in the top 354 RLR ranks. (D) Receiver operating characteristic (ROC) curve illustrating the performance of the integrated RLR score (solid black line) and the individual molecular signatures (black dots) for predicting known RLR versus non-RLR genes. Sensitivity and specificity were calculated at various score thresholds (for the RLR score), or at specific thresholds that include all bins with positive likelihood ratio scores (for the individual data sets; see (A)). The asterisk denotes the sensitivity and specificity corresponding to a false discovery rate (FDR) of 57% (top 354 genes). Note that, to avoid circularity, the predictive ability of the co-expression, protein domain and RLR pathway PPI data sets in (A) and (D) was assessed using the set of TLR, CLR, NLR, cytDNA genes instead of the RLR genes (see Methods).

https://doi.org/10.1371/journal.pcbi.1004553.g001

Protein-protein interactions (PPI) with viruses.

The next signature is based on the physical interactions between host RLR pathway components and viruses. Viral proteins often interact with many host proteins during their infection cycle, including those involved in antiviral defense [26,27]. Extraction of virus-human PPIs from specialized databases [28] revealed ~2,600 human proteins that are reported to interact with at least one viral protein (virus-interacting human proteins, S2 Fig). These virus-interacting human proteins include the majority of RLR pathway components (35/49 = 71%), while they include a significantly smaller fraction of non-RLR genes (1,000/5,818 = 17%, 4.2-fold enrichment, P = 1.4 × 10⁻¹⁶, one-tailed Fisher’s exact test, Fig 1A and Table 1). Among the RLR genes, TRAF2 (4 PPIs), DDX3, MAPK9, and the NFκB subunit RELA (3 PPIs each) have reported interactions with the largest number of distinct virus species.

Viral miRNA target.

Another mechanism that viruses use to interfere with the antiviral activity of human cells is down-regulation of gene expression by miRNAs [29]. We collected 128 miRNAs encoded by nine, mainly herpes DNA viruses (S2 Table) [30], most of which have confirmed physiological relevance. Predicted target sites of these miRNAs to the 3’UTR of human transcripts were then used to calculate for each gene a score representing the likelihood that viruses affect its expression (viral miRNA targeting score). For example, our method assigned IKBKE (IKKε) a relatively strong viral miRNA targeting score of 2.7. Indeed, Kaposi’s sarcoma-associated herpesvirus miR-K12-11 has been shown to inhibit translation of IKKε transcripts, leading to suppression of interferon signaling [31]. Analysis of the viral miRNA targeting scores revealed that RLR genes tend to have stronger scores than non-RLR genes (P = 0.03, one-tailed Mann-Whitney U test). Although the statistical significance of this trend is only marginal, we included it as a molecular signature of RLR genes as it still provides a moderate enrichment over non-RLR genes (1.3-fold enrichment among genes with the strongest viral miRNA targeting scores, Table 1), and even weak features can substantially improve the predictions for novel RLR genes.

Differential expression upon infection.

Next, we asked whether RLR pathway genes are differentially expressed upon virus infection. To answer this, we used in-house gene expression data of human lung epithelial cells (A549) exposed to four respiratory viruses (respiratory syncytial virus, human metapneumovirus, parainfluenza virus, or measles virus), for which gene expression was measured at 6, 12 and 24 hours after infection (see Methods). Analysis of the transcriptomes revealed that many RLR genes (31%) underwent substantial expression changes (log₂ fold change >0.5) in cells infected with the respiratory viruses, compared to the uninfected cells. This compares to a much smaller fraction of non-RLR genes (9%, 3.5-fold enrichment, P = 1.3 × 10⁻⁵, one-tailed Fisher’s exact test, Fig 1A and Table 1). The differentially expressed RLR genes include well-known interferon-stimulated genes (ISGs) like ISG15, DDX58 (RIG-I), IRF7, IFIH1 (MDA5), and TRIM25, which are the top five RLR genes most induced by the respiratory viruses studied (log₂ fold change >1.5 compared to uninfected cells, S3 Fig). Thus, even though we expect many RLR genes to already be expressed before viral infection, their expression levels are reinforced in infected cells.

Antiviral host factor.

RNAi screening potentially allows the identification of host factors that limit virus replication, such as genes involved in the innate antiviral response, although most studies focus on hits with the opposite effect (i.e. factors required by viruses for replication) [11]. We performed a meta analysis of hits from seven large-scale RNAi studies in human cells, identifying 173 genes with antiviral activity against HIV–1, influenza, hepatitis C (HCV), West Nile, or enterovirus infection (S3 Table). In contrast to our expectation that RLR genes would be common among these antiviral host factors, this data set is one of the weaker predictors of RLR genes: the 173 antiviral host factors contain only a single gene (IRF3) that belongs to the set of 49 known RLR genes (~2%) compared to 56 of 5,818 non-RLR genes (~1%, 2.1-fold enrichment, non-significant P = 0.38, one-tailed Fisher’s exact test, Fig 1A and Table 1).

Pathway-based signatures

Co-expression with RLR pathway.

To aid in finding novel RLR genes, we screened >450 human expression studies for genes that co-express with known RLR pathway components using a two-step approach [32]. First, we weighed individual expression data sets for their propensity to predict new RLR genes: experiments in which the whole group of known RLR genes show high co-expression with each other received a higher weight and contributed most to the calculations. In the second step, we calculated the co-expression of all genes with the RLR genes. As expected, RLR genes display significantly higher co-expression scores with each other than with the rest of the genome or with non-RLR genes (P ≈ 10⁻²⁷ for both comparisons, one-tailed Mann-Whitney U test, S4 Fig). However, RLR genes also score higher than components of other PRR signaling pathways (Toll-like receptor [TLR], C-type lectin receptor [CLR], NOD-like receptor [NLR], and cytosolic dsDNA sensing [cytDNA] pathways; P = 4.5 × 10⁻¹³, one-tailed Mann-Whitney U test). Cross-validation by leave-one-out analysis confirms that the weighted co-expression approach retrieves RLR genes more readily than other PRR pathway genes, or genes involved in other aspects of innate immunity (S4D Fig), demonstrating specificity for identifying RLR genes in the co-expression data.

RLR pathway protein domain.

Analysis of RLR pathway protein sequences revealed the presence of 40 unique domains, 25 of which were significantly over-represented compared to the full human proteome (Benjamini-Hochberg-corrected Fisher’s exact P < 0.01, S4 Table). These include protein kinase domains (12-fold enrichment, P = 1.1 × 10⁻⁸; present in IKKα/β/ε, MAP kinases, TBK1), the TBK1/IKKi binding domain (TANK, TBK1BP1, AZI2), caspase and death domains (CASP8/10, FADD), IRF domains (IRF3/7), and the DExD/H box RNA helicase domain (15-fold enrichment, P = 2.7 × 10⁻³; RIG-I, MDA5, LGP2). We then assessed the domain organizations of all human proteins and determined a set of 711 proteins containing one or more of the domains enriched in RLR components. These proteins are predictive for RLR components with an enrichment score of 8.9 (Table 1).

Innate antiviral transcription factor (TF) binding motifs.

Signaling through the RLR pathway triggers the activation of key transcription factors (TFs) like IRF3, IRF7, AP–1 and NFκB. Activation of these TFs leads to the production of type I interferons and proinflammatory cytokines that eventually activate STAT1 and STAT2 [2]. STAT1 and STAT2 in turn stimulate transcription of interferon-stimulated genes (ISGs), which include many RLR pathway components. To further explore the transcription regulation of RLR pathway components, we analyzed their gene promoters for the presence of TF binding motifs that are highly conserved across the genomes of 29 placental mammals, such as primates, rodents and many farm animals [33]. Conserved IRF and NFκB motifs are highly abundant in the promoters of RLR genes (Fisher’s exact P = 3.3 × 10⁻³ and P = 2.0 × 10⁻⁴, respectively; S5 Table), suggesting the pathway is partly self-regulating as has been observed for individual components. Interestingly, a conserved IRF motif was detected not only in the promoters of IRF7 itself and in all three RIG-I-like receptor family members (DDX58 [RIG-I], IFIH1 [MDA5], DHX58 [LGP2]), but also in TRIM25, ISG15, and CYLD; three factors controlling RIG-I signaling activation by regulating the level of K63 polyubiquitination. In order to predict novel RLR components, we searched for genes containing conserved IRF binding motifs (several motif variants, collectively recognized by IRF1-9), STAT binding motifs (several motif variants, collectively recognized by STAT1-6), AP–1 binding motifs, or NFκB binding motifs (Fig 1A). We found 3,558 genes across the human genome containing one of these motifs in their promoters. This large number partially stems from similarities in the DNA binding preferences of TFs that belong to the same family, but does not mean that all identified genes are regulated by the RLR pathway. For example, STAT motifs not only occur in the promoters of ISGs, but also in genes involved in cellular proliferation, differentiation and apoptosis. Nevertheless, genes containing one of the four conserved TF motifs already show a good predictive value for RLR pathway components (enrichment score of 2.2, S1 Table). Genes with more than one motif are even more likely to be RLR genes: 789 genes contain two motifs (2.6-fold enrichment) and 161 genes contain three or all four motifs (2.4-fold enrichment).

NFκB activation mediator.

Host factors that regulate NFκB activation often also affect the RLR pathway. Indeed, the 154 hits that were picked up in a genome-wide siRNA screen of Epstein-Barr virus-induced NFκB activation [34] include a much larger fraction of known RLR genes (6/49 = 12%) than non-RLR genes (36/5,818 < 1%, 20-fold enrichment, P = 1.0 × 10⁻⁶, one-tailed Fisher’s exact test, Fig 1A and Table 1). Thus, these 154 NFκB activation mediators are likely to contain novel RLR pathway components as well.

RLR pathway PPI.

Finally, to find novel RLR genes we assessed the human protein interaction network connecting the RLR pathway. PPI databases [35] report 3,504 interactions between 1,750 unique proteins and 47 of the 49 RLR components, the only exceptions being DAK and NLRX1. Of the 47 RLR proteins with reported PPIs, 41 are involved in a total of 147 interactions within the pathway (i.e. between two pathway members). This network of RLR components has significantly more connections with each other than do random networks of the same size and interaction degree distribution (physical interaction enrichment score = 3.6, P < 1.0 × 10⁻⁶ [36]). Using the PPI data, we obtained for each human protein the number of interacting RLR pathway components (Fig 1A). We found a total of 1,397 proteins reported to interact with one or two RLR proteins. These proteins are predictive for RLR components with an enrichment score of 1.8 (S1 Table). A further 221 proteins interact with three of four RLR proteins (6.3-fold enrichment) and 132 proteins interact with five or more RLR proteins (15.8-fold enrichment). TBK1 (18 interactions), TRAF2 and MAVS (both 16) top the list, supporting their roles as central players in the RLR system [12]. Thus, an increasing number of interactions with RLR proteins indicates a higher likelihood that a protein is part of the RLR pathway.

Bayesian integration of molecular signatures provides genome-wide probabilities for RLR pathway components

The RLR pathway components published thus far probably constitute only part of the total proteins with a function in this pathway. To prioritize novel high-confidence genes for a role in the RLR pathway, we integrated the 10 identified molecular signatures of RLR genes in a naive Bayesian classifier [18,19] (see Methods). This approach weighs data sets based on their predictive value (i.e. their ability to separate known positives and negatives; Fig 1A, Tables 1 and S1) so that ‘better’ data contribute more to the predictions. Each human gene received a posterior probability score (‘RLR score’) reflecting the likelihood that the gene is part of the RLR pathway based on its behavior in the collected genomics data. A score of zero indicates equal probabilities of a gene being an RLR versus a non-RLR gene. S6 Table presents the genome-wide ranking of RLR scores (also available at http://rlr.cmbi.umcn.nl/).

As expected, known RLR pathway components have the highest RLR scores (Figs 1B and S5). Two-thirds (32/49) of these rank within the first 150 genes. The top ranking genes are IRF7, RIG-I, IKKε, subunits of NFκB, TRADD, TRAF2, MDA5, and IKKγ (NEMO) (Fig 1C). Other examples of well-described RLR pathway components include IRF3 (rank 51), ISG15 (52), MAVS (102), and LGP2 (114). Genes that are unlikely to play a role in the pathway (the set of non-RLR genes) generally have very low RLR scores, although some of these received high scores as well (Fig 1B). This is not unexpected, as even though this large set of genes was selected from function annotations generally unrelated to the innate antiviral response, this does not preclude that individual genes (also) function in the RLR pathway.

To gain insight into what kind of genes are present among the RLR predictions, we examined their functions by pathway and gene ontology enrichment analyses. The top 354 genes with the highest RLR scores (corresponding to high-confidence predictions, see below) have strong links with other pathways of the innate immune response, such as TLR, NLR, interferon, and cytokine signaling (S6 and S7 Figs). Antiviral defense functions are also among the most frequent and significant terms associated with the high-scoring genes (S8 Fig and S7 Table). Other important biological processes include various apoptosis-related functions, cancer and cell cycle pathways, and regulation of metabolic processes and protein localization. Furthermore, the top predictions include a wide range of protein families, notably proteasome subunits, ubiquitin(-like) conjugating enzymes, and genes involved in phosphatidylinositol signaling (which was recently shown to affect the type I IFN response [14]). Finally, 22% of the top predictions are induced in cells treated with interferons (i.e. they are interferon-stimulated genes, ISGs) and ~18% are part of the common host transcription response to pathogens (Table 2). Together, these observations indicate that our framework successfully predicts genes with a likely role in the innate antiviral response and suggests other cellular systems and functions required for this response.

Download:

Table 2. Overlap between innate (antiviral) response data sets and the top 354 RLR predictions excluding known RLR genes.

https://doi.org/10.1371/journal.pcbi.1004553.t002

Performance estimates and independent data establish the reliability of the RLR score

We further computationally assessed the reliability of the integrated RLR score by estimating the sensitivity, specificity and false discovery rate (FDR) of the predictions using the positive (RLR genes) and negative (non-RLR genes) standards. Integration of the data sets achieved better sensitivity and specificity than any of the individual data sets (Fig 1D), thereby enriching for RLR genes and depleting false positives (S5, S9 and S10 Figs). At an RLR rank threshold of 354 (RLR score -1.10), the framework correctly predicts 78% of the known RLR genes with a specificity of 98.4% (Fig 1D). At this threshold, only ~57% of the novel predictions are estimated to be false (S11 Fig, adjusted FDR to match the expected total number of genes involved in the RLR pathway, see Methods). This compares to a genome-wide false discovery rate (i.e. when predicting genes randomly) of ~99%. Thus, the integrated RLR score increases the probability of correctly identifying novel RLR genes by a factor of 43 compared to random classification.

Because we used the same gene sets for calculating the RLR scores and estimating the performance of the resulting predictions (i.e. without systematic cross-validation), there exists a danger of circular reasoning. Therefore, we also carefully validated the quality of the results using various independent and external data sets. First, we examined the high RLR scores for genes that have a known function in innate immunity, but not in the RLR pathway, and therefore were not part of our training set. Components of other PRR signaling pathways (TLR, CLR, NLR, cytDNA) have lower scores than RLR genes, but much higher scores than the rest of the genome (Fig 1B). The same is true for genes functioning in other aspects of the innate immune response (Fig 1B). Of the 225 novel predictions (i.e. those genes that are not part of the training sets) in the top 354 (FDR of 57%, see above), 142 (~63%) are part of these innate immunity gene lists (Fig 1C). Thus, the majority of high-scoring genes with no known link to the RLR pathway in fact have a function in other PRR pathways or other parts of innate immunity, supporting the relevance of our predictions.

Second, we compared our predictions to six recent data sets that are relevant to the innate (antiviral) response but that were in no way part of the RLR score calculations. The overlap with the 354 top genes, excluding known RLR genes, is significantly larger than expected by chance for all these data sets (Table 2). For example, the top predictions include: (i) 19 of 45 (42%) interferon-stimulated genes with validated antiviral activity against e.g. HIV–1, HCV, yellow fever, West Nile or chikungunya virus [16], (ii) 27 proteins from a set of 241 (11%) that interact with the type I IFN protein network during pattern recognition, among which are five confirmed modulators of IFNβ expression and antiviral activity [12], (iii) nine tripartite motif (TRIM) family genes, five of which enhance RIG-I-induced activation of IFNβ, NFκB and ISRE (IFN-stimulated response element) promoters [17], and (iv) 38 human proteins interacting with innate immune-modulating viral open reading frames (viORFs) from 30 viruses [37]. (v) Furthermore, the type I IFN response has recently been proposed to play a role in antifungal immunity [38,39] and the top RLR predictions are strongly enriched for genes expressed in PBMCs stimulated with the fungal pathogen Candida albicans: almost half (39/89 = 44%) of these occur in our top predictions (P = 4.7 × 10⁻⁴⁷, one-tailed Fisher’s exact test, Table 2). (vi) Finally, the overlap between our predictions and a genome-wide screen for regulators of RIG-I-mediated IFNβ production is, at only nine, marginal but significant (9/226 genes = 4%, P = 9.5 × 10⁻³, one-tailed Fisher’s exact test, Table 2) [14]. In summary, these diverse and independent experimental data support the validity of our integrated RLR score for predicting genes with a role in the innate antiviral response.

RNAi validation screens confirm the high predictive value of the integrated RLR score

To further determine the predictive power of our in silico predictions, we selected 187 candidate RLR genes for experimental validation (S6 and S8 Tables). These include 127 high-confidence candidates from the top 354, which have not been previously linked to the RLR pathway, supplemented with 60 candidates we selected from the top 1000 predictions, mainly on the basis of limited functional characterization in general (Fig 2A). Importantly, candidates with a known role in RLR signaling, other branches of PRR pathways, or apoptosis were excluded as we were most interested in finding novel components of the RLR pathway.

Download:

Fig 2. RNAi screens validate a role for the novel RLR candidates in RIG-I-mediated IFNβ induction.

(A) Flow chart of the RNAi validation screens. 187 candidate RLR genes were screened for RIG-I pathway activity in three different RNAi screens. In screens 1 and 2, HeLa cells stably expressing an IFNβ promoter-controlled firefly luciferase (Fluc) reporter were stimulated with a 5’-ppp-containing RIG-I RNA ligand. The 57 hits (15 up, 42 down) with the largest effect on IFNβ induction upon siRNA knockdown in screen 1 (stringent Z-score <-2 or >2) were tested again in screen 2 with a different set of siRNAs. The 19 top hits from screen 2 were then picked for screen 3, which is similar to the first two screens except that it measures IFNβ mRNA levels using quantitative real-time qRT-PCR. (B) Correlation between the negative control-based robust Z-scores of RNAi screens 1 and 2. The 57 top hits with Z-scores <-2 or >2 in screen 1 were tested again in screen 2 (purple data points). N.T., non-transfected; SCR, scrambled. (C) Overview of the 19 novel RIG-I pathway genes with the largest effects on IFNβ induction in screens 1 and 2 (Z-score <-2 in both screens). Black data points correspond to genes whose knockdown also causes a reduction in IFNβ mRNA levels in screen 3. (D) RNAi screen 3. 13 of the 19 top hits from screens 1 and 2 also reduce RIG-I-mediated IFNβ mRNA production (black bars). Experiments were performed in triplicate (n = 3). Bars (mean±SEM) display the fold induction of IFNβ mRNA (corrected for actin mRNA levels) compared to the mock-treated control. Statistical significance was assessed by one-way analysis of variance (ANOVA) followed by Dunnett’s multiple comparison test, comparing the values for each of the 19 test genes to the combined negative control conditions (scrambled and LGP2, red bars). ** P < 0.01; *** P < 0.001. (E) Correlation between the in silico integrated RLR score and the probability of experimental confirmation in RNAi screen 1. The dark purple line represents all 94 hits with Z-score <-1.25 or >1.25; the light purple line represents the top 57 hits with Z-score <-2 or >2. The 187 experimentally tested genes were rank-ordered based on the RLR score and precision was calculated sequentially as the fraction of validated hits among all tested genes having a certain RLR score or higher.

https://doi.org/10.1371/journal.pcbi.1004553.g002

For the selected candidates we performed a medium-throughput RNAi screen (RNAi screen 1) using HeLa cells stably expressing an IFNβ promoter-controlled firefly luciferase reporter (HeLa-IFNβ-Fluc). To activate the RLR pathway and induce Fluc reporter expression we used a known small 5’-ppp-containing RIG-I ligand [40]. This setup led to specific activation of RIG-I, as RIG-I or MAVS siRNA transfection, but not MDA5 or scrambled siRNAs, resulted in loss of reporter activity (Figs 2B, S12 and S13). All negative controls (non-transfected, scrambled and MDA5 siRNAs) scored within 1.25 median absolute deviations of the plate normalized IFNβ induction levels (Z-score cutoff <-1.25 or >1.25, Fig 2B). At this cutoff, siRNA knockdown of 94 candidates (50% of all candidates tested) affected RIG-I-mediated IFNβ induction (Figs 2A and S13A–S13D and S8 Table). Among these, knockdown of 59 genes decreased RIG-I-mediated IFNβ induction (down-hits) and 35 genes increased IFNβ induction (up-hits). It is important to note that the experimental approach only activates the RIG-I branch of the RLR pathway and will not confirm predicted RLR candidates that regulate MDA5 activation and downstream signaling to MAVS. Thus, among the 93 non-confirmed candidates, there might still be novel regulators of the MDA5-mediated IFNβ induction pathway, which should be further investigated. Altogether, the integrated RLR score is clearly a strong and reliable predictor for novel regulators of the RIG-I pathway.

From the 94 confirmed hits, we picked the 57 top hits with the largest effect (stringent Z-score <-2 or >2) for a second RNAi screen using a different set of siRNAs (RNAi screen 2, Fig 2A). In this second RNAi screen, only a single up-hit (7% of 15 up-hits tested) showed a Z-score >1.25. Besides this hit, two negative control wells also had a Z-score >1.25 (Figs 2B, S13E–S13H), which suggests that the single confirmed up-hit might be unreliable. The poor reproducibility of the up-hits might be attributed to the screening approach. For instance, we used a large amount of 5’-ppp-containing RIG-I ligand (see Methods), leaving limited room for increased pathway activation. In contrast, the second RNAi screen confirmed 26 down-hits at Z-score <-1.25 (62% of the 42 down-hits tested). Of these, 19 genes (45% of tested down-hits) could be confirmed at a conservative Z-score <-2 (Fig 2A, 2B and 2C, S8 Table). Taken together, the two RNAi screens, guided by the predicted RLR candidates, have substantiated the validity of our approach and have revealed potential novel regulators of the RIG-I receptor pathway.

To gain further understanding of how the 19 top hits affect RIG-I-mediated IFNβ promoter activation, another RNAi screen was performed (RNAi screen 3). In contrast to the first two screens, here we did not use the IFNβ promoter-controlled Fluc reporter translation as readout, but we measured IFNβ mRNA levels using quantitative real-time (qRT)-PCR. As expected, knockdown of RIG-I and MAVS abrogated 5’-pppRNA-induced IFNβ mRNA transcription, while MDA5 knockdown [40] and LGP2 knockdown, which regulates only the MDA5-mediated IFNβ mRNA transcription, had no effect (Fig 2D). Of the 19 top hits from the first two RNAi screens, 13 genes (68%) in this third screen again showed a reduction in RIG-I pathway activation. Nine of these showed a significant reduction (NUP98, TRIM14, C16orf57, PSMA3, G3BP1, DDX17, MAP3K11, SNW1, CDK11B; P < 0.01, one-way ANOVA with Dunnett’s post hoc test; Fig 2A and 2D), suggesting that these gene products play a so far uncharacterized role in the RIG-I signaling pathway upstream of IFNβ mRNA transcription.

In summary, using RNAi-based screening methods we validated more than 50% of the tested candidates. To further assess the predictive power of the in silico integrated RLR score, we ranked the experimentally tested genes based on their RLR score and sequentially calculated the fraction of hits (either considering all 94 hits from RNAi screen 1, or only the 57 top hits) among all tested genes having a certain RLR score or higher (Fig 2E). Higher RLR scores were experimentally confirmed more often, indicating that these indeed correspond to more confident predictions. Further analysis revealed that there is no molecular signature that solely explains the predictions of the validated hits; rather the integrated score of the 10 molecular signatures is important (S14 Fig).

Discussion

Knowledge about the constituents of biological systems and pathways is an essential step towards understanding their function in health and disease. In this study, we showed that existing biological data can be exploited to successfully identify novel components of a key intracellular defense pathway; the antiviral RIG-I-like receptor (RLR) pathway. The RLR pathway is important for detecting viral infections, and its dysfunction can increase susceptibility to infections with viruses [8] and fungi [39], but is also associated with autoimmunity [9,10]. We systematically investigated a large variety of genome-scale data for their ability to predict RLR pathway components, covering most of the important (gen)omics data types such as protein-protein interactions, gene (co-)expression, genetic interactions from RNAi screens, comparative genome analysis, and transcription regulation. In these data, we found five virus-based and five pathway-based molecular signatures of RLR pathway components, providing insight into the determinants of antiviral signaling and type I interferon production. Bayesian integration of the signatures led to the genome-wide prioritization of novel RLR pathway components. We subsequently validated the predictions by comparing them with various independent data sets and experimentally confirmed more than 50% of 187 selected novel RLR candidates for a role in RIG-I-stimulated IFNβ induction. These results reiterate the potential of computational assessment and combination of available biological data as a complementary approach to studies generating novel large-scale data sets.

Identification of predictive signatures using a knowledge-based approach

To identify defining signatures of RLR genes in genomics data, we largely depended on current knowledge of the biology of the RLR system and its relationship with viruses. For example, since previous studies had shown that viral antagonism of specific RLR pathway components is prevalent [5,6], one of the first features we investigated, and indeed established, was that human-virus PPIs are a general theme for the RLR pathway as a whole. Similarly, guided by previous observations, we demonstrated that RLR genes conform to the tendency of immunity genes to evolve rapidly and commonly contain innate antiviral TF binding motifs, such as IRF and NFκB, in their promoters.

We also included several criteria that are effective for many different biological systems, but were specifically aimed at predicting novel RLR pathway genes in our case, such as the RLR co-expression calculations and RLR protein domain occurrences. We decided not to include associations based on text mining of published literature (e.g. co-mentioning of gene names in abstracts), because such approaches in our hands only enriched for genes already known to be involved in the RLR pathway and therefore compromised our ability to identify novel candidates. Finally, we settled on using a total of 10 molecular signatures that are relevant and predictive for the RLR system. Inclusion of additional data sets, generated for example by future experimental techniques, and substitution of existing data with novel and improved versions, will likely refine this data-driven definition of RLR genes over time and lead to updated Bayesian RLR probabilities that could further improve prediction accuracy.

A major challenge in our study arises from the fact that the RLR pathway is highly interconnected with other intracellular pathways, such as other innate PRR pathways (e.g. TLR and cytosolic DNA sensing), the stress response pathway, mitogen-activated protein kinase (MAPK) signaling cascades (e.g. TRAF2 and 6 lead to the p38 MAP kinases), and apoptosis (e.g. via CASP8 and 10) (S1 Fig) [2,12,41,42]. Although our approach for predicting novel RLR components relied on a well-defined set of genes known to make up the core of RLR signaling, the overlap with other systems was a potential confounding factor. For example, most molecular signatures of RLR genes identified here, especially the virus-based properties such as PPIs with viruses, rapid evolution, and differential expression during infection, could also apply to genes involved in other aspects of antiviral immunity. Nevertheless, combination of the right signatures achieved reasonable specificity for RLR genes (Fig 1). Thus, we have extended an approach previously used for identifying components of membrane-enclosed organelles such as the mitochondrion [19] and showed that it is also possible to capture the complexity of a diverse and interconnected intracellular signaling pathway. The presented approach for identifying predictive signatures, followed by Bayesian integration, could potentially be applied to any cellular system.

Contributions of the individual signatures to the RLR predictions

Using the sets of known RLR and non-RLR genes, we could systematically assess the relative quality of the individual data sets for predicting novel RLR genes. Indeed the 10 molecular signatures have different predictive values as shown by the likelihood ratio scores (Fig 1A, Tables 1 and S1), and thus contribute with different weights to the integrated Bayesian RLR score. The data types with the strongest predictive value include NFκB activation mediators, RLR pathway protein domains, and both PPI signatures (PPIs within the RLR pathway and PPIs between human and viral proteins). In contrast to our expectations, antiviral host factors identified in high-throughput RNAi experiments had a relatively small contribution. Besides raw predictive ability, we also considered the coverage of the data sets. For example, there are only few (<200) NFκB activation mediators and antiviral host factors, while the data sets on RLR co-expression, viral miRNA targets, and innate antiviral TF binding motifs identified many more genes (>4,000). Integration of all data sets with their varying coverage and predictive value into a single RLR score resulted in a classifier that is superior to the individual data sets (Fig 1D). This is underscored by the observation that the individual signatures by themselves are unable to explain the predictions for the experimentally validated RLR candidates, and only the integrated RLR score explains all validated genes (Figs 2E and S14).

Independent studies validate additional RLR candidates

Aside from our own experimental validation strategies, recent independent studies have confirmed a role for 15 of our predicted RLR candidates in the RLR pathway during viral infection (Table 3). Most of these publications appeared during the course of our study, and thus were not part of the knowledge or data used for predicting novel RLR genes. For example, TRIM14 (RLR rank 491) has been demonstrated to interact with MAVS leading to activation of IRF3 and NFκB via IKKγ (NEMO) [43]. Indeed, our predictions marked TRIM14 as a strong candidate RLR gene and all our RNAi screens confirmed it as a component required for optimal RIG-I signaling (Fig 2). Two additional high-confidence RLR predictions for which we validated an effect in all three RNAi screens have recently been validated externally as well: G3BP1 [44] and CDC37 [13].

Download:

Table 3. Validations of our predicted RLR candidates by independent studies.

https://doi.org/10.1371/journal.pcbi.1004553.t003

Of the 15 genes recently described in the literature, 11 were part of the candidate RLR genes tested in our RNAi screens (Table 3). Of these, seven genes affected RIG-I-mediated IFNβ induction in RNAi screen 1 (Z-score <-1.25 or >1.25) and showed a consistent effect in RNAi screen 2. Therefore, our experimental screening condition appears to detect these described RIG-I pathway regulators with a sensitivity of ~64% (7/11). Furthermore, four out of four down-hits from our experiments (i.e. genes that decreased IFNβ induction when knocked down, hence positive regulators) that have been described in the literature were indeed described as positive regulators of RIG-I signaling (Table 3). Given that our experimental approach detected most, but not all, of the published RIG-I regulators, a substantial number of our predicted RLR candidates not validated by our RNAi screens might still play a role in for example a different cell type, downstream of type I IFN production, or regulate the pathway via MDA5/LGP2 activation. For example, RNF114 (RLR rank 181, Z-score RNAi screen 1 = 1.68, Table 3) is an ISG and therefore needs to be up-regulated via a positive feedback loop to fully contribute to RLR pathway stimulation [45]. This gene was not confirmed in all RNAi screens, perhaps because the time of RIG-I stimulation in our screens (6 hours) was simply too short. Similar biological reasons could limit the detection of an effect for other genes as well. Therefore we conclude that the hits identified in our RNAi validation experiments may be a conservative estimate of the number of correct RLR predictions.

Novel RIG-I pathway components DDX17 and SNW1 could regulate activation of transcription factors NFκB and IRF3

We identified 13 novel RIG-I pathway regulators that reduced IFNβ induction in all three RNAi screens (Fig 2). These include cell cycle gene CDK11B, heat shock protein HSPA9B, MAP kinase MAP3K11, proteasome subunit PSMA3, nucleoporin NUP98 [46], and the recently identified RLR regulators CDC37 [13], G3BP1 [44] and TRIM14 [43] (Table 3). The remaining five genes, DDX17 (DEAD box helicase 17), C6orf58, C16orf57 (USB1, U6 snRNA biogenesis 1), PKN2 (serine/threonine protein kinase N2), and SNW1 (SNW domain containing 1), are overall least characterized. To obtain a first suggestion about how these genes might regulate RLR signaling, we searched for connections with the known human and viral protein interaction networks. Next, we discuss the reported interactions of DDX17 and SNW1 with the RLR pathway.

DEAD box RNA helicase DDX17 was recently found to bind Rift Valley fever virus RNA and restrict viral replication in an interferon-independent manner [47]. Our data now suggest a role for DDX17 in RIG-I-mediated IFNβ production as well. DDX17 has reported protein interactions with two other RIG-I regulators identified in our study: CDC37 and CSNK2A1. Interestingly, DDX17 also interacts with the peptidylprolyl cis/trans isomerase PIN1 [48], which inhibits RIG-I-mediated IFNβ production by inducing degradation of IRF3 [49]. Furthermore, DDX17 was present among a set of ISG15-modified (ISGylated) proteins in HeLa cells treated with IFNβ [50]. Thus, DDX17 could function in IRF3 activation by acting as a negative regulator of PIN1 and might be regulated by ISGylation (Fig 3). Lastly, DDX17 seems to be a preferred target of viral interference, having reported interactions with six different viruses (e.g. HIV–1 Rev and influenza virus A NS1, Fig 3).

Download:

Fig 3. Human and viral protein interaction networks connecting the known RLR pathway with the newly identified RIG-I factors DDX17 and SNW1.

Human proteins are represented by circles, viral proteins by rounded rectangles (purple nodes). Green nodes represent known components of the RLR pathway. Orange nodes (DDX17 and SNW1) are novel RIG-I pathway components discovered in our study, which are connected to the RLR network through interactions with the green nodes. Edges between human proteins represent physical interactions (both low- and high-throughput) obtained from BioGRID Release 3.3 [54]. Interactions between human and viral proteins were obtained from the PHISTO database (29 Sep. 2014) [28]. See S1 Fig for a more complete representation of the RLR pathway containing the curated set of 49 known RLR genes. LaCV, La Crosse virus; EBV, Epstein-Barr virus; SFSV, Sandfly fever Sicilian virus; PRRSV, Porcine reproductive and respiratory syndrome virus; HPV, Human papillomavirus.

https://doi.org/10.1371/journal.pcbi.1004553.g003

SNW1 is an intrinsically disordered protein [51,52] that interacts with two other newly identified RLR regulators from our study, namely PKN2 [53] and C16orf57 [54]. SNW1 also interacts with the IKBKG (NEMO) protein [55], which is required for NFκB and IRF3 activation [56]. Given that our data shows that knockdown of SNW1 reduces IFNβ induction, SNW1 could be involved in NEMO regulation and thereby contribute to activation of the RLR pathway TFs, NFκB and IRF3 (Fig 3). The fact that SNW1 was also identified in a siRNA screen for mediators of virus-induced NFκB activation [34] strengthens this hypothesis. Further studies should be conducted to resolve the precise mode-of-action.

The genome-wide prioritization of RLR pathway components is a new resource for innate antiviral immunity research

We have validated the integrated RLR score with various experimental, literature and computational approaches. Our confirmations of a substantial fraction of the predicted RLR genes suggest the value of the prioritized list as a whole. The genome-wide prioritization of RLR pathway components is available in S6 Table and at http://rlr.cmbi.umcn.nl/, and can be used as a resource in several ways. For example, it can serve in the evaluation of data sets relevant to the innate antiviral and antifungal responses (Table 2). Many labs routinely consult internal data sets to decide which genes to study further. Comparison of such lists with for example high-scoring RLR candidates could provide insights into the quality of individual data sets for identifying antiviral genes and provide complementary hints about which genes could be important. Finally, the RLR resource could be used for prioritizing genetic variants in patients suffering from severe susceptibility to viral infections or inflammatory disorders caused by inappropriate production of type I interferons.

Conclusions

In this work, we have combined integrative genomics with experiments to discover 10 molecular signatures of a cellular signaling system that is central to human infectious disease: the innate antiviral RIG-I-like receptor (RLR) pathway. The described signatures span multiple layers of genomics data and provide new insights into the regulation of virus detection and immune signaling. Probabilistic integration of the data resulted in a confident genome-wide ranking of candidate RLR pathway genes. RNAi validation experiments confirmed 94 of 187 novel RLR candidates tested, including 13 novel factors with strong effects on antiviral signaling. These results, together with independent computational and literature-based confirmation, demonstrated the validity and high accuracy of our approach. Our study expands the collection of known antiviral genes, opening up new avenues for research into innate antiviral immunity.

Methods

Human reference proteome and mapping

All data sets were calculated for and mapped to a reviewed reference set of 20,245 human proteins from UniProtKB/Swiss-Prot, release 2011_11 [57]. This set consists of one manually annotated record for each validated protein-coding gene. Gene/protein identifier mapping was performed using a mapping table from the same UniProt release. Ambiguously mapped identifiers were curated manually.

Molecular signature data sets

To systematically define RLR pathway components, we mined genome-scale data from a wide variety of sources. The data describe different aspects of the biology of the pathway; from the DNA to the protein level, highlighting evolutionary processes, virus-host interactions, sequence families, etc. We finally settled on 10 data sets that collectively distinguish RLR pathway components from other genes (see Table 1 for an overview and brief descriptions):