a Current address: Department of Cellular and Molecular Pharmacology, University of California, San Francisco, California, United States of America
b Current address: Midwestern University-Arizona College of Osteopathic Medicine, Glendale, Arizona, United States of America
c Current address: Department of Microbiology, School of Medicine, St. George's University, Grenada, West Indies
d Current address: Laboratoire de Bioinformatique des Génomes et des Réseaux, Université Libre de Bruxelles, Bruxelles, Belgium
e Current address: Department of Biochemistry and Molecular Biology, Louisiana State University Health Science Center, New Orleans, Louisiana, United States of America
f Current address: Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada and Samuel Lunenfeld Research Institute, Mt. Sinai Hospital, Toronto, Ontario, Canada
Conceived and designed the experiments: NG HY MP ORR JAM EK FPR JRC JAD JQ DEH KM MV ALB. Performed the experiments: AD DB RF ORR AB BZ KWH KH MG AC RR. Analyzed the data: NG HY MP JCM BS NS. Contributed reagents/materials/analysis tools: DSL MAC PB NAC JQ KM. Wrote the paper: NG HY MAC MEC DEH KM MV ALB.
¶ Member of the Genomic Variation and Network Perturbation Center of Excellence in Genomic Science, Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.
The authors have declared that no competing interests exist.
Many human diseases, arising from mutations of disease susceptibility genes (genetic diseases), are also associated with viral infections (virally implicated diseases), either in a directly causal manner or by indirect associations. Here we examine whether viral perturbations of the host interactome may underlie such virally implicated disease relationships. Using as models two different human viruses, Epstein-Barr virus (EBV) and human papillomavirus (HPV), we find that host targets of viral proteins reside in network proximity to products of disease susceptibility genes. Expression changes in virally implicated disease tissues and comorbidity patterns cluster significantly in the network vicinity of viral targets. The topological proximity found between cellular targets of viral proteins and disease genes was exploited to uncover a novel pathway linking HPV to Fanconi anemia.
Many “virally implicated human diseases” (diseases for which there is scientific consensus of viral involvement) are associated with genetic alterations in particular disease susceptibility genes. We proposed and demonstrated that for two human viruses, Epstein-Barr virus and human papillomavirus, topological proximity should exist between host targets of viruses and genes associated with virally implicated diseases on host interactome networks (local impact hypothesis). For representative EBV- and HPV16-implicated diseases, genes in the neighborhood of viral targets in the host interactome have significantly shifted expression levels in virally implicated disease tissues, in line with the local impact hypothesis. The viral neighborhoods in the host interactome, along with their disease associations, defined as “viral disease networks”, contain connections known to be informative about disease mechanisms as well as diseases whose associations with viruses are not yet known. We prioritized these diseases for their candidacy as potential virally implicated diseases based on network topology, and benchmarked this prioritization of candidate diseases using relative risk measurements, which capture population-based clinical associations between candidate diseases and viral infection. Exogenous expression of HPV viral proteins in a human cell line offered evidence for a novel disease pathway that links HPV to Fanconi anemia.
Functional interactions between cellular targets of viral proteins and disease susceptibility genes
To test this hypothesis we focused on Epstein-Barr virus (EBV) and human papillomavirus (HPV) type 16, two human viruses that differ in their host tropism, genome and proteome size, and disease etiology. We find that the disease susceptibility genes of known virally implicated diseases are in the immediate network vicinity of the host proteins that are targeted by these viruses. We identified a viral disease module for each of EBV and HPV, representing a subnetwork of the interactome that contains key mechanistic pathways responsible for the observed virus-disease associations. A computational prioritization procedure, combined with large-scale comorbidity and expression pattern analyses, identified new potential mechanistic disease pathways. To validate several of these pathways, HPV16 E6 and E7 oncogenes were independently expressed in primary human fibroblast (IMR90) and keratinocyte (HFK) cell populations to identify disease-associated genes whose expression levels were significantly altered in these E6/E7-expressing cell populations. We identified a novel pathway that links HPV to a specific form of Fanconi anemia. The systematic network-based framework we applied helps to decipher the interplay between viruses and disease phenotypes.
We define as “virally implicated diseases” those diseases whose association with a particular virus is supported by peer-reviewed publications in the literature. This list includes not only diseases for which there is universally accepted consensus that a virus is causal (such as cervical cancer for HPV16 and Burkitt's lymphoma for EBV), but also diseases which have some reproducible evidence of viral association but for which the mechanistic pathways are not worked out. There is significant and legitimate controversy and subjectivity regarding which diseases are virus-associated or virally implicated, so to avoid infusing personal bias in the selection process, we turned to several recently published authoritative review articles
Most of the selected virally implicated diseases (13 for EBV and 9 for HPV16) are genetic diseases in that they have been associated with mutations in at least one human gene (
EBV-implicated diseases | Mapped genes | ICD-9 code(s) |
1. B cell lymphomas incl. Burkitt's lymphoma | | 200 |
2. Breast cancer | | 174, 217, 239.3 |
3. Hemophagocytic lymphohistiocytosis | | 288.4 |
4. Hepatocellular carcinoma | | 155, 211.5 |
5. Lung cancer | | 162, 231 |
6. Nasopharyngeal carcinoma | | 147 |
7. Severe combined immunodeficiency | | 279.2 |
8. Stomach carcinoma | | 151 |
9. T cell lymphomas | | 202 |
10. Classical Hodgkin lymphoma | | 201 |
11. Salivary carcinoma | | 142 |
12. Wiskott-Aldrich syndrome | | 279.12 |
13. X-linked lymphoproliferative disorder | | 238.79 |
14. Infectious mononucleosis | | 075 |
15. Lymphocytic interstitial pneumonia | | 516.8 |
16. Oral hairy leukoplakia | | 528.6 |
17. Thymus carcinoma | | 164 |
HPV-implicated diseases | Mapped genes | ICD-9 code(s) |
1. Bladder cancer | | 188 |
2. Breast cancer | | 174, 217, 239.3 |
3. Colon cancer | | 153 |
4. Head and neck squamous carcinoma | | 173 |
5. Ovarian cancer | | 183, 220 |
6. Prostate cancer | | 185, 233.4 |
7. Squamous cell carcinoma of the lung | | 162, 231 |
8. Carcinoma of cervix uteri | | 180, 233.1 |
9. Laryngeal carcinoma | | 161 |
10. Bowen disease | | 230–234 |
11. Conjunctival carcinoma | | 190.3 |
12. Intraepithelial neoplasia | | 233 |
13. Oral carcinoma | | 140, 141 |
14. Oral leukoplakia | | 528.6 |
To explore the role of macromolecular networks in virus-disease associations we collected four categories of biological connections: 1) lists of previously published experimental virus-human protein-protein
To test our hypothesis that genes associated with virally implicated diseases are located in the network vicinity of viral targets (
The relative shortness of the paths from viral targets to disease genes supports the hypothesis that genes in the “neighborhood” of viral targets are more likely to be associated with virally implicated diseases than genes in distant regions of the host interactome. Still, given the small-world nature of the interactome, large numbers of proteins are within a few hops of the viral targets, potentially implicating hundreds of diseases for which there is no known relationship to HPV or EBV. Accordingly, a procedure is needed to identify the set of host cellular components (genes, proteins, and metabolites) that are most likely impacted by the virus, representing the network neighborhood of viral targets. Two questions follow. Do the three kinds of interactions used to build the interactome (protein-protein, metabolic and regulatory) play a comparable role in linking viral targets to virally implicated diseases? And how deep into the interactome should one go, keeping in mind that most proteins are approximately three links from the viral proteins?
To find the optimal neighborhood responsible for the phenotypic impact of a virus, we tested several “configurations”, each of which sets the maximum number of hops allowed from the viral targets for each type of biological interaction. The simplest configuration includes only the viral targets, while the more extended configurations allow an increasing number of hops along the links of the interactome network, and so connect an increasing number of proteins. The best configuration, as measured by the odds ratio of the enrichment of virally implicated diseases, defined the optimal neighborhood as the viral targets themselves and the genes regulated by them, and was the same for both viruses (Figure S4A,B in
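The idea of a hop-limited configuration can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the text: each interaction type is explored independently by breadth-first search and the results are merged, the adjacency maps are directed (for example regulator to regulated gene), and the node names and the function name `hop_limited_neighborhood` are hypothetical.

```python
from collections import deque


def hop_limited_neighborhood(targets, adjacency_by_type, max_hops):
    """Collect nodes reachable from the viral targets within a per-type hop limit.

    targets           -- iterable of viral target identifiers
    adjacency_by_type -- {interaction type: {node: [neighbors]}} (toy structure)
    max_hops          -- {interaction type: maximum hops allowed for that type}
    """
    neighborhood = set(targets)
    for edge_type, limit in max_hops.items():
        adjacency = adjacency_by_type.get(edge_type, {})
        depth = {t: 0 for t in targets}      # hop count from the nearest target
        queue = deque(depth)
        while queue:
            node = queue.popleft()
            if depth[node] == limit:         # do not expand past the hop limit
                continue
            for nxt in adjacency.get(node, ()):
                if nxt not in depth:
                    depth[nxt] = depth[node] + 1
                    queue.append(nxt)
        neighborhood.update(depth)
    return neighborhood


# Toy interactome with hypothetical node names.
toy_network = {
    "regulatory": {"T1": ["G1", "G2"], "G1": ["G3"]},
    "ppi": {"T1": ["P1"], "T2": ["P2"]},
}
viral_targets = ["T1", "T2"]

# Configuration matching the one reported as best above:
# the viral targets plus the genes they regulate (one regulatory hop).
best = hop_limited_neighborhood(viral_targets, toy_network,
                                {"regulatory": 1, "ppi": 0})
# A more extended configuration pulls in more of the network.
extended = hop_limited_neighborhood(viral_targets, toy_network,
                                    {"regulatory": 2, "ppi": 1})
```

In this toy case the first configuration returns the two targets and the two genes they regulate directly, while the extended configuration also reaches the second-hop regulated gene and the direct protein interaction partners.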
According to the local impact hypothesis, the genes regulated by viral targets should have significantly altered expression levels in virally implicated disease tissues within the viral disease modules. To test this, we collected microarray gene expression data for two representative EBV-implicated diseases, Burkitt's lymphoma and B cell lymphoma
Given the high interconnectivity of the host interactome, the number of all potential distinct paths linking viral targets to genes (or gene products) associated with virally implicated diseases exceeds 10^200 for both viruses (
The neighborhoods of viral targets in the host interactome, along with their disease associations, represent “viral disease networks” (
The uncovered viral disease networks contain several diseases that have not been previously associated with infection by the corresponding viruses (grey squares in
To independently benchmark the prioritization of candidate diseases, we turned to relative risk measurement
To demonstrate the value of the network-based approach to generate new biological hypotheses, we explored whether the cellular perturbations induced by expression of individual viral proteins are similar to those seen in particular disease phenotypes. We generated primary human keratinocyte (HFK) populations with stable expression of the HPV16 E6 or E7 oncoproteins and analyzed the gene expression profiles of multiple independent samples for these cells in concert with expression data from IMR90 cells expressing HPV16 E6 or E7 proteins (
Seven out of 39 diseases have high relative risks among HPV patients (
The clinical connection between Fanconi anemia (FA) and HPV-associated tumors has been subject to debate. What is not debatable is that FA patients have a much increased risk of developing squamous cell carcinomas (SCCs) at anatomical sites infected by HPVs. Our analysis does not necessarily mean that SCCs in Fanconi patients are caused by HPV, but rather that they arise by similar molecular mechanisms. The well-documented interplay between E7 and FA and our discovery of a possible connection between E6, FANCC and BRCA1 support this hypothesis. Moreover, using the US-wide Medicare data, we observe a relative risk of 3.7 for Fanconi anemia among female HPV patients (mostly cervical cancer patients), which further supports the identified molecular-level relationship between Fanconi anemia and HPV (
Given the large number of functional interactions present in human cells and the many possible paths among cellular components, uncovering the precise impact of a virus upon the host interactome is an enormously complicated task. Here we provide evidence that a large proportion of the effect of a virus can be accounted for locally in the network space, which allowed us to develop and test a general methodology designed to elucidate the consequences of viral impacts on the host interactome network, and to prioritize candidate diseases for potential viral implications.
A predictive methodology should ideally take into account cell tropism. Tissue-specific gene expression data can be merged with our analysis (
The strategy developed here is not unique to EBV and HPV16. Although the strategy should work better for carcinogenic pathogens, given how well-studied proteins involved in cancer are, it is equally applicable to any pathogen for which protein interactions between the pathogen and the host proteome have been mapped. While still limited by the incompleteness of genome- and proteome-scale datasets
Yeast two-hybrid (Y2H) screens between EBV and HPV16 viral proteins and ∼12,200 human proteins encoded by a library of full-length human open reading frame (ORF) clones in Human ORFeome v3.1
Raw data of the gene expression datasets used (GSE2350, GSE2392 and GSE15156) was obtained from Gene Expression Omnibus (GEO)
To obtain the disease-associated genes that are differentially expressed in viral protein-induced cell populations, HPV16 E6 and E7 oncogenes were independently transfected into primary human fibroblast (IMR90) and keratinocyte (HFK) cell populations. Affymetrix Human Gene 1.0 ST and Human Genome U133 Plus 2.0 arrays, respectively, were used to measure gene expression profiles for five or more replicate samples in each of the cell types. Array data were normalized by RMA, batch effects were removed using ComBat, and the limma package in R/Bioconductor was used to identify differential expression.
Relative risk (RR) was calculated as the ratio between the observed co-occurrence and the probabilistically inferred (assuming independence) co-occurrence of two diseases, based on the patient medical history data from United States (U.S.) Medicare, which contains the clinical diagnosis record of each hospital visit (in ICD-9 codes) of 13 million U.S. patients aged 65 or older
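The ratio described above can be written out as a short sketch. Under independence, the expected number of patients carrying both diagnoses is n_a × n_b / n_total, so RR is the observed co-occurrence count divided by that value. The function names, the record layout (one set of diagnosis codes per patient) and the codes "X" and "Y" are illustrative assumptions, not the pipeline used in the study.

```python
def relative_risk(n_both, n_a, n_b, n_total):
    """Observed co-occurrence divided by the co-occurrence expected under independence."""
    if n_a <= 0 or n_b <= 0 or n_total <= 0:
        raise ValueError("prevalence counts and population size must be positive")
    expected = n_a * n_b / n_total   # expected number of patients with both diagnoses
    return n_both / expected


def relative_risk_from_records(records, code_a, code_b):
    """Compute RR from per-patient sets of diagnosis codes (toy record layout)."""
    n_total = len(records)
    n_a = sum(code_a in codes for codes in records)
    n_b = sum(code_b in codes for codes in records)
    n_both = sum(code_a in codes and code_b in codes for codes in records)
    return relative_risk(n_both, n_a, n_b, n_total)


# Eight hypothetical patients; "X" and "Y" stand in for two diagnosis codes.
toy_records = [
    {"X", "Y"}, {"X", "Y"}, {"X"}, {"X"},
    {"Z"}, {"Z"}, {"W"}, {"W"},
]
rr_counts = relative_risk(n_both=20, n_a=100, n_b=50, n_total=1000)
rr_records = relative_risk_from_records(toy_records, "X", "Y")
```

An RR above 1 means the two diagnoses co-occur more often than independence would predict. An RR near 1 indicates no detectable association at the population level.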
The statistical significance of the average shortest path between viral targets and genes associated with a given virally implicated disease was calculated by randomly sampling human diseases from OMIM (full table of disease in
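A random-sampling test of this kind can be sketched as follows. The details are assumptions made for illustration: distances are measured by breadth-first search from all viral targets at once on an unweighted graph, unreachable genes are ignored, the p-value is the fraction of randomly sampled diseases whose genes lie at least as close to the targets as the disease under test, and all names and the toy network are hypothetical.

```python
import random
from collections import deque


def distances_from(sources, adjacency):
    """Multi-source BFS: shortest hop count from any source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mean_distance(genes, dist):
    """Average shortest path from the viral targets to a gene set (unreachable genes skipped)."""
    reachable = [dist[g] for g in genes if g in dist]
    return sum(reachable) / len(reachable) if reachable else float("inf")


def empirical_p_value(disease, disease_genes, dist, n_samples=1000, seed=0):
    """Fraction of randomly sampled diseases that are at least as close as `disease`."""
    rng = random.Random(seed)
    observed = mean_distance(disease_genes[disease], dist)
    background = [d for d in disease_genes if d != disease]
    hits = 0
    for _ in range(n_samples):
        sampled = rng.choice(background)
        if mean_distance(disease_genes[sampled], dist) <= observed:
            hits += 1
    return hits / n_samples


# Toy chain network T - a - b - c - d with one viral target "T".
toy_adjacency = {
    "T": ["a"], "a": ["T", "b"], "b": ["a", "c"],
    "c": ["b", "d"], "d": ["c"],
}
toy_diseases = {"near": ["a"], "mid": ["b"], "far1": ["c"], "far2": ["d"]}
toy_dist = distances_from(["T"], toy_adjacency)
p_near = empirical_p_value("near", toy_diseases, toy_dist)
p_far = empirical_p_value("far2", toy_diseases, toy_dist)
```

In the toy chain, no sampled disease is as close to the target as "near", so its empirical p-value is 0, while every sampled disease is at least as close as "far2", giving 1.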
Full list of interactions and gene-disease associations in viral disease networks, including the sources of data. (
(XLS)
OMIM genes and diseases and their corresponding ICD-9 codes.
(XLS)
Relative risk analysis.
(XLS)
Full list of diseases prioritized by the flow algorithm for (
(XLS)
Supporting information, tables and figures are provided in this document.
(PDF)
We thank X.S. Liu, T. Liu, H. Yu, C. Hidalgo, G. Adelmant and all members of the CCSB Center of Excellence in Genomic Sciences (CEGS) for useful discussions and suggestions.