Predicting Abdominal Aortic Aneurysm Target Genes by Level-2 Protein-Protein Interaction

Abdominal aortic aneurysm (AAA) is frequently lethal and has no effective pharmaceutical treatment, posing a great threat to human health. Previous bioinformatics studies of the mechanisms underlying AAA relied largely on the detection of direct protein-protein interactions (level-1 PPI) between the products of reported AAA-related genes. Thus, some proteins not suspected to be directly linked to previously reported genes of pivotal importance to AAA might have been missed. In this study, we constructed an indirect protein-protein interaction (level-2 PPI) network based on common interacting proteins encoded by known AAA-related genes and successfully predicted previously unreported AAA-related genes using this network. We used four methods to test and verify the performance of this level-2 PPI network: cross validation, human AAA mRNA chip array comparison, literature mining, and verification in a mouse CaPO4 AAA model. We confirmed that the new level-2 PPI network is superior to the original level-1 PPI network and proved that the top 100 candidate genes predicted by the level-2 PPI network shared similar GO functions and KEGG pathways compared with positive genes.


Introduction
Abdominal aortic aneurysm (AAA) is characterized by permanent abdominal aortic dilation, with the maximum diameter of the diseased aorta reaching 1.5 times that of the adjacent aorta. Due to a lack of apparent signs and symptoms, most AAAs, when first diagnosed, are at risk of rupture and hemorrhage with a very high fatality rate. This complex cardiovascular disease is modulated by multiple genes related to extracellular matrix degradation, oxidative stress, inflammation, and apoptosis [1]. To date, the only clinical treatment of AAA is invasive surgical repair, and no proven drug therapy is available. Exploring novel anti-aneurismal candidate genes could shed light on promising strategies for AAA prevention and therapy.
In general, functionally related genes are likely to cluster in the same networks [2]. Analysis revealed that genes related to a particular disease tend to have higher and more synchronized expression and tend to interact among each other [4] [5,6]; approximately 70-80% of proteins share at least one function with an interacting partner [7]. Furthermore, proteins of similar function and cellular location tend to cluster together. Approximately 63% of the interactions occur between proteins with common functions, and 76% occur between proteins in the same subcellular compartment [8].
Most protein-protein interaction (PPI)-based bioinformatics studies for predicting disease related genes are based on direct PPIs, although the systems and dimensions considered vary. A machine-learning approach [9] to analyzing protein-protein interaction data has become popular and has been applied to diverse biological problems, including gene classification [10], prediction of function, and cancer tissue classification. Some approaches used to predict disease genes are based on using combined PPI network topological features [11,12] to construct a combined classifier, or on analysis of protein sequences [5]. The candidate genes predicted by these methods must have at least one direct interaction with a known disease gene. Thus, the scope of annotation for a candidate gene is limited by the annotation of its interacting partners [13]. Therefore, previous research using level-1 PPI networks to predict candidate disease genes may omit genes without direct interactions with known disease genes but that share substantial functional similarities with level-2 neighbors [13].
Recently, it was shown that if two proteins do not interact directly but share more common interacting partners than two proteins chosen at random, these two proteins are likely to have close functional associations [14]. This is referred to as indirect functional association. Through the use of indirect interaction data and topological weight, researchers are able to augment the protein-protein interaction network, thereby improving the precision of clusters predicted by existing clustering algorithms [15]. Chua and colleagues reported the use of indirect protein-protein interactions between level-2 neighbors for the prediction of protein complexes [15]. However, whether this method can be extended to the prediction of human disease gene candidates remains elusive.
We propose a novel method to predict candidate AAA disease genes by assessing indirect protein interactions. By comparison with the original PPI network, we found that using a novel level-2 PPI network is superior to the original method in terms of both quantity and quality of the predicted candidate genes. Moreover, we verified our results in vivo in a CaPO4-induced aneurysm mouse model.

Data Sources and Preprocessing
Genes defined as Abdominal Aortic Aneurysm (AAA) disease genes were compiled from the Online Mendelian Inheritance in Man database [16] (OMIM, http://www.ncbi.nlm.nih.gov/omim/) and Genetic Association Database [17] (GAD, http://geneticassociationdb.nih.gov/). PPI data from the Human Protein Reference Database [18] (HPRD, http://www.hprd.org/) were manually extracted from the literature by expert biologists who read, interpreted and analyzed the published data. To avoid any bias toward well studied genes [19], we examined the PPI networks for all interactions detailed by the HPRD annotation (Fig 1).

Candidate Gene Score Counting Method
The interaction number between each candidate gene (both level-1 and level-2) and a positive gene is designated as follows: node "A" indicates a positive gene, while nodes "B," "C," and "D" indicate level-1 candidate genes and nodes "E" and "F" indicate level-2 candidate genes. The interaction number between node B, C or D and positive gene node A is 1; the interaction number between node E and positive gene node A is 1; and the interaction number between node F and positive gene node A is 3 (S1 Fig).
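The counting rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: S1 Fig is not reproduced here, so the edge list below is a hypothetical layout chosen to give the interaction numbers quoted in the text, and the function names are ours rather than the authors'. The level-1 number is taken as the count of direct interactions with positive genes, and the level-2 number as the count of interaction partners shared with positive genes.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into a gene -> set-of-partners map."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def level1_score(adj, gene, positives):
    """Number of direct interactions between `gene` and positive genes."""
    return len(adj[gene] & positives)


def level2_score(adj, gene, positives):
    """Number of interaction partners shared with positive genes,
    summed over all positive genes."""
    return sum(len(adj[gene] & adj[p]) for p in positives if p != gene)


# Hypothetical layout (an assumption, not taken from S1 Fig):
# A is positive; B, C, D bind A; E binds B; F binds B, C and D.
edges = [("A", "B"), ("A", "C"), ("A", "D"),
         ("B", "E"), ("B", "F"), ("C", "F"), ("D", "F")]
adj = build_adjacency(edges)
positives = {"A"}

level1 = {g: level1_score(adj, g, positives) for g in "BCD"}
level2 = {g: level2_score(adj, g, positives) for g in "EF"}
print(level1, level2)
```

With this layout, B, C and D each receive an interaction number of 1, E receives 1 (one shared partner, B), and F receives 3 (three shared partners), matching the values given in the text.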

Prediction of Candidate Genes
We constructed both level-1 and level-2 PPI networks to predict AAA disease genes (candidate genes). In the level-1 PPI network, the direct neighbors of all positive disease genes were named level-1 candidate genes. In the level-2 PPI network, the indirect neighbors (proteins with common interaction partners) of all positive disease genes were named level-2 candidate genes.
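A minimal sketch of how the two candidate sets could be derived from an interaction list is given below. The function name and the toy edge list are hypothetical. We also assume that a gene may belong to both candidate sets, since the text does not say that level-2 candidates exclude level-1 candidates.

```python
def candidate_sets(edges, positives):
    """Return (level-1 candidates, level-2 candidates) for a set of positive genes."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Level-1: direct neighbors of any positive gene.
    level1 = set()
    for p in positives:
        level1 |= adj.get(p, set())
    level1 -= positives

    # Level-2: genes that share at least one interaction partner
    # with a positive gene (neighbors of the positive gene's neighbors).
    level2 = set()
    for p in positives:
        for partner in adj.get(p, set()):
            level2 |= adj.get(partner, set())
    level2 -= positives

    return level1, level2


# Toy network: A is positive; G is three steps away and is not a candidate.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "E"), ("B", "F"), ("C", "F"), ("D", "F"), ("E", "G")]
l1_candidates, l2_candidates = candidate_sets(toy_edges, {"A"})
print(sorted(l1_candidates), sorted(l2_candidates))
```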

Animal Preparation
The animal experimental protocol was approved by the Institutional Animal Care and Use Committee (IACUC) of the Peking University Health Science Center (LA2015-142). All studies followed the guidelines of the Animal Care and Use Committee of Peking University. Appropriate analgesics were used on all animals to reduce surgical pain. Mice undergoing surgery to induce abdominal aortic aneurysm were anesthetized by intraperitoneal injection of pentobarbital (30-40 mg/kg) according to IACUC recommendations. After surgery, the animals were moved to a dry area with a warming pad and were monitored during recovery.
AAA was induced in mice with CaPO4 as previously described [20]. For the experimental group (n = 7), the infrarenal aortas of C57BL/6J mice (male, 12 weeks, purchased from Vital River, Beijing, China) were isolated and wrapped with gauze presoaked in 0.5 mol/L CaCl2 for 10 minutes. Then, the CaCl2-treated gauze was replaced with gauze presoaked in PBS for another 5 minutes. The abdominal cavities were washed with 0.9% NaCl before suturing. For the sham operation group (n = 7), both the CaCl2 and PBS solutions were replaced with 0.9% NaCl solution, and all other procedures were the same as in the experimental group. Aortas were collected 7 days later. Each aorta was photographed alongside a graduated ruler, and its maximum diameter was measured using Image Pro Plus 6.0 with the ruler as the scale reference.

Real-Time PCR
Real-time PCR amplification involved the use of an Mx3000 Multiplex Quantitative PCR System (Stratagene Corp, La Jolla, California) and SYBR Green I reagent. Products were normalized to an internal β-actin control. The primer sequences used for real-time PCR are provided in S1 Table.

Analysis of Functional Coherence
Cytoscape software [21] was used to visualize complex networks and to integrate attribute data. BINGO [22] was used to evaluate which Gene Ontology (GO, http://www.geneontology.org/) terms were enriched. We also tested whether candidate genes shared functions with positive disease genes to validate the associations between the top 100 level-2 candidate genes and the incidence of disease.
Web-based Gene Set Analysis Toolkit [2] (WebGestalt, http://bioinfo.vanderbilt.edu/webgestalt/) is an extensively used tool for functional enrichment analysis. We used it to compare level-2 candidate genes and positive disease genes with genes in KEGG pathways to identify significant pathways in which level-2 candidate genes and positive disease genes took part. A significance level of 0.01 was selected as the cutoff for selecting enriched pathway categories [11].
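To make the 0.01 cutoff concrete, the sketch below computes an over-representation p-value with a hypergeometric upper tail, which is a common choice for pathway enrichment. The text does not state which test or multiple-testing correction was configured in WebGestalt, so the choice of test, the function name, and all numbers in the example are assumptions for illustration only.

```python
from math import comb


def hypergeom_upper_tail(population, pathway_size, sample_size, hits):
    """P(X >= hits) when `sample_size` genes are drawn without replacement
    from `population` genes, of which `pathway_size` belong to the pathway."""
    total = comb(population, sample_size)
    upper = min(pathway_size, sample_size)
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 annotated genes, 5 in the pathway,
# 5 genes in the input list, 3 of which fall in the pathway.
p_value = hypergeom_upper_tail(20, 5, 5, 3)
is_enriched = p_value < 0.01  # cutoff quoted in the text
print(p_value, is_enriched)
```

In this toy case the pathway would not pass the cutoff, because three hits out of five are not unusual enough in so small a background.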

Statistical Analysis
All data are presented as the mean ± standard error of the mean (SEM). Statistical analysis was performed with Student's t-test for the CaPO4 AAA model. Cross validation was tested using the paired samples t-test. P<0.05 was considered statistically significant.

Results
In this article, we established a novel method for generating level-2 PPI networks to identify candidate disease genes by measuring the number of shared interaction partners of each gene in a PPI network. This method is detailed in Fig 1.

Level-1 and Level-2 PPI Network Construction
AAA-related genes were compiled from the OMIM and GAD databases. Together, 56 positive genes were identified (33 genes from OMIM and 37 genes from GAD; 14 genes were found in both databases). These genes were entered into the HPRD database, and level-1 neighbors known to directly interact with the genes were identified. These level-1 neighbors were in turn entered into the HPRD database, and their level-1 neighbors (that is, the level-2 neighbors of the disease genes) were found (Fig 1). In the HPRD database, 959 level-1 neighbors with 1,312 direct interactions and 5,730 level-2 neighbors with 42,615 indirect interactions were identified (Table 1). Level-1 neighbors, together with the disease genes and direct interactions, comprise the level-1 PPI network; level-2 neighbors, together with the disease genes and indirect interactions, comprise the level-2 PPI network. Of the 56 positive genes in the level-1 PPI network, only 34 had a direct interaction with another positive gene, accounting for 60.71% of all positive genes. Of the 56 positive genes in the level-2 PPI network, 52 had an indirect interaction with another positive gene, accounting for 92.86% of all positive genes.
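The counts in this paragraph can be checked with simple arithmetic. The short sketch below uses only the numbers quoted above; the variable names are ours.

```python
# Gene counts quoted in the text.
omim_genes = 33
gad_genes = 37
in_both = 14

# Size of the union of the two lists (inclusion-exclusion).
positive_genes = omim_genes + gad_genes - in_both

# Share of positive genes linked to another positive gene in each network.
pct_level1 = round(100 * 34 / positive_genes, 2)
pct_level2 = round(100 * 52 / positive_genes, 2)

print(positive_genes, pct_level1, pct_level2)
```

The union gives 56 positive genes, and the two proportions round to 60.71% and 92.86%, in agreement with the reported values.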

AAA-Related Gene Prediction and Topological Features of the PPI Network
Genes that encode proteins that interact directly or indirectly with the proteins encoded by known AAA-related genes were referred to as candidate genes. The number of interactions between every candidate gene and all known AAA-related genes was calculated, and the interaction score of every candidate gene was then obtained. Based on the probability distribution, we found that in both level-1 and level-2 PPI networks, positive genes possessed much higher interaction scores with known AAA-related genes than did candidate genes (Fig 2). More precisely, most candidate genes had fewer than 3 direct interactions or fewer than 10 indirect interactions with known AAA-related genes; positive genes and candidate genes showed significant differences in distribution when the direct interaction number was greater than 4, or when the indirect interaction number was greater than 10. Thus, level-1 candidate genes with greater than 4 interactions are very likely to be hub nodes. The level-2 candidate genes and AAA disease genes may share physical or biochemical characteristics that allow them to bind to the hub nodes [3]. The more interaction partners they share, the higher the probability that they share a common function. In the level-1 PPI network, 14.7% of the positive genes shared at least 4 direct interactions with an AAA-related gene, an 8-fold increase compared with the candidate genes (1.9%) (Fig 2A). In the level-2 PPI network, 17.3% of the positive genes shared at least 50 indirect interactions with an AAA-related gene, an 11-fold increase compared with the candidate genes (1.6%). As seen in Fig 2B, most positive genes shared over 50 indirect interactions with AAA disease genes. Therefore, level-2 candidate genes with more than 50 indirect interactions were defined as prior candidate genes.
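The fold differences and the threshold rule described above can be expressed compactly. In the sketch below, the percentages are the ones quoted in the text, while the function name, the gene labels, and the example scores are hypothetical.

```python
def prior_candidates(level2_scores, threshold=50):
    """Level-2 candidates with more than `threshold` indirect interactions,
    ordered from highest to lowest score."""
    selected = [g for g, s in level2_scores.items() if s > threshold]
    return sorted(selected, key=lambda g: -level2_scores[g])


# Fold increase of positive genes over candidate genes (percentages from the text).
fold_level1 = 14.7 / 1.9   # at least 4 direct interactions
fold_level2 = 17.3 / 1.6   # at least 50 indirect interactions

# Hypothetical scores: only genes strictly above 50 are kept.
example_scores = {"GENE_X": 120, "GENE_Y": 50, "GENE_Z": 51, "GENE_W": 7}
selected = prior_candidates(example_scores)

print(round(fold_level1, 1), round(fold_level2, 1), selected)
```

The two ratios come to roughly 7.7 and 10.8, which the text rounds to 8-fold and 11-fold.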
One of the prior candidate genes with over 50 indirect interactions and 0 direct interactions (i.e., not predicted by the level-1 PPI network) with AAA disease genes was chosen for further study. As shown in Fig 3, protein kinase C delta (PRKCD) was chosen as the center of a protein network. In this network, PRKCD had no direct link (i.e., interaction) with AAA disease genes (shown as red child nodes), but had common interacting partners (level-1 neighbors, shown as blue child nodes) with AAA-positive genes. Specifically, level-1 child nodes, such as the MAPK1, MAPK3, PAK1 and SHC1 genes, were likely to be hub nodes through which PRKCD and known AAA-related genes indirectly interact. In a published human AAA RNA chip array dataset (Reference Series: GSE7084, GSE47472) [23], PRKCD expression was significantly increased in AAA cases compared with controls. In addition, PRKCD expression was markedly upregulated in human AAA vessel wall and was shown to mediate VSMC MCP-1 expression [24], which could contribute to the vascular inflammatory process. Furthermore, recent studies in a mouse model of AAA have shown that PRKCD is an important signaling molecule in VSMC apoptosis and inflammation [25].

Cross-Validation
A holdout validation test was used to evaluate the performance of both level-1 and level-2 PPI networks and to select the optimal PPI network. Five genes were randomly deleted from the list of 34 positive genes (i.e., positive genes shared by level-1 and level-2 networks), and both PPI networks were reconstructed. This process was repeated 100 times. The average scores for the positive and candidate gene groups were obtained each time and the ratios of the two average scores were calculated. In turn, the ratio after 100 randomizations was determined (Fig 4). According to this analysis, the ratio for the level-2 group was significantly higher than that for the level-1 group, making it more suitable for screening candidate genes related to AAA.
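The holdout procedure can be sketched as follows. Several details are assumptions on our part, because the text does not spell them out: the "positive" average is taken over the held-out genes scored against the remaining positive genes, the candidate average is taken over all non-positive genes, and scoring uses the shared-partner (level-2) count. The toy network and all names are hypothetical.

```python
import random
from statistics import mean


def shared_partner_score(adj, gene, positives):
    """Level-2 style score: partners shared with the given positive genes."""
    return sum(len(adj[gene] & adj[p]) for p in positives if p != gene)


def holdout_ratios(adj, positives, score, n_holdout=5, n_rounds=100, seed=0):
    """Repeatedly hide `n_holdout` positive genes and compare their average
    score with the average score of the candidate genes."""
    rng = random.Random(seed)
    genes = sorted(positives)
    candidates = set(adj) - positives
    ratios = []
    for _ in range(n_rounds):
        held_out = set(rng.sample(genes, n_holdout))
        training = positives - held_out
        pos_avg = mean(score(adj, g, training) for g in held_out)
        cand_avg = mean(score(adj, g, training) for g in candidates)
        if cand_avg > 0:  # skip rounds where the ratio is undefined
            ratios.append(pos_avg / cand_avg)
    return ratios


# Toy network: 8 positives bound to hubs H0 and H1; candidate C4 binds H0;
# candidates C0-C3 bind an unrelated hub H2.
edges = [(f"P{i}", h) for i in range(8) for h in ("H0", "H1")]
edges += [("C4", "H0")] + [(f"C{i}", "H2") for i in range(4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
positives = {f"P{i}" for i in range(8)}

ratios = holdout_ratios(adj, positives, shared_partner_score)
print(len(ratios), mean(ratios))
```

A ratio well above 1 means that hidden positive genes are still scored much higher than ordinary candidates, which is the property used in the text to compare the two networks.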

Comparison with Existing AAA Gene Data
Human AAA RNA array data (Reference Series: GSE7084, GSE47472) obtained via Illumina and Affymetrix microarray platforms were used to generate global gene expression profiles for both aneurismal (AAA) and non-aneurismal abdominal aortas. The genes that were significantly differentially expressed between cases and controls were used in our analysis. Identified candidate genes from both level-1 and level-2 PPI networks were matched with the RNA array data, and the coincidence ratios of the candidate genes to the RNA array data in different top-level groups were calculated. There were 3,274 differentially expressed genes, including 235 level-1 candidate genes (6.3%) and 1,240 level-2 candidate genes (33.1%). The top 14 genes corresponded to level-1 genes scoring greater than or equal to 5 and level-2 genes scoring greater than or equal to 100, with overlaps of 28.6% and 42.9% with the AAA chip array data, respectively. The top 15-76 genes corresponded to level-1 genes scoring 3-4 and level-2 genes scoring 82-98, with overlaps of 30.8% and 26.9% with the AAA chip array data, respectively. The top 77-965 genes corresponded to level-1 genes scoring 1-2 and level-2 genes scoring 12-31, with respective overlaps of 24.2% and 26.5% with the AAA chip array data. The remaining level-2 candidate genes scoring below 12 accounted for 26% of the genes. The upper range was segmented according to integer score. It was found that the coincidence ratio of level-2 candidate genes to the microarray data was significantly higher than that of level-1 candidate genes to the microarray data, especially among the top 14 (Fig 5A).
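The coincidence ratio for a rank group can be computed as the fraction of candidates in that group that also appear in the differentially expressed gene set. The sketch below illustrates this with hypothetical gene labels; the function name is ours, and the grouping by rank follows the description above.

```python
def coincidence_ratio(ranked_candidates, differentially_expressed, start, stop):
    """Fraction of candidates ranked `start`..`stop` (1-based, inclusive)
    that are also in the differentially expressed gene set."""
    group = ranked_candidates[start - 1:stop]
    if not group:
        return 0.0
    return sum(g in differentially_expressed for g in group) / len(group)


# Hypothetical ranking of 14 candidates, 6 of which are differentially expressed.
ranked = [f"G{i}" for i in range(1, 15)]
de_genes = {"G1", "G2", "G3", "G4", "G5", "G6"}

top14_pct = round(100 * coincidence_ratio(ranked, de_genes, 1, 14), 1)
print(top14_pct)
```

For a group of 14 genes, 6 matches give 42.9% and 4 matches give 28.6%, which is consistent with the two overlaps reported for the top 14.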
In addition, literature available in PubMed was reviewed to determine whether candidate genes predicted by the two PPI networks were related to AAA. The candidate genes in the level-1 group scoring greater than 4 and those in the level-2 group scoring greater than 50 were selected. According to the literature, predicted candidate genes in the level-2 group had a higher coincidence ratio with published studies through the top 10 to top 75 groups, as shown in Fig 5B. The top 14 genes corresponded to level-1 genes scoring greater than or equal to 5 and level-2 genes scoring greater than or equal to 100, with respective overlaps of 28.6% and 42.9% compared to the AAA chip array data. The top 15-32 genes corresponded to level-1 genes scoring 4 and level-2 genes scoring 82-98, with respective overlaps of 28.6% and 42.9% compared to the AAA chip array data. The top 33-76 genes corresponded to level-1 genes scoring 3 and level-2 genes scoring 54-81, with respective overlaps of 28.6% and 42.9% compared to the AAA chip array data.

Analysis of Functional Coherence
Functional coherence between candidate genes and known disease genes was examined to verify associations between candidate genes and AAA. The KEGG database (WebGestalt, http://bioinfo.vanderbilt.edu/webgestalt/), which contains all signaling pathways in a Web-based Gene Set Analysis Toolkit platform, was utilized. The level-2 candidate genes and known AAA-related genes were entered, and a pathway enrichment analysis for both candidate genes and positive genes was performed. Eighty-five signaling pathways with enriched candidate AAA genes were obtained, and some showed marked enrichment, such as the ErbB signaling pathway, Focal Adhesion, Neurotrophin signaling pathway, TGF-β signaling pathway, MAPK signaling pathway, Adherens Junction, and B Cell Receptor signaling pathway (S3 Table).

Candidate Gene Expression in a CaPO4-Induced AAA Model
Five of the candidate genes at the top of the level-2 PPI network list that were unreported in the literature were tested and validated in a CaPO4-induced mouse abdominal aortic aneurysm model [30]. In the CaPO4-treated group, the infrarenal aorta was obviously expanded, while the aortas of NaCl-treated mice were morphologically normal (Fig 6A). The maximum diameter of the control (0.9% NaCl) group was 0.885 ± 0.077 mm, and that of the CaPO4-treated group was 1.391 ± 0.151 mm (p < 0.05) (Fig 6B). The elastic lamina disruption in the CaPO4-treated group was more severe than that in the NaCl group, as shown by Gomori staining (Fig 6C). Immunofluorescence staining showed that IL-6 and MCP-1 production in the aorta were also significantly increased in the CaPO4-treated group (Fig 6D). These data indicated the successful induction of infrarenal abdominal aortic aneurysm by CaPO4 treatment.
RNA was extracted from the abdominal aortic tissue of both groups and was reverse transcribed into cDNA. Five of the candidate genes predicted in the level-2 PPI network were verified using Real-Time PCR. Compared with the NaCl-treated group, CaPO4-treated aortas contained reduced levels of Osteonectin (SPARC), Matrix Metalloproteinase-14 (MMP-14) and Integrin beta 1 (ITGB1) mRNA expression, while Amyloid Precursor Protein (APP) and tumor protein p53 (TP53) showed no significant differences (Fig 6E).

Discussion
Abdominal aortic aneurysm is a complex cardiovascular disease in which multiple genes are involved [31,32]. In this study, we created a protein interaction database to predict AAA-related genes. We constructed a level-2 PPI network according to the interaction partners shared by positive genes and candidate genes, and successfully predicted a large number of undiscovered AAA-related genes.
In recent years, the amount of available protein interaction data has markedly increased, and many protein interaction databases have emerged. Using various disease gene prediction methods, it has been found that even when fixed position candidate strategies or gene chip arrays are utilized, protein interaction networks yield additional candidate genes in a convenient and accurate manner [33]. Genes and their interaction partners are likely to share similar functions, and their reciprocal links are often closely related to certain phenotypes or diseases [34,35]. Goh [3] and colleagues reported that proteins responsible for the same disease are inclined to cluster in a PPI network and that changes in neighbors of the disease genes are likely to contribute to the same or a similar disease. According to Josson [36] et al., the interaction probability between disease genes is much higher than that between non-disease genes, and disease genes are more prone to gather in clusters.
Direct protein-protein interaction (level-1 PPI) networks were widely used to predict the biological functions of proteins and disease related genes, especially in cancer and diabetes. However, at least one direct interaction with a known disease gene is required in a level-1 PPI network. Thus, genes that do not interact directly with known disease genes but share common interacting partners with those of a positive gene could be neglected.
In this study, we constructed a level-2 PPI network based on the number of shared interacting partners of known disease genes, then used it to predict candidate genes for abdominal aortic aneurysm (AAA). We found that most AAA candidate genes are likely to have the following characteristics: i) they tend to be hub nodes or to be bound to hub nodes in the network, indicating a greater link to disease genes; ii) they have a great number of indirect interactions with disease genes; iii) they have similar biological functions with known AAA genes according to GO functional categorization (S3 Fig); and iv) they often take part in common signaling pathways with known AAA-related genes. According to our analysis, our network of 56 positive genes contains 959 level-1 neighbors and 5,730 level-2 neighbors, and previously identified candidate genes tended to be hub nodes. Our functional enrichment analysis and experimental validation confirmed this inference. In our CaPO4-induced mouse AAA model, 3 (SPARC, MMP-14 and ITGB1) of the 5 chosen prior candidate genes were significantly down-regulated in the AAA model compared with controls, and the other 2 genes (APP and TP53) tended to decrease.
Recent studies of SPARC expression in intracranial aneurysm have reported conflicting results [37,38,39]. Here, we found for the first time that SPARC expression was decreased in the abdominal aorta of CaPO4-treated mice.
Reports on MMP-14 are also conflicting. Stéphanie Michineau et al. used a CaCl2 model (14 days) to induce AAA in mice and reported that MMP-14 increased [40]. In contrast, in the study by Hazem Abdul-Hussien, which used tissue from patients, a modest increase in MMP-14 was observed only in ruptured aneurysms, whereas in growing (unruptured) aneurysms the MMP-14 mRNA level was the same as that of normal controls [41]. In our study, the mice were sacrificed 7 days after CaPO4 treatment, and the aortas remained unruptured. We found that MMP-14 was down-regulated in the CaPO4-treated group 7 days after surgery. These discrepancies are probably due to differences in experimental conditions, such as the animal model, CaCl2 incubation time, the use of PBS, the number of days after surgery, and animal age.
Some previously reported genes that influence AAA, such as Epidermal Growth Factor Receptor (EGFR) [42], Smad family member 1 (Smad1) [43], Estrogen Receptor (ESR2) [44], and STAT family member 3 (STAT3) [45], were successfully predicted by our level-2 PPI network, but not by the level-1 PPI network. These reported genes were not positive in our network because they were identified in animals, whereas only human genome-wide association study (GWAS) reports were used in the construction of our networks.
Two difficulties remain in the prediction of disease genes based on PPI networks. One is the current inadequacy of PPI databases; approximately one-third are based on computer prediction, creating concerns regarding the reliability of the data. The other difficulty is that network topological features are difficult to quantitate. For example, a protein with a high degree might be a hub node, but it might also be a well-studied protein of lesser importance.
Further investigation of prior candidate genes and their interactions with positive genes may shed light on AAA nosogenesis and its underlying mechanisms. Future work with multiple AAA models and experimental methods is necessary to ascertain whether candidate genes are truly AAA-related and how they play a role in the disease.

Conclusions
We constructed a level-2 PPI network based on indirect (level-2) PPI neighbors with common interaction partners of known disease genes. The more interaction partners two proteins had in common, the higher the chance that they would share certain common functions. Level-2 candidate genes were screened, and those with more than 50 indirect interactions with positive disease genes were referred to as prior candidate genes. These prior candidate genes were found to share similar GO functions and KEGG pathways with positive disease genes. Thus, predicting disease-related genes using a level-2 PPI network based on indirect protein-protein interactions can serve as a guide for researchers to discover additional novel disease-related genes.