A Novel Prioritization Method in Identifying Recurrent Venous Thromboembolism-Related Genes

  • Jing Jiang ,

    Contributed equally to this work with: Jing Jiang, Wan Li, Binhua Liang

    ‡ These authors are joint first authors on this work.

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Wan Li ,

    Contributed equally to this work with: Jing Jiang, Wan Li, Binhua Liang

    ‡ These authors are joint first authors on this work.

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Binhua Liang ,

    Contributed equally to this work with: Jing Jiang, Wan Li, Binhua Liang

    ‡ These authors are joint first authors on this work.

    Affiliation National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada

  • Ruiqiang Xie,

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Binbin Chen,

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Hao Huang,

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Yiran Li,

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Yuehan He,

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Junjie Lv,

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

  • Weiming He ,

    chenlina@ems.hrbmu.edu.cn (LC); hewm@hit.edu.cn (WH)

    Affiliation Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Hei Longjiang Province, China

  • Lina Chen

    chenlina@ems.hrbmu.edu.cn (LC); hewm@hit.edu.cn (WH)

    Affiliation College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081

Abstract

Identifying the genes involved in venous thromboembolism (VTE) recurrence is important not only for understanding its pathogenesis but also for discovering therapeutic targets. We propose a novel prioritization method called Function-Interaction-Pearson (FIP), which creates gene-disease similarity scores to prioritize candidate genes underlying VTE. The scores were calculated by integrating and optimizing three types of resources: gene expression, gene ontology and protein-protein interaction. As a result, 124 of the top 200 prioritized candidate genes were confirmed in the literature, among which 34 were antithrombotic drug targets. FIP performed better than two well-known gene prioritization tools, Endeavour and ToppNet. The approach provides a valuable alternative for drug target discovery and disease therapy.

Introduction

Venous thromboembolism (VTE) is the third most common cardiovascular disease and carries a high risk of recurrence and mortality [1–5]. Around one-third of patients suffering a first episode of deep venous thrombosis (DVT) or pulmonary embolism (PE) have been reported to develop a VTE recurrence within 10 years [6]. Even during warfarin anticoagulant therapy, patients with a previous VTE still face a risk of recurrent VTE [7–9]. In clinical practice, it is helpful to identify biomarkers that aid the early diagnosis of patients at high or low risk of primary and recurrent VTE, and that help to assess therapy [10].

In the past, efforts have been made to find these biomarkers [11]. Through whole blood gene expression analysis, D-dimer [12], soluble P-selectin [13], and thrombin [14] were found to be strongly associated with an increased risk of recurrent VTE and were thus accepted as biomarkers of recurrent VTE [15,16]. However, determining biomarkers of recurrent VTE through whole blood gene expression analysis has limitations. First, the VTE patient population was a heterogeneous mixture of patients with provoked and non-provoked VTEs. Second, the two groups of VTE patients differed in the time since their last VTE as well as in the duration of warfarin therapy. Finally, some patients with a single VTE would likely be vulnerable to a recurrent event if anticoagulant therapy were discontinued, resulting in reclassification of any affected individual [16]. In addition, differential expression analysis might not determine which genes are more important, and could neglect some potential disease-related genes [17].

Alternatively, computational prioritization methods, including ToppNet (https://toppgene.cchmc.org/network.jsp) [18] and Endeavour (http://homes.esat.kuleuven.be/~biouser/endeavour/tool/endeavourweb.php) [19], have been used to investigate potential disease genes [20–22]. These methods assume that potential disease-related genes and known disease genes share functions, interact with each other, and are involved in similar phenotypes. Each studied gene is assigned a similarity or confidence score with respect to the disease, and the genes are then ranked in descending order of score. In general, these prioritization methods rely on functional annotations [23,24], network properties [25–28] and gene expression data [29–31]. ToppNet ranks or prioritizes genes based on topological features in the protein-protein interaction network (PPIN) and has performed well in several studies [32–34]. For example, Lascorz et al. applied ToppNet to identify markers of colorectal cancer. The three overrepresented genes were found to be closely related to the mitogen-activated protein kinase (MAPK) signaling pathways, which are well known to increase the risk of colorectal cancer [35]. In another study using ToppNet, the OPRM1 gene was shown to be significantly differentially expressed between different HIV groups [36]. The weakness of ToppNet is that it uses only one data source for ranking genes, which limits its robustness for candidate gene identification.

Inspired by the finding that integrative strategies combining distinct resources perform better in discovering disease-related genes [37–46], Endeavour was developed [47]. Endeavour integrates 19 distinct data sources, including Annotation (Gene Ontology, Swissprot, Interpro, Kegg, EnsemblEst), Interaction (Bind, String, BioGrid, Hprd, InNetDb, Intact, Mint), Expression (SonEtAl, SuEtAl), Precalculated (Ouzounis, Prospectr), Motif, Blast, and Text mining. The rankings of the candidates derived from each source are then combined into one global ranking. Robert et al. ranked differentially expressed genes with Endeavour and identified P2rx7 (ranked 2nd) and P2rx4 (ranked 3rd) as responsible for impaired blood pressure control in rats. The result was confirmed by Western analysis and was consistent with previous congenic studies [48]. In the study by Kamron et al., candidate genes for congenital cataract were prioritized using Endeavour, and the three top-ranked genes were confirmed by the literature to be associated with the disease [49]. One limitation of Endeavour is that it does not take disease samples into account [50], although the accuracy of prioritization methods is directly correlated with the quality of the data [51]. Moreover, Endeavour depends solely on the protein interactions defined in databases for gene prioritization. Many protein-protein links in these databases are very loose because structural or chemical properties and functionalities were not taken into consideration, which reduces the reliability of the protein interactions.

In this study, we present FIP (Function-Interaction-Pearson), a novel prioritization method designed for identifying recurrent VTE-related genes. FIP addresses the limitations of the commonly used gene prioritization methods described above. Potential VTE recurrence-related genes were identified as the top-ranked genes. Our study provides a valuable alternative for improving our understanding of the complex molecular mechanism of VTE recurrence at a system level.

Materials and Methods

Data Source

A gene expression profile of whole blood was downloaded from the publicly available Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) [52]. The profile GSE19151 on the platform GPL571 was selected for downstream analysis. GSE19151 contains 13785 genes measured in 133 samples from three groups: normal subjects (63), single-event VTE patients (32), and recurrent VTE patients (38) on warfarin therapy. Differentially expressed genes between normal and recurrent samples were identified using the Significance Analysis of Microarrays (SAM). A total of 119 thrombosis disease-related genes were obtained from Online Mendelian Inheritance in Man (OMIM, http://omim.org/) [53], the Genetic Association Database (GAD, http://geneticassociationdb.nih.gov/) [54] and Disease Ontology (DO, http://disease-ontology.org/) [55]. The interaction network used in this study was downloaded from STRING (http://string-db.org/) [56]. In the network, gene associations were derived either directly from physical interactions or from functional links supported by experimental evidence and computational methods [57,58]. The network comprises 5260 nodes (disease-related genes and differentially expressed genes present in the interaction network) and 42087 edges, which represent genes and the interactions between them, respectively. In our study, 108 disease-related genes (excluding 11 genes not found in both the STRING database and the profile GSE19151) were selected as seed genes, and the other genes were treated as candidate genes.

The FIP method

A novel prioritization method, FIP, was developed to prioritize VTE candidate genes by calculating gene-disease similarity scores, also called disease relevance scores q. Briefly, the disease relevance score of each gene was measured from its overall similarity with its neighboring genes in the disease-related network, based on three separate data sources: gene ontology, protein-protein interaction, and gene expression. The workflow of the method and its validation is described below (Fig 1).

Fig 1. The workflow of FIP method and its validations.

A: measurement of the overall similarity between genes. B: calculation of ranking scores of candidate genes. C: verification of the performance of the results.

https://doi.org/10.1371/journal.pone.0153006.g001

The score vector Q (n×1, where n is the total number of genes) holds the disease relevance scores of all genes in the disease-related network and is given by Eq (1). In Eq (1), $q_i$ ($q_i \in Q$) is the ranking score of gene i; I is the n×n identity matrix; e is the n×1 expression score vector, where $e_i$ is the absolute value of the difference between the summed expression values of gene i in normal samples and in recurrent samples; d is a control parameter in the range [0,1] that adjusts the weight of the disease-related network in the ranking scores (here we chose d = 0.9 [59]); and D is an n×n diagonal matrix in which $d_{ii}$ is the sum of the weights of the interactions between gene i and its neighboring genes in the network. The weights are held in the n×n matrix W, where $w_{ij}$ measures the overall similarity between gene i and its neighboring gene j in terms of interaction, expression and function. W is defined by Eq (2).

Here S(i,j), P(i,j) and F(i,j) denote the interaction credibility score, the Pearson correlation coefficient, and the shared functional significance score between gene i and gene j, respectively. The three coefficients α, β, and γ, each in the range [0,1], weight the importance of S(i,j), P(i,j), and F(i,j) in Eq (2), respectively.
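The exact forms of Eqs (1) and (2) do not appear in the text above, so the following is only a minimal sketch under two stated assumptions, not the authors' confirmed equations. It assumes (a) a linear combination $w_{ij} = \alpha S(i,j) + \beta P(i,j) + \gamma F(i,j)$ and (b) a GeneRank-style fixed point $Q = (1-d)\,e + d\,W D^{-1} Q$, which is consistent with the roles given to I, e, d, D and W. The function names `combine_weights` and `rank_scores` are hypothetical.

```python
def combine_weights(S, P, F, alpha, beta, gamma):
    """ASSUMED form of Eq (2): w_ij = alpha*S_ij + beta*P_ij + gamma*F_ij."""
    n = len(S)
    return [[alpha * S[i][j] + beta * P[i][j] + gamma * F[i][j]
             for j in range(n)] for i in range(n)]


def rank_scores(W, e, d=0.9, tol=1e-12, max_iter=10000):
    """Iterate q <- (1 - d) * e + d * W * D^-1 * q (ASSUMED form of Eq (1)).

    W is a symmetric, non-negative n x n weight matrix (list of lists),
    e is the expression score vector, d is the control parameter.
    """
    n = len(e)
    # d_jj: sum of the weights between gene j and its neighbours
    # (row sums; equal to column sums because W is assumed symmetric)
    deg = [sum(W[j]) for j in range(n)]
    q = list(e)
    for _ in range(max_iter):
        new_q = []
        for i in range(n):
            # score flowing into gene i from each neighbour j, scaled by d_jj
            spread = sum(W[i][j] * q[j] / deg[j] for j in range(n) if deg[j] > 0)
            new_q.append((1 - d) * e[i] + d * spread)
        if max(abs(a - b) for a, b in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q
```

Under these assumptions, ranking genes in descending order of the returned scores gives the prioritized list. With d close to 1 the network term dominates, while with d close to 0 the ranking reduces to the expression scores alone.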

The interaction credibility score S(i,j) for each pair of genes i and j was calculated as follows [58]:

\[ S(i,j) = C_{ij}^{\,t} \left( \frac{C_{ij}\, C_{\bullet\bullet}}{C_{i\bullet}\, C_{\bullet j}} \right)^{1-t} \tag{3} \]

where $C_{i\bullet}$ and $C_{\bullet j}$ are the sums over all pairs involving i or j and another entity, $C_{\bullet\bullet}$ is the sum over all pairs of entities, $C_{ij}$ is the sum over all pairs involving both i and j, and t = 0.6 [58]. The parameters were optimized on the KEGG benchmark set [58]. The co-occurrence score $C_{ij}$ was defined as:

\[ C_{ij} = \sum_{k} \left( v_d\,\delta_{dijk} + v_p\,\delta_{pijk} + v_s\,\delta_{sijk} \right) \tag{4} \]

where $v_d = 1$, $v_p = 2$, and $v_s = 0.2$ are the weights for genes co-occurring within the same document, paragraph, and sentence, respectively, based on literature mining. The delta functions $\delta_{dijk}$, $\delta_{pijk}$, and $\delta_{sijk}$ equal 1 if genes i and j are both mentioned in document k, in a paragraph of k, or in a sentence of k, respectively, and 0 otherwise [58].
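As an illustration of the co-occurrence scoring just described, the sketch below counts weighted co-mentions and then applies a STRING-style normalization, assuming the form $S = C_{ij}^{t}\,(C_{ij} C_{\bullet\bullet} / (C_{i\bullet} C_{\bullet j}))^{1-t}$ with t = 0.6. The nested-list document representation and the function names are hypothetical, chosen only to keep the example self-contained.

```python
from collections import defaultdict
from itertools import combinations

V_DOC, V_PAR, V_SEN = 1.0, 2.0, 0.2   # v_d, v_p, v_s as given in the text
T = 0.6                               # exponent t as given in the text


def _pairs(genes):
    """All unordered gene pairs in a set, as sorted tuples."""
    return {tuple(sorted(p)) for p in combinations(genes, 2)}


def cooccurrence_counts(documents):
    """documents: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of gene names."""
    C = defaultdict(float)
    for doc in documents:
        sentences = [s for par in doc for s in par]
        doc_pairs = _pairs(set().union(*sentences))
        par_pairs = set()
        for par in doc:
            par_pairs |= _pairs(set().union(*par))
        sen_pairs = set()
        for s in sentences:
            sen_pairs |= _pairs(s)
        # each delta contributes at most once per document k
        for p in doc_pairs:
            C[p] += V_DOC
        for p in par_pairs:
            C[p] += V_PAR
        for p in sen_pairs:
            C[p] += V_SEN
    return dict(C)


def credibility(C, i, j, t=T):
    """Assumed STRING-style score: C_ij^t * (C_ij * C.. / (C_i. * C_.j))^(1-t)."""
    c_ij = C.get(tuple(sorted((i, j))), 0.0)
    if c_ij == 0.0:
        return 0.0
    c_i = sum(v for pair, v in C.items() if i in pair)
    c_j = sum(v for pair, v in C.items() if j in pair)
    c_all = sum(C.values())
    return c_ij ** t * (c_ij * c_all / (c_i * c_j)) ** (1 - t)
```

The second factor rewards pairs that co-occur more often than their individual mention frequencies would suggest, so two frequently cited genes do not score highly merely because each appears in many documents.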

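The Pearson correlation coefficient P(i,j), which represents the co-expression relationship between gene i and gene j, was defined as follows:

\[ P(i,j) = \frac{1}{h-1} \sum_{y=1}^{h} \left( \frac{i_y - \bar{i}}{s_i} \right) \left( \frac{j_y - \bar{j}}{s_j} \right) \tag{5} \]

where h is the total number of normal and recurrent samples in the expression profile; $\bar{i}$ and $\bar{j}$ are the average expression values of genes i and j across the normal and recurrent samples; $s_i$ and $s_j$ are the corresponding standard deviations; and $i_y$ and $j_y$ are the observed expression values of genes i and j in sample y.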
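The co-expression term is the ordinary sample Pearson correlation, which can be sketched in a few lines. The function name and the choice to return 0.0 for a gene with constant expression are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two expression vectors of length h."""
    h = len(x)
    if h != len(y) or h < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / h
    mean_y = sum(y) / h
    # sample standard deviations (denominator h - 1)
    s_x = sqrt(sum((v - mean_x) ** 2 for v in x) / (h - 1))
    s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (h - 1))
    if s_x == 0 or s_y == 0:
        return 0.0  # assumption: a constant profile carries no co-expression signal
    return sum(((a - mean_x) / s_x) * ((b - mean_y) / s_y)
               for a, b in zip(x, y)) / (h - 1)
```

Here `x` and `y` would be the expression values of genes i and j across the pooled normal and recurrent samples.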

For the shared functional significance score F(i,j) between gene i and gene j, each function was represented by one GO term $f_m$. F(i,j) is defined as the sum of the significance of the shared functions:

\[ F(i,j) = \sum_{m=1}^{x} sig(f_m) \tag{6} \]

where x is the number of GO terms shared by genes i and j, and $sig(f_m)$ denotes the significance of function $f_m$, which is defined by Eq (7) in terms of $Gene(f_m)$, the set of genes annotated to GO term $f_m$, and $|Gene(f_m)|$, the number of genes in that set. We calculated the ranking score q for each gene in the disease-related network and ranked the genes in descending order of q.
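The sum over shared GO terms can be sketched as below. Because the definition of sig(f_m) is not recoverable from the text, the significance function is passed in as a parameter; the default used here, 1/|Gene(f_m)| (rarer terms count more), is purely a placeholder assumption and not the authors' Eq (7). The gene names in the usage note are toy labels, not real annotations.

```python
def shared_function_score(go_i, go_j, genes_by_term, sig=None):
    """F(i,j): sum of sig(f_m) over the GO terms shared by genes i and j.

    go_i, go_j     -- iterables of GO term IDs annotated to each gene
    genes_by_term  -- dict mapping GO term ID -> set of genes annotated to it
    sig            -- callable term -> significance (hypothetical hook for Eq (7))
    """
    if sig is None:
        # PLACEHOLDER assumption for Eq (7): inverse annotation-set size
        def sig(term):
            return 1.0 / len(genes_by_term[term])
    shared = set(go_i) & set(go_j)
    return sum(sig(term) for term in shared)
```

For example, with `genes_by_term = {"GO:0007596": {"g1", "g2", "g3", "g4"}, "GO:0030168": {"g1", "g5"}}`, two genes that share only GO:0007596 would score 0.25 under the placeholder, while two genes that share only GO:0030168 would score 0.5.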

In Eqs (1) and (2), all combinations of α, β, γ, and d were used to rank the candidate genes. The best combination of α, β, and γ was determined according to the number of seed genes recovered in the top 50 and top 100 of the ranking list. The best d value was then selected using the α, β, and γ combinations that performed best in ranking candidate genes.
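The parameter search can be pictured as an exhaustive grid search scored by the number of seed genes in the top of the list. The sketch below is a hedged illustration: the grid spacing is not stated in the text, so the 0.1 step is an assumption suggested by the reported values, and `score_fn` is a hypothetical callable standing in for the full FIP scoring.

```python
from itertools import product


def seeds_in_top(ranked_genes, seeds, k):
    """Number of seed genes among the first k entries of a ranked list."""
    return sum(1 for g in ranked_genes[:k] if g in seeds)


def grid_search(score_fn, genes, seeds, ks=(50, 100), step=0.1):
    """Try every (alpha, beta, gamma) on a regular grid in [0, 1].

    score_fn(alpha, beta, gamma) must return a dict gene -> q score
    (hypothetical interface). Returns (hits, (alpha, beta, gamma)) for the
    first combination with the most seeds in the top-k lists; hits are
    compared as tuples, so the first cutoff in ks takes priority.
    """
    n_steps = round(1 / step)
    grid = [round(i * step, 10) for i in range(n_steps + 1)]
    best = None
    for alpha, beta, gamma in product(grid, repeat=3):
        q = score_fn(alpha, beta, gamma)
        ranked = sorted(genes, key=lambda g: q[g], reverse=True)
        hits = tuple(seeds_in_top(ranked, seeds, k) for k in ks)
        if best is None or hits > best[0]:
            best = (hits, (alpha, beta, gamma))
    return best
```

Selecting parameters by seed recovery alone risks overfitting to the seed set, which is one reason to confirm the chosen combination with cross-validation.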

Validation

FIP was compared with ToppNet and Endeavour using the same data. Their performance was assessed using Leave-One-Out Cross Validation (LOOCV). In each round, one seed gene was removed from the seed set and added to the candidate genes as a test gene. All the candidate genes were then ranked to determine the rank of the test gene. This procedure was repeated until every seed gene had been used as a test gene. Receiver Operating Characteristic (ROC) curves were then plotted, and the area under the ROC curve (AUC) was used to compare the performance of the three methods.
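The leave-one-out procedure can be sketched as follows. `rank_fn` is a hypothetical callable standing in for any of the three prioritization methods, and the AUC summary shown (the average fraction of candidates ranked below the held-out seed) is one common rank-based formulation, assumed here rather than taken from the text.

```python
def loocv_ranks(seeds, candidates, rank_fn):
    """Hold out each seed in turn and record where it lands among the candidates.

    rank_fn(training_seeds, pool) -> list of genes in pool, best first
    (hypothetical interface). Returns a list of (rank, pool_size) tuples,
    with rank 1 being the top of the list.
    """
    results = []
    for test_gene in seeds:
        training = [s for s in seeds if s != test_gene]
        pool = list(candidates) + [test_gene]
        ranked = rank_fn(training, pool)
        results.append((ranked.index(test_gene) + 1, len(pool)))
    return results


def auc_from_ranks(results):
    """Average per-round AUC: fraction of candidates ranked below the test gene."""
    return sum((n - r) / (n - 1) for r, n in results) / len(results)
```

A value of 1.0 means every held-out seed was ranked first, and 0.5 corresponds to random ranking.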

Results

Optimization of ranking coefficient parameters

As described in the Methods, the score vector Q for all genes was calculated from three separate data sources, protein-protein interaction, gene expression, and gene ontology, weighted by the coefficients α, β, and γ, respectively. Candidate genes, which were the genes common to the disease-related network and the set of differentially expressed genes identified using SAM, were then ranked in descending order of their q value. For the top 50 and top 100 genes in the ranking list, we counted the number of matched seeds for combinations of one, two and three of these parameters. There was a significant difference between single- and multiple-parameter combinations in both the top 50 and the top 100, as well as between two- and three-parameter combinations in the top 50 of the ranking list (t-test, p<0.05) (Fig 2).

Fig 2. Violin plots of the number of matched seeds identified in top 50 and 100 of the ranking list.

The numbers 1, 2 and 3 indicate the number of parameters in each parameter combination. Asterisks mark our results and white circles mark the median of each combination.

https://doi.org/10.1371/journal.pone.0153006.g002

LOOCV was then applied to all parameter combinations, and four combinations (α = 0.8, β = 0.5, γ = 0.9 (AUC = 0.9107); α = 0.7, β = 0.5, γ = 0.8 (AUC = 0.9187); α = 0.9, β = 0.5, γ = 0.8 (AUC = 0.8955); α = 0.9, β = 0.6, γ = 0.8 (AUC = 0.8763)) performed better than the rest. Since no other independent VTE dataset could be obtained, 10-fold cross-validation was carried out to select the optimized parameter values in Eqs (1) and (2) from these four combinations (α = 0.8, β = 0.5, γ = 0.9 (AUC = 0.9013); α = 0.7, β = 0.5, γ = 0.8 (AUC = 0.8948); α = 0.9, β = 0.5, γ = 0.8 (AUC = 0.8527); α = 0.9, β = 0.6, γ = 0.8 (AUC = 0.8416)).

The optimal parameter combination of α = 0.8, β = 0.5, and γ = 0.9, was achieved (Fig 3).

Fig 3. The 3-D distribution of seed genes in top 50 (left) and top 100 (right) of the ranking gene list against all parameter combinations.

The three sides of the triangular coordinate system represent the three parameters. The axis perpendicular to the triangular coordinate system represents the number of seed genes. The purple five-pointed star marks the optimal parameter combination and the yellow balls mark all other parameter combinations.

https://doi.org/10.1371/journal.pone.0153006.g003

For all parameter combinations, genes were also ranked according to the q scores calculated with five different d values (d = 0.1, 0.3, 0.5, 0.7, and 0.9). The number of matched seed genes was used to assess the effectiveness of FIP (Fig 4). The number of matched seeds among the top 500 of the ranking list was higher for d = 0.9 than for the other d values.

Fig 4. Comparison of performance of FIP-based gene ranking with different d values for four parameter combinations.

The number of matched genes identified using FIP with five different d values (d = 0.1, 0.3, 0.5, 0.7, and 0.9) for four parameter combinations were counted and plotted ((A)α = 0.8, β = 0.5, γ = 0.9; (B)α = 0.7, β = 0.5, γ = 0.8; (C)α = 0.9, β = 0.5, γ = 0.8; (D)α = 0.9, β = 0.6, γ = 0.8). Y-axis: the number of matched genes identified using FIP; X-axis: the number of ranked genes.

https://doi.org/10.1371/journal.pone.0153006.g004

Finally, the parameter combination of α = 0.8, β = 0.5, γ = 0.9, and d = 0.9 was selected to calculate vector Q so as to obtain the ranking results.

Prioritization of candidate genes and validation by literature review

In the disease-related network, all genes were prioritized by FIP according to the vector Q computed with the optimal parameter combination, and the top 200 candidate genes were retained (S1 Table). We manually searched the PubMed literature (http://www.ncbi.nlm.nih.gov/pubmed) for evidence that these top 200 candidate genes are drug targets. There were 34 antithrombotic drug targets among the top 200 candidates, including thrombin-activated factor 2 receptor (F2R; rank 5), SELPLG (rank 6), APOA1 (rank 10), SCARB1 (rank 17), TTR (rank 30), and F10 (rank 37) (S1 Table). F2R was reported to link thrombosis to inflammation by modulating interleukin 6 (IL6) synthesis [60,61]. Treatment of rats with APOA1 Milano (a mutant form of human APOA1) was shown to markedly delay thrombus formation, inhibit platelet aggregation, and reduce the weight of the thrombus [62]. The FX protein is encoded by the F10 gene, mutations of which give rise to severe Factor X (FX) deficiency. An anti-FX inhibitor has been approved by the FDA for the prevention of venous thromboembolism in surgical intervention and as an initial treatment for deep venous thrombosis and pulmonary embolism [63–65].

Candidate genes among the top 200 that are not drug targets were also reported to be associated with thrombosis. For instance, single nucleotide polymorphisms (SNPs) in susceptibility genes such as ALPL (rank 1) could be used to predict recurrent thrombosis [66,67]. The coagulation factor III gene (F3; rank 11) produces tissue factor, which can initiate thrombosis on disrupted atherosclerotic plaques [68]. Loss of CYP2C19 (rank 22) function triggered platelet reactivity, which is a predictor of stent thrombosis [69,70]. Variation in the VTN (rank 28) promoter haplotype, which increases transcription factor binding activity, was proposed as a novel genetic marker for deep venous thrombosis [71]. Sex hormone-binding globulin (SHBG; rank 51), which is easily measured in routine laboratories, could serve as a marker for the risk of venous thrombosis [72].

Taken together, 124 of the top 200 candidate genes predicted by our method were confirmed in the PubMed literature to be associated with thrombosis, although they had not been recorded in the disease databases (OMIM, GAD and DO) (S1 Table). Top-ranked candidates had a high confirmation rate in terms of their association with thrombosis, especially the top 10 candidates (Table 1).

Table 1. The confirmation rate of top 200 candidate genes in the ranking list.

https://doi.org/10.1371/journal.pone.0153006.t001

Validation of FIP through Functional and pathway analysis

We used DAVID (http://david.abcc.ncifcrf.gov/) [73] and Gene Ontology (GO, http://geneontology.org/) (Biological Process and Molecular Function) [74] analysis to assess the functional enrichment of the identified candidate genes. In this way, the biological features or meanings of the candidate genes can be extracted to improve the classification of these genes by function. The classification was further interpreted in KEGG (http://www.genome.jp/kegg/) [75] pathways (FDR <0.05). The top 200 candidates were divided into four groups of 50 genes each, followed by KEGG and GO analysis in DAVID. As a result, 10 significant functional categories were identified and associated with thrombotic disease (Fig 5) [76–83]. For instance, 'GO: 0007596~blood coagulation' was reported to be the main cause of thrombosis and recurrence. Blood coagulation, causing damage to the vascular endothelium, was suggested to initiate acute venous thrombus generation [84]. The largest number of candidate and seed genes was found in the 'GO: 0009611~response to wounding' functional category. The most common sites of wounding in conflict were the extremities, which were associated with a significant incidence of vascular trauma and had a high complication rate (graft thrombosis) [85]. 'GO: 0030168~platelet activation', leading to severe end-organ damage, was shown to increase the risk of thrombosis, implying that platelet reactivity is an important pathological mechanism of thrombosis [86,87].

Fig 5. The top 200 candidate genes and known genes involved in the identified 10 functional categories.

The genes were analyzed in GO and KEGG with DAVID and classified into 10 VTE-related functional categories.

https://doi.org/10.1371/journal.pone.0153006.g005

We counted the number of the candidate and seed genes among the 10 functional categories which each gene was annotated to. Ten candidate genes (95% confidence interval) appeared in more than 8 functional categories and were confirmed by literature (Fig 6).

Fig 6. The distribution of the top 200 candidates and seeds among different functional categories.

X-axis and y-axis represent the number of the functional categories and enriched genes, respectively.

https://doi.org/10.1371/journal.pone.0153006.g006

Moreover, 6 of these candidates were drug targets, and 3 of them were among the top 50 candidate genes (Table 2).

Table 2. The candidate genes in more than eight functional categories.

https://doi.org/10.1371/journal.pone.0153006.t002

Furthermore, the known disease-related genes and the top 200 candidate genes were significantly enriched in four common pathways: hematopoietic cell lineage, cytokine-cytokine receptor interaction, cell adhesion molecules (CAMs), and complement and coagulation cascades (FDR<0.05). The coagulation cascade pathway appeared to be a critical determinant of atherosclerotic plaque thrombogenicity [88]. Cell adhesion molecules (CAMs), hematopoietic cell lineage and cytokine-cytokine receptor interaction were also associated with thrombosis [85,89,90]. We mapped the enriched genes, including the known disease-related genes and the candidate genes, onto the coagulation cascades pathway [91,92] (Fig 7).

Fig 7. Coagulation cascades pathway.

The red, green, pink, orange and blue rectangles represent known genes and candidate genes ranked in the top 1–50, top 51–100, top 101–150 and top 151–200, respectively.

https://doi.org/10.1371/journal.pone.0153006.g007

The map contains 19 known genes and 9 candidate genes. Each of these 9 candidate genes was annotated to no fewer than four functional categories; in particular, F2R, SERPINF2, and A2M were annotated to more than eight functional categories (Fig 6). Tissue factor (F3) and F2R, which act as coagulation pathway sensors, have been shown to influence the vascular microenvironment and angiogenesis irrespective of clinically apparent thrombosis [93,94]. Mutations of two other genes, PROC and PROS1, were shown to increase the risk of recurrent thromboembolic events when combined with other genetic or environmental thrombosis factors [95,96]. A2M was reported to inhibit the known genes PROS1 and PROC in the coagulation cascade pathway, which could be associated with recurrent thrombosis.

Comparison of FIP to ToppNet and Endeavour

To evaluate the performance of the proposed FIP method in predicting novel recurrent thrombosis genes by prioritizing candidate genes, we carried out LOOCV on the known disease-related genes. In this validation, the same training and testing gene sets were used in the FIP, ToppNet, and Endeavour methods. The ROC curves were plotted to compare the performance of the three methods (Fig 8).

Fig 8. The ROC curves of FIP, ToppNet, and Endeavour methods.

https://doi.org/10.1371/journal.pone.0153006.g008

The AUC of FIP was 0.9107, which was higher than the AUCs of ToppNet (0.7150) and Endeavour (0.8127). Thus, FIP performed well in identifying known disease-related genes in the prioritized gene list and was more sensitive and specific in ranking the test genes.

To further verify the top-ranked candidates as novel disease recurrence genes, support vector machine (SVM) was applied to classify normal and recurrent samples with top-ranked candidates as classification characteristics. The outcome of FIP was then compared with those of ToppNet and Endeavour methods with the top 50 and top 100 candidates as classification characteristics, respectively. Four performance measurements, false positive rate (FPR), true positive rate (TPR), best cutoff curve, and AUC, were calculated (Fig 9).

Fig 9. The performance of sample classification with top-ranked candidates of FIP, ToppNet, and Endeavour methods.

(A) ROC curves of FIP, ToppNet, and Endeavour methods with the top 50 candidates as classification characteristics (B) ROC curves of FIP, ToppNet, and Endeavour methods with the top 100 candidates as classification characteristics.

https://doi.org/10.1371/journal.pone.0153006.g009

The AUC values of FIP were higher than those of ToppNet and Endeavour methods using either the top 50 or 100 candidates as classification characteristics. In the meantime, the AUC values of each method using the top 50 candidates as characteristics were higher than those of each corresponding method using top 100 candidates as characteristics.

To explore the factors that may affect the performance of FIP, we first assessed the relationship between the disease-specific expression profile and the outcome of gene prioritization. The P(i,j) values in Eq (2) were reassigned at random from the set of all correlation coefficients, using sampling with replacement and sampling without replacement. The disease relevance score vector Q was recalculated and genes were ranked according to their q values. The numbers of seed genes in the top 50 and top 100 of the ranking list were counted to evaluate performance. Each process was repeated 100 times. The original FIP ranking performed better than the rankings obtained from randomly sampled expression correlations (Fig 10), suggesting that the disease-specific expression profile did affect the performance of gene prioritization.
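The two resampling schemes used for this control can be sketched as below. The function name is hypothetical, and the flat list of correlation values is an assumed representation of the P(i,j) entries attached to the network edges.

```python
import random


def resample_correlations(p_values, with_replacement, rng):
    """Randomly reassign correlation coefficients to network edges.

    p_values          -- list of P(i,j) values, one per edge
    with_replacement  -- True: each edge draws independently from the pool;
                         False: the pool is shuffled, so every value is used once
    rng               -- a random.Random instance, for reproducibility
    """
    if with_replacement:
        return [rng.choice(p_values) for _ in p_values]
    shuffled = list(p_values)
    rng.shuffle(shuffled)
    return shuffled
```

Repeating either scheme 100 times, recomputing the scores each time, and counting the seeds in the top 50 and top 100 would give a null distribution against which the ranking from the real correlations can be compared.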

Fig 10. Comparison of the performance of the FIP with and without replacement in random sampling.

50N and 100N denote the top 50 and top 100 genes in the ranking list for sampling without replacement. 50Y and 100Y denote the top 50 and top 100 genes in the ranking list for sampling with replacement. Asterisks mark our results.

https://doi.org/10.1371/journal.pone.0153006.g010

Secondly, we evaluated the importance of protein interaction reliability. We altered the S(i,j) in formula (2) to 1 (no protein interaction reliability) and recalculated the disease relevance score Q. Genes were ranked according to the q value. LOOCV was used to assess the performance using the new weights. Its AUC (0.6878) was lower than that of the original weights (AUC = 0.9107) (Fig 11).

Fig 11. ROC curves for the weights with and without protein interaction reliability.

https://doi.org/10.1371/journal.pone.0153006.g011

To evaluate the robustness of FIP, 10-fold cross-validation was also applied to ToppNet and Endeavour. The differences between FIP and ToppNet (one-sided t-test, p-value<0.05) and between FIP and Endeavour (one-sided t-test, p-value<0.05) were statistically significant (S1 Fig).

We also performed literature validation, function annotation and pathway analysis for the top 200 candidates of ToppNet and Endeavour (S2–S4 Figs). In general, FIP performed better than ToppNet and Endeavour.

Discussion

In this study, we devised and implemented a novel algorithm called FIP to prioritize candidate genes involved in VTE. The algorithm is based on the overall similarity of each gene with its neighboring genes, taking into account three aspects: expression, function, and interaction. For the top 200 candidates, we manually searched the PubMed literature and confirmed 124 genes, of which 34 were drug targets. Furthermore, we conducted KEGG and GO analysis to assess the functional enrichment of the identified candidate genes. Additional candidates not confirmed by the literature were classified into 10 significant functional categories associated with thrombotic disease (Fig 5). Overall, FIP had a better predictive performance and achieved a reliable AUC value.

In reality, multiple properties of genes can be associated with each other in disease states and contribute to the development of disease. Integrating multiple data sources has been reported to be better than using a single data source in terms of the sensitivity and accuracy of gene prioritization [97]. In our study, we compared the performance of integrating three data sources with that of integrating two. There was no significant difference in the number of matched seed genes among the combinations of two data sources, whereas the combinations of three data sources performed significantly better than the combinations of two (t-test, p<0.05) in terms of the number of matched seed genes (S2 Table). Moreover, the coefficients α, β, and γ and the control parameter d were shown to affect the performance of gene prioritization. According to the number of matched seed genes, LOOCV and 10-fold cross-validation, the best performance in prioritizing VTE-related genes was achieved with the parameter combination α = 0.8, β = 0.5, γ = 0.9, and d = 0.9.

ToppNet and Endeavour are widely used prioritization methods. ToppNet draws on network properties and employs three algorithms (PageRank, Hyperlink-Induced Topic Search (HITS), and K-step Markov) to prioritize disease-related candidate genes by estimating their relative importance in the PPIN [98,99]. ToppNet thus ranks genes by the topological features of the PPIN, which is a single data type. As described above, integrating more data sources performed better than integrating fewer. It is therefore not surprising that FIP outperformed ToppNet in prioritizing genes involved in VTE in this study (Fig 8).
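
To illustrate how a ranking based on topology alone works, the following is a minimal power-iteration PageRank sketch in Python on a toy undirected network. It is an illustrative assumption rather than ToppNet's code: the gene labels (g1 to g5), the damping factor of 0.85, and the function name `pagerank` are hypothetical.

```python
def pagerank(adjacency, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on an undirected graph.

    `adjacency` maps each node to the set of its neighbors.
    Returns a dict of node -> rank; the ranks sum to 1.
    """
    nodes = sorted(adjacency)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not adjacency[v])
        new_rank = {}
        for v in nodes:
            # Each neighbor u passes an equal share of its rank to v.
            incoming = sum(rank[u] / len(adjacency[u]) for u in adjacency[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy network: g1 is a hub, g4 bridges to g5.
network = {
    "g1": {"g2", "g3", "g4"},
    "g2": {"g1"},
    "g3": {"g1"},
    "g4": {"g1", "g5"},
    "g5": {"g4"},
}
ranks = pagerank(network)
top_gene = max(ranks, key=ranks.get)
print(top_gene, round(ranks[top_gene], 3))
```

Because the score depends only on how the nodes are connected, a gene's expression or functional annotation has no influence on its rank in this sketch.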

Endeavour ranks candidate genes using three data types similar to those used in this study, except that its expression data come from a general background (a high-density gene expression database). In contrast, FIP used disease-specific expression data, including recurrent VTE sample data. In theory, the choice between disease-specific expression data and non-disease-specific expression data obtained by randomly sampling gene expression data could affect the performance of gene prioritization. Our results confirmed that disease-specific gene expression data did affect the performance of FIP (Fig 10). FIP using VTE-specific gene expression data performed better than Endeavour using non-disease-specific expression data, since the top candidates identified by FIP through VTE-specific gene expression analysis were more likely to be associated with VTE.

On the other hand, the protein interaction databases used by common prioritization methods, including Endeavour, do not provide enough detail to assess whether a protein binds interaction partners that share similar structural or chemical properties and functions, because many of the protein-protein links collected in these databases are loose and reflect random or nonspecific binding. The reliability of these protein-protein interactions is therefore questionable, which lowers the accuracy of gene ranking. In contrast, edge weights in a disease-related network can provide reasonable and consistent values that quantify the strength of the connection between proteins. We took this feature into account in prioritizing candidates. As a baseline for the weighted network, we constructed a non-weighted network with the same protein interaction pairs and compared the performance of FIP on the weighted and non-weighted networks. FIP performed better with the weighted network (Fig 11). This result suggests that the more reliable protein interactions used by FIP may contribute to its better performance relative to Endeavour in prioritizing VTE-related candidates.
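
The effect of edge weights on ranking can be illustrated with a small Python sketch. It does not reproduce the FIP scoring: the gene labels, the edge weights, and the `seed_link_score` function (which sums the strength of a candidate's direct links to seed genes) are hypothetical and chosen only to show how a weighted and a non-weighted network with the same interaction pairs can rank candidates differently.

```python
def seed_link_score(edges, seeds, weighted=True):
    """Score each non-seed gene by the total strength of its links to seeds.

    `edges` maps an (u, v) pair to a confidence weight in [0, 1].
    With weighted=False every edge counts as 1, as in a non-weighted network.
    """
    scores = {}
    for (u, v), weight in edges.items():
        strength = weight if weighted else 1.0
        for gene, partner in ((u, v), (v, u)):
            if gene in seeds:
                continue
            scores.setdefault(gene, 0.0)
            if partner in seeds:
                scores[gene] += strength
    return scores


# Hypothetical interaction pairs: s1, s2 are seed genes; c1..c3 are candidates.
edges = {
    ("s1", "c1"): 0.9,
    ("s2", "c1"): 0.8,
    ("s1", "c2"): 0.2,   # low-confidence link
    ("s2", "c2"): 0.3,   # low-confidence link
    ("s1", "c3"): 0.95,
    ("c2", "c3"): 0.5,
}
seeds = {"s1", "s2"}

unweighted = seed_link_score(edges, seeds, weighted=False)
weighted = seed_link_score(edges, seeds, weighted=True)
print("non-weighted:", sorted(unweighted, key=unweighted.get, reverse=True))
print("weighted:    ", sorted(weighted, key=weighted.get, reverse=True))
```

In the non-weighted network, c1 and c2 are indistinguishable because each has two seed neighbors. With weights, c2 drops below c3 because its two links are of low confidence.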

In summary, our FIP method combines experimental data with mathematical modeling and provides an alternative systems biology approach that shows promise for tackling VTE, a complex disease, and for aiding the diagnosis of recurrent VTE. The method could also be applied to other complex diseases to reveal disease mechanisms and provide new perspectives for diagnosis and drug development.

Supporting Information

S1 Fig. The boxplot of FIP, ToppNet and Endeavour.

https://doi.org/10.1371/journal.pone.0153006.s001

(TIF)

S2 Fig. The Venn diagrams of literature validation among three methods for the top 100 (left) and top 200 (right) candidates.

The numbers to the left and right of the slash represent the number of confirmed genes and the number of candidate genes, respectively.

https://doi.org/10.1371/journal.pone.0153006.s002

(TIF)

S3 Fig. The comparison of literature validation for the top 200 candidates generated by three methods across ten functional categories.

https://doi.org/10.1371/journal.pone.0153006.s003

(TIF)

S4 Fig. The comparison of literature validation among three methods at four pathways for top 200 candidates.

https://doi.org/10.1371/journal.pone.0153006.s004

(TIF)

S5 Fig. The Venn diagrams of literature validation among four parameter combinations for top 200 candidates.

The numbers to the left and right of the slash represent the number of confirmed genes and the number of candidate genes, respectively.

https://doi.org/10.1371/journal.pone.0153006.s005

(TIF)

S1 Table. Top 200 candidate genes identified by FIP.

https://doi.org/10.1371/journal.pone.0153006.s006

(DOC)

S2 Table. The significance of differences between combinations integrating two and three data sources.

https://doi.org/10.1371/journal.pone.0153006.s007

(DOC)

S3 Table. The literature validation of top 200 candidate genes among three methods.

https://doi.org/10.1371/journal.pone.0153006.s008

(DOC)

S4 Table. The top 200 candidates of three methods on ten function categories.

https://doi.org/10.1371/journal.pone.0153006.s009

(DOC)

S5 Table. The top 200 candidates among three methods at four pathways.

https://doi.org/10.1371/journal.pone.0153006.s010

(DOC)

S6 Table. The literature validation among four parameter combinations for top 200 candidates.

https://doi.org/10.1371/journal.pone.0153006.s011

(DOC)

Author Contributions

Conceived and designed the experiments: LNC WMH. Performed the experiments: JJL YHH. Analyzed the data: YRL HH. Contributed reagents/materials/analysis tools: BBC RQX. Wrote the paper: JJ WL BHL.

References

  1. 1. Kooiman J, van Hagen N, Iglesias Del Sol A, Planken EV, Lip GY, van der Meer FJ, et al. (2015) The HAS-BLED Score Identifies Patients with Acute Venous Thromboembolism at High Risk of Major Bleeding Complications during the First Six Months of Anticoagulant Treatment. PLoS One 10: e0122520. pmid:25905638
  2. 2. Tatebe S (2015) Cardiologists and the management of obstetric venous thromboembolism. Circ J 79: 1215–1217. pmid:25902743
  3. 3. Chew TW, Gau CS, Wen YW, Shen LJ, Mullins CD, Hsiao FY. (2015) Epidemiology, clinical profile and treatment patterns of venous thromboembolism in cancer patients in Taiwan: a population-based study. BMC Cancer 15: 298. pmid:25925555
  4. 4. Hamidi S, Riazi M (2015) Cutoff values of plasma d-dimer level in patients with diagnosis of the venous thromboembolism after elective spinal surgery. Asian Spine J 9: 232–238. pmid:25901235
  5. 5. Millan Longo C (2014) [Oral apixaban for the treatment of acute venous thromboembolism]. Rev Clin Esp (Barc) 214: 164.
  6. 6. Schulman S, Lindmarker P, Holmstrom M, Larfars G, Carlsson A, Nicol P, et al. (2006) Post-thrombotic syndrome, recurrence, and death 10 years after the first episode of venous thromboembolism treated with warfarin for 6 weeks or 6 months. J Thromb Haemost 4: 734–742. pmid:16634738
  7. 7. Rubio-Terres C, Soria JM, Morange PE, Souto JC, Suchon P, Mateo J, et al. (2015) Economic analysis of thrombo inCode, a clinical-genetic function for assessing the risk of venous thromboembolism. Appl Health Econ Health Policy 13: 233–242. pmid:25652150
  8. 8. Cai J, Preblick R, Zhang Q, Kwong WJ (2014) Utilization of parenteral anticoagulants and warfarin: impact on the risk of venous thromboembolism recurrence in the outpatient setting. Am Health Drug Benefits 7: 444–451. pmid:25558306
  9. 9. Schulman S, Kakkar AK, Goldhaber SZ, Schellong S, Eriksson H, Mismetti P, et al. (2014) Treatment of acute venous thromboembolism with dabigatran or warfarin and pooled analysis. Circulation 129: 764–772. pmid:24344086
  10. 10. Moll S, Mackman N (2008) Venous thromboembolism: a need for more public awareness and research into mechanisms. Arterioscler Thromb Vasc Biol 28: 367–369. pmid:18296590
  11. 11. Pabinger I, Ay C (2009) Biomarkers and venous thromboembolism. Arterioscler Thromb Vasc Biol 29: 332–336. pmid:19228607
  12. 12. Verhovsek M, Douketis JD, Yi Q, Shrivastava S, Tait RC, Baglin T, et al. (2008) Systematic review: D-dimer to predict recurrent disease after stopping anticoagulant therapy for unprovoked venous thromboembolism. Ann Intern Med 149: 481–490, W494. pmid:18838728
  13. 13. Kyrle PA, Hron G, Eichinger S, Wagner O (2007) Circulating P-selectin and the risk of recurrent venous thromboembolism. Thromb Haemost 97: 880–883. pmid:17549288
  14. 14. Hron G, Kollars M, Binder BR, Eichinger S, Kyrle PA (2006) Identification of patients at low risk for recurrent venous thromboembolism by measuring thrombin generation. JAMA 296: 397–402. pmid:16868297
  15. 15. Lewis DA, Suchindran S, Beckman MG, Hooper WC, Grant AM, Heit JA, et al. (2015) Whole blood gene expression profiles distinguish clinical phenotypes of venous thromboembolism. Thromb Res 135: 659–665. pmid:25684211
  16. 16. Lewis DA, Stashenko GJ, Akay OM, Price LI, Owzar K, Ginsburg GS, et al. (2011) Whole blood gene expression analyses in patients with single versus recurrent venous thromboembolism. Thromb Res 128: 536–540. pmid:21737128
  17. 17. Liebner DA, Huang K, Parvin JD (2014) MMAD: microarray microdissection with analysis of differences is a computational tool for deconvoluting cell type-specific contributions from tissue samples. Bioinformatics 30: 682–689. pmid:24085566
  18. 18. Chen J, Xu H, Aronow BJ, Jegga AG (2007) Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinformatics 8: 392. pmid:17939863
  19. 19. Tranchevent LC, Barriot R, Yu S, Van Vooren S, Van Loo P, Coessens B, et al. (2008) ENDEAVOUR update: a web resource for gene prioritization in multiple species. Nucleic Acids Res 36: W377–384. pmid:18508807
  20. 20. Adie EA, Adams RR, Evans KL, Porteous DJ, Pickard BS (2006) SUSPECTS: enabling fast and effective prioritization of positional candidates. Bioinformatics 22: 773–774. pmid:16423925
  21. 21. Nitsch D, Tranchevent LC, Goncalves JP, Vogt JK, Madeira SC, Moreau Y (2011) PINTA: a web server for network-based gene prioritization from expression data. Nucleic Acids Res 39: W334–338. pmid:21602267
  22. 22. Chang JT, Nevins JR (2006) GATHER: a systems approach to interpreting genomic signatures. Bioinformatics 22: 2926–2933. pmid:17000751
  23. 23. Foong J, Girdea M, Stavropoulos J, Brudno M (2015) Prioritizing Clinically Relevant Copy Number Variation from Genetic Interactions and Gene Function Data. PLoS One 10: e0139656. pmid:26437450
  24. 24. Taniya T, Tanaka S, Yamaguchi-Kabata Y, Hanaoka H, Yamasaki C, Maekawa H, et al. (2012) A prioritization analysis of disease association by data-mining of functional annotation of human genes. Genomics 99: 1–9. pmid:22019378
  25. 25. Wang X, Gulbahce N, Yu H (2011) Network-based methods for human disease gene prediction. Brief Funct Genomics 10: 280–293. pmid:21764832
  26. 26. Luo J, Liang S (2015) Prioritization of potential candidate disease genes by topological similarity of protein-protein interaction network and phenotype data. J Biomed Inform 53: 229–236. pmid:25460206
  27. 27. Goncalves JP, Francisco AP, Moreau Y, Madeira SC (2012) Interactogeneous: disease gene prioritization using heterogeneous networks and full topology scores. PLoS One 7: e49634. pmid:23185389
  28. 28. Magger O, Waldman YY, Ruppin E, Sharan R (2012) Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput Biol 8: e1002690. pmid:23028288
  29. 29. Antanaviciute A, Daly C, Crinnion LA, Markham AF, Watson CM, Bonthron DT, et al. (2015) GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles. Bioinformatics.
  30. 30. Oellrich A, Sanger Mouse Genetics P, Smedley D (2014) Linking tissues to phenotypes using gene expression profiles. Database (Oxford) 2014: bau017.
  31. 31. Xiao Y, Xu C, Ping Y, Guan J, Fan H, Li Y, et al. (2011) Differential expression pattern-based prioritization of candidate genes through integrating disease-specific expression data. Genomics 98: 64–71. pmid:21515357
  32. 32. Rao BS, Gupta KK, Karanam P, Peruri A (2014) Alzheimer disease: An interactome of many diseases. Ann Indian Acad Neurol 17: 48–54. pmid:24753659
  33. 33. Barh D, Kamapantula B, Jain N, Nalluri J, Bhattacharya A, Juneja L, et al. (2015) miRegulome: a knowledge-base of miRNA regulomics and analysis. Sci Rep 5: 12832. pmid:26243198
  34. 34. O'Brien MA, Costin BN, Miles MF (2012) Using genome-wide expression profiling to define gene networks relevant to the study of complex traits: from RNA integrity to network topology. Int Rev Neurobiol 104: 91–133. pmid:23195313
  35. 35. Lascorz J, Forsti A, Chen B, Buch S, Steinke V, Rahner N, et al. (2010) Genome-wide association study for colorectal cancer identifies risk polymorphisms in German familial cases and implicates MAPK signalling pathways in disease susceptibility. Carcinogenesis 31: 1612–1619. pmid:20610541
  36. 36. Dever SM, Costin BN, Xu R, El-Hage N, Balinang J, Samoshkin A, et al. (2014) Differential expression of the alternatively spliced OPRM1 isoform mu-opioid receptor-1K in HIV-infected individuals. AIDS 28: 19–30. pmid:24413261
  37. 37. Chapman NH, Wijsman EM (1998) Genome screens using linkage disequilibrium tests: optimal marker characteristics and feasibility. Am J Hum Genet 63: 1872–1885. pmid:9837839
  38. 38. Zheng G, Freidlin B, Gastwirth JL (2006) Robust genomic control for association studies. Am J Hum Genet 78: 350–356. pmid:16400614
  39. 39. Sasieni PD (1997) From genotypes to genes: doubling the sample size. Biometrics 53: 1253–1261. pmid:9423247
  40. 40. Manolio TA (2010) Genomewide association studies and assessment of the risk of disease. N Engl J Med 363: 166–176. pmid:20647212
  41. 41. Che J, Shin M (2015) A meta-analysis strategy for gene prioritization using gene expression, SNP genotype, and eQTL data. Biomed Res Int 2015: 576349. pmid:25874220
  42. 42. Jiang R, Wu M, Li L (2015) Pinpointing disease genes through phenomic and genomic data fusion. BMC Genomics 16 Suppl 2: S3. pmid:25708473
  43. 43. Vafaee F, Rosu D, Broackes-Carter F, Jurisica I (2013) Novel semantic similarity measure improves an integrative approach to predicting gene functional associations. BMC Syst Biol 7: 22. pmid:23497449
  44. 44. Wu C, Zhu J, Zhang X (2012) Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes. BMC Bioinformatics 13: 182. pmid:22838965
  45. 45. Hulsegge I, Woelders H, Smits M, Schokker D, Jiang L, Sorensen P (2013) Prioritization of candidate genes for cattle reproductive traits, based on protein-protein interactions, gene expression, and text-mining. Physiol Genomics 45: 400–406.
  46. 46. Zhang SW, Shao DD, Zhang SY, Wang YB (2014) Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression. Mol Biosyst 10: 1400–1408. pmid:24695957
  47. 47. Aerts S, Lambrechts D, Maity S, Van Loo P, Coessens B, De Smet F, et al. (2006) Gene prioritization through genomic data fusion. Nat Biotechnol 24: 537–544. pmid:16680138
  48. 48. Menzies RI, Unwin RJ, Dash RK, Beard DA, Cowley AW Jr., Carlson BE, et al. (2013) Effect of P2X4 and P2X7 receptor antagonism on the pressure diuresis relationship in rats. Front Physiol 4: 305. pmid:24187541
  49. 49. Khan K, Al-Maskari A, McKibbin M, Carr IM, Booth A, Mohamed M, et al. (2011) Genetic heterogeneity for recessively inherited congenital cataract microcornea with corneal opacity. Invest Ophthalmol Vis Sci 52: 4294–4299. pmid:21474777
  50. 50. Oliver KL, Lukic V, Thorne NP, Berkovic SF, Scheffer IE, Bahlo M (2014) Harnessing gene expression networks to prioritize candidate epileptic encephalopathy genes. PLoS One 9: e102079. pmid:25014031
  51. 51. Mughini-Gras L, Enserink R, Friesema I, Heck M, van Duynhoven Y, van Pelt W (2014) Risk factors for human salmonellosis originating from pigs, cattle, broiler chickens and egg laying hens: a combined case-control and source attribution analysis. PLoS One 9: e87933. pmid:24503703
  52. 52. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. (2013) NCBI GEO: archive for functional genomics data sets—update. Nucleic Acids Res 41: D991–995. pmid:23193258
  53. 53. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27–30. pmid:10592173
  54. 54. Becker KG, Barnes KC, Bright TJ, Wang SA (2004) The genetic association database. Nat Genet 36: 431–432. pmid:15118671
  55. 55. Kibbe WA, Arze C, Felix V, Mitraka E, Bolton E, Fu G, et al. (2015) Disease Ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res 43: D1071–1078. pmid:25348409
  56. 56. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. (2015) STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 43: D447–452. pmid:25352553
  57. 57. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, et al. (2009) STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res 37: D412–416. pmid:18940858
  58. 58. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. (2013) STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res 41: D808–815. pmid:23203871
  59. 59. Shin M, Lee H (2011) Prioritizing candidate genes by weighted network structure for the identification of disease marker genes. BioChip Journal 5.
  60. 60. Gigante B, Vikstrom M, Meuzelaar LS, Chernogubova E, Silveira A, Hooft FV, et al. (2009) Variants in the coagulation factor 2 receptor (F2R) gene influence the risk of myocardial infarction in men through an interaction with interleukin 6 serum levels. Thromb Haemost 101: 943–953. pmid:19404549
  61. 61. Misumida N, Kobayashi A, Saeed M, Fox JT, Kanei Y (2015) Prevalence and outcomes of non-ST-segment elevation myocardial infarction resulting from stent thrombosis. Cardiovasc Revasc Med.
  62. 62. Li D, Weng S, Yang B, Zander DS, Saldeen T, Nichol WW, et al. (1999) Inhibition of arterial thrombus formation by ApoA1 Milano. Arterioscler Thromb Vasc Biol 19: 378–383. pmid:9974422
  63. 63. Livnat T, Shenkman B, Kenet G, Tamarin I, Gillis S, Varon D, et al. (2011) Severe factor X deficiency in three unrelated Palestinian patients is caused by homozygosity for the mutation c302delG-correlation with thrombin generation and thromboelastometry. Blood Coagul Fibrinolysis 22: 673–679. pmid:22008904
  64. 64. Lippi G, Harenberg J, Mattiuzzi C, Favaloro EJ (2015) Next generation antithrombotic therapy: focus on antisense therapy against coagulation factor XI. Semin Thromb Hemost 41: 255–262. pmid:25703390
  65. 65. Luan D, Zai M, Varner JD (2007) Computationally derived points of fragility of a human cascade are consistent with current therapeutic strategies. PLoS Comput Biol 3: e142. pmid:17658944
  66. 66. Tregouet DA, Sabater-Lleal M, Bruzelius M, Emmerich J, Amouyel P, Dartigues JF, et al. (2012) Lack of association of non-synonymous FUT2 and ALPL polymorphisms with venous thrombosis. J Thromb Haemost 10: 1693–1695. pmid:22672431
  67. 67. van Hylckama Vlieg A, Flinterman LE, Bare LA, Cannegieter SC, Reitsma PH, Arellano AR, et al. (2014) Genetic variations associated with recurrent venous thrombosis. Circ Cardiovasc Genet 7: 806–813. pmid:25210051
  68. 68. Smith NL, Heit JA, Tang W, Teichert M, Chasman DI, Morange PE, et al. (2012) Genetic variation in F3 (tissue factor) and the risk of incident venous thrombosis: meta-analysis of eight studies. J Thromb Haemost 10: 719–722. pmid:22340074
  69. 69. Xie C, Ding X, Gao J, Wang H, Hang Y, Zhang H, et al. (2014) The effects of CES1A2 A(-816)C and CYP2C19 loss-of-function polymorphisms on clopidogrel response variability among Chinese patients with coronary heart disease. Pharmacogenet Genomics 24: 204–210. pmid:24535487
  70. 70. Dahabreh IJ, Moorthy D, Lamont JL, Chen ML, Kent DM, Lau J (2013) Testing of CYP2C19 Variants and Platelet Reactivity for Guiding Antiplatelet Treatment. Rockville (MD).
  71. 71. Wang Y, Xu J, Chen J, Fan X, Zhang Y, Yu W, et al. (2013) Promoter variants of VTN are associated with vascular disease. Int J Cardiol 168: 163–168. pmid:23041018
  72. 72. Raps M, Helmerhorst F, Fleischer K, Thomassen S, Rosendaal F, Rosing J, et al. (2012) Sex hormone-binding globulin as a marker for the thrombotic risk of hormonal contraceptives. J Thromb Haemost 10: 992–997. pmid:22469296
  73. 73. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4: 44–57. pmid:19131956
  74. 74. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25: 25–29. pmid:10802651
  75. 75. Kanehisa M, Goto S, Sato Y, Kawashima M, Furumichi M, Tanabe M (2014) Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Res 42: D199–205. pmid:24214961
  76. 76. Palta S, Saroa R, Palta A (2014) Overview of the coagulation system. Indian J Anaesth 58: 515–523. pmid:25535411
  77. 77. Thushara RM, Hemshekhar M, Basappa, Kemparaju K, Rangappa KS, Girish KS (2015) Biologicals, platelet apoptosis and human diseases: An outlook. Crit Rev Oncol Hematol 93: 149–158. pmid:25439323
  78. 78. Mollard JM, Prevot JM, Baulande M (1985) [Prevention of deep venous thrombosis by physical methods. Use of an external electrical stimulator. Initial results in surgery of the hip]. Phlebologie 38: 293–305. pmid:3875112
  79. 79. Koupenova M, Freedman JE (2015) Platelets: the unsung hero of the immune response. J Thromb Haemost 13: 268–270. pmid:25471925
  80. 80. Gardiner EE, Andrews RK (2014) Structure and function of platelet receptors initiating blood clotting. Adv Exp Med Biol 844: 263–275. pmid:25480646
  81. 81. Marcus AJ, Safier LB (1993) Thromboregulation: multicellular modulation of platelet reactivity in hemostasis and thrombosis. FASEB J 7: 516–522. pmid:8472890
  82. 82. Liu H, Hou J, Hu S, Du X, Fang Y, Jia H, et al. (2014) A rabbit model of spontaneous thrombosis induced by lipopolysaccharide. J Atheroscler Thromb 21: 1075–1086. pmid:24898380
  83. 83. Loren CP, Aslan JE, Rigg RA, Nowak MS, Healy LD, Gruber A, et al. (2015) The BCR-ABL inhibitor ponatinib inhibits platelet immunoreceptor tyrosine-based activation motif (ITAM) signaling, platelet activation and aggregate formation under shear. Thromb Res 135: 155–160. pmid:25527332
  84. 84. Myers DD Jr. (2015) Pathophysiology of venous thrombosis. Phlebology 30: 7–13. pmid:25729062
  85. 85. Brown KV, Ramasamy A, Tai N, MacLeod J, Midwinter M, Clasper JC (2009) Complications of extremity vascular injuries in conflict. J Trauma 66: S145–149. pmid:19359958
  86. 86. Versteeg HH, Heemskerk JW, Levi M, Reitsma PH (2013) New fundamentals in hemostasis. Physiol Rev 93: 327–358. pmid:23303912
  87. 87. Li S, Chen H, Ren J, Geng Q, Song J, Lee C, et al. (2014) MicroRNA-223 inhibits tissue factor expression in vascular endothelial cells. Atherosclerosis 237: 514–520. pmid:25463083
  88. 88. Tatsumi K, Mackman N (2015) Tissue Factor and Atherothrombosis. J Atheroscler Thromb.
  89. 89. Han J, Shuvaev VV, Davies PF, Eckmann DM, Muro S, Muzykantov VR (2015) Flow shear stress differentially regulates endothelial uptake of nanocarriers targeted to distinct epitopes of PECAM-1. J Control Release 210: 39–47. pmid:25966362
  90. 90. Kahner BN, Dorsam RT, Mada SR, Kim S, Stalker TJ, Brass LF, et al. (2007) Hematopoietic lineage cell specific protein 1 (HS1) is a functionally important signaling molecule in platelet activation. Blood 110: 2449–2456. pmid:17579181
  91. 91. Kanehisa M, Sato Y, Morishima K (2015) BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol.
  92. 92. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40: D109–114. pmid:22080510
  93. 93. D'Asti E, Kool M, Pfister SM, Rak J (2014) Coagulation and angiogenic gene expression profiles are defined by molecular subgroups of medulloblastoma: evidence for growth factor-thrombin cross-talk. J Thromb Haemost 12: 1838–1849. pmid:25163932
  94. 94. Zhang F, Wen Y, Guo X, Zhang Y, Wang S, Yang T, et al. (2015) Genome-wide pathway-based association study implicates complement system in the development of Kashin-Beck disease in Han Chinese. Bone 71: 36–41. pmid:25305519
  95. 95. Heeb MJ (2008) Role of the PROS1 gene in thrombosis: lessons and controversies. Expert Rev Hematol 1: 9–12. pmid:19809585
  96. 96. Wypasek E, Undas A (2013) Protein C and protein S deficiency—practical diagnostic issues. Adv Clin Exp Med 22: 459–467. pmid:23986205
  97. 97. Petrillo A, Fusco R, Petrillo M, Granata V, Sansone M, Avallone A, et al. (2015) Standardized Index of Shape (SIS): a quantitative DCE-MRI parameter to discriminate responders by non-responders after neoadjuvant therapy in LARC. Eur Radiol 25: 1935–1945. pmid:25577525
  98. 98. Chen J, Bardes EE, Aronow BJ, Jegga AG (2009) ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res 37: W305–311. pmid:19465376
  99. 99. Junker BH, Koschutzki D, Schreiber F (2006) Exploration of biological network centralities with CentiBiN. BMC Bioinformatics 7: 219. pmid:16630347