Development of heart diseases is driven by dynamic changes in both the activity and connectivity of gene pathways. Understanding these dynamic events is critical for understanding pathogenic mechanisms and development of effective treatment. Currently, there is a lack of computational methods that enable analysis of multiple gene networks, each of which exhibits differential activity compared to the network of the baseline/healthy condition. We describe the iMDM algorithm to identify both unique and shared gene modules across multiple differential co-expression networks, termed M-DMs (multiple differential modules). We applied iMDM to a time-course RNA-Seq dataset generated using a murine heart failure model generated on two genotypes. We showed that iMDM achieves higher accuracy in inferring gene modules compared to using single or multiple co-expression networks. We found that condition-specific M-DMs exhibit differential activities, mediate different biological processes, and are enriched for genes with known cardiovascular phenotypes. By analyzing M-DMs that are present in multiple conditions, we revealed dynamic changes in pathway activity and connectivity across heart failure conditions. We further showed that module dynamics were correlated with the dynamics of disease phenotypes during the development of heart failure. Thus, pathway dynamics is a powerful measure for understanding pathogenesis. iMDM provides a principled way to dissect the dynamics of gene pathways and its relationship to the dynamics of disease phenotype. With the exponential growth of omics data, our method can aid in generating systems-level insights into disease progression.
Recent advances in systems biology have revealed that changes in the structure and activity of gene network play a critical role in the disease progression. Heart failure is a complex disease involving multiple molecular pathways. Yet little is known regarding the dynamic changes in the gene network of heart cells during heart failure development. We have combined experimental and computational approaches to address this question. We developed a computational method to analyze multiple gene networks, each of which exhibits differential activity compared to the network of the healthy condition. In doing so, we are able to identify both unique and shared gene pathways across multiple differential networks. By applying our algorithm to our time-course transcriptome data of heart failure, we revealed dynamic changes in pathway activity and connectivity across heart failure conditions. We further showed that pathway dynamics were correlated with the dynamics of disease phenotypes during the development of heart failure. Our approach provides a principled way to dissect the dynamics of gene pathways and its relationship to the dynamics of disease phenotype.
Citation: Ma X, Gao L, Karamanlidis G, Gao P, Lee CF, Garcia-Menendez L, et al. (2015) Revealing Pathway Dynamics in Heart Diseases by Analyzing Multiple Differential Networks. PLoS Comput Biol 11(6): e1004332. https://doi.org/10.1371/journal.pcbi.1004332
Editor: Mona Singh, Princeton University, UNITED STATES
Received: February 10, 2015; Accepted: May 12, 2015; Published: June 17, 2015
Copyright: © 2015 Ma et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: RNA-Seq data were deposited in the NCBI GEO database, accession GSE66630.
Funding: This work was supported by National Institutes of Health grants GM108716 to KT and HL110349 to RT. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Many heart diseases are attributable to both genetic and environmental factors . These factors can perturb gene transcript levels, protein levels, and metabolite levels, which in turn perturbs the interactions among the molecules. Perturbation of the molecular network ultimately leads to perturbation of the cellular and physiological states, contributing to the diseases. Therefore, understanding molecular networks can lead to important insights into the pathogenic mechanisms of heart diseases.
The concept of network biology has been applied to studies of various cardiovascular diseases, including heart failure [2,3], atherosclerosis , coronary heart disease , and atrial fibrillation , just to name a few. Because transcriptome data is the most abundant type of omics data, most studies used co-expression networks. In such networks, two genes are connected and assumed to functionally interact if their expression profiles are correlated across multiple conditions. Because genes in the same pathway tend to have correlated expression, analyzing co-expression network is an effective strategy for pathway inference. However, a limitation of previous studies is that networks were constructed using only co-expression information. This practice reduces the statistical power for identifying pathways that are perturbed under diseased conditions. It is more powerful to identify groups of genes that exhibit coherent differential activities between healthy and diseased conditions. Such gene groups directly capture the perturbed pathways.
Here we described a novel computational framework, inference of multiple differential modules (iMDM) that enables simultaneous analysis of multiple differential co-expression networks (DCNs). iMDM finds coherently differentially expressed gene modules that are either unique or shared among multiple DCNs. By definition, sets of genes that are differentially expressed under diseased states but do not exhibit correlated expression pattern will not be identified as a module. This is consistent with the notation that the entire pathway is perturbed under disease condition. To capture dynamic changes in gene modules across conditions, we have applied a novel graph-theoretical measure that quantifies changes in both gene activity and gene connectivity.
We demonstrated the utility of our method using the development of heart failure as our model system. Using RNA-Seq, we measured the transcriptome of the heart at four critical stages during the development of heart failure. By applying iMDM to multiple differential co-expression networks constructed from our time-course RNA-Seq dataset, we discovered both condition-specific and shared gene modules in gene networks of different heart failure conditions. By quantifying connectivity changes in shared gene modules across different conditions, we showed that gene modules with higher connectivity dynamics have higher correlation with the dynamics of heart failure phenotypic measures, suggesting that studying pathway dynamics using iMDM is an effective strategy to uncover causal genes of disease progression. Given the vast amount of transcriptome data, there are ample opportunities to apply our method to better understand the role of network dynamics in the development of heart diseases.
Systematic profiling of the transcriptome during the development of heart failure using RNA sequencing
We performed a factorial RNA-Seq study to monitor the impact of mitochondrial respiratory complex I deficiency and chronic pressure overload on the heart transcriptome as it progressed from hypertrophy to failure (Fig 1A). Complex I deficiency was triggered by cardiac-specific deletion of Ndufs4 which encodes a structural component of complex I . Pressure overload was triggered by transverse aortic constriction (TAC). For each single perturbation or their combinations, we monitored disease progression at four time points, 1, 2, 4 and 8 weeks after the introduction of the perturbation. In total, we profiled the heart transcriptome under 4 major conditions. For the sake of discussion, we termed these conditions wild type sham (WTSH), wild type TAC (WTTAC), knock out sham (KOSH), and knock out TAC (KOTAC). For each time point, four biological replicate RNA-Seq data were generated using 4 hearts (Manuscript in preparation).
A, Time-course RNA-Seq data were generated using mouse hearts perturbed by stress (TAC) or genetic perturbation (Ndufs4 deletion) or both, resulting four conditions. WTSH, sham treatment of wild type mice (baseline); KOSH, sham treatment of KO mice; WTTAC, TAC treatment of wild type mice; KOTAC, TAC treatment of KO mice. B, Hierarchical clustering of top 5000 genes with highest variance of expression levels across conditions.
Hierarchical clustering revealed that the transcriptome profiles of the hearts segregate first by treatment conditions (TAC vs. SH) and then by genotypes (WT vs. KO) (Fig 1B). Further, we found the largest number of differentially expressed genes (DEGs) in the KOTAC vs. WTSH comparison (N = 6521, False Discovery Rate (FDR) < 0.05), followed by the WTTAC vs. WTSH comparison (N = 5238). In contrast, there were only 251 DEGs in the KOSH vs. WTSH comparison. This result suggests that the transcriptome of KOTAC hearts is most perturbed, which is also associated with accelerated heart failure. On the other hand, there is only a very modest perturbation to the transcriptome of KOSH hearts.
Application of iMDM to the heart failure RNA-Seq dataset
Although clustering and differential gene expression analyses can reveal global trend in transcriptome dynamics, such methods cannot reveal individual pathways and their dynamics across conditions, motivating us to develop the inference of multiple differential modules (iMDM) algorithm. The major algorithmic steps of iMDM are illustrated in Fig 2. We applied iMDM to our heart failure RNA-Seq data and identified M-DMs that occur in single as well as multiple DCNs. Using our RNA-Seq data, we first constructed three Differential Co-expression Networks (DCNs, see Materials and Methods), each of which contains 10929 genes. At a p-value threshold of 0.05, we identified a total of 232 M-DMs, including 109 1-DMs, 107 2-DMs, and 16 3-DMs (Fig 3). A summary of the discovered M-DMs is provided in S1 Table. Consistent with the result of our differential expression analysis, the KOTAC DCN yielded the largest number of condition-specific 1-DMs (N = 56) followed by the WTTAC DCN (N = 46). In contrast, much smaller number of modules (N = 7) was identified in the KOSH DCN. iMDM also uncovered a large number of 2-DMs in both the KOTAC and the WTTAC DCNs (N = 73).
The algorithm has two major steps. First, gene expression profiles across multiple conditions are used to build differential gene co-expression networks (DCNs). To build DCNs, a binary co-expression network is constructed first in which edges are chosen based on the absolute value of Pearson correlation of the expression profiles of two genes. Only edges whose correlation exceeds a pre-defined threshold δ are included in the binary network. Edges in the binary network are then weighted (wi,j) based on the p-values (pi and pj) of differential gene expression between the baseline and disease conditions. Second, multiple differential co-expression networks are analyzed to identify shared and unique multiple differential modules (M-DMs) under different conditions. 1-DM are modules that are only found in one condition whereas M-DMs with M ≥ 2 are modules that are found in multiple conditions.
Performance benchmarking of the iMDM algorithm
We conducted two types of comparisons to demonstrate the advantages of iMDM over existing methods. To determine if using differential network can improve performance over using co-expression network alone, we have compared the performance of iMDM when fed with these two types of networks separately. We used our RNA-Seq data to construct three DCNs and three co-expression networks for WTTAC, KOTAC, and KOSH condition, respectively. The two algorithms were fed with appropriate sets of input networks (i.e. DCNs for the iMDM, co-expression networks for the other algorithm). The outputs of the two algorithms were seven sets of modules that were discovered from seven sets of networks (three single networks, three sets of two networks, and one set of three networks). To determine if using multiple networks can improve performance over using a single co-expression network, we have compared iMDM to the popular WGCNA algorithm , which is primarily designed for the analysis of a single co-expression network and has been used in several studies of heart diseases [2,4–6]. We generated seven single co-expression networks using one, two, and three experimental conditions at a time, respectively. Each single co-expression network was fed to WGCNA to return a set of modules. Like the other two algorithms, seven sets of modules were computed by WGCNA. We evaluated the resulting seven sets of modules discovered by the different algorithms using multiple reference pathway annotations, including Gene Ontology , KEGG , MGI pathways , Canonical pathways , Biocarta , and Reactome . iMDM achieved significantly higher specificity and sensitivity when evaluated using all except one reference sets (p-value < 0.05, one-sided Fisher’s exact test, Fig 4A and 4B). Besides gold-standard pathway annotations, a higher percentage of gene modules identified by iMDM was enriched for genes whose deletions lead to cardiovascular phenotypes documented in the Mouse Phenome Database  (Fig 4C). We concluded that compared to using co-expression networks, simultaneous analysis of multiple differential co-expression networks improves the inference accuracy of gene pathways.
iMDM DCN, method using multiple differential co-expression networks; iMDM Co-expression, method using multiple co-expression networks but no differential gene expression information. A, Specificity of the algorithms. Gene modules found by each method were evaluated using a set of gold-standard pathway annotations. Specificity was defined as the fraction of predicted modules that significantly overlaps with reference pathways. B, Sensitivity of the algorithms. Sensitivity was defined as the fraction of reference pathways that significantly overlaps with predicted modules. Pathway overlap P-values were computed using the hypergeometric distribution. P-values for the difference in specificity and sensitivity were computed using Fisher’s exact test. C, Percentage of predicted modules that significantly overlapped with genes whose deletions lead to cardiovascular phenotypes. P-values for the difference in the percentage of overlapped modules was computed using Fisher’s exact test. All p-values were corrected for multiple testing using the method of Benjamin-Hochberg. *, p-value < 0.05.
Condition-specific 1-DMs reveal unique pathways associated with different heart failure conditions
We found that the three sets of 1-DMs were enriched for different Gene Ontology (GO) annotations (Fig 5A). For instance, KOSH 1-DMs were enriched for nucleotide catabolism and localization of cell. WTTAC 1-DMs were enriched for terms such as tricarboxylic acid cycle, phospholipid metabolism, and enzyme linked receptor protein signaling. KOTAC 1-DMs were enriched for terms such as extracellular structure organization, fatty acid metabolism, hemostasis, and negative regulation of response to stimulus. In general, the different enriched terms were consistent with their specific phenotypes. Several of the rate-limiting steps of nucleotide metabolism take place in the mitochondria and can be affected by the fitness of the organelle . Nucleotide metabolism is generally regarded as a house-keeping process. This is likely the reason why genes involved in nucleotide metabolism were enriched in modules identified under KOSH condition. For the more severe phenotypes of WTTAC and KOTAC, other processes directly related to heart failure were enriched other than this house-keeping process. The other term unique to KOSH 1-DMs was “localization of cell”. Genes annotated with this term are involved in communication with the extracellular matrix. Changes in extracellular matrix are linked to myocardial fibrosis and inflammation, which are earlier events of, hear failure development . For the terms specifically enriched among WTTAC 1-DMs, both TCA cycle and phospholipid metabolism contribute to the general energy metabolism deficiency in failing heart, which has been observed before using the TAC model of heart failure . For KOTAC condition, loss of Ndufs4 leads to significant lower NAD+/NADH ratio in KOTAC hearts compared to WTTAC hearts . The low NAD+/NADH ratio inhibits fatty acid beta-oxidation . This coordinated down-regulation of the fatty acid module likely contributes to the more severe deregulation of energy metabolism in KOTAC hearts. Hemostasis has been reported to be associated with more severe form of heart failure such as KOTAC .
A, Enriched GO biological process terms in the three sets of 1-DMs. Enrichment p-value was computed using hypergeometric distribution. The grey scale is proportional to the minus logarithm of the enrichment p-value. B, Percentage of 1-DMs enriched for genes annotated to have cardiovascular system phenotypes when disrupted. C, Distributions of average edge weight of 1-DMs in WTTAC and KOTAC DCNs. D, Percentage of 1-DMs up- and down-regulated during the course of heart failure development. W1, week 1, etc.
Besides GO annotation, we found that a higher fraction of KOTAC 1-DMs (versus WTTAC) was enriched for genes whose disruption leads to cardiovascular phenotypes documented in the Mouse Phenome Database  (17.9% vs. 4.3%, p-value = 0.03, one-sided Fisher’s exact test, Fig 5B).
Because the edge weight in DCNs is a measure of differential gene expression between the disease and baseline conditions, a larger average edge weight of a 1-DM means a bigger difference in the expression of module genes. In other words, the average edge weight serves as a measure of differential activity of the module. We next compared the distributions of average edge weight in the 1-DMs for WTTAC and KOTAC. Our result shows that 1-DMs in the KOTAC network had a greater difference in the expression level than those in the WTTAC network (0.32 vs. 0.19, p-value = 2.4E-16, one-sided t-test, Fig 5C). We also compared the percentages of up- or down-regulated 1-DMs in the WTTAC and KOTAC networks. At a p-value cutoff of 0.01, we found that the percentage of differentially expressed (up- and down-regulated) KOTAC 1-DMs was significantly higher than that of WTTAC 1-DMs at all time points. For example, the percentages at week 1 are 54.3% and 85.7% for WTTAC and KOTAC, respectively (p-value = 4.9E-4, one-sided Fisher’s exact test, Fig 5D).
Fig 6 shows two example 1-DMs, one unique to the WTTAC network and one unique to the KOTAC network. The top panels of the figure show the visualization of the 1-DMs. The middle panels show the expression profiles of the module genes under four perturbation conditions over time. The bottom panels show the mean edge weights of the 1-DMs in the three non-baseline conditions. Together the middle and right panels explain why a 1-DM is uniquely observed in one condition. Taking the WTTAC 1-DM for example, although many genes of the module were differentially expressed in both WTTAC and KOTAC conditions, their expression correlation was much lower in the KOTAC condition. Notice the tighter correlation of expression profiles among WTTAC module genes compared to that of KOTAC module genes (Fig 6A middle panel). As a result, the edge weights among the module genes in the KOTAC DCN were significantly smaller than those in the WTTAC DCN (Fig 6B right panel). Thus, this module was only identified by iMDM in the WTTAC DCN.
A, 1-DM unique to the WTTAC DCN and was enriched for genes involved in regulation of cell adhesion. B, 1-DM unique to the KOTAC DCN and was enriched for genes involved in fatty acid metabolism. Top panel, visualization of the module using Cytoscape . Node color is proportional to the p-value of differential gene expression between disease and baseline (WTSH) conditions. Octagon with red border, genes whose mutations lead to cardiovascular phenotypes. Middle panel, expression profiles of module genes in four conditions. Each condition has four time points. Expression levels of each gene across all samples were normalized by Z-score transformation. P-values for the difference in gene expression level were based on t-test. Bottom panel, histogram for edge weights of discovered modules in the WTTAC and KOTAC DCNs.
The example WTTAC 1-DM is enriched for genes involved in the regulation of cell adhesion (Fig 6A, p-value = 1.1E-4). The example KOTAC 1-DM is enriched for genes involved in fatty acid metabolism (Fig 6B, p-value = 3.2E-8). The expression of this module is significantly lower in KOTAC compared to WTTAC at weeks 2, 4, and 8. A number of module genes encode enzymes for fatty acid metabolism and have significantly reduced expression, including Acot1, Acot2, Acsl1, Cpt1b, Cpt2, Crat, and Decr1 (S2 Fig). These observations are consistent with our previous finding that loss of Ndufs4 leads to significant lower NAD+/NADH ratio in KOTAC hearts compared to WTTAC hearts . The low NAD+/NADH ratio inhibits fatty acid beta-oxidation . This coordinated down-regulation of the fatty acid module likely contributes to the more severe deregulation of energy metabolism in KOTAC hearts.
In summary, the above analyses demonstrate the power of simultaneous analysis of multiple DCNs for uncovering condition-specific pathways involved in heart failure. We found that 1-DMs in the KOTAC condition exhibited higher differential activities during heart failure progression and were enriched for higher fraction of genes with known cardiovascular phenotypes when disrupted. These KOTAC-specific 1-DMs provide new insights into the mechanisms for the accelerated heart failure in KOTAC mice.
M-DMs shared among multiple networks can be used to reveal pathway dynamics during the progression of heart failure
Pathway dynamics can be attributed to changes in both gene expression and connectivity among genes (i.e. pathway rewiring). Although less studied, the latter type of dynamics has recently been shown to play a critical role in disease progression and treatment response, such as the role of hub genes  and rewiring of signaling pathways during cancer treatment  and cardiac hypertrophy . Here, we demonstrate that iMDM enables systematic analysis of pathway dynamics by considering both activity and connectivity changes among shared 2/3-DMs across networks. We further show pathway dynamics is correlated with the dynamic changes in disease phenotypes, which can provide better insights into molecular mechanisms of disease progression.
Because component modules of a 2/3-DM share the same set of genes in multiple DCNs but can differ in their connectivity, 2/3-DM provides a natural way to capture dynamic changes in pathway connectivity. We thus devised the Module Connectivity Dynamic Score (MCDS) to quantify the dynamics of M-DMs (see Materials and Methods for details). Since the DCNs are weighted based on the degree of correlated differential expression, MCDS quantifies not only the presence and absence of edges but also changes in edge weights that can be viewed as the interaction strength among genes.
To identify M-DMs that exhibit significant dynamics than expected by chance, we compared the MCDS values of real 2/3-DMs to a null distribution of MCDS values of random 2/3-DMs. At a p-value cutoff of 0.01, we found 102 dynamic 2/3-DMs. A list of the dynamic 2/3-DMs is provided in S1 Table.
Fig 7A shows an example dynamic 2-DM, observed in both KOTAC and WTTAC DCNs. For clarity, only edges with significant weight changes (p < 0.05) are shown. For this module, the majority of the changed edges had increased weight in the KOTAC condition compared to the WTTAC condition (in red), due to more significant changes in the expression of the two genes under the KOTAC condition compared to the baseline. There were also a few edges (in green) with decreased weight in the KOTAC condition. These connectivity changes suggest that the pathway was rewired between different heart failure conditions. The degree of rewiring can be quantified by our MCDS metric. Additional example dynamic 2-DMs and 3-DMs are shown in S3 and S4 Figs.
A, An example 2-DM identified in WTTAC and KOTAC DCNs. It was enriched for genes involved in oxidation reduction. Node color is proportional to the average p-value of differential gene expression between the two disease conditions and baseline (WTSH) condition. Octagon with red border, genes whose mutations lead to cardiovascular phenotypes. Left panel, Rewiring of the 2-DM. Only edges that exhibit significant changes in edge weights between the two DCNs are shown. Edge thickness is proportional to the absolute value of difference. Difference was calculated as “KOTAC—WTTAC”. Red, increase; green, decrease. Unconnected nodes indicate there was no edge connected to the nodes that exhibit significance change in weight between the two conditions. Middle panel, expression profiles of module genes in four conditions. Each condition has four time points. Expression levels of each gene across all samples were normalized by Z-score transformation. P-values for gene expression level difference were based on t-test. Right panel, histogram for edge weights of the 2-DM in the respective networks. B, Correlation between module activity and phenotypic measures. Row, phenotypic measures; column, M-DMs. All, 3-DMs (WTTAC+KOTAC+KOSH). Module activity is the average normalized gene expression level of all member genes in a module. FS%, left ventricular fractional shortening; HW/TL, heart weight normalized by tibia length; LVID(d), left ventricular internal diameter in diastole. C, Histograms of the module activity and phenotype correlations for dynamic and static 2-DMs. P-values were based on one-sided t-test.
Previous studies have shown that certain pathways are more dynamic than others during disease progression or stress response [22,25,26]. To examine this issue in the context of heart failure, we performed GO term enrichment analysis of the 2/3-DMs. Although certain GO terms were enriched among both dynamic and static 2/3-DMs, each type of M-DMs was also enriched for a unique set of GO terms. For instance, unique functions of the dynamic M-DMs included cell proliferation, trans-membrane transport, ion homeostais, and cell morphogenesis whereas those of static 2/3-DMs include regulation of transcription, chromosome organization and response to organic nitrogen (S5 Fig).
The enrichment of unique functional annotations among dynamic modules suggests that dynamic modules may be effective markers for disease progression. We therefore asked how the observed dynamics of 2/3-DMs correlate with the change in cardiac function. We used the following three measures to monitor the function of the heart as it progressed to failure (S6 Fig): heart weight normalized by tibial length (HW/TL), left ventricular internal dimension in diastole (LVID(d)) and LV fractional shortening (FS%). For each 2/3-DM, only using conditions from which the M-DM is derived, we computed the correlation between its average normalized gene expression level and each of the three cardiac function measures (Fig 7B, S1 Table, and S1 Text). Strikingly, we found that dynamic 2/3-DMs had significantly higher correlation with measures of cardiac function than static 2/3-DMs (Fig 7C). For example, the correlations with fractional shortening were 0.60 and 0.31 for dynamic and static modules, respectively (p-value = 5.7E-6, one-sided t-test). This result suggests that dynamic 2/3-DMs are better markers for disease progression.
From a systems biology point of view, diseases are caused by perturbations to the gene network. Such perturbations change dynamically as the disease progresses. We developed a mathematical model to represent perturbed gene networks and a robust search algorithm to identify regions of the perturbed networks with differential activities and connectivities. Differential network analysis has been applied to protein-DNA interaction networks [27,28], protein-protein interaction networks [29,30], genetic interaction networks [25,31], and functional gene interaction networks [32,33]. However, in all previous work, only two conditions were considered (i.e. only one resulting differential network) in the computational methods. A key innovation in our method is the ability to identify unique and shared modules from multiple differential gene networks, each of which representing a different perturbation condition. By definition, iMDM finds coherently differentially expressed gene modules. Sets of genes that are differentially expressed under diseased states but do not exhibit correlated expression pattern will not be identified as a module. This is consistent with the notation that the entire pathway is perturbed under disease condition. From a computational point of view, it increases the specificity of the inference as we demonstrated in the benchmarking experiment (Fig 4A).
Another challenge in studying network dynamics is how to quantify the rewiring of the pathways. Previous studies only focused on highly connected genes in a pathway, the so-called hub genes, instead of the entire pathway [22,34,35]. Here, We have used the MCDS metric to quantify the dynamics of an entire pathway. MCDS examines all edges in a module. More importantly, it quantifies not only the presence and absence of edges but also changes in edge weights that can be viewed as interaction strength among genes.
By applying the iMDM algorithm to our heart failure RNA-Seq data, we found that condition-specific 1-DMs exhibit differential activities, mediate different biological processes, and are enriched for genes with known cardiovascular phenotypes. Unlike 1-DMs, 2/3-DMs are not condition-specific. A previous study has suggested that there were major differences in topological and biological properties among gene pairs that have global vs. conditional co-expression . We thus compared 1-DMs to 2/3-DMs in terms of their topological features and their activity correlation with disease phenotypes. We found genes in 1-DMs had more connections and located in more central positions in the networks. Activities of 1-DMs also had higher correlation with the disease phenotype measures (S7 Fig). This is consistent with the previous observation that conditional interactions are enriched for genes that are key to maintaining network integrity.
In contrast to 1-DMs, M-DMs identified in 2 or more conditions enabled us to study the dynamics of gene modules. By applying the MCDS metric, we were able to distinguish dynamic and static 2/3-DMs. We demonstrated that these two types of modules differ in multiple aspects, including their functional annotations. In particular, we have found that activities of dynamic 2/3-DMs have higher correlation with changes in cardiac disease phenotype, suggesting dynamic modules may play a more important role during disease progression. Thus, studying pathway dynamics can lead to novel insights into disease pathogenesis.
iMDM only needs transcriptome profiling data as the input and both microarray and RNA-Seq data are applicable. Given the increasing amount of transcriptome data on various cardiovascular diseases, we envision that iMDM can be applied in several ways to reveal network dynamics under different conditions, including temporal dynamics during disease progression and dynamics between disease subtypes.
Besides comparing disease subtypes as what was done here, another interesting analysis is between-disease comparison, such as heart failure versus arrhythmia. The pioneering work on human disease network by Goh et al.  has revealed that genes associated with similar disorders show both higher likelihood of physical interactions between their products and higher expression profiling similarity for their transcripts, supporting the existence of distinct disease-specific functional modules. We envision that a pan-heart analysis using iMDM can lead to similar insights, in particular pathway signature and disease-specific pathways.
There are a couple of directions that the basic concept of iMDM can be extended in future work. First, integrating multiple types of molecular analytes beyond gene expression might further expand our ability to identify dynamic molecular events that are associated with phenotypic dynamics. Genetic mutation data such as those from exome and whole-genome sequencing can be used as prior information to guide module search under the assumption that mutated sequences are likely to be involved in the diseases. Epigenomic data can be integrated with transcriptome data to understand how environmental factors perturb gene networks. Second, comparing and contrasting dynamic events involving different molecular types may yield new mechanistic insights into their interactions in the context of disease progression.
Materials and Methods
All animal procedures were performed with the approval of the Institutional Animal Care and Use Committee of the University of Washington.
Overview of the iMDM (inference of multiple differential modules) method
Fig 2 provides a schematic of the iMDM algorithm. The algorithm takes as the input transcriptome profiles gathered under both healthy/baseline and disease conditions. Using the transcriptome profiles, iMDM first constructs multiple differential co-expression networks (DCNs), one for each condition. Two genes are connected in a DCN if they exhibit correlated expression profiles across conditions and their expression levels are significantly different between the disease and the baseline conditions. Next, we adapted the M-module algorithm  to identify statistically significant multiple differential modules (M-DMs) present in multiple DCNs. iMDM is implemented in the R statistical programming language. The software is freely available upon request. In the following sections, we describe details of each algorithmic steps of the method.
Construction of differential co-expression networks (DCNs)
For each disease condition, construction of the DCN consists of two steps: 1) construction of a binary co-expression network; and 2) edge weight assignment based on differential gene expression between the disease and baseline conditions. To construct the binary gene co-expression network, edges are chosen based on the absolute value of the Pearson correlation of the expression profiles of two genes. To remove indirect correlation due to a third gene, we used the 1st order partial Pearson correlation coefficient . Only edges whose correlations are equal or greater than the pre-defined threshold δ are chosen. In this study, the value of δ was set at 0.8 such that maximal number of genes was connected in all DCNs to be constructed. In step 2, weights are assigned to edges in the binary co-expression network based on the p-value of differential gene expression between the disease and baseline conditions. Various methods can be used to detect differential gene expression for microarray or RNA-Seq data. Here, we used EdgeR . The weight wi,j on edge (i,j) in the differential network is defined as following: where pi and pj are p-values of differential expression for genes i and j, respectively. V is the node set of the co-expression network, and cor(i,j) is the absolute value of Pearson correlation between genes i,j based on their expression profiles. Under this weighting scheme, genes that are co-expressed and significantly differentially expressed are assigned higher weights, which satisfies our assumption that those genes likely participate in a pathway that exhibit differential activities between the two conditions being compared.
Mathematically, given M DCNs with the same node set but different edge sets, Gk = (V,Ek)(1≤k≤M), they can be represented by a 3-dimensional matrix A = (aijk)nxnxM where aijk denotes the weight on the edge e(i,j), wi,j, in network Gk. An M-DM, C, is defined as a set of genes whose connectivity within them is stronger than random expectation across all M DCNs under consideration.
Identification of multiple differential modules in multiple DCNs
We adapted our recently developed M-module algorithm to identify M-DMs. M-module is designed for identifying gene modules with common members but varied connectivity across multiple molecular interaction networks .
M-DM search consists of three steps: seed prioritization, module search by seed expansion, and refinement of candidate modules. The seed prioritization step ranks genes in multiple networks by using the topological feature of the gene in the network. Briefly, for each network Gk = (V,Ek)(1≤k≤M) with an adjacency matrix Ak = (aijk)nxn, we construct a function g:V→R such that g(i) denotes the importance of vertex i in the corresponding network. The function is defined as where Nk(i) denotes the set of neighbors of i in Gk; denotes the degree normalized weighted adjacency matrix which is computed as where D is diagonal matrix with element Dii = ΣjAijk. The product, A′g, denotes the information propagation on network via the edges of networks, which means the importance of a node depends on the number of its neighbors, strength of connection and importance of its neighbors. The exact solution to the equation above is .
For each gene, after obtaining its ranks in all individual networks, denoted as g = [g(1),…,g(M)], we calculate a z-score for each rank g(l). Then we obtain the rank for that gene across all networks by averaging the z-scores across all networks. The top 10% genes were selected as the seeds although the search result is not sensitive to the fraction of seeds used . Starting with each seed, the module search step iteratively includes genes whose addition leads to the maximum decrease in the graph entropy-based objective function until there is no decrease in the objective function. For a given vertex i∈C, let Lk(i) denotes the total weight between vertex i and other vertices of the M-DM C in the network Gk, i.e., Lk(i) = Σj≠i,j∈Caijk. Similarly, let denotes the weight between i and vertices outside of C. We defined the entropy for the connectivity of vertex i to C as where . The motivation for using graph entropy is that it quantifies the skewness of in-module connectivity versus out-module connectivity. Summing over all vertices in C and network k, we have Hk(C) = Σi∈CH(Ci). The graph entropy for C across all networks and normalized for the size of C is
The objective function of the algorithm is defined as: where Ci(1≤i≤τ) is a candidate M-DM. X = [x1,…,xτ] is an index matrix in which each column corresponds to an M-DM and each row corresponds to a gene. The constraints mean that each gene can belong to one or more modules and each module has to contain at least one gene.
During the refinement step, M-DMs whose sizes are smaller than five are removed. To merge overlapping M-DMs, we used Jaccard index which is the ratio of intersection over union for two sets. A Jaccard index of 0.5 was used in this study.
Calculation of the statistical significance of candidate M-DMs
The statistical significance of M-DMs is computed based on the null score distribution of M-DMs generated using randomized networks. Each network is completely randomized 100 times by degree-preserved edge shuffling. To obtain module scores for the null distribution, we performed module search on the randomized networks. Using the null distribution, the empirical p-value of an M-DM is calculated as the probability of the module having the observed score or smaller by chance. P-values are corrected for multiple testing using the method of Benjamini-Hochberg . An adjusted p-value of 0.05 was used as the significance threshold.
Quantification of connectivity dynamics of shared M-DMs
By definition, each M-DM with M≥2 has multiple component modules from different DCNs. To quantify the change in the connectivity of component modules, we used a graph-theoretical measure, the Module Connectivity Dynamic Score (MCDS). Specifically, given an M-DM C whose weighted adjacent matrices of the corresponding induced subgraphs are denoted by , the MCDS between two adjacent component modules is defined as the L2 norm of the matrix subtraction normalized by the number of genes in the M-DM, i.e., where ||·||2 is the matrix L2 norm. The overall MCDS of an M-DM is defined as the average MCDS of all pairwise comparisons:
The statistical significance of MCDS for an M-DM is computed in a similar way as that for M-DMs. Briefly, we first calculate the null distribution for MCDS scores based on random M-DMs. The empirical p-value of an MCDS is calculated using the null distribution. The method of Benjamini-Hochberg is used for multiple testing correction. An adjusted p-value of 0.05 was used as the significance threshold.
Transgenic mice, transverse aortic constriction surgery and echocardiography
Generation of transgenic mice with cardiac restricted Ndufs4 deletion was described in our recent publication . Mice were fed on rodent diet and water available ad libitum with a 12-hour light/dark cycle in a vivarium. Adult male mice (3–4 months old) received transverse aortic constriction (TAC) to induce chronic pressure overload or sham surgeries as previously described . Cardiac geometry and function (left ventricular internal dimension in diastole (LVID(d)) and LV fractional shortening (FS%)) were recorded at 1, 2, and 4 weeks using echocardiography with the VEVO 770 system equipped with a 707B scan head. All measurements were averaged from six cardiac cycles.
Total RNA was isolated from frozen cardiac tissue using the RNeasy Kit for fibrotic tissues (Qiagen) and treated with DNase to remove genomic DNA contamination. Quality and integrity of RNA were checked using Agilent Bioanalyzer 2100. All samples used for RNA sequencing had a Bioanalyzer RIN number of at least 8. Illumina TruSeq RNA sample preparation kit was used to generate multiplexed sequencing libraries. Libraries were loaded into a flowcell at a concentration of 5 pM and clustered on an Illumina cBot. Sequencing was done on a HiSeq2000 that generated paired-end reads of length 50 bp.
Transcriptome assembly and expression level estimate from sequencing reads
Paired-end reads were mapped to the mouse genome (mm9) using Tophat . Only uniquely mapped reads with fewer than 2 mismatches were used for downstream analyses. Transcripts were assembled using Cufflinks  and Ensemble (release 66) as the source of annotated transcripts. Normalized transcript abundance was computed using Cufflinks and expressed as FPKM (Fragments Per Kilobase of transcripts per Million mapped reads). Gene-level FPKM values were computed by summing up FPKM values of their corresponding transcripts . FPKM values were used to compute gene co-expression networks.
S1 Fig. Number of differentially expressed genes in perturbed hearts compared to control hearts (WTSH).
P-value cutoff is 0.05.
P-values were based on Wilcoxon test.
S3 Fig. Example 2-DMs.
A, 2-DM found in KOTAC and KOSH DCNs. It is enriched for genes involved in the regulation of cell proliferation. Node color is proportional to the average p-value of differential gene expression between the two disease conditions and baseline (WTSH) condition. Octagon, genes whose mutation leads to cardiovascular phenotypes. Left panel, rewiring of the 2-DM. Only edges that exhibit significant changes in edge weights between the two DCNs are shown. Difference in edge weight is calculated as “KOTAC-KOSH”. Red, increase, green, decrease. Unconnected nodes indicate there is no edge connected to the nodes that exhibit significance change in weight between the two conditions. Middle panel, expression profiles of module genes in four conditions. Expression levels of each gene across all samples are normalized by Z-score transformation. P-values for gene expression level difference are based on t-test. Right panel, histogram for edge weights of discovered 2-DMs in the respective networks. B, 2-DM found in WTTAC and KOSH DCNs. It is enriched for genes involved in cell migration. Difference in edge weight is calculated as “WTTAC-KOSH”.
S4 Fig. An example 3-DM.
It was enriched for genes involved in actin cytoskeleton organization. A. Rewiring of the 3-DM. Node color is proportional to the average p-value of differential gene expression between the two disease conditions and baseline (WTSH) condition. Octagon, genes whose mutation leads to cardiovascular phenotypes. Only edges that exhibit significant changes in edge weights between two DCNs are shown. Difference in edge weight was calculated as “KOTAC-WTTAC”, “KOTAC-KOSH”, and “WTTAC-KOSH”. Red, increased edge weight in the comparison, green, decreased edge weight. Unconnected nodes indicate there was no edge connected to the nodes that exhibit significance change in weight between the two conditions. B. expression profiles of module genes in four conditions. Expression levels of each gene across all samples were normalized by Z-score transformation. C. Histogram for edge weights of the 3-DM in the respective networks.
S5 Fig. Enriched GO terms among dynamic and static M-DMs.
Y-axis denotes the minus logarithm of the enrichment p-value.
S6 Fig. Heart functional measures of mice used for RNA-Seq profiling.
Each dot represents a heart. A, heart weight normalized by tibial length (HW/TL). B, left ventricular internal dimension in diastole (LVID(d). C, left ventricular fractional shortening (FS%). Because the value of FS% is between 0 and 1 and lower FS% values mean worse cardiac function whereas lower values of HW/TL and LVID(d) mean better cardiac function, we first transformed the raw FS% value as (1-FS%). Raw measures were then z-score transformed for each measurement type separately to make them comparable.
S7 Fig. Topological and biological differences between 1-DMs and 2/3-DMs.
A. Boxplot for the weighted degree of modules. B. Boxplot for the node betweenness centrality of modules. C. Histograms of module activity and disease phenotype correlations of the two types of M-DMs. Module activity is the average normalized gene expression level of all member genes in a module. P-values were based on one-sided t-test.
We thank members of the Tan and Tian labs for helpful discussion. We thank Lucas Van Tol and the University of Iowa Institute for Clinical and Translational Science for providing computing support.
Conceived and designed the experiments: KT XM RT. Performed the experiments: XM LG GK CFL LGM. Analyzed the data: XM LG KT. Contributed reagents/materials/analysis tools: GK PG CFL LGM. Wrote the paper: KT XM.
- 1. Kathiresan S, and Srivastava D (2012) Genetics of human cardiovascular disease. Cell 148: 1242–1257. pmid:22424232
- 2. Dewey FE, Perez MV, Wheeler MT, Watt C, Spin J, Langfelder P, et al. (2011) Gene coexpression network topology of cardiac development, hypertrophy, and failure. Circ Cardiovasc Genet 4: 26–35. pmid:21127201
- 3. Akavia UD, and Benayahu D (2008) Meta-analysis and profiling of cardiac expression modules. Physiol Genomics 35: 305–315. pmid:18780758
- 4. Gargalovic PS, Imura M, Zhang B, Gharavi NM, Clark MJ, Pagnon J, et al. (2006) Identification of inflammatory gene modules based on variations of human endothelial cell responses to oxidized lipids. Proceedings of the National Academy of Sciences of the United States of America 103: 12741–12746. pmid:16912112
- 5. Huan T, Zhang B, Wang Z, Joehanes R, Zhu J, Johnson AD, et al. (2013) A systems biology framework identifies molecular underpinnings of coronary heart disease. Arteriosclerosis, thrombosis, and vascular biology 33: 1427–1434. pmid:23539213
- 6. Tan N, Chung MK, Smith JD, Hsu J, Serre D, Newton DW, et al. (2013) Weighted gene coexpression network analysis of human left atrial tissue identifies gene modules associated with atrial fibrillation. Circulation Cardiovascular genetics 6: 362–371. pmid:23863953
- 7. Karamanlidis G, Lee CF, Garcia-Menendez L, Kolwicz SC Jr., Suthammarak W, Gong G, et al. (2013) Mitochondrial complex I deficiency increases protein acetylation and accelerates heart failure. Cell metabolism 18: 239–250. pmid:23931755
- 8. Langfelder P, and Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC bioinformatics 9: 559. pmid:19114008
- 9. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics 25: 25–29. pmid:10802651
- 10. Kanehisa M, Goto S, Sato Y, Furumichi M, and Tanabe M (2012) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic acids research 40: D109–114. pmid:22080510
- 11. Blake JA, Bult CJ, Eppig JT, Kadin JA, Richardson JE, and Mouse Genome Database, G (2014) The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse. Nucleic acids research 42: D810–817. pmid:24285300
- 12. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102: 15545–15550. pmid:16199517
- 13. Nishimura D (2001) BioCarta. Biotech Software & Internet Report 2: 117–120.
- 14. Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, et al. (2011) Reactome: a database of reactions, pathways and biological processes. Nucleic acids research 39: D691–697. pmid:21067998
- 15. Grubb SC, Bult CJ, and Bogue MA (2014) Mouse phenome database. Nucleic acids research 42: D825–834. pmid:24243846
- 16. Desler C, Lykke A, and Rasmussen LJ (2010) The effect of mitochondrial dysfunction on cytosolic nucleotide metabolism. Journal of nucleic acids 2010.
- 17. Diez J, Gonzalez A, Lopez B, and Querejeta R (2005) Mechanisms of disease: pathologic structural remodeling is more than adaptive hypertrophy in hypertensive heart disease. Nature clinical practice Cardiovascular medicine 2: 209–216. pmid:16265485
- 18. Doenst T, Nguyen TD, and Abel ED (2013) Cardiac metabolism in heart failure: implications beyond ATP production. Circulation research 113: 709–724. pmid:23989714
- 19. Ussher JR, Jaswal JS, and Lopaschuk GD (2012) Pyridine nucleotide regulation of cardiac intermediary metabolism. Circulation research 111: 628–641. pmid:22904042
- 20. Davis CJ, Gurbel PA, Gattis WA, Fuzaylov SY, Nair GV, O'Connor CM, et al. (2000) Hemostatic abnormalities in patients with congestive heart failure: diagnostic significance and clinical challenge. International journal of cardiology 75: 15–21. pmid:11054501
- 21. Saito R, Smoot ME, Ono K, Ruscheinski J, Wang PL, Lotia S, et al. (2012) A travel guide to Cytoscape plugins. Nature methods 9: 1069–1076. pmid:23132118
- 22. Taylor IW, Linding R, Warde-Farley D, Liu Y, Pesquita C, Faria D, et al. (2009) Dynamic modularity in protein interaction networks predicts breast cancer outcome. Nat Biotechnol 27: 199–204. pmid:19182785
- 23. Lee MJ, Ye AS, Gardino AK, Heijink AM, Sorger PK, MacBeath G, et al. (2012) Sequential application of anticancer drugs enhances cell death by rewiring apoptotic signaling networks. Cell 149: 780–794. pmid:22579283
- 24. Drozdov I, Didangelos A, Yin X, Zampetaki A, Abonnenc M, Murdoch C, et al. (2013) Gene network and proteomic analyses of cardiac responses to pathological and physiological stress. Circulation Cardiovascular genetics 6: 588–597. pmid:24214729
- 25. Bandyopadhyay S, Mehta M, Kuo D, Sung MK, Chuang R, Jaehnig EJ, et al. (2010) Rewiring of genetic networks in response to DNA damage. Science 330: 1385–1389. pmid:21127252
- 26. Bisson N, James DA, Ivosev G, Tate SA, Bonner R, Taylor L, et al. (2011) Selected reaction monitoring mass spectrometry reveals the dynamics of signaling through the GRB2 adaptor. Nature biotechnology 29: 653–658. pmid:21706016
- 27. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, et al. (2004) Transcriptional regulatory code of a eukaryotic genome. Nature 431: 99–104. pmid:15343339
- 28. Luscombe NM, Babu MM, Yu H, Snyder M, Teichmann SA, and Gerstein M (2004) Genomic analysis of regulatory network dynamics reveals large topological changes. Nature 431: 308–312. pmid:15372033
- 29. Workman CT, Mak HC, McCuine S, Tagne JB, Agarwal M, Ozier O, et al. (2006) A systems approach to mapping DNA damage response pathways. Science 312: 1054–1059. pmid:16709784
- 30. Ellis JD, Barrios-Rodiles M, Colak R, Irimia M, Kim T, Calarco JA, et al. (2012) Tissue-specific alternative splicing remodels protein-protein interaction networks. Molecular cell 46: 884–892. pmid:22749401
- 31. Guenole A, Srivas R, Vreeken K, Wang ZZ, Wang S, Krogan NJ, et al. (2013) Dissection of DNA damage responses using multiconditional genetic interaction maps. Molecular cell 49: 346–358. pmid:23273983
- 32. Zhang B, Tian Y, Jin L, Li H, Shih Ie M, Madhavan S, et al. (2011) DDN: a caBIG(R) analytical tool for differential network analysis. Bioinformatics 27: 1036–1038. pmid:21296752
- 33. Gill R, Datta S, and Datta S (2010) A statistical framework for differential network analysis from microarray data. BMC bioinformatics 11: 95. pmid:20170493
- 34. Pujana MA, Han JD, Starita LM, Stevens KN, Tewari M, Ahn JS, et al. (2007) Network modeling links breast cancer susceptibility and centrosome dysfunction. Nature genetics 39: 1338–1349. pmid:17922014
- 35. Romanoski CE, Che N, Yin F, Mai N, Pouldar D, Civelek M, et al. (2011) Network for activation of human endothelial cells by oxidized phospholipids: a critical role of heme oxygenase 1. Circulation research 109: e27–41. pmid:21737788
- 36. Das J, Mohammed J, and Yu H (2012) Genome-scale analysis of interaction dynamics reveals organization of biological networks. Bioinformatics 28: 1873–1878. pmid:22576179
- 37. Goh KI, Cusick ME, Valle D, Childs B, Vidal M, and Barabasi AL (2007) The human disease network. Proceedings of the National Academy of Sciences of the United States of America 104: 8685–8690. pmid:17502601
- 38. Ma X, Gao L, and Tan K (2014) Modeling Disease Progression Using Dynamics of Pathway Connectivity. Bioinformatics.
- 39. Watson-Haigh NS, Kadarmideen HN, and Reverter A (2010) PCIT: an R package for weighted gene co-expression networks based on partial correlation and information theory approaches. Bioinformatics 26: 411–413. pmid:20007253
- 40. Robinson MD, McCarthy DJ, and Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140. pmid:19910308
- 41. Benjamini Y, and Hochberg Y (1995) Controlling the False Discovery Rate—a Practical and Powerful Approach to Multiple Testing. J Roy Stat Soc B Met 57: 289–300.
- 42. Tarnavski O, McMullen JR, Schinke M, Nie Q, Kong S, and Izumo S (2004) Mouse cardiac surgery: comprehensive techniques for the generation of mouse models of human diseases and their application for genomic studies. Physiol Genomics 16: 349–360. pmid:14679301
- 43. Trapnell C, Pachter L, and Salzberg SL (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105–1111. pmid:19289445
- 44. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, et al. (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28: 511–515. pmid:20436464