Discovering Networks of Perturbed Biological Processes in Hepatocyte Cultures

The liver plays a vital role in glucose homeostasis, the synthesis of bile acids and the detoxification of foreign substances. Liver culture systems are widely used to test adverse effects of drugs and environmental toxicants. The two most prevalent liver culture systems are hepatocyte monolayers (HMs) and collagen sandwiches (CS). Despite their wide use, comprehensive transcriptional programs and interaction networks in these culture systems have not been systematically investigated. We integrated an existing temporal transcriptional dataset for HM and CS cultures of rat hepatocytes with a functional interaction network of rat genes. We aimed to exploit the functional interactions to identify statistically significant linkages between perturbed biological processes. To this end, we developed a novel approach to compute Contextual Biological Process Linkage Networks (CBPLNs). CBPLNs revealed numerous meaningful connections between different biological processes and gene sets, which we were successful in interpreting within the context of liver metabolism. Multiple phenomena captured by CBPLNs at the process level such as regulation, downstream effects, and feedback loops have well described counterparts at the gene and protein level. CBPLNs reveal high-level linkages between pathways and processes, making the identification of important biological trends more tractable than through interactions between individual genes and molecules alone. Our approach may provide a new route to explore, analyze, and understand cellular responses to internal and external cues within the context of the intricate networks of molecular interactions that control cellular behavior.


Introduction
The liver is one of the important organs in our bodies, playing a vital role in glucose homeostasis, the synthesis of bile acids for the metabolism of cholesterol, and the secretion of proteins to aid clotting [1]. Additionally, the liver is primarily responsible for the detoxification of foreign substances, including a variety of environmental toxicants, alcohol, cigarette smoke, and drugs [1]. Hepatocytes are the principal cells in the liver, comprising over 80% of its mass and performing several characteristic functions of this organ. Liver culture systems such as hepatocyte monolayers (HMs) and collagen sandwiches (CSs) are routinely used to test adverse effects of drugs and environmental toxicants. In HMs, hepatocytes are cultured on a single collagen gel. Such cells progressively lose their phenotypic characteristics over time [2]. In CS systems, hepatocytes are maintained between two collagen gels. Hepatocytes in CS cultures remain stable over extended periods of time, and maintain differentiated hepatic functions [3,4]. While morphological and physiological characteristics of hepatocytes in CS cultures have been studied extensively, comprehensive transcriptional studies of these culture systems do not appear to have been reported. Therefore, in an earlier study, we performed a systematic temporal study of genome-wide gene expression programs in HMs and in CS cultures over an eight-day period [5]. We used Gene Set Enrichment Analysis (GSEA) [6] to compare the transcriptional programs in the two culture systems. Our results demonstrated that gene expression in hepatocytes in CS cultures steadily and comprehensively diverges from that in HMs [5]. Gene sets up-regulated in CS cultures included several hepatic functions, such as metabolism of lipids, amino acids, carbohydrates, and alcohol, and synthesis of bile acids. Monooxygenases such as Cytochrome-P450 enzymes did not show any change between the culture systems after one day, but exhibited significant up-regulation in CS cultures after three days and later in comparison to HMs.
This analysis did not consider the fact that a cell's response to its environment is governed by an intricate network of molecular interactions. These interactions dynamically change in response to a myriad of cues. Therefore, discovering the set of molecular interactions that are active in a given cellular context is a fundamental question in computational systems biology [7]. In the current work, we reanalyze the CS-HM transcriptional data in the light of an underlying molecular interaction network. We propose a novel approach called ''Contextual Biological Process Linkage Network'' (CBPLN) that focuses on computing which processes in the cell are perturbed in a particular context and how these processes are linked to each other. Our approach is predicated on the belief that high-level linkages between pathways and processes make identification of important biological trends more tractable and intuitive than through interactions between individual genes and molecules alone. Our method requires three inputs: 1. p-values representing the statistical significance of the differential expression of each gene (upon comparing a treatment to a control), which we refer to hereafter as expression p-values, 2. a functional or physical interaction network connecting genes and proteins, and 3. a dataset of functional annotations for genes and proteins.
We extend the method developed by Dotan-Cohen et al. [8] to detect directed linkages between gene sets in the context of a functional interaction network. Given two biological processes a and b and the sets of genes that are members of each, these authors computed the number of genes annotated by b that are themselves not annotated by a and interact with at least one gene annotated by a. They estimated the statistical significance of this count using the one-sided version of Fisher's exact test. Similar methods developed by Pandey et al. [9,10] for regulatory and physical interaction networks are aimed at discovering chains of significantly linked biological processes.
In this work, we extend the ideas of Dotan-Cohen et al. to incorporate gene expression measurements to determine which inter-process links are significantly perturbed between the measured conditions. Informally, we compute a score for a link from process a to process b based upon the expression p-values of pairs of interacting genes, where one gene belongs to process a and the other to process b. Our score takes estimates of confidence in the interactions into account. High-confidence interactions with highly perturbed incident genes make large contributions to the score. We estimate the statistical significance of the score by computing an empirical distribution of scores under two different hypotheses. The first hypothesis tests the dependence of the score on the particular set of genes annotated by b, i.e., it asks if we would observe a particular score from process a to b even if we selected the genes annotated by b uniformly at random from the set of all annotated genes. This test directly extends the approach used by Dotan-Cohen et al. The second hypothesis tests the dependence of the score on the specific interactions in the network, i.e., it asks if we would observe the score from a to b even with an interaction network drawn from a distribution of networks with the same node degrees. Under either hypothesis, we report the significance of the link, after multiple testing correction, as a pvalue. Hereafter, we refer to this quantity as the link p-value, to distinguish it from the expression p-values that are inputs to our method.

Input Data
Gene Expression Data. We used the Affymetrix Rat Genome 230 2.0 GeneChip to measure genomewide transcriptional profiles in rat hepatocytes grown in monolayers and in collagen sandwiches [5]. This dataset is available in MIAMEcompliant format in the Gene Expression Omnibus (accession number GSE20659). We marked the day when we deposited the second layer of collagen in CS cultures as day zero. On days one, two, three and eight after deposition of the second layer of collagen, we measured data in triplicate in hepatocytes in each culture system.
Functional Linkage Network. Existing databases of protein interactions contain very few experimentally detected Protein-Protein Interactions (PPIs) for rat: seven different widely-used sources [11][12][13][14][15][16][17] contained a total of just 1,274 non-redundant rat PPIs spanning 974 proteins. Therefore, we decided to use the rat functional linkage network predicted by the STRING system [18]. The interaction type in STRING is a functional association, which the authors define as ''the specific and meaningful interaction between two proteins that jointly contribute to the same functional process.'' Apart from incorporating experimental interaction data, STRING uses multiple methods to predict possible functional linkages including interolog-based interaction transfer, similar transcriptional response across a variety of conditions (co-expression), textmining, and gene families that share above-random similarities in their evolutionary histories. STRING includes a scheme to score each predicted interaction in the range 150-1000 against a common reference of functional partnership based on the KEGG database [19]. STRING version 8.3, released on May 26, 2010 contains 975,454 predicted interactions among 15,178 rat proteins. We used the subset of these interactions with a weight of at least 500; there were 204,992 such interactions among 9,925 proteins. We selected 500 as a cutoff based on the reasoning that interactions with at least this weight were more likely to connect genes belonging to the same process than to connect genes belonging to different processes. When we further pruned the network to include genes with at least one annotation (see below), we obtained 47,002 interactions among 4,714 genes.
Functional Annotations. In our earlier work [5], we used GSEA to compare the two culture systems at each of the four time points; Table 1 lists the contrasts we analyzed. This analysis provided insights into the temporal patterns of up-and downregulation in the gene sets in the Molecular Signature Database (MSigDB) [6]. In that work, we focused our analysis on gene sets that showed monotonically diverging patterns of expression between CS and HM cultures. In the current paper, we use the curated (c2), motif (c3), and Gene Ontology (c5) collections of gene sets in MSigDB as our set of functional annotations. We focus on establishing linkages among the subset of 18 up-regulated gene sets from the previous study; Table 2 lists these sets along with a short description of each.

Overview of Results
We considered only those links with a link p-value of at most 0:01, after using the method of Benjamini and Hochberg [20] to adjust for testing multiple hypotheses. We further restricted our attention to pairs of gene sets for which at least 10 genes exclusively in the second set of the pair interacted with genes in the first set, reasoning that fewer interacting genes might not yield robust link p-values. We compared the number of links computed by using each hypothesis test. We also compared these values to the number of links in the (context-free) BPLN computed using the method of Dotan-Cohen et al. [8]. Tables 3 and 4 display the  results of the comparisons. Several salient trends emerged. First, in Table 3, irrespective of the hypothesis test used, the number of links increased with time. This phenomenon parallels our earlier observation that the transcriptional programs of hepatocytes in CS cultures steadily diverged from that in HMs. Second, the size of the intersection between the two sets of links also increased with time, as did the Jaccard similarity coefficient of the two sets (i.e., the size of the intersection divided by the size of the union). Further, for each day, the number of links deemed to be significant by both hypothesis tests was itself statistically significant, based on Fisher's Exact Test (see File S1). These trends suggest that once the transcriptional programs of the two culture systems have diverged (day 2 and later), both hypothesis tests find very similar sets of process pairs to be significantly linked at the 0.01 level. However, the number of common links is very close to the number of links identified by the second hypothesis test, indicating that the second test is more conservative than the first in deciding whether a link is statistically significant. We observed similar results when we repeated these analyses with other cutoffs on the link p-value (0.005, 0.05, and 0.1) (see File S2). Third, normalizing the linkage score (see ''Methods'') pruned out a small number of links.
Finally, the overlap between the intersection of the results from both hypothesis tests and the BPLN was small in days 1 and 2 and more substantial in days 3 and 8 (Table 4), although the overlap was still statistically significant by Fisher's Exact Test (see File S1). These data suggest that only a subset of the links in a BPLN may have some relevance to the particular biological conditions being investigated. By incorporating measurements of gene expression, CBPLNs can identify those inter-process links that correspond to the phenotypic differences observed in the two conditions being compared (e.g., hepatocytes in CS versus HM).
Although both hypothesis tests find very similar sets of process pairs to be significantly linked at the 0.01 level, especially in later days, we found that the actual link p-values computed for each process pair were not very highly correlated to each other (see File S3). Based on these results, we decided to consider a linkage between a pair of gene sets only if this link was significant at the 0.01 level with both hypothesis tests with normalization. The resulting CBPLNs are displayed in Figures 1, 2, 3, 4. For reference, we have displayed the BPLN in Figure 5. We discuss the properties of these CBPLNs in the rest of the paper. We focus primarily on the day 8 CBPLN ( Figure 4), noting that many of the features we discuss are also apparent in the day 3 CBPLN (Figure 3). When we discuss some pairs of linked gene sets, we refer to the underlying functional interaction network connecting the genes in those sets. We start by discussing properties of liver-specific genes, focusing particularly on the regulation of these genes by the transcription factor HNF1. Then, we discuss the role of lipid homeostasis and bile acid synthesis in the liver. Finally, we summarize the different interpretations of the links in CBPLNs. We stress that the formulation of linkage between processes a and b is asymmetric. Hence, by definition, links in the CBPLN are directed, i.e., a CBPLN may contain a link between a and b and between b and a.

Liver Specific Genes
The 251 genes in the HSIAO_LIVER_SPECIFIC_GENES gene set are expressed selectively in the liver, as determined by Hsiao et al. [21] from a compendium of gene expression in normal human tissues created with the goal of defining a reference for basic organ systems biology. Genes in this set are members of a spectrum of biological processes, including fatty acid metabolism, metabolism of xenobiotics, blood coagulation, and response to wounding. Not surprisingly, this gene set occupies a central place in the CBPLN on day 8 ( Figure 4); it has the highest number of outgoing and incoming links. Outgoing links include connections to glycolysis and gluconeogenesis, alcohol metabolism process, metabolism of xenobiotics by cytochrome P450s, the Peroxisome Proliferator-Activated Receptor (PPAR) signaling pathway, lipid metabolic processes, the urea cycle, and bile acid biosynthesis, among others. In turn, the gene sets such as V$HNF1_Q6 and NUCLEAR_RECEPTORS are linked to HSIAO_LIVER_SPE-CIFIC_GENES. Some links involving HSIAO_LIVER_SPECI-FIC_GENES are unidirectional on day 2 or day 3 (Figures 2 and 3) but bidirectional on day 8 (Figure 4), e.g., to HSA03320_ PPAR_SIGNALING_PATHWAY and metabolism of fatty acids, bile acids, and alcohol. Such features suggest that CBPLNs may be representing cellular signals emanating from a subset of liver specific genes to other processes and subsequent feedback from the There are two groups of columns, one for the results without normalization and another for the results with normalization, where ''normalization'' refers to results obtained when we deduct the score calculated with average expression values from the observed score. Within each group, the columns titled ''Gene set randomization'' refer to the number of observed significant links (corrected link p-value ƒ0:01) when we construct the null distribution by re-sampling the genes annotated with the gene set b; similarly, the columns titled ''Network randomization'' refer to the number of significant links observed when generating interaction networks with the same node degrees as the original network. The columns titled ''Intersection'' refer to the number of links significant under both hypothesis tests. The column titled ''Jaccard index'' contains the ratio of the size of the intersection to the size of the union of the CBPLNs computed by the two hypothesis tests. File S1 contains the statistical significance values for the intersection sizes, as computed by Fisher's exact test. doi:10.1371/journal.pone.0015247.t003 The column titled ''BPLN'' denotes the number of links in the BPLN. Note that the number of links in the BPLN does not change with the number of days, as the BPLN method does not use gene expression data. The last six columns are divided into two groups of three columns each. The first set of columns compare BPLNs to CBPLNs computed using gene set randomization. The second set of columns compare BPLNs to CBPLNs computed using network randomization. The data in and meaning of columns ''Gene set randomization'' and ''Network randomization'' are identical to those in Table 3. The columns titled ''Intersection'' contains the number of links found to be significant in both the BPLN and the respective CBPLN. The columns ''Jaccard index'' contains the ratio of the size of the intersection to the size of the union of the BPLN and the respective CBPLN. Statistical significance values for the intersection sizes, as computed by Fisher's exact test, are available in File S1. doi:10.1371/journal.pone.0015247.t004 other processes to liver specific genes. Overall, these results suggest that CBPLNs can assist in the sub-division of liver-specific genes into more refined categories, based not only on the functions of the genes themselves, but also on how they are regulated and what other processes they may control. We discuss one specific link next that illustrates this property.  Table 2. An edge connects two gene sets whose linkage is determined to be statistically-significant by both hypothesis tests used in computing CBLPNs. The color of a node indicates the statistical significance of its perturbation, as computed by GSEA [6]. The legend mapping colors to ranges of statistical significance appears at the bottom of the figure. We use the same color scheme to indicate the statistical significance computed for a gene set by GSEA and for the significance value computed for a gene by LIMMA. We use this color scheme in all the subsequent figures as well. doi:10.1371/journal.pone.0015247.g001 Liver Specific Gene Sets Regulated by HNF1 Hepatic nuclear factor 1 (HNF1), also known as albumin proximal factor, is a transcription factor required for the expression of several liver-specific genes including albumin [22]. The protein functions as a homodimer and binds to the inverted palindrome 59-GTTAAT-NATTAAC-39. The promoter regions of genes in the MSigDB set V$HNF1_Q6 match this binding site for HNF1 [23]. In our previous study [5], we noted the monotonic up-regulation of this gene set in CS cultures when compared to HMs. This gene set has an overlap of 25 genes with the gene set HSIAO_LIVER_SPECIFIC_GENES. We concluded that HNF1 monotonically up-regulates the expression of liver-specific genes in CS cultures but not in HMs.
CBPLNs assist us in elaborating upon these earlier observations. We studied the link between V$HNF1_Q6 and HSIAO_ LIVER_SPECIFIC_GENES in the day 8 CBPLN by examining the functional interactions in the STRING database connecting genes in V$HNF1_Q6 to genes in HSIAO_LIVER_SPECIFIC_ GENES. Figure 6 displays a layout of this network. Visual examination of Figure 6 indicates that the linkage between these two gene sets is driven by the genes F2, Plg, CYP2E1, Nr1h4, Lipc, and their interactors, with weaker contributions arising from Hnf1a and Hnf4a. Note that F2, Plg, CYP2E1, Nr1h4, and Lipc are members of both gene sets while Hnf1a and Hnf4a are members of V$HNF1_Q6. We discuss a subset of these proteins next, highlighting liver-specific processes they participate in.
HNF1a and HNF4a. Hepatocyte nuclear factor 4a (Hnf4a) is a nuclear receptor implicated in the regulation of numerous genes associated with hepatic function [24][25][26], gluconeogenesis [27], and activation of the metabolism of xenobiotics, including drugs and pharmaceuticals [28]. It is known that both the HNF4 protein and HNF1 protein can transactivate the HNF1 gene [29]. Although both genes are not very highly up-regulated, their interactions with liver-specific genes Apoa2, Serpina1, Pcbd1, Slc2a2, Slc10a1, Fabp1, and Pck1 suggest the activation of many liver-related pathways.
Blood clotting (Plg and F2). Plasminogen (Plg) is a secreted protein that is proteolysed to plasmin and angiostatin. Plasmin dissolves fibrin in blood clots while angiostatin inhibits angiogenesis. In Figure 6, the significantly up-regulated genes that Plg interacts with include the serpin peptidase inhibitors Serpina1 and Serpinf2, kallikrein B (Klkb1), and coagulation factor XII (F12). Another important protein in Figure 6 is the prothrombin precursor (Coagulation factor II, F2), which interacts with F10, Fga, Fgg, Fn1, Proc, Serpina5, Serpind1, and Vtn. Most of the interactions involving Plg and F2 have been included in STRING via the KEGG pathway for complement and coagulation cascades [19]. The complement system and blood coagulation are a closely interacting pair of proteolytic cascades in blood plasma that are activated after injury [30]. The blood coagulation cascade culminates in the formation of thrombin, the enzyme responsible for the conversion of soluble fibrinogen to the insoluble fibrin clot.
Metabolism of xenobiotics (CYP2E1). Cytochrome P450, family 2, subfamily E, polypeptide 1 (CYP2E1) encodes a member of the cytochrome P450 superfamily of enzymes. Cytochrome P450s proteins are monooxygenases, which carry out the liver's prominent role in xenobiotic metabolism and synthesis of cholesterol, steroids and other lipids. CYP2E1 is an important member of this family, implicated in the metabolism of exogenous compounds such as benzene, carbon tetrachloride, ethylene glycol, and substances found in cigarette smoke as well as endogenous compounds including ethanol, acetone, and acetal [31][32][33]. In Figure 6, CYP2E1 interacts with C2, Cyb5a, CYP4F1, Ephx1, and Mgst1. The interactions of CYP2E1 with Cytochrome P450

Lipid Homeostasis and Bile Acid Synthesis
Two of the most important functions that hepatocytes in the liver carry out are lipid homeostasis and bile acid synthesis. These two functions are intrinsically linked. As illustrated schematically in Figure 7, the liver produces bile acids, which are secreted into the small intestine, where they allow for breakdown of dietary fats and uptake of fatty acids. Subsequently, the liver re-mobilizes these fatty acids throughout the body via lipoproteins [34]. Lipoproteins circulate fatty acids and cholesterol through the body in a cycle that begins with the liver's secretion of fatty acid-rich very lowdensity lipoproteins (VLDLs) and ends with the liver's uptake of cholesterol-rich high-density lipoproteins (HDLs) [35]. The liver then recycles these cholesterols or converts them into bile acids. Our results capture the high-level relationships between these processes, as displayed in the sub-CBPLNs involving nuclear receptors, the PPARa signaling pathway, bile acid biosynthesis, and fatty acid metabolism (Figures 8A-8D).
Before we examine some of these links in more detail, we stress that the links in CBPLNs (e.g., the bi-directional links between HSA03320_PPAR_SIGNALING_PATHWAY and HSA00120_ BILE_ACID_BIOSYNTHESIS) must be interpreted with caution. Both HSA03320_PPAR_SIGNALING_PATHWAY and HSA00120_BILE_ACID_BIOSYNTHESIS are up-regulated in CS cultures in contrast to HMs (Fig. 8C and Fig. 8D). Bile acids directly induce the expression of PPARa [36], which supports interpreting the observed link from HSA00120_BILE_ACID_ BIOSYNTHESIS to HSA03320_PPAR_SIGNALING_PATH-WAY as a regulatory one. On the other hand, although it is tempting to infer that the reverse of that link, from HSA03320_ PPAR_SIGNALING_PATHWAY to HSA00120_BILE_ACID_ BIOSYNTHESIS, also implies the PPARa pathway up-regulates bile acid biosynthesis, such a conclusion may be incorrect. Since the up-regulation trends arise from the comparison of CS cultures to HMs, it is possible that bile acid production in CS cultures is constant (or even decreasing) over time and that bile acid levels in HMs are decreasing. In fact, when we compare the expression values of these two gene sets exclusively within the CS cultures, we observe that there is no statistically significant change between the expression levels of the bile acid biosynthesis genes between days 3 and 8, and that there is a barely statistically significant upregulation of the genes in the PPARa signaling pathway between the same two days (data not shown). Moreover, PPARa has been shown to directly inhibit production of Cholesterol 7a-hydroxylase (CYP7A1) [37,38]. CYP7A1 is the rate-limiting enzyme in the classical pathway of bile acid synthesis from cholesterol [35]. Therefore, while we can conclude from the CBPLN that HSA03320_PPAR_SIGNALING_PATHWAY may regulate HSA00120_BILE_ACID_BIOSYNTHESIS, the mode of regulation (e.g., induction or inhibition) requires more detailed study.
We also note modest changes in the interconnections between the gene sets in Figures 8A-8D over the time-course. One example is the disappearance of the link from HSA03320_PPAR_SIGNA-LING_PATHWAY to HSA00120_BILE_ACID_BIOSYNTH-ESIS from day 1 to day 2, followed by the reappearance of this link at day 3. We attribute this behavior to a spurious report of the link as significant at day 1, since we believe our methods may be over-sensitive when very few genes are significantly perturbed in a given contrast (as was the case for day 1). We are currently investigating ways to improve the robustness of our methods in reporting links for such scenarios.
Two other noticeable changes over the time series have immediate biological interpretations. First, the link from NUCLE-AR_RECEPTORS to HSA03320_PPAR_SIGNALING_PATH-WAY appears at day 2, which we interpret as a regulatory relationship reflected in the underlying functional interaction network and the corresponding up-regulation of the two gene sets. Second, the link from HSA00071_FATTY_ACID_METABO-LISM to HSA03320_PPAR_SIGNALING_PATHWAY also appears at day 2, which we interpret in light of feedback in the fatty acid metabolic pathway. In the rest of this section, we discuss the linkages between these three gene sets, anchoring our discussing on the underlying functional interaction networks on day 8 (Figures 9 and 10). We divide our discussion into three parts: interactions of nuclear receptors with cytochrome P450 enzymes, Figure 5. Context-free BPLN, constructed using the approach of Dotan-Cohen et al. [8]. Colors of gene sets represent perturbation measured by GSEA for CS versus HM at day 8. Note that these perturbation values did not factor into the computation of the BPLN; we display them only for the purpose of visual comparison with
All these nuclear receptors exhibit increasing perturbation over time, and interact with CYP7A1, a cytochrome P450 enzyme that is a member of the PPAR signaling pathway. Note that CYP7A1 itself shows no significant perturbation until day 8. We discuss the support in the literature for a subset of the interactions with CYP7A1. HNF4a has been shown to bind to the promoter regions of CYP7A1, resulting in up to a nine-fold increase in production of the CYP7A1 protein in vitro [39]. The literature suggests tenuous regulatory connections between liver receptor homolog 1 (LRH-1, or Nr5a2) and CYP7A1. In vitro studies have shown that Nr5a2 both promotes and represses the expression of CYP7A1 [40,41]. In a recent study, a knockout of Lrh-1 (Nr5a2) performed selectively in cells that developed into mouse hepatocytes demonstrated that the absence of Nr5a2 had little effect on expression of CYP7A1 [42].
Liver X receptors regulate cholesterol and lipid homeostasis in multiple tissues via two isoforms: LXRa (Nr1h3), which is highly expressed in liver, and LXRb which is more abundant in adipose tissue, gut, kidney, and macrophages [43]. In contrast to the connection between LRH-1 and CYP7A1, LXRa is well known to Figure 6. Network of functional interactions resulting in the link between V$HNF1_Q6 and HSIAO_LIVER_SPECIFIC_GENES on day 8. In this and subsequent figures of such networks, each node represents a gene, and its color indicates the statistical significance of its perturbation (up-or down-regulation) in the contrast between CS and HM on the corresponding day. A node's shape represents its membership within the two gene sets: a pentagon represents membership in the first gene set (i.e., V$HNF1_Q6), a rectangle represents membership in the second gene set (i.e., HSIAO_LIVER_SPECIFIC_GENES), and the house shape represents membership in both gene sets. Nodes with blue (respectively, green) borders are those genes in the first (respectively, second) gene set that we mention or discuss in the text. An edge connecting two nodes represents a functional interaction as predicted by STRING. To increase clarity, we do not display interactions between genes within the same set. Abbreviations: HNF1: annotated with V$HNF1_Q6, LS: annotated with HSIAO_LIVER_SPECIFIC_GENES. doi:10.1371/journal.pone.0015247.g006 activate transcription of CYP7A1 in the presence of cholesterol [44]. Thus, it is surprising that we did not observe significant perturbation in expression of CYP7A1 until day 8. However, in vitro studies indicate that CYP7A1 protein exhibits low turnover [35], raising the possibility that the hepatocytes in both cultures had ample amounts of the proteins up to day 3.
Another set of contributions to the linkage between these two gene sets come from interaction of the nuclear receptors Hnf4a and Nr5a2/Lrh1 with sterol 12a-hydroxylase (CYP8B1), a member of the PPAR signaling pathway. CYP8B1 catalyzes a fate-determining reaction in which cholesterol is ultimately converted into the primary bile acid cholic acid, rather than chenodeoxycholic acid [35]. The study of selective knockout of Lrh-1 (Nr5a2) in mice [42] showed that, in contrast to the effect on the expression of CYP7A1, the knockout caused a significant drop in expression of CYP8B1, demonstrating a very strong regulatory relationship between Nr5a2 and CYP8B1 [42]. Additionally, strong experimental support for Hnf4a promotion of CYB8B1 expression exists [45]. Thus, the expression of CYP8B1 also increases over time, although it lags the expression of its regulatory receptors Hnf4a and Nr5a2.
Nr5a2 is also predicted to interact with 27-hydroxylase (CYP27A1), a mitochondrial cytochrome P450 enzyme that is responsible for a step in the conversion of cholesterol to approximately 25% of the bile acids in mouse [35]. We observe an increase in the perturbed expression of CYP27A1 concomitant to but lagging that of Lrh-1 (Nr5a2). The knockout of Lrh-1 led to significantly decreased expression of CYP27A1 [42], supporting the interaction of these two genes.
The role of PPARa. Next, we focus on the role played by PPARa in the linkage between nuclear receptors and the PPAR  signaling pathway. PPARs are a class of nuclear receptors responsive to fatty acid ligands. PPARs have been divided among three known subtypes, a, b=d, and c, with each subtype occurring in distinct tissues and effecting differing biological responses. Liver cells express PPARa, which is responsible for the regulation of fatty acid uptake and catabolism [46,47]. In our data, only PPARa shows increasing expression in CS cultures, compared to HMs; the other PPARs are not significantly different between the two culture systems.
In Figure 9, the significantly perturbed members of the PPARa pathway that PPARa interacts with include Scd1, Fabp1, Apoa2, Lpl, Acox1, Cpt1a, and CYP7A1. PPARa has been shown to promote expression of these genes by binding to their upstream Peroxisome Proliferator Regulatory Element (PPRE) regions as a heterodimer with RXRa (reviewed in [48]). We note that RXRa shows significant up-regulation in CS versus HM, as well (Fig. 9).
RXRa has been shown to be particularly highly expressed in the liver [49]. RXRb, however, tends to have low expression levels across all tissues [49]. The significant up-regulation of RXRc in CS versus HM is somewhat puzzling, given that RXRc tends to be exclusively expressed in the brain, anterior pituitary, and skeletal muscle [49][50][51], where it is responsible for triglyceride uptake and metabolism [52]. We discuss a subset of the interactions involving PPARa next.
Stearoyl-Coenzyme A desaturase 1 (D9-desaturase, Scd1) is the main hepatic isoform of SCD. Scd1 helps catalyze the rate-limiting step in the synthesis of monounsaturated fatty acids, particularly the production of palmitoleic acid and oleic acid from palmitic acid and stearic acid, respectively [48,53]. LXRa indirectly regulates transcription of Scd1 through activation of transcription of sterol regulatory element binding protein (SREBP) 1c [54,55], an activator of Scd1 transcription [56,57]. Additionally, LXRa directly activates Scd1 transcription through an upstream response element [58]. PPARa has also been demonstrated to directly activate transcription of Scd1 [59]. Thus, our observation of increasingly significant changes in expression for LXRa and PPARa, and a similar trend in Scd1, runs in accordance with previous studies.
The interaction of Fatty Acid Binding Protein 1 (Fabp1, L-FABP) with PPARa through protein-protein contacts is thought to promote the expression of proteins involved in fatty-acid oxidation and gluconeogenesis [60,61]. Included among these genes is Fabp1. Thus, it regulates its own expression through PPARa.
Regulation of fatty acid metabolism by nuclear receptors. The genes in NUCLEAR_RECEPTORS are responsible for initiating cellular responses to a wide variety of conditions and for starting appropriate signal cascades. The nuclear receptors in HSA03320_PPAR_SIGNALING_PATHWAY are the specific subset responsible for initiating the signaling cascade leading to the breakdown of long chain fatty acids [48]. The gene set HSA00071_FATTY_ACID_METABOLISM contains the full contingent of genes responsible for the catabolism of fatty acids. HSA03320_PPAR_SIGNALING_PATHWAY acts as a bridge between the two general classes of genes, NUCLEAR_ RECEPTORS and HSA00071_FATTY_ACID_METABOLISM. Figure 9 shows the interactions of individual genes in NUCLEAR_ RECEPTORS with those in HSA03320_PPAR_SIGNALING_ PATHWAY responsible for the upstream processes of fatty-acid catabolism, including uptake, such as L-FABP (Fabp1) and early-stage fatty-acid b-oxidation in the peroxisome, such as acyl-Coenzyme A oxidase 1 (Acox1) [48]. Figure 10 shows the individual genes in HSA03320_PPAR_SIGNALING_PATHWAY that interact with those in HSA00071_FATTY_ACID_METABOLISM responsible for later stages of b-oxidation in the mitochondria, such as acetyl-Coenzyme A acyltransferase 2 (Acaa2) and hydroxyacyl-Coenzyme A dehydrogenase/3-ketoacyl-Coenzyme A thiolase/ enoyl-Coenzyme A hydratase b (Hadhb) [48]. Thus, the signals from NUCLEAR_RECEPTORS are transferred to HSA00071_ FATTY_ACID_METABOLISM via the subset of nuclear receptors that are members of the PPAR signaling pathway, a chain of events that we are able to recover in the CBPLNs.

Interpretation of Links in CBPLNs
Keeping the examples of the previous sections in mind, we now discuss how links in CBPLNs might be interpreted.
Regulatory relationship. Gene set a may contain genes whose products regulate genes and/or their products in gene set b. An example is the linkage from NUCLEAR_RECEPTORS to other gene sets such as HSA03320_PPAR_SIGNALING_ PATHWAY and genes involved in cellular lipid metabolism; many liver-specific nuclear receptors such as LXRa and HNF4a regulate critical hepatic processes.
Multi-input motif. Multiple gene sets may link to a gene set b, suggesting that the expression of genes in b is regulated by genes in multiple other sets. Such a phenomenon is called a ''multi-input motif'' in the case of a gene being regulated by multiple transcription factors [62]. An example is HSIAO_LIVER_SPECIFIC_ GENES and links to this gene set from V$HNF1_Q6 and NUCLEAR_RECEPTORS.
Feedback. Links that exist in both directions between a and b may suggest that a regulates b and that b receives a feedback signal from a. This phenomenon may be observed within CBPLNs when the link is unidirectional at some time points and bidirectional in later time points. A specific example is the linkage between bile acid biosynthesis and HSA03320_PPAR_SIGNALING_PATHWAY, which is unidirectional on day 2 ( Figure 8B) but bidirectional on days 3 and 8 ( Figures 8C and 8D).
Downstream in the signal flow. A link from process a to process b and another from b and process c may suggest that c lies downstream of a. An instance of this feature is the link from NUCLEAR_RECEPTORS to HSA03320_PPAR_SIGNALING_ PATHWAY and the link from HSA03320_PPAR_SIGNALING_ PATHWAY to HSA00071_FATTY_ACID_METABOLISM in Figure 8B.
Multi-functional gene set. A gene set a that has many incoming links and/or many outgoing links might be an example of a multi-functional gene set. A prominent example in our CBPLNs is the central HSIAO_LIVER_SPECIFIC_GENES gene set. As we remarked earlier, the links incident on this gene set suggest what other processes the genes in HSIAO_LIVER_ SPECIFIC_GENES may regulate or be connected to. Clearly, such a feature depends on how a gene set is defined. For example, many biological processes in the Gene Ontology such as ''response to stress'' are themselves composed of well-defined and functionally-coherent processes. Similarly, the genes that are perturbed by a particular stimulus may participate in a wide variety of processes. CBPLNs can situate such genes in a rich context within the underlying network of molecular interactions.

Conclusions
We have presented an approach that represents cellular responses at the granularity of biological processes and connections among them. Our approach extends the work of Dotan-Cohen et al. [8] by integrating transcriptional data (the ''context'') with functional interaction networks. We focused our analysis on nearly 20 MSigDB gene sets we had identified as up-regulated in hepatocyte cultures in an earlier study. CBPLNs revealed numerous meaningful connections between different biological processes and gene sets, which we were successful in interpreting within the context of liver metabolism. Links and local network features in CBPLNs are generalizations of diverse physiological phenomena such as regulation, feedback, and downstream signal flow from the gene/protein level to the scale of biological processes.
Our approach is a complement to a suite of methodologies that integrate physical, signaling, regulatory, and functional networks with measurements of molecular profiles such as transcriptional, proteomic, or metabolic data to compute the response network, which may be defined as the sub-network of interactions that are perturbed in a particular condition. A wide variety of methods have been developed for computing such response networks [63][64][65][66][67]. Response networks are typically interpreted by computing which biological processes are enriched in them. In contrast, rather than compute the entire response network, we focus on discovering connections between perturbed biological processes. Since response networks can include genes without any annotations, they can be used to predict biological processes to which unannotated genes belong [68]. In contrast, only genes annotated to some biological process can contribute to CBPLNs. A detailed comparison of CBPLNs to response networks and the development of methods that combine both approaches will be the focus of future research.
Generalizing our approach to the entire spectrum of MSigDB gene sets or to the set of all biological processes in the Gene Ontology raises several interesting challenges. First, gene sets can have considerable overlap, leading to redundant links. Second, scaling this approach up to thousands of gene sets may result in tens to hundreds of thousands of links that are deemed to be statistically significant. This deluge of links will be hard to interpret. Third, it will be challenging to computationally scale our permutation-based sampling to the large number of process pairs we will have to test. We are currently investigating these issues.
In this work, we computed CBPLNs for two conventional hepatocyte culture systems. Three dimensional liver mimics [69,70] and microscale co-culture systems [71] have shown improved retention of hepatic phenotype over conventional systems. In the future, we plan to apply CBPLNs to liver mimics and co-culture systems in order to obtain insights into the intercellular signaling mechanisms that confer improved hepatic phenotype. More generally, our approach may provide a novel route to explore, analyze, and interpret cellular responses to internal and external cues.

Measuring perturbation from gene expression data
We applied Linear Models for Microarray Data (LIMMA) [72] to the DNA microarray data to compute expression p-values indicating the differential expression of each gene for each of the four contrasts shown in Table 1.

Scoring a link between a pair of processes
We first present the approach developed by Dotan-Cohen et al. to identify linkages between biological processes [8]. Given an intracellular interaction network for an organism and Gene Ontology annotations for the genes in those networks, Dotan-Cohen et al. compute what they term a Biological Process Linkage Network (BPLN). Informally, given two biological processes, they defined the first process as being linked to the second process if genes annotated by the first process interact with a significant number of genes annotated by the second process. By definition, such links are directed. The resulting output of the algorithm by Dotan-Cohen et al. is, for each ordered pair of processes, the probability that the first process is linked to the second.
Formally, let F be the set of all biological processes. We seek to ask ''Given two processes a,b[F , is process a linked to process b?'' More specifically, of the genes that are neighbors of those annotated by a, are many more annotated with b than would be expected by chance? Let V be the set of all genes in an organism. Let V a (V be the set of genes annotated by process a[F , and let the universe U~S a[F V a , be the set of all genes annotated by at least one process in F . Let G(U,E) denote an undirected interaction graph where E is the set of undirected edges (u,v), each representing an interaction between genes u,v[U. We define the set N a as the set of genes v that meet the following criteria: 1. gene v neighbors at least one gene u annotated with a 2. gene v is not annotated with a.
In other words, Next, we define e N N ab~Na \V b , i.e., the set of genes that are neighbors of genes annotated with a, are not annotated with a themselves, and are annotated with process b. We define the link p-value p(a,b) as the probability that, if we selected a set X of DV b D genes uniformly at random from U, the set N a \X would contain D e N N ab D or more genes. We can compute this link p-value as the tail of a hypergeometric distribution: If this link p-value is significant at some cutoff a, we conclude that process a is linked to process b.
Extending the score to include transcriptional data and interaction weights With this background, we extend the formulation of BPLN to take transcriptional measurements and interaction weights into account. For each interaction (u,v) in the graph G(U,E), we use w uv w0 to denote its weight. The larger the weight of an interaction, the larger is our belief that u and v indeed interact functionally in the cell. We define a scoring function s(v) : V ?R z that maps genes to a non-negative real number representing their degree of perturbation in a given biological context (e.g., CS day 8 versus HM day 8). In this work, we compute s(v) as absolute value of the logarithm of the LIMMA p-value of the gene. Given processes a and b, we first define a score t(v,a) : U?R z .
The function t measures the contribution of the neighbors of v annotated with term a based on their perturbation. Ideally, if at least one neighbor of v that is annotated with a is highly perturbed, we desire that t(v,a) take a high value. On the other hand, if no such neighbor of v is highly perturbed, we desire that t(v,a) take a small value. Naturally, the weights of the interactions should also play a role in t(v,a). Accordingly, we define t(v,a)~maxfw uv s(u) : i.e., t(v,a) is the maximum weighted score of all neighbors of node v that are annotated with process a.
We define the contextual linkage score s(a,b) between processes a and b as the following: Figure 11 contains a toy example that illustrates these concepts. Thus, a node v annotated by a makes a large contribution to the contextual linkage score s (a,b) if v shows a high amount of perturbation in a particular context and if the neighbors of v annotated by a also show a high amount of perturbation. If we have many such nodes v, then s(a,b) itself will be large. Note that if we set s(v)~1 for all v[U and if all edges have weight 1, then s(a,b) is equal to the size of ab , identical to the score computed by the original BPLN algorithm.
In this formulation, some pairs of processes may have a high contextual linkage score even if all genes were perturbed by the same amount. To account for this possibility, we compute a normalized score s Ã (a,b)~s(a,b){s'(a,b), where s'(a,b) is a background score computed in the same manner as s(a,b), but, after setting the gene perturbation score s(u) equal to the average expression s~P u[U s(u)=DUD for all genes u in U. Thus, s'(a,b) represents the score for the link between processes a and b if all genes had the average expression score.

Assessing the statistical significance of links
Since the contextual linkage score is a weighted generalization of the statistic measured by Dotan-Cohen et al. it is unclear how to compute its statistical significance analytically. Therefore, we use two different approaches in order to assess the significance of the observed score s(a,b) empirically.
1. The first approach is an empirical version of the test performed by Dotan-Cohen et al. [8]: what is the probability that we would observe a score s(a,b) or more if we were to randomly select the nodes annotated with b? Specifically, we repeatedly select a set X of size DV b D uniformly at random without replacement from U and calculate s(a,b) for each of these random selections. After performing the step 10,000 times, we return the fraction of random scores that are larger than the observed value of s(a,b) as the link p-value p(a,b). 1. Two different processes may annotate some genes in common.
To preserve this property even in the random selections of the set X over different processes, we adopt the following approach: we construct a bipartite graph H in which a node is a gene or a biological process and an edge connects a gene to a biological process it is annotated with. We randomly permute the labels of the genes in this graph. To generate a random set X of size DV b D, we simply select the genes annotated with b in the bipartite graph with randomized gene labels. These steps create a randomized set of annotations that satisfy two properties: (a) every process annotates the same number of genes as in the original set of annotations, and (b) if k genes are annotated by each process in a set of processes P(F , then these processes co-annotate exactly k genes in the randomized dataset as well. 2. The second approach accounts for the role played by the interactions between the genes in V a and genes in V b . Therefore, we generate a graph G'(U,E') with the property that each node v[U has the same degree in G' and G. We measure the contextual linkage score between a and b with respect to G'. We generate G' 10,000 to build a null distribution for the contextual linkage score, and compute the link p-value p(a,b) as before. 2. To construct G', we follow the ''edge-swap'' approach [73]. We begin with the set of edges E'~E and modify the edges in E' with pairwise edge swaps. For each edge swap, we first select a pair of edges (u,v),(x,y)[E'. We then select, with equal probability, either (u,y),(x,v) or (u,x),(v,y) (i.e., the edges created by swapping the endpoints of the original pair of edges) as a candidate edge pair. If either candidate edge already exists in E' or creates a self-loop, we retain the original pair of edges in E', i.e., we do not perform the edge swap. Otherwise, we remove the original edges (u,v),(x,y) from E' and insert the new edges into E'. In total, we perform kDED edge-swap events to create a randomized graph G', where k is a user-defined parameter. In this work we used k~10.
We use the method of Benjamini and Hochberg [20] to correct for testing multiple hypotheses, while ensuring that the corrected link p-values are monotonic [74]. For either approach, if p(a,b)ƒ0:01, we say that term a is linked to b in the given biological context.

Supporting Information
File S1 File S1 is in tab-separated values format. It contains results of comparisons on the number of links identified to be significant under the two hypothesis tests, as well as under the original BPLN algorithm by Dotan-Cohen, et al. [8], which does not consider gene expression data. Six tables are given for different pairwise comparisons of hypothesis tests. In the table headers, ''gene set'' indicates testing the significance of a link when compared to a distribution of scores calculated from randomized annotations, ''network'' indicates testing the significance of a link when compared to a distribution of scores calculated from a randomized network, ''normalization'' indicates the scores were normalized by deducting the score calculated for averaged expression, and ''bpln'' indicates testing the significance using the original BPLN algorithm. Column headers of tables are defined as follows: ''day'' indicates the time point of the contrast; ''in both'' indicates the number of links found to be significant in the two compared hypothesis tests (e.g., gene set randomization and network randomization); ''first only'' indicates the number of links found significant under the first hypothesis test (e.g., gene set randomization); ''second only'' indicates the number of links found significant under the second hypothesis test (e.g., network randomization); ''neither'' indicates the number of links not found significant under either hypothesis test; ''intersection significance'' indicates the significance of the number of links found significant under both hypothesis tests versus what would be expected by chance, as assessed under Fisher's Exact Test. (TSV) File S2 File S2 is in tab-separated values format. It contains results of comparisons between the two different hypothesis tests, as well as the original BPLN algorithm by Dotan-Cohen, et al. [8]. Four sets of tables appear indicating the comparison of results at different cutoffs for considering a link to be significant. The header of each set indicates the cutoff used: 0.005, 0.01, 0.05, or 0.1. In each set of tables, the first set is the pairwise comparison under the two hypothesis testing methods of gene set randomization and network randomization, using normalization. The column headers for this table are defined as follows: ''Gene set randomization normalized'' indicates the number of links found to be significant under gene set randomization with normalization; ''Network randomization normalized'' indicates the number of links found to be significant under network randomization with normalization; ''Intersection'' indicates the number of links found significant under both forms of randomization; and ''Jaccard index'' indicates the ratio of the size of the intersection of the sets of links significant under the two tests to the size of their union. In the second table of each set, the results under the original BPLN algorithm are compared to those of the two hypothesis tests. The column headers for this table are defined as follows: ''BPLN'' indicates the number of links found significant under the original BLPN algorithm; ''Gene set randomization normalized'' and ''Network randomization normalized'' are identical to the first table; ''Intersection'' indicates the number of links found significant under the original BPLN algorithm and the respective hypothesis test (e.g., under gene set randomization); ''Jaccard index'' indicates the ratio of the size of the intersection of the sets of links found significant under BPLN and the respective hypothesis test to the size of the union. (TSV) File S3 File S3 contains scatter plots of link p-values for links found to be significant (p-value ƒ0:01) by least one of the hypothesis tests (based on gene set randomization or on network randomization) with normalization. Each plot corresponds to a single day. Each point on a plot corresponds to one pair of processes, with the x-coordinate being the p-value from gene set randomization and y-coordinate representing the p-value from network randomization. In each plot, both axes are on a logarithmic scale.