ENCODE Tiling Array Analysis Identifies Differentially Expressed Annotated and Novel 5′ Capped RNAs in Hepatitis C Infected Liver

Microarray studies of chronic hepatitis C infection have provided valuable information regarding the host response to viral infection. However, recent studies of the human transcriptome indicate pervasive transcription in previously unannotated regions of the genome and that many RNA transcripts have short or lack 3′ poly(A) ends. We hypothesized that using ENCODE tiling arrays (1% of the genome) in combination with affinity purifying Pol II RNAs by their unique 5′ m7GpppN cap would identify previously undescribed annotated and unannotated genes that are differentially expressed in liver during hepatitis C virus (HCV) infection. Both 5′-capped and poly(A)+ populations of RNA were analyzed using ENCODE tiling arrays. Sixty-four annotated genes were significantly increased in HCV cirrhotic as compared to control liver; twenty-seven (42%) of these genes were identified only by analyzing 5′ capped RNA. Thirty-one annotated genes were significantly decreased; sixteen (50%) of these were identified only by analyzing 5′ capped RNA. Bioinformatic analysis showed that capped RNA produced more consistent results, provided a more extensive expression profile of intronic regions and identified upregulated Pol II transcriptionally active regions in unannotated areas of the genome in HCV cirrhotic liver. Two of these regions were verified by PCR and RACE analysis. qPCR analysis of liver biopsy specimens demonstrated that these unannotated transcripts, as well as IRF1, TRIM22 and MET, were also upregulated in hepatitis C with mild inflammation and no fibrosis. The analysis of 5′ capped RNA in combination with ENCODE tiling arrays provides additional gene expression information and identifies novel upregulated Pol II transcripts not previously described in HCV infected liver. This approach, particularly when combined with new RNA sequencing technologies, should also be useful in further defining Pol II transcripts differentially regulated in specific disease states and in studying RNAs regulated by changes in pre-mRNA splicing or 3′ polyadenylation status.


Introduction
Microarray based gene analyses have provided a new approach to identifying disease-specific changes in gene expression. This approach has improved the understanding of molecular pathobiology by identifying genes that are differentially regulated in selected diseases and by identifying biomarkers for disease states or responses to therapy. Successful examples of this approach include identifying subtypes of diffuse large B-cell lymphomas with different prognoses and increased expression of the zeta-chain (TCR) associated 70 kDa protein kinase (ZAP70) in chronic lymphocytic leukemia (CLL) which is a predictor of the clinical course [1,2,3]. Examples in stratifying breast cancer include identifying a panel of transcriptional changes predicting distant metastasis in lymph-node-negative patients, predicting the response to specific therapies and survival of patients [4,5,6,7,8,9].
Progress towards stratifying patients with colon cancer using gene expression signatures was recently reported [10].
In hepatitis C infection, gene array analyses in chimpanzees identified host responsive gene pathways in acute and chronic hepatitis C infection [11]. Gene array studies have also been used to identify potential biomarkers of hepatitis C infection, interferon regulated genes which predict the response to therapy, and gene expression patterns which predict early progression to fibrosis in liver transplant recipients [12,13,14]. Unique gene expression profiles were also identified which distinguish alcoholic liver disease and hepatitis C as well as hepatitis B and hepatitis C infection [15,16]. More recent data suggests that the hepatitis C virus may regulate host non-coding RNAs (i.e., miRNAs) to promote viral replication [17].
One limit of standard oligonucleotide or cDNA microarrays is that they require prior knowledge of the sequence of the RNA that will be measured. In addition, many arrays are purposefully biased to interrogate sequences originating from 39 ends of mRNA encoded by the 39 terminal exon of previously annotated genes. Although these methods measure RNA transcripts from wellannotated protein coding genes and selected ncRNAs, they do not generally provide information regarding pre-mRNAs and most non-coding RNAs that can be important in physiological or pathological processes. Genomic tiling arrays or next generation sequencing provide a useful alternative for quantifying and characterizing transcription across the genome without requiring prior knowledge of gene or RNA transcript sequences. Tiling arrays are designed to cover the entire genome, selected chromosomes or contigs of the genome. Recent gene expression experiments using ENCODE (Encyclopedia of DNA Elements) tiling arrays, representing 1% of the human genome, have demonstrated that much of the genome is transcribed and that most nucleotides appear to be present in at least some form of RNA transcript [18]. A recent study in Schizosaccharomyces pombe provides further evidence for extensive sense and antisense transcription [19]. The nature of these RNAs, the RNA polymerase responsible for their transcription and their biological function remain unknown.
Previous ENCODE studies analyzed polyadenylated [poly(A)+] RNA, but recent data suggests that many RNAs in yeast and mammals either lack or have short 39 poly(A) ends and are underrepresented or absent in microarray analyses using poly(A)+ RNA [18,20]. To enhance the selection of Pol II transcripts we have developed an efficient method of purifying RNA polymerase II (Pol II) transcripts regardless of their 39 polyadenylation status. Pol II RNAs have a 59 m7GpppN cap added enzymatically to their 59 ends during the pausing phase of transcription [21,22]. Our approach purifies Pol II transcripts by binding their 59 caps with a high-affinity variant of the RNA cap binding protein (eIF4E K119A ) [20,23]. When compared to standard poly(A)+ dependent purifications the yield of 59 capped RNA is 2-3 fold greater from the same quantity of total RNA starting material, suggesting that poly(A)+ purification does not recover all capped Pol II RNAs [20].
To date, no gene expression studies on hepatitis C infected liver have been performed using tiling array analyses such as ENCODE. The goal of this study was to identify differentially expressed annotated genes and novel RNAs in hepatitis C infected liver that would not typically be recorded by analyzing poly(A)+ RNA with standard gene expression arrays. In this study, we utilized 59 capped and poly(A)+ RNA populations isolated from control and chronically infected hepatitis C (HCV) cirrhotic human liver biospecimens using ENCODE tiling arrays to measure differential expression of Pol II RNAs. Differentially expressed RNAs identified in this analysis were then measured by real-time PCR (qPCR) in additional control, mild chronic hepatitis C (no fibrosis) and chronically infected hepatitis C cirrhotic biospecimens.

Results
The ENCODE tiling array analysis of 59 capped RNA identified 47 annotated genes with increased expression (fold change .1.5, Bonferroni adjusted p value ,0.05) ( Figure 1A) and 22 genes with decreased expression in a chronic hepatitis C (HCV) cirrhotic as compared to an uninfected control liver specimen ( Figure 1B). Analysis of poly(A)+ RNA identified 37 genes with increased expression and 15 genes with decreased expression in HCV cirrhotic as compared to control liver. Twenty of the upregulated genes and six of the downregulated genes were identified in both 59 capped and poly(A)+ RNA populations ( Figure 1). Of note, 8 out of 17 upregulated genes (47%) and 2 out of 9 down-regulated genes (22%) unique to poly(A)+ RNA did have a statistically significant expression difference (p,0.05) in 59 capped RNA, but were excluded from our list because their fold change did not meet the inclusion criteria (,1.5). None of 27 upregulated or 17 downregulated genes unique to 59 capped RNA were found to have significant differential expression in the poly(A)+ RNA (see Tables S1, S2, S3, S4, S5, S6). This is likely due to less variation in signal intensity observed with 59 capped RNA among experimental replicates as compared to poly(A)+ RNA (the average SEM per gene was 28.1 for 59 capped; poly(A)+ average SEM per gene was 77.2).
Thirteen of the 64 genes (20%) found to be upregulated in HCV cirrhotic liver were identified by GO-term enrichment analysis to have biologic functions related to the immune response (Table 1) [24,25]. Several of these genes were selected for qPCR analysis of liver tissue from multiple patients, including interferon regulatory factor 1 (IRF1), a transcription factor involved in the interferon response to HCV infection in-vitro [26]. On ENCODE analysis, IRF1 was found to be upregulated in HCV cirrhotic liver in 59 capped RNA only (fold change 2.1, p = 0.03, Figure 2A). qPCR analysis of IRF1 in multiple patient samples confirmed the ENCODE findings of increased expression in HCV cirrhotic liver (n = 7) compared to control liver (n = 10) (fold change 2.4, p = 0.03, Figure 2B). Percutaneous liver biopsies (n = 7) from patients with chronic hepatitis C with mild inflammation and no fibrosis (Metavir grade 1, stage 0) were also analyzed and showed a significant increase in IRF1 expression (fold change 3.5, p = 0.001, Figure 2B). These patients were not being treated with recombinant interferon at the time of liver biospecimen acquisition.
The tripartite motif-containing 22 (TRIM22) mRNA, encoded by an interferon regulated gene previously reported to be upregulated with HIV and hepatitis B viral infections [27,28], was also found to have increased expression in our analysis of HCV cirrhotic liver. TRIM22 was increased on ENCODE analysis in both 59 capped (fold change 4.1, p,0.001) and poly(A)+ RNA (fold change 6.2, p = 0.001, Figure 3A); we also identified increased transcription in intronic regions within the TRIM22 gene. qPCR analysis of TRIM22 showed a large increased expression (fold change 9.7, p,0.001) in HCV cirrhotic liver ( Figure 3B). Liver biopsies from patients with mild chronic hepatitis C and no fibrosis showed a marked increase in TRIM22 expression (fold change 17.0, p,0.001, Figure 3B).
Connective tissue growth factor (CTGF) mRNA is upregulated during liver fibrosis and its protein product has recently been suggested as a non-invasive biomarker of liver fibrosis in patients infected with hepatitis C [29]. CTGF mRNA was upregulated in both 59 capped (fold change 5.5, p,0.001) and poly(A)+ RNA (fold change 13, p,0.001, Figure S1A). qPCR analysis of liver tissue from multiple control and HCV cirrhotic liver specimens showed a large mean increase in CTGF expression (fold change 17, p = 0.004) ( Figure S1B). Interestingly, qPCR analysis of liver biopsies with mild chronic hepatitis C with no fibrosis showed no increase in CTGF expression (fold change 25.7, p = 0.03, Figure  S1B).
Several other differentially expressed genes identified in the ENCODE tiling array analysis were selected for qPCR analysis using additional patient samples. The Met proto-oncogene (MET) is the hepatocyte growth factor receptor and has been implicated in the development of multiple tumor types. ENCODE array analysis of 59 capped RNA showed decreased expression of MET in hepatitis C cirrhotic liver (fold change 21.8, p = 0.015, Figure  S2A). However, qPCR analysis of multiple patients did not demonstrate a significant difference in expression in HCV cirrhotic liver ( Figure S2B). However, qPCR analysis of mild chronic hepatitis C biopsies with no fibrosis showed a significant increase in MET expression (fold change 3.7, p = 0.002, Figure  S2B).
Cathepsin D (CTSD), a lysosomal aspartyl protease associated with several disease processes, was significantly upregulated in the ENCODE array analysis of 59 capped RNA from hepatitis C cirrhotic liver (fold increase 1.9, p,0.001). However, when multiple patient samples were analyzed by qPCR no significant difference was identified (data not shown). The FUN14 domain containing 2 (FUNDC2) mRNA, encoding a protein suggested to interact with hepatitis C core protein, was downregulated in 59 capped RNA from hepatitis C cirrhotic liver (fold change 22.1, p,0.001) [30]. However, when specimens of liver from additional patients with hepatitis C cirrhosis were analyzed by RT-PCR no significant difference was seen (data not shown). Nevertheless, RT-PCR of cDNA prepared from the same liver specimens (HCV cirrhotic 1 and Control 1) used in the ENCODE tiling array analysis did verify that MET, CTSD and FUNDC2 were both significantly different in those specimens (data shown for MET only).
Differential gene expression associated with HCV cirrhotic liver was also observed in non-protein coding genomic regions including intronic and intergenic regions. Figure 4 illustrates the differential gene expression observed in exonic, intronic, and intergenic regions in 59 capped and poly(A)+ RNA. Fifty percent of the upregulated and 78% of the downregulated nucleotides were found in intronic regions using 59 capped RNA, as compared to 23% of the upregulated and 49% of the  Figure S4A, coordinates 39,253,256-39,413,500). All three of these genomic regions demonstrated RNA transcription at least 10 kb from annotated genes. Although these regions were found to be transcribed in previous ENCODE tiling array analysis of HL60, HeLa, and Gm06990 human cell lines ( Figure S5, http://genome.ucsc.edu/ENCODE/pilot.html), there was no prior evidence that they represented 59 capped Pol II RNAs. Further investigation using the UCSC genome browser (http:// genome.ucsc.edu) identified several ESTs (spliced and unspliced) originating from the transcribed region identified on chromosome 9 ( Figure S6A). Not surprisingly, the 160kb region on chromosome 21 included several ESTs (BE870595, BG459638, BG460250, BI011795) and overlapped with one hypothetical protein (AJ011409, unpublished). It also included previously identified 59 Rapid Amplification of cDNA Ends (RACE) products [31]. The 59 capped RNA transcript(s) originating from the unannotated region of chromosome 14 did not overlap with any known genes in the UCSC database but was associated with a SNP (rs2884435) and several unspliced ESTs ( Figure S6B). qPCR analysis of the 59 capped Pol II RNA(s) originating from the unannotated region on chromosome 14 confirmed the presence of a RNA transcript that was increased 7.8 fold in seven HCV cirrhotic as compared to ten control liver specimens (p,0.001) ( Figure 5B). Moreover, qPCR analysis of mild chronic hepatitis C biopsies with no fibrosis showed a similar increase in expression of the transcript (fold change 11.3, p = 0.003, Figure 5B).
The 59 capped RNA originating from the unannotated region on chromosome 9 was also confirmed to have a 4.5 fold increase by qPCR analysis of seven HCV cirrhotic as compared to ten control liver specimens (p,0.001) ( Figure 6B). qPCR analysis of mild chronic hepatitis C with no fibrosis showed a similar increase in expression of the transcript (fold change 4.1, p = 0.002, Figure 6B). Pol II RNA transcripts were isolated from HCV cirrhotic and control liver via two methods. 59 capped RNA was isolated using a high affinity variant eIF4E protein, poly(A)+ RNA was isolated with oligo-dT (Qiagen) and cDNA was synthesized with random primers (Methods). RNA transcript expression was measured by averaging fluorescent signal intensity on Agilent's ENCODE gene arrays for each sample (Methods). Differences in gene expression were visualized using Genoviz's Integrated Genome Browser software. Only annotated genes with $1.5 fold and Bonferroni corrected p values ,0.05 between HCV cirrhotic and control liver are listed with details of their expression. Genes are listed by the specific pool of RNA analyzed. No genes with decreased expression in HCV cirrhotic liver were found in the poly(A)+ RNA. Genes which have been previously documented to have increased expression in HCV cirrhotic liver are marked with *. Only genes associated with the immune response are listed above. A complete list of differentially expressed genes is provided in supplemental materials. doi:10.1371/journal.pone.0014697.t001 The region on chromosome 21 did not differ significantly among multiple patients with HCV cirrhosis using qPCR.
To further define the structure of differentially expressed RNA(s) originating from chromosome 9 and 14, we used 59 and 39 rapid amplification of cDNA ends (RACE) and DNA sequencing over one region on chromosome 9 and three separate regions of chromosome 14 (Chr14a, Chr14b, and Chr14c) where differential expression was observed by ENCODE analysis. Using 5' RACE we identified the 59 end of an unannotated RNA transcript found on chromosome 9 ( Figure S6A). The 59 end of this transcript was at a similar chromosome coordinate as other previously identified ESTs. Using 39 RACE we identified the 39 ends of independent RNA transcripts on the minus strand of Chr14a and Chr14b regions, respectively ( Figure S6B). We also sequenced approximately 600-800 nucleotides including the poly(A)+ end of each transcript. The existence of at least a 1.50 kb transcript in the chromosome 14c region was confirmed by standard PCR product sequencing. To compare relative expression of RNA transcripts on the Chr14a and Chr14c regions we performed qPCR analysis. Interestingly, gene expression was similar in the HCV cirrhotic and control human liver samples but more than 10-fold higher in the Chr14a region compared to the Chr14c region in Huh7.5 cells, a human hepatoma cell line (data not shown). . Differential expression of IRF1 in HCV cirrhotic as compared to control liver. Panel A. Expression of IRF1 as measured by signal intensity on ENCODE tiling arrays is displayed using the Integrated Genome Browser (IGB). The y-axis represents the log2 transformation of the normalized signal divided by the background signal on arrays and each bar represents the normalized signal intensity of probes hybridized to 60mer targets tiled across the gene region (Methods). Genomic regions that are tiled yet lack a signal are indicated by a baseline tick mark. Absence of a tick mark indicates no probes in that region. Gene structure, orientation, and chromosomal location are shown in black. Average RNA expression based on the analysis of 59 capped RNA isolated from HCV cirrhotic and control liver is depicted in green for both HCV cirrhotic and control liver. Average RNA expression using poly(A)+ RNA is depicted in blue. Differences in expression between HCV cirrhotic and control liver for poly(A)+ and 59 capped RNA populations are depicted as window level false discovery rates in red (-10Log 10 FDR) (see Methods). A transformed FDR of $13 (represents an untransformed FDR #0.05 or 5 false positives out of 100) was considered statistically significant. Panel B. qPCR was performed as described in Methods. Triplicate samples from seven HCV cirrhotic, seven mild HCV (no fibrosis), and ten control livers were analyzed. HCV Cirrhotic 1 and Control 1 refer to the original samples used for the ENCODE tiling array analysis. The mean 6 SEM fold change for each specimen analyzed is shown and the location of the qPCR primers is indicated by qPCR in Panel A. P-values were calculated using the Student's t-test. doi:10.1371/journal.pone.0014697.g002

Discussion
The quantitative analysis of RNA transcripts in control and diseased tissue to define differential gene expression and aid biomarker discovery has generally used cDNA derived from total or poly(A)+ RNA hybridized to oligonucleotide microarrays. Recent studies of RNA transcription in human cells using ENCODE tiling arrays and poly(A)+ RNA have surprisingly described extensive transcription in previously unannotated genomic regions [18,31,32]. A consistent observation in these studies has been unexpectedly high transcription in unannotated and intronic regions of the genome. For example, a highresolution strand-specific analysis of the entire transcriptome of S. pombe showed extensive transcription including intergenic and antisense transcription [19]. In addition, studies suggest that the number of Pol II transcripts with absent or short poly(A)+ ends may be markedly underestimated (ref 18,20). Based on these findings it seems likely that standard microarray analyses are underestimating disease specific gene expression changes. In this study we tested this possibility by analyzing gene expression changes in hepatitis C (HCV) cirrhotic as compared to control liver in both 59 capped and poly(A)+ populations of RNA using ENCODE tiling arrays representing 1% of the genome.
Differential gene expression between HCV cirrhotic and control liver, in most cases, was observed with a greater sensitivity and additional detail when 59 capped RNA was analyzed as compared to poly(A)+ RNA (Table 1). Although many of the differentially expressed transcripts identified using poly(A)+ RNA had higher fold changes, the variability between experimental replicates was greater resulting in failed statistical testing. One possible explanation for the increased consistency in analyzing 59 capped as compared to poly(A)+ RNA could be variability of the poly(A)+ ends. Short poly(A)+ ends have been demonstrated for a number of well-annotated genes in human liver, such as eNOS mRNA in vascular endothelial cells, and is a well-documented mechanism for regulating gene expression in developmental models [20,33,34]. Moreover, examples of mRNAs with a poly(A)-limiting element in their 39 end, such as the Xenopus albumin mRNA produced by liver, have poly(A)+ ends of ,20 nts yet are efficiently translated [35].
Additional differences observed in the ENCODE tiling array analysis of 59 capped and poly(A)+ RNA populations included differences in signal intensity across specific regions of each gene transcript. In most cases the signal intensities at the 39 end of transcripts were greater when poly(A)+ RNA was analyzed as compared to 59 capped RNA. In some cases the signal on tiling arrays was greater in the 59 end of annotated genes when 59 capped RNA was analyzed as compared to poly(A)+ RNA. With some gene transcripts, such as the hepatocyte growth factor receptor (MET), tissue inhibitor of metalloproteinase 3 (TIMP3), MyoD family inhibitor domain containing (MDFIC) and mitogenactivated kinase kinase kinase 1 (MAP3K1), higher levels of transcription was observed in intronic regions when 59 capped RNA was analyzed as compared to poly(A)+ RNA. This is consistent with the 59 capped RNA purification that includes unspliced heterogeneous nuclear RNA (hnRNA) as well as mature spliced RNA, while the poly(A) dependent purification predominantly represents mature spliced RNA.
We identified many immune response genes that were significantly increased in HCV cirrhotic as compared to control liver, as other studies have, and found upregulated genes that were not identified previously by gene array analysis. These included IRF1, tripartite motif-containing 22 (TRIM22), and multiple leukocyte immunoglobulin-like receptors (LILRA1, LILRA4, LILRA5, LILRB2, LILRB3 and LILRB4) [14,16,36]. IRF1 transcription was significantly increased in HCV cirrhotic as compared to control liver when 59 capped RNA was analyzed, but not when poly(A)+ RNA was analyzed. IRF1 is a critical transcriptional regulatory factor that modulates interferon stimulated gene (ISG) expression and has been shown to regulate HCV subgenomic replicon activity in cultured hepatoma cells [37,38]. Interestingly, polymorphisms in the IRF1 promoter have been reported to be associated with a better response to interferon alpha (IFN-a) therapy in patients with chronic hepatitis C [39]. Tripartite motif (TRIM) 22 was another immune response related gene that was significantly increased in HCV cirrhotic as compared to control liver in our analysis. The tripartite motif family of proteins has been associated with innate immunity to viruses by restricting viral replication [40]. TRIM22 is dramatically upregulated by interferon signaling and decreases HIV replication (Barr et al., 2008). In addition, it has recently been shown to suppress HBV replication in culture [28]. Genome wide expression array analysis of both chimpanzee and human liver tissue have provided evidence for increased expression of TRIM22 in hepatitis C infected liver, however this is the first report to confirm its increased expression in HCV infected human liver (both cirrhotic and non-fibrotic) using real time PCR [11,12,41]. The qPCR analysis of biopsies showing mild hepatitis C without fibrosis provide evidence that the upregulation of IRF1, TRIM22, and MET are authentically due to HCV infection, and not due to major changes in liver tissue cell type.
Further evidence of alterations in immune response function includes the upregulation of multiple leukocyte immunoglobulinlike receptors. These receptors, also known as immunoglobulinlike transcripts (ILTs), are expressed on myelomonocytic cells and can influence both the innate and acquired immune response [42]. LILRB2, the most highly upregulated inhibitory ILT in our study, is also upregulated in HIV patients and may impair the antigen presentation of monocytes [43]. Increased expression of another inhibitory ILT, LILRB4 or ILT3, also impairs antigen presentation and T cell recruitment as well as modulates the expression of proinflammatory cytokines [44]. Together, these transcriptional  Figure 2 except that each tick mark represents the normalized mean signal intensity of probes within a 200 nt window. Panel B. qPCR was performed as described in Methods. Triplicate samples from seven HCV cirrhotic, seven mild HCV (no fibrosis) and ten control livers were analyzed. HCV cirrhotic 1 and Control 1 refer to original samples used for the ENCODE tiling array analysis. The mean 6 SEM fold change for all specimens analyzed is shown and the location of the qPCR primers is indicated by qPCR in Panel A. doi:10.1371/journal.pone.0014697.g005 changes observed in immune response genes are consistent with viral infection. It should be noted that a number of interferoninducible genes previously reported to be upregulated with HCV infection were not interrogated in our ENCODE tiling array analysis (representing 1% of the genome) included STAT1, IRF9, IFI27, CXCL9, CXCL19 and CXCL11 [41,45,46].
A number of gene expression studies of HCV infected liver biospecimens, done with annotated gene arrays and qPCR, have been reported. They include studies that identified potential molecular makers for HCV-associated liver disease and transcript profiles predictive of both early stage fibrosis and end stage HCV induced liver disease (cirrhosis) [14,47]. These studies provided evidence that during late stages of HCV induced liver disease many of the changes in gene expression observed between HCVinfected liver as compared to control liver biospecimens are due to liver fibrosis and not HCV infection. Gene transcripts found to be upregulated in both early and late stage disease include many interferon-stimulated genes like STAT1, IRF9, IFI27 and CXCL10 while transcripts involved in growth factor signaling and tissue remodeling like CTGF, MMP7 and IL8 are more commonly upregulated during fibrosis [47,48,49]. In our tiling arrays studies, 1% of the genome was interrogated at a high degree of resolution that included approximately 400 protein coding genes. Therefore a direct comparison of gene transcripts interrogated in our dataset compared to whole annotated gene datasets is limited. Nevertheless, our gene expression results are consistent with the activation of interferon signaling pathways in both mild hepatitis C (without fibrosis) and HCV cirrhotic biospecimens as observed by the significant upregulation of both liver IRF1 and TRIM22 mRNA transcripts. In addition, CTGF, considered a biomarker of fibrosis and cirrhosis, was only upregulated in HCV cirrhotic biospecimens but not in mild hepatitis C without fibrosis similar to previous reports. With the goal of improving the selection of patients for current hepatitis C therapies, liver gene expression signatures predictive of response to therapy have been identified [13,45,50]. These gene signatures have included interferon Figure 6. Differential expression of a Pol II RNA from an unannotated region of Chromosome 9. Panel A. Expression of 59 capped and poly(A)+ RNAs as measured by signal intensity on ENCODE tiling arrays are displayed using IGB. The data is displayed as in Figure 5. Panel B. qPCR was performed as described in Methods. Triplicate samples from seven HCV cirrhotic, seven mild HCV (no fibrosis) and ten control livers were analyzed. HCV cirrhotic 1 and Control 1 refer to original samples used for the ENCODE tiling array analysis. The mean 6 SEM fold change for all specimens analyzed is shown and the location of the qPCR primers is indicated by qPCR in Panel A. doi:10.1371/journal.pone.0014697.g006 stimulated transcripts including IFI5, IFI6 and IFNAR1 that are differentially expressed between responders and non-responders. Recently IL28B genotype has been found to be a strong predictor of response to therapy and spontaneous clearance of HCV [51,52,53,54,55]. A recent report indicates that low interferon induced gene expression in liver biopsies is a stronger predictor of treatment response than IL28B genotype in patients with hepatitis C [56]. The approach we describe may be useful in developing future panels of gene expression biomarkers for patients with chronic hepatitis C or other liver disorders that predict clinical outcomes such as the risk for hepatocellular carcinoma.
We also identified novel unannotated regions showing increased transcription in both HCV cirrhotic and mild hepatitis C as compared to control liver including multiple RNA transcripts on chromosome 14 and an unannotated RNA transcript on chromosome 9. It is unknown at this time whether these transcripts represent protein coding RNAs or non-coding regulatory RNAs. The use of a sequence analysis tool (BESTORF, www.softberry.com) to predict potential coding fragments in these unannotated sequences, did not identify any open reading frames greater than 40 amino acids. This finding suggests that these transcripts may represent non-coding RNAs. In addition to the presence of open reading frames (ORF), both expression level and cross-species sequence homology are important indicators of the protein coding or non-coding nature of RNA transcripts [57]. Based on the low expression level of the novel transcript found in the chromosome 14c region, it's low sequence homology to other species, and our inability to identify a poly(A)+ tail in this transcript, it may indeed represent a non-coding RNA. Although the other two RNA transcripts on chromosome 14 (14a and 14b) and one on chromosome 9 were expressed at a higher level, their low cross species homology suggests they are also non-coding RNAs. These unannotated expressed regions were evaluated for possible miRNAs using the informatics program MIREVAL [58]. Although this analysis identified multiple possible miRNA precursor hairpin structures in the chromosome 14 transcript, none of these regions encoded known miRNAs in the miRBase database [59]. Our evidence that these transcriptionally active regions represent novel Pol II transcripts is supported by transcription at these same chromosomal coordinates in published ENCODE analysis of poly(A)+ RNA from human cell lines [18]. These uncharacterized Pol II transcripts may represent genes that are part of the host antiviral defense or those required for specific steps in replication or release of mature virions. We are currently further defining these RNAs and testing their biological role in an HCV cell culture system.
We conclude that analyzing 59 capped RNAs in eukaryotic cells and biospecimens aids the definition of the complete Pol II transcriptome and increases the sensitivity of identifying differentially expressed Pol II genes in physiological and pathophysiological states. This approach should also prove useful in identifying subsets of mRNAs that are regulated post-transcriptionally by changes in pre-mRNA splicing or 39 polyadenylation states [33,60,61,62]. Analyzing differentially selected RNAs, such as 59 capped and poly(A)+ RNA, with next generation RNA sequencing technologies should be very useful in fully defining the Pol II transcriptome and identifying previously undefined Pol II transcripts that are differentially expressed in disease states. In addition, such novel transcripts may prove useful as biomarkers and may provide insight into the role of ncRNAs in the development and progression of specific diseases.

RNA isolation
Total RNA was prepared using TRIzol (Invitrogen). A single sample from one hepatitis C cirrhotic and one control liver specimen were used for the initial ENCODE tiling array analysis. A total of seven hepatitis C cirrhotic liver explants, ten control liver specimens and seven percutaneous biopsies showing mild hepatitis C with no fibrosis were used for qPCR analysis. RNA species with 59 m 7 GpppN caps were purified using a recombinant GST fusion high-affinity variant of eIF4E (eIF4E K119A ) which binds m 7 Gpp with a tenfold higher affinity as compared to wild-type eIF4E [23,63]. The 59 capped RNAs were purified as previously described using GST-eIF4E K119A recombinant protein bound to glutathione-agarose beads in microfuge tubes by batch purification [20,64]. The efficiency of this purification is 70% as compared to 30% when wild-type eIF-4E is used in such purifications (manuscript in preparation). The quantity of 59 capped RNA was measured by NanoDrop analysis and its integrity confirmed with an Agilent 2100 Bioanalyzer. Total RNA from the same liver explants was used as a starting material to purify 59 capped and poly(A)+ RNA. The poly(A)+ RNA was purified with oligo(dT) beads (OligotexH, Qiagen) as previously described [20]. The quantity and integrity of poly(A)+ RNA was determined by the same measures used for 59 capped RNA.

Human liver pathological specimens
Liver explant pathological specimens from patients undergoing liver transplantation for chronic hepatitis C with cirrhosis (n = 7) and unused donor (control) liver tissue (n = 10) were collected with IRB approval. HCV cirrhotic liver samples were from both female and male patients as were the donor liver specimens. Percutaneous liver biopsy specimens from patients with chronic hepatitis C found to have mild inflammation and no fibrosis (mild HCV; Metavir grade 1, stage 0) (n = 7) were obtained with IRB approval. All HCV liver samples were flash frozen in liquid nitrogen generally within 5 to 10 minutes after biopsy or organ removal.

ENCODE tiling arrays
Four experimental replicate samples of 59 capped RNA isolated from hepatitis C cirrhotic liver and control liver were used to produce cDNA using a commercially available kit (Just cDNA Double-Stranded cDNA Synthesis Kit, Stratagene, La Jolla, CA). Random hexamers were used to prime first strand cDNA synthesis from both 59 capped and poly(A)+ RNA samples. The cDNA product was labeled using an Agilent Genomic DNA Labeling Kit PLUS by incorporating fluorescently labeled nucleotides (Cy3-dUTP or Cy5-dUTP) using the exo-Klenow fragment. Following termination of the labeling reaction, fluorescently labeled cDNA probes were purified by isopropanol precipitation. Precipitated pellets were dried and then rehydrated in distilled water. The concentration of purified oligonucleotides was determined using a NanoDrop ND-1000 spectrophotometer. A fraction of the labeled DNA (100 ng) was assayed with an Agilent BioAnalyzer to validate the size distribution of the labeled cDNA probes.
Fluorescently labeled cDNA probes were heat denatured after being combined with cot-1 DNA, Agilent aCGH blocking agent and Agilent Hi-RPM hybridization solution. Microarray hybridizations were performed using Agilent SureHyb chambers incubated at 65uC for 40 hours with a rotational speed of 20 rpm. Following incubation, the microarray slide was washed for 5 minutes in aCGH/ChIP-on-chip Wash Buffer 1 (0.5X SSPE, 0.005% N-lauroylsarcosine; room temperature) and 5 minutes in a CGH/ChIP-on-chip Wash Buffer 2 (0.1X SSPE, 0.005% Nlauroylsarcosine; 31uC). Microarray slides (Agilent ID# 014792) were briefly dipped in a solution of acetonitrile and dried. Two microarray slides each were used for labeled cDNA generated from 59 capped and poly(A)+ RNA from two experimental replicate samples (HCV cirrhotic and control liver). These slides were then stripped and used again with the other two experimental replicate samples.
Microarray slides were scanned in an Agilent Technologies G2505B Microarray Scanner at 5 mm resolution for the simultaneous detection of Cy-3 and Cy-5 signals. Data captured from the scanned microarray image was saved as a TIFF image file and loaded into Agilent Feature Extraction Software version 10.1.1.1. The software automatically positions a grid and finds the centroid positions of each feature on the microarray. This information was used to perform calculations that include feature intensities, background measurements and statistical analyses. Data generated by the software was recorded as tab-delimited text files which were processed using the TiMAT2 open-source software package (http://timat2.sourceforge.net) and results were visualized and our graphics produced using the Integrated Genome Browser (IGB) [65].
Bioinformatic Analysis Data quality. The quality of data was assessed by calculating all pair Pearson correlation coefficients for each set of biological replicas using the unadjusted raw median intensity from the Agilent scan files (R ' 2 * 100: 59  Whole Genome Static Maps. Scaled static maps were made for comparison in IGB for each of the four datasets (59 capped control, 59 capped hepatitis C cirrhotic, poly(A)+ control, and poly(A)+ hepatitis C cirrhotic liver) using the TiMAT2 analysis package. Agilent's ENCODE probe sequences were remapped to the NCBI 36.1 human genome build. For each sample, the four biological replica raw median probe intensities were quantile normalized and median scaled to 50 [66]. For each probe, an average was calculated for the replicas, divided by 50, and log2 transformed.
Whole Genome Dynamic Difference Maps. To identify regions of change between the different RNA samples (59 capped HCV cirrhotic compared to 59 capped control and poly(A)+ HCV cirrhotic compared to poly(A)+ control liver), a sliding window approach was taken to minimize noise using the TiMAT2 package. The four test (HCV cirrhotic) and four control remapped raw median probe intensities were quantile-normalized and median scaled to 50. Probe level summaries were calculated by taking the log2 ratio between the mean treatment and mean control. Window level summaries were calculated by identifying 260 bp windows that contain 2 or more probes. These windows were scored by first calculating all the relative difference pairs between the treatment and the control replica probes, and second, by calculating the pseudo median of these relative difference pairs. For display purposes, the pseudo median relative difference scores were converted to log2 (ratios). To estimate window level FDRs, a null distribution of random label permutation pseudo median scored windows was created. The FDR estimation associated each real pseudo median window score was calculated by dividing the number of null distribution windows that met or exceeded the score (false positives) by the number of real windows that exceeded the score (true positives and false positives). For display purposes, the FDRs were -10Log10(FDR) transformed. This FDR estimation was used to score both enriched and reduced windows. Regions enriched (or reduced) in the HCV cirrhotic compared to control liver were created by joining overlapping and adjacent windows (max gap 200 bp) with an FDR of $13 (-10Log10(0.05)). The majority of such regions correspond with known annotation. To identify potentially novel transcribed regions, the window arrays were first filtered against those that intersected any known exonic sequence, using a known gene set created by combining UCSC's Known Genes and the Ensembl gene database [67,68].
Gene-Centric Analysis. A near identical approach was used to identify genes differentially transcribed in the HCV cirrhotic vs. control datasets. Instead of a sliding window, normalized probe intensities falling within the exons of each gene model were compared using the pseudo median and random label statistics. Those genes with an FDR $13 and a log2 ratio .0.65 were considered differentially transcribed.
Bayesian Analysis. Differential gene expression for wellannotated genes was determined by the regularized t-test, which uses a Bayesian procedure [69]. Briefly, the expression level of each gene was assumed to be from a normal distribution with m and s 2 . Using a conjugate prior, the mean of the posterior (MP) estimate of m is the sample mean. The MP estimate of s 2 is , where n is the sample size, s 2 is the sample variance, n 0 is the degrees of freedom of the prior (a value of 10 is used in the analysis), s 2 0 is the mean of sample variances of genes in the neighborhood of the gene under consideration. The neighborhood was the 50 genes with sample means immediately above and below the sample mean of the gene under consideration; that is, the neighborhood consists of the 101 genes centered on the gene. After the MP estimates of m and s 2 are obtained, the t-test of unequal variances was used to calculate a Pvalue of differential expression.
Real-time PCR (qPCR). Ten control, seven hepatitis C cirrhotic liver and seven mild HCV (no fibrosis) specimens were used for further analysis of differential gene expression in hepatitis C infected liver tissue. cDNA was prepared from these pathological specimens and assayed for transcript levels of selected genes to determine if the same changes in gene expression observed in the ENCODE array analysis of one control and disease sample were observed in multiple patients with hepatitis C as compared to controls. Total RNA from control, hepatitis C cirrhotic and mild HCV liver was extracted using TRIzol followed by an RNA cleanup procedure using a RNeasy Mini kit (Qiagen, Valencia, CA). RNA was treated with DNase I (Invitrogen, Carlsbad, CA) to remove genomic DNA. First-strand cDNA was synthesized using Moloney Murine Leukemia Virus reverse transcriptase (SuperScript III; Invitrogen) with 20 ng/ml of RNA at 55uC (60 min) with random hexamers or oligo(dT) primers. Each PCR reaction was carried out in a 96-well optical plate (Roche Applied Science) in a 20 ml reaction buffer containing LightCycler 480 Probes Master Mix (100 mM Tris-HCl, 100 mM KCl, 400 mM of each dNTP (with dUTP instead of dTTP), 64 mM MgCl 2 , FastStart Taq DNA Polymerase, 0.3 mM of each primer, 0.1 mM hydrolysis probe and approximately 50 ng of cDNA (done in triplicate). Triplicate incubations without template were used as negative controls. Thermal cycling was done in a Roche LightCycler 480 System (Roche Applied Science). The qPCR thermo cycling was 95uC for 5 min, 45 cycles at 95uC for 10 sec, 59uC for 30 sec and 72uC for 1 sec. The relative quantity of each RNA transcript was calculated with the comparative Ct (cycling threshold) method using the formula 2 DCt . DCt represents the difference between target gene expression in control samples and target gene expression in HCV samples. A reference gene (bactin, ACTB) was used as the control and statistical significance was evaluated using the Student's T-test. RACE Analysis. To define RNA transcript(s) structure from the differentially expressed unannotated regions on chromosomes 9 and 14, 59 and 39 rapid amplification of cDNA ends (RACE) were performed using cDNA made from total RNA and oligo (dT) primer or 59 capped RNA and random primers. We designed gene specific primers on the minus and plus strand of one region on chromosome 9 and three different regions on chromosome 14 (Chr14a, Chr14b, and Chr14c) that were found to be differentially expressed in HCV cirrhotic liver by ENCODE tiling array analysis. To verify the chromosome location of each RACE PCR product, each PCR product was gel purified and cloned using a TA cloning vector. Cloned products were then sequenced with an Applied Biosystems 3130xl Genetic Analyzer. Figure S1 Differential expression of CTGF in hepatitis C (HCV) cirrhotic as compared to control liver. Panel A. Expression of Connective tissue growth factor (CTGF) as measured by signal intensity on ENCODE tiling arrays is displayed using IGB. The data are displayed as in Figure 2. Panel B. Real-time PCR (qPCR) was performed as described in Methods. Triplicate samples from seven HCV cirrhotic, seven mild HCV (no fibrosis) and ten control livers were analyzed. HCV cirrhotic 1 and Control 1 refer to original samples used for the ENCODE tiling array analysis. The mean + SEM fold change for all specimens analyzed is shown. Found at: doi:10.1371/journal.pone.0014697.s001 (0.44 MB TIF) Figure S2 Differential expression of MET in hepatitis C (HCV) cirrhotic as compared to control liver. Panel A. MET (mesenchymal-epithelial transition factor) is a proto-oncogene that encodes the tyrosine kinase MET and is also known as c-Met or hepatocyte growth factor receptor (HGFR). Expression of MET as measured by signal intensity on ENCODE tiling arrays is displayed using IGB. The data are displayed as in Figure 2. FDRs are depicted as negative because this gene showed less expression in hepatitis C cirrhotic as compared to control liver. Panel B. qPCR was performed as described in Methods. Triplicate samples from seven HCV cirrhotic, six mild HCV (no fibrosis), and ten control livers were analyzed. HCV cirrhotic 1 and Control 1 refer to original samples used for the ENCODE tiling array analysis. The mean + SEM fold change for all specimens analyzed is shown. Note that due to limited quantities of cDNA from mild HCV percutaneous liver biopsy specimens, duplicates were performed for four biospecimens and triplicates for two (note SEM bars for assays done in triplicate). Expression of the unannotated region identified by signal intensity on ENCODE tiling arrays is displayed using IGB. The data are displayed as in Figure 5. Panel B. Liver specimens from seven HCV cirrhotic and seven control livers were analyzed by qPCR in triplicate. HCV cirrhotic 1 and Control 1 refer to original samples used for the ENCODE tiling array analysis. The mean + SEM fold change for all specimens analyzed are shown for the qPCR1 primer set. Results for the second primer set (qPCR2) also did not show a significant difference between HCV cirrhotic and control specimens (not shown). Found at: doi:10.1371/journal.pone.0014697.s004 (1.49 MB TIF) Figure S5 Differentially expressed unannotated genomic regions in HCV cirrhotic liver compared with ENCODE data from human cell lines. The data from high density tiling array analysis of GM06690 cells (nontumorigenic B lymphocytes), HeLa cells, and HL60 (human promyelocytic leukemia, predominantly neutrophilic promyelocyte precursors) cells was loaded into IGB and aligned with the ENCODE tiling array data that we obtained in this study. The signal intensity on the ENCODE tiling arrays are displayed in IGB as in Figure 5. The aligned data provide evidence that the changes in RNA signals observed in HCV cirrhotic liver as compared to control liver in the unannotated region of chromosome 14, 9, and 21 were also observed in the ENCODE array analysis of GM06690, HL60, and HeLa cells (http://genome.ucsc.edu/ENCODE/pilot.html). The strongest signals were observed in the GM06690 cells suggesting that at least some of the signal in this region observed in HCV cirrhotic liver was due to lymphoid cells that home to and infiltrate the liver during chronic hepatitis C. Table S1 Upregulated genes in HCV cirrhotic liver, identified only by analyzing 59 capped RNA. RNA transcripts were isolated from hepatitis C infected and control liver as described in Methods. cDNA was prepared using random hexamers and probes prepared as described in Methods. RNA transcript expression was measured by averaging fluorescent signal intensity on Agilent ENCODE arrays for each sample. Only annotated genes with .1.5 fold differences and Bonferoni corrected p-values ,0.05 between hepatitis C infected and control liver are listed. Differentially expressed genes are categorized by function. Mean signal intensity, fold change, and p-values for each gene as determined by analyzing poly(A)+ RNA is included for comparison. Genes that have been previously documented to have increased expression in HCV infected liver are highlighted marked with*. Found at: doi:10.1371/journal.pone.0014697.s007 (0.04 MB DOCX)