Genome-Wide Transcriptional Profiling Reveals MicroRNA-Correlated Genes and Biological Processes in Human Lymphoblastoid Cell Lines

Background Expression level of many genes shows abundant natural variation in human populations. The variations in gene expression are believed to contribute to phenotypic differences. Emerging evidence has shown that microRNAs (miRNAs) are one of the key regulators of gene expression. However, past studies have focused on the miRNA target genes and used loss- or gain-of-function approach that may not reflect natural association between miRNA and mRNAs. Methodology/Principal Findings To examine miRNA regulatory effect on global gene expression under endogenous condition, we performed pair-wise correlation coefficient analysis on expression levels of 366 miRNAs and 14,174 messenger RNAs (mRNAs) in 90 immortalized lymphoblastoid cell lines, and observed significant correlations between the two species of RNA transcripts. We identified a total of 7,207 significantly correlated miRNA-mRNA pairs (false discovery rate q<0.01). Of those, 4,085 pairs showed positive correlations while 3,122 pairs showed negative correlations. Gene ontology analyses on the miRNA-correlated genes revealed significant enrichments in several biological processes related to cell cycle, cell communication and signal transduction. Individually, each of three miRNAs (miR-331, -98 and -33b) demonstrated significant correlation with the genes in cell cycle-related biological processes, which is consistent with important role of miRNAs in cell cycle regulation. Conclusions/Significance This study demonstrates feasibility of using naturally expressed transcript profiles to identify endogenous correlation between miRNA and miRNA. By applying this genome-wide approach, we have identified thousands of miRNA-correlated genes and revealed potential role of miRNAs in several important cellular functions. The study results along with accompanying data sets will provide a wealth of high-throughput data to further evaluate the miRNA-regulated genes and eventually in phenotypic variations of human populations.


Introduction
Expression level of many mRNA genes shows abundant natural variation in human populations. The quantitative variations in mRNA expression are thought to contribute to phenotypic differences between individuals. Several molecular mechanisms have been identified that control gene expression. In addition to known transcription factors that bind to specific regulatory DNA sequences [1,2] and extensively studied genetic polymorphisms that determine transcription level via cisor trans-effects [3][4][5][6][7][8], newly discovered miRNAs have been proven to be a major player in posttranscriptional regulation of gene expression [1,2,9,10]. The miRNAs were first identified to play a role in developmental timing of Caenorhabditis elegans in the early 1990s [11,12]. Subsequent studies have shown that cellular factors necessary for miRNA biogenesis and many miRNAs are conserved in many organisms, suggesting the importance of miRNAs during developmental processes and evolutions [13][14][15][16][17].
miRNAs are a novel class of non-coding small RNAs which have been recognized as global regulators of gene expression that control the key cellular processes such as growth, development and apoptosis [9,10]. A single miRNA can potentially regulate several hundreds of mRNAs forming a complex regulatory network that can act in a flexible manner for precise and rapid effects on protein translation and gene expression. Majority of the miRNAs are expressed in a cell-or tissue-specific manner and may contribute to the establishment and/or maintenance of cellular and/or tissue identity. It is estimated that several thousand human genes, up to about one-third of the mRNA transcriptome, are potential targets for regulation by miRNAs encoded in the genome [18]. The regulatory process occurs posttranscriptionally and involves miRNA interaction with a target site in the mRNA that has partial or complete complementarity to the miRNA. The regulatory effect of miRNAs on gene expression is a complex process involving both translational repression and accelerated mRNA turnover, each of which appears to occur by multiple mechanisms. Moreover, certain miRNAs are also capable of activating translation [19,20]. Hence, miRNAs are related to diverse cellular processes and regarded as important components of the gene regulatory network.
Importance of an individual miRNA is reflected in the diseases that may arise upon the loss, mutation or dysfunction of specific miRNAs [21][22][23]. One study reported mutations in 5 of 42 sequenced miRNAs in 11 of 75 patients with chronic lymphocytic leukemia. Although the majority of these mutations were somatic, at least one was germline [23]. Another study showed that upregulation of several miRNA genes was correlated with loss of their target gene transcript (KIT) in papillary thyroid carcinoma. In 5 of 10 such cases, this down expression was associated with germline single-nucleotide changes in the two recognition sequences in KIT for these miRNAs [22]. Recently, a series of papers presented conceptually related ideas linking the genetic variations and alterations of biogenesis and function of miRNAs to the increased risk of developing sixteen major human diseases. Significant role of miRNAs in the pathogenesis of many major human disorders has been proposed as part of disease phenocode concept [24][25][26]. These results suggest that germline changes in miRNAs and their target genes may have a profound effect not only on disease progression but also an individual's risk of developing disease.
Current studies, however, have focused primarily on miRNA role as posttranscriptional regulators of target mRNAs or at a much higher level on their cell biological processes and organismal roles [9,10]. Loss-or gain-of-function studies often analyzed effects on mRNAs by expressing or suppressing specific miRNA in cells.
As these experiments create non-physiological levels of miRNAs that may affect target mRNAs abnormally, accurate evaluation of the miRNA effects may require normal range of variations in miRNA expression under an endogenous condition. Additionally, because miRNAs can interact with their target genes directly and influence expressions of many other genes indirectly, the miRNAs may demonstrate correlations with their target genes as well as non-target genes. Therefore, merely measuring target gene expression may not be sufficient to gain understanding miRNA regulatory effects. A complimentary approach is to identify downstream genes that are tightly correlated with fluctuation of miRNA expression. Liu et al [27] has reported significant miRNA correlations with target genes as well as non-target genes by performing expression profiling analysis on 12 brain tumor biopsies. Subsequent experimental validations demonstrated a directional causal relationship from miRNAs to mRNAs. However, the small sample size and potential mutations in the samples restrained statistical power to detect weak correlations.
To fully examine the miRNA-mRNA correlations at whole genome scale, we measured both miRNA and mRNA transcriptional profiles in 90 human Epstein-Bar virus transformed lymphoblastoid cell lines. We performed pair-wise correlation coefficient analysis and identified strong correlations between the endogenous variations in the miRNA and mRNA expression. Gene Ontology (GO) analysis identified over-representation of these miRNA-correlated genes in several biological processes.
These high-throughput expression data provides a valuable resource to examine global effects of miRNAs on gene expression and hence on complex traits.

Endogenous correlations between miRNA and mRNA expression
We performed pair-wise correlation coefficient analysis to evaluate potential correlations between 366 miRNA and 14,174 mRNA expression levels. When false discovery rate q value (qFDR),0.01 (approximately p value,0.00076), we detected significant correlation in 7,207 miRNA-mRNA pairs (Table S1), which were involved in 2,448 (17.27%) of the 14,174 mRNA probes and 90 (24.59%) of the 366 miRNAs. Of the 7,207 pairs, 4,085 and 3,122 pairs showed positive and negative correlations, respectively ( Table 1, also see Table S2 for all 366*14,174 correlation coefficient r values). The most frequently involved miRNA was miR-363, which was correlated with 672 mRNAs. Cumulative frequency of these correlated genes for each of 366 miRNAs is shown in Figure 1A. Significance level of each mRNA probe for its correlation with miR-363 is demonstrated in Figure 1B.
When examining each of 7,207 significant miRNA-mRNA pairs individually, we found that positive correlations dominated highly correlated pairs. The positive correlations accounted for top 110 pairs (ranked based on qFDR values). The correlation between miR-10a and HOXB4 was the most significant with a positive correlation qFDR = 2.21610 219 . The most significant negative correlation was between miR-98 and SERBP1 (qFDR = 3.97610 27 ), which was ranked 111 th of the most significant pairs. Table 2 lists top 20 most significant positive and top 20 negative correlation pairs.
To confirm and validate the Illumina BeadArray data, we tested 6 miRNAs (miR-10a, -20b, -181b, -181c, -34b, -372) and 7 mRNA genes (GRK5, KIF3B, ADD3, HOXB4, SERBP1, ST7 and ZNF532) using TaqMan-based quantitative RT-PCR in each of the 90 lymphoblastoid cell lines. The 6 miRNAs were chosen because they demonstrated various degrees of correlations with mRNAs and were in the same multiplex RT pool (human pool 1) of TaqMan miRNA assays. The 7 mRNA genes were selected based on various levels of correlations with the selected miRNAs. Four of the six miRNAs (miR-10a, -20b, -181b, -181c) and all 7 mRNA genes had a Ct (cycle threshold) value#35 and so were used in our final analysis. The results showed high level of concordance between the BeadArray and quantitative RT-PCR in 26 of the 28 comparisons ( Table 3). Two miRNA-mRNA pairs (miR-20b and ST7, and miR-181c and ST7) gave poor reproducibility, and this may have resulted from the use of the two separate assays in targeting different isoforms (a and b) of the gene ST7.

miRNA-correlated genes and biological processes
To functionally classify miRNA-correlated genes, we used 11,417 known genes (14,174 mRNA probes) as a reference set and applied GOMiner (http://discover.nci.nih.gov/gominer/) for enrichment analysis of selected gene sets. We first evaluated GO classification for each miRNA-correlated gene set. To correct for multiple comparisons, we performed 1000 randomization analyses. For each of these gene sets, we observed various degrees of GO term enrichments. For example, eight of the top 20 miRNAs in Table 1 had at least one GO term that was significantly overrepresented (randomization-corrected FDR,0.01). The most striking findings were from gene sets that were correlated with miR-331, miR-98 and miR-33b. For miR-331-correlated genes (269 genes from 284 probes), we detected significant over-representa-tion for the GO terms of DNA replication (p = 2.65610 223 , 8.67 fold enrichment), DNA metabolic process (p = 2.35610 222 , 4.15 fold enrichment) and cell cycle (p = 1.94610 219 , 3.66 fold enrichment). Further analysis demonstrated that these enrichments were exclusively derived from negatively correlated genes. While we did not see any significant GO term in positively correlated genes, we observed significant enrichments in 58 GO terms for 191 negatively correlated genes (200 probes). Again, the most marked GO term included DNA replication (p = 9.21610 228 , 11.59 fold enrichment), cell cycle (p = 7.99610 227 , 4.89 fold enrichment) and DNA metabolic process (p = 1.09610 226 , 5.34 fold enrichment) ( Table 4).
For the gene sets that were correlated with miR-98 (239 genes from 348 probes) and miR-33b (124 genes from 126 probes), we observed similar GO term enrichments as the miR-331 correlated genes. This was particularly true for the GO terms related to cell cycle. When considering both types of the correlated genes separately, however, we found the cell cycle-related GO term enrichments only in the miR-98 negatively correlated (153 genes from 159 probes) and in the miR-33b positively correlated gene sets (84 genes from 85 probes) ( Table 4).
We then evaluated overall influence of miRNA expression on the GO categories. For a total of 2,248 miRNA-correlated genes (from 2,448 mRNA probes), we detected significant enrichments on several GO terms, in particular those pertinent to cell cycle with 72% (8 of 11 significantly enriched terms) relevant. Other significant GO terms included those related to cell communication, signal transduction and response to stimulus. To see if these cell cycle-related enrichments were driven by the miR-331, miR-98 and miR-33b, we excluded genes that were correlated with these 3 miRNAs and re-evaluated the GO term distribution. We found that cell cycle-related terms were no longer over-represented, but the two terms (cell communication and signal transduction) still remained enriched. Interestingly, further analysis identified that only positively correlated gene set contributed to the enrichments with p = 6.85610 29 for cell communication and p = 1.05610 27 for signal transduction ( Table 4). Other enriched terms included immune system process (p = 6.30610 27 ) and multicellular organismal process (p = 5.37610 26 ). Table S3 provides these significant GO terms with randomization-corrected FDR,0.01 in detail.

Direct vs. indirect miRNA-mRNA correlations
To test if miRNA-correlated mRNA genes are direct miRNA targets, we downloaded the predicted miRNA targets from TargetScan5.1 (http://www.targetscan.org) and compared them  with the miRNA-correlated genes. Because miRNA annotation file was based miRBase v9.1, some of miRNA names did not match the new version of TargetScan such as missing the -3p or -5p. Therefore, we limited our analysis to these miRNAs with name or sequence match in the TargetScan 5.1. We also limited the analysis to these genes with a reference sequence accession number. Before statistical analysis, we further filtered out the miRNAs that had less than 10 correlated mRNAs. Finally, 31 miRNAs were left for target prediction. We then compared each list of miRNA-correlated gene set to the list of predicted miRNA target genes. For the 31 miRNAs, we found 8 miRNAs whose targets were significantly enriched among their correlated gene sets (p,0.05 when included both conserved and non-conserved targets). The target gene set of miR-181a remained significant after multiple testing correction (FDR = 0.048).

Physical proximity and expression correlations
To estimate the effect of miRNA-mRNA proximity on their expression correlation, we extracted all miRNA-mRNA pairs that mapped on the same chromosomes and had correlation qFDR,0.01. We plotted r value of each correlation against distance between the miRNA and mRNA ( Figure 2). For positive correlations, we observed clear trend of r value decrease when the distance increased. Specifically, the highest positive correlations were observed when miRNA and its corresponding mRNA was physically close each other on a chromosome. The highly positive correlations gradually decreased to baseline (at ,$2 Mb). For negative correlations, however, we observed constant r values from 0.4 Mb to over 140 Mb. We did not observe any correlation when the distance was ,0.4 Mb, where 26 positive correlations existed.

Discussion
miRNAs play an important role in regulation of gene expression. In this study, we examined genome-wide expression profiling in normal lymphoblastoid cell lines under routine culture condition and identified strong correlations between miRNA and mRNA expressions. Although complex gene-gene interactions may greatly diminish the power to identify significant miRNA effects, we were able to detect a variety of biological processes that may indicate function of those miRNAs. These results demonstrate that genome-wide transcriptional profiling analysis was able to detect endogenous correlations between miRNA and mRNA and that miRNA regulatory effect was discernable under natural culture condition. miRNAs have been shown to target transcripts that encode proteins involved in cell cycle progression and cellular proliferation. Some of them display defective expression patterns in human tumors with the consequent alteration of target oncogene or tumor suppressor genes. These miRNAs modulate the major proliferation pathways through direct interaction with critical regulators such as MYC, RAS, PI3K/PTEN or ABL, as well as members of the retinoblastoma pathway, Cyclin-CDK complexes or cell cycle inhibitors of the INK4 or Cip/Kip families [28,29]. It is postulated that by regulating an entire cellular program through the cooperative repression of target genes, miRNAs may serve as buffers to limit the accumulation of many gene products that impact cell cycle progression under a variety of contexts [28].
Interestingly, the miR-98 in this study is one of miRNAs that show significant association with cell cycle-related genes. The miR-98 is a member of let-7 family which plays a critical role in cell cycle control with respect to differentiation and tumorigenesis. The let-7 family is a master regulator of cell proliferation pathways by regulating the expression of the RAS as well as MYC oncogenes [30][31][32]. Over-expression of let-7 miRs alters cell cycle progression and reduces cell division in lung cancer cells [30] and causes cell cycle arrest by directly regulating the gene Cdc34 in human fibroblasts [33]. Clearly, the let-7 is a negative regulator of cell cycle process, which is consistent with our observation that miR-98 is negatively correlated with cell cycle genes in this study. Additionally, we also noticed a correlation of other let-7 members (especially let-7i and let-7g) with cell cycle-related genes (Table S1) in the lymphoblastoid cell lines, suggesting that let-7 family members do regulate cell cycle not only in fibroblasts and cancer cells but also in lymphoblastoid cell lines.
Because direct regulation of gene expression by miRNAs, we expect to see enrichment of miRNA-correlated genes among predicted targets. Indeed, we observed significant concordance between miRNA-correlated genes and miRNA predicted target genes in 8 of 31 miRNA sets. In particular, the miR-181a gene set showed significance of concordance even after multiple testing correction. For some reasons, we did not see any evidence of concordance in most miRNA sets. Generally, miRNA is believed to bind 39UTR of a target gene and regulates gene expression at protein level. Therefore, miRNA target itself may not demonstrate noticeable change at mRNA level although some exceptions have been reported [34]. It is especially true when these correlations are examined under endogenous condition and there is limited range of variations in miRNA expression (contrary to extremely high   level of expression by transfection study). Because miRNA inhibitory effect is at posttranscriptional level, the most significant changes could be downstream genes of the miRNA target at transcript level. Furthermore, because background noise and weak correlation between miRNAs and their targets, each tested miRNA had relatively small number of correlated genes when applying FDR,0.01. This also limited statistical power to detect significance.
Although the results from this study need further validation, the importance of the present study is clear. First, the study adopts a whole genome approach and identifies thousands of highly correlated miRNA-mRNA pairs. Second, the study measures expression levels under endogenous condition and without extremely high or low levels of miRNAs caused by transfection approach. Third, the study correlates miRNAs with their targets as well as downstream genes. The downstream genes are closer to the final consequences of miRNA-mRNA interactions. Fourth, the study identifies several biological processes that are associated with variations in miRNA expression. Lastly, the study results are supported by other publications using different materials and methods, demonstrating feasibility of this approach in studying miRNA function. We believe that these results along with supplementary data sets will provide a valuable resource to further investigate the miRNAs and their functions.

Ethics Statement
All subjects provided written informed consent; and the study was approved by the Mayo Clinic IRB.

Cell lines and Cell culture
We collected peripheral bloods from 90 Caucasian men with median age of 68 years old (44-74) and transformed the peripheral blood lymphocytes with Epstein-Bar virus to establish immortal cell lines. We then grew all transformed cell lines in RPMI 1640 media supplemented with 15% fetal bovine serum, and 1% penicillin/streptomycin at 37uC in humidified incubators in an atmosphere of 5% CO 2 . Experimental series were set up by seeding 5-ml cultures in T25 flasks. Each culture was fed with 5 ml of fresh media twice a week until the cell number reached ,10 6 in a T75 flask. The cells were harvested and suspended in 500 ml of RNAlater and stored at 280uC for further processing.

RNA extraction
We extracted total RNA from each cell culture using miRNeasy Mini Kit (QIAGEN) under the manufacturer's guidelines. This protocol could effectively recover both mRNA and miRNA. The integrity of these total RNAs was assessed using an Agilent 2100 Bioanalyzer.

mRNA and miRNA microarrays
We used the Illumina human-6 V2 BeadChip for mRNAs profiling and Illumina microRNA expression profiling panel (based on miRbase release 9.0) for miRNA analysis according to the manufacturer's recommendation (Illumina, Inc., San Diego, CA). 200 ng of RNA from each cell culture was first labeled and then hybridized to each array using standard Illumina protocols. BeadChips (mRNA) or sample array matrices (miRNA) were scanned on an Illumina BeadArray reader. For mRNA, we repeated 30 samples in triplicate, 30 samples in duplicate and 30 singleton samples for a total of 180 expression profiles. For miRNA, we repeated 84 samples in duplicate and 6 samples in quadruplicate for a total of 192 expression profiles. We also arranged each of these replicates in separate arrays to reduce potential batch effect. Before data processing, we used various bioinformatics tools to examine the quality and reproducibility of each expression profiles. Based on principal component analysis, we removed 26 individual miRNA profiles due to substantial shifts away from a main cluster. However, replicates from each of the 26 individuals were still included in the analysis because they were in the main cluster. The expression profiles have been deposited in NCBI's Gene Expression Omnibus (GEO) with accession number GSE14794.

Data processing
We processed 180 mRNA and 166 miRNA profiles which included all 90 subjects. For both mRNA and miRNA data, we first transformed raw data generated from BeadStudio (Illumina, San Diego, CA) using a variance stabilization transformation algorithm [35] and then normalized them using quantile normalization implemented in Bioconductor (www.bioconductor. org). After normalization, the samples with replicates were averaged. As a large fraction of mRNAs and miRNAs were either not expressed or non-detectable, we filtered out probes with median detection p value$0.01 (the p values were automatically reported in BeadStudio). This procedure reduced the number of mRNA probes from 48702 to 14174 and of miRNA probes from 736 to 366 for final data analysis. Among the 366 miRNAs, 273 are in miRBase database (http://microrna.sanger.ac.uk, v9.1), and 93 are potential miRNAs identified in a RAKE analysis [36,37].

Data analysis
For each of the 366 miRNAs, we correlated them with 14,174 mRNA probes in the 90 individuals using Spearman's rank correlation analyses. We assessed statistical significance using q value of false discovery rate (qFDR) based on Storey et al [38] and Bonferroni correction. In this study, we defined the miRNA-mRNA correlation coefficient qFDR,0.01 to be statistically significant. We also reported our findings using Bonferronicorrected p,0.05 (an equivalent to un-corrected p,9.64610 29 , 0.05 divided by 366614,174). These analyses were done using Partek Genomics Suite (Partek Inc, St. Louis, Missouri). To explore whether miRNA affects expression of genes that share a common biological relationship, we searched for over-representation in GO categories from miRNA-correlated genes. We labeled genes with positive correlation (+1) and genes with negative correlation (21). One gene list for each miRNA was submitted to the High-Throughput GoMiner at http://discover.nci.nih.gov/ gominer/. The reference gene list was 11,417 known gene names from the 14,174 mRNA probes. The GO annotation was obtained by matching to the gene names in the UniProt database. We specified 1,000 randomizations for calculating the GO enrichment FDR q-value. The enrichment p-value for each GO category was calculated using Fisher exact test and the q-value was calculated using the distribution of the p-values obtained by randomly resampling from the reference genes [39,40].

Real-Time PCR analyses of mRNAs and miRNAs
For mRNAs, 1.2 ug of total RNAs from each of the 90 subjects was converted to cDNA by High Capacity RNA-cDNA Master Mix (Applied Biosystems, Foster City, CA) in a 40 ul reaction according to the manufacturer's protocol. 1 ul of the cDNA was used for real time quantitative PCR in a total volume of 15 ul containing 16 TaqMan Gene Expression Master Mix and gene specific assay for selected genes which included GRK5 (Hs00992173), KIF3B (Hs01122781_m1), ADD3 (Hs00249895_m1), HOXB4 (Hs00256884_m1), SERPB1 (Hs00854675_gH), ST7 (Hs00251157_m1), ZNF532 (Hs00539543_m1) and GADPH (Applied Biosystems). For miRNAs, 0.4 ug of total RNAs was converted to cDNA using TaqMan MicroRNA Reverse Transcription Kit and multiplex RT primer pool 1. 1.8 ul of 1:10 diluted cDNAs was used for real time quantitative PCR in a total volume of 15 ul containing 16 TaqMan Universal PCR Master Mix and specific assay for selected miRNAs which included miR-10a, miR-20b, miR-181b, miR181c, miR-34b, miR-372 and RUN48 (Applied Biosystems). All PCR assays were run in triplicate, and expression values were averaged. In order to increase the reliability, we excluded the assays with Ct value$35, which included the miRNA assays for miR-34b and miR-372. To calculate the relative expression for each transcript, all real-time PCR data were normalized to GAPDH (mRNA) or RUN48 (miRNA) by DCt method. We also converted the DCt into a value of 20-DCt such that the latter value was positively proportional to the log of copy number and was comparable with log transformed data from microarrays.