Discovery and Validation of Barrett's Esophagus MicroRNA Transcriptome by Next Generation Sequencing

Objective: Barrett's esophagus (BE) is a transition from squamous to columnar mucosa as a result of gastroesophageal reflux disease (GERD). The role of microRNA during this transition has not been systematically studied.
Design: For initial screening, total RNA from 5 GERD and 6 BE patients was size fractionated. RNA <70 nucleotides was subjected to SOLiD 3 library preparation and next generation sequencing (NGS). Bioinformatics analysis was performed using the R package "DESeq". A p value <0.05 adjusted for a false discovery rate of 5% was considered significant. NGS-identified miRNA were validated using qRT-PCR in an independent group of 40 GERD and 27 BE patients. MicroRNA expression of human BE tissues was also compared with three BE cell lines.
Results: NGS detected 19.6 million raw reads per sample. 53.1% of filtered reads mapped to miRBase version 18. NGS analysis followed by qRT-PCR validation found 10 differentially expressed miRNA; several are novel (-708-5p, -944, -224-5p and -3065-5p). Up- or down-regulation predicted by NGS was matched by qRT-PCR in every case. Human BE tissues and BE cell lines showed a high degree of concordance (70–80%) in miRNA expression. Prediction analysis identified targets that mapped to developmental signaling pathways such as TGFβ and Notch and inflammatory pathways such as toll-like receptor signaling and TGFβ. Cluster analysis found that similarly regulated (up or down) miRNA share common targets, suggesting coordination between miRNA.
Conclusion: Using highly sensitive next-generation sequencing, we have performed a comprehensive genome-wide analysis of microRNA in BE and GERD patients. Differentially expressed miRNA between BE and GERD have been further validated. Expression of miRNA in human BE tissues and in BE cell lines is highly correlated. These miRNA should be studied in biological models to further understand BE development.


Introduction
Chronic gastroesophageal reflux disease (GERD) is an important risk factor for the development of Barrett's esophagus (BE). BE is the dominant pre-malignant lesion for esophageal adenocarcinoma [1]. The prevalence of GERD has increased substantially over the past decade, with weekly reflux symptoms increasing by approximately 50%, and this will significantly impact the future rates of BE [2]. Esophageal adenocarcinoma has already increased by 600% since 1975 [3], and the increasing prevalence of GERD and BE is likely to worsen the rates of esophageal adenocarcinoma, raising a significant public health concern. Understanding the factors that lead to the development of BE in 10-15% of GERD patients may allow for the development of prevention strategies against this cancer through timely detection and intervention. Molecular events underlying the initiation of Barrett's metaplasia are incompletely understood, but biological interactions between developmental signaling pathways and morphogenetic factors appear to play key roles [4]. MicroRNA (miRNA) regulate 20-30% of the genome by binding to mRNA transcripts and promoting their degradation and/or inhibiting their translation [5,6]. Since a single miRNA can impact several hundred genes [5,6], miRNA can potentially affect multiple signaling pathways and elicit large effects on a cell's phenotype that are integral to BE development.
To date, studies have focused on identifying miRNA associated with BE progression [7,8,9,10,11,12,13,14], but miRNA differentially expressed between GERD squamous epithelium and BE columnar epithelium have not been systematically examined. Although a causal role has not been established, miRNA are plausible and logical targets to study for causal relationships in BE development. Additionally, miRNA can be targeted by inhibitors and mimetics, which opens novel therapeutic possibilities for BE prevention [15]. For the final goal of identifying miRNA that are not simply associated with BE but are causal to the transformation of squamous to columnar mucosa, high-throughput miRNA profiling is a necessary initial step. To characterize the miRNA transcriptome of BE, we used state-of-the-art next generation sequencing (NGS). NGS has several significant advantages over previous methods such as reverse-transcription (RT) PCR arrays and hybridization-based microarrays, including high sensitivity towards low-abundance transcripts, excellent reproducibility and the possibility of discovering previously unknown miRNA [16]. Our aim was to perform one of the first comprehensive investigations defining the miRNA transcriptome of well-characterized GERD and BE patients and to set the platform for further biologic characterization of specific miRNA using cellular, animal and, more recently, organotypic [17] models. In the study described here, we profiled the miRNA expression of GERD and BE patients using rigorous methodology and identified several novel miRNA associated with BE, such as miR-708-5p, -3065-5p, -944 and -224-5p, that were predicted to regulate important developmental, inflammatory and metabolic pathways.

Ethics Statement
The current study was approved by the Institutional Review Board of the Veterans Affairs Medical Center, Kansas City. All subjects provided written and signed informed consent. All research was conducted in accordance with the principles outlined in the Declaration of Helsinki.

Selection of GERD and BE patients
Patients with GERD and BE were selected from a prospective tissue and serum repository (ClinicalTrials.gov # NCT00574327). The Institutional Review Board of the Veterans Affairs Medical Center, Kansas City, Missouri, approved this repository. Patients presenting to the endoscopy unit for evaluation of reflux symptoms or screening/surveillance of BE were invited to participate in the study. After signing informed consent, all patients were required to fill out a validated GERD questionnaire [18]. Patients with inability to provide written informed consent, advanced chronic liver disease, severe uncontrolled coagulopathy, or a prior history of esophageal or gastric surgery or BE ablation were excluded from the repository.
Patients were defined as having GERD if they answered affirmatively to the presence of heartburn and/or regurgitation. After endoscopic examination, GERD patients were further subclassified into those with erosive esophagitis (EE) and those without (non-erosive reflux disease, NERD). BE was defined as the presence of columnar lined esophagus at least 1 cm in length on endoscopy with demonstration of intestinal metaplasia in biopsies. To minimize misclassification, BE patients were biopsied only if they had no evidence of active reflux disease, i.e. erosions or ulcers in the Barrett's segment. Only those BE patients that did not have dysplasia were included in the current study to minimize the impact of dysplasia grade on miRNA expression. For the initial high-throughput discovery phase with NGS, only patients with a definitive diagnosis of GERD based on the presence of EE were included. In the validation phase by qRT-PCR, patients with EE as well as NERD were allowed. Research biopsies were obtained as part of a standardized protocol for collection of specimens for the tissue repository. Per protocol, in GERD patients, 2 biopsies were obtained at 1 cm and 5 cm above the gastro-esophageal junction. In BE patients, 2 biopsies were obtained every 2 cm of the BE length. Immediately after procurement, each biopsy specimen was divided into two halves: one half was randomly selected to be fixed in 10% formalin for histopathological evaluation while the other half was placed in RNAlater preservative (Applied Biosystems, Foster City, CA) for miRNA studies.

Histologic review, RNA extraction and quality control
Sections 4 μm thick were stained with hematoxylin and eosin and reviewed by a single experienced gastrointestinal pathologist according to the revised Vienna classification [19]. Specimens were examined for the presence of intestinal metaplasia characterized by the presence of histologically typical goblet cells. Total RNA was extracted using Trizol as per the manufacturer's protocol (Sigma, St. Louis, MO). Total RNA was quantified using a NanoDrop-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE), and quality was assessed on an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA); only the highest quality samples, with an RNA integrity number (RIN) of >8, were used for NGS. The RIN value (mean ± SEM) for the validation cohort (n = 67) was 6.9 ± 0.7.

Next generation sequencing
Total RNA from GERD (n = 5) and BE (n = 6) patients was size fractionated on Flash-PAGE gels, and RNA (<70 nucleotides) was then subjected to SOLiD 3 library preparation [20] (Cofactor Genomics, St Louis, MO). SOLiD 3 sequencing using ligation-based sequencing technology was completed to yield 35 nucleotide reads. Reads with a minimum of 6-nucleotide adaptor sequence, no ambiguous bases and a final trimmed length of at least 15 nucleotides were included for the final alignment analysis. Median-based normalization was done. To determine the final candidate miRNA, the following steps were undertaken.
Step 1: Alignment to reference genome. Read sequences from NGS data were aligned to the latest version (v18) of miRBase, a repository of up-to-date miRNA information for many species including human. Alignment was performed using the bowtie short-read aligner software (version 0.12.7) [21]. Bowtie has been shown to be efficient [22] and has been successfully used in previous studies [23]. A minimum trimmed length of at least 15 nucleotides after removal of the adaptor sequences was used as previously done [11,22]. Reads were regarded as mature miRNA if two conditions were met: a) the entire read sequence mapped consecutively within a miRNA hairpin sequence with a maximum of one mismatch, and b) the read overlapped the mature miRNA by a minimum of 7 bases. The hairpin sequence represents the precursor miRNA sequence [24] and is a unique characteristic of miRNA [25]. Reads that overlapped with multiple mature miRNA were not counted [24]. Post alignment, the number of read sequences aligned to each miRNA (read counts) was calculated. After the initial mapping to miRBase, the unmapped reads were mapped to the non-coding RNAs reported in the functional RNA database (fRNAdb version 3.4), the human reference genome version 19 downloaded from the UCSC genome browser (hg19) and the E. coli genome, allowing up to 3 mismatches. Reads that were still unmapped were remapped to miRBase allowing 2-3 mismatches, but these reads were not used in the analysis of differentially expressed miRNA.
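The two conditions for calling a read a mature miRNA in Step 1 can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the `Alignment` record, the coordinate convention (0-based, end-exclusive positions on the hairpin) and the function names are assumptions made only for illustration, and the alignment itself (done with bowtie in the study) is taken as given.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical record: one read aligned to one miRNA hairpin.

    Coordinates are 0-based and end-exclusive on the hairpin sequence.
    """
    read_start: int
    read_end: int
    mismatches: int


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def assign_mature(aln, hairpin_len, matures,
                  max_mismatches=1, min_overlap=7, min_len=15):
    """Return the name of the mature miRNA the read is counted for, or None.

    matures maps a mature miRNA name to its (start, end) on the hairpin.
    """
    # Trimmed reads shorter than 15 nucleotides are not used.
    if aln.read_end - aln.read_start < min_len:
        return None
    # Condition a): the whole read lies within the hairpin, <= 1 mismatch.
    if aln.read_start < 0 or aln.read_end > hairpin_len:
        return None
    if aln.mismatches > max_mismatches:
        return None
    # Condition b): at least 7 bases overlap with a mature miRNA.
    hits = [name for name, (start, end) in matures.items()
            if overlap(aln.read_start, aln.read_end, start, end) >= min_overlap]
    # Reads overlapping multiple mature miRNA are not counted.
    return hits[0] if len(hits) == 1 else None


# Illustrative hairpin of 80 nt with two annotated mature sequences.
example_matures = {"miR-x-5p": (10, 32), "miR-x-3p": (50, 72)}
print(assign_mature(Alignment(10, 32, 0), 80, example_matures))
```

Read counts per miRNA would then be obtained by tallying the non-`None` assignments over all aligned reads.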
Step 2: Normalization of Read Counts. Normalization was done using the DESeq Bioconductor package in R [26], which takes the total number of reads into consideration [27]. This was done prior to the differential expression analysis to control for the variation in the number of read sequences across samples. The normalization method consisted of the following steps: 1) Construct a pseudo-reference by taking the geometric mean of each miRNA; that is, the value for the i-th miRNA is calculated as the geometric mean of the counts of the i-th miRNA in all samples. 2) For each sample, compute a size factor as the median, over all miRNA, of the ratio of that sample's count to the pseudo-reference value. 3) Divide the j-th sample's counts by its size factor to obtain normalized counts:
$$k'_{ij} = \frac{k_{ij}}{\hat{s}_j}$$
where $k_{ij}$ is the raw read count of the i-th miRNA in the j-th sample and $\hat{s}_j$ is the size factor of the j-th sample.
Step 3: Differential Expression Analysis. After normalized read counts were obtained, the R package DESeq [26], a state-of-the-art statistical model for NGS differential expression analysis, was used. DESeq is based on the negative binomial distribution and outputs fold changes and p-values for differential expression. miRNA whose p-values (adjusted for a false discovery rate of 5%) were <0.05 were considered to be differentially expressed. The standard R function p.adjust was used to adjust p-values for multiple testing using the Benjamini-Hochberg method [28].
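The normalization of Step 2 and the Benjamini-Hochberg adjustment of Step 3 can be sketched in plain Python as follows. This is an illustrative re-implementation under stated assumptions, not the DESeq or p.adjust code itself: the function names are hypothetical, miRNA with a zero count in any sample are skipped when computing size factors (their geometric mean would be zero), and the negative binomial test performed by DESeq is not reproduced here.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one row per miRNA, each row holding the raw
    read counts of that miRNA in every sample.
    """
    n_samples = len(counts[0])
    # Step 1: log geometric mean of each miRNA across samples
    # (the pseudo-reference); rows containing a zero are skipped.
    reference = [(row, sum(math.log(c) for c in row) / len(row))
                 for row in counts if all(c > 0 for c in row)]
    # Step 2: size factor = median ratio of sample count to pseudo-reference.
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - log_ref for row, log_ref in reference]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Step 3 of the normalization: k'_ij = k_ij / s_j."""
    s = size_factors(counts)
    return [[c / s[j] for j, c in enumerate(row)] for row in counts]


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: the second sample was sequenced twice as deeply as the first.
toy_counts = [[10, 20], [100, 200], [50, 100]]
print(size_factors(toy_counts))
print(normalize(toy_counts))
```

In the toy data the two size factors differ by a factor of two, so the normalized counts of each miRNA become equal across the two samples, which is the intended effect of correcting for sequencing depth.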

Validation of NGS results by quantitative RT-PCR analysis in independent samples
Total RNA (50 ng) from an additional 40 GERD patients and 27 BE patients was reverse transcribed using hairpin RT-primers that matched our custom designed low-density qPCR array cards (Applied Biosystems). Quantitative RT-PCR was conducted using our established procedures [29,30]. SDS software (version 2.4, Applied Biosystems) was used to identify threshold cycle (Ct) values for each PCR reaction. Expression of the small nuclear RNA U6 was used to normalize miRNA expression measurements, and relative fold-changes of miRNA expression values between samples were calculated using the delta-delta Ct method [29]. All samples were compared to a sample from a single patient in order to calculate fold-changes. Each primer set included a minus RT control. A standard t-test was used to test differential expression of miRNA. MicroRNA with a p-value <0.05 after adjusting for a false discovery rate of 5% were labeled as differentially expressed. The standard R function p.adjust was used to adjust p-values for multiple testing using the same correction method as in the NGS analysis.
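As a minimal sketch of the delta-delta Ct calculation described above, the following Python function computes a relative fold change for one miRNA. It assumes 100% amplification efficiency (a doubling per cycle), which is the usual assumption of the method; the function name and the Ct values in the example are hypothetical and are not data from this study.

```python
def delta_delta_ct_fold_change(ct_mirna, ct_u6, ct_mirna_cal, ct_u6_cal):
    """Relative expression of a miRNA by the delta-delta Ct method.

    ct_mirna, ct_u6: Ct of the miRNA and of the U6 normalizer in the
        sample of interest.
    ct_mirna_cal, ct_u6_cal: the same two Ct values in the calibrator
        sample (here, the single reference patient sample).
    """
    delta_sample = ct_mirna - ct_u6              # normalize to U6
    delta_calibrator = ct_mirna_cal - ct_u6_cal  # same for the calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)                 # assumes doubling per cycle


# Hypothetical Ct values: the miRNA crosses threshold 2 cycles earlier
# (relative to U6) in the sample than in the calibrator.
print(delta_delta_ct_fold_change(24.0, 20.0, 26.0, 20.0))  # prints 4.0
```

A lower Ct means more template, so a negative delta-delta Ct corresponds to a fold change above 1 (up-regulation relative to the calibrator).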

miRNA target prediction and pathway analysis
We searched for potential target genes of the miRNA and mapped the signaling pathways related to the target genes. We used multiple prediction programs including microT [31], miRanda [32], miRTarget2 [33], PicTar [34], PITA [35], RNA22 [36], and TargetScan [37]. To minimize the risk of false positives, the predictions of each program were filtered by using only those scoring within the top 5%. Genes with strong prediction scores for the same miRNA from at least 2 programs were labeled as potential targets for that miRNA and used in pathway analysis. For miRNA with no target gene shared by multiple programs, genes with prediction scores that ranked within the top 1% by any program were used as potential target genes for the miRNA. We used EGAN [38] to find KEGG pathways strongly associated with the target genes.
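The target-selection rule described above (top 5% in at least two programs, otherwise top 1% in any program) can be sketched as follows. This is an illustration under several assumptions, not the authors' code: scores are assumed to have been re-oriented so that a higher value means a stronger prediction (some programs natively report energies where lower is stronger), the percentile is taken within the supplied predictions, and the function and parameter names are hypothetical.

```python
def top_fraction(scores, fraction):
    """Genes whose score ranks within the top `fraction` of one program.

    scores: dict mapping gene -> score, higher meaning stronger (assumed).
    """
    if not scores:
        return set()
    n_keep = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def select_targets(predictions, consensus_fraction=0.05,
                   fallback_fraction=0.01, min_programs=2):
    """Potential targets of one miRNA.

    predictions: dict mapping program name -> {gene: score}.
    """
    # Count how many programs place each gene within their top 5%.
    support = {}
    for scores in predictions.values():
        for gene in top_fraction(scores, consensus_fraction):
            support[gene] = support.get(gene, 0) + 1
    consensus = {g for g, n in support.items() if n >= min_programs}
    if consensus:
        return consensus
    # Fallback: no gene shared by multiple programs, so accept genes
    # ranking within the top 1% of any single program.
    fallback = set()
    for scores in predictions.values():
        fallback |= top_fraction(scores, fallback_fraction)
    return fallback


# Hypothetical scores from two programs over 100 genes each.
program_a = {f"g{i}": 100 - i for i in range(100)}
program_b = {f"g{i}": i for i in range(100)}
print(sorted(select_targets({"A": program_a, "B": program_a})))
print(sorted(select_targets({"A": program_a, "B": program_b})))
```

In the first call both programs agree, so the five top-ranked genes are returned; in the second call the programs disagree completely, so the fallback returns only the single best gene of each program.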

miRNA expression in Barrett's cell lines
After RNA extraction, we compared the ten highest and the lowest expressed miRNA identified by NGS in human BE tissues with three different BE human cell lines, BAR-T, CP-A and CP-C. We performed this analysis to mutually validate the expression of miRNA between human BE epithelium and well established BE human cell lines and to identify appropriate cell lines for future biological experiments of miRNA modulation to understand miRNA function in BE development. BAR-T is a non-neoplastic Barrett's cell line created by hTERT immortalization of human BE cells [39]. CP-A and CP-C are immortalized cell lines created also from human Barrett's biopsies [40,41]. CP-A expresses wild type p53 whereas CP-C hosts p53 LOH and mutations. Both CP-A and CP-C cells have p16 sequence alterations. Spearman's correlation coefficients for miRNA expression between human BE tissues and cell lines were calculated.
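For readers unfamiliar with the statistic, a Spearman correlation coefficient is the Pearson correlation of the ranks of two measurements. The sketch below is a plain-Python illustration with average ranks for ties; it is not the software used in the study, and the function names are hypothetical. How tissue read counts and cell-line Ct values were paired is not specified in the text, so note only that Ct runs opposite to expression and would need to be inverted (for example, negated) before such a comparison.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Perfectly monotone toy data give a coefficient of 1 (or -1 if reversed).
print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))
print(spearman([1, 2, 3], [3, 2, 1]))
```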

Study subjects
The initial NGS cohort consisted of 11 patients, five with GERD (all with EE) and six with BE. All 11 patients were white males, with mean ages of 54 ± 4 and 61 ± 9 years respectively. After initial NGS profiling, the miRNA were validated by qRT-PCR in independent GERD (n = 40) and BE (n = 27) patients. Mean ages were 55 ± 13 and 61 ± 10 years respectively. All patients were white males and were on acid suppressive therapy. Mean BE length by Prague criteria [42] was M5 ± 3.1, C3.1 ± 1.5. Hiatus hernia was present in 63% of GERD patients versus 95% of BE patients, p<0.05. Mean body mass index (BMI) was similar in the two groups, 32 ± 6.6 in GERD versus 30 ± 7.7 in BE, p = NS. Among the 40 GERD patients, 20 had EE and 20 had NERD, with mean ages of 50 ± 14 and 59 ± 11 years respectively, p = NS. All EE patients had Los Angeles classification grade B or higher esophagitis.

Discovery project and validation
NGS completed on BE patients (n = 6) and GERD patients (n = 5) yielded an average of 19.3 million raw reads/patient. After removing adapter sequences and filtering out reads too short to be accurately mapped (less than 15 bases), we obtained on average 7.6 million reads/patient sample, and 98.5% of them were mapped to either miRBase, non-coding RNAs (fRNAdb), or human reference genome version 19 (Figure 1). 53% of reads mapped to known miRNA in miRBase 18.0 with either 0 or 1 mismatch (Figure 1). The remaining reads were mapped to the non-coding RNA database excluding miRNA (fRNAdb version 3.4), which accounted for 25.1% of the reads, of which rRNA accounted for 12%. Remaining reads were then mapped to the human genome version 19, which accounted for 8.19% of the reads. Remaining reads were also compared to the E. coli database, and a very small fraction (0.013%) of reads mapped to that database. All of the unmapped reads were then remapped to miRBase allowing for two or three mismatches (13.66% of total trimmed reads), leaving <1% of the reads unmapped (Figure 1). The majority of trimmed reads that mapped to miRNA were 21-23 nucleotides in length, as expected for miRNA (Figure 2A). The relative distribution of the reads into miRNA and non-miRNA databases according to read length is shown in Figure 2B and again, as expected, the majority of miRNA alignments with 0 or 1 mismatch occurred between 21 and 23 nucleotides in length. Among 1921 known miRNA in miRBase 18, the number of miRNA detected in our samples (non-zero read counts) ranged from 736 to 1122/patient (919 miRNA/patient on average). The complete list of miRNA with normalized read counts is described in Table S1. Raw NGS expression data will be made available to investigators upon request.

Identification of differentially expressed miRNA between GERD and BE patients
When we compared the GERD group with the BE group, 18 miRNA were differentially expressed based on a DESeq FDR-adjusted p value of <0.05 (Table 1). Of these 18 miRNA, four (miR-4253, -4776-3p, -548n and -675-3p) had <25 reads in each group and were excluded from the qRT-PCR validation step. Of the 14 miRNA included in the qRT-PCR validation and evaluated in the larger cohort of patients, all but one (miR-551b-3p) were detectable by qRT-PCR. Of the 13 miRNA detected by qRT-PCR, ten were significantly different between the two groups (Table 1). Several of these miRNA are new and have not been previously described in association with BE, such as miR-708-5p, -3065-5p, -944 and -224-5p. Of the 10 miRNA discovered by NGS and validated by qRT-PCR, three were up-regulated (log2 fold change 5.9-6.8) and seven were down-regulated (log2 fold change 2.9-6.2). Reassuringly, there was 100% consistency in the direction of fold change between the NGS and qRT-PCR datasets for all validated miRNA. In other words, if a miRNA was found to be up-regulated by NGS, it was also found to be up-regulated by qRT-PCR, and the same was true for down-regulated miRNA. Upon subgroup analysis, none of the evaluated miRNA were differentially expressed between the EE and NERD groups (Table 2). We also compared the BE group with the EE and the NERD subgroups (Table 2). Similar miRNA were found to be differentially expressed between the BE versus EE and the BE versus NERD groups. There were minor differences in the degree of fold change between the BE/EE and BE/NERD comparisons, none of which reached statistical significance.

Genes targeted by the identified miRNA
Only those targets that scored in the top 5% of all predictions by at least two different programs or scored in the top 1% by any one program were included. Using these criteria, targets for the miRNA differentially expressed between the BE and GERD groups were identified (Table S2). These targets belonged to multiple signaling pathways including TGFβ, MAPK, Notch, mTOR, WNT, hedgehog and PPAR, several of which regulate embryological development and differentiation. On further analysis, several of the miRNA shared common targets and were similarly up- or down-regulated (Figure 3). For instance, miR-3065, -149 and -944 shared common targets and were similarly downregulated (Figure 3A). Note that two of these, miR-3065 and -944, are new and not previously described in association with BE. Similarly, miR-192 and -215 shared common targets and were both upregulated (Figure 3B). These miRNA-mRNA target analyses suggest a coordinated interplay between several miRNA in the regulation of target genes that may play a role in the development of BE; their role in BE genesis needs to be further validated.

miRNA expression in Barrett's cell lines
We examined the expression of the ten most over-expressed (miR-192-5p, -103a-5p, -145-5p, -215, -451a, -23b-3p, -21-5p, -23a-3p, -24-3p, -191-5p) and ten most under-expressed (miR-491-3p, -574, -18a, -488-5p, -216a, -548, -520d, -20b, -218, -346) human BE miRNA in three BE cell lines, BAR-T, CP-A and CP-C. The mean number of reads by NGS for the ten most expressed miRNA in human BE specimens was 78,178 (range 27,374-240,611). Of these 10 miRNA highly expressed in human BE tissues, eight were expressed in the BAR-T cell line, mean Ct 25.7 (range 18-31), and seven were expressed in the CP-A and CP-C cell lines.

Figure 1. As demonstrated, the unmapped reads were remapped to miRBase after relaxing the criteria to allow 2-3 mismatches, leaving only <1% of the reads unmapped. fRNAdb, functional RNA database version 3.4; 'ambiguous' represents those reads that mapped to multiple different non-coding RNA in the fRNAdb; 'others' includes unclassified ncRNAs in fRNAdb; * these miRNA were not included in the final analysis of differential expression. doi:10.1371/journal.pone.0054240.g001

Figure 2. Panel A shows that the majority of trimmed reads were 21-23 nucleotides in length, the same size as miRNA. Panel B shows the distribution of trimmed reads based on their mapping to miRNA, human genome, non-coding RNA (other than miRNA) and E. coli genome. mm1, mm2 and mm3 represent alignment to miRBase with 0 or 1, 2 and 3 mismatches respectively. Note that the majority of aligned miRNA with 0 or 1 mismatch are distributed around 22 nucleotides in length.

Discussion
MicroRNA can regulate multiple genes and impact multiple cellular processes including cell fate and differentiation [6,43], and likely regulate the development of BE. To the best of our knowledge, this is the first study that has comprehensively examined the GERD and BE miRNA transcriptome using NGS. This study not only confirmed BE-associated miRNA previously reported in small studies but also discovered new miRNA potentially associated with the initiation and development of BE. To this effect, we have established a list of miRNA up- and down-regulated between well-defined GERD and BE patients from a prospective tissue repository. We did not observe significant differences in fold changes of miRNA when BE patients were compared with the GERD subgroups, EE and NERD. Our finding that the majority of differentially expressed miRNA were down-regulated in BE is consistent with the proposed role of miRNA as oncosuppressors and their consequent downregulation in neoplasia [44]. Additionally, the miRNA expression of BE patients correlated well with that of BE cell lines, suggesting that these cell lines may be useful to further understand the role of miRNA in BE pathogenesis. Differentially expressed miRNA discovered in this study target genes that map to pathways important in embryological development and differentiation such as TGFβ, Notch, WNT and hedgehog; inflammation such as toll-like receptor signaling, TGFβ, T cell receptor signaling and the chemokine signaling pathway; metabolism and survival such as mTOR; homeostatic signaling such as MAPK; and lipid homeostasis such as PPAR (Table S2). Also, similarly up-regulated and down-regulated miRNA shared common targets, suggesting coordination between miRNA in the regulation of BE development. These miRNA should be studied further to elucidate the specific miRNA-regulated molecular mechanisms that lead to the development of BE in a subset of patients with chronic GERD.
Previous studies by our group and others evaluating miRNA expression in BE have focused on identification of the miRNA associated with the progression of BE to dysplasia and adenocarcinoma [7,8,9,10,11,12,13,14]. These studies have identified several miRNA that are associated with the development of BE neoplasia. However, none of the studies have focused on systematic identification of miRNA associated with the squamous to columnar switch seen in GERD patients who harbor BE. A few studies evaluated both dysplastic and non-dysplastic patients and compared them with controls using hybridization arrays and found several differentially expressed miRNA such as miR-215, -192 and -205 [8,45]. Another study that compared select miRNA between paired squamous and columnar tissues from seven BE patients found miR-215 and -192 to be upregulated and miR-203 and -205 to be downregulated in the columnar epithelium [46]. Our NGS dataset not only confirmed the significantly different expression of miR-215, -192, -203 and -205 between GERD and BE but also took a more comprehensive approach, identifying several novel miRNA not previously described in BE, such as miR-708, -944, -224-5p and -3065-5p among others. Some of the miRNA identified in the current study have relatively low copy numbers (footnote, Table 1) and thus were uniquely identified by NGS but likely missed by microarrays due to their lower sensitivity [16]. However, at this point, the relationship between copy numbers and biologic relevance is unclear. Since a miRNA can regulate several hundred genes, a miRNA with a small copy number could still have a significant effect on cellular processes.
We also compared the expression of the ten most over- and under-expressed miRNA in human BE tissues with three BE cell lines, BAR-T, CP-A and CP-C. There was a high degree of concordance between the miRNA expression in human BE tissues and the three distinct BE cell lines. 70-80% of the over-expressed human BE miRNA were also expressed in the three BE cell lines (all Ct <31) and 70% of the least expressed human BE miRNA were not detected in any of the BE cell lines (all Ct >40). miR-215, which was highly expressed in human BE tissues, was expressed only in the BAR-T cell line, perhaps suggesting that BAR-T is a good cell line for biological experiments of miRNA modulation to further understand the pathways associated with BE pathogenesis.
The role of miRNA in the origin of BE remains under-evaluated but is plausible. Direct evidence for the important role of miRNA in the maintenance of columnar epithelia comes from mice with an intestine-specific knockout of Dicer [47], an enzyme obligatory for miRNA processing. In these mice, the intestinal epithelium was disorganized, with a decrease in goblet cells and increased intestinal inflammation. A recent study over-expressed miR-145 in Het-1A and BAR-T cells and showed changes in the expression of important BE-related genes such as BMP4, providing a rationale for miRNA involvement in BE development [45]. In this study, software-based prediction analysis of targets for the differentially expressed miRNA mapped to multiple signaling pathways related to development and inflammation, such as TGFβ [48], MAPK, Notch, mTOR, WNT, hedgehog, PPAR, toll-like receptor and chemokine signaling, several of which have been implicated in the origin of BE [49,50]. Interestingly, miR-944, a novel miRNA detected in this study, is predicted to regulate HOXB5 (Table S2). HOXB5 is a transcription factor of the homeobox family that has recently been experimentally validated to regulate BE development [51]. Cluster analysis found that multiple similarly regulated (up or down) miRNA share common targets, suggesting a coordinated interplay between miRNA in the regulation of BE development (Figure 3). The majority (7/10) of validated miRNA in the current study were downregulated in BE compared to GERD, suggesting that the squamous to columnar phenotype is associated with activation of previously repressed genes. MicroRNA have also been shown to modulate cellular differentiation in other organ systems [52,53]. There are several competing theories for the origin of BE, the two prominent ones being transdifferentiation of the squamous cells [4] versus repopulation of the distal esophagus from embryonic precursor cells at the squamocolumnar junction [54].
In both of these models, there are regulating factors other than the cell of origin that lead to the metaplastic change of BE in a subset of GERD individuals. Supported by significant differences in miRNA profiles between the GERD and BE populations, we propose that miRNA may regulate the development of BE and need to be further evaluated. Previous studies have demonstrated the feasibility of molecular diagnosis of BE by measurement of Trefoil factor 3 expression on cytology specimens [55]. MicroRNA expression appears to be highly discriminative between GERD and BE patients and could similarly be useful for the molecular diagnosis of BE. Whether these miRNA can also be used as molecular markers of cancer progression remains to be seen.
The three commonly used methods for high-throughput miRNA analysis are RT-PCR arrays, hybridization-based microarrays and NGS. RT-PCR arrays can only detect known miRNA. Hybridization-based technologies are limited by issues related to probe design and array background [16]. NGS does not require prior knowledge of small RNA transcripts [16] and allows discovery of other non-miRNA small RNA molecules such as Piwi-interacting RNAs [5]. NGS has high sensitivity towards low-abundance transcripts and excellent reproducibility. A limitation of NGS [56] is that miRNA copy numbers depend on the method used for RNA library preparation [57]. However, NGS is a highly robust method for comparing the relative abundance of miRNA copies across samples, since any biases introduced by the preparation method are highly systematic [57]. An important determinant of the usefulness of a high-throughput methodology is its validation by standardized techniques. The validation rate for our NGS data by qRT-PCR was approximately 70%, substantially higher than the rate of 30-40% reported for miRNA hybridization microarrays [58].
Our study does have limitations, but we believe that they do not alter our interpretation. The sample sizes for the discovery phase were relatively small. Sample size calculations for NGS are not well defined and are dictated by cost constraints, as practiced in other NGS studies [59,60]. We would like to emphasize that close to three-fourths of the differentially expressed miRNA discovered by NGS were validated by qRT-PCR (adjusted for multiple testing), with the direction of fold change matched in every case, which attests to the robustness of our procedures for NGS analysis. The average alignment rates of the NGS datasets in this study were somewhat low. However, this has also been noted in other studies on cervical cancer where both patient specimens and cell lines were sequenced, and could be explained by some cellular damage during the acquisition of clinical specimens [59]. Moreover, the alignment rates were not significantly different between the GERD and BE samples and would not affect the differentially detected miRNA. Barrett's biopsies may contain more stroma than squamous biopsies, but such biopsies are still predominantly (approximately 90%) composed of epithelial cells, as suggested by previous flow cytometry studies [61]. pH monitoring was not performed to confirm GERD. However, a validated GERD questionnaire was used at the time of recruitment. Moreover, pH monitoring is not practical as part of patient enrollment into a tissue repository and could discourage subject participation, with potential for recruitment bias.
In summary, the current study has discovered and validated the miRNA transcriptome of GERD and BE patients by next generation sequencing. The results validated previously described miRNA as well as discovered novel miRNA in BE and provide a comprehensive list of miRNA to be the subject of future molecular research into the pathogenesis of BE using animal and cellular models. The target genes and the pathways being regulated by the identified miRNA need to be further deciphered.

Supporting Information
Table S1
The table lists all miRNA identified by NGS with normalized read counts in the BE and GERD groups. (XLSX)

Table S2
The table lists the potential target genes for the differentially expressed miRNA. The miRNA with * in front of their names had no strong target gene shared by multiple programs; for these, predictions that scored in the top 1% by any program are shown in the table. The column 'Associated Pathways' lists pathways significantly enriched among the potential target genes of a miRNA. The column '# of supporting programs' denotes the number of programs that predicted the gene as a potential target of the miRNA. Target genes reported in literature registered in either miRecords or TarBase are marked as 'Literature' in this column. (DOC)