Dampening Enthusiasm for Circulating MicroRNA in Breast Cancer

Genome-wide platforms for high-throughput profiling of circulating miRNA (oligoarray or miR-Seq) offer enormous promise for agnostic discovery of circulating miRNA biomarkers as a pathway to breast cancer detection. By harmonizing data from 15 previous reports, we found widespread inconsistencies across prior studies. Whether these arise from differences in study design, such as sample source or profiling platform, is unclear. As a reproducibility experiment, we generated a genome-wide plasma miRNA dataset using the Illumina oligoarray and compared it to a publicly available dataset generated using an identical sample size, substrate and profiling platform. Samples from 20 breast cancer patients, 20 mammography-screened controls, 20 breast cancer patients after surgical resection and 10 female lung or colorectal cancer patients were included. After filtering out miRNAs derived from blood cells and low-abundance miRNAs (non-detectable in over 10% of samples), a set of 522 plasma miRNAs remained. Of these, 46 were differentially expressed between breast cancer patients and healthy controls (p<0.05), and only 3 of the 46 both normalized to baseline levels in post-resection cases and were unique to breast cancer vs. lung or colorectal cancer (miR-708*, miR-92b* and miR-568, none previously reported). We were unable to demonstrate reproducibility between the two datasets by any of several measures. This finding, along with widespread inconsistencies across prior studies, highlights the need for a better understanding of the factors influencing circulating miRNA levels as a prerequisite to progress in this area of translational research.


Introduction
Breast cancer is the leading cancer in women in the United States, and the third leading cause of cancer deaths among women [1]. While long-term survival for localized breast cancer is high (over 95%), five-year survival declines sharply with stage (84% for regional involvement and 27% for distant spread) [1], underscoring the importance of early detection. Despite its widespread use in the US, mammography is not an ideal screening modality [2]. High false-positive rates result in an unacceptable number of unnecessary biopsies each year, which in turn increase health care costs and the anxiety associated with screening. This has led to controversial recommendations to reduce the frequency of screening [3], and many women already perceive mammography as uncomfortable and/or time-consuming, which influences their rates of compliance [4,5]. A blood-based alternative would have a major clinical impact for screening, as well as for monitoring response to treatment and for long-term surveillance.
MicroRNAs (miRNAs) are short non-coding segments of RNA that bind directly to messenger RNA to suppress translation of target genes [6]. The high level of conservation of miRNA coding regions between species indicates a critical biologic role [6]. In cancer, miRNAs have been shown to be deregulated in tissue-specific patterns that uniquely classify every type of tumor studied to date [7], and they are disproportionately localized to regions of genomic fragility in cancer [8]. An ensemble of miRNAs is known to be deregulated in breast cancer [7,9], with specific miRNAs correlated with breast cancer subtype, prognosis, metastasis [10] and treatment resistance [11]. Functional studies have further demonstrated the mechanisms through which these miRNAs are intimately involved in the tumor biology of the breast [12][13][14][15][16]. In the circulation, miRNAs were recently identified at unexpectedly high levels and found to be the most stable nucleic acid in peripheral blood. This discovery immediately spurred a rush to investigate circulating miRNAs as novel biomarkers for minimally invasive early cancer detection [17,18]. Since late 2009, ten studies from 9 independent groups in 5 different countries have reported circulating miRNA profiling in breast cancer by screening a handful of selected miRNAs by qPCR [19][20][21][22][23][24][25][26][27][28]. Collectively, these selected probe-set studies profiled 25 candidate miRNAs in serum or whole blood samples representing a total of 541 breast cancer cases and 326 healthy controls. The composite was a short list of 16 differentially expressed miRNAs that were significantly altered in the circulation of breast cancer cases (10 up, 6 down). However, only two of these miRNAs, miR-21 and miR-155, both up in cancer, were corroborated by independent groups.
Two important limitations of this approach should be noted: 1) probe-set bias, whereby a priori probe selection defines the possible set of final observations, an effect that likely explains why only two candidate miRNAs were independently corroborated; and 2) normalization: all but three studies selected circulating miR-16 for endogenous normalization, yet miR-16 is predominantly derived from erythrocytes and has been shown to be particularly prone to artificial elevation by hemolysis, by as much as 30-fold, which would far exceed any conceivable range of correction based on total red blood cell count [29][30][31].
Genome-wide platforms for high-throughput profiling of all circulating miRNAs, such as oligoarrays or next-generation miRNA sequencing (miR-Seq), allow agnostic (unbiased) discovery of putative circulating miRNA biomarkers as a pathway to the development of breast cancer detection. In late 2010, the first such genome-wide data were reported in a pilot study by Zhao et al. [32] comparing plasma miRNA profiles of 20 breast cancer patients and 20 healthy controls on the Illumina oligoarray platform, which provides coverage of 1145 miRNAs. This resulted in the identification of a short list of 26 differentially expressed plasma miRNAs (11 up and 15 down, p<0.005 without correction for multiple testing). Notably, there was no overlap between this set of 26 circulating miRNAs and the candidate miRNAs previously identified by qPCR-based studies. Since then, four additional genome-wide circulating miRNA studies have been reported, using miR-Seq (SOLiD or Solexa), oligoarray (Geniom, 1100 miRNAs) and TaqMan multiplex array (ABI, 446 miRNAs) platforms for agnostic discovery of candidate biomarkers in breast cancer [33][34][35][36]. We therefore sought 1) to determine the degree of consensus, if any, between these genome-wide studies, and 2) to test the reproducibility of their results. We further designed our experiments to address some possible deficiencies in current study designs that may contribute to this lack of reproducibility. First, we included additional samples to allow evaluation of any putative biomarker in post-surgical-resection breast cancer cases, in whom the biomarker should regress to baseline, and in women with other cancers (colorectal and lung), to allow evaluation of specificity to breast cancer. Second, we filtered out miRNAs associated with blood cells, which are likely to reflect blood cell counts rather than the tumor-specific signals that are the focus of this study.

Review of Previous Genome-wide Circulating miRNA Studies
A PubMed search using the terms "miRNA breast cancer" identified 732 publications from 2003 through July 2012. Fifteen publications met our inclusion criterion: original research comparing circulating levels of at least one miRNA species between samples from breast cancer cases and healthy controls. Studies were further categorized as genome-wide or probe-set (qPCR). To allow efficient comparison, differential expression results were harmonized across studies as a simple fold-change, up or down in breast cancer.
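Once each study is reduced to an up/down call per miRNA, the overlap bookkeeping behind the consensus and contradictory counts reported below is mechanical. The sketch that follows illustrates it under stated assumptions: the function name `classify_overlaps` and the toy study entries are hypothetical and do not reproduce the reviewed studies, and a miRNA reported in mixed directions is counted as contradictory (a simplification; no miRNA in the review was reported by three or more studies, so the ambiguous case did not arise).

```python
from collections import defaultdict

def classify_overlaps(studies):
    """Sort miRNAs reported by two or more studies into consensus and contradictory sets.

    `studies` maps a study label to {miRNA name: "up" or "down"}, where the
    direction refers to circulating levels in breast cancer vs. controls.
    """
    directions = defaultdict(list)
    for calls in studies.values():
        for mirna, direction in calls.items():
            directions[mirna].append(direction)

    consensus, contradictory = set(), set()
    for mirna, calls in directions.items():
        if len(calls) < 2:
            continue  # reported by a single study: no overlap to evaluate
        if len(set(calls)) == 1:
            consensus.add(mirna)  # same direction in every reporting study
        else:
            contradictory.add(mirna)  # up in at least one study, down in another
    candidates = set(directions)  # every miRNA reported as differential anywhere
    return consensus, contradictory, candidates

# Hypothetical toy input (not the published study results).
toy_studies = {
    "study_A": {"miR-x1": "up", "miR-x2": "down", "miR-x3": "up"},
    "study_B": {"miR-x1": "up", "miR-x2": "up"},
    "study_C": {"miR-x4": "down"},
}
consensus, contradictory, candidates = classify_overlaps(toy_studies)
print(sorted(consensus), sorted(contradictory), len(candidates))
```

Concordance is then `len(consensus) / len(candidates)`; choosing a different denominator (for example, all detectable miRNAs) changes the estimate without changing the classification.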

Clinical Specimens
Cases for this study were recruited from newly diagnosed breast cancer patients at University Hospitals Case Medical Center (UHCMC), and controls were recruited from individuals undergoing screening mammography at UHCMC, between 2009 and 2010. Exclusion criteria for both cases and controls included prior non-surgical treatment for any cancer or a known BRCA1 or BRCA2 mutation. All participants were asked to complete a survey of breast cancer risk factors and to donate a blood sample for genetic and biomarker studies. All surveys and blood samples were obtained prior to initiation of systemic chemotherapy or hormonal therapy. Included in this study were 20 patients with blood samples collected prior to surgical resection of the tumor, 20 patients with blood samples collected after tumor resection (range 6 to 62 days after surgery) and 20 age- and race-matched healthy controls with negative mammograms.
Additionally, we included samples in the study from 10 female subjects newly diagnosed with cancers other than breast, in order to evaluate specificity of circulating miRNA to breast cancer. Research blood samples were collected just prior to index colonoscopy from 5 female subjects recruited between 2005 and 2010, who were diagnosed with pathologically confirmed colorectal cancer as a result of the procedure, through the Case Transdisciplinary Research in Energetics and Cancer Center Colon Polyps Study [37,38]. This study was approved by the UHCMC IRB and all study participants gave written informed consent. Research blood samples were also collected through the UHCMC Department of Thoracic Surgery solitary pulmonary nodule clinic between 2010 and 2011 from 5 females prior to scheduled surgical excision, who were found to have pathologically confirmed non-small cell lung cancer as a result of surgery, through the Genetic and Biologic Markers of Lung Cancer Study.

Ethics Statement
This breast cancer study and the Case Transdisciplinary Research in Energetics and Cancer Center Colon Polyps Study (from which 5 colorectal cancer plasma samples were utilized) were both approved by the UHCMC institutional review board (IRB). The Genetic and Biologic Markers of Lung Cancer Study, from which 5 lung cancer samples were obtained, was approved by the Case Comprehensive Cancer Center IRB. All participants in all three studies provided written informed consent.

Sample Handling
In all instances, blood samples were processed on the day of collection, and all samples were processed in the same laboratory. Whole blood was collected in standard 10 mL Vacutainer lavender-top glass tubes containing EDTA anticoagulant. Plasma was separated by centrifugation at 600 × g for 15 minutes at room temperature and divided into 1.0 mL aliquots, which were immediately stored at −80 °C until further use. All participants gave written informed consent and signed a medical record release. All studies were approved by either the UHCMC or the Case Comprehensive Cancer Center institutional review board.

RNA Isolation and miRNA Expression Profiling
Plasma samples were de-identified, and laboratory personnel were blinded to subset status (newly diagnosed breast cancer cases, post-resection breast cancer cases, healthy controls and other lung/colon cancers) to avoid potential bias and/or batch effects. Total plasma RNA, including miRNA, was isolated using the miRNeasy kit (Qiagen #217004) according to the manufacturer's protocol, with the following modifications: 800 µL of plasma per sample was used for extraction; each of four 200 µL aliquots was mixed with 1 µg of carrier MS2 bacteriophage RNA (Roche #10165948001) in 750 µL QIAzol reagent and incubated at room temperature (RT) for 5 minutes, followed by addition of 200 µL chloroform and incubation for an additional 2 minutes; samples were centrifuged at 12,000 × g for 15 minutes at 4 °C, and 500 µL of the upper aqueous phase was carefully transferred to 1.5 volumes of 100% ethanol, mixed and loaded onto silica-membrane columns; columns were spun at 13,000 × g for 30 s at RT, and this was repeated until all aliquots of an individual sample had been loaded onto a single column; columns were washed with 700 µL of RWT buffer and spun at 13,000 × g for 1 min at RT, followed by three successive washes with 500 µL RPE buffer, each spun at 13,000 × g for 1 min at RT; after drying for 2 min at RT, RNA was eluted in 50 µL of nuclease-free water. A 1 µL aliquot was used for fluorometric RNA quantification (Qubit, Invitrogen), and the remainder was stored at −80 °C until further analysis. The Illumina Human v2 Microarray (MI-101-1124, Illumina) was used to profile circulating levels of 1145 microRNAs. Following the manufacturer's recommendation, 200 ng of isolated total RNA from each sample was used, and the assay was performed according to the supplied protocol (MicroRNA Expression Profiling Assay Guide, Illumina). The Illumina BeadArray Reader and BeadScan software were used to scan the arrays and extract raw intensity values.

Data Analysis
Background subtraction and quantile normalization were performed using the Illumina GenomeStudio package. Expression profiles have been deposited in NCBI's Gene Expression Omnibus (GEO) under accession number GSE41526. We then applied two filters to the dataset. Filter 1: because circulating miRNAs derived from the cellular blood compartment may confound circulating signatures, we filtered out a recently published set of 140 circulating miRNAs identified as primarily derived from the peripheral blood cellular compartment [39]. Filter 2: miRNAs that were undetectable in more than 10% (N>7) of the samples after array background subtraction were filtered out, because such low-abundance species would have little practical reliability as candidate biomarkers.
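The two filters can be sketched as a single pass over the expression matrix. This is an illustrative sketch, not the authors' GenomeStudio/SAS workflow: the function name `apply_filters` is hypothetical, and treating a background-subtracted value of 0 or below as "undetectable" is an assumption made here because the exact detection rule is not stated. With 70 samples, "more than 10%" means 8 or more undetectable values, which the integer comparison below encodes without floating-point rounding.

```python
def apply_filters(expression, blood_cell_mirnas, max_missing_percent=10):
    """Return the miRNAs that pass both filters described above.

    `expression` maps miRNA name -> list of background-subtracted values,
    one per sample. A value <= 0 is treated as "undetectable" (assumption).
    """
    kept = {}
    for mirna, values in expression.items():
        # Filter 1: drop miRNAs attributed to the blood cellular compartment.
        if mirna in blood_cell_mirnas:
            continue
        # Filter 2: drop miRNAs undetectable in more than 10% of samples.
        n_missing = sum(1 for v in values if v <= 0)
        if 100 * n_missing > max_missing_percent * len(values):
            continue
        kept[mirna] = values
    return kept

# Hypothetical toy matrix with 70 samples per miRNA.
toy = {
    "miR-a": [0.0] * 7 + [150.0] * 63,  # undetectable in exactly 7/70 (10%): kept
    "miR-b": [0.0] * 8 + [150.0] * 62,  # undetectable in 8/70 (>10%): removed
    "miR-c": [900.0] * 70,              # abundant but on the blood-cell list: removed
}
kept = apply_filters(toy, blood_cell_mirnas={"miR-c"})
print(sorted(kept))
```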
Statistical Analysis: miRNA Expression Associated with Breast Cancer
Figure 1 provides an overview of our study design and the analyses performed. The statistical significance of differences in age distribution between pre-resection cases and healthy controls was assessed using a standard t-test. Differences in the numbers of African-American and Caucasian participants between pre-resection cases and controls were assessed using Fisher's exact test.
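For a 2×2 table of race by group, Fisher's exact test can be computed directly from the hypergeometric distribution. The analyses in this study were run in SAS 9.2; the function below is a hypothetical standard-library re-implementation for illustration only, and the counts passed to it are invented rather than taken from Table 3. The age comparison would use a two-sample t-test (for example, `scipy.stats.ttest_ind` in Python) and is not shown.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

# Hypothetical counts: rows = cases, controls; columns = African-American, Caucasian.
p = fisher_exact_two_sided(5, 15, 6, 14)
print(round(p, 3))
```

An exact test is the natural choice here because, with 20 participants per group, some expected cell counts can be small enough to make a chi-square approximation unreliable.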
We applied a standard t-test to each individual miRNA (after filtering as described above), comparing normalized expression values between the pre-resection breast cancer cases and the healthy mammography-screened controls. For miRNAs that were statistically significant, we also used a t-test to compare the controls with breast cancer patients after surgical tumor removal, in order to evaluate whether the miRNA regresses toward "normal" after tumor resection. Lastly, we tested the difference in expression between the controls and the "other" cancers (colorectal and lung) with a further t-test, to see whether these miRNAs are associated with cancer in general or are specific to breast cancer.
We identified a single study that employed the same sample substrate (plasma) and genome-wide platform (Illumina Human v2 Microarray) as our study, Zhao et al. [32], with publicly available data in the GEO repository. Using GEO2R, we extracted from this study the log2 fold-changes between the 20 cases and 20 controls and the corresponding unadjusted p-values for each of the 1145 miRNAs profiled on the oligoarray. To test global agreement between the two independent datasets, each generated using 20 cases and 20 controls, we compared the estimated log2 fold-changes of all 1145 miRNAs by calculating a Pearson correlation coefficient and its p-value. All statistical analyses were performed using SAS 9.2.
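The three-step selection of breast cancer candidates (differential, normalizing after resection, not shared with other cancers) can be expressed as a filter over per-miRNA summary statistics. The sketch below assumes the t-test p-values and fold changes have already been computed (in this study with SAS 9.2); the class `MirnaStats`, the function `select_candidates` and the toy values are hypothetical, and fold changes are assumed to be on the log2 scale, so "at least 2-fold" becomes an absolute log2 fold change of at least 1.

```python
from dataclasses import dataclass

@dataclass
class MirnaStats:
    """Per-miRNA summary statistics; every comparison is against healthy controls."""
    name: str
    log2fc_pre: float    # log2 fold change, pre-resection cases vs. controls
    p_pre: float         # t-test p-value, pre-resection cases vs. controls
    p_post: float        # t-test p-value, post-resection cases vs. controls
    log2fc_other: float  # log2 fold change, other cancers vs. controls
    p_other: float       # t-test p-value, other cancers vs. controls

def select_candidates(stats, alpha=0.05, min_log2fc=1.0, post_alpha=0.1):
    """Apply the three selection steps; returns the miRNA names kept at each step."""
    # Step 1: p < 0.05 and at least 2-fold change, pre-resection cases vs. controls.
    step1 = [s for s in stats if s.p_pre < alpha and abs(s.log2fc_pre) >= min_log2fc]
    # Step 2: no significant difference after resection (p > 0.1), i.e. back toward baseline.
    step2 = [s for s in step1 if s.p_post > post_alpha]
    # Step 3: drop miRNAs also significantly altered, in the same direction, in other cancers.
    step3 = [
        s for s in step2
        if not (s.p_other < alpha and (s.log2fc_other > 0) == (s.log2fc_pre > 0))
    ]
    return {
        "differential": [s.name for s in step1],
        "normalizes_after_resection": [s.name for s in step2],
        "breast_specific": [s.name for s in step3],
    }

# Hypothetical toy statistics (not study data).
toy = [
    MirnaStats("miR-p", 1.5, 0.01, 0.40, 0.2, 0.60),    # passes all three steps
    MirnaStats("miR-q", -1.2, 0.02, 0.03, -0.1, 0.70),  # still altered after resection
    MirnaStats("miR-r", 1.1, 0.04, 0.50, 0.9, 0.01),    # also up in other cancers
    MirnaStats("miR-s", 0.5, 0.001, 0.90, 0.0, 0.90),   # significant but below 2-fold
]
result = select_candidates(toy)
print(result)
```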
Because a global correlation could miss biologically important outliers, whose signal would be diluted by the many non-significant findings, we performed a second test of reproducibility aimed at detecting overlap in outliers between datasets. We selected the top candidate miRNAs in our dataset, defined as statistically significant at p<0.05 with at least a two-fold change between cases and controls (n = 35), and tested them for differential expression in the dataset from Zhao et al. Likewise, we evaluated the top candidate miRNAs from the Zhao et al. study (n = 26), selected by the identical criteria of p<0.05 and at least a two-fold change between cases and controls, for differential expression in our dataset.
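The outlier-level test described above amounts to looking up one dataset's top candidates in the other and asking whether any that are significant there change in the same direction. A minimal sketch, assuming per-miRNA log2 fold changes and p-values are already available for both datasets; the function name `cross_check` and the miRNA names and values in the toy input are hypothetical.

```python
def cross_check(top_in_a, stats_b, alpha=0.05):
    """Look up dataset A's top candidates in dataset B.

    `top_in_a` maps miRNA -> log2 fold change in dataset A (already selected
    there at p < 0.05 and >= 2-fold). `stats_b` maps miRNA -> (log2 fold
    change, p-value) in dataset B. Returns the miRNAs significant in B,
    split by whether their direction of change agrees with A.
    """
    replicated, reversed_ = [], []
    for mirna, fc_a in top_in_a.items():
        if mirna not in stats_b:
            continue  # not measured (or filtered out) in dataset B
        fc_b, p_b = stats_b[mirna]
        if p_b >= alpha:
            continue  # not significant in dataset B
        if (fc_a > 0) == (fc_b > 0):
            replicated.append(mirna)  # same direction in both datasets
        else:
            reversed_.append(mirna)  # significant in both, opposite directions
    return replicated, reversed_

# Hypothetical toy input (not values from either dataset).
top_ours = {"miR-t1": 1.3, "miR-t2": -1.1, "miR-t3": 1.4}
stats_other = {"miR-t1": (-0.9, 0.01), "miR-t2": (-0.2, 0.60), "miR-t3": (1.2, 0.02)}
replicated, reversed_ = cross_check(top_ours, stats_other)
print(replicated, reversed_)
```

Running the check in both directions (our top list against the other dataset, and vice versa) mirrors the two comparisons described in the text.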

Review of Previous Genome-wide Circulating miRNA Studies
Fifteen prior studies met our criterion of an original research publication comparing circulating levels of at least one miRNA species between breast cancer cases and healthy controls. Ten of the fifteen studies were qPCR-based, using pre-selected probes to profile from 2 to 7 candidate miRNAs in the circulation, as listed in Table S1. In aggregate, 25 distinct circulating miRNAs were profiled by qPCR, in study cohorts ranging in size from 20 to 102 cases and 20 to 85 controls. Table 1 lists the sixteen miRNAs found to be differentially expressed in the circulation (10 up and 6 down in breast cancer). Only two of these sixteen miRNAs were consistently identified by more than one group: miR-21, up in cancer according to three groups, and miR-155, up in cancer according to two groups. Table S2 lists the remaining five studies, which were categorized as genome-wide, using comprehensive approaches to agnostically profile circulating miRNAs for candidate biomarker discovery. Genome-wide profiling studies reported detection of between 188 and 385 miRNAs in circulation, based on cohorts ranging from 13 to 48 cases and 10 to 57 controls. Across these five studies, 158 candidate miRNAs were identified as differentially expressed in circulation (78 up and 80 down in breast cancer); the complete list is included in Table S3. In total, only 16 of these 158 candidate miRNAs overlapped between two studies, and none overlapped in three or more studies. As shown in Table 2, inconsistent findings between studies were more common than consistent findings: contradictory up/down results were observed for 10 of the 16 overlapping miRNAs, whereas only 6 miRNAs showed a consistent direction of change in two independent studies. Based on these 6 "consensus" miRNAs, overall concordance between studies was calculated as 3.8% (6/158).
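The concordance figures are simple proportions over the 158 candidate miRNAs; the symbols below are our own shorthand for the counts given in the text:

```latex
\text{concordance} = \frac{n_{\text{consensus}}}{n_{\text{candidates}}} = \frac{6}{158} \approx 3.8\%,
\qquad
\frac{n_{\text{contradictory}}}{n_{\text{candidates}}} = \frac{10}{158} \approx 6.3\%,
\qquad
n_{\text{consensus}} + n_{\text{contradictory}} = 6 + 10 = 16 .
```

The last term recovers the 16 miRNAs that overlapped between two studies.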
The six consensus miRNAs, as listed in Table 2, were: (up in breast cancer) miR-25, miR-222, miR-451, miR-497; (down in breast cancer) miR-31, miR-151-5p. Comparison of qPCR-based and genome-wide studies showed surprisingly little concordance. None of the 6 genome-wide consensus miRNAs was selected for study in any of the ten prior qPCR profiling studies. Conversely, the two most promising qPCR candidates, circulating miR-21 and miR-155, which independent groups consistently reported as 2.5- to 3.5-fold higher in breast cancer, were directly contradicted by genome-wide results showing both to be down in breast cancer. Using the SOLiD platform, miR-21 was reported to be 4-fold down in breast cancer, while miR-155 was reported to be 1.2-fold and 2.4-fold down on the SOLiD and Illumina oligoarray platforms, respectively [32,36].

miRNA Expression Associated with Breast Cancer
We identified 20 pre-treatment female breast cancer cases ("pre-resection cases"), 20 matched female healthy volunteers ("controls"), 20 female breast cancer patients who had already undergone complete tumor resection ("post-resection cases") and 10 female patients with either lung or colorectal cancer ("other cancer"). Table 3 shows the demographic characteristics of the participants included in this study. The pre-resection cases did not differ statistically significantly from the controls with respect to age or race.
Filtering the 140 reported blood cellular miRNAs from the 1145 miRNAs on the oligoarray resulted in 1008 miRNAs (not all 140 miRNAs mapped directly to a single microarray assay, so 137 data points were removed). Filtering of low-abundance species, defined as undetectable in >10% of samples, eliminated a further 486 miRNAs, leaving a total set of 522 miRNAs for analysis. We identified 46 miRNAs whose circulating expression was statistically significantly different between the controls and the pre-resection breast cancer cases at p<0.05 with at least a 2-fold change between cases and controls (Table 4). Of these 46 miRNAs, 13 candidates met the criterion of normalizing toward baseline after surgical resection of breast cancer (no statistically significant difference between mean levels in controls and post-resection cases at p>0.1). Ten of these 13 candidate miRNAs appeared to lack specificity to breast cancer, as evidenced by statistically significant differences between healthy controls and other cancers (p<0.05), all in the same direction (up- or downregulated) as in the breast cancer cases (Table 4). This left three candidate miRNAs with evidence of specificity for breast cancer (miR-708*, miR-92b* and miR-568). Figure 1 shows a summary of our results.
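The counts reported in this paragraph form a running tally, restated here as arithmetic on the numbers above (no additional data are introduced):

```latex
\begin{aligned}
1145 - 137 &= 1008 &&\text{(blood cell-derived miRNAs removed; 140 miRNAs mapped to 137 assays)}\\
1008 - 486 &= 522  &&\text{(low-abundance miRNAs removed)}\\
46 - 33    &= 13   &&\text{(differential miRNAs that normalize after resection)}\\
13 - 10    &= 3    &&\text{(of these, not shared with lung or colorectal cancer)}
\end{aligned}
```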
Comparing our results with other published studies showed a similar lack of consistency. Filtering of the 140 blood cellular miRNAs in our study design eliminated four of the six genome-wide consensus miRNAs from our analysis (miR-25, miR-222, miR-451 and miR-151-5p). This is not surprising, as none of the prior studies made provision for filtering out blood cellular miRNAs. Of the two remaining consensus miRNAs, miR-497 was eliminated by the prespecified low-abundance filter for miRNAs undetectable in more than 10% of samples. The single consensus miRNA remaining in our set of 522 miRNAs for analysis, miR-31, did not differ significantly between cases and controls in our sample (p = 0.13), although a trend toward lower mean circulating levels in breast cancer was observed, consistent with the findings of other groups.

Test of Reproducibility between Identical Platforms
In the analysis of the correlation between estimated fold changes for each of the 1145 miRNAs in our data and in the Zhao et al. data, we found no evidence of an association between datasets, as shown in Figure 2. Across all 1145 miRNAs, fold changes were not correlated (R = −0.024, p = 0.41), indicating a global lack of agreement between the two datasets. To account for the possibility of a relevant association only among extreme data points, we then compared only the top miRNA candidates from each study. Table 5 shows the profiling results from the Zhao et al. dataset for our top 46 miRNAs, with a comparison of the fold changes observed in both studies. Of these 46 miRNAs, the only one that was also statistically different between cases and controls in the Zhao et al. data (miR-1304) was altered in the opposite direction. Table 6 shows the converse: the profiling results from our dataset for the top 26 miRNAs identified by Zhao et al. Again, the three overlapping miRNAs in this comparison (p<0.05) were altered in the opposite direction. Identical methods of background subtraction, normalization and statistical cutoffs for significance were used to generate both Tables 5 and 6 [32]. No replication of findings was observed between datasets at the top-candidate (outlier) level. Figure 3 shows the correlation of fold changes between the two studies among the miRNAs identified as significant in either one. Among these 72 miRNAs, fold changes were not correlated (R = 0.08, p = 0.50).
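For a Pearson correlation r over n pairs, the usual test statistic is t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom. As a rough consistency check of the two reported correlations, the sketch below substitutes a standard normal reference for the t distribution, an approximation we introduce here that is reasonable only for large n; the published p-values were computed in SAS 9.2, and the function name is hypothetical. Because R is reported to only two or three decimals, small differences from the published p-values are expected.

```python
from math import sqrt
from statistics import NormalDist

def pearson_p_normal_approx(r, n):
    """Approximate two-sided p-value for a Pearson correlation r over n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and a standard normal reference
    in place of the t distribution with n - 2 degrees of freedom.
    """
    t = r * sqrt((n - 2) / (1 - r ** 2))
    return 2 * (1 - NormalDist().cdf(abs(t)))

p_global = pearson_p_normal_approx(-0.024, 1145)  # all miRNAs on the array
p_top = pearson_p_normal_approx(0.08, 72)         # top candidates from either study
print(round(p_global, 2), round(p_top, 2))
```

Both approximations land close to the reported p = 0.41 and p = 0.50, which supports reading them as correlations indistinguishable from zero.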

Discussion
In our review of genome-wide circulating miRNA data in breast cancer, we observed little if any concordance between five similar, independent studies. Only six "consensus" miRNAs emerged from the total of 158 candidate miRNAs identified in prior studies, as shown in Table 2; consensus in this case is loosely defined as any circulating miRNA with a consistent direction of change (up or down in breast cancer) identified by at least two independent studies. None of these six "consensus" miRNAs overlapped in three or more studies, which argues against a biologic association. Concordance between genome-wide miRNA studies was 3.8% overall (6/158), and even this is a generous estimate: if total detectable miRNAs, rather than the set of 158 candidate miRNAs, are used as the denominator, overall concordance falls well below 1%. "Nonconsistent" miRNAs (up in study A but down in study B) were actually more common than consensus miRNAs, n = 10 or 6.3% (10/158), as also shown in Table 2. Furthermore, little consistency was observed among the ten earlier, non-genome-wide (qPCR profiling) studies, as shown in Table 1. In particular, the findings by independent groups of significantly elevated circulating miR-155 and miR-21 in breast cancer by qPCR were contradicted by data subsequently reported using genome-wide approaches. In the serum miRNA study by Wu et al., using the SOLiD platform, and the plasma miRNA study by Zhao et al., using the Illumina oligoarray, miR-155 was found to be significantly reduced, by 1.2- to 2.4-fold, in breast cancer vs. controls, while miR-21 was found to be significantly reduced by 4.0-fold in breast cancer vs. controls in the SOLiD/serum study by Wu et al. [32,36].
In our study, using the Illumina oligoarray genome-wide approach, we found 46 plasma miRNAs that were significantly differentially expressed between newly diagnosed breast cancer patients and mammography-screened controls, and we identified a subset (n = 3) that showed the expected normalization following tumor resection as well as specificity for breast cancer when compared with female participants with other cancers (lung or colon). Neither miR-155 nor miR-21 was significantly differentially expressed in our study.
As a sixth genome-wide dataset, our results cast further doubt on a growing body of inconsistent data for circulating miRNAs as candidate markers of breast cancer. None of the 6 "consensus" miRNAs from prior genome-wide studies appeared in our list of the top 46 candidates. However, 4 of the 6 consensus miRNAs were eliminated from our analysis by prespecified filtering of the 140 circulating miRNAs predominantly derived from the blood cellular fraction, which are subject to a high level of confounding by variation in blood counts [29][30][31]; a fifth miRNA was eliminated due to low abundance, defined in our study as failing detection in more than 10% of samples, which would likely preclude clinical applicability. The sole unfiltered consensus miRNA in our dataset, miR-31, did not reach the p<0.05 threshold for breast cancer in our comparison, though we did observe a downward trend in breast cancer cases, consistent in direction with the two other studies.
Public availability on GEO of the original data reported by Zhao et al. [32] offered a compelling opportunity to test reproducibility against our dataset, a comparison between two nearly identical studies. Both datasets were generated using the same platform (Illumina oligoarray), the same substrate (plasma) and an identical sample size in the discovery set (20 cases/20 controls). To our initial surprise, we were unable to demonstrate any reproducibility between datasets, either at the global level for all 1145 miRNAs theoretically detectable by the oligoarray or, in a limited sense, by restricting the comparison to the top significantly differentially expressed miRNAs from each study (n = 46 in our dataset, n = 26 in the Zhao et al. dataset). The lack of correlation between datasets at both the global and the outlier level is shown in Figures 2 and 3.
By harmonizing data from multiple reports, we were able to demonstrate widespread inconsistencies across reported studies. Whether these inconsistencies are due to differences in study design, statistical analyses, shortcomings of current technology or other factors remains to be answered. It may well be that technical variance introduced by non-uniform sample handling and processing, the effects of long-term storage of archival samples, and contamination by miRNAs from the blood cellular fraction all contribute to the generation of artifacts, resulting in nonreproducible results such as those we have demonstrated. Growing caution regarding circulating miRNA discovery is being advanced by several recent methodological investigations that demonstrate, in particular, the high level of confounding introduced by blood cellular components [29,31,40]. Furthermore, a recent study illustrated substantial differences in miRNA profiles between serum and plasma from the same individual [41], highlighting that choice of substrate may be an important design consideration for which there is currently no standard. Schrauder et al., in one of the five prior studies included in our review, first noted a lack of reproducibility when comparing their results with the single genome-wide study published at the time (Zhao et al.), raising early concerns over technical variance as a source of confounding [34]. The results of our current analysis substantially corroborate these concerns and lead us to the discouraging conclusion that initial enthusiasm for circulating miRNA as an approach to breast cancer screening and detection ought to be dampened.
Ours is the first genome-wide study to prespecify filtering of miRNAs originating from blood cellular components, in order to narrow the focus of discovery to tumor-specific circulating miRNA candidates. Although this is a major strength of our study, we based our filtering on a single study investigating the expression of miRNAs in blood cells. Independent replication of that earlier study would help create a more robust list of miRNAs expressed in blood cells.
Ours is also the first study to include a separate cohort of postresection samples, which we hypothesized would show a normalization to baseline levels for putative circulating miRNA markers that are truly related to breast cancer. In our study we utilized samples from 20 breast cancer patients prior to resection and 20 different breast cancer patients after tumor resection. Although not available for this study, future studies using samples from the same patients collected before and after surgical resection would be most useful for identifying miRNAs that regress to normal with removal of the tumor. However, we feel that using a sample of postresection patients achieved our goal of identifying the overall levels of miRNAs in a typical breast cancer patient after tumor resection.
The fourth cohort included in our study, women with lung or colon cancer, was intended as a preliminary test of whether a putative circulating miRNA was related to cancer per se or to breast cancer specifically. Our study design leaves open the possibility that the healthy control population was not cancer-free: these samples were collected from women being screened for breast cancer who subsequently had a negative mammogram, but screening for other cancers was not a requirement of the study, so, while very unlikely, some may have had another undiagnosed cancer. A substantial limitation of this and other studies of circulating miRNAs, which may in part explain the lack of reproducibility we have demonstrated, is a lack of standardization in the field. It is not clear how circulating miRNA levels vary with sample collection protocols, handling/storage and isolation techniques [40,42]. Just as importantly, there is currently no consensus on the best way to normalize circulating measurements, because a universal housekeeping miRNA has yet to be identified in the circulation [35]. This fundamental limitation is likely a significant source of inter-individual variance [43,44].
In our study design, we chose to filter out "low abundance" miRNAs, i.e., those that were undetectable in more than 10% of the samples. Our main reason was that such low expression values are near the detection limit of the microarray and may represent array artifacts. However, this design may also filter out miRNAs that are found only in breast cancer patients or only in controls, which could turn out to be excellent discriminators. To see whether this was the case, we revisited the 486 miRNAs filtered due to low abundance. Of these, 97 had an expression <0 in more than 75% of both pre-resection cases and controls. Of the remaining 389, 205 had an average expression of 100 or less among the samples in which expression was >0. Given that the average overall expression was >1000, we consider these low quantile-normalized, background-subtracted expression values to be within the margin of error of the array and normalization methods, and such miRNAs may not be expressed in most of the other samples either. Of the remaining 184, 23 were statistically significantly differentially expressed between pre-resection cases and controls (p<0.05, Table S4). However, none of the 23 overlapped with any previous study either, so our conclusions are unchanged. Future studies using more sensitive methods, such as qPCR, may find that these miRNAs do offer some information regarding breast cancer status.
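The re-examination of the 486 low-abundance miRNAs sorts them into three groups (mostly below background, near array noise, retained for re-testing). The sketch below shows that triage under stated assumptions: the function name `triage_low_abundance` and the toy values are hypothetical, and pooling pre-resection cases and controls when averaging the positive values is our reading of the text rather than a documented detail.

```python
def triage_low_abundance(cases, controls, neg_fraction=0.75, max_mean_positive=100.0):
    """Split low-abundance miRNAs into the three groups described above.

    `cases` and `controls` map miRNA -> list of background-subtracted,
    quantile-normalized values for pre-resection cases and controls.
    """
    mostly_negative, near_noise, retained = [], [], []
    for mirna in cases:
        case_vals, ctrl_vals = cases[mirna], controls[mirna]
        frac_neg_cases = sum(v < 0 for v in case_vals) / len(case_vals)
        frac_neg_ctrls = sum(v < 0 for v in ctrl_vals) / len(ctrl_vals)
        # Group 1: below background in more than 75% of both cases and controls.
        if frac_neg_cases > neg_fraction and frac_neg_ctrls > neg_fraction:
            mostly_negative.append(mirna)
            continue
        # Mean over positive values only, pooling cases and controls (assumption).
        positives = [v for v in case_vals + ctrl_vals if v > 0]
        mean_pos = sum(positives) / len(positives) if positives else 0.0
        # Group 2: mean positive expression of 100 or less, i.e. near array noise.
        if mean_pos <= max_mean_positive:
            near_noise.append(mirna)
            continue
        # Group 3: retained for re-testing between cases and controls.
        retained.append(mirna)
    return mostly_negative, near_noise, retained

# Hypothetical toy values (four cases and four controls per miRNA).
toy_cases = {"miR-u1": [-5, -3, -2, -1], "miR-u2": [-5, 40, 60, 80], "miR-u3": [-5, 900, 1200, 1500]}
toy_controls = {"miR-u1": [-4, -6, -1, -2], "miR-u2": [-1, 50, 70, 90], "miR-u3": [-2, 300, 400, 500]}
mostly_negative, near_noise, retained = triage_low_abundance(toy_cases, toy_controls)
print(mostly_negative, near_noise, retained)
```

Only the retained group would then go forward to the case-versus-control t-test.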
In conclusion, our additional data and comprehensive review of the literature suggest that substantial work remains to be done to identify an individual circulating miRNA, or set of circulating miRNAs, that could be used to identify women who have breast cancer. Further work is needed to develop standards for circulating miRNA studies, including sample preparation standards, controls for circulating blood cellular components and normalization of measured values. Although a number of studies have reported positive findings, the near-complete lack of concordance suggests that, at this time, the utility of miRNAs for breast cancer detection remains questionable.
Table S4. List of all miRNAs that were filtered out due to low abundance but that, had they not been filtered, would have been statistically significantly differentially expressed between pre-resection cases and controls.