Cell and Microvesicle Urine microRNA Deep Sequencing Profiles from Healthy Individuals: Observations with Potential Impact on Biomarker Studies

  • Iddo Z. Ben-Dov,

    Affiliation Nephrology and Hypertension, Hadassah–Hebrew University Medical Center, Jerusalem, Israel

  • Veronica M. Whalen,

    Current address: Appeals Metro Plus Health Plan, New York, NY, United States of America.

    Affiliation Rockefeller University Hospital, The Rockefeller University, New York, NY, United States of America

  • Beatrice Goilav,

    Affiliation Pediatric Nephrology, Children’s Hospital at Montefiore, Albert Einstein College of Medicine, Bronx, NY, United States of America

  • Klaas E. A. Max,

    Affiliation Laboratory of RNA Molecular Biology, The Rockefeller University, New York, NY, United States of America

  • Thomas Tuschl

    Affiliation Laboratory of RNA Molecular Biology, The Rockefeller University, New York, NY, United States of America



Urine is a potential source of biomarkers for diseases of the kidneys and urinary tract. RNA, including microRNA, is present in the urine enclosed in detached cells or in extracellular vesicles (EVs) or bound and protected by extracellular proteins. Detection of cell- and disease-specific microRNA in urine may aid early diagnosis of organ-specific pathology. In this study, we applied barcoded deep sequencing to profile microRNAs in urine of healthy volunteers, and characterized the effects of sex, urine fraction (cells vs. EVs) and repeated voids by the same individuals.


Compared to urine-cell-derived small RNA libraries, urine-EV-derived libraries were relatively enriched with miRNA, and accordingly had lesser content of other small RNA such as rRNA, tRNA and sn/snoRNA. Unsupervised clustering of specimens in relation to miRNA expression levels showed prominent bundling by specimen type (urine cells or EVs) and by sex, as well as a tendency of repeated (first and second void) samples to neighbor closely. Likewise, miRNA profile correlations between void repeats, as well as fraction counterparts (cells and EVs from the same specimen) were distinctly higher than correlations between miRNA profiles overall. Differential miRNA expression by sex was similar in cells and EVs.


miRNA profiling of both urine EVs and sediment cells can convey biologically important differences between individuals. However, to be useful as urine biomarkers, careful consideration is needed for biofluid fractionation and sex-specific analysis, while the time of voiding appears to be less important.


Urine is a source of biomarkers for diseases of the kidneys and urinary tract as well as systemic disorders [1]. In addition to proteins and serum-derived metabolites, nucleic acids present in the cells and microvesicles of urine can serve biomarker purposes. RNA is present in the urine enclosed in detached cells or in extracellular vesicles (EVs) released from intact kidney and urinary tract cells. Conditions that alter the relative quantities of individual microRNAs (miRNAs) in urine include inflammatory processes within the genitourinary tract, malignancies and infections [2].

The presence of cell-specific miRNAs in the urine may aid early diagnosis of organ-specific pathology. In this study of healthy volunteers, we examined sex differences, the relationship between matched cellular and extracellular miRNA profiles, and the repeatability of miRNA profiles between matched consecutive urine voids.

Materials and Methods

Ethics statement

This study was approved by the Institutional Review Board of The Rockefeller University Hospital. Clinical data and specimens were collected after obtaining participants’ written informed consent.

Participants and biofluid specimens

We collected urine specimens from 20 healthy volunteers, 20–30 years old [3], representative of New York City ethnic demographics. Each volunteer provided 2 urine specimens, 1–3 h apart, each at least 50 ml. Urine specimens were handled as previously described [4]. Of note, second urine voids were more dilute, as indicated by ~1.9-fold lower creatinine concentrations (p = 0.014, paired-sample t-test). Total RNA was extracted from the sediment cells of 50 ml urine and from the ensuing cell-free, ultrafiltration-retained supernatant, a ~250-fold concentrate of the cell-free urine particles and macromolecules larger than 100 kDa, thus including EV-enclosed RNA (VS2042, Vivaproducts Inc., Littleton MA) [4]. RNA was quantified using a Qubit 2.0 fluorometer [5]. Small-RNA cDNA libraries were constructed with addition of external miRNA-like synthetic calibrators (2.5 attomol per ng total RNA), and sequenced, as recently described [4]. The median amounts of input RNA used for cDNA library preparation were 10.2 ng (IQR 4.2–15.1 ng) in batches of cell-derived samples and 0.49 ng (IQR 0.28–1.44 ng) in batches of EV specimens.

Sequencing and annotation of small RNA cDNA libraries

The obtained sequence files were trimmed and split into the separate samples according to the barcode sequences. Extracted reads were assigned annotations by aligning to the genome and small-RNA databases. For miRNA annotation we used contemporary in-house definitions (see S1 Tables) [6].
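The barcode-splitting step can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the adapter sequences, the assumption that the barcode forms the start of the 3' adapter, the `min_insert` threshold and the function name `demultiplex` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def demultiplex(reads, adapters, min_insert=16):
    """Split reads by barcode and trim the adapter (illustrative sketch).

    reads:    iterable of raw read sequences
    adapters: dict mapping sample name -> barcoded 3' adapter prefix
              (hypothetical sequences, not those used in the study)
    Returns (dict sample -> list of trimmed inserts, number of unassigned reads).
    """
    by_sample = defaultdict(list)
    unassigned = 0
    for read in reads:
        # Collect every adapter found downstream of a minimally long insert.
        hits = []
        for sample, adapter in adapters.items():
            pos = read.find(adapter)
            if pos >= min_insert:
                hits.append((pos, sample))
        if not hits:
            unassigned += 1
            continue
        # Use the earliest adapter occurrence; everything before it is the insert.
        pos, sample = min(hits)
        by_sample[sample].append(read[:pos])
    return dict(by_sample), unassigned


if __name__ == "__main__":
    adapters = {"sampleA": "TCACTTCGT", "sampleB": "TCATCTCGT"}  # made-up barcodes
    reads = [
        "TGAGGTAGTAGGTTGTATAGTT" + "TCACTTCGT" + "ATGCC",
        "TAGCTTATCAGACTGATGTTGA" + "TCATCTCGT" + "AT",
        "ACGTACGTACGTACGTACGTACGT",  # no adapter, stays unassigned
    ]
    print(demultiplex(reads, adapters))
```

Exact matching keeps the sketch short. A production pipeline would usually tolerate sequencing errors in the barcode and adapter.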

The sequencing data discussed in this publication have been deposited in NCBI's Gene Expression Omnibus and are accessible through GEO Series accession number GSE72183 (

Statistical analysis

Normally distributed clinical parameters were compared using unpaired or paired t-tests. Characteristics with skewed distributions were compared using Mann-Whitney, Wilcoxon signed-rank, Friedman rank sum and Kolmogorov-Smirnov non-parametric tests, as appropriate, using Bioconductor on R [7]. The 'DESeq2' package [8] was used for analysis of differential expression. Additional packages were used for plotting ('ggplot2' [9] and 'rggobi' [10]), PCA and correlation analysis ('stats') and unsupervised clustering and heat map generation ('pheatmap'). Classification with machine learning methods was carried out using 'MLInterfaces' [11]. Machine learning refers to a family of computational methods for analyzing multivariate datasets. As a mock-up for biomarker discovery studies, we used such methods to explore whether our samples could be classified and predicted from their miRNA profiles. The algorithms attempted included neural networks, support vector machines, discriminant analyses, k-nearest neighbor classification and recursive partitioning and regression trees. Prior to statistical analysis and presentation of miRNA data, except for differential expression analysis, counts were transformed to log2-counts per million by the 'limma' voom function [12], following TMM normalization using 'edgeR' [13]. S1 Text Files include selected R scripts and related input data used in this study.
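The log2-counts-per-million transformation mentioned above can be sketched as follows. This is a simplified Python illustration, not the R code in S1 Text Files: it applies the voom-style formula log2((count + 0.5) / (library size + 1) × 10^6) and takes TMM normalization factors as an optional input instead of computing them. The function name `log2_cpm` and the example counts are assumptions.

```python
import math


def log2_cpm(counts, norm_factors=None, prior=0.5):
    """Voom-style log2 counts per million (illustrative sketch).

    counts:       dict sample -> dict miRNA -> raw read count
    norm_factors: optional dict sample -> scaling factor (e.g. from TMM);
                  samples without an entry use a factor of 1.0
    """
    norm_factors = norm_factors or {}
    result = {}
    for sample, per_mirna in counts.items():
        # Effective library size = total reads scaled by the normalization factor.
        lib_size = sum(per_mirna.values()) * norm_factors.get(sample, 1.0)
        result[sample] = {
            mirna: math.log2((count + prior) / (lib_size + 1.0) * 1e6)
            for mirna, count in per_mirna.items()
        }
    return result


if __name__ == "__main__":
    toy = {"cells_void1": {"miR-A": 120, "miR-B": 0, "miR-C": 880}}  # made-up counts
    print(log2_cpm(toy))
```

The 0.5 offset keeps zero counts finite on the log scale, which matters for low-input libraries such as the EV specimens.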

Results and Discussion

Total RNA yield from urine and small RNA composition analysis

To obtain miRNA profiles we processed urine cells and EVs isolated from two urine voids, taken one to three hours apart, and provided by 20 healthy young adults (Table 1). Women’s urine had higher total RNA content (per given volume) in both cells and EVs, compared to men (Table 2). Disparities in RNA content were most prominent in the first void, where inter-gender differences of 12.8-fold for cells and 4.3-fold for EVs were observed.

Table 1. Characteristics of study volunteers.

Anthropometric and clinical characteristics of study volunteers.

Table 2. Total RNA yields in cells and extracellular vesicles.

Total RNA content (ng) in study specimens by volunteer sex.

Barcoded small RNA cDNA libraries were prepared from 79 total RNA specimens (one volunteer provided an insufficient volume of urine for the second void EV preparation). The samples were organized in four batches, each multiplexing 19 to 20 samples (see Methods for amounts of input RNA), and sequenced on four Illumina Genome Analyzer IIx lanes. Table A in S2 Tables displays read numbers by small RNA annotation categories.

Quality control via analysis of spike-in calibrator oligoribonucleotides

A mixture of 10 calibrator oligoribonucleotides was spiked during library preparation. Unsupervised clustering of reads corresponding to these oligonucleotides in the obtained sequence files, Fig 1, shows no evidence of bias related to void (first vs. second void) or sample source (cells vs. EVs). There was a trivial degree of barcode-related clustering, reflective of highly similar miRNA composition in closely related samples.

Fig 1. Heat map of hierarchically clustered study samples according to external calibrator content.

A mixture of 10 synthetic oligoribonucleotides (‘calibrators’) was spiked into RNA samples during library preparation for sequencing. Unsupervised analysis of samples based on read counts of these calibrators shows absence of clustering by specimen source (cells vs. EVs), sex or void, as would be expected for an external spike-in control.

There were no consistent sex-related or void-related differences in calibrator-normalized miRNA content. However, compared to cell-derived small RNA libraries, EV libraries were enriched with miRNA, while EV content of scRNA, snRNA, snoRNA, rRNA and rRNA precursors was lower compared to cells (Fig 2 and Table B in S2 Tables). Second and first void cells had similar small RNA composition. However, second void EVs had lower miRNA content compared to first void EVs (Table B in S2 Tables).

Fig 2. Heat map of hierarchically clustered study samples according to small RNA category profiles.

Small RNA sequencing was conducted from all study samples as described in the Methods section. An automated pipeline identified and summarized the resulting sequence reads according to small RNA annotation categories. Unsupervised analysis based on read counts shows clustering by specimen source (cells vs. EVs), sex and subject.

Principal component analysis and unsupervised clustering

Reads mapping to miRNA were summed, and organized according to mature miRNA (Table C in S2 Tables) and miRNA precursor clusters (Table D in S2 Tables). Principal component analysis (PCA) of mature miRNA counts, Fig 3, disclosed consistent sex- and fraction-based separation of profiles. Unsupervised clustering of specimens according to miRNA precursor cluster expression levels, Fig 4, showed prominent bundling by specimen type (cells vs. EVs) and by sex, as well as a tendency of void repeats to neighbor closely in the dendrogram. Thus, if miRNA are to be useful as urine biomarkers, careful consideration is needed for biofluid fractionation and sex-specific analysis, while the time of voiding may be less important.

Fig 3. Principal component analysis plot of study sample miRNA profiles.

A scatter plot depicting principal component 1 (PC1) and PC2 coordinates of study samples, color-coded according to subject sex and urine fraction. Consistent sex- and fraction-based separation of profiles is observed.

Fig 4. Heat map of hierarchically clustered study samples according to miRNA profiles.

Small RNA sequencing was conducted from all study samples as described in the Methods section and in Figs 1 and 2. Unsupervised analysis based on miRNA precursor read counts shows prominent clustering by sex, urine fraction (cells vs. EVs) and subject.

Test-retest repeatability of miRNA profiles

To examine repeatability of miRNA expression, namely the similarity of miRNA profiles in successive voids from the same individual, we examined the Spearman correlation matrix of all samples (limiting this analysis to miRNAs which were detected with ≥2 reads in ≥25% of samples: 160 mature miRNAs, Table E in S2 Tables). Table 3 shows median Spearman correlation coefficients of samples from a given batch with their respective matched counterparts in other batches (left panel) or with all samples in other batches (middle panel). miRNA profile correlations between a sample and its void (as well as fraction) counterparts are distinctly higher than correlations between miRNA profiles overall (right panel). This is another indication that intra-individual variability is considerably smaller than inter-individual differences, and that the exact timing of urine collection has minor consequences.

Table 3. Correlation between microRNA profiles.

Median Spearman correlation coefficients between batches restricted to matched samples (left panel) or including all samples (middle panel).
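The detection filter and the rank correlation used in the repeatability analysis can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R code: the function names and the toy counts are hypothetical, and tied values receive average ranks.

```python
import math


def detected_mirnas(counts, min_reads=2, min_fraction=0.25):
    """Keep miRNAs with >= min_reads in >= min_fraction of samples.

    counts: dict miRNA -> list of read counts, one entry per sample
    """
    kept = []
    for mirna, row in counts.items():
        n_detected = sum(1 for c in row if c >= min_reads)
        if n_detected >= min_fraction * len(row):
            kept.append(mirna)
    return kept


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    void1 = [8.1, 3.2, 5.5, 0.4, 6.7]  # made-up log-CPM values, first void
    void2 = [7.9, 3.0, 5.9, 0.9, 6.1]  # same miRNAs, second void
    print(spearman(void1, void2))
```

Because Spearman correlation depends only on ranks, it is insensitive to the overall dilution difference between first and second voids.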

Analysis of differential expression

To distinguish miRNA composition in EVs vs. cells we conducted differential expression analyses using DESeq2 [8]. The expression of 129 of 431 mature miRNAs (miRNAs with at least 5 reads summed across all samples, Table F in S2 Tables) was found to differ between EVs and cells in a model including the cofactors void, sex and fraction*sex interaction. The pattern of differential expression was unbalanced in that a handful of high-abundance miRNAs were slightly enriched in EVs, while numerous low-abundance miRNA were depleted in EVs compared to cells (Fig A in S1 Figs). As an exception to this trend, the highly abundant miR-320 was less frequent in EVs. The unique biogenesis of miR-320 [14] may be linked to this relative depletion.

miRNA expression differences relating to volunteer sex (women vs. men) were evaluated in the same model. We found differential expression of 97 of 431 mature miRNAs (Table G in S2 Tables). Top sex-related differentially expressed miRNAs in separate analyses on cells and EVs are shown in Table 4. Findings in cells and EVs agreed well with respect to the leading sex-related differences: the Pearson correlation coefficient of log2 fold-changes among top differentially expressed miRNAs in cells vs. EVs was 0.940. Thus, similar sex-related differences (and presumably also disease-associated changes [4]) are detected in cells and EVs (see also Fig B in S1 Figs).

Table 4. Top differentially expressed miRNA in women compared to men.

Analysis of all specimens (left), urine cells (middle) or EVs (right).

Fig 5 shows the mean-variance association of normalized and log-transformed miRNA read counts. An inverted-U-shaped relationship is seen, and several outliers are labeled. This pattern would not be expected of typical tissue profiles, and is likely a result of inconsistent cell type composition, with diversity represented in the mid-abundance miRNAs. For example, miR-203 and miR-205 are expressed in keratinocytes [15], which are present in urine irregularly, predominantly in women.

Fig 5. Mean-variance association of miRNA abundance.

A scatter plot depicting the mean (x axis) and coefficient of variation (y axis) relationship of log-transformed normalized miRNA counts across all samples. A fit line is shown and several prominent outliers are labeled. The color scale represents the standard deviation from the mean.
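The two quantities plotted in Fig 5 can be computed per miRNA as in the Python sketch below. It assumes the sample standard deviation (n − 1 denominator) and uses made-up log-CPM values; the function names are hypothetical and this is not the script used to produce the figure.

```python
import statistics


def mean_cv(log_values):
    """Return (mean, coefficient of variation) for one miRNA across samples.

    The coefficient of variation is sd / mean, using the sample standard
    deviation; it is undefined when the mean is zero.
    """
    mean = statistics.mean(log_values)
    sd = statistics.stdev(log_values)
    if mean == 0:
        return mean, None
    return mean, sd / mean


def mean_cv_table(log_cpm):
    """log_cpm: dict miRNA -> list of log-CPM values, one per sample."""
    return {mirna: mean_cv(values) for mirna, values in log_cpm.items()}


if __name__ == "__main__":
    toy = {  # made-up log-CPM values
        "stable_high": [14.0, 14.2, 13.9, 14.1],
        "variable_mid": [4.0, 9.5, 3.2, 10.1],
    }
    for mirna, (mean, cv) in mean_cv_table(toy).items():
        print(mirna, round(mean, 2), round(cv, 3))
```

A miRNA expressed mainly in a cell type that appears irregularly in urine would show up here with a mid-range mean and a high coefficient of variation.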


Lastly, this study involved healthy volunteers without known kidney or urinary tract disorders. However, we reasoned that in view of its design, this study can simulate the results of a disease-centered clinical investigation, and thus point to limitations and needed modifications to the methodology and analysis. We examined the following questions: (1) can miRNA profiles classify volunteers’ sex and specimen type, and (2) can the first void serve as a training set and guide classification (sex, specimen type) of second void specimens?

As shown above, a dendrogram (Fig 4) and a two-dimensional PCA plot (Fig 3) can partially distinguish volunteer sex and specimen fraction. Additional dimensions of data may allow better distinction. Indeed, two-dimensional projections of a multi-dimension plot are more effective at separating specimens (Fig C in S1 Figs) [16].

Various machine learning algorithms [11] performed reasonably well in classification and prediction of sample sex and fraction (Table H in S2 Tables and Fig D in S1 Figs). Using first void profiles for training, test classification of second void specimens was variably successful (Table I in S2 Tables).
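The train-on-first-void, test-on-second-void design can be illustrated with a k-nearest neighbor classifier, one of the algorithm families listed in the Methods. The Python sketch below is not the 'MLInterfaces' analysis reported here: the two-feature profiles, the Euclidean distance, the choice of k = 3 and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training profiles.

    train: list of (profile, label) pairs
    query: profile to classify
    """
    neighbors = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test profiles whose predicted label matches the true label."""
    correct = sum(1 for profile, label in test if knn_predict(train, profile, k) == label)
    return correct / len(test)


if __name__ == "__main__":
    # Made-up log-CPM values for two hypothetical miRNAs.
    first_void = [
        ([10.0, 2.0], "cells"), ([9.5, 2.5], "cells"), ([9.8, 1.8], "cells"),
        ([4.0, 8.0], "EVs"), ([4.5, 7.5], "EVs"), ([3.8, 8.2], "EVs"),
    ]
    second_void = [([9.7, 2.2], "cells"), ([4.2, 7.9], "EVs")]
    print(accuracy(first_void, second_void))
```

Keeping the training and test voids strictly separate mirrors a discovery/validation split, although both voids here come from the same individuals.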


From this pilot study applying next generation sequencing to profile urine miRNA in healthy volunteers we conclude that miRNA profiles of both EVs and sediment cells relate to biological characteristics of interest. We found indications that intra-individual variability is considerably smaller than inter-individual differences, and that the exact timing of urine collection has minor consequences. Thus, for miRNA profiles to be useful as urine biomarkers, careful consideration is needed for biofluid fractionation and sex-specific analysis, while the time of voiding may be less important. We found similar miRNA profile differences between men and women in urine cells and extracellular vesicles, likely reflecting that extracellular vesicles originate from the same cell types that are present in the sediment. This suggests that potential biomarkers found in urine sediment cells are likely to be present in the extracellular vesicles, and vice versa. However, in patients with kidney or urinary tract disease, as opposed to the healthy subjects who volunteered for this study, the cells and extracellular vesicles in the urine may have different derivations.

Supporting Information

S1 Figs. Figs presenting supplementary data.

Fig A. MA plot of miRNA expression in urine EVs compared to cells. Log fold-change (y axis) vs. average expression (x axis) in urine EVs compared to cells. Dots representing differentially expressed miRNA (adjusted p-value <0.05 according to a DESeq2 analysis) are colored red.

Fig B. Sex-related fold-change values in extracellular vesicles vs. cells. Scatter plot of log2 fold-change values of miRNA expression in women compared to men in extracellular vesicles (EV, y axis) vs. sediment cells (x axis). Colors code the average miRNA abundance.

Fig C. Multidimensional scaling of samples according to miRNA profiles. Two-dimensional projections of multidimensional scaling analysis of samples based on miRNA profiles. Principal component analysis and plotting were generated using rggobi [16]. Between 3 and 5 principal components are projected. Orange and purple symbolize female; red and yellow, male; orange and yellow, cells; purple and red, EVs. Various projections capture clear separation based on subject sex and urine fraction. Panels b and c reveal the samples of subject 11, particularly her cell specimens, as outliers. Asymptomatic bacteriuria (E. coli) is likely responsible for this aberration.

Fig D. miRNA-based classification trees. Two proposed classification trees to categorize samples according to volunteer sex and specimen type (urine cells or EVs). Binary decisions are based on the specified miRNA levels (expressed as log-transformed counts per million).


S1 Tables. Spreadsheet tables presenting miRNA annotation definitions.


S2 Tables. Spreadsheet tables presenting supplementary data.


S1 Text Files. R scripts and related input data files that can be used to reproduce the main analyses presented in this study.


Author Contributions

Conceived and designed the experiments: IZB-D TT. Performed the experiments: IZB-D VMW. Analyzed the data: IZB-D VMW. Contributed reagents/materials/analysis tools: KEAM. Wrote the paper: IZB-D BG.


  1. Goligorsky MS, Addabbo F, O'Riordan E. Diagnostic potential of urine proteome: a broken mirror of renal diseases. Journal of the American Society of Nephrology: JASN. 2007;18(8):2233–9. Epub 2007/07/13. pmid:17625117.
  2. Alvarez ML, Khosroheidari M, Kanchi Ravi R, DiStefano JK. Comparison of protein, microRNA, and mRNA yields using different methods of urinary exosome isolation for the discovery of kidney disease biomarkers. Kidney Int. 2012;82(9):1024–32. Epub 2012/07/13. pmid:22785172.
  3. International Federation of Clinical Chemistry (IFCC). Scientific Committee, Clinical Section. Expert Panel on Theory of Reference Values (EPTRV). IFCC Document (1982) stage 2, draft 2, 1983-10-07 with a proposal for an IFCC recommendation. The theory of reference values. Part 2. Selection of individuals for the production of reference values. Clin Chim Acta. 1984;139(2):205F–13F. Epub 1984/05/30. pmid:6733931.
  4. Ben-Dov IZ, Tan YC, Morozov P, Wilson PD, Rennert H, Blumenfeld JD, et al. Urine microRNA as potential biomarkers of autosomal dominant polycystic kidney disease progression: description of miRNA profiles at baseline. PLoS One. 2014;9(1):e86856. Epub 2014/02/04. pmid:24489795; PubMed Central PMCID: PMC3906110.
  5. Li X, Ben-Dov IZ, Mauro M, Williams Z. Lowering the quantification limit of the Qubit™ RNA HS assay using RNA spike-in. BMC molecular biology. 2015;16:9. pmid:25943882; PubMed Central PMCID: PMC4431604.
  6. Farazi TA, Horlings HM, Ten Hoeve JJ, Mihailovic A, Halfwerk H, Morozov P, et al. MicroRNA sequence and expression analysis in breast tumors by deep sequencing. Cancer research. 2011;71(13):4443–53. Epub 2011/05/19. pmid:21586611; PubMed Central PMCID: PMC3129492.
  7. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5(10):R80. Epub 2004/10/06. pmid:15461798; PubMed Central PMCID: PMC545600.
  8. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. Epub 2014/12/18. pmid:25516281; PubMed Central PMCID: PMC4302049.
  9. Wickham H. ggplot2: Elegant Graphics for Data Analysis. Berlin: Springer; 2009.
  10. Emerson JW. Interactive and Dynamic Graphics for Data Analysis: With R and GGobi by COOK, D. and SWAYNE, D. Biometrics. 2008;64(4):1301–3.
  11. Carey V, Gentleman R, Mar J, Vertrees J, Gatto L. MLInterfaces: Uniform interfaces to R machine learning procedures for data in Bioconductor containers. 1.46.0 ed. p. R package.
  12. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research. 2015;43(7):e47. Epub 2015/01/22. pmid:25605792; PubMed Central PMCID: PMC4402510.
  13. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40. Epub 2009/11/17. pmid:19910308; PubMed Central PMCID: PMC2796818.
  14. Xie M, Li M, Vilborg A, Lee N, Shu MD, Yartseva V, et al. Mammalian 5'-capped microRNA precursors that generate a single microRNA. Cell. 2013;155(7):1568–80. Epub 2013/12/24. pmid:24360278; PubMed Central PMCID: PMC3899828.
  15. Nissan X, Denis JA, Saidani M, Lemaitre G, Peschanski M, Baldeschi C. miR-203 modulates epithelial differentiation of human embryonic stem cells towards epidermal stratification. Dev Biol. 2011;356(2):506–15. Epub 2011/06/21. pmid:21684271.
  16. Swayne D, Buja A. Exploratory Visual Analysis of Graphs in GGOBI. In: Antoch J, editor. COMPSTAT 2004—Proceedings in Computational Statistics: Physica-Verlag HD; 2004. p. 477–88.