RNA-sequencing analysis of umbilical cord plasma microRNAs from healthy newborns

MicroRNAs are a class of small non-coding RNA that regulate gene expression at a post-transcriptional level. MicroRNAs have been identified in various body fluids under normal conditions and their stability as well as their dysregulation in disease has led to ongoing interest in their diagnostic and prognostic potential. Circulating microRNAs may be valuable predictors of early-life complications such as birth asphyxia or neonatal seizures but there are relatively few data on microRNA content in plasma from healthy babies. Here we performed small RNA-sequencing analysis of plasma processed from umbilical cord blood in a set of healthy newborns. MicroRNA levels in umbilical cord plasma of four male and four female healthy babies, from two different centres were profiled. A total of 1,004 individual microRNAs were identified, which ranged from 426 to 659 per sample, of which 269 microRNAs were common to all eight samples. Many of these microRNAs are highly expressed and consistent with previous studies using other high throughput platforms. While overall microRNA expression did not differ between male and female cord blood plasma, we did detect differentially edited microRNAs in female plasma compared to male. Of note, and consistent with other studies of this type, adenylation and uridylation were the two most prominent forms of editing. Six microRNAs, miR-128-3p, miR-29a-3p, miR-9-5p, miR-218-5p, 204-5p and miR-132-3p were consistently both uridylated and adenylated in female cord blood plasma. These results provide a benchmark for microRNA profiling and biomarker discovery using umbilical cord plasma and can be used as comparative data for future biomarker profiles from complicated births or those with early-life developmental disorders.


Introduction
Complications during childbirth and pre-term births can lead to developmental and neurological dysfunction in later life in a subset of children [1][2][3][4]. There remains a major unmet need for molecular biomarkers of maternal and neonatal complications such as hypoxic ischemic encephalopathy (HIE). The development of reliable, non-invasive biomarkers would allow us to identify at an early stage, babies at risk of succumbing to developmental and neurological deficits and enable early intervention or prevention. Umbilical cord blood and plasma are a potential source of pertinent biological information following birth which could contain predictive biomarkers of neurological outcome, however little is known about the molecular profile of umbilical cord plasma.
MicroRNAs (miRNAs) are ubiquitously expressed, short non-coding RNAs which finetune gene expression by negatively regulating mRNA translation [5]. They are shuttled between cells via extracellular vesicles [6-8] and importantly are abundant in peripheral biofluids including plasma [9], urine [10], cerebrospinal fluid [11], tears, saliva and peritoneal fluid [12]. They were first profiled in human plasma, serum and microvescicles in 2008 [13-15] and since then, subsequent studies have found that their levels in peripheral biofluids often fluctuate in patients with various types of cancer, neurological disorders, sepsis, liver and cardiovascular disease (reviewed in [16][17][18]). As such miRNAs have received much interest as potential biomarkers and contain many characteristics which render them ideal biomarker candidates: they are more stable than mRNA as they are resistant to RNase cleavage [19]; expression profiles of miRNAs are often more informative and discriminatory than mRNA profiles; they are abundant and profiling miRNAs is rapid and economical [20]; and their levels often change more rapidly in response to an insult or pathophysiological processes allowing early detection of disease which is critical for progressive illnesses such as cancer, Alzheimer's disease, epilepsy and early life insults [21][22][23][24][25].
We have previously shown that a number of miRNAs display altered expression in umbilical cord blood from infants following perinatal asphyxia [26]. These miRNAs may, in the future, aid clinicians in providing targeted neuroprotection. We have also shown that miRNA alterations may be used to examine downstream targets and elucidate pathogenesis [27]. Accordingly, identification of miRNAs associated with perinatal and neonatal injury is a priority.
Foetal Growth Restriction (FGR) is a disorder which manifests as a reduction or complete halt of genetically predetermined potential growth of a foetus [28]. The placenta plays a vital role in the correct development and growth of the foetus via provision of essential nutrients and protection from toxins which may affect development and growth and is believed to temporally and abundantly produce miRNAs which are involved in placental development and function [29][30][31]. The placenta can also produce exosomes in which miRNAs can be found [32]. Placentally-expressed miR-424 may play a crucial role in the development of the placenta and is believed therefore to be associated with FGR. Upregulation was noted in placentae with aberrant vascular development linked to FGR [33]. Downregulation of placental miR-16 and miR-21 has been linked to FGR also. MiR-16 has involvement in both apoptosis and regulation of the cell cycle and may display cell-specific functions and expression. PTEN, the target of miR-21, is normally expressed in the placenta and dysregulation may cause aberrant invasion of the placenta, and reduced migration and growth [30]. Trophoblastic miRNAs regulated by hypoxia are increased in maternal plasma and decreased in placental tissue from FGR cases [32].
Preeclampsia affects up to 8% of all live births worldwide and can result in a high risk of morbidity and mortality for both mother and offspring [29], with approx. 25% resulting in FGR [34]. Inadequate placental oxygenation/angiogenesis may result in consequential hypoxia-ischemia seen in the disorder. Zhu et al. [35] reported over 90 differentially expressed miRNAs between preeclamptic and healthy patients. Multiple miRNAs are dysregulated in severe cases of preeclampsia when compared to uncomplicated, healthy pregnancies. These included but are not limited to miR-210, miR-195, miR-181a, miR-411, and miR-377 [29,34]. Master hypoxamir miR-210 expression levels are raised in both placental tissue and plasma samples from preeclamptic women [36] and can influence multiple pathways in preeclampsia, for example, angiogenesis, mitochondrial dysfunction and immunity [31]. Pineles et al. [37] demonstrated that overexpression of miR-210 and miR-182 may differentiate preeclampsia from healthy controls. miRNAs which are responsible for regulation of angiogenic factors such as VEGF are also dysregulated in preeclampsia [34].
Serum miR-323 levels differ from healthy controls in both ectopic pregnancy and spontaneous abortion [38,39]. Conditions such as gestational diabetes may also be diagnosable with miRNA biomarkers, for example, miR-16, miR-17, miR-19a, miR-19b and miR-20a are dysregulated in the condition when compared to healthy subjects [40].
Many RNA species including miRNAs have been shown to be subjected to editing and modification processes including A-I editing, base modifications, as well as chemical modifications [41][42][43][44][45]. Analysis of deep sequencing data has revealed differences between genomic sequences and RNA sequences, including mRNAs, miRNAs and lncRNAs, the result of RNA editing mechanisms which is a critical function of gene regulation [46,47]. MiRNAs are targeted by A-to-I editing enzymes (ADARs), where an A base is changed to an apparent G, as well as tailing and trimming modifications which involves the addition or removal of a nucleotide at the 3 0 or 5 0 end of the miRNA entity which is mediated by TUTases, however other forms of modification are also prominent including 2 0 O-methylation [47-50]. Modification of miRNAs has been shown to modify miRNA-mRNA targeting, stability and RISC-uptake which can alter gene network dynamics and cellular activity and function. Analysis of RNA editing in plasma is still poorly understood and limited by the lower yields of RNA obtained. However analysis of RNA editing, including miRNA editing in peripheral biofluids may confer additional biomarker potential and sensitivity which would allow greater confidence in biomarker identification as well as provide insight into the function of RNA editing in normal and disease processes.
MiRNA profiling has moved away from high throughput qRT-PCR-based platforms towards the use of RNA-sequencing (RNA-seq), however, few datasets on healthy umbilical cord blood plasma exist for reference purposes. In order to develop reliable biomarkers from umbilical cord plasma, it is important that we gain perspective on the naturally occurring miRNA profiles. Previous studies have used microarrays to profile miRNAs in cord blood from neonates with HIE [26], or used RNA-seq to profile miRNA expression from trios of samples from newborn babies and their parents [51], umbilical cord blood derived cells [52,53], and cord blood buffy coat layers [54], However, a reference dataset of total miRNA profiles and miRNA editing analysis of healthy umbilical cord plasma has yet to be established.
Here we perform unbiased small RNA-seq on umbilical cord blood plasma from healthy new born infants. We compared the expression of miRNAs between sexes and between the maternity hospitals of origin. We also performed preliminary analyses on RNA editing differences which may exist between sexes in order to obtain a more comprehensive catalogue of miRNA profiles in umbilical cord blood plasma. Interrogation of the miRNA profiles from umbilical plasma revealed a largely stable miRNA profile between male and female, however differences in RNA editing were identified, indicating increased complexity in the miRNA makeup of umbilical cord blood plasma in females compared to males.

Study population
Umbilical cord blood samples were collected from two maternity hospitals: Cork University Maternity Hospital (CUMH) and Karolinska University Hospital (KUH). Consent from parents or guardians of the infants included in the study was obtained according to the Declaration of Helsinki and ethical approval was granted from the Clinical Research Ethics Committee of the Cork Teaching Hospitals, Cork, Ireland and local ethical commitee approval in Karolinska University Hospital. This was a nested study of infants recruited to the BiHiVE 2 study (NCT02019147).
In total, 8 infants were included in this study; 4 males and 4 females ( Table 1). All infants were of European descent, singleton, full term, uncomplicated vaginal births. Samples were collected from 4 infants in CUMH (2 males and 2 females) and from 4 infants in KUH (2 males and 2 females). We did not detect any significant difference between male and female infants based on maternal age, parity, gestation, birthweight, head circumference or length. The Apgar score for all infants was above 9 at 1 and 5 minutes.

Biofluid collection
Umbilical cord blood samples were collected immediately after delivery of all infants in this study and processed within 3 h following strict laboratory SOPs by a dedicated research team who were available 24 h a day. Samples were stored at −80˚C in a monitored storage facility until analysis. 6 ml of umbilical cord blood was collected into vacutainer tubes from the infants and processed within 3 hours of delivery. The plasma was prepared by centrifuging the tubes at 2400 x g, for 10 minutes, at 4˚C. The supernatant was collected into an RNAase free tube and extra care was taken not to disturb the buffy coat which contains the white blood cells. Plasma was then collected into 250 μl RNAase free eppendorfs and stored at −80˚C. The level of haemolysis in the plasma samples was assessed by spectrophotometric analysis using a Nanodrop 2000 spectrophotometer. The absorbance at 414 nm was checked and a cut-off level of 0.25 was used to distinguish haemolysis free samples [55].

Small RNA library preparation and RNA-seq
200 μl of non-hemolyzed cord blood plasma was used to isolate RNA, using the miRCURY RNA isolation kit (Exiqon) according to the manufacturer's protocol. RNA was eluted in Apgar score (1 min) 9 (9) 9 (9-10) 0.3559 Apgar score (5 min) 10 (10) 10 (10) -25 μl. A standard overnight RNA ethanol precipitation step was then performed and RNA was re-suspended in 10 μl RNase free water to concentrate. RNA integrity and quantity was determined using the small RNA detection kit (DNF-470) on a Fragment Analyzer (Advanced Analytical Technologies). RNA (5 μl) from each sample was then used to construct small RNA libraries using the Illumina TruSeq Small RNA library kit. Due to the small amount of input RNA the protocol was modified slightly (all kit reagents were halved) to prevent extensive adapter dimer formation. Libraries were size selected using a Pippin Prep with 3% agarose dye free cassettes and size selection was validated using a 2100 High sensitivity DNA Bioanalyzer chip (Agilent). The concentration of each library was determined using the HS-dsDNA kit for Qubit, libraries were pooled and pooled libraries were sequenced at the Trinseq Facility at the Institute for Molecular Medicine at St. James Hospital Dublin on an Illumina miSeq.
Sequencing data processing and differential miRNA analyses

Modification analysis
Modification analysis of the umbilical cord blood samples was performed using Chimira [44]. Chimira allows for the detection of any non-templated sequences within the input small RNA-Seq samples that are not encoded in the genomic sequence of origin. The output of this pipeline is a comprehensive set of all identified 3 0 , 5 0 and internal modifications (SNPs and ADAR-edits). Each modification is characterised by a non-templated sequence pattern and an index, which determines its position relative to the original sequence. In order to study the differential levels of adenylation, uridylation, guanylation and cytocylation between the female and male samples we have collapsed Chimira's modification counts into either mono-nucleotide or poly-nucleotide patterns of the same nucleotide (e.g. A, UU, CCC, etc.). Poly-nucleotide patterns refer to sequences of two or more identical nucleotides and are all grouped together into a single modification type. For example, any 'CC', 'CCC', and/or 'CCCC' modifications are considered collectively as poly-C modifications. A mono-nucleotide modification on the other hand, e.g. 'C', stands on its own as a distinct modification pattern. All other isoforms are not included in this analysis and their counts are merged with the counts of the corresponding templated sequences. Finally, we have defined as differentially expressed/modified miRNAs with a fold change in expression or modification level that is > 2 (or < -2) and an associated p-value < 0.05. Normalization of miRNA modification counts from male and female samples and identification of differentially modified miRNAs was performed using the DESeq2 software package [66].

Results and discussion
Count-based miRNA expression data was generated by mapping to human miRBase V21, resulting in an average of 329,721 counts per sample (range 162,954 to 460,783, Fig 1A). A total of 1,004 unique miRNAs were identified across all samples, ranging from 426 miRNAs in sample CF_1 to 659 in sample CF_2 with an average 486 miRNAs per sample ( Fig 1B). 370 unique miRNAs were detected in at least six of the eight samples ( Fig 1C) and 269 miRNAs were detected in all eight samples (S1 File).
The raw counts were converted to CPM and log-CPM values and miRNAs were removed unless their CPM value was greater than 10 with expression in at least four of the eight samples. Fig 2 shows the density of the log-CPM values for raw pre-filtered data and post-filtered data for each sample. This includes the threshold for the log-CPM of one (equivalent to a CPM value of 10) used in the filtering step. This reduced the number of miRNAs from 1,004 to 300. This is substantially higher than results reported for high throughput qRT-PCR profiling platforms studies of adult plasma [9], and is one of the potential advantages of using an RNA-seq based approach to obtain genome-wide coverage. Lizarraga et al [54] used an EdgeSeq miRNA Whole Transcriptome Assay to profile the miRNAs in buffy coat of cord blood samples from 89 newborns, of which 564 miRNAs were retained for further analysis after filtering. Looney

Unsupervised clustering of samples
Principal Component Analysis (PCA), is a non-parametric method of reducing a complex data set to reveal hidden, simplified dynamics within it. This is accomplished by converting a set of observations of variables (which may be correlated) into a set of values of linearly uncorrelated principal components (PCs). These PCs may then reveal relationships between the variables. PC1 explains the largest proportion of variation in the data, with subsequent PCs having a smaller effect and being orthogonal to the ones before them. Ideally, samples should cluster by the condition of interest, and any outliers should be identified. If the samples cluster  by anything other than the condition of interest in any dimensions then that factor can be included in the linear modelling. Fig 4 shows the principal components analysis plots of PC1 against PC2 for the normalised log-CPM values in our data. Each dot represents a sample and we have coloured and labelled the samples by the sex of the infant (Male or Female) (Fig 4A) or the centre of origin of the samples (CUMH or KUH) ( Fig 4B). We can see from these plots that there is no obvious clustering by either sex or centre of origin. However, when we calculate the Pearson correlation coefficient (r) between the individual PCs and the available clinical/demographic variables (Table 1) we can observe that there is some correlation between the sex of the infant and the centre of origin of the sample and PC1 (r = -0.44 and 0.4 respectively) and between the centre of origin of the sample and PC2 and PC3 (r = -0.6 and -0.5 respectively) ( Fig 4C). Therefore we conclude that there may be a slight "batch" effect due to the centre of origin of the sample and have included this factor in the linear model when looking for differentially expressed miRNAs between males and females. The PCs > 3 each account for less than 10% of the variation in the data and are not shown.

Differential expression analysis
Differential expression analysis was performed using voom [63] and limma [64] as implemented in the R package, edgeR [60]. Subsequently, empirical Bayesian moderation was applied by borrowing information across all miRNAs to obtain more precise estimates of miRNA variability. Significance was defined using an adjusted p-value [65] cutoff that is set at 5% by default. No miRNAs were found to be significantly differentially expressed between male and female infants ( Table 2). A heatmap of log-CPM values for the top 75 miRNAs ranked by p-value is shown in Fig 5.

Cellular origin of miRNAs in umbilical cord plasma samples and biomarker potential
The 10 most abundant miRNA (i.e. highest average miRNA expression across all samples) are: miR-486-5p, miR-10b-5p, miR-26a-5p, miR-191-5p, miR-16-5p, miR-22-3p, miR-181a-5p, miR-92a-3p, miR-451a and miR-30e-5p. Many of these are similar to those found in [51]. The most abundant miRNA, miR-486-5p, accounts for nearly 50% of all raw miRNA counts in our samples. miR-486-5p has previously been noted to be highly expressed in RNAseq studies [13,51] and is detected in most profiling studies of adult plasma that we have seen (S2 File). This miRNA is abundant within red blood cells suggesting that this may be due to selective secretion to plasma or increased stability in plasma [51], however, it does not appear to be detected with such abundance in high-throughput qRT-PCR profiling [9].
Utilising Ensembl [67], the locations of the highest expression of these 300 miRNAs were determined. Of the most abundant miRNAs, interestingly, 49 were most highly expressed in "brain fragments", 46 in the choroid plexus, 27 in the hindbrain, 21 in the forebrain/forebrain fragment and 11 in the spinal cord. While others were most abundant in the cerebellum (8), the cerebral cortex (8), the medulla oblongata (7), the basal ganglia (6), the temporal lobe (5) or in the midbrain (3). Other miRNAs of note were most abundant elsewhere such as skeletal muscle, ovaries, testis, adrenal gland and stomach. This is interesting as a number of placental miRNAs have been linked to brain development: miR-16-5p, miR-21-5p, miR-93-5p, miR-182-5p, miR-146a-5p and miR-135b-5p [68]. All of these, except miR-135b-5p, were found to be abundantly expressed in our samples (S3 File).

Comparison of umbilical cord plasma miRNA profiles to adult miRNA plasma profiles
In a previous study of miRNA expression profiles in adult plasma we found that although there is a similar number of miRNAs being detected across studies, there is a large degree of variation between the lists of miRNAs being detected by the different platforms (e.g. high throughput qRT-PCR profiling or RNA-seq, Exiqon or TaqMan, etc) [9]. However, we observed that there was a set of 40 miRNAs that were common to at least six of the seven studies that were compared [9, 12, 13, 15, 75, 76]. In this case, of the 300 miRNAs that remained in our study after filtering, 192 of these have been previously detected in at least one other profiling study of adult plasma (S2 File).
Of the 108 miRNAs which were not detected in adult plasma 11 of these miRNA are also in the top 100 most abundant miRNA in our samples: miR-381-3p, miR-378a-3p, miR-92b-3p, miR-654-3p, miR-106b-3p, miR-6131, miR-340-5p, miR-151b, miR-1307-5p, miR-421 and miR-3182. This could indicate that umbilical cord plasma has a unique miRNA profile which may contain biomarkers more selective for pre/post-natal development or disease. This may also reflect the temporal changes in miRNA expression throughout ageing and future studies may compare miRNA profiles from adolescents and elderly people to determine this further.
The NCBI GeneRIF database [77] was used to determine whether any of the 108 miRNAs that were not found in adult plasma played specific roles in pregnancy or pregnancy complications. We identified miRNA that are involved throughout pregnancy from the preparation of the endometrium for pregnancy (miR-181a-3p) [ Using miRTarBase [94], we identified 75 validated targets (supported by strong experimental evidence) of the most abundant miRNA in our samples that were not found in adult plasma (summarised in Table 3)). We found evidence to support the role of 50 of these genes in pregnancy and pregnancy related complications.

Analysis of miRNA editing
RNA editing enzymes including ADAR proteins have been shown to function aberrantly in various types of cancers and neurological disorders [161][162][163][164]. Additionally, it has recently been demonstrated that it is possible to distinguish between different cancer types based on the presence or absence of alternatively modified miRNAs (isomiRs) [165]. As such it is intuitive that the search for biomarkers should include criteria which would allow the identification of alternatively edited RNAs, as they may correlate with, and therefore aid in diagnosis of disease development, progression and prognosis [165]. MiRNAs have been shown to be subjected to edits and modifications and we therefore expanded our interrogation of umbilical cord plasma miRNA profiles to investigate the prevalence of editing and whether differences exist between sexes. Using Chimira [44], we performed editing analysis on our sequencing data, initially analysed global modification profile of all samples and differential expression of adenylated, uridylated, guanylated and cytocylated miRNAs (Fig 6). Statistically significant differential expression was taken as a fold change in expression of > 2 (or < -2) and an associated pvalue < 0.05. We found there were consistently more differentially expressed edited miRNAs in female cord blood compared to males (Fig 7). However, expression of the differentially RNA-sequencing analysis of umbilical cord plasma microRNAs from healthy newborns edited miRNAs is very low and thus further evidence is required in order to infer any functional implications from the differential modification profiles in this particular study (S4 File).
Of further interest was the origin of miRNAs which were both differentially adenylated and uridylated in females. With the exception of miR-29a-3p (which is enriched in brain) 5 of the 6 miRNAs which underwent differential adenylation and uridylation are brain specific [170]. Editing of RNA has been reported to be highly prominent in the central nervous system [171]. It is interesting to speculate on the origin of these miRNAs, as these are healthy infants who experienced routine vaginal births, leakage via a disrupted blood brain barrier is unlikely. It is possible however that these brain specific miRNAs were encapsulated in signalling micro-vesicles and transported out of the brain in order to elicit a peripheral cell response [172]. Site-ofediting analysis revealed that editing occurred most frequently at the 3 0 end of the miRNA molecule. Specifically adenylation and uridylation occurred most frequently at the +1 site suggesting 3 0 tailing of the miRNAs. MiRNAs have been shown to be selectively uridylated by 3 0 terminal uridylyl transferases (TUTases) TUT7 and TUT4. This editing can modify miRNAgene regulatory networks by affecting the stability of the miRNA [50, [173][174][175].
Conclusion miRNAs play a role in multiple key processes throughout pregnancy; including preparation of endometrial tissue for implantation, management of immune-associated genes, development of the placenta and angiogenesis [28]. Dysregulation of the expression of these miRNAs may therefore be associated with complications in pregnancy [29], making them good candidate biomarkers for not only HIE [26,176], but for many pregnancy-related disorders. Furthermore, due to the relative stability of miRNAs under normal conditions [177], they appear to be potentially useful diagnostic biomarkers of multiple disordered states [178] in pregnancy and beyond. More research is required however, to decipher their target pathways and mechanisms of action.
While overall miRNA expression did not differ between male and female cord blood plasma, we did detect differentially edited miRNAs in female plasma compared to male. Editing of miRNAs is now known to be altered in disease [161][162][163][164] and can affect miRNA-mRNA targeting, indeed Choudhury et al, identified that A-I editing of miR-376 was reduced in glioma and had an effect on the repertoire of target mRNAs [179]. This allowed an increase in cell invasiveness. As such it is intuitive that future miRNA biomarker studies profile changes in miRNA editing as it may correlate with disease development, progression and outcome. Of note, and consistent with other studies of this type, adenylation and uridylation were the two most prominent forms of editing. Analysis of the sites of adenylation and uridylation along the miRNA molecule revealed that editing was most prominent at the 3 0 end of the miRNAs at the +1 position, indicating 3 0 tailing, a common modification of miRNAs. Analysis of the expression patterns of these miRNAs revealed that all except miR-29a-3p are expressed almost exclusively in brain [170]. Although only a few miRNAs were differentially edited in females and expression levels were low, it is an interesting finding as the effects of sex on RNA editing is poorly understood and warrants further investigation.
This study is the first to profile miRNA editing in cord blood plasma from healthy infants. Although we did not detect a difference between male and female miRNA expression, possibly due to the small sample size, and expression of the differentially edited miRNAs is very low, this study can be used as comparative data for future biomarker profiles from complicated births or those with developmental disorders including those initiated by HIE.