Profiles of Extracellular miRNA in Cerebrospinal Fluid and Serum from Patients with Alzheimer's and Parkinson's Diseases Correlate with Disease Status and Features of Pathology

The discovery and reliable detection of markers for neurodegenerative diseases have been complicated by the inaccessibility of the diseased tissue- such as the inability to biopsy or test tissue from the central nervous system directly. RNAs originating from hard to access tissues, such as neurons within the brain and spinal cord, have the potential to get to the periphery where they can be detected non-invasively. The formation and extracellular release of microvesicles and RNA binding proteins have been found to carry RNA from cells of the central nervous system to the periphery and protect the RNA from degradation. Extracellular miRNAs detectable in peripheral circulation can provide information about cellular changes associated with human health and disease. In order to associate miRNA signals present in cell-free peripheral biofluids with neurodegenerative disease status of patients with Alzheimer's and Parkinson's diseases, we assessed the miRNA content in cerebrospinal fluid and serum from postmortem subjects with full neuropathology evaluations. We profiled the miRNA content from 69 patients with Alzheimer's disease, 67 with Parkinson's disease and 78 neurologically normal controls using next generation small RNA sequencing (NGS). We report the average abundance of each detected miRNA in cerebrospinal fluid and in serum and describe 13 novel miRNAs that were identified. We correlated changes in miRNA expression with aspects of disease severity such as Braak stage, dementia status, plaque and tangle densities, and the presence and severity of Lewy body pathology. Many of the differentially expressed miRNAs detected in peripheral cell-free cerebrospinal fluid and serum were previously reported in the literature to be deregulated in brain tissue from patients with neurodegenerative disease. 
These data indicate that extracellular miRNAs detectable in the cerebrospinal fluid and serum are reflective of cell-based changes in pathology and can be used to assess disease progression and therapeutic efficacy.


Introduction
The ability to meaningfully profile peripheral biofluids to monitor and gain insights about the underlying severity of central nervous system pathology would bring significant benefits to monitoring disease progression and treatment efficacy. Development of diagnostic tests and preventative and treatment therapies for neurodegenerative diseases is encumbered by the complexity of pathomechanisms underlying neurodegenerative diseases, as well as the difficulty of achieving an accurate diagnosis in early, asymptomatic stages of disease. Whereas several genes have been linked to rare monogenic forms of Alzheimer's disease (AD) and Parkinson's disease (PD), molecular mechanisms underlying sporadic forms of the disease are complex and largely unknown [1,2].
AD is an age-related, chronic, neurodegenerative disorder characterized by gradual dementia and deteriorated higher cognitive functions including language and behavior [3]. Similarly to AD, PD is a progressive neurodegenerative disorder affecting approximately 1-2% of individuals over 60 years of age [4].
Cardinal clinical features of PD are rigidity, resting tremor, bradykinesia and postural instability [3]. As PD advances, up to 80% of patients develop dementia.
Histopathologically, the AD brain is characterized by deposition of both neuritic plaques composed of amyloid-β (Aβ) peptide and hyperphosphorylated forms of the microtubule-associated protein Tau that create neurofibrillary tangles (NFTs) [2]. Neurons of PD subjects exhibit abnormal accumulation of cytoplasmic inclusions consisting mainly of α-synuclein, a protein whose aggregation forms insoluble fibrils, Lewy Bodies [3]. To complicate the detection of AD and PD, age-matched cognitively normal individuals have low levels of plaque and tangle formation, as do most PD patients.
An important emerging level of pathophysiological complexity underlying neurodegenerative disorders is derived from miRNA gene regulation [5,6]. MiRNAs represent a class of endogenous, stable, non-coding RNA molecules involved in post-transcriptional regulation of target gene expression. Biogenesis of mature miRNA occurs through a multi-step process that starts in the nucleus with endonucleolytic cleavage of the primary miRNA transcript, and ends with a ~20-25 nucleotide-long single-stranded mature miRNA in the cytosol. The binding of miRNA with imperfect complementarity to target mRNAs leads to reduced protein expression by either degradation of the RNA or translational arrest [7]. Discovery of miRNA regulatory potential has significantly broadened our knowledge of preferential gene expression in the central nervous system. Half of the identified tissue-specific miRNAs are brain or brain-region specific, promoting homeostatic functions on brain gene expression [8,9]. Several age-related disease studies suggest differential expression of several miRNAs in the human brain, some of which regulate the expression of genes known to be associated with neurodegeneration [10,11,12]. More importantly, abnormal expression of miRNAs has been detected in cellular dysfunction and disease, including AD and PD [1,6,13,14,15].
The concept that peripheral biofluids, such as cerebrospinal fluid (CSF) and blood serum (SER), contain markers of central nervous system disorders has become an active area of research. Circulating cell-free RNAs, as indicators (snapshots) of disease-relevant information, are carried to the periphery and are attractive candidates for monitoring central nervous system disease. The miRNA changes associated with neurodegenerative disease that are detectable in the periphery have not been appreciably profiled and compared in the CSF and SER of AD and PD patients. Profiling cell-free miRNA may reduce interfering miRNA signals from blood cells and immune cells [16]. In addition, there has not been an extensive study to correlate peripheral miRNAs with corresponding postmortem neuropathology characterization.
Recent advances in library sample preparation and analytical methods have introduced new protocols that allow miRNA profiling by next generation sequencing (NGS) from CSF and SER [17]. In this study, we used NGS to investigate the expression patterns of the known miRNAs listed in miRBase (V18) in acellular fluids from postmortem subjects with verification of Alzheimer's or Parkinson's disease neuropathology, and neurologically normal control subjects. We compared the detectible miRNAs in CSF and SER. Postmortem autopsy data on brain tissue revealed the severity and extent of neuropathology, which we were able to correlate with miRNA status in biofluids. As a potential biofluid of choice for CNS disease, human CSF has the advantage to reflect a more stable signature of the brain due to its proximity to the diseased tissue. However, unless there is a significant precedent to submit a patient to a lumbar puncture, most patients are reluctant. Serum is less invasive and more readily available, but also contains miRNA signals from all tissues in the body. One goal of this study was to ascertain the advantages of CSF compared with SER for the detection of Alzheimer's and Parkinson's disease-relevant miRNA. From postmortem patients we were able to profile both the CSF and serum. To determine which fluid has a higher signal-to-noise ratio, we sequenced and analyzed miRNA abundance in paired CSF and SER samples from a cohort consisting of control, AD, and PD subjects. The sample set was used to correlate miRNAs associated with AD and PD pathology that are detectable in peripheral biofluids. We identified AD and PD miRNA signatures, as well as subsets of misregulated miRNAs in connection with regional (Braak stage) and time-dependent characteristics (tangle and plaque load) of AD and PD pathology. Importantly, identical analysis of CSF and SER datasets revealed non-overlapping results, with a potentially more stable miRNA signature derived from the CSF.
One of the advantages to using sequencing to profile the miRNA content is the ability to assess all detectable miRNA expression at once. We used miRDeep2 software [18] to predict novel miRNAs in CSF and SER. We report the differential expression of these putative miRNAs in both CSF and SER across diseases. In addition, we compare our findings with those previously reported for deregulated miRNAs identified in tissue. This is the first paper to use sequencing to compare the miRNA profile in both CSF and SER from the same individuals. In addition, we sequenced one of the largest miRNA datasets to date, comparing two neurodegenerative diseases. The profiling and sequencing data from this paper are publicly available and represent a significant resource for future evaluations of control, AD and PD biofluids. These data can provide us with information regarding the types of miRNAs detectable in cell-free peripheral biofluids.

miRNA expression profiling
The principal demographic, postmortem interval, clinical and pathological characteristics of the 69 AD patients, 67 PD patients and 78 control subject samples included in this miRNA profiling study are summarized in Table S1. Samples were obtained from the Banner Sun Health Research Institute after thorough evaluation of neuropathology and consisted of AD, PD, and neurologically normal control subjects. Average age at death was comparable across the three groups: controls (82.1 ± 10 years), AD (81.3 ± 7.7 years) and PD (80.0 ± 5.1 years) (Figure 1). Average disease duration was 7.5 ± 4.1 years for AD patients, and 12.6 ± 7.9 years for PD subjects. Mean postmortem interval for all samples was approximately 3.1 hours. In most cases, we were able to analyze one CSF and one SER sample from each subject, hence allowing for direct comparison of miRNA signatures for the two biofluids and thereby reducing sample variability. Supporting the consistency of our results, analysis of variance revealed no significant source of variation in the expression data due to age, gender, or postmortem interval (PMI; Figure S1).
We conducted miRNA expression profiling of SER and CSF samples using NGS. Small RNA sample preparation for NGS platforms typically requires at least 1 µg of total RNA as a starting input. This is problematic for SER and CSF samples, which contain low levels of total RNA. We modified a protocol for small RNA deep sequencing for samples with low RNA content and small starting volumes, allowing for miRNA NGS expression profiling from CSF and SER [17]. We concentrated our downstream analysis on the 2228 known miRNAs in miRBase (Version 18). When examining the data from all of our samples simultaneously, we detected 1773 different miRNAs expressed at least once in the CSF samples and 1757 in the SER samples. For our analysis, we reduced these numbers to 428 miRNAs in CSF and 414 miRNAs in SER that had a minimum average of >5 read counts. From the 2228 possible mature miRNAs listed, we removed those that had the same expression patterns across all samples. For example, if hsa-let-7a-5p_hsa-let-7a-1 and hsa-let-7a-5p_hsa-let-7a-2 were present with the same expression profile, hsa-let-7a-5p_hsa-let-7a-2 was considered redundant and removed from further analysis.
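The abundance and redundancy filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function name `filter_mirnas`, the toy counts, and the entry `hsa-miR-hypothetical` are invented for the example, and treating any two identical count profiles as redundant is a simplification of the rule given in the text.

```python
from statistics import mean

def filter_mirnas(counts, min_mean=5.0):
    """Keep miRNAs whose mean read count exceeds min_mean and drop
    entries whose count profile duplicates one already kept.

    counts: dict mapping miRNA name -> list of read counts (one per sample).
    """
    kept = {}
    seen_profiles = set()
    for name in sorted(counts):  # sorted, so the first-named duplicate is kept
        profile = tuple(counts[name])
        if mean(profile) <= min_mean:
            continue  # below the abundance threshold
        if profile in seen_profiles:
            continue  # identical expression profile already retained
        seen_profiles.add(profile)
        kept[name] = counts[name]
    return kept

# Toy data (hypothetical counts, for illustration only)
example = {
    "hsa-let-7a-5p_hsa-let-7a-1": [12, 30, 9],
    "hsa-let-7a-5p_hsa-let-7a-2": [12, 30, 9],  # same profile, redundant
    "hsa-miR-9-5p_hsa-mir-9-1": [40, 22, 31],
    "hsa-miR-hypothetical": [0, 1, 2],          # mean of 1, too low
}
kept = filter_mirnas(example)
```

With the toy data, only the first let-7a entry and the miR-9 entry remain in `kept`.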
Because this is the first paper to sequence and compare the miRNA profile of CSF and SER from the same patients, we provided a list of the 2228 miRNAs used in our analysis and the normalized average number of counts per million detected in each biofluid, from all samples (Table S2).

miRNA signature derived from CSF is slightly more stable
In an effort to determine which biofluid, CSF or SER, has a more stable and consistent miRNA signature associated with disease, we compared the matched CSF and SER data sets derived from AD, PD and control samples. Using consensus clustering analysis and silhouette scores (Figures S1 and S2), the serum data reflected a slightly reduced stability in cluster membership compared to the CSF due to the predominantly unimodal nature of its consensus matrix histogram (Figure S2). However, consensus clustering analysis revealed that there was only a slight improvement in CSF cluster stability in our data sets. Therefore, we report our results for both CSF and SER due to the lack of significant advantage of using either biofluid.

miRNAs are differentially expressed in CSF and SER of AD patients
The samples from AD and age-matched non-affected subjects were subsequently analyzed for differential miRNA content. Based on the distribution of total number of mapped reads (sequence reads that align to known mature miRNAs), we set the threshold for removing samples to those with less than 100,000 mapped reads for CSF and less than 60,000 for SER data. Subsequently, we removed m outliers from the following groups: CSF AD (m = 5), CSF Control (m = 5), SER AD (m = 11) and SER Control (m = 10). The remaining samples each had an average of 2,631,443 reads that mapped to known miRNAs for CSF samples and 1,953,105 mapped read counts for SER samples. These samples represent some of the largest depth of coverage in any study to date.
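As a minimal sketch of the sequencing-depth filter described above, the following Python snippet applies the two stated thresholds (100,000 mapped reads for CSF, 60,000 for SER). The sample identifiers and read totals are hypothetical, and the additional outlier removal mentioned in the text is not modelled here.

```python
# Minimum mapped reads per biofluid, taken from the thresholds in the text
MIN_MAPPED_READS = {"CSF": 100_000, "SER": 60_000}

def passes_depth_filter(biofluid, mapped_reads):
    """Return True if the sample meets the minimum mapped-read threshold."""
    return mapped_reads >= MIN_MAPPED_READS[biofluid]

# Hypothetical samples: (identifier, biofluid, reads mapped to known miRNAs)
samples = [
    ("subject_A", "CSF", 2_500_000),
    ("subject_B", "CSF", 80_000),   # fewer than 100,000, removed
    ("subject_C", "SER", 75_000),
    ("subject_D", "SER", 59_999),   # fewer than 60,000, removed
]
retained = [s for s in samples if passes_depth_filter(s[1], s[2])]
retained_ids = [s[0] for s in retained]
```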
Sample size for serum consisted of AD (n = 53), PD (n = 50) and control (n = 62) subjects. Results were filtered at corrected p-value <0.05 (Table 2). We describe only significant differentially expressed miRNAs with an average number of mapped reads greater than 5 and FC(log2) > 0.7 or FC(log2) < -0.7. Logarithmic base 2 fold change (FC) is relative to the first listed group for each comparison.

Of the 20 differentially expressed miRNAs, we found that 11 (~55%) were previously reported in the literature: 125a-3p, 125b, 127-3p, 1285, 135a/b, 30c, 21-5p, 219-2-3p, 34c, 375, 873 [2,6,25,26,27,28,29,30,31]. The overlap of CSF and SER expressed miRNAs for AD compared to neurologically normal control subject analysis consists of two miRNAs, miR-184 and miR-127-3p. The direction of miR-184 and miR-127-3p expression did not correlate between CSF and SER data. It is interesting to note that the miRNAs expressed differently in the CSF were all significantly down-regulated, whereas 85% of the miRNAs identified in SER were up-regulated compared to neurologically normal age-similar controls.
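The reporting criteria stated in the table note (corrected p-value below 0.05, mean mapped reads above 5, and an absolute log2 fold change above 0.7) can be expressed as a simple predicate. The sketch below is illustrative only: the miRNA names and values are invented, and the multiple-testing correction itself is assumed to have been applied upstream, since the text does not say which method was used.

```python
def is_reported(adj_p, mean_reads, log2_fc,
                alpha=0.05, min_reads=5, min_abs_fc=0.7):
    """Apply the three reporting criteria to one differential-expression result."""
    return (adj_p < alpha
            and mean_reads > min_reads
            and abs(log2_fc) > min_abs_fc)

# Hypothetical results for illustration only
results = [
    {"mirna": "miR-example-1", "adj_p": 0.010, "mean_reads": 120.0, "log2_fc": -1.2},
    {"mirna": "miR-example-2", "adj_p": 0.030, "mean_reads": 3.0,   "log2_fc": 1.5},
    {"mirna": "miR-example-3", "adj_p": 0.200, "mean_reads": 80.0,  "log2_fc": 2.0},
    {"mirna": "miR-example-4", "adj_p": 0.001, "mean_reads": 50.0,  "log2_fc": 0.4},
]
reported = [r["mirna"] for r in results
            if is_reported(r["adj_p"], r["mean_reads"], r["log2_fc"])]
```

Only `miR-example-1` passes all three criteria; the others fail on abundance, significance, or effect size respectively.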
We also examined miRNAs that were different between AD and PD patients (Table 1; Table 2). In the CSF, only 1 of the 5 differentially expressed miRNAs between AD and PD subjects, 32-5p, was specific to that analysis and did not overlap with miRNAs that were detectably different in AD compared with control subjects or PD compared with control subjects. In SER, 16 miRNAs had different expression levels when AD and PD subjects were compared, out of which 12 were unique to that analysis and exhibited no overlap with results from CSF with AD or PD compared with control subjects.

Superscript 1 indicates miRNAs that are differentially expressed in both patients with Alzheimer's disease and Parkinson's disease compared to control subjects. Superscript 2 indicates differentially expressed miRNAs in both CSF and SER biofluids for the corresponding analysis. Significant miRNAs with a superscript 3 are in low abundance, with normalized mean <10 mapped reads.

Control subjects and PD patients compared with Control subjects, in the CSF. There were 5 miRNAs differentially expressed in SER samples from PD patients compared to control subjects. The expression levels of miR-338-3p, 30e-3p and 30a-3p were up-regulated in the serum of PD (n = 50) subjects, whereas miR-16-2-3p and 1294 were significantly down-regulated (Table 2). Of the 5 miRNAs, 16-2-3p, 30e, and 30a-3p (~60%) were previously identified to be differentially expressed in Parkinson's subjects when compared to control subjects [39,43].

Potential novel miRNAs detected in CSF and SER
We used miRDeep2 to predict novel miRNAs in our CSF and SER data [18,44]. MiRDeep2 first aligns miRNA reads to the genomic reference, then uses an RNA fold tool to predict the RNA secondary structures in the sequence surrounding the aligned miRNA read and evaluates the structure and signature of each potential miRNA precursor. If the structure creates a miRNA hairpin and the potential miRNA read falls within the hairpin, as would be expected from Dicer processing, then the potential miRNA is assigned a score that reflects the calculated confidence in the predicted miRNA [45]. We used the following cutoffs: the miRNA must be expressed in at least 30% of either CSF samples or SER samples and expressed on average more than 5 times in each sample. Using these criteria, we detected a total of 13 novel miRNAs (Table 3). When we examined these new miRNAs for differential expression, only one displayed significant expression level changes between AD and PD SER samples at p < 0.05 (statistical tests were corrected for multiple testing using all known plus potential miRNAs). The significant miRNA sequence is labeled bold in Table 3.
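The post-prediction cutoffs can be illustrated with a short Python sketch. It is one possible reading of the stated criteria, not the authors' code: it assumes that "expressed" means a non-zero read count, that the average is taken over all samples of a biofluid, and that both criteria must hold within the same biofluid. The function name and the example counts are hypothetical.

```python
def passes_novel_cutoffs(csf_counts, ser_counts,
                         min_fraction=0.30, min_mean=5.0):
    """Check the detection-rate and abundance cutoffs in either biofluid.

    csf_counts / ser_counts: read counts for one candidate miRNA,
    one value per sample of that biofluid.
    """
    def meets_cutoffs(counts):
        if not counts:
            return False
        detected_fraction = sum(1 for c in counts if c > 0) / len(counts)
        mean_count = sum(counts) / len(counts)
        return detected_fraction >= min_fraction and mean_count > min_mean

    return meets_cutoffs(csf_counts) or meets_cutoffs(ser_counts)

# Hypothetical candidate: detected in 4 of 10 CSF samples with a mean of 7 reads
candidate_csf = [0, 0, 0, 20, 30, 0, 0, 0, 10, 10]
candidate_ser = [0] * 10
candidate_kept = passes_novel_cutoffs(candidate_csf, candidate_ser)
```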

miRNA expression in connection with Braak neurofibrillary stages, neurofibrillary tangle scores, and plaque-density scores
We sought to investigate the correlation between miRNA expression data and the severity of pathology findings quantified at autopsy, regardless of disease diagnosis. We examined miRNAs that consistently increased or decreased their expression as measures of pathology increased. Ordinal logistic regression (OLR) was used to model the relationship between normalized miRNA counts and several ordinal outcome variables comprising: i) Braak neurofibrillary stages; ii) neurofibrillary tangle scores and iii) plaque-density scores. Consequently, OLR was used for identification of miRNA markers associated with the progression of regional and time-dependent characteristics typical for AD pathology. Neuropathology examination at autopsy provided total Braak stages (1-6), neurofibrillary tangle scores (0-15) and plaque-density scores (1-15). The plaque and tangle scores were sums of pathology (0 = none, 1 = sparse, 2 = moderate, 3 = frequent) across five brain regions (Frontal, Temporal, Parietal, Hippocampal, Entorhinal). For additional information on patient scores, see Table S1. Prior to the analysis, neurofibrillary tangle and plaque-density scores were binned into 3 ordered response categories, with 1, 2, 3 for increasing gravity of progression. Similarly, Braak neurofibrillary stages were treated as ordinal under the assumption that levels of Braak staging have a natural stage ordering (1, 2, 3, 4, 5, 6), with an unknown distance between adjacent levels. Upon filtering, each analysis consisted of the following number of subjects in each subgroup: Ordinal logistic regression analysis resulted in several predictor variables (miRNAs) significant at unadjusted p-value <0.05, that consistently increased or decreased their expression across pathologic severity.
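To make the scoring scheme concrete, the sketch below sums the per-region severity grades (0 to 3) over the five brain regions and bins the total into three ordered categories. The cut points used are those the text gives for neurofibrillary tangle scores (0-4, 5-9, 10-15); the function names and the example subject are hypothetical.

```python
REGIONS = ("Frontal", "Temporal", "Parietal", "Hippocampal", "Entorhinal")

def total_score(region_scores):
    """Sum per-region grades (0 = none, 1 = sparse, 2 = moderate, 3 = frequent)."""
    for region in REGIONS:
        if region_scores[region] not in (0, 1, 2, 3):
            raise ValueError(f"invalid grade for {region}")
    return sum(region_scores[region] for region in REGIONS)

def tangle_category(total):
    """Bin a total tangle score (0-15) into ordered categories 1, 2, 3."""
    if not 0 <= total <= 15:
        raise ValueError("total score must lie between 0 and 15")
    if total <= 4:
        return 1   # low (0-4)
    if total <= 9:
        return 2   # moderate (5-9)
    return 3       # high (10-15)

# Hypothetical subject, for illustration only
subject = {"Frontal": 3, "Temporal": 2, "Parietal": 2,
           "Hippocampal": 3, "Entorhinal": 3}
subject_total = total_score(subject)               # 13
subject_category = tangle_category(subject_total)  # 3
```

The binned category, rather than the raw total, is what serves as the ordinal response in the regression.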
We report miRNAs with the lowest Akaike

Ordinal logistic regression analysis (OLR) was implemented in order to detect miRNAs with monotonic expression patterns across Braak neurofibrillary stages. Braak stages were recorded during autopsy for each subject, and specific CSF subgroups

Neuropathological examination disclosed total neurofibrillary tangle scores. We binned the data 0-15, in increasing increments, for each subject. Scores were divided into three groups corresponding to low neurofibrillary tangle score (0-4), moderate neurofibrillary tangle score (5-9) and high neurofibrillary tangle score (10-15). Ultimately, neurofibrillary tangle subgroups consisted of stage 1 (n = 73), stage 2 (n = 58) and stage 3 (n = 53) subjects for CSF and stage 1 (n = 71), stage 2 (n = 49) and stage 3 (n = 44) for SER. Ordinal logistic regression analysis (OLR) was implemented in order to fit miRNA expression data across the three ordered groups. Delta AIC quantifies the information loss associated with using each model relative to the best approximating model. We report predictor variables with the lowest Akaike Information Criterion (AIC) and Δi < 10 that satisfy assumptions of the OLR. p-Value* is unadjusted.
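Delta AIC, as used above, is the difference between a model's AIC and the smallest AIC among the candidate models. The following Python sketch computes it for a set of single-miRNA models and applies the stated cutoff of 10; the model names and AIC values are invented, and fitting the ordinal regressions themselves is outside the scope of this example.

```python
def delta_aic(aic_by_model):
    """Return AIC_i - min(AIC) for every candidate model."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}

# Hypothetical AIC values for three single-predictor models
aic_values = {"miR-a": 310.2, "miR-b": 312.7, "miR-c": 325.0}
deltas = delta_aic(aic_values)

# Keep models whose information loss relative to the best model is below 10
retained = [name for name, d in deltas.items() if d < 10]
```

Here `miR-a` is the best-approximating model (delta of 0), `miR-b` is retained with a delta of about 2.5, and `miR-c` is dropped with a delta of about 14.8.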
iii) (a) CSF plaque-density stages: Neuropathology characterization of total plaque-density scores, ranging from 1-15 for each subject. Scores were summed from five brain regions described above. Total scores were divided into three groups corresponding to low plaque-density score (1-5), moderate plaque-density score (6-10) and high plaque-density score (11-15). The ordinal regression method was used to model the relationship between the ordinal outcome variable, plaque density score, and normalized miRNA counts as explanatory variable. We report miRNAs with the lowest AIC significant at uncorrected p-value <0.05 if the parameter estimate 95% confidence interval does not include zero. We plotted two miRNAs out of the 17 reported (miR-195-5p, miR-101-3p) in Table 6 that showed consistent expression changes with increased density of plaques (Table 6, Figure 4A). (b) SER plaque-density stages: 7 miRNAs including miR-106a-5p and miR-30b-5p (Table 6, Figure 4B). miR-106a-5p and miR-30b-5p, detected in SER and selected from Table 6, showed significant fit across increasing plaque density stages.

The progressive loss of melanin-containing dopaminergic neurons in the substantia nigra leads to a loss of pigmentation, resulting in measurable depletion of staining in the tissue. The depigmentation score correlates well with the loss of striatal tyrosine hydroxylase reactivity. For the subjects in this study, depigmentation pathology was assessed according to Beach et al., 2009 [46]. No differentially expressed miRNAs were detected from comparing moderate and severe depigmentation in samples with Limbic type Lewy body progression. The spread of Lewy bodies and Lewy neurites from the brainstem to the cerebral cortex is one of the best correlations of PD progression to PD with dementia (PDD) [46,47,48].
Olfactory bulb and tract, brainstem IX-X, brainstem (locus coeruleus), brainstem (substantia nigra), amygdala, transentorhinal, anterior cingulate gyrus and neocortex (temporal, frontal and parietal) were assessed via histopathology to calculate the Lewy-related density scores for aggregate formation with all immunoreactive features in the regions noted (the antibody used was against phosphorylated α-synuclein) [46]. Neuronal perikaryal cytoplasmic staining, neurites and puncta are all considered together, using the templates provided by the Dementia with Lewy Bodies Consortium [49]. Scores are binned from 0-2, 0 being no Lewy body detection to 2 being the highest (neocortical type). Upon filtering, OLR analysis consisted of the following number of subjects in each subgroup: no Lewy bodies (CSF: n = 126; SER: n = 113), Limbic type (CSF: n = 30; SER: n = 23) and Neocortical type (CSF: n = 21; SER: n = 20). A total of 12 miRNAs in CSF and 10 in SER were reported as best singular predictor models of Lewy body stage progression (Table 7). Normalized read counts for miR-34a-5p and miR-374a-5p are displayed in Figure 5. Interestingly, our OLR results indicate that miR-132 expression monotonically decreases in CSF as Lewy body pathology advances, a finding consistent with the decreased expression levels of miR-132 in PD samples compared to controls (Table 1; Table 7).

miRNA expression, potential markers of cognition
Thirty-four miRNAs had significant differential expression in serum samples when comparing PD patients with PD patients who had a clinical diagnosis of dementia (PDD) (Table S3). We were interested to know whether or not these same PDD miRNAs were significantly different in our serum data from AD patients compared to normal controls. We found that 3 out of the 34 miRNAs had significantly altered expression in AD subjects as well (Table 8). Sample size for serum consisted of PD (n = 32), PDD (n = 18), AD (n = 53) and Control (n = 62) subjects. Results were filtered at corrected p-value <0.05, and the logarithmic base 2 fold change (FC) is relative to the first listed group for each comparison.
One of the differentially expressed miRNAs, miR-34c, was previously identified to be highly expressed in the hippocampus of patients with AD and in animal models of AD [50]. The same group identified miR-34c as a negative regulator of memory consolidation [50]. Interestingly, our data examining miRNAs differentially expressed in the progression of Lewy bodies from limbic to neocortical also identified miR-34c and 34b as significantly altered. While we identified miRNAs detectable in blood (serum) that have the potential to indicate cognitive impairment, CSF revealed only 11 significant differentially expressed miRNAs and no overlap with the AD and Control CSF analysis.

Discussion
These data represent one of the largest data sets to date, examining the miRNAs detectable in cell-free biofluids from  patients with neurodegenerative disease, and the first to use NGS to compare the profiles from CSF and SER. We were able to detect differentially expressed miRNAs in CSF and SER, many of them previously identified to be misregulated in patient tissue samples. Interestingly, there was minimal overlap between the miRNAs identified in CSF with the miRNAs identified in SER. Further temporal investigation in a living cohort will be necessary to determine which biofluid will be most reliable for early detection of disease and predictive of disease progression. These data are an important first step, comparing the biofluid profiles with one another and with known miRNAs deregulated in brain tissue. The examination of miRNA changes associated with the severity of disease pathology can also provide important insight about how to interpret miRNA changes as diagnostic and prognostic indicators of disease.

miRNAs of particular interest
Many of the miRNAs we were able to detect as differentially expressed in cell-free CSF and SER have been reported previously in studies examining brain tissue from patients with AD and PD [22,24,26,29,51,52,53,54], or as miRNAs that target genes of particular interest, such as APP, BACE1 and α-synuclein [54,55,56]. For example, 73% of the miRNAs we identified to be differentially expressed in AD patient CSF compared with control subject CSF were previously found to be deregulated in AD brain tissue or target known AD-related mRNAs. We selected a few of the differentially expressed miRNAs for further discussion.
We found miRNA-9 to be downregulated in CSF from AD patients when compared to levels in CSF from control subjects. miRNA-9 expression levels change across Braak stages and neurofibrillary tangle advancement in CSF, decreasing with Alzheimer's disease progression. To date, several studies demonstrate the altered expression of miR-9 in AD brains [22,23,55,57]. The gene coding for neurofilament H is among the miR-9 targets potentially involved in AD [58]. This protein has previously been shown to be upregulated in disease conditions and can be isolated from NFTs along with Tau and other cytoskeleton proteins [58,59,60,61]. These observations correlate with the decrease in miR-9 levels we observed with tangle severity. In addition, miR-9 has been shown to be downregulated in response to Aβ treatment in primary neurons, suggesting that miR-9 downregulation could be a consequence of the disease pathogenesis that results in neurofilament-H upregulation [2]. However, miR-9 also targets Sirtuin (SIRT1), a de-acetylase with reduced expression in AD brains [62,63]. In contrast to neurofilament H, decreased SIRT1 levels would indicate a potential increase in miR-9, or the increase of another miRNA targeting SIRT1. Interestingly, SIRT1 can also be regulated by miR-34c (below).
miR-34c was found in our study to be upregulated in PDD patients compared with PD patients and in AD patients compared to control subjects. Zovoilis et al. [50] found high levels of miR-34c in hippocampus of AD patients and in animal models of AD. They observed that, when miR-34c is elevated, memory consolidation is impaired. When miR-34c is targeted for removal, learning and memory is restored. One of the mRNA targets for miR-34c is SIRT1, involved in synaptic plasticity and memory formation [64]. The authors confirmed that elevated miR-34c correlated with decrease in SIRT1 in tissue samples. The authors did not look for the expression of any miRNAs in AD patient blood samples, nor did they examine PD or PDD patients. The hypothesis that elevated levels of miR-34c is related to cognitive decline holds true in our data from patient serum samples. There is approximately a 2.1-log2 fold increase in miR-34c in PDD patient serum compared with PD patients and a 1.6-log2 fold increase in miR-34c in AD patient serum compared with normal control subjects.
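Because the fold changes above are reported on a log2 scale, converting them to linear fold changes takes one exponentiation. The short Python sketch below does this for the two miR-34c values quoted in the text; the helper name `log2fc_to_linear` is introduced only for illustration.

```python
def log2fc_to_linear(log2_fc):
    """Convert a log2 fold change to a linear (multiplicative) fold change."""
    return 2.0 ** log2_fc

# Values quoted in the text for miR-34c in serum
pdd_vs_pd = log2fc_to_linear(2.1)       # approximately 4.29
ad_vs_control = log2fc_to_linear(1.6)   # approximately 3.03
```

In linear terms, this corresponds to roughly a 4.3-fold increase in PDD versus PD serum and roughly a 3-fold increase in AD versus control serum.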
miR-34b/c is also associated with PD. Levels of miR-34b/c are decreased by 40-65% in amygdala, substantia nigra, cerebellum and frontal cortex of PD patients [33]. Additionally, knock-down of miR-34b/c in differentiated SH-SY5Y neuroblastoma cells resulted in a decrease in parkin and DJ-1 (encoded by PARK7) concentrations that led to a disturbance of mitochondrial function and a decrease in viability of the cell [65]. DJ-1 can be involved in regulation of apoptosis; it can also act as a redox chaperone inhibiting the aggregation of α-synuclein [66]. Cell death associated with altered mitochondrial activity and oxidative stress are recognized biochemical abnormalities associated with PD. It remains to be proven whether the decreased expression of these miRNAs is due to their specific down-regulation in surviving neurons or secondary to neuron degeneration.
miR-101 was decreased in CSF, and correlated with increases in neurofibrillary tangles and plaque density. Several independent studies showed that miR-101 was downregulated in human AD cortex [23,57,67]. Cyclooxygenase-2 (COX-2) and APP are known miR-101 targets implicated in AD [15]. COX-2 is involved in the inflammatory response, associated with neuronal loss, colocalizes with NFTs, and is deregulated in the AD brain [15,67]. It is possible that miR-101 down-regulation might contribute significantly to AD pathology by: 1) increasing APP expression; 2) promoting NFT formation through the increase in Tau phosphorylation; 3) contributing to inflammation through the upregulation of COX-2 expression.
Expression of miR-132 has been previously described as required for neuron morphogenesis and function, whereas significant down-regulation in miR-132 expression has been associated with α-synuclein accumulation and neuronal malfunction in α-synuclein (A30P)-transgenic mice [68,69]. Yang et al. demonstrated through bioinformatics prediction, luciferase-reporter assay, and Western blot analysis that miR-132 could directly regulate expression of Nurr1, a critical transcription factor for midbrain dopamine neuron development and differentiation [70]. Additionally, Yang et al. showed that inhibition of endogenous miR-132 significantly increases differentiation of dopamine neurons, whereas prolific expression of miR-132 in embryonic stem cells dramatically represses dopamine neuron differentiation with no effect on the total number of neurons [70]. As a potential regulator of methyl-CpG-binding protein, an important component of neurodevelopment and neurodegeneration, miR-132 is a prospective molecule of interest in PD diagnosis and treatment [70].

Sample size for serum consisted of PD (n = 32), PDD (n = 18), AD (n = 53) and Control (n = 62) subjects. Results were filtered at corrected p-value <0.05. The logarithmic base 2 fold change (FC) is relative to the first listed group for each comparison. P-Values are adjusted for multiple corrections. doi:10.1371/journal.pone.0094839.t008

Conclusion
One of the first decisions most researchers studying markers of neurodegeneration must consider before they begin a project is what tissue or biofluid to profile. We provide a comprehensive examination of miRNAs detected in CSF and blood from the same patients and a comparison to reported miRNAs deregulated in brain tissue from AD and PD. In living patients, accessible tissue samples are limited. Among readily available biofluids, we can examine urine, saliva, and serum; CSF is more difficult to obtain. Although recently saliva and salivary gland biopsies have been shown to contain potential markers of PD, the utility of urine and saliva samples for profiling neurodegenerative disease or central nervous system damage still needs further examination [71]. For this study, we concentrated our analysis on CSF and serum from blood. CSF is in close proximity to the diseased tissue, but is often difficult to obtain from subjects. Blood is easier to acquire, but may not reliably reflect changes associated with neurodegeneration. When we compared the miRNA profiles from the two biofluids, we found that miRNAs detected in CSF cluster patients slightly more effectively than miRNAs detected in SER (Figures S1, S2). However, depending on individual analyses, there appeared to be benefits to both biofluids. For example, 73% of the deregulated miRNAs identified in our CSF data from AD patients were previously reported. However, comparison of miRNAs that overlap between PD with PDD and AD with cognitively normal controls revealed changes only in SER samples.
There are many more studies and data available for miRNA deregulation in association with AD than with PD. We found deregulated miRNAs associated with both diseases and present in both CSF and SER biofluids; interestingly, there were consistently fewer miRNAs associated with PD in each of the analyses we performed. There are several reasons why this may be the case: 1) patients with AD have significant ongoing spread of the disease from one brain region to another, with severe plaque deposition and tangle pathologies, and these pathologies may be more significant drivers of miRNA deregulation and detection; 2) patients with PD display mild to moderate plaque and tangle pathology in addition to Lewy bodies, leading to potentially fewer detected miRNAs specifically indicative of the disease; and 3) by the time of death, the destruction of several of the specific brain regions and cell types associated with PD (substantia nigra and striatum) has already occurred. PD patients begin to experience symptoms upon the loss of 50-60% of dopaminergic neurons within the substantia nigra and severe depletion of dopamine in the striatum [1,72]. This may contribute significantly to a reduction in detectable disease-related miRNAs late in the disease.
We will continue to evaluate many of the miRNAs identified in this paper using additional methods and samples. We will use qRT-PCR as an additional assay for validation of differentially expressed miRNAs, as well as sequencing to validate the presence of miRNAs in SER from patients living with the disease, early in their diagnosis. We will also examine the possible enrichment of specific miRNAs within microvesicles or associated with extracellular RNA-binding proteins. Ultimately, validation of these miRNAs in larger patient cohorts will enable the research community to identify the critical miRNA biomarkers that are most clearly associated with specific neurodegenerative disorders and with the stage and severity of disease.

Samples and patient data
Ethics Statement. All subjects were enrolled in the Banner Sun Health Research Institute (BSHRI) Brain and Body Donation Program as whole-body donors and had previously signed informed consent approved by the BSHRI Institutional Review Board (IRB). The TGen Office of Research Compliance approved the use of the banked postmortem samples for this study. We obtained the following three groups of samples for this study from the Sun Health Research Institute, Sun City, AZ: AD (n = 67 CSF and n = 64 SER), PD (n = 65 CSF and n = 60 SER), and control (n = 70 CSF and n = 72 SER). Verification of the diagnosis using neuropathology evaluations was completed and reported for all samples. A comprehensive overview of the cohort and data collected is included in Table S1. Figure S1 displays no significant source of variation in samples due to age, gender, or postmortem interval (PMI).

RNA isolation and sequencing
Total RNA was isolated from 1 ml of CSF and 1 ml of SER from each subject as described in Burgos et al., 2013 [17]. Briefly, the miRVana PARIS kit (Invitrogen) was used with a modified protocol to extract total RNA and maximize miRNA yield. The Illumina TruSeq Small RNA sequencing kit was used for library preparation as previously described [17]. The samples were given individual barcodes (up to 48), pooled, and loaded on seven lanes of the Illumina HiSeq2000, with one lane of the flowcell used as a control for calculating phasing throughout the run. Samples were often sequenced on two different flowcells to maximize reads mapped to mature miRNA sequences in miRBase.

Post-sequencing analysis pipeline
Sequencing data generated by the Illumina HiSeq2000 were preprocessed as previously described in Metpally et al., 2013 [44] and aligned to the reference with miRDeep2 software [45]. The sequencing data were processed and de-multiplexed using Illumina's CASAVA (v1.8) pipeline. Quality control checks on raw fastq reads generated by CASAVA were performed with FastQC software. The FASTX toolkit was used for fastq pre-alignment processing, including adapter clipping and read collapsing, for better mapping results. Illumina three prime adapter sequences were removed by the fastx_clipper tool. Clipped reads were used as input to the miRDeep2 alignment software.
The processing of sequencing data using miRDeep2 consists of three modules. The Mapper module performs read preprocessing and alignment to the reference genome. Once reads are aligned, the miRDeep2 module excises genomic regions covered by the sequencing data in order to identify probable secondary RNA structure. Plausible miRNA precursors are evaluated and scored based on their likelihood of being true events. The Quantifier module produces a scored list of known and novel miRNAs with quantification and expression profiling. We used the default parameters suggested by the creators of the tool and allowed one single nucleotide variation (SNV). The csv files from miRDeep2 were used for further analysis. All sequencing data associated with the samples can be found under accession phs000727.v1.p1 in dbGaP.

Normalization and quality control
The miRNA read counts identified by miRDeep2 were normalized using the DESeq2 normalization method to account for library size and for compositional bias in the sequenced libraries. Assuming a typical DESeq2 count table (miRNAs in rows, samples in columns), the method computes a size factor for each sample as the median, over miRNAs, of the ratio of the sample's read count to the geometric mean of the corresponding row [73]. Raw counts were then divided by the size factor associated with their sample [73]. Under the DESeq2 normalization hypothesis, most genes are not differentially expressed (DE), leading to an expected ratio of 1. Therefore, the size factor for a sample is an estimate of the correction factor that needs to be applied to all read counts of the corresponding column in order to make samples comparable.
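The median-of-ratios computation described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the DESeq2 implementation: the function names and the toy count table are hypothetical, and skipping miRNAs that have a zero count in any sample (whose geometric mean is zero) is an assumption of this sketch.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per miRNA); each row holds raw counts per sample.
    Returns one size factor per sample (column).
    """
    n_samples = len(counts[0])
    # Assumption: rows containing a zero are skipped, because their
    # geometric mean is zero and the ratio is undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no miRNA has non-zero counts in every sample")
    # Log of the geometric mean of each usable row.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of this sample's count to the row geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy table (hypothetical numbers): sample 2 was sequenced twice as deeply.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

In the toy table the second sample has exactly twice the counts of the first for every usable miRNA, so its size factor is twice as large and the normalized counts of the two samples coincide.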
Quality control of miRNA expression data consisted of filtering both samples and miRNAs. Samples with a total sum of mapped read counts lower than 100,000 for CSF and 60,000 for SER were removed. Thresholds were determined based on the distribution of the total counts for all samples. Additionally, miRNAs with an average of fewer than 5 counts were not considered for further analysis.
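The two filtering steps can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the data layout (a dictionary of per-sample count dictionaries), the function names, and the sample and miRNA names are hypothetical, and the text does not say whether the per-miRNA average was taken before or after sample filtering; here it is taken over the retained samples.

```python
# Thresholds stated in the text; the dictionary keys are illustrative names.
MIN_TOTAL_READS = {"CSF": 100_000, "SER": 60_000}
MIN_MEAN_COUNT = 5


def filter_samples(counts_by_sample, biofluid):
    """Keep samples whose total mapped miRNA read count reaches the threshold.

    counts_by_sample: dict mapping sample id -> dict of miRNA name -> count.
    """
    threshold = MIN_TOTAL_READS[biofluid]
    return {
        sample: counts
        for sample, counts in counts_by_sample.items()
        if sum(counts.values()) >= threshold
    }


def filter_mirnas(counts_by_sample, min_mean=MIN_MEAN_COUNT):
    """Drop miRNAs whose average count across the given samples is below min_mean."""
    samples = list(counts_by_sample)
    names = set()
    for counts in counts_by_sample.values():
        names.update(counts)
    kept = {
        name
        for name in names
        if sum(counts_by_sample[s].get(name, 0) for s in samples) / len(samples) >= min_mean
    }
    return {
        s: {name: c for name, c in counts_by_sample[s].items() if name in kept}
        for s in samples
    }


# Hypothetical serum samples and miRNA names, for illustration only.
toy = {
    "s1": {"miR-x": 70_000, "miR-y": 4},
    "s2": {"miR-x": 90_000, "miR-y": 2},
    "s3": {"miR-x": 1_000, "miR-y": 50},
}
passed = filter_mirnas(filter_samples(toy, "SER"))
```

Here sample s3 falls below the serum threshold and is removed, after which miR-y averages 3 counts over the retained samples and is dropped.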

Differential expression
Differential expression analysis of miRNA read counts was performed using the DESeq2 (v2.1.0.19) package [74]. Three pairs of groups were compared in the CSF data: i) Control and Alzheimer's subjects, ii) Control and Parkinson's subjects, and iii) Alzheimer's and Parkinson's subjects. The same three comparisons were made in the SER data. The DESeq2 method is based on the negative binomial (NB) distribution, with a custom fit for the variance-mean dependence [74]. Upon normalization, dispersion is estimated by local regression for gamma-family generalized linear models, providing a basis for inference. The sums over all replicates for gene i under conditions A and B, $C_{iA}$ and $C_{iB}$, are evaluated as NB-distributed with moments as estimated and fitted. The p-value of a pair of observed count sums $(c_{iA}, c_{iB})$ is then the sum of all probabilities less than or equal to $p(c_{iA}, c_{iB})$, conditioned on the total $c_{iA} + c_{iB}$ [74]. We report differentially expressed miRNAs with log2 fold change (FC) > 0.7 or < -0.7, significant at adjusted p-value < 0.05.
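The reporting rule in the last sentence (absolute log2 fold change above 0.7 and adjusted p-value below 0.05) can be sketched as a simple filter. The text does not name the multiple-testing correction, so the Benjamini-Hochberg procedure used below is an assumption of this sketch; the function names, miRNA names and numbers are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_differentially_expressed(results, lfc_cutoff=0.7, alpha=0.05):
    """results: list of (miRNA name, log2 fold change, raw p-value) tuples."""
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        (name, lfc, q)
        for (name, lfc, _), q in zip(results, padj)
        if abs(lfc) > lfc_cutoff and q < alpha
    ]


# Hypothetical results, for illustration only.
toy_results = [
    ("miR-a", 1.2, 0.001),
    ("miR-b", -0.9, 0.01),
    ("miR-c", 0.3, 0.002),
    ("miR-d", 2.0, 0.2),
]
toy_padj = benjamini_hochberg([p for _, _, p in toy_results])
toy_hits = select_differentially_expressed(toy_results)
```

In this toy input, miR-c is excluded by the fold-change cutoff and miR-d by the adjusted p-value, leaving miR-a and miR-b.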

Regression analysis
To take advantage of the ordinal nature of regional and time-dependent characteristics present in AD and PD pathology, we implemented ordinal logistic regression (OLR) in order to detect miRNAs with monotonic expression patterns. The ordinal logistic model assumes the presence of a covert (latent) continuous variable and an ordinal outcome that arises from discretization of the underlying continuum into J ordered groups, j = 1, …, J [75]. Analysis of ordered categorical data was executed via cumulative link models (CLMs). The ordinal response variable $Y_i$ then follows a multinomial distribution with probability $\pi_{ij}$ that the ith observation falls in response category j. The ordinal logit considers the probability of a single event and all events that are ordered before it, hence incorporating the ordered nature of the dependent variable in the fit [75]. With cumulative probabilities $\gamma_{ij} = P(Y_i \le j) = \pi_{i1} + \dots + \pi_{ij}$, the cumulative logits, which incorporate the logit link, are defined as:
$$\mathrm{logit}(\gamma_{ij}) = \log \frac{P(Y_i \le j)}{1 - P(Y_i \le j)}, \qquad j = 1, \dots, J-1.$$
Let $X_i$ be a vector of explanatory variables, $\beta$ the corresponding set of regression parameters, and $\alpha_j$ the term that provides each cumulative logit with its unique intercept value. Then the cumulative logit model is a regression model for the cumulative logits, defined as (in the sign convention of the R package ordinal used for fitting):
$$\mathrm{logit}(\gamma_{ij}) = \alpha_j - X_i^{T}\beta, \qquad j = 1, \dots, J-1.$$
Four well-described signatures of AD and PD pathology were binned into ordinal categories and considered as OLR outcome variables: i) Braak neurofibrillary stages, ii) neurofibrillary tangle scores, iii) plaque-density scores and iv) synuclein/Lewy body stages. Neuropathology examination disclosed total Braak stages (1-6), neurofibrillary tangle scores (0-15), plaque-density scores (1-15) and Lewy body stages (no Lewy bodies; Limbic type; Neocortical type). For convenience, we binned the neurofibrillary tangle and plaque-density scores for each subject into three ordinal categories, in increasing increments.
The events of interest correspond to low neurofibrillary tangle score (0-4), moderate neurofibrillary tangle score (5-9) and high neurofibrillary tangle score (10-15). Similarly, for plaque-density data the three groups correspond to low plaque-density score (1-5), moderate plaque-density score (6-10), and high plaque-density score (11-15). Lastly, synuclein/Lewy body stage was divided into ordinal outcome categories as defined by the Unified Staging System for Lewy Body Disorders, corresponding to lowest progression (no Lewy bodies), moderate progression (Limbic type) and advanced progression (Neocortical type) [46]. The OLR method was used to model the relationship between the ordinal outcome variables and the explanatory predictor variable, namely normalized miRNA counts, using the R package ordinal. The built-in logit link function was used to determine factors associated with Braak, neurofibrillary tangle and plaque density stages. The cumulative link model assumes that thresholds are constant for all values of the explanatory variables. For reported miRNAs, the graphical method for assessing the parallel slopes assumption was used to check ordinal logit requirements. A modified Newton algorithm was used to optimize the likelihood function. The condition number of the Hessian did not indicate a problem with any of the models corresponding to reported miRNAs. Parameter confidence intervals were based on the profile likelihood function, and the estimates in the output are given in units of ordered log odds.
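The binning of scores into ordinal categories, and the way a cumulative logit model turns a predictor value into category probabilities, can be sketched as below. This is only an illustration: the study fitted its models with the R package ordinal, whereas this Python sketch merely evaluates a model with given parameters. The function names, thresholds and slope are hypothetical, integer-valued scores are assumed, and the parameterization logit P(Y <= j) = threshold_j - beta * x is an assumption of the sketch.

```python
import math


def bin_score(score, bins):
    """Return the 1-based ordinal category whose inclusive range contains score."""
    for category, (low, high) in enumerate(bins, start=1):
        if low <= score <= high:
            return category
    raise ValueError(f"score {score!r} is outside the defined bins")


# Bin edges taken from the text (low, moderate, high); integer scores assumed.
TANGLE_BINS = [(0, 4), (5, 9), (10, 15)]
PLAQUE_BINS = [(1, 5), (6, 10), (11, 15)]


def category_probabilities(thresholds, beta, x):
    """Category probabilities under a cumulative logit model with one predictor.

    Assumed parameterization: logit P(Y <= j) = thresholds[j] - beta * x,
    with thresholds in increasing order.
    """
    cumulative = [1.0 / (1.0 + math.exp(-(t - beta * x))) for t in thresholds]
    cumulative.append(1.0)  # P(Y <= J) is always 1
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs


# Hypothetical thresholds and slope, for illustration only.
p_low_expr = category_probabilities([-1.0, 1.0], beta=0.5, x=0.0)
p_high_expr = category_probabilities([-1.0, 1.0], beta=0.5, x=4.0)
```

With a positive slope, a larger predictor value shifts probability mass toward the highest category, which is the kind of monotonic pattern the regression is meant to detect.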
In addition to the usual hypothesis-testing approach, we decided to estimate the effect of a given variable on the response outcome and its precision. The objective of the model selection analysis is to evaluate whether the effect of a possible predictor is sufficiently important and, as such, to determine whether it is possible to make predictions based on a regression model that includes it as a parameter. The Akaike Information Criterion (AIC) is a particularly useful information theory approach for model selection when a number of variables are believed to have an effect on a process or a pattern.
For the same dataset with the same response variable, the "best" model is the one that minimizes the Kullback-Leibler value, or the information loss when approximating a real process [76]. In order to minimize the expected Kullback-Leibler information, it is necessary to maximize $E_y E_x[\log g(x \mid \hat{\theta}(y))]$ over a collection of admissible models, where g is the approximating model in terms of a probability distribution, y is the random sample from the density function f(y) for the unknown real process f, and $\hat{\theta}$ is the maximum likelihood estimate based on the model g and data y [76]. For a large sample, an approximately unbiased estimate of $E_y E_x[\log g(x \mid \hat{\theta}(y))]$ corresponds to $\mathrm{AIC} = -2 \log \mathcal{L}(\hat{\theta}(y)) + 2k$, where k is the number of estimated parameters included in the model and $\log \mathcal{L}(\hat{\theta}(y))$ is the log-likelihood of the model given the data, which reflects the overall fit of the model [77]. Essentially, AIC provides an indication of which model would best approximate reality, in terms of minimizing the loss of information, and gives a measure of the strength of evidence for each model.
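As a small worked illustration of the formula above, the sketch below computes AIC from a log-likelihood and a parameter count, then expresses each candidate model's AIC as a difference from the smallest AIC in the set. The model names, log-likelihoods and parameter counts are hypothetical values chosen for the example, not results from the study.

```python
def aic(log_likelihood, k):
    """AIC = -2 * log-likelihood + 2 * (number of estimated parameters)."""
    return -2.0 * log_likelihood + 2.0 * k


def aic_differences(aic_by_model):
    """Difference between each model's AIC and the smallest AIC in the set."""
    best = min(aic_by_model.values())
    return {name: value - best for name, value in aic_by_model.items()}


# Hypothetical fits: model name -> (maximized log-likelihood, parameter count).
fits = {"global": (-120.0, 4), "reduced": (-121.5, 2), "null": (-130.0, 1)}
aic_by_model = {name: aic(ll, k) for name, (ll, k) in fits.items()}
deltas = aic_differences(aic_by_model)
```

In this example the reduced model has the smallest AIC (247), the global model differs from it by 1, and the null model by 15, so the penalty of 2k lets a simpler model win even though its log-likelihood is slightly lower.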
For the acquired data, we tested a series of plausible models. The global model, defined as the most complex model considered, was constructed from the set of variables suspected of having an effect on the outcome variable (OLR, uncorrected p-value < 0.05, parameter estimate 95% confidence interval not including zero). Fit of the global model was assessed first. If the global model fit the data, simpler models derived from the global model were compared based on the weight of evidence that model i is the best approximation of the true mathematical model, given the data and the set of considered candidates [78]. The value of the AIC has no important meaning unless it is compared to the AIC of a series of alternative models. Note that a small Kullback-Leibler information discrepancy in a model corresponds to a small AIC value for the same model. The AIC differences, $\Delta_i$, quantify the information loss when one of the fitted models is used instead of the best approximating model. In general, $0 \le \Delta_i \le 2$ suggests substantial evidence for the model, $3 \le \Delta_i \le 7$ indicates the model has considerably less support, whereas $\Delta_i > 10$ signifies that the model is very unlikely due to essentially no support [78]. We considered predictor variables significant at unadjusted p-value < 0.05 and $\Delta_i \le 10$.

Supporting Information
Figure S1 Consensus clustering of CSF and SER data. Consensus clustering combined with resampling techniques constructs the consensus across multiple runs of a clustering algorithm, determines the number of clusters in the data, and assesses the stability of the generated clusters. Consensus matrices for agglomerative hierarchical clustering upon 1-Pearson correlation distances with 80% item and miRNA resampling were established from log-transformed normalized counts (AD, PD and control combined). The empirical cumulative distribution function (CDF) corresponding to the consensus matrices k = {2 (pink), 3 (yellow), 4 (blue), 5 (purple)} was plotted in order to establish the stability of the subsequent consensus matrices. Perfect agreement between consensus matrix entries translates into an ideal step function with little shape distortion as k approaches positive infinity. Due to the unimodal nature of the SER consensus matrix histogram, CSF data seems to demonstrate more stable clustering for the first five relevant clusters. (TIF)
Figure S2 Distribution of Silhouette scores for the first 15 clusters in CSF and SER data. Silhouettes quantify how well a data point assigned to a cluster was classified according to both the tightness of the clusters and the separation between them. Quality of the cluster assignment, as indicated by the average silhouette score, ranges from 1.0 for unequivocal cluster assignment down to -1.0 for arbitrary assignment. Unsupervised agglomerative hierarchical clustering of CSF and SER data (AD, PD and controls combined) was performed and the average silhouette score was estimated for each cluster. Despite the relatively low silhouette scores, CSF data seems to be more appropriately clustered than SER data, with tighter, more separated clusters. (TIF)
Table S1 Demographic information from study subjects. The samples are color-coded; blue = subjects with Alzheimer's disease, red = subjects with Parkinson's disease, yellow = control subjects. Subject IDs correspond to the data entered into dbGaP, gender (
Table S3 miRNAs with significant differential expression in serum samples when comparing PD patients and PD with a clinical diagnosis of dementia (PDD).