miRNA Signatures in Sera of Patients with Active Pulmonary Tuberculosis

Several studies have shown that measuring levels of specific circulating microRNAs (miRNAs) is a non-invasive, rapid, and accurate method for diagnosing diseases or detecting alterations in physiological conditions. We aimed to identify a serum miRNA signature for the diagnosis of tuberculosis (TB). To account for variation due to genetic makeup, we enrolled adults from two study settings, in Europe and Africa. The following categories of subjects were considered: healthy (H), active pulmonary TB (PTB), active pulmonary TB with HIV co-infection (PTB/HIV), latent TB infection (LTBI), other pulmonary infections (OPI), and active extra-pulmonary TB (EPTB). Sera from 10 subjects of the same category were pooled and, after total RNA extraction, screened for miRNA levels by TaqMan low-density arrays. After identifying “relevant miRNAs”, we refined the serum miRNA signature discriminating between H and PTB in individual subjects. The diagnostic performance of the signatures was analyzed using a multivariate logistic model and a Relevance Vector Machine (RVM) model, and a leave-one-out cross-validation (LOOCV) approach was adopted to assess how both models would perform in practice. The analysis of pooled specimens identified selected miRNAs as discriminatory for the categories analyzed. In individual serum samples, we showed that 15 miRNAs serve as a signature for the H and PTB categories, with a diagnostic accuracy of 82% (CI 70.2–90.0) and 77% (CI 64.2–85.9) in an RVM and a logistic classification model, respectively. Taking ethnicity into account, the signature specific to the European group (10 miRNAs) increased the diagnostic accuracy to 83% (CI 68.1–92.1) and 81% (CI 65.0–90.3), respectively, and the African-specific signature (12 miRNAs) increased it to 95% (CI 76.4–99.1) and 100% (CI 83.9–100.0), respectively.
Serum miRNA signatures are a promising source of biomarkers for TB disease, with the potential to discriminate not only between PTB and LTBI but also among the other categories.


Introduction
Tuberculosis (TB) remains one of the most important infectious diseases, with nearly 9 million cases and 1.4 million deaths per year worldwide [1]. Because infection with members of the Mycobacterium tuberculosis complex has complex clinical presentations (latent asymptomatic infection, active pulmonary and/or extra-pulmonary disease), accurate classification of cases is essential to guide the most appropriate clinical management [2][3][4].
Current standards for TB diagnosis, including the most sensitive molecular tests, rely on detection of the pathogen and therefore depend on the bacterial load in the specimen analyzed; for this reason, diagnosing TB in children is difficult, because the mycobacterial load is often low [5][6][7]. Similarly, extra-pulmonary TB (EPTB) is often challenging to diagnose, owing to the difficulty of obtaining samples for microbiological investigation and to the unpredictable distribution of bacteria in tissues. Therefore, EPTB and smear-negative pulmonary TB (PTB) are usually diagnosed ex juvantibus, i.e. on the basis of the response to empirical treatment. Finally, despite the importance of clearly discriminating between latent TB infection (LTBI) and active TB, clear-cut biological markers separating the two conditions are not yet available [8][9][10]. Biomarkers and surrogate endpoints are therefore crucial tools for the development of innovative strategies for TB management [11].
The development of diagnostic tests based on host biomarkers is advocated to clinically categorize paucibacillary or microbiologically unconfirmed TB cases, and to properly identify LTBI cases [11,12]. The need for biomarkers extends beyond the urgency for improved diagnostic tools: assessing disease status, the risk of progression to active disease, and treatment success early during therapy are critical bottlenecks in the development of new vaccines and drugs, and such assessments could allow therapy to be tailored to the individual risk of an unfavorable outcome.
The ideal biomarker should be a stable molecule present in sufficient amounts for easy detection in accessible body fluids [9,13]. The discovery that human microRNA (miRNA) expression is frequently altered in various diseases has uncovered a new repertoire of molecular factors that warrants investigation to further elucidate their role in physiology and disease [14]. In the cell, miRNAs post-transcriptionally regulate the expression level of target genes [15]. This highly conserved regulatory system is not confined to the intracellular compartment: miRNAs can be transferred via body fluids (including plasma, serum, urine, and saliva), thus modulating translational responses through intercellular communication [16][17][18][19].
miRNAs could represent ideal biomarkers, owing to the ease of sampling and their inherent stability and resilience [19,20]. However, their characterization during viral or bacterial infection has attracted interest only recently, and data on TB remain limited to a few studies [21][22][23].
This research aimed to identify miRNA profiles associated with the different phases of M. tuberculosis infection (PTB, EPTB, LTBI), with non-tubercular lung infections, and with the healthy condition, and to show their use as specific signatures for PTB diagnosis.

Ethical Statements
The protocol of the study was approved by the Ethical Committee of the San Raffaele Scientific Institute, Milano, Italy (GO/URC/ER/mm prot. N. 82/DG) and of the participating institutions in Uganda and Tanzania. The study was conducted in full compliance with the principles of the Declaration of Helsinki. All samples were collected from individuals who had signed an informed consent form for the purpose of the study and for cryopreservation of their biological samples.

Study Population
The following case definitions were adopted to categorize individuals enrolled in the study:
All subjects included in the study underwent the following procedures:
- phlebotomy through a 21G butterfly device to minimize hemolysis during specimen collection; additive-free blood collection tubes were chosen to minimize unwanted modification of miRNA content in the serum;
- collection of clinically relevant data to determine TB status (with relevant confirmatory exams) and HIV status.
A questionnaire capturing additional information, such as pregnancy, smoking, current medical problems (diabetes, transplant, silicosis, sarcoidosis, cancer), and current therapies (with particular focus on immunosuppressive, antiretroviral and anti-TB drugs), was completed for all patients at enrolment.
All information was stored in an electronic data-protection system.
Enrolment and exclusion criteria of the study population are summarized in Figure 1.

Serum Preparation
Within 4 hours of phlebotomy, after coagulation, tubes were centrifuged at 2500 rpm for 10 minutes to separate serum from the corpuscular fraction. The serum fraction was then transferred, in a sterile environment, into 15 mL tubes and centrifuged a second time at 2800 rpm for 10 minutes to achieve maximum removal of the cellular component. Serum was then divided into 1 mL aliquots in cryogenic vials and stored at −80 °C. For RNA extraction, sera were thawed on ice and the degree of hemolysis was determined by spectrophotometric analysis of free hemoglobin as previously described [24]. Samples with a hemoglobin concentration >10 mg/dL were considered hemolyzed.

Serum Pools
Each pool for qRT-PCR analysis was composed of 10 non-smoker subjects from the same category, equally distributed by gender. Selected subjects were free of co-morbidities (among the conditions investigated during the enrolment interview) to avoid confounding effects on miRNA profiles. A serum aliquot from each subject was thawed on ice, and 500 µL of serum from each sample were mixed together to obtain a homogeneous pooled serum sample. One mL of pooled serum was used for RNA extraction and subsequent miRNA analysis in duplicate.

Serum from Single Individuals
To refine the results obtained from pooled sera, we performed individual serum analysis on 18 subjects from each of the PTB and H categories of the TBnew group, and on 10 subjects from each of the same categories of the TB CHILD group. As for pooled sera, individuals free of co-morbidities were selected among non-smokers. A serum aliquot from each subject was thawed on ice, and 1 mL of serum from each individual was used for RNA extraction and subsequent miRNA analysis.

RNA Extraction
RNA extraction was performed using the mirVana miRNA isolation kit (Life Technologies) according to the manufacturer's instructions for isolating total RNA. RNA samples were stored at −80 °C until use. [25]. We first preprocessed raw Ct values by quantile normalization, as described elsewhere [26][27][28]. This widely used approach assumes that only a few miRNAs are differentially expressed, so that the overall Ct distribution should be similar across samples. In general, the method yields samples with identical Ct distributions and increases the correlation between observations compared with raw data. The distribution of normalized data was inspected graphically.
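Quantile normalization, as described above, can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a matrix with miRNAs in rows and samples (arrays or pools) in columns, and assumes that undetected assays have already been assigned a fixed Ct (a hypothetical convention) so that no values are missing; ties are broken arbitrarily by sort order rather than averaged.

```python
import numpy as np


def quantile_normalize(ct: np.ndarray) -> np.ndarray:
    """Quantile-normalize a (miRNAs x samples) matrix of Ct values.

    Each column is ranked, the mean Ct at each rank is computed across
    columns, and every value is replaced by the mean of its rank, so all
    samples end up with an identical Ct distribution.
    """
    ct = np.asarray(ct, dtype=float)
    order = np.argsort(ct, axis=0)                          # row order of each sorted column
    rank_means = np.take_along_axis(ct, order, axis=0).mean(axis=1)
    normalized = np.empty_like(ct)
    for j in range(ct.shape[1]):
        normalized[order[:, j], j] = rank_means             # k-th smallest gets k-th rank mean
    return normalized


# Toy example: 3 miRNAs x 3 samples (hypothetical Ct values)
ct = np.array([[20.0, 24.0, 21.0],
               [25.0, 22.0, 26.0],
               [30.0, 33.0, 31.0]])
print(quantile_normalize(ct))
```

After normalization every column contains the same set of values (the rank means), so differences between samples are expressed only through the ranking of miRNAs within each sample.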

Statistical Analysis
Results from pools and individuals were analyzed separately. Pools. We performed one-to-one category comparisons of mean Ct values by fitting a constrained regression model with MM robust estimators [29,30]. These robust estimates have a high breakdown point and are not affected by outliers or by differentially expressed miRNAs. We then computed the Empirical Distribution Function of the residuals, selected the miRNAs whose residuals fell outside the interquartile range (i.e. outside the interval between the 1st and 3rd quartiles) of the residual distribution, and defined them as “interesting miRNAs” or “relevant miRNAs”. Data were visualized in a circular layout with the Circos software [31].
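The residual-based screening of pools can be sketched as follows, under stated assumptions. The text does not specify the constraint used in the MM regression, so this sketch swaps in a different robust line fit, the Theil-Sen estimator (median of pairwise slopes), implemented with NumPy only; the helper names `theil_sen` and `relevant_mirnas` and all Ct values are hypothetical. miRNAs whose residual falls outside the interquartile range of all residuals are flagged.

```python
import numpy as np


def theil_sen(x, y):
    """Robust line fit: slope = median of all pairwise slopes, intercept = median residual.

    Swapped in for illustration; the study used MM robust estimators.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(x), k=1)                     # all pairs i < j
    dx = x[j] - x[i]
    keep = dx != 0                                          # skip pairs with equal x
    slope = np.median((y[j] - y[i])[keep] / dx[keep])
    intercept = np.median(y - slope * x)
    return slope, intercept


def relevant_mirnas(names, mean_ct_a, mean_ct_b):
    """Flag miRNAs whose residual (category B vs A) lies outside the IQR of all residuals."""
    a = np.asarray(mean_ct_a, dtype=float)
    b = np.asarray(mean_ct_b, dtype=float)
    slope, intercept = theil_sen(a, b)
    residuals = b - (intercept + slope * a)
    q1, q3 = np.percentile(residuals, [25, 75])
    outside = (residuals < q1) | (residuals > q3)
    return [name for name, flag in zip(names, outside) if flag], residuals


# Hypothetical mean Ct values of 8 miRNAs in two pooled categories
names = [f"miR-{k}" for k in range(1, 9)]
pool_a = [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0, 34.0]
pool_b = [20.1, 21.9, 24.0, 26.2, 27.8, 30.0, 35.0, 34.1]
flagged, res = relevant_mirnas(names, pool_a, pool_b)
print(flagged)
```

Because the robust fit is driven by the bulk of miRNAs, a single strongly shifted miRNA (miR-7 in the toy data) barely moves the line and stands out with a large residual.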
Individuals. As a first step, we retained the miRNAs detected with Ct < 35 in at least 80% of subjects in at least one of the categories considered (H in TBnew, PTB in TBnew, H in TB CHILD, and PTB in TB CHILD). For these miRNAs we performed a two-way ANOVA with health status and genetic makeup (defined by country of birth) as factors. P-values were computed non-parametrically by permutation [32]. We controlled the False Discovery Rate (FDR) with the method described by Benjamini and Yekutieli [33]. miRNAs that both (i) had an adjusted p-value (p-adj) < 0.05 in individuals and (ii) were relevant in the analysis of pooled specimens were considered for the definition of the miRNA signature.
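The detection filter and the FDR adjustment described above can be sketched as below. Both helper names are hypothetical; the sketch assumes Ct values are held per category in (miRNAs × subjects) arrays with undetected reactions encoded as NaN, and it implements the Benjamini-Yekutieli step-up adjustment directly. The permutation-based two-way ANOVA itself is not reproduced here.

```python
import numpy as np


def detection_filter(ct_by_category, threshold=35.0, min_fraction=0.8):
    """Boolean mask of miRNAs with Ct < threshold in >= min_fraction of the
    subjects of at least one category.

    ct_by_category maps a category label to a (miRNAs x subjects) array;
    undetected reactions are NaN, and NaN < threshold evaluates to False.
    """
    keep = None
    for ct in ct_by_category.values():
        fraction_detected = np.mean(np.asarray(ct, dtype=float) < threshold, axis=1)
        passes = fraction_detected >= min_fraction
        keep = passes if keep is None else keep | passes
    return keep


def benjamini_yekutieli(pvalues):
    """Benjamini-Yekutieli FDR-adjusted p-values (valid under arbitrary dependence)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))                 # harmonic correction factor
    order = np.argsort(p)
    scaled = p[order] * m * c_m / np.arange(1, m + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]      # enforce monotonicity
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


# Hypothetical Ct values: 3 miRNAs x 5 subjects per category
ct_h = np.array([[30, 31, 32, 33, 40],
                 [36, 37, 38, 39, 30],
                 [np.nan, 34, 34, 34, 34]])
ct_ptb = np.array([[40, 40, 40, 40, 40],
                   [30, 31, 36, 36, 36],
                   [40, 40, 40, 40, 40]])
print(detection_filter({"H": ct_h, "PTB": ct_ptb}))
print(benjamini_yekutieli([0.01, 0.04, 0.03, 0.5]))
```

Since a miRNA only needs to pass in one category, a miRNA detected almost exclusively in PTB (or in H) is kept, which preserves potential on/off markers.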
Performances of the signature. To assess the performance of each single miRNA in identifying health status, a Receiver Operating Characteristic (ROC) curve was fitted with the kernel density method described in [34]. As an overall measure of performance in distinguishing cases, the associated area under the curve (AUC) was calculated, and p-values were computed by permutation.
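A simplified sketch of a single-miRNA AUC with a permutation p-value follows. Note that it swaps in the empirical (Mann-Whitney) AUC instead of the kernel-smoothed ROC curve of [34], and it uses the two-sided statistic |AUC − 0.5| so that miRNAs with either higher or lower Ct in PTB count as discriminating; the function names, number of permutations, and Ct values are assumptions.

```python
import numpy as np


def empirical_auc(cases, controls):
    """Probability that a case value exceeds a control value (ties count 1/2)."""
    cases = np.asarray(cases, dtype=float)[:, None]
    controls = np.asarray(controls, dtype=float)[None, :]
    return float(np.mean((cases > controls) + 0.5 * (cases == controls)))


def auc_permutation_test(cases, controls, n_perm=2000, seed=0):
    """Observed AUC and a two-sided permutation p-value based on |AUC - 0.5|."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(cases, float), np.asarray(controls, float)])
    n_cases = len(cases)
    observed = empirical_auc(cases, controls)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)                  # relabel subjects at random
        auc = empirical_auc(shuffled[:n_cases], shuffled[n_cases:])
        if abs(auc - 0.5) >= abs(observed - 0.5):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical normalized Ct values of one miRNA
ptb = [24.1, 23.8, 25.0, 24.6, 23.9]
healthy = [26.2, 25.9, 26.8, 25.4, 26.0]
print(auc_permutation_test(ptb, healthy))
```

In this toy example every PTB value has a lower Ct than every healthy value, so the AUC is 0 (perfect separation in the opposite direction), which the two-sided statistic still treats as maximal discrimination.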
To assess and compare the diagnostic performance of the identified miRNA signature, we fitted a multivariate logistic model selected by minimizing the Akaike Information Criterion (AIC), and a Relevance Vector Machine (RVM) model [35][36][37]. Unlike the Support Vector Machine, the RVM follows a Bayesian approach and yields posterior class probabilities, which makes its results more directly comparable with those of the logistic model. ROC curves and the associated AUC were also computed for the logistic model.
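The logistic branch of this analysis can be illustrated with a small NumPy sketch: a Newton-Raphson fit of a logistic model, its AIC (2k − 2 log L, where lower is better), and backward elimination that drops a predictor only while doing so lowers the AIC. The exact selection procedure used in the study (forward, backward, or stepwise) is not stated, so backward elimination is an assumption, as are the helper names and the simulated data. The sketch has no safeguard against complete separation, which can occur with many miRNAs and few subjects, and the RVM is not sketched.

```python
import numpy as np


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X must already contain an intercept column. Returns (beta, log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def aic(X, y):
    """Akaike Information Criterion, 2k - 2 log L (lower is better)."""
    _, loglik = fit_logistic(X, y)
    return 2 * X.shape[1] - 2 * loglik


def backward_aic(X, y, names):
    """Drop predictors one at a time while the AIC keeps decreasing."""
    def design(cols):
        return np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])

    selected = list(range(X.shape[1]))
    best = aic(design(selected), y)
    while selected:
        candidates = [(aic(design([c for c in selected if c != d]), y), d) for d in selected]
        candidate_aic, dropped = min(candidates)
        if candidate_aic >= best:
            break
        best = candidate_aic
        selected.remove(dropped)
    return [names[c] for c in selected], best


# Simulated data: x1 is informative, x2 is pure noise (hypothetical)
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(0.5 + 2.0 * x1)))).astype(float)
print(backward_aic(np.column_stack([x1, x2]), y, ["x1", "x2"]))
```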
A leave-one-out cross-validation (LOOCV) approach was adopted to assess how both the RVM and the AIC-selected logistic regression models would perform in practice. Performance was summarized in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and diagnostic accuracy.
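The LOOCV loop and the summary measures can be sketched as follows. The classifier is a hypothetical stand-in (nearest centroid), used only to show where the logistic or RVM model would be refitted: each subject is predicted by a model trained on all other subjects, and the pooled predictions are then summarized. Class 1 is assumed to denote PTB and class 0 H; the profiles are invented.

```python
import numpy as np


def nearest_centroid(X_train, y_train, x_new):
    """Hypothetical stand-in classifier: predict the class with the closer mean profile."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.linalg.norm(x_new - c1) < np.linalg.norm(x_new - c0))


def loocv_predict(X, y, fit_predict=nearest_centroid):
    """Predict each subject with a model refitted on all remaining subjects."""
    predictions = np.empty(len(y), dtype=int)
    for i in range(len(y)):
        train = np.arange(len(y)) != i                      # leave subject i out
        predictions[i] = fit_predict(X[train], y[train], X[i])
    return predictions


def _ratio(num, den):
    return num / den if den else float("nan")               # undefined when no cases


def diagnostic_summary(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and accuracy, with 1 = PTB and 0 = H."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical two-miRNA profiles for 6 subjects
X = np.array([[30.0, 28.0], [30.5, 27.5], [29.8, 28.2],
              [26.0, 31.0], [26.4, 30.6], [25.8, 31.3]])
y = np.array([0, 0, 0, 1, 1, 1])
print(diagnostic_summary(y, loocv_predict(X, y)))
```

Refitting the model inside the loop is what distinguishes LOOCV from simply scoring the training fit: the held-out subject never influences the model that classifies it.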

Study Population
A total of 311 subjects (159 males, 51.2%) were enrolled within the TBnew group. As summarized in Figure 2  Eight pools from the TBnew group and 11 pools from the TB CHILD group were considered for the analysis. We used a subset of individuals to refine the miRNA signatures identified on pooled specimens: individual serum analysis was performed for 36 subjects from the TBnew group and 20 subjects from the TB CHILD group (details are reported in Table S1).

Normalization of qPCR Data
The TLDA is a 384-well microfluidic card containing dried TaqMan® primers and probes. Array A focuses on more highly characterized miRNAs, while array B contains many of the more recently discovered miRNAs along with the miR* sequences. Together, the two panels (array A and array B) enable quantitation of the expression levels of up to 671 different miRNAs. This is accomplished by loading the cDNA product onto the array for PCR amplification and real-time analysis. MegaPlex Pools are designed to detect and quantitate up to 380 human miRNAs per pool. They rely on a set of stem-looped reverse transcription primers (MegaPlex RT Primers), which enable simultaneous cDNA synthesis for the miRNAs in the pool, and on a set of miRNA-specific forward and reverse primers (MegaPlex PreAmp Primers) intended for use with very small quantities of starting material. These primers enable unbiased preamplification of the miRNA cDNA targets by PCR before loading onto the TaqMan® MicroRNA Array.
After quantile normalization, the Ct values of four control assays detected on both array A and array B (ath-miR159a, MammU6, RNU44, and RNU48) were compared. As shown in Table S2, Ct values were consistent between pools and individuals, between array A and array B, and between the TBnew and TB CHILD groups. Normalized data from pooled and individual specimens are reported in Table S3.

Analysis of Serum miRNA Profiles in Pooled Samples
In the normalized qPCR data from pools, 277 miRNAs were undetectable in all categories (H, PTB, LTBI, OPI, EPTB, and PTB/HIV) of both groups (TBnew and TB CHILD).
The mean Ct value of each miRNA was calculated, and a one-to-one comparison between categories was carried out. Residual values are available in Table S4. Figure 3 summarizes, for each pair of categories compared, the number of miRNAs whose residuals fall outside the interval between the 1st and 3rd quartiles of the residual distribution. These tails are the most likely to contain the miRNAs that differ significantly between the two categories compared. According to this qualitative analysis based on the distribution of the residuals, between 120 and 172 serum miRNAs, depending on the comparison, could discriminate among the categories considered in this study. For example, 134 miRNAs were relevant in differentiating LTBI from PTB, whereas 132 miRNAs would allow discrimination between PTB and OPI. After filtering the pooled-specimen results according to clinically relevant categories, we identified putative serum miRNA signatures defining MTB infection, active TB, pulmonary disease, or any of the disease statuses considered in this study (Figure 4).
To refine this approach, we analyzed individual sera from the PTB and H categories.

Analysis of Serum miRNA Profiles in Individuals
Serum miRNA profiles from 18 H and 18 PTB subjects of the TBnew group, and from 10 H and 10 PTB subjects of the TB CHILD group, were analyzed.

Pools vs Individuals
Of the 20 miRNAs with a significant p-adj in the analysis of individuals (Table 1), 16 (80%) had already been identified as “relevant miRNAs” in the analysis of pooled specimens (Figure 5). Among these 16 miRNAs, nine showed differences between H and PTB pools in both groups, four only in the TB CHILD pooled specimens, and three only in the TBnew pools. The remaining four miRNAs with a significant p-adj in individuals had not been detected as relevant in the first screening on pooled specimens. Figure 5 also shows that the analysis of pools correctly excluded 429 of 439 “not relevant” miRNAs from further investigation.
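The agreement between the pooled screen and the individual-level analysis can be expressed as two proportions. The following is a minimal sketch using Python sets; the function name and the miRNA names in the usage example are hypothetical.

```python
def screening_concordance(pool_relevant, individual_significant, tested):
    """Share of individually significant miRNAs captured by the pooled screen,
    and share of non-significant miRNAs that the screen correctly excluded.

    All arguments are sets of miRNA names; `tested` holds every miRNA analyzed
    in individuals.
    """
    not_significant = tested - individual_significant
    captured = individual_significant & pool_relevant        # flagged by both analyses
    excluded = not_significant - pool_relevant              # correctly screened out
    return len(captured) / len(individual_significant), len(excluded) / len(not_significant)


# Hypothetical miRNA names
tested = {"a", "b", "c", "d", "e", "f"}
print(screening_concordance(pool_relevant={"a", "c"}, individual_significant={"a", "b"}, tested=tested))
```

Applied to the counts reported above, the two proportions are 16/20 = 80% of significant miRNAs captured and 429/439 ≈ 97.7% of non-significant miRNAs correctly excluded.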

Discussion
In TB biomarker research, published studies using highly multiplexed assays have focused on proteomics, gene expression, transcriptomics, and miRNAs as predictors of disease, disease recurrence, or drug-resistant infection [38][39][40][41][42][43][44][45][46][47]. Our study of serum miRNA signatures confirms the value of these biomarkers for TB disease classification, as previously reported by others [21][22][23]. However, previous studies did not consider the impact of genetic makeup on miRNA signatures: ethnicity, together with age and gender, could influence the levels of circulating miRNAs [48]. By comparing two populations with different genetic makeup, we showed that some population-specific miRNAs can increase the diagnostic accuracy for active TB.
While much research still focuses on assessing the quality of single biomarkers, there is emerging interest in panels of multiple candidate targets that are neither specific nor sensitive as single tests but perform well in combination. In this study we used both RVM and logistic regression, supported by a LOOCV approach, to evaluate the diagnostic performance of serum miRNAs as a signature rather than as single biomarkers. Requiring miRNAs to be detectable in at least 80% of the subjects of one category favors markers with higher diagnostic value; moreover, because detection in a single category was sufficient, our filtering approach did not introduce biases against potential on/off miRNAs between categories. Indeed, when a serum sample is tested, the diagnostic relevance of a miRNA rests on its level rather than on its presence or absence in the H and PTB categories.
To ascertain whether a serum miRNA signature could discriminate between different categories of patients, we used a restrictive stratification screening approach to minimize possible biases. Although subjects in the H category had a lower mean age than those in the other categories, we consider this a minor limitation of the study. Particular attention should be paid to specific subgroups (e.g. children and the elderly) in which age is expected to have a greater impact. Another possible limitation is the heterogeneity of genetic background (estimated on the basis of country of birth) in the TBnew group, which could partially explain the lower performance of the miRNA signature in this population. The results on pooled specimens identified differences in serum miRNA profiles among the categories analyzed. Interestingly, according to our qualitative analysis based on the Empirical Distribution Function of residuals, serum miRNAs could discriminate not only LTBI from PTB, but also PTB from OPI and EPTB. Indeed, the 134 miRNAs showing relevant differences in serum levels between LTBI and PTB subjects include a smaller subset that could be used as a specific signature to discriminate between these two categories. A similar approach could be applied to the 132 miRNAs showing relevant differences between PTB and OPI subjects and to the 124 miRNAs differentiating PTB and EPTB subjects. Further studies on cohorts of LTBI, OPI and EPTB individuals are needed to identify specific miRNA patterns and to evaluate their diagnostic accuracy. The discriminatory power of serum miRNAs is further supported by the fact that the findings for the H-PTB signature were confirmed in two different groups (TBnew and TB CHILD).
Additionally, the use of pooled specimens allowed us to halve the number of targets to be analyzed, by excluding miRNAs below the detection threshold or showing very little change across categories.
To refine our findings, we performed serum miRNA analysis on individual sera from H and PTB subjects. Table 5 summarizes the comparison between our results and previously published studies. In the first study on circulating miRNAs as biomarkers for TB, Fu and colleagues [21] found that 92 miRNAs had significantly different levels in the sera of healthy controls and PTB subjects: 59 miRNAs were down-regulated and 33 were up-regulated in the serum of TB patients. A one-to-one comparison is not possible because of different analytical platforms and normalization strategies, but some similarities between the study by Fu and our results can be observed. For example, miRNAs belonging to the let-7, miR-30, and miR-146 families differed significantly between H and PTB in both studies. miR-590-5p, miR-185, miR-660, let-7e, miR-25, miR-146a, and miR-885-5p were also differentially expressed between healthy controls and PTB subjects in the study by Qi and colleagues [22]. miR-197, which was slightly increased in our study, was also reported to be increased in sera from pulmonary TB patients by Abd-El-Fattah and colleagues [23]. However, these previous studies did not consider the genetic background of the subjects enrolled. In contrast, in the present study the inclusion of groups with different genetic backgrounds allowed us to better define the serum miRNA signatures associated with each health status. Indeed, subjects with the same status (i.e. H or PTB) showed different serum miRNA levels in the TBnew and TB CHILD groups in pooled specimens (Table S4), and comparing individual sera from the two populations revealed significant differences in the levels of several miRNAs (Table S5). Although larger population-based studies are still needed, our data support the hypothesis that genetic background could influence serum miRNA profiles.
Interestingly, by matching miRNA signatures from pooled specimens, individual specimens, and direction of variation (increase/decrease), we identified 7 common discriminatory miRNAs (let-7e, miR-148a, miR-192, miR-193a-5p, miR-451, miR-590-5p, miR-885-5p), plus three miRNAs specific for the TBnew group (miR-16, miR-25, miR-365) and five miRNAs specific for the TB CHILD group (miR-146a, miR-532-5p, miR-660, miR-223*, miR-30e). The diagnostic accuracy of each single miRNA was <75% (data not shown), whereas better results were achieved with the “signature” approach: AUC values were above 0.90, and the entire fifteen-miRNA signature provided a diagnostic accuracy between 77% and 82% in a LOOCV approach (logistic regression and RVM, respectively). Population-specific signatures further improved classification accuracy (81–83% for TBnew and 95–100% for TB CHILD). As mentioned before, serum miRNA signatures were less effective in classifying subjects of the TBnew group. We hypothesize that, despite the mild to moderate differences, the heterogeneity of genetic background in this group affects the classification performance of the miRNA signature. The higher number of miRNAs with discordant variations (in terms of increase/decrease) between individual and pooled specimens provides some evidence of the heterogeneity of the TBnew population.
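The confidence intervals reported with these accuracies appear consistent with a Wilson score interval for a binomial proportion. The sketch below assumes that method (it is not stated in the text) and assumes that the accuracies correspond to 46/56 and 43/56 correctly classified subjects for the fifteen-miRNA signature (18 + 18 + 10 + 10 = 56 subjects) and 19/20 for the TB CHILD RVM model; under these assumptions it reproduces the reported bounds (70.2–90.0, 64.2–85.9, and 76.4–99.1) to one decimal place.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% for z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Assumed counts of correctly classified subjects (not stated in the text)
for label, k, n in [("15 miRNAs, RVM", 46, 56),
                    ("15 miRNAs, logistic", 43, 56),
                    ("TB CHILD signature, RVM", 19, 20)]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.0%} (CI {100 * low:.1f}-{100 * high:.1f})")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains informative at 100% accuracy (for 20/20 it gives a lower bound near 83.9%), which matters for small groups such as TB CHILD.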
As for the function of extracellular miRNAs, current evidence suggests that, when taken up by recipient cells, they regulate the expression of target genes, with the peculiar capability of acting on several targets at a time and operating in a network with other extracellular/intracellular miRNAs [17,19]. A “hormonal” role has been attributed to extracellular miRNAs [17,19]. If this interpretation proves true, we could be facing a major advance in modern biology, not only for the understanding of biological complexity but also in terms of diagnostic and even therapeutic possibilities [49]. Target cells for extracellular miRNAs and the related targeting mechanisms are poorly understood, so the origin and function of circulating miRNAs should be interpreted with caution.
Here we described a serum miRNA signature discriminating H and PTB subjects. Despite these promising results, several challenges remain in the pre-analytical and analytical phases of circulating miRNA analysis. Large, accurate cohort studies are therefore required to validate PTB-specific miRNA signatures and to identify miRNA signatures for LTBI, EPTB and conditions (such as pneumonia, cancer, HIV infection and sarcoidosis) that often enter the differential diagnosis of TB. The inclusion of other sets of biomarkers (e.g. cytokines, antibodies) could also help achieve higher discrimination power among the closest categories.

Supporting Information
Table S1 Details on subjects included in pooled and individual specimens analyzed in the study. H: healthy