Biomarkers for diagnosis of childhood tuberculosis: A systematic review

Introduction As studies of biomarkers of tuberculosis (TB) disease provide hope for a simple, point-of-care test, we aimed to synthesize evidence on biomarkers for diagnosis of TB in children and compare their accuracy to published target product profiles (TPP). Methods We conducted a systematic review of biomarkers for diagnosis of pulmonary TB in exclusively paediatric populations, defined as age less than 15 years. PubMed, EMBASE and Web of Science were searched for relevant publications from January 1, 2000 to November 27, 2017. Studies using mixed adult and paediatric populations or reporting biomarkers for extrapulmonary TB were excluded. Study quality was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) framework. No meta-analysis was done because the published childhood TB biomarker studies were mostly early-stage studies and highly heterogeneous. Results The 29 studies included in this systematic review comprise 20 case-control studies, six cohort studies and three cross-sectional studies. These studies reported diverse and heterogeneous forms of biomarkers requiring different types of clinical specimens and laboratory assays. The majority of the studies (27/29 [93%]) either did not meet the criteria in at least one of the four domains of the QUADAS-2 reporting framework or the assessment was unclear. However, the diagnostic performance of biomarkers reported in 22 studies met one or both of the WHO-recommended minimal targets of 66% sensitivity and 98% specificity for a new diagnostic test for TB disease in children, and/or 90% sensitivity and 70% specificity for a triage test. Conclusion We found that the majority of the biomarkers for diagnosis of TB in children are promising but will need further refining and optimization to improve their performance. As new data are emerging, stronger emphasis should be placed on improving the design, quality and general reporting of future studies investigating TB biomarkers in children.



Introduction
Childhood tuberculosis (TB) is estimated to constitute approximately 5% of the TB caseload in low TB burden countries compared to an estimated 20-40% in high-burden countries [1,2]. However, notification of TB in children, and consequently estimates of the disease burden, remain notoriously inaccurate. This is primarily because confirming the diagnosis of paediatric TB is more challenging, owing to the paucibacillary nature of TB disease in children and the difficulty in obtaining good-quality respiratory specimens [3,4]. In support of this assertion, it is estimated that more than two-thirds of all childhood TB cases are either unreported or undiagnosed, while 96% of the 239,000 children who died from TB in 2015 were not on treatment [5,6]. The sensitivity of sputum smear microscopy in childhood TB is less than 15%, even with optimized methods such as centrifugation of samples and use of fluorescent microscopy [7]. While culture of Mycobacterium tuberculosis (M.tb) in biological samples is more sensitive than smear microscopy, bacteriological confirmation of paediatric TB by both mycobacterial growth indicator tube (MGIT) liquid culture and Löwenstein-Jensen (LJ) solid media seldom exceeds 40%, including when using gastric aspirates and induced sputum [8,9].
Although in adult studies the sensitivity and specificity of Xpert MTB/RIF (Cepheid, USA) is comparable to that of liquid culture [10][11][12], data from paediatric studies suggest that the sensitivity of Xpert is lower in children, and substantially lower among ambulant paediatric populations compared to paediatric inpatients [13][14][15][16][17][18][19]. In 2017, the World Health Organization (WHO) endorsed the use of the Xpert MTB/RIF Ultra cartridge (Ultra), based on the findings from a large multi-centre non-inferiority diagnostic accuracy study in adults with signs and symptoms of pulmonary TB [20]. The study reported that Ultra had 5% higher sensitivity relative to Xpert MTB/RIF (95% CI: +2.7, +7.8) but 3.2% lower specificity (95% CI: -2.1, -4.7), with sensitivity gains highest among smear-negative, culture-positive patients and in HIV-infected patients [21]. Preliminary data on the accuracy of Ultra testing of sputum for diagnosis of pulmonary TB in hospitalized children reported that Ultra detected 75.3% of cases positive by culture on the same sample, while the performance of Ultra was comparable to that of Xpert amongst children with a positive Xpert, Ultra or TB culture [22].
Diagnosis of childhood TB remains challenging with the current routine clinical and laboratory diagnostic tools. Thus, the need for a new, preferably non-sputum-based point-of-care (POC) diagnostic tool that could give a rapid and accurate diagnosis of TB disease in children is widely acknowledged. Although tests based on host immune response hold promise in this regard, no immune-diagnostic has been developed into a POC test that can distinguish between latent TB infection (LTBI) and TB disease, and more importantly between TB disease and other respiratory infections. Both the tuberculin skin test (TST) and interferon (IFN)-γ release assays (IGRA) fail to differentiate M.tb infection from TB disease [23][24][25].
Research into TB biomarkers has gained prominence due to the lack of suitable tests based on detection of the pathogen [26], and their potential for translation into a non-sputum based POC test [27]. The majority of studies investigating novel TB biomarkers utilized adult populations, while TB diagnostic research studies are traditionally conducted in adults with the findings usually extrapolated to children. It is unlikely that adult findings can be accurately extrapolated to paediatric populations given the considerable differences in clinical presentation, pathology and underlying immune responses to M.tb between adults and children [28]. However, studies of TB biomarkers in children are now emerging, and there is growing advocacy to include children as early as possible in research for new diagnostics with greater attention to addressing particular diagnostic challenges for children [29].
Therefore, the aim of this systematic review was to evaluate emerging biomarkers for diagnosis of TB in children aged less than 15 years, and to compare their diagnostic accuracy to the WHO-endorsed target product profiles (TPP) recommended for potential new diagnostics for TB in children [27].

Search strategy and selection criteria
We conducted a systematic review of biomarkers and multi-marker biosignatures for diagnosis of active tuberculosis in exclusively paediatric study subjects, defined as age less than 15 years, in studies published between January 1, 2000 and November 27, 2017. A copy of the protocol for this systematic review is included in the Supporting Information (S1 File). We structured the preparation and reporting of our systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines [30].
PubMed, EMBASE, and Web of Science were searched for relevant publications. In PubMed, searches used medical subject headings (MeSH), "text words" (tw) and titles (ti), and the 'English' filter was not applied. For EMBASE and Web of Science, the 'English' and 'Human' filters were used. For each database, the search term was transposed as appropriate. The PubMed search term used was as follows: (
Only studies of humans or that used human biological samples were eligible for inclusion. Biomarkers and multi-marker biosignatures, of immunological and microbiological origin, were included. Studies using adult or mixed adult and paediatric populations were excluded. Studies reporting biomarkers for extra-pulmonary TB (EPTB) detection were excluded. Index tests that required imaging techniques or detection from bacterial culture were excluded. As systematic reviews for interferon-gamma release assays (IGRA) and mycobacterial DNA (e.g. GeneXpert MTB/RIF, TB-LAMP) already exist [18,31-33], these biomarkers were not included in our study. Studies published in English and French were eligible for inclusion. Publications were screened by title and abstract by two reviewers (TT, EM) before full-text screening. TT and EM conferred to determine appropriateness of all selected articles.

Data extraction
The form utilized for data extraction was piloted for a separate systematic review (MacLean E. et al., unpublished) and further refined for this systematic review. For all eligible articles, double data extraction was performed by TT and EM using the structured Google form. A list of the fields for data extraction and the structured Google form used are included in the Supporting Information (S2 File).

Assessment of study quality
Two reviewers (TT, EM) assessed the quality of the studies using specific sets of criteria within four domains of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) framework [34]. As per QUADAS-2 guidelines, the selected questions were those deemed most relevant for identifying biases for studies included in the review. Each criterion was classified as either "Yes", "No", or "Unclear" when applied to the information that is available in the publications, as described in Table 1.

Diagnostic accuracy
Target product profiles for new TB diagnostic tests in adults and children have been published [27]. We reviewed the diagnostic performances of the biomarkers, where reported, to highlight biomarkers that met the WHO-recommended minimal targets of 66% sensitivity and 98% specificity for a new diagnostic test for TB disease in children, and/or 90% sensitivity and 70% specificity for a triage test.
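The comparison against the two sets of minimal targets can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the review: the names `Accuracy`, `targets_met` and `classify` are hypothetical, and the thresholds are the minimal targets quoted above expressed as proportions.

```python
from dataclasses import dataclass

# Minimal TPP targets quoted in the text, as (sensitivity, specificity) proportions.
DIAGNOSTIC_TARGET = (0.66, 0.98)
TRIAGE_TARGET = (0.90, 0.70)


@dataclass(frozen=True)
class Accuracy:
    """Reported accuracy of one biomarker (hypothetical container)."""
    sensitivity: float
    specificity: float


def targets_met(acc, target):
    """Return how many of the two minimal targets (0, 1 or 2) are met."""
    min_sens, min_spec = target
    return int(acc.sensitivity >= min_sens) + int(acc.specificity >= min_spec)


def classify(acc):
    """Count the targets met for each intended use."""
    return {
        "diagnostic": targets_met(acc, DIAGNOSTIC_TARGET),
        "triage": targets_met(acc, TRIAGE_TARGET),
    }


# Example inputs: two sensitivity/specificity pairs reported in this review.
il2_elispot = Accuracy(sensitivity=1.00, specificity=0.81)  # IL-2 ELISPOT [35]
micro_rna = Accuracy(sensitivity=0.96, specificity=1.00)    # circulating microRNAs [63]
print(classify(il2_elispot))
print(classify(micro_rna))
```

A count of 2 for a given use means both minimal targets are met; a count of 1 corresponds to biomarkers that met only one target for that use.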
We summarized the evidence by a review of the methodological characteristics of the published studies including immunological properties of the biomarkers, clinical samples and assays required, as well as assessment of study quality and reported diagnostic accuracy as described above. All data generated during this study are available from the corresponding author on request.

Table 1 (fragment). QUADAS-2 criteria, Domain 3: Reference standard (could the reference standard, its conduct, or its interpretation have introduced bias?)
Signaling Question 3: Is the reference standard likely to correctly classify the target condition?
Yes: Culture-based reference standard (could be composite standard) OR citation of valid diagnostic algorithm, e.g. American Thoracic Society
No: Reference standard did not include culture
Unclear: Inadequate description of reference standard to understand procedures
Applicability: Are there concerns that the target condition as defined by the reference standard does not match the question?

Results
A total of 1235 records were identified through database searches for studies published between January 1, 2000 and November 27, 2017, and two additional records were identified through reference lists. After removal of duplicates, 928 studies were screened using titles and abstracts; 98 full-text articles were assessed for eligibility and 29 studies were eventually included in the systematic review (Fig 1). Reasons for exclusion were: conference proceedings and abstracts only; language (not English or French); technique (imaging-based, culture-based, commercial IGRA); reviews (narrative review, systematic review, meta-analysis); or target of paper (epidemiological, molecular biology, cost effectiveness, vaccine or drug study; biomarkers for detection of LTBI, prediction of disease progression or treatment monitoring). Studies that only included adult patients were also excluded.
Table 2 summarizes the characteristics of the 29 studies in the systematic review, stratified by study design, including the statistical parameters used to assess diagnostic performance of the biomarkers when available. Most of the studies (20/29 [69%]) were published between 2010 and 2017. Six of the published studies were cohort studies, while the others used either a cross-sectional (n = 3) or case-control (n = 20) study design. Thirteen of the studies were conducted in Asia (India = 7; China = 4; Bangladesh = 2), six in Europe, five in sub-Saharan Africa, three in the Americas, and one in Australia. The 29 studies reported diverse and heterogeneous forms of biomarkers, which required different types of clinical specimens and utilized diverse techniques and laboratory assays for identifying biomarkers. The biomarkers include host-response markers comprising cytokine biomarkers (n = 7), cell surface biomarkers (n = 2), mRNA transcript signatures (n = 5), microRNA signatures (n = 2), antibodies (n = 9), and a metabolic signature (n = 1). Three studies investigated the utility of lipoarabinomannan (LAM), a mycobacterial cell wall antigen, for diagnosis of childhood TB.

Study quality
Patient selection (i.e. was a consecutive or random sample of patients enrolled?). Out of the six cohort studies, three publications explicitly stated that consecutive samples of eligible subjects were recruited [35,38,39]. Two studies were deemed to have used purposive sampling by conducting biomarker analysis on a sub-selection of eligible subjects because of cost [37], or by excluding IGRA-positive children with other respiratory diseases in their discovery cohort [36]. The sampling strategy was not clearly described in the study by Tebruegge et al [40]. Only one cross-sectional study and three out of 20 case-control studies reported that a consecutive sample of eligible study subjects was enrolled [41,50-52]. The remaining studies used either a convenience or purposive sampling strategy, or their sampling strategy was inadequately described.
Index test (i.e. was the conduct and interpretation of the index test blinded?). Four out of six cohort studies explicitly stated that the conduct and interpretation of the index test were blinded [35,37,38,40]. None of the cross-sectional studies, and only two case-control studies, reported blinding of the conduct and interpretation of the index test results [57,59].
Reference standard (i.e. is the reference standard likely to correctly classify the target condition?). All six cohort studies reported either a culture-based reference standard or a composite reference standard with citation of a valid diagnostic algorithm. In contrast, two of the three cross-sectional studies [42,43], and six of the 20 case-control studies reported such a reference standard [45,[56][57][58][59][60].
Patient flow and timing (were all patients included in the analysis?). All study subjects, after exclusions, were given the index and reference tests in three cohort studies [35,38,39], while some subjects could not be accounted for after exclusions in one study [40]. The description was inadequate to clearly ascertain patient flow and timing in the other two cohort studies [36,37]. Two cross-sectional and eight case-control studies showed that all patients were given the index and reference tests after exclusions. One cross-sectional study and one case-control study had patients that could not be accounted for after exclusions, while the description of patient flow and timing was unclear in 11 case-control studies (Fig 2).

Diagnostic performance of biomarkers
Table 2 shows the accuracy estimates from all the studies that reported such data. Cytokine biomarkers (IP-10, IL-2 and IL-13) in a case-control study by Armand et al demonstrated sensitivity ≥ 80% and specificity ≥ 98%, while circulating microRNAs in another case-control study by Zhou et al demonstrated sensitivity and specificity of 96% and 100% respectively [59,63]. These biomarkers met both recommended minimal targets for a new diagnostic test.
Seven studies reported biomarkers that met both minimal TPP targets for a new triage test. These include an IL-2 ELISPOT assay using recombinant M.tb antigen (secreted L-alanine dehydrogenase [AlaDH]) that distinguished active TB and LTBI in a prospective study with a sensitivity of 100% and specificity of 81% [35]. Also, anti-BCG IgG secreted from M.tb-specific plasma cells in a cross-sectional study of a new serological method (antibodies in lymphocyte supernatants) that distinguished children with TB and other diseases demonstrated sensitivity and specificity of 91% and 87% respectively [41]. Anti-Ag85C IgG [48], anti-BCG IgG [53], microRNA-31 [60], circulating microRNAs [63], and cytokine biomarkers (IP-10, IL-5 and IL-13) [59] reported in five of the case-control studies also met both minimal targets for a triage test. However, 15 studies reported biomarkers that met just one of the minimal TPP targets for a diagnostic or triage test. These biomarkers had sensitivity greater than 66% but specificity less than the minimum of 98% set for a diagnostic test, or demonstrated sensitivity less than 90% but specificity that exceeded the minimum of 70% for a new triage test [36-39, 43-47, 50, 54, 55, 57, 61, 62].

Discussion
The investigation and development of new TB diagnostics that are suitable for children has been highlighted as a research priority for the End TB Strategy by the WHO [64]. We conducted a systematic review of host-response and pathogen-derived biomarkers for diagnosis of pulmonary TB disease in children, assessed the quality of the included studies using the standardised QUADAS-2 framework, and compared the diagnostic performance of the candidate biomarkers to the published TPP recommended for new diagnostics for TB in children. In general, we found that the published childhood TB biomarker studies were mostly early-stage studies and highly heterogeneous in terms of the specific type of biomarkers, clinical samples, test methods, and reference standards used for diagnosis of pulmonary TB disease in children. Therefore, we did not perform a meta-analysis.
An optimally designed diagnostic accuracy study is a prospective study with a blind comparison of the index and reference tests in consecutively recruited study subjects from a relevant clinical population [65,66]. If a diagnostic or triage test is to be applied in multiple settings globally, then an optimal diagnostic accuracy study should also be multi-centre and/or performed in multiple diverse geographical locations and populations. The majority of the studies included in our systematic review used a case-control study design with the selection of children defined as having TB disease and comparison groups that include healthy uninfected or M.tb infected controls in most cases. Other studies enrolled children with suspected TB disease referred for investigations to ascertain their diagnosis, using either a cohort or cross-sectional study design.
In our assessment of study quality, we found that most case-control studies were at an unclear or high risk of bias. Included case-control studies typically either did not utilize a consecutive sampling strategy or the sampling strategy was unclear. Generally, it was unclear if reported biomarker results were interpreted without knowledge of the results of the reference standards. Most of the cohort and cross-sectional studies that recruited children with suspected TB disease were also found to be at risk of bias because they either did not meet the criteria or assessment was not clear in at least one QUADAS-2 domain.
The risk of overestimating diagnostic accuracy is much higher in studies that use a case-control study design compared to other designs [67,68]. A meta-analysis that investigated the importance of 15 design features on estimates of diagnostic accuracy reported a relative diagnostic odds ratio (RDOR) of 4.9 for case-control studies that included healthy controls [69]. This means that diagnostic accuracy studies that include healthy controls are likely to overestimate diagnostic performance, as measured by the diagnostic odds ratio, almost five-fold. The inclusion of healthy controls is a design deficiency that lowers the occurrence of false-positive results and thereby inflates specificity [70].
The lack of a sensitive and specific reference standard for TB disease in children and of standardized case definitions are known to constitute major challenges to the assessment of accuracy of new diagnostic tools for childhood TB and for comparison of findings between diagnostic studies [71,72]. Therefore, we compared the reported diagnostic performance of the biomarkers in our systematic review to the minimal targets of diagnostic performance recommended in a WHO-endorsed TPP for new TB diagnostic tests in children [27]. For a new diagnostic test in children, a sensitivity of ≥ 66% for intrathoracic TB is considered optimal, as this can currently be achieved using appropriate samples with Xpert, while the specificity should be ≥ 98% against a microbiological reference standard [18,27]. The sensitivity of a triage test should ideally be as high as that of the confirmatory test, but if a triage test could be conducted at lower levels of care and is easier to do, then conceivably more children with a higher likelihood of TB disease will be identified even if its sensitivity is lower than that of the confirmatory test [27]. As such, the minimal sensitivity and specificity for a new triage test were set at 90% and 70% respectively, in order to make such triage testing potentially cost-effective in an implementation strategy.
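To make the RDOR concrete, the following worked calculation is offered as an illustration. The starting values of 80% sensitivity and specificity are hypothetical and are not taken from any included study; the definitions of the diagnostic odds ratio (DOR) and RDOR are the standard ones.

```latex
\[
\mathrm{DOR} = \frac{TP \times TN}{FP \times FN}
             = \frac{\mathrm{sens}}{1-\mathrm{sens}} \times \frac{\mathrm{spec}}{1-\mathrm{spec}},
\qquad
\mathrm{RDOR} = \frac{\mathrm{DOR}_{\text{with design feature}}}{\mathrm{DOR}_{\text{without design feature}}}
\]
\[
\text{Hypothetical unbiased estimate: } \mathrm{sens} = \mathrm{spec} = 0.80
\;\Rightarrow\; \mathrm{DOR} = \frac{0.80}{0.20} \times \frac{0.80}{0.20} = 16
\]
\[
\mathrm{RDOR} = 4.9 \;\Rightarrow\; \mathrm{DOR}_{\text{observed}} \approx 4.9 \times 16 = 78.4,
\quad \text{close to } \mathrm{sens} = \mathrm{spec} = 0.90
\;\left(\frac{0.90}{0.10} \times \frac{0.90}{0.10} = 81\right)
\]
```

In this example, a roughly five-fold inflation of the odds ratio corresponds to sensitivity and specificity each appearing about ten percentage points higher, not to a five-fold increase in sensitivity or specificity themselves.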
Two case-control studies by Armand et al and Zhou et al reported cytokine biomarkers and circulating microRNAs respectively, which met both minimal sensitivity and specificity targets for a diagnostic or triage test [59,63]. Overall, the majority of the studies in this review reported biomarkers that met one or both of the minimal sensitivity and specificity TPP targets for use either as a diagnostic or triage test in children, which makes them promising. Biosignature thresholds are often set to obtain an optimum accuracy using Receiver Operating Characteristic (ROC) analysis [73,74]. It is possible to re-optimise such thresholds for the biomarkers that met just one of the minimal TPP targets, which could further increase the sensitivity or specificity of the biomarkers toward meeting both targets either for a diagnostic or triage test. However, these results should be interpreted cautiously, taking into consideration the assessment of the quality of the individual studies and the potential for overestimation of diagnostic performance. In particular, findings from the case-control studies were deemed to have a high risk of bias from assessment of their quality, with very probable overestimation of the reported diagnostic performance as discussed earlier.
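The idea of re-optimising a threshold toward a TPP target can be sketched as follows. This is a minimal illustration under stated assumptions, not a method used by any included study: the function names are hypothetical, the scores and labels are made-up toy data, and a higher score is assumed to indicate TB disease. A threshold chosen this way on the same data used to evaluate it will tend to look optimistic, so it would need confirmation in an independent cohort.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels, min_sensitivity):
    """Among thresholds reaching min_sensitivity, pick the most specific one.

    Returns (threshold, sensitivity, specificity), or None if none qualifies.
    """
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (t, sens, spec)
    return best


# Toy data (hypothetical): 1 = TB disease, 0 = no TB disease.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

# Fix sensitivity at the triage minimum (90%) and maximise specificity.
print(best_threshold(scores, labels, min_sensitivity=0.90))
```

The same function could be mirrored to fix specificity at 98% and maximise sensitivity for the diagnostic-test target.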
A number of the studies in this review investigated and reported the utility of antibodies for diagnosis of TB disease in children, including a novel serological assay called antibodies in lymphocyte supernatant [41,53]. Although the WHO encourages research in serological tests, the organization has recommended against the use of the currently available commercial antibody-based tests for TB diagnosis [75].
Critically, the failure of almost all studies to clearly articulate the intended use case of their biomarker-based diagnostic test, and to benchmark a biomarker towards it, has been highlighted as one of the key issues that limit the impact and translation of biomarkers into new diagnostic tests [76]. None of the studies in our systematic review clearly stated the intended use of the reported biomarkers either as a diagnostic or triage test in children. Furthermore, it has been suggested that an "ideal" biomarker (or set of biomarkers) that could be developed into an accurate test for TB in children should fulfil the following characteristics: (i) measurable in small volumes of readily obtainable samples such as blood, urine, stool, saliva, etc.; (ii) identify M.tb with high sensitivity and specificity regardless of age, nutritional status or HIV status; (iii) distinguish children with active TB disease from latently infected children with other respiratory infections; and (iv) suitable for incorporation into a diagnostic platform that would provide rapid results at or near the point of care [77]. While the performance of the majority of the biomarkers in this review is promising, most of the biomarkers will need further refining and optimization, taking into consideration these characteristics of an "ideal" biomarker. As such, the biomarkers should be evaluated in stronger and better-designed prospective studies to limit risk of bias and to assess the feasibility of incorporating them into diagnostic platforms implementable in high TB burden settings.
Our systematic review has limitations. A formal assessment of publication bias was not performed; existing methods like funnel plots or regression tests are not helpful for diagnostic accuracy studies [78]. Additionally, it is always possible in a systematic review that relevant publications were not identified in the search. However, the search term was constructed with the assistance of a medical librarian, and the term was adapted from a previously used term that was extensively calibrated (MacLean E. et al, unpublished).

Conclusions
The fact that most of the studies investigating TB biomarkers in children were published within the last seven years supports the assertion that such data are now emerging. However, the results from this systematic review suggest that stronger emphasis needs to be placed on improving the design, quality, and general reporting of studies investigating childhood TB biomarkers. In particular, future research studies in this area should target their biomarker research toward the TPP for a new diagnostic and/or triage test intended for use in children.
In addition, such studies should be multi-centre studies performed in diverse geographical locations and populations, such that the diagnostic or triage test can be applied in multiple settings globally. Another approach is to conduct side-by-side/parallel diagnostic accuracy studies using prospective cohorts to benchmark the performance of different triage or diagnostic tests against each other. These approaches will enhance the reliability, comparability, and reproducibility of the results, as well as the potential to translate the findings to the clinic while promoting more collaborative research. We hope that this systematic review will help to provide targeted guidance for further scientific exploration toward the eventual development of the next-generation POC test for the rapid and accurate diagnosis of TB disease in children.