Defining an intermediate category of tuberculin skin test: A mixture model analysis of two high-risk populations from Kampala, Uganda

One principle of tuberculosis control is to prevent the development of tuberculosis disease by treating individuals with latent tuberculosis infection. The diagnosis of latent infection using the tuberculin skin test is not straightforward because of concerns about immunologic cross reactivity with the Bacille Calmette-Guerin (BCG) vaccine and environmental mycobacteria. To parse the effects of BCG vaccine and environmental mycobacteria on the tuberculin skin test, we estimated the frequency distribution of skin test results in two divisions of Kampala, Uganda, ten years apart. We then used mixture models to estimate parameters for underlying distributions and defined clinically meaningful criteria for latent infection, including an indeterminate category. Using percentiles of two underlying normal distributions, we defined two skin test readings to demarcate three ranges. Values of 10 mm or greater contained 90% of individuals with latent infection; values less than 7.2 mm contained 80% of individuals without infection. Contacts with values between 7.2 and 10 mm fell into an indeterminate zone where it was not possible to assign infection. We conclude that systematic tuberculin skin test surveys within populations at risk, combined with mixture model analysis, may be a reproducible, evidence-based approach to define meaningful criteria for latent tuberculosis infection.


Introduction
According to the World Health Organization (WHO), the global burden of tuberculosis peaked in 2000 and has since declined by 1.5% per year [1]. Although encouraging, this modest progress falls short of the Millennium Development Goals for tuberculosis elimination. In response to this persistent challenge of tuberculosis, the WHO launched its End TB Strategy in 2015 [2], which promotes integrated, patient-centered care and prevention, bold policies and supportive systems, and intensified research and innovation. In September 2018, the United Nations General Assembly held a high-level meeting to build political commitment and multisectoral action to eliminate tuberculosis [3].
With this new commitment to tuberculosis control, the Stop TB Partnership and the WHO now advocate for treatment of latent tuberculosis infection as a way to reduce the risk of tuberculosis among individuals at highest risk for disease [4]. Treatment of latent infection confers benefit not only to the individual but may also confer benefit to a population by shrinking the pool of infected individuals at risk for disease progression.
The diagnosis of latent tuberculosis infection is not straightforward because of persistent questions about the accuracy and reliability of the available diagnostic tests. The diagnosis of infection is made by demonstrating an immune response to antigens of Mycobacterium tuberculosis in the absence of clinically active tuberculosis disease. The tuberculin skin test (TST) is a century-old method for assessing tuberculosis infection, but it is limited by immunologic cross-reactivity with Bacille Calmette Guerin (BCG) vaccine and environmental mycobacteria [5,6], by immunologic boosting [7][8][9] with repeated tests, and effects of immunosuppression [10]. There are also logistical issues in obtaining good quality tuberculin, maintaining the cold chain, and the need for two separate visits from the health worker [11]. The recent development of interferon-gamma release assays (IGRA) has mitigated some of these concerns about TST [12], but the performance of IGRAs in endemic areas is inconsistent and not fully validated [13][14][15][16]. Indeed, in settings where immunosuppression may be common, both tests used in tandem may yield the highest sensitivity [17]. As countries scale up their capacity to prevent tuberculosis, it is likely that the TST will remain the mainstay of diagnosis of latent tuberculosis infection because the test is less expensive and more widely available than IGRA.
The use of TST as the method of diagnosis of latent infection is controversial because there is longstanding and ongoing debate about how best to interpret the results of TST in populations where BCG vaccination is widely used. Some have argued that the effect of cross-reactivity due to BCG vaccination may be minimal and does not change the interpretation of the skin test [18], whereas others have argued the opposite [19,20]. In an effort to parse the effects of BCG and non-tuberculous mycobacteria on the TST result, researchers have used mixture model analysis to separate underlying component distributions attributable to M. tuberculosis infections from other non-specific causes [21][22][23][24][25][26]. A finite mixture model arises when samples are drawn from a population that is a mixture of K (K > 1) component populations and is used for estimating heterogeneity in effects [27][28][29]. Mixture model analysis is an alternative way to estimate the prevalence of latent tuberculosis infection, which can be compared with the criterion-based methods for assigning latent infection. The standard criterion-based tuberculin skin test comprises an intracutaneous injection of 0.01 ml of purified protein derivative (PPD) into the forearm, with the reaction read 48 to 72 hours later. Depending on the individual's exposure risk, the threshold used to determine latent tuberculosis infection status can be 5 mm, 10 mm or 15 mm [30].
Although mixture model analysis is useful to understand the epidemiology of latent infection, it does not inform the treatment of latent infection in the individual patient. For these clinical decisions, meaningful criteria for latent infection are needed [27]. The purpose of this study is to use mixture models to estimate the underlying distributions of the TST in urban Kampala, Uganda, and define meaningful diagnostic criteria for latent tuberculosis infection.
Data availability: IRB approval for this study restricts the sharing of individual-level data. An anonymized dataset is available upon request from researchers who meet the criteria for access to confidential information. Data requests may be sent to the Human Subjects Office Director at University of Georgia, Kim Fowler (phone contact: 706-542-5318, and email contact: irb@uga.edu). In particular, we welcome researchers willing to create a strong data-sharing partnership and collaboration with the Ugandan researchers who generated the data.

Study populations
Two study populations were used for this analysis: the Kawempe Community Health Study and the Lubaga Community Health Study. Lubaga and Kawempe are contiguous divisions in Kampala City (Fig 1). Both studies were performed in Kampala, Uganda, by the same investigators, using similar methodologies and similar standard data collection tools (S1 File).
Kawempe Community Health Study. This study was described previously [31, 32]. Briefly, tuberculosis index cases were recruited from Old Mulago Hospital from 1995 to 2005 and were determined to be the initial case presenting in the household. All index cases were microbiologically confirmed using sputum microscopy and culture. Index cases were asked to list their household contacts; these household contacts were defined as any individual spending at least seven consecutive days in the same household as the index case in the three months preceding diagnosis. In this study, 1917 household contacts completed a baseline sociodemographic and tuberculosis risk questionnaire and physical examination collecting data on age, sex, relationship to the index, education level, past tuberculosis, and environmental characteristics.

PLOS ONE
Lubaga Community Health Study. This study recruited index tuberculosis cases and disease-free index controls in the Lubaga Division of Kampala, Uganda, between 2012 and 2016. Index tuberculosis cases were recruited from Lubaga Hospital and community health clinics of the Kampala Capital City Authority. All index cases were microbiologically confirmed using sputum smear microscopy, GeneXpert, or mycobacterial culture. Index controls were matched to the index cases by age category, sex, and neighborhood and were recruited within one month of the matched index case. Index cases and index controls were asked to list household contacts, using the same definition as for the Kawempe study, and contacts who lived outside the household as well. To reduce recall bias we used a combination of standard prompts and recent timeframes to help participants remember their contacts [33]. In this study, there were 1844 contacts of the index cases and controls, 882 and 963 contacts, respectively, who completed a tuberculosis risk questionnaire.
Measurements. Many procedures were harmonized between the two studies. All contacts of cases or controls were evaluated for tuberculosis infection using the tuberculin skin test.

Ethical approvals
Institutional review board approval was obtained from the Ethics Committee at Makerere University School of Public Health and the University of Georgia. Informed consent was obtained from all index cases and controls as well as their contacts. Parents of child contacts provided written consent in addition to the child's verbal assent.

Statistical analysis
Frequency distributions and percentages were used to describe the baseline characteristics of the study population. Only subjects whose TST induration was > 0 mm were included in the mixture models. Visual inspection of the histogram of TST induration and Hartigans' dip test of unimodality [29] were used to assess whether the TST induration distribution was unimodal or multimodal. Finite mixtures of normal distributions were used to capture the heterogeneity in TST induration arising from Mycobacterium tuberculosis infection or from cross-reactions with environmental mycobacteria or prior BCG vaccination [24,25]. A finite mixture model arises when samples are drawn from a population that is a mixture of k component populations (where k > 1). Let $\lambda_i$ represent the proportion of the total population that the ith component population constitutes, and let $f_i(x)$ represent the probability density function for the ith component population. If we represent the induration size as X, a random variable that takes values in the sample space w, its probability density function can be represented as:
$$g(x) = \sum_{i=1}^{k} \lambda_i f_i(x), \qquad 0 \le \lambda_i \le 1, \quad \sum_{i=1}^{k} \lambda_i = 1, \quad i = 1, \ldots, k,$$
and we say g(x) is a finite mixture of k components. The parameters $\lambda_1, \ldots, \lambda_k$ are called mixing proportions, which represent the proportion of the population in each component, and $f_1(x), \ldots, f_k(x)$ are the probability density functions of the random variable X in each component [28].
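The mixture density defined above can be evaluated directly. The following is a minimal Python sketch, assuming normal components; the function name `mixture_pdf` and the parameter values are hypothetical and chosen only for illustration (they are not estimates from this study).

```python
from statistics import NormalDist


def mixture_pdf(x, weights, components):
    """Evaluate g(x) = sum_i lambda_i * f_i(x) for a finite mixture."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    return sum(w * c.pdf(x) for w, c in zip(weights, components))


# Illustrative values only (hypothetical, not fitted to any data).
weights = [0.2, 0.8]
components = [NormalDist(mu=5.0, sigma=2.0), NormalDist(mu=15.0, sigma=4.0)]

# Density of the mixture at an induration of 10 mm.
density_at_10 = mixture_pdf(10.0, weights, components)
```

Because the mixing proportions sum to 1 and each component is a proper density, g(x) itself integrates to 1.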
A method combining Newton-type and expectation-maximization (EM) algorithms was used to estimate the parameters of the finite normal mixture models considered. This method was implemented with the R package "mixdist" [36] in the R programming language (R Core Team). To determine the number of components to include in our final models, the likelihood ratio test and the Bayesian Information Criterion (BIC) were used. In both of our study populations, two-component normal mixture models fit the data better than a three-component normal mixture model or a two-component gamma mixture model.
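To illustrate the estimation step, the sketch below fits a two-component normal mixture by plain EM and compares it with a single normal distribution using BIC. It is not the mixdist procedure, which combines Newton-type and EM steps and is written in R. It is a simplified stand-in in Python, run on simulated readings. The function names `fit_two_normals` and `bic`, the starting values, and the simulated parameters are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def fit_two_normals(data, iterations=200):
    """Plain EM for a two-component normal mixture (not the mixdist algorithm)."""
    xs = sorted(data)
    n = len(xs)
    weights = [0.5, 0.5]
    mus = [xs[n // 4], xs[3 * n // 4]]      # start at the sample quartiles
    sigmas = [pstdev(xs), pstdev(xs)]
    for _ in range(iterations):
        comp = [NormalDist(mus[j], sigmas[j]) for j in range(2)]
        # E-step: posterior probability that each reading is from component 0
        resp = []
        for x in xs:
            a = weights[0] * comp[0].pdf(x)
            b = weights[1] * comp[1].pdf(x)
            resp.append(a / (a + b))
        # M-step: responsibility-weighted proportions, means and SDs
        n0 = sum(resp)
        n1 = n - n0
        mus = [sum(r * x for r, x in zip(resp, xs)) / n0,
               sum((1 - r) * x for r, x in zip(resp, xs)) / n1]
        var0 = sum(r * (x - mus[0]) ** 2 for r, x in zip(resp, xs)) / n0
        var1 = sum((1 - r) * (x - mus[1]) ** 2 for r, x in zip(resp, xs)) / n1
        sigmas = [max(math.sqrt(var0), 1e-3), max(math.sqrt(var1), 1e-3)]
        weights = [n0 / n, n1 / n]
    comp = [NormalDist(mus[j], sigmas[j]) for j in range(2)]
    loglik = sum(math.log(weights[0] * comp[0].pdf(x) + weights[1] * comp[1].pdf(x))
                 for x in xs)
    return weights, mus, sigmas, loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; smaller values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Simulated readings (hypothetical values, not study data).
random.seed(0)
sample = ([random.gauss(5.0, 2.0) for _ in range(300)]
          + [random.gauss(15.0, 4.0) for _ in range(1700)])

weights, mus, sigmas, loglik_two = fit_two_normals(sample)
one = NormalDist(mean(sample), pstdev(sample))
loglik_one = sum(math.log(one.pdf(x)) for x in sample)

bic_two = bic(loglik_two, 5, len(sample))   # 2 means + 2 SDs + 1 free proportion
bic_one = bic(loglik_one, 2, len(sample))   # 1 mean + 1 SD
```

On simulated data of this kind, the two-component model should have the lower BIC. The same comparison extends to three components or to gamma components by changing the component densities and the parameter count.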
Previous studies have shown that when the class separation in a fitted mixture model is high, a sample size as small as 150 to 300 subjects can perform well [37,38]. Class separation is at its highest when the difference in means between the latent classes is large [39]. The sample size of this study was 3,761, of which 2,051 observations were used in fitting the mixture model.
To assess the effects of age, sex, HIV status and BCG vaccination status on the underlying distributions, we stratified by these variables. We determined the optimal cutoff value for the TST as the TST reading where the two distributions intersect, thereby minimizing misclassification. We stipulated an indeterminate range for the TST induration result by using the 97.5th percentile value of the lower distribution and the 2.5th percentile value of the upper distribution. The proportion of participants from each group falling in this zone was then calculated. A further sensitivity analysis was carried out to evaluate whether those with missing or unknown BCG status differed from those with known BCG status in terms of age, sex, and TST values; we did not find any statistically significant difference between them.
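The intersection rule can be sketched numerically. The Python example below assumes that "intersect" means the point between the two component means where the densities, each weighted by its mixing proportion, are equal, and it finds that point by bisection. The function name `optimal_cutoff` is hypothetical. The inputs are the rounded combined-analysis estimates reported in the Results (means 5.4 and 15.1 mm, SDs 2.3 and 4.0, proportions 13% and 87%).

```python
from statistics import NormalDist


def optimal_cutoff(w_low, low, w_up, up, tol=1e-6):
    """Bisection for the reading between the two means where the
    proportion-weighted component densities are equal."""
    def diff(x):
        return w_low * low.pdf(x) - w_up * up.pdf(x)

    lo, hi = low.mean, up.mean
    if diff(lo) <= 0 or diff(hi) >= 0:
        raise ValueError("weighted densities do not cross between the means")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if diff(mid) > 0:      # lower component still dominates: move right
            lo = mid
        else:                  # upper component dominates: move left
            hi = mid
    return (lo + hi) / 2


# Rounded combined-analysis estimates taken from the Results section.
lower = NormalDist(mu=5.4, sigma=2.3)
upper = NormalDist(mu=15.1, sigma=4.0)
cutoff = optimal_cutoff(0.13, lower, 0.87, upper)
```

With these rounded inputs the crossing point is close to the 7.5 mm reported for the combined analysis. Small differences are expected because the published parameters are rounded.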

Results
There were 1,917 participants from the Kawempe neighborhood and 1,844 from the Lubaga neighborhood. Although the two groups came from adjacent divisions of Kampala, they differed in several ways (Table 1). The proportion of females was 56% in the Kawempe group and 47% in the Lubaga group. The Lubaga group included a greater proportion of participants in older age groups than Kawempe. In both groups, the coverage of BCG vaccination was high, 90% in Lubaga and 73% in Kawempe. The proportion of HIV seropositive participants was three times higher in the Kawempe than in the Lubaga study population (12% vs 4%).
The mean and median TST induration readings were 11.3 mm (SD = 7.3) and 13.0 mm for Kawempe, and 7.6 mm (SD = 7.5) and 7.4 mm for Lubaga. The difference in the mean and median TST induration between the study populations is attributable to a higher proportion of individuals with a value of 0 mm from the Lubaga group (N = 782, 42%) than from the Kawempe group (N = 381, 20%). Among participants with TST reading > 0 mm, the mean TST induration was 14.1 mm (SD = 5.2) for Kawempe and 13.2 mm (SD = 4.8) for Lubaga. For the Kawempe and Lubaga groups, the frequency distributions of TST among participants with TST induration > 0 mm were multi-modal, according to the Hartigans' dip test of unimodality (D = 0.035, p-value < 0.001; D = 0.019, p-value = 0.01, respectively). A similar multimodal distribution was found when both cohorts were combined into a single cohort (D = 0.027, p-value < 0.001).
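The link between the overall means and the share of 0 mm readings can be checked with simple arithmetic: when a group of readings is exactly 0 mm, the overall mean equals the mean among positive readings multiplied by the fraction of participants with a positive reading. The short Python check below uses only the counts and means quoted above; the function name `overall_mean` is hypothetical.

```python
def overall_mean(n_total, n_zero, mean_positive):
    """Mean over all participants when n_zero readings are exactly 0 mm."""
    return (n_total - n_zero) * mean_positive / n_total


# Counts and positive-reading means as reported in the text.
kawempe = overall_mean(1917, 381, 14.1)
lubaga = overall_mean(1844, 782, 13.2)
```

Both values agree, after rounding, with the reported overall means of 11.3 mm and 7.6 mm, which supports attributing the difference between groups to the proportion of 0 mm readings.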
In the Kawempe study, the empirical probability density function of TST distribution among household contacts of index cases was decomposed into two normal distributions using an unstratified mixture model (Table 2, Fig 2A). Using both the likelihood ratio test and BIC, we determined that a model with only two component distributions provided the best fit to the data (Chi square = 71.46, df = 2, p-value < 0.0001). The mean of TST induration in the lower distribution was 4.6 mm (SD = 1.9), and the mean for the upper distribution was 15.1 mm (SD = 4.0). The lower distribution comprised 13% and the upper distribution comprised 87% of the population. The optimal cutoff value of the TST for separating the lower and upper distributions was estimated to be 7.1 mm.
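The reported p-value bound can be checked by hand, assuming the test statistic is referred to a chi-square distribution with 2 degrees of freedom as stated. For 2 degrees of freedom the upper-tail probability has the closed form exp(-x/2). The helper name `chi2_sf_df2` is hypothetical.

```python
import math


def chi2_sf_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-statistic / 2.0)


p_value = chi2_sf_df2(71.46)
```

The resulting tail probability is many orders of magnitude below 0.0001, consistent with the reported p-value < 0.0001.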
In the Lubaga study, the empirical probability density function of TST distribution among both household and non-household contacts of index cases was also decomposed into two normal distributions using an unstratified mixture model (Table 2, Fig 2B). The mean of TST induration in the lower distribution was 7.3 mm (SD = 1.9), and the mean for the upper distribution was 14.7 mm (SD = 4.1). The lower distribution comprised 10% and the upper distribution comprised 90% of the population. The optimal cutoff value of the TST for separating the lower and upper distributions was estimated to be 7.7 mm.
Because the Kawempe and Lubaga studies were performed in the same city, using a similar design and procedures, and because the findings between the two studies were consistent, we combined the data from the two studies into a single analysis (Fig 3). As seen in the individual studies, the optimal fit was achieved with two normal distributions in the combined analysis. The mean of TST induration in the lower distribution was 5.4 mm (SD = 2.3), and the mean for the upper distribution was 15.1 mm (SD = 4.0). The lower distribution comprised 13% and the upper distribution comprised 87% of the population. The optimal cutoff value for separating the lower and upper distributions was estimated to be 7.5 mm. When stratifying the population by age category, the mean for the upper distribution was remarkably consistent, ranging from 14.0 mm to 15.7 mm across three different subgroups (Table 2), whereas the mean value for the lower distribution was more variable across age categories, increasing from 3.8 mm to 7.0 mm. A similar pattern was observed when stratifying by HIV serostatus. When stratifying by BCG vaccination, the mean value of the lower distribution was greater among those who were vaccinated compared to those who were not (6.2 mm versus 4.3 mm), but the means of the upper distributions were similar. Age younger than 5 years and HIV infection were both associated with lower optimal cutoff values compared to their counterparts, whereas BCG vaccination was associated with a higher cutoff value.
In the Lubaga study, but not the Kawempe study, we included contacts of index controls who did not have tuberculosis. Among 556 household contacts of index controls who were tested with TST, the distribution of the control contacts decomposed into two normal distributions with means of 6.8 mm (SD = 2.3) and 14.3 mm (SD = 4.1). Twenty-five percent of contacts fell under the lower curve, whereas 75% fell under the upper curve.
From the combined analysis, we evaluated the overlap of the two distributions (Table 3). The value of 9.9 mm represented the 97.5th percentile value for the lower distribution, so only 2.5% of this distribution exceeded this value, whereas 90.2% of the upper distribution fell above it.
Table note: n is the total number of subjects in each category. The overall optimal cutoff point for the whole data set and for each stratified group is shown in the column labeled μ.

Discussion
In this analysis, we compared the TST responses from two independent groups of tuberculosis contacts in Kampala, Uganda, from two time periods about one decade apart. We found that the overall frequency distribution for each group could be decomposed into two normal distributions that had remarkably similar mean TST values between the two groups. When the two groups were combined, we found that the mean TST value for the upper distribution was 15 mm and stable across strata of sex, age group, BCG vaccination, and HIV serostatus, whereas the mean for the lower distribution was more variable.
We posit that the upper distribution represents true infection with M. tuberculosis whereas the lower represents the non-specific effects from BCG vaccination, nontuberculous mycobacteria infection, or immune modulation. Since there is no 'gold standard' for latent tuberculosis infection, we rely on other ways to establish the validity of this assertion. For the upper distribution, there is content validity since the mean TST value of 15 mm for the upper distribution is entirely consistent with the skin test results of patients with active tuberculosis disease [40]. It is also consistent with similar studies of tuberculosis infection from South Korea, Malawi, and the Basque region of Spain [21,24,26]. Moreover, the cutoff value of 9.9 mm is essentially the same as the 10 mm criterion recommended to assign latent tuberculosis infection [41]. Like the TST survey from Malawi [21], we found the frequency distribution of reactive tests was indeed best described by two underlying distributions with mean values nearly 10 mm apart (5.4 versus 15.1 mm). Perhaps most important, a greater proportion of contacts from the homes of tuberculosis cases fell under the upper distribution as compared to the contacts from control homes (87% versus 75%, respectively).
As for the lower distribution, we propose that it represents sensitization from infection with environmental mycobacteria or prior BCG vaccination. To assess the effect of environmental mycobacteria, we evaluated contacts without BCG vaccination and found that the mean value of the lower distribution was 4.3 mm, which is consistent with an effect from environmental mycobacteria infection in the population. There also appeared to be a small effect of BCG vaccination because the mean value of the lower distribution was 1.9 mm larger among vaccinated contacts compared with those who were not vaccinated. We acknowledge that it is difficult to parse these effects without firm epidemiologic information about environmental mycobacterial infection or specialized immunologic tests to evaluate immunity to BCG. It is also possible that by not including community controls from the Kawempe study, we have underestimated the number of TST responses with low values from that study.
Table 3. Overlap between the lower and upper distributions and interpretation of TST in the entire, combined study population. Columns: TST cutoff value (mm); lower distribution (%); upper distribution (%); interpretation.
Mixture models have been used to estimate the prevalence of latent tuberculosis infection without using a defined criterion for infection. Typically, authors have used the upper distribution as an indicator of infection and used the proportion of the population under this curve as the prevalence. Although this type of mixture modeling is useful in understanding the epidemiology of tuberculosis infection in a population, the findings from these models are not readily applied to the clinical setting because they do not infer criteria to define latent infection.
To guide clinical decisions regarding treatment of latent tuberculosis infection, it is customary to interpret the TST as a dichotomous test, either positive or negative. Since the TST is inherently context-dependent [21], the criteria for a positive test may vary depending on age distribution, environmental mycobacteria, co-morbidities such as HIV infection, and recent exposure. Indeed, our findings reflect this variability and support the use of different cutoff values for different populations, as is currently practiced. For example, if we use the optimal cutoff values for separating the lower and upper distributions as our criterion for infection, then we would propose 6 mm as the criterion for infection in children younger than 5 years and in HIV infection. The criterion for infection among contacts with BCG vaccination is nearly 9 mm, which is within the margin of error of a 10 mm reaction.
Creating a dichotomous criterion for the TST ignores potentially useful information found in the continuous measurement [25] and may lead to misclassification of latent infection. We propose another interpretation of the TST, one that categorizes the TST results, but adds a third indeterminate category to account for some of the uncertainty in the TST. Using percentiles of the underlying normal distributions estimated by the mixture model, we defined two TST values to demarcate three ranges. Values of 9.9 mm or greater contained 90% of individuals with latent tuberculosis infection (the upper distribution), whereas TST values less than 7.2 mm contained nearly 80% of individuals without tuberculosis infection (lower distribution). Contacts with values between 7.2 and 9.8 mm fell into an indeterminate zone where it was not possible to classify them as infected or not. In our sample, contacts with responses in this zone were 2.5 times more likely to fall under the lower distribution than the upper distribution, so were more likely to represent reactions resulting from BCG vaccination or infection with other mycobacteria.
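The three-range interpretation can be sketched in Python as follows, using the rounded combined-analysis estimates (lower: mean 5.4 mm, SD 2.3; upper: mean 15.1 mm, SD 4.0) and the published bounds of 7.2 and 9.9 mm. The function name `classify` and the category labels are hypothetical. The proportions are computed within each component and are not weighted by the mixing proportions, which appears to be how the figures in the text were derived.

```python
from statistics import NormalDist

# Rounded combined-analysis estimates taken from the Results section.
lower = NormalDist(mu=5.4, sigma=2.3)
upper = NormalDist(mu=15.1, sigma=4.0)

# Percentile-based bounds of the indeterminate zone.
low_bound = upper.inv_cdf(0.025)    # 2.5th percentile of the upper distribution
high_bound = lower.inv_cdf(0.975)   # 97.5th percentile of the lower distribution


def classify(tst_mm, low=7.2, high=9.9):
    """Assign a reading to one of three ranges using the published bounds."""
    if tst_mm >= high:
        return "latent infection"
    if tst_mm < low:
        return "no infection"
    return "indeterminate"


# Within-component proportions implied by the rounded estimates.
lower_below = lower.cdf(7.2)                    # lower distribution under 7.2 mm
upper_above = 1.0 - upper.cdf(9.9)              # upper distribution at or above 9.9 mm
lower_zone = lower.cdf(9.9) - lower.cdf(7.2)    # lower distribution inside the zone
upper_zone = upper.cdf(9.9) - upper.cdf(7.2)    # upper distribution inside the zone
zone_ratio = lower_zone / upper_zone
```

With these inputs, the bounds come out near 7.2 and 9.9 mm, about 78% of the lower distribution lies below 7.2 mm, about 90% of the upper distribution lies at or above 9.9 mm, and the zone ratio is roughly 2.6. Exact agreement with the published figures is not expected because the parameters are rounded.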
By defining this third category, we acknowledge that TST readings in this range are uncertain, but we preserve clinically useful cutoff values and gain clarity about the interpretation of readings outside of the indeterminate range. As for what to do with a person who has an indeterminate reading, it may depend on the age, HIV serostatus, or history of recent exposure. But for the many adult individuals who have no known exposure, we propose repeating the test in 2 to 4 weeks. With the repeat test, we expect regression toward the mean, so subsequent readings would migrate toward the true underlying distribution or a booster response [8]. These movements might help guide decisions for treatment. An alternative approach would be to perform sequential testing, first with the TST followed by the interferon-γ release assay for tests within the indeterminate range, and base decisions about latent infection on the results of these tests together [25].
With the advent of IGRAs, one may argue that the TST is obsolete and refined criteria for infection using the TST are no longer needed. Although the scientific justification of IGRAs is strong [12], the performance characteristics of these tests are not optimal or consistent in some populations, especially in Africa and Asia [14,42,43]. The c-TB (Statens Serum Institute, Copenhagen, Denmark) is a new skin test based on ESAT-6 and CFP-10 antigens, the same antigens used in the current IGRAs, that appears to be unaffected by BCG vaccination [40]. If the c-TB skin test performs well in African populations where tuberculosis is endemic and BCG vaccination is widely used, it may replace skin tests using purified protein derivative. Until then, the proposed modification to the TST in assigning latent infection may be useful in decisions to treat latent infection or re-evaluate after continued follow-up.
We do not presume to suggest that there are fixed or standard criteria that define latent tuberculosis infection across populations. As has been pointed out by others, population characteristics and the goals of testing affect the choice of cutoff values [21][44][45][46]. We do propose, however, that the process of TST surveys within populations at risk, followed by a mixture model analysis, is an evidence-based approach that can define meaningful criteria for latent infection in a given population.