State-level prevalence estimates of latent tuberculosis infection in the United States by medical risk factors, demographic characteristics and nativity

Introduction Preventing tuberculosis (TB) disease requires treatment of latent TB infection (LTBI) as well as prevention of person-to-person transmission. We estimated the LTBI prevalence for the entire United States and for each state by medical risk factors, age, and race/ethnicity, both in the total population and stratified by nativity. Methods We created a mathematical model using all incident TB disease cases during 2013–2017 reported to the National Tuberculosis Surveillance System that were classified using genotype-based methods or imputation as not attributed to recent TB transmission. Using the annual average number of TB cases among US-born and non-US-born persons by medical risk factor, age group, and race/ethnicity, we applied population-specific reactivation rates (and corresponding 95% confidence intervals [CI]) to back-calculate the estimated prevalence of untreated LTBI in each population for the United States and for each of the 50 states and the District of Columbia in 2015. Results We estimated that 2.7% (CI: 2.6%–2.8%) of the U.S. population, or 8.6 (CI: 8.3–8.8) million people, were living with LTBI in 2015. Estimated LTBI prevalence among US-born persons was 1.0% (CI: 1.0%–1.1%) and among non-US-born persons was 13.9% (CI: 13.5%–14.3%). Among US-born persons, the highest LTBI prevalence was in persons aged ≥65 years (2.1%) and in persons of non-Hispanic Black race/ethnicity (3.1%). Among non-US-born persons, the highest LTBI prevalence was estimated in persons aged 45–64 years (16.3%) and persons of Asian and other racial/ethnic groups (19.1%). Conclusions Our estimations of the prevalence of LTBI by medical risk factors and demographic characteristics for each state could facilitate planning for testing and treatment interventions to eliminate TB in the United States. Our back-calculation method feasibly estimates untreated LTBI prevalence and can be updated using future TB disease case counts at the state or national level.

Introduction
Tuberculosis (TB) incidence in the United States has declined substantially over the past several decades. In fact, the U.S. TB rate during 2019 declined to the lowest level on record, 27 cases per million persons (1.6% decline from 2018) [1]. However, the rate has plateaued near 30 cases per million population annually since 2013 [2,3]. The annual pace of decline remains too slow to meet the national TB elimination goal of less than one case per million [4]. Genotyping of Mycobacterium tuberculosis isolates demonstrates that approximately 85% of TB disease cases in the United States are attributed to reactivation of latent infection with Mycobacterium tuberculosis that was acquired >2 years prior [3].
Preventing TB requires treatment of Mycobacterium tuberculosis infection that might progress to TB disease [5]. Targeted testing and treatment is needed to prevent TB in the large reservoir of persons with longstanding latent TB infection (LTBI) [6,7]. Estimations of the prevalence of untreated LTBI in populations at risk for TB by state could facilitate planning for testing and treatment interventions to accelerate the TB decline and eliminate TB in the United States.
Unfortunately, estimating the true burden of LTBI is challenging because LTBI is not a reportable condition in most U.S. states. A study of data from the National Health and Nutrition Examination Survey (NHANES) 2011-2012, which was the most recent cycle to test for TB infection, estimated that approximately 13.3 (95% CI: 9.6-17.8) million noninstitutionalized civilian U.S. residents would have a positive tuberculin skin test for TB infection [8]. Similar testing-based estimates at the state or local level are unavailable, because implementing a representative population-based prevalence survey would be too time- and resource-intensive for most state health authorities. However, relying on national estimates to inform state or local programs and policies for targeted testing and treatment of LTBI may lead to wasted resources if, in reality, local estimates and populations at risk differ from the national pattern.
Recently, Haddad et al. [9] estimated untreated LTBI prevalence at the state and county level using a uniform annual reactivation rate applied to the entire population. However, their methodology did not estimate LTBI prevalence within populations having medical risk factors that increase risk for TB progression, nor stratify LTBI prevalence by demographic characteristics such as age or race/ethnicity. Having more detailed state-level estimates could be informative for identifying those populations that would most benefit from TB preventive care strategies. Several TB models demonstrate the potential health impact and attractive cost-effectiveness of expanded testing and treatment for LTBI in populations at high risk for TB [6,10]. Without treatment, patients with LTBI have a 5%-10% lifetime risk of progression to TB [11,12]; that risk, however, varies based on individual medical risk factors and certain demographic characteristics such as age.
In this study, we applied previously derived population-specific annual reactivation rates to population-specific annual average counts of TB cases to back-calculate estimates of LTBI prevalence for the entire United States, as well as for each of the 50 U.S. states and the District of Columbia, by medical risk factor, age group, and race/ethnicity, both in the total population and stratified by nativity (i.e., US-born or non-US-born), in 2015.

Data source: Counts of reported TB cases not attributed to recent transmission
The National Tuberculosis Surveillance System [2,3] provided aggregate counts of reported cases of TB disease in the 50 U.S. states and the District of Columbia during 2013-2017 that were not attributed to recent transmission. In the United States, recent transmission is now routinely estimated using the France et al. field-validated plausible source-case method [2,3,13] (i.e., a plausible infectious source case in a person ≥10 years of age within 10 miles in the previous 2 years having a matching genotype result). We back-calculated solely from those TB cases not attributed or imputed (see below for details) to recent transmission because our focus was on estimating longstanding LTBI that could be diagnosed by targeted testing and then treated. For similar reasons, we excluded all cases occurring in children under the age of 1 year.

Back-calculation method overview
Our back-calculation method applied previously derived population-specific TB reactivation rates from the literature [14,15]. We divided the average annual count of TB cases not attributed to recent transmission by the corresponding estimated TB reactivation rates (point estimates and 95% confidence intervals [CI]) to estimate counts of people with those characteristics who were living with LTBI in 2015. We repeated this calculation for US-born and non-US-born populations iteratively classified into three groupings based on five medical risk factors, five age groups, and four race/ethnicity categories reported to the National Tuberculosis Surveillance System. Reported medical risk factors among TB cases were used to estimate medical risk factors among persons with LTBI.
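The division described above can be illustrated with a minimal Python sketch. It is not the authors' R code (that is provided in S2 Appendix in S1 File); the class and function names are hypothetical, and the case count of 2,000 is an invented number used only to show the arithmetic. The two rates are the overall US-born and non-US-born values from Shea et al. [14] quoted in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactivationRate:
    """Annual TB reactivation rate per 100 person-years with its 95% CI."""
    point: float
    lower: float
    upper: float


def back_calculate_ltbi(avg_annual_cases: float, rate: ReactivationRate) -> dict:
    """Divide average annual non-recent-transmission TB cases by the rate."""
    def to_count(rate_per_100_py: float) -> float:
        # Convert "per 100 person-years" to a per-person annual rate.
        return avg_annual_cases / (rate_per_100_py / 100.0)

    return {
        "point": to_count(rate.point),
        # A higher reactivation rate implies fewer people living with LTBI,
        # so the upper rate bound yields the lower LTBI bound.
        "lower": to_count(rate.upper),
        "upper": to_count(rate.lower),
    }


# Overall rates by nativity quoted in this article from Shea et al. [14].
US_BORN = ReactivationRate(point=0.082, lower=0.080, upper=0.083)
NON_US_BORN = ReactivationRate(point=0.098, lower=0.096, upper=0.100)

# Hypothetical average annual case count, for illustration only.
estimate = back_calculate_ltbi(avg_annual_cases=2000, rate=US_BORN)
print({name: round(value) for name, value in estimate.items()})
```

Note that the interval produced this way reflects only the uncertainty in the reactivation rate, which matches how the CIs in this analysis are described.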
Because the total count of persons estimated to have LTBI when summed across medical risk factor, age, and race/ethnicity categories differed from the total estimated LTBI count, we considered the sum across the five age groups within the stratified US-born and non-US-born populations to be the referent total. We then proportionally adjusted the estimated LTBI counts within the medical risk factor and race/ethnicity categories to match that referent.
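The proportional adjustment can be sketched as follows, assuming a simple rescaling in which every category is multiplied by the same factor so that the categories sum to the age-based referent total. The function name, the category labels, and all counts are hypothetical.

```python
def rescale_to_referent(category_counts: dict, referent_total: float) -> dict:
    """Scale counts proportionally so that they sum to referent_total."""
    raw_total = sum(category_counts.values())
    if raw_total <= 0:
        raise ValueError("category counts must sum to a positive number")
    factor = referent_total / raw_total
    return {name: count * factor for name, count in category_counts.items()}


# Hypothetical LTBI counts for one nativity stratum; the age-group sum
# serves as the referent total.
age_counts = {"1-14": 50.0, "15-24": 100.0, "25-44": 300.0,
              "45-64": 350.0, "65+": 200.0}
referent = sum(age_counts.values())

# Hypothetical unadjusted counts by race/ethnicity for the same stratum.
race_counts = {"Hispanic": 300.0, "Non-Hispanic Black": 250.0,
               "Non-Hispanic White": 450.0, "Asian or other": 200.0}

adjusted = rescale_to_referent(race_counts, referent)
print(adjusted)
```

Rescaling preserves each category's share of the stratum while forcing the stratum total to agree with the referent.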
Finally, to provide LTBI estimates as a proportion of the underlying population, we used the 2015 American Community Survey midpoint estimates [16] for each state's population size by age group, race/ethnicity, and nativity. Similar state-level denominators for medical risk factor prevalence stratified by nativity are not available, so those proportions are not presented.
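As a small illustration of this step, the sketch below converts an estimated LTBI count into a percentage of the underlying population and returns no value when a denominator is unavailable, as is the case for medical risk factor groups stratified by nativity. The function name and every number are hypothetical and are not taken from the American Community Survey.

```python
from typing import Optional


def prevalence_percent(ltbi_count: float,
                       population: Optional[float]) -> Optional[float]:
    """Return LTBI prevalence as a percentage, or None without a denominator."""
    if population is None or population <= 0:
        return None
    return 100.0 * ltbi_count / population


# (label, hypothetical LTBI count, hypothetical population denominator)
strata = [
    ("US-born, 65+", 40_000.0, 2_000_000.0),
    ("non-US-born, 45-64", 81_500.0, 500_000.0),
    ("diabetes, US-born", 12_000.0, None),  # no denominator available
]

results = {label: prevalence_percent(count, pop) for label, count, pop in strata}
for label, value in results.items():
    print(label, "not presented" if value is None else f"{value:.1f}%")
```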
For this analysis, we used freely available R 3.6.3 software (R Core Team, Vienna, Austria) [17]. The R code for the back-calculation model, which can be adapted for any jurisdiction or time period, is included as S2 Appendix in S1 File.

Model inputs: Estimated annual TB reactivation rates among people living with LTBI
Our back-calculation model inputs included the previously derived TB reactivation rates in the United States reported by Shea et al. [14] and the reactivation rate ratios (RRR) from an international systematic review conducted by Yeats [15]. Similar to our analysis, the Shea et al. estimates excluded children under the age of 1 year and, using an earlier genotype-based methodology, TB cases attributed to recent transmission. Shea et al. also stratified TB reactivation rates by nativity [14]. The Yeats systematic review provided the RRRs used to derive TB reactivation rates for all persons, regardless of nativity, with certain medical risk factors [15]. The estimated reactivation rates for all population groupings are presented in Table 1.
Shea et al. [14] estimated an overall TB reactivation rate of 0.084 (95% CI 0.083-0.085) per 100 person-years in the total population. By nativity, the estimated reactivation rate was 0.082 (0.080-0.083) among US-born and 0.098 (0.096-0.100) among non-US-born persons. Estimated reactivation rates were higher among people living with HIV (1.82) and varied across [...]. Reactivation rates for persons with medical risk factors were derived by combining the Shea et al. [14] overall US-born and non-US-born reactivation rates, without regard to medical risk factor presence, with the Yeats [15] medical risk RRRs (S1 Appendix in S1 File).

Approach for missing data and TB cases with >1 medical risk factor
All TB cases lacking documentation of M. tuberculosis culture positivity (and thus genotype result) were missing a recent transmission determination. Missing age (n = 7 cases), nativity (n = 34), race/ethnicity (n = 105), and recent transmission (n = 12,249) variables were imputed using predictive mean matching with state, reporting year, and medical risk factors as covariates in the imputation model (R "mice" package) [18]. We used multiple imputation (5 runs) with a random seed to impute missing data. Separate imputations were conducted for US-born and non-US-born subgroups. TB cases with multiple medical risk factors were hierarchically classified into the risk factor category having the highest reactivation rate.
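The hierarchical classification rule can be sketched as below. Only the HIV rate (1.82 per 100 person-years) is quoted in this article; the other rates are hypothetical placeholders chosen solely to give an ordering, and the function and label names are assumptions for illustration.

```python
# Reactivation rates per 100 person-years used to rank risk factors.
RATES_PER_100_PY = {
    "HIV": 1.82,                       # quoted in the text
    "ESRD": 1.0,                       # hypothetical placeholder
    "immunosuppressive therapy": 0.5,  # hypothetical placeholder
    "diabetes": 0.2,                   # hypothetical placeholder
}


def classify_case(reported_risk_factors) -> str:
    """Assign a case to its reported risk factor with the highest rate."""
    known = [f for f in reported_risk_factors if f in RATES_PER_100_PY]
    if not known:
        return "no reported risk factor"
    return max(known, key=RATES_PER_100_PY.get)


print(classify_case({"diabetes", "HIV"}))  # assigned to HIV
print(classify_case(["diabetes"]))         # assigned to diabetes
print(classify_case([]))                   # no reported risk factor
```

Under this rule each case contributes to exactly one medical risk factor category, so category counts never double-count a person.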
Estimated prevalence among non-US-born persons was 14 times the estimated prevalence among US-born persons (13.9% vs. 1.0%) (Table 3). National LTBI prevalence estimates ranged from 0.2% for children aged 1-14 years to 4.2% for adults aged 45-64 years (Table 4). By race/ethnicity, persons of Asian or other race/ethnicity in the total U.S. population were estimated to have the highest LTBI prevalence (8.7%). Among US-born persons, those aged ≥65 years (2.1%) or non-Hispanic Black (3.1%) were estimated to have the highest LTBI prevalence. Among non-US-born persons, those aged 45-64 years (16.3%) or Asian or other race/ethnicity (19.1%) were estimated to have the highest LTBI prevalence.

Characteristics of persons predicted to have LTBI
About 10.5% of the US-born persons estimated as having LTBI had medical risk factors (Table 2); the most common risk factor (8.0%) was diabetes without other concomitant conditions such as ESRD or HIV, followed by immunosuppressive therapy (1.7%). The largest share of US-born persons estimated as having LTBI were non-Hispanic White (46.5%). The predominant age group was 45-64 years (44.2%).
About 12.4% of the non-US-born persons estimated as having LTBI had medical risk factors. The most common (10.7%) was diabetes without other concomitant conditions such as ESRD or HIV, followed by immunosuppressive therapy (1.1%). Most of the non-US-born persons estimated as having LTBI were Hispanic (36.7%) or of Asian or other race/ethnicity (36.4%). The predominant age group was 25-44 years (43.2%).

State-level estimates of LTBI prevalence
The 2 states estimated to have the highest number of persons living with LTBI were California (1,722,575) and Texas (1,081,749). Estimated total population LTBI prevalence in the 4 states (California, New York, Texas, Florida) with the highest annual counts of TB cases ranged from 3.0% in Florida to 4.5% in California. In 11 states, estimated total population LTBI prevalence was ≥3%. In 19 states, the estimated LTBI prevalence in the US-born population was ≥1%. In 26 states, the estimated LTBI prevalence in the non-US-born population was ≥15% (Fig 1 and Table 3). State-level LTBI prevalence estimates by age group and race/ethnicity are presented in S1 and S2 Figs in S1 File. Each state's predicted total number of people with LTBI, as well as grouped by medical risk factor, age group, and race/ethnicity, is presented in S1 Table. Median estimated state-level total population LTBI prevalence was 2.4% (interquartile range [IQR] 1.1%-4.2%). Median estimated state-level LTBI prevalence among US-born persons was 0.6% (IQR 0.3%-1.5%) and among non-US-born persons was 13.5% (IQR 9.7%-18.9%). State-level estimates for the prevalence of LTBI by age group and race/ethnicity are presented in S1 Table and summarized visually in Fig 2. By race/ethnicity, the highest median estimated state-level LTBI prevalence was among Asian or other race/ethnic groups (median 7.6%, IQR 5.4%-9.8%).

Table 3 notes: CI = confidence interval; 95% CI based solely on previously derived population-specific reactivation rates (see Table 1 and S1 Appendix in S1 File). New York City and NY (rest of NY) are combined when producing NY estimates. https://doi.org/10.1371/journal.pone.0249012.t003
Table 4. Estimated prevalence of latent tuberculosis infection in the United States in 2015 within age group and race/ethnicity, stratified by nativity and in total population.

Discussion
We applied a back-calculation method to data from the National Tuberculosis Surveillance System, together with estimation and imputation of recent TB transmission and previously published TB reactivation estimates, to produce national and state-level estimates of LTBI prevalence, both in the total population and within sub-groups. Our model estimated 8.6 million people (2.7%) were living with untreated LTBI in the United States in 2015, of whom the majority (68%) were non-US-born. Estimated LTBI prevalence among US-born persons was 1.0% and among non-US-born persons was 13.9%. Among US-born persons, the highest LTBI prevalence was among persons aged ≥65 years (2.1%) and persons of non-Hispanic Black race/ethnicity (3.1%) [...] [9]. Similar to nationally representative survey results, our results indicated substantially lower LTBI prevalence in the US-born population than in the non-US-born population of the United States; however, this ratio varied from state to state. Our results also showed variability in LTBI prevalence among populations. The variability by nativity was driven by both demographic differences among TB cases and previously estimated differential reactivation [14,19] between US-born and non-US-born persons living with LTBI. Geographic variations in the demographic characteristics and risk factors of TB cases can explain much of the difference in estimated LTBI prevalence seen at the state level.
Compared to the LTBI prevalence estimates from NHANES 2011-12 (4.7% total, 1.5% US-born, 20.5% non-US-born) [8], our results were much lower for the total (2.7%) and non-US-born (13.9%) populations. The NHANES estimates were based on tuberculin skin test (TST) results and thus might have overestimated true LTBI prevalence because of cross-reaction with BCG vaccine. When interferon-gamma release assay (IGRA) positivity was used in NHANES, the LTBI estimate for non-US-born persons dropped to 15.9% (95% CI 13.5-18.7) [8], much closer to our estimate. However, our estimates for LTBI among older non-US-born groups were lower than the NHANES IGRA findings: age 45-64 years (NHANES 23.5% vs. our 16.3%) and 65+ years (NHANES 32.1% vs. our 9.8%). This may reflect cohort effects, with the older birth cohorts of 2010 (with higher TB infection rates) being replaced in recent years by younger birth cohorts with lower infection rates [20].
Our findings suggest that one strategy to achieve TB elimination in the United States would be to prioritize all non-US-born persons, irrespective of medical risk factors, for LTBI screening and treatment. Focusing on non-US-born persons aged 25-64 years would reach up to 82.2%, or focusing on non-US-born Hispanic, Asian, or other race/ethnicity would reach up to 71.6%, of all estimated untreated LTBI among non-US-born persons. The U.S. Preventive Services Task Force guidance has recommended some components of this approach, focusing on screening asymptomatic adults born outside the United States in high TB prevalence countries and persons, regardless of nativity, living in congregate settings including correctional institutions and homeless shelters [21]. Additional guidance, such as the California TB Risk Assessment tool, recommends LTBI testing and treatment for all non-US-born individuals, individuals with immunosuppressive conditions or taking immunosuppressive therapy, and individuals who have had contact with someone with infectious TB disease during their lifetime [22]. Recent modeling has demonstrated that adherence to such approaches could substantially reduce the burden of TB disease, reducing incidence by 40% [23].
Both as a count and a proportion, the US-born population has a lower total LTBI prevalence. In addition, most of the cases attributed to recent TB transmission in the United States occur among US-born persons [2,3,13]. Some of our estimates of LTBI prevalence in various demographic (e.g., Hispanic persons) and medical risk groups (e.g., diabetes) are lower than those reported elsewhere [8]. The difference in the Shea et al. [14] reactivation rates between US-born Hispanic persons (0.178 per 100 person-years) and non-US-born Hispanic persons (0.086 per 100 person-years) applied in our back-calculation might have led to an underestimation of LTBI prevalence in Hispanic populations in comparison to the estimates from NHANES 2011-2012 [8]. The confidence interval for our estimate of the number of people with LTBI who also have diabetes (0.9 million, 95% CI 0.7 to 1.3) overlaps with the confidence interval reported using NHANES 2011-2012 (2.0 million, 95% CI 1.2 to 3.1) [8]. The discrepancy in point estimates may be a consequence of our assumption that all TB cases reported with no indication of diabetes status did not have diabetes, whereas all NHANES participants aged ≥12 years were systematically screened for diabetes. About 21.4% of all U.S. adults who met laboratory criteria for diabetes are not diagnosed with diabetes [24], and so it is likely that we under-ascertained diabetes among TB cases.
Our study had additional limitations. First, published estimates of reactivation rates are scarce, and the 95% CIs presented in this analysis reflect only the imprecision in the input parameters derived from the Shea et al. and Yeats data sources [14,15], excluding any other potential sources of uncertainty in our LTBI prevalence estimates. Second, for TB cases with more than one medical risk factor, we considered only the risk factor with the highest reactivation rate. This hierarchy might have led to an overestimation of the LTBI prevalence by medical risk factor if having multiple risk factors actually causes an even higher reactivation risk. This limitation also prevented us from providing more refined LTBI prevalence estimates for persons with multiple medical comorbidities. Third, in deriving our reactivation rates for persons with medical risk factors for progression to TB disease (S1 Appendix in S1 File), we assumed that the TB reactivation rates based on the total population [14,15] could represent the experience of the population without identified medical risk factors. Fourth, the time since the initial M. tuberculosis infection and the onset of comorbid conditions (e.g., diabetes), including which occurred first, are unknown. Fifth, undocumented comorbid conditions among persons with reported TB may have led to an overestimation of LTBI in the total population, in that persons with unrecognized conditions would have been assigned lower reactivation rates than their conditions might actually engender. Sixth, we used a well-defined method [13] to distinguish cases attributed to non-recent transmission from those attributed to recent transmission of TB infection, but definitive classification using this method can be difficult. Seventh, the analysis here did not assess the effect of LTBI treatment; however, it is likely that treated LTBI would not have reactivated to TB disease. Finally, we imputed the missing data, including many cases missing data on recent transmission, which narrowed the estimated confidence limits.
Our back-calculation method has two advantages. First, our method is based on TB cases reported to the National Tuberculosis Surveillance System during 2013-2017; this high-quality dataset has standardized reporting for each case of TB disease in every U.S. state [2,13]. These data are available, and the analysis code has also been made publicly available, making replication and updating of LTBI prevalence estimates feasible, either by CDC or other entities. Second, our estimates describe LTBI prevalence within geographic populations as defined by age group and race/ethnicity, which, combined with American Community Survey denominators, can be informative for identifying those populations in the United States who would most benefit from interventions to prevent future TB cases.