Developing a model to predict unfavourable treatment outcomes in patients with tuberculosis and human immunodeficiency virus co-infection in Delhi, India

Background Tuberculosis (TB) patients with human immunodeficiency virus (HIV) co-infection have worse TB treatment outcomes compared to patients with TB alone. The distribution of unfavourable treatment outcomes differs by socio-demographic and clinical characteristics, allowing for early identification of patients at risk.
Objective To develop a statistical model that can provide individual probabilities of unfavourable outcomes based on demographic and clinical characteristics of TB-HIV co-infected patients.
Methodology We used data from all TB patients with known HIV-positive test results (aged ≥15 years) registered for first-line anti-TB treatment (ATT) in 2015 under the Revised National TB Control Programme (RNTCP) in Delhi, India. We included variables on demographics and pre-treatment clinical characteristics routinely recorded and reported to RNTCP and the National AIDS Control Organization. Binomial logistic regression was used to develop a statistical model to estimate probabilities of unfavourable TB treatment outcomes (i.e., death, loss to follow-up, treatment failure, transfer out of program, and a switch to drug-resistant regimen).
Results Of 55,260 TB patients registered for ATT in 2015 in Delhi, 928 (2%) had known HIV-positive test results. Of these, 816 (88%) had drug-sensitive TB and were ≥15 years. Among 816 TB-HIV patients included, 157 (19%) had unfavourable TB treatment outcomes. We developed a model for predicting unfavourable outcomes using age, sex, disease classification (pulmonary versus extra-pulmonary), TB treatment category (new or previously treated case), sputum smear grade, known HIV status at TB diagnosis, antiretroviral treatment at TB diagnosis, and CD4 cell count at ATT initiation. The chi-square p-value for model calibration assessed using the Hosmer-Lemeshow test was 0.15. The model discrimination, measured as the area under the receiver operator characteristic (ROC) curve, was 0.78.
Conclusion The model had good internal validity, but should be validated with an independent cohort of TB-HIV co-infected patients to assess its performance before clinical or programmatic use.



Introduction
India has the highest tuberculosis (TB) burden in the world with an estimated 2.8 million new cases in 2016 [1]. Of these cases, 87,000 (3%) were estimated to also have human immunodeficiency virus (HIV) co-infection, which is the second highest TB-HIV burden in the world after South Africa [1]. HIV co-infection in persons with TB disease increases the risk of morbidity and mortality and is one of the strongest independent predictors of unfavourable treatment outcomes (death, loss to follow-up, treatment failure) [2]. Since 2004, the World Health Organization (WHO) has recommended testing all TB patients for HIV to allow for early initiation of anti-retroviral therapy (ART) and co-trimoxazole preventive therapy (CPT), thereby reducing mortality [3]. Although India was an early adopter of these guidelines, treatment success rates for TB patients with HIV remain lower than those for TB patients without HIV [3]. In 2014, the treatment success rate for TB patients with HIV was 76% compared to 87% for TB patients without HIV [4]. This gap indicates opportunities to improve treatment outcomes for TB patients with HIV.
Previous studies in India have shown that the risk of unfavourable outcomes is not uniform and that certain patient sub-groups experience higher rates of unfavourable outcomes [5-9]. Age, extrapulmonary TB, low CD4 counts (<200 cells per cubic millimetre) at the time of initiating anti-TB treatment (ATT), and history of previous TB treatment have been independently associated with unfavourable treatment outcomes [5,8]. However, quantifying the patient-specific probability of unfavourable treatment outcomes at the time of diagnosis would be helpful for both clinical and programmatic purposes. Statistical models have been used to predict the probability of future patient outcomes based on certain known characteristics and are often used in medicine and public health to guide decision-making [10]. However, such models have not been developed in India to guide clinicians and national TB programme staff managing TB-HIV co-infection.
Our study objective was to use routine surveillance data from 2015 to develop a statistical model to predict the probability of unfavourable treatment outcomes among TB patients with HIV who were registered for first-line TB treatment at RNTCP clinics in Delhi. Surveillance data from the state of Delhi were selected for analyses because Delhi has the highest TB case notification rate among all states in India (314 cases per 100,000) [11]. In addition, in 2014, patients co-infected with drug-sensitive TB and HIV in Delhi had an 11% lower treatment success rate compared to patients with drug-sensitive TB alone, suggesting a need to improve TB treatment outcomes among TB-HIV co-infected patients in Delhi [12].

Study setting
In Delhi, TB diagnostic and treatment services are coordinated by 25 District Tuberculosis Centres, which oversee an estimated 400 sub-district level facilities and 12 large tertiary care hospitals. HIV diagnostic and treatment services are delivered through a network of 93 Integrated Counselling and Testing Centres (ICTC) and 11 ART centres, which provide a comprehensive package of treatment and support services to people living with HIV (PLHIV). HIV testing is recommended for all TB patients with unknown HIV status [13]. TB patients who test positive for HIV are referred to ART centres for further evaluation and ART initiation. If TB patients are known to be HIV-positive and are already on ART, their ART is continued and managed at ART centres. HIV-positive TB patients are also tested for drug-resistant TB by Xpert-MTB/Rif tests, and those found to have resistance to rifampicin are treated with multidrug-resistant TB treatment regimens. Standard WHO definitions are used by RNTCP to classify TB patients and TB treatment outcomes [13,14]. The case definitions and treatment outcomes used by RNTCP are given in Table 1.

Study design and study population
This was a retrospective cohort study of all TB patients with known HIV-positive test results registered for first-line TB treatment in Delhi in 2015. We included both new and previously treated patients who received first-line, directly observed treatment for TB. We excluded patients with known multidrug-resistant TB, rifampicin-resistant TB, and children aged less than 15 years, in whom treatment outcomes can be affected by factors other than HIV co-infection.

Data collection
We abstracted data from three sources: TB registers at District TB Centres; ART registers maintained at ART centres; and individual TB treatment cards for the cohort of TB patients with HIV registered under RNTCP during 2015. We estimated the sample size for our study using the general principle of having >10 patients with unfavourable events per variable for predictive modelling to prevent the problem of overfitting [15]. The expected proportion of unfavourable events was approximately 20% and we planned to include ten variables in our predictive model. On this basis, a sample size in excess of 450 patients was necessary to develop a predictive model.

Data variables
The dependent or outcome variable for our study was dichotomised into favourable or unfavourable outcomes among patients. Favourable treatment outcomes included cure and treatment completed. Unfavourable treatment outcomes included death, loss to follow-up, treatment failure, transfer out, or a switch to MDR TB treatment.
The independent (or predictor) variables for our study included age, sex, disease classification (pulmonary versus extra-pulmonary), TB treatment category (new or previous history of ATT), sputum smear status (positive or negative), sputum smear grade, patient's pre-treatment weight, HIV and ART status at the time of TB diagnosis, and CD4 cell count at the time of initiating the patient on ATT.
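The dichotomisation of the outcome variable described above can be sketched in a few lines of Python. This is only an illustration: the study used Stata, and the outcome label strings and the function name `is_unfavourable` are assumptions made for this example, not names taken from the RNTCP registers.

```python
# Outcome categories follow the text; the exact label strings are hypothetical.
FAVOURABLE = {"cured", "treatment completed"}
UNFAVOURABLE = {
    "death",
    "loss to follow-up",
    "treatment failure",
    "transfer out",
    "switch to mdr-tb treatment",
}


def is_unfavourable(outcome: str) -> int:
    """Return 1 for an unfavourable outcome and 0 for a favourable one."""
    key = outcome.strip().lower()
    if key in UNFAVOURABLE:
        return 1
    if key in FAVOURABLE:
        return 0
    # Refuse to guess when a label is not one of the listed categories.
    raise ValueError(f"Unrecognised treatment outcome: {outcome!r}")


recorded = ["Cured", "Death", "Treatment completed", "Transfer out"]
y = [is_unfavourable(o) for o in recorded]
print(y)
```

Raising an error on an unknown label, rather than defaulting to one class, keeps recording errors visible before they reach the model.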

Data validation
We cross-verified data from the three sources (TB registers, ART registers, and TB treatment cards) for consistency. If data were inconsistent, data related to TB diagnosis and treatment were taken from TB registers and data related to HIV were taken from ART registers.
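A minimal sketch of this precedence rule is shown below. The field names, the two-source simplification (treatment cards are omitted), and the fallback to the other register when the preferred source has no value are all assumptions made for illustration; the text specifies only which register prevails when sources disagree.

```python
# Hypothetical field names; the actual register fields are not listed in the text.
TB_FIELDS = ("disease_classification", "treatment_category", "smear_grade")
HIV_FIELDS = ("art_status", "cd4_count")


def reconcile(tb_register: dict, art_register: dict) -> dict:
    """Merge two records, preferring the TB register for TB fields
    and the ART register for HIV fields."""
    merged = {}
    for field in TB_FIELDS + HIV_FIELDS:
        tb_value = tb_register.get(field)
        art_value = art_register.get(field)
        if field in TB_FIELDS:
            preferred, fallback = tb_value, art_value
        else:
            preferred, fallback = art_value, tb_value
        # Assumption: use the other source only when the preferred one is missing.
        merged[field] = preferred if preferred is not None else fallback
    return merged


tb = {"disease_classification": "pulmonary", "cd4_count": 180}
art = {"disease_classification": "extra-pulmonary", "cd4_count": 150, "art_status": "on ART"}
record = reconcile(tb, art)
print(record["disease_classification"], record["cd4_count"])
```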

Disease classification
• Pulmonary TB: Any microbiologically confirmed or clinically diagnosed case of TB involving the lung parenchyma or the tracheo-bronchial tree.
Smear positive: A new case of pulmonary TB is considered to be smear-positive if one or more sputum smear specimens at the start of treatment are positive for acid fast bacilli (AFB).
Smear negative: A patient with symptoms suggestive of TB with at least 2 sputum smears negative for acid fast bacilli and either: a) radiographic abnormalities consistent with active pulmonary TB, as determined by the treating medical officer; or b) positive culture for Mycobacterium tuberculosis, followed by a decision to treat the patient with a full course of anti-TB therapy.
• Extra-pulmonary TB: Any microbiologically confirmed or clinically diagnosed case of TB involving organs other than the lungs such as pleura, lymph nodes, intestines, genitourinary tract, joint and bones, or meninges of the brain.

Types of TB cases
• New: A patient who has never had treatment for tuberculosis or has taken anti-tuberculosis drugs for less than one month. The patient can be either new smear positive, new smear negative or new extra-pulmonary TB.
• Previously treated: A patient who has taken anti-TB treatment for more than a month from any source in the past. There are four types of previously treated cases:
Relapse: A patient declared cured of TB by a physician, but who reports back to the health service and is found to be bacteriologically positive.
Treatment after loss to follow-up: A patient who received anti-tuberculosis treatment for one month or more from any source and who returns to treatment after having defaulted, i.e. not taken anti-TB drugs consecutively for two months or more.
Treatment after failure: A smear-positive patient who is smear-positive at 5 months or more after starting treatment. Failure also includes a patient who was initially smear-negative but who becomes smear-positive during treatment.
Retreatment, other: Patients who do not fit into the above-mentioned previously treated categories.

Sputum smear status and grade (Ziehl-Neelsen Staining Method)
Treatment outcomes
• Cured: A patient who is initially smear-positive who has completed treatment and had negative sputum smears on at least two occasions, one of which was at completion of treatment.
• Treatment completed: Any of the following: a) a patient who was initially sputum smear-positive who has completed treatment, with negative smears at the end of the initial phase but none at the end of treatment; or b) a patient who was initially sputum smear-negative who received a full course of treatment and has not become smear-positive during or at the end of treatment; or c) a patient with extra-pulmonary TB who has received a full course of treatment and has not become smear-positive during or at the end of treatment.
• Lost to follow-up: A patient who, at any time after registration, interrupted anti-TB treatment for 2 consecutive months or more.
• Treatment failure: Either: a) a patient who is initially smear-positive who remains smear-positive at 5 months or more after starting treatment; or b) a patient who was initially smear-negative but who became smear-positive during treatment.
• Death: A patient who died during TB treatment, regardless of cause.
• Transferred out: A patient who has been transferred to another Tuberculosis Unit/District and his/her treatment results are not known.
• Switched to a MDR-TB treatment regimen: A TB patient who is on first line regimen and has been diagnosed as having DR-TB and switched to a drug resistant TB regimen prior to being declared as a treatment failure from first-line treatment.

Data analyses
We analysed data using Stata version 12.1 (College Station, TX: StataCorp LP), and summarized categorical variables using proportions.
We used a log binomial model to estimate the crude relative risk and 95% confidence interval in order to describe the association between predictor variables and TB treatment outcomes.
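For a single binary predictor, the relative risk from a log-binomial model coincides with the ratio of the observed risks in the two groups, and a Wald-type 95% confidence interval can be computed on the log scale. The sketch below illustrates this calculation in Python rather than Stata; the counts are hypothetical and are not taken from Table 2.

```python
import math


def crude_relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Crude relative risk with a Wald confidence interval on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 40 of 200 exposed and 20 of 200 unexposed patients
# with an unfavourable outcome.
rr, lower, upper = crude_relative_risk(40, 200, 20, 200)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```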
To develop the predictive model we used all variables in our dataset. Only patients who had complete data on all the predictor and outcome variables were retained for model building. We initially assessed for multicollinearity between variables using Pearson's correlation coefficient; when two variables were found to be collinear (correlation coefficient >0.7), only one of them was included. We then conducted a step-wise backward selection process to identify predictors of outcome, retaining variables at a p-value threshold of 0.15. We also modelled age in years either as a continuous variable or as ordinal categories (age groups). We recategorised the variable sputum smear grade (which had five categories) into two categories, smear positive and smear negative/unknown, and assessed it for inclusion in the model.
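The collinearity screen described above can be illustrated as follows. This is a sketch in Python with made-up data; the variable names and values are hypothetical, and the 0.7 threshold is the one stated in the text.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def collinear_pairs(columns, threshold=0.7):
    """Return variable pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical data for eight patients.
data = {
    "smear_positive": [1, 1, 0, 0, 1, 0, 1, 0],
    "smear_grade": [3, 2, 0, 0, 1, 0, 2, 0],
    "male": [1, 0, 1, 1, 0, 0, 1, 0],
}
print(collinear_pairs(data))
```

In this toy example only the smear status and smear grade pair is flagged, so one of the two would be dropped before model building.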
We used the binomial logit model to obtain the coefficients for the prediction model. We used the link test in Stata to assess specification errors in the logit link function. We also added interaction terms between various clinical and demographic variables while identifying the most suitable model. Models without specification errors were assessed for calibration using Hosmer and Lemeshow's goodness-of-fit test, and models with a chi-square p-value greater than 0.05 were considered. Models were also assessed for discrimination using the area under the ROC curve; models with an area under the curve greater than 0.75 were considered. Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values were calculated for all the models that fulfilled the above criteria, and the model with the lowest AIC and BIC values was chosen as the final predictor model. The probability of an unfavourable outcome for each patient was estimated using the following binary logistic regression equation:

$$P(Y) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$$

where $P(Y)$ is the probability of the outcome to be predicted, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_n$ are the coefficients of the model, $X_1, X_2, \dots, X_n$ are the independent variables, and '×' in the model indicates an interaction term between two variables.
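As an illustration of how a fitted logistic regression equation of this kind turns patient characteristics into an individual probability, consider the sketch below. The coefficients, variable names, and the single interaction term are hypothetical and are not the values reported in Table 5.

```python
import math


def predicted_probability(intercept, coefficients, values, interactions=()):
    """Probability from a logistic model with optional two-way interaction terms.

    coefficients: mapping of variable name to coefficient
    values: mapping of variable name to the patient's value
    interactions: iterable of ((name_a, name_b), coefficient) pairs
    """
    linear = intercept
    for name, beta in coefficients.items():
        linear += beta * values[name]
    for (name_a, name_b), beta in interactions:
        # An interaction term multiplies the two variables together.
        linear += beta * values[name_a] * values[name_b]
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical coefficients for two binary predictors and their interaction.
coefs = {"previously_treated": 0.8, "cd4_below_200": 1.2}
inter = [(("previously_treated", "cd4_below_200"), -0.5)]

high_risk = predicted_probability(-2.0, coefs, {"previously_treated": 1, "cd4_below_200": 1}, inter)
low_risk = predicted_probability(-2.0, coefs, {"previously_treated": 0, "cd4_below_200": 0}, inter)
print(round(high_risk, 3), round(low_risk, 3))
```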

Ethics
We obtained approval for the study protocol from the local ethics committee of the New Delhi TB Centre (New Delhi, India); the Ethics Advisory Group of the International Union Against Tuberculosis and Lung Disease (Paris, France); and the US Centers for Disease Control and Prevention (Atlanta, GA, USA).

Results
There were 55,260 TB patients registered for first-line ATT in Delhi from January through December 2015, and 928 (2%) had known HIV-positive test results (Fig 1). Among these TB-HIV co-infected patients, 816 (88%) were included in our study as they had drug-sensitive TB and were ≥15 years of age. Of these, 157 (19%) had unfavourable TB treatment outcomes.
The demographic and clinical characteristics of eligible TB-HIV patients enrolled in Delhi, along with proportions of unfavourable outcomes, are presented in Table 2. Over two-thirds (n = 553, 67%) were aged 25-44 years, and most were male (n = 637, 78%). Almost half of these cases (n = 392, 48%) had extra-pulmonary TB, and 349 (43%) were diagnosed with HIV after presenting for TB diagnosis. Only 450 (55%) cases had a CD4 cell count recorded at TB diagnosis; of those, 342 (76%) had a CD4 cell count <350 cells per cubic millimetre. Patients who were not on ART were initiated on ART within 2-3 weeks of TB treatment. Of the 816 patients included in our study, 659 (81%) had favourable TB treatment outcomes (i.e., cured and treatment completed, Table 2). In bivariate analyses, age group and sex were not associated with unfavourable TB treatment outcomes, while disease classification, history of previous ATT, smear status, HIV and ART status at the time of diagnosis, and CD4 count at the time of initiating ATT were associated with unfavourable treatment outcomes at p<0.15 (Table 2).

Predictor model
Of 816 patients included in our study, 448 (55%) had complete information on age, sex, disease classification, history of previous ATT, HIV status, ART status, and CD4 cell count at ATT initiation. These patients were included in our model. There were no statistically significant differences in clinical characteristics (disease classification, type of TB, sputum smear status and the outcome) between patients included in and excluded from the model (Table 3).
The step-wise backward selection process (retaining variables at a p-value threshold of 0.15) identified sputum smear grade [or sputum smear status (positive or negative)], previous history of ATT, disease classification (pulmonary versus extra-pulmonary), HIV and ART status at the time of TB diagnosis, and CD4 cell count at the time of initiating ATT as the most important predictors of unfavourable outcome. We used multiple combinations/iterations of these variables to identify the most appropriate model. While assessing the models, we added age and sex because they are programmatically relevant; this improved model performance, and these two variables were therefore retained. We also added interaction terms between demographic and clinical characteristics (age and sex; sputum smear status and type of TB; HIV status at TB diagnosis and CD4 cell category) alone and in combination. Models with p-values >0.05 for model calibration (Hosmer-Lemeshow test with 10 groups), relatively lower AIC and BIC values, and relatively higher values for model discrimination (area under the ROC curve of sensitivity against 1 minus specificity) were considered to be better performing. The various combinations/iterations of the variables that we used are shown in Table 4, and the footnote of the table provides the variable properties. The model that performed best in our iterations contained the following variables: sputum smear grade; new/previous history of ATT; disease classification (pulmonary versus extra-pulmonary); HIV status, ART status, and CD4 cell count at the time of TB diagnosis; sex and age (with interaction terms between age and sex; sputum smear status and type of TB; HIV status at TB diagnosis and CD4 cell category). The coefficients of the prediction model that we selected are shown in Table 5.
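The three kinds of criteria used to compare candidate models (calibration, discrimination, and information criteria) can be sketched as below. This is a simplified Python illustration, not the Stata routines used in the study: the grouping in `hosmer_lemeshow` splits sorted predictions into equal-sized groups and may handle ties differently from Stata, and the closed-form chi-square tail used here applies only when the number of groups is even and at least 4.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi else math.log(1 - pi) for yi, pi in zip(y, p))


def aic_bic(y, p, n_params):
    """Akaike and Bayesian information criteria; lower values are preferred."""
    ll = log_likelihood(y, p)
    return 2 * n_params - 2 * ll, n_params * math.log(len(y)) - 2 * ll


def auc(y, p):
    """Area under the ROC curve as the probability that a patient with the
    outcome is ranked above one without it (ties count one half)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square statistic and p-value (df = groups - 2).

    Assumes an even number of groups (at least 4), len(y) >= groups,
    and predicted probabilities strictly between 0 and 1.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        mean_p = expected / len(idx)
        stat += (observed - expected) ** 2 / (len(idx) * mean_p * (1 - mean_p))
    # Chi-square upper tail in closed form, valid for even degrees of freedom.
    half = stat / 2
    terms = (groups - 2) // 2
    p_value = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(terms))
    return stat, p_value


# Hypothetical outcomes and predicted probabilities for four patients.
y_demo = [0, 0, 1, 1]
p_demo = [0.1, 0.4, 0.35, 0.8]
print(auc(y_demo, p_demo))
```

Under the thresholds described in the text, a candidate would be kept when the Hosmer-Lemeshow p-value exceeds 0.05 and the area under the curve exceeds 0.75, with AIC and BIC used to choose among the remaining candidates.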
The chi-square p-value for model calibration using the Hosmer-Lemeshow test with 10 groups was 0.14 for this model, and the discrimination of the prediction model, measured as the area under the receiver operator characteristic (ROC) curve, was 0.78 (Fig 2), indicating that the model had good ability to discriminate between patients with and without unfavourable outcomes. The sensitivity and specificity for identifying patients with adverse outcomes at various cut-off values of predicted probabilities derived from the prediction model are shown in

Discussion
The model developed in this study predicts the probability of unfavourable outcomes in TB patients with HIV at the time of initiating ATT in Delhi and had good internal validity. The results of this study should be interpreted in the context of its limitations. First, this model was developed using secondary data collected under routine programmatic conditions; therefore, the validity of this model depends on the accuracy and completeness of the data recorded. Although our study had substantial missing data, there are data monitoring systems in place [16], and therefore errors in recording are likely to be random and would not substantially influence model parameters. Additionally, the characteristics of patients included in and excluded from the prediction model appear similar (Table 3), further suggesting that data were missing at random and therefore should not substantially influence the model. Second, we only used variables that are routinely collected by TB and HIV surveillance systems at the time of diagnosis. As such, we were unable to include variables that are known to be associated with unfavourable outcomes, such as HIV viral load, TB drug resistance status, other co-morbidities, homelessness, and socio-economic status [17], because these variables were not recorded by the surveillance systems. The extent to which model fit could have been improved by inclusion of such unmeasured variables is an area for future research. Third, we used data from patients who were registered for TB treatment under the national TB programme. Approximately 10% of patients diagnosed with TB in India do not receive treatment due to pre-treatment loss to follow-up, while a large number of TB patients receive treatment from health care providers in the private sector who are outside the national TB programme [18,19]. Our model may therefore not reflect the probabilities of unfavourable outcomes in these groups.
In addition, our model may not reflect the probabilities of unfavourable outcomes for TB-HIV patients in states of India other than Delhi or in other calendar years. Fourth, we developed the prediction model manually, using various iterations of multivariable logistic regression (Table 4). Newer approaches to model building that use machine learning techniques to automate this process are emerging [20-22]. These techniques might have identified models with a better fit. This is an area for future research. Finally, the model's applicability may change in the future if case definitions and management protocols change. Despite these limitations, developing prediction models from data that are routinely collected is a first step towards informing clinicians and programme managers about patients at risk of unfavourable outcomes. Previous studies in India have looked at patients' demographic and clinical characteristics in isolation and have described whether these variables are independently associated with unfavourable outcomes. By contrast, our study derived a mathematical equation, which can be used to estimate the probabilities of unfavourable outcomes for patients at the time of ATT initiation. This is an improvement over previous studies as it allows for a more tailored approach to care, such as more frequent monitoring of patients or more focused nutrition supplementation. That said, the model developed is not yet ready to be used in clinical or public health practice. As a next step, the model needs to be validated on an independent sample of patients to assess its performance. This process, known as external validation [23], could be done using a cohort of TB-HIV patients enrolled in 2016.
Depending on the performance of the model in this external cohort of patients, the model may be considered for clinical or programmatic use.

Clinical characteristics
In addition to the model, there were notable findings from this study that have important public health implications. First, this study shows that in 43% of the cases, HIV was detected subsequent to TB diagnosis, suggesting that routine HIV testing among TB patients remains important for identification of new HIV cases. However, 16% of TB patients registered for treatment under RNTCP in Delhi during 2015 did not have a documented HIV test result, indicating missed opportunities to detect new HIV cases. These findings call for re-examining strategies for early detection of HIV in key populations to allow for early initiation of ART and early initiation of either ATT or isoniazid preventive therapy to prevent TB altogether.
Second, the overall proportion of patients with favourable TB treatment outcomes in this cohort was 81%, higher than in previous studies conducted in southern India and in tertiary care centers in northern India, where treatment success ranged between 66% and 77% [8,9]. This could be because our study occurred in an era in which ART is widely available compared to earlier studies conducted when ART was not as readily available to patients.
In conclusion, approximately one in five TB-HIV co-infected patients in Delhi enrolled for treatment under the RNTCP in 2015 had an unfavourable TB treatment outcome. A statistical model with good calibration and discrimination was developed using demographic and clinical data routinely collected by the RNTCP at the time of enrolling patients for TB treatment. This model should be externally validated across multiple years of RNTCP data, and then with an independent cohort of TB-HIV patients to assess its performance in Delhi and other areas of India, before it is used in clinical and/or programmatic practice to predict unfavourable outcomes among patients co-infected with TB and HIV.