The Risk of Tuberculosis Reinfection Soon after Cure of a First Disease Episode Is Extremely High in a Hyperendemic Community

Elevated rates of reinfection tuberculosis in various hyperendemic regions have been reported and, in particular, it has been shown that in a high-incidence setting near Cape Town, South Africa, the rate of reinfection tuberculosis (TB) disease after cure of a previous TB disease episode is about four times greater than the rate of first-time TB disease. It is not known whether this elevated rate is caused by a high reinfection rate due, for instance, to living circumstances, or a high rate of progress to disease specific to the patients, or both. In order to address that question we analysed an extensive data set from clinics attended by TB patients in the high-incidence setting near Cape Town, South Africa and found that, in fact, the (average) rate of reinfection (as opposed to the rate of reinfection disease) after cure of a previous TB disease episode is initially about 0.85 per annum. This rate diminishes rapidly over time and after about ten years this rate is similar to the rate of infection in the general population. Also, the rate of progress to disease after reinfection is initially high but declines in subsequent years down to the figure typical for the general population. These findings suggest that the first few months after cure of a TB disease episode form a critical period for controlling reinfection disease in a hyperendemic setting and that monitoring such cured patients could pre-empt a reinfection progressing to active disease.


Introduction
It is known that in high TB incidence settings the rate of recurrent TB disease is much higher than the rate for first-time disease [1]. Many of these recurrent disease cases are due to relapse but the remaining cases are reinfection cases as confirmed by fingerprint comparison between the first episode and second episode. Indeed, it has been shown that, at least in one high incidence setting, the dominant cause of recurrent disease after the first year following cure is reinfection [2]. In the same setting the rate of reinfection disease appears to be as high as four times the rate of first-time infection (primary) disease [3]. High reinfection disease rates have also been reported from China [4] and elsewhere [1].
It is not clear why the rate of reinfection disease can be elevated compared to the rate of primary disease. It may be speculated that people who have already experienced a primary disease episode have just been unfortunate in finding themselves in a high risk socioeconomic environment possibly involving their workplace, school, club or transport situation, where other infectious cases continue to be present [5,6]. Otherwise it could be that they are innately predisposed to progress to disease, possibly due to genetic risk factors [7,8], lung damage or immunological deficiency caused by the previous episode, and perhaps smoking and drug abuse. It may be that a combination of these factors could be responsible. So a high rate of reinfection or a high rate of progress to disease or both could be responsible for the high rate of reinfection disease. The relative importance of these factors is the subject of much debate [9,10] since direct observation of either of these rates is difficult. We estimated these rates by analysing an extensive data set collected from clinics serving a high TB-incidence setting near Cape Town, South Africa. The results of the statistical analysis were followed by a mathematical analysis that enabled the estimation of the average annual rate of reinfection and average annual rate of progress to disease corresponding to increasing time periods after cure. Our results show clearly that it is the very high average annual rate of reinfection immediately after cure that drives the high rate of reinfection disease. The rate of progress to disease is also elevated but to only a modest extent. We show that both these rates decrease over time.

Methods
We estimate the annual rate of reinfection after cure of a previous first-time TB disease episode and also the rate of progression to active disease after reinfection has occurred using data from a particular hyperendemic community.
By 'a reinfection disease episode' we mean a disease episode resulting from a reinfection event occurring after cure of a previous disease episode as confirmed by the two episodes having different genotypes. By 'cured' we mean cured with bacteriological proof.

Data
We used data for the period 1993 to 2005 from the epidemiological field site of Ravensmead/ Uitsig, two adjacent hyperendemic urban communities of Cape Town.
These data had been collected and a subset of these data for the period 1993 to 1998 had previously been used by the authors GVDS, RW and PvH in the published paper 3 in the references. Another, overlapping, subset of these data for the period 1996 to 2008 had previously also been used by the authors GVDS and PvH in the published paper 2 in the references. These three different subsets were used as they represented the patient records that had been processed and which were available at the time the respective studies had commenced. All genotype work was done by these authors and placed in their own databank. None of the authors of this paper know the identity of the patients whose data was used. The patient information was anonymised by independent persons and these authors have no access to any information that can identify any person. The study was approved by SUN IRB (Stell Univ INst Review Board).
The main conclusions from these previous investigations were that: 2. Overall, reinfection was the dominant cause of recurrent tuberculosis after treatment had been completed [2].
For the present investigation we considered it necessary to take the following into account: 1. It is possible that the rate of reinfection or the ensuing rate of progression to disease may vary depending on the number of previous TB disease episodes experienced by the patient so we considered only first reinfection disease episodes i.e. reinfection disease episodes after a cured first TB episode.
2. The rate of recurrent disease forms an upper bound for the rate of reinfection disease. A recurrent disease with the same strain as a previous disease episode may be the outcome of a reinfection event subsequent to the cure of the previous disease episode or it could just be a relapse of the earlier disease episode. Since it is not possible to distinguish between these two scenarios using the available data, we assumed conservatively, that a recurrent TB episode should be classified as a reinfection disease episode only if the genotypes for the two episodes were different-otherwise we assumed the second to be a relapse case. The reinfection rate estimated based on this assumption is likely an underestimate of the true reinfection rate since it is possible that patients could be reinfected with a commonly circulating strain or exposed to the same untreated index case.
3. We considered all patients who had documented first TB disease episodes that were cured with bacteriologic proof. For these patients we obtained an estimate of the time interval from the date of cure of the first disease episode to the date of the start of the recurrent TB disease episode. In the dataset only the dates on which a patient was seen at the clinic were documented. We therefore assumed that the date of cure is six months after commencement of treatment for susceptible TB and eighteen months for multiple drug resistant TB. We assumed that all TB cases for which resistance information was not sufficiently documented were not multiple drug-resistant i.e. had a treatment period of 6 months. It should be noted that since this assumption will result in an over-estimation of the time to reinfection disease for some cases, it will cause an underestimation of the rate of reinfection.
4. The estimated date of the start of the recurrent disease episode was determined based on the assumption that patients go to the clinic (and get diagnosed) as soon as symptoms appear, which is in turn assumed to occur one month after the disease episode actually begins. It may happen that a patient delays visiting a clinic so that his disease episode actually starts earlier than our estimate: If this was in fact the case for some patients, our assumption would contribute to under-estimation of the rate of reinfection disease.
5. Unfortunately fingerprint typing for both the first and the second disease episodes was not available for all the patients whose first episode had been cured (with bacteriological proof). It is reasonable to suppose that the cases where both episodes had been typed form a random sample of this larger population (i.e. genotyping information is missing completely at random). In reference [3] it is made clear that the patients with DNA fingerprints were representative for those without so that the cases where fingerprints were missing do not constitute a biased subset. We believe therefore that we may assume that fingerprints are missing in an entirely random way resulting from, for example, administrative error or loss of viability of specimens during transport from the clinics to the laboratories or just simple laboratory failures. So for those patients missing typing data, we assume the same proportion p Diff experienced reinfection disease as in the subgroup with typing data available for both episodes. Since we believe this to be a reasonable assumption we will consider all cases with a cured first TB episode instead of only the subset with genotyping data available for both episodes. In this manner we obtain a larger subset to analyse. We noted that the proportion of the confirmed reinfection cases (where the first episode had been cured) compared to all recurrent cases (after cure) is p Diff = 0.57. The risk of reinfection disease within a period of time t after cure of a first disease episode is then calculated as 0.57 times the risk of recurrent disease within a period of time t after cure of a first disease episode.

Statistical Analysis
We considered all patients for whom there was a documented first TB disease episode that was cured with bacteriologic proof. For these patients we obtained an estimate of the time interval from the date of cure of the first disease episode to the date of infection with the recurrent TB infection. In the data set only the dates on which a patient was seen at the clinic were documented. So we estimated the date of infection for the recurrent episode on the assumption that patients go to the clinic (and get diagnosed) as soon as symptoms appear. This is assumed to occur 1 month after infection. Treatment for multi-drug resistant TB (defined as resistant to at least isoniazid and rifampicin) was assumed to be over an 18 month period while treatment of susceptible TB was assumed to take 6 months. The time interval was then calculated from the date of completion of the treatment until one month before diagnosis of the recurrent episode. This over-estimates the interval since it is most likely patients report to a clinic much later than one month after appearance of symptoms. Furthermore, again to be conservative, it was assumed that all TB disease cases for which resistance information was not documented (or sufficiently documented) were not multiple drug resistant and so had a treatment period of 6 months. For actual MDR cases this over-estimates the time to reinfection. Survival analysis was performed on the time-to-event data. The survival function was estimated using the product-limit estimator [11]. In the analysis individuals who experienced recurrence after the cure of their first episode, were censored at the time of recurrence (meaning that they were only considered to be at risk of reinfection until that time and only contributed to the analysis until that time), while those who did not experience relapse or reinfection (as far as we know) prior to the 10 th of October 2005 (the date of the last clinic visit in the data set) were censored at this date. This type of censoring is called right censoring.
No information regarding for example migration or death was available after treatment ended for a patient. It was therefore necessary to make the assumption that if an individual did not return to the clinic, they survived recurrent TB disease as well as death. This assumption will most likely result yet again in an underestimate of the rate of reinfection. The survival analysis was performed using the R programming language and built in functions of an R package called 'survival'. These functions correctly account for right censoring.
The pointwise confidence intervals for the Kaplan Meier curve for the survival of recurrent TB disease are derived from the confidence interval of the log hazard [12].
The confidence limits for the risk of reinfection disease can be obtained as follows. Since the risk of a recurrent TB disease episode within time t after cure is 1 minus the probability of surviving a recurrent TB disease episode for time t, the upper confidence limit of the risk of a recurrent disease episode within time t is 1 minus the lower confidence limit of the corresponding survival probability. Similarly for the lower confidence limit of the risk of a recurrent. The confidence limits for the risk of a reinfection disease episode (i.e. recurrent TB disease with a genotype different from that of the first) are p Diff times the corresponding confidence limits of the risk of a recurrent TB disease episode.
It is this risk of reinfection disease over increasing time periods post cure of first disease episode that is examined in the next section, Mathematical Analysis, in order to investigate the annual rate of reinfection and the annual rate of progress to disease after reinfection.

Mathematical Analysis
Here we present a summary of the method used to estimate the annual rate of reinfection and the annual rate of progress to disease.
We let λ 1 and λ 2 denote the annual rate of reinfection and the annual rate of progression to active disease after reinfection, respectively.
The probability, D, of becoming reinfected and then progressing to disease within T years after cure of a previous infection is given by We observe that this shows that the probability, D, increases from 0 at time T = 0. We now use the data regarding this probability to estimate λ 1 and λ 2 . We proceed from Eq (1) by calculating dD dT ¼ Àl 1 l 2 l 1 Àl 2 e Àl 1 T À e Àl 2 T ½ and finding the secondorder approximation (valid if λ i T ( 1): Let R ¼

Results
The five points made in the Methods section were taken into account when dealing with the data and the way these points determined which cases should be covered by our study (from the period 1993 to 2005) is best explained in Fig 2 which shows the number of patients in each of several categories of interest. Our analysis was applied to the 943 patients for whom the first disease episode was cured and the results are shown in Table 1. For the data covered by our analysis, p Diff (which was defined in the Methods section) is equal to 0.57 (33/57- Table 1). We can therefore expect that about 93 of the 163 patients (0.57 Ã 163 = 93) with a cured first episode and an observed second episode (see Fig 2) experienced that second episode as a result of reinfection. This represents about 10% of the 943 patients with a cured first disease episode. Thus reinfection disease constitutes a significant proportion of the case load.
The risk of reinfection disease as determined from the data and recorded in Table 1 increases as a function of time from cure of the first disease episode and is shown graphically in Rather, it appears that in fact the rates are high initially and decrease during the subsequent years.
In Fig 3 a polynomial trend line is fitted to the graph of the probability of reinfection disease. This polynomial is differentiated to obtain the gradient to the probability graph for various times after cure. This is done for the Upper and Lower confidence interval bounds as well.
This enables the estimation, using the formulae in Methods, of quantities l 1 and l 2 ( Table 2), these being candidates for λ 1 and λ 2 . (Recall that the formulae derived in the Methods section do not distinguish between the rate of reinfection and the rate of progress to disease). l 1 and l 2 are plotted in Fig 5. Fig 5 shows that l 1 and l 2 decrease over time with l 2 tending towards a value typical for the generally accepted value of the rate of progress to disease. We deduce then that l 1 represents the annual rate of reinfection after cure i.e. λ 1 .
Recall that the derivation of the formulae assumes that λ 1 and λ 2 are constant over time. Application of these formulae will therefore yield estimates for average values of λ 1 and λ 2 over the selected time periods. In particular, the values found for larger times T will be inflated because they are affected by contributions from earlier years when the values are high. Despite these limitations, Fig 5 shows   estimate of the average annual rate of reinfection over 10 years after cure (0.114 per annum) suggests that after about 10 years the rate of reinfection is similar to the rate of (first) infection for the population in general.

Conclusion
It has been established elsewhere that in hyperendemic settings reinfection is the major cause of recurrent TB disease. We found in the present study pertaining to the specific settings identified that it can be expected that approximately 57% of patients who were cured of a first episode and who experienced a recurrent episode, experienced reinfection as opposed to relapse.
These reinfection cases constitute a significant proportion (at least 10%) of all cases where the first episode had been cured. Moreover we also note that 43% of the patients who were cured with proof had a second episode with the same strain as the first. For the purposes of our analysis we conservatively regarded these as relapse cases but it is quite likely that such patients had been reinfected by commonly circulating strains or even by the (untreated) people responsible for infecting them the first time. In fact, we have previously found an association of specific Mtb strains with humans with a particular HLA type [13], and this could also explain why, in a hyperendemic setting with a number of strains present in the environment, the strain infecting the host the second time is often the same as the first, as this strain is the best "fit" for   that host. Since reinfection with the same strain was an exclusion criterion here, the estimated rate of reinfection leading to disease is an underestimate. Our study revealed that patients cured of a first disease episode were at great risk of becoming reinfected during the initial months after cure. Initially the annual reinfection rate is about 0.85, (see the Appendix for a very simple explanation of why this is actually a perfectly reasonable value for the reinfection rate) but the rate drops off very rapidly during the subsequent two to three years and continues to drop so that by about ten years since cure the rate has a value typical of that for the population generally. It may be hypothesised that these patients continued in a high risk environment after cure but such high risk conditions diminish over time for them as, for example, they move to different employment or the infectious people to whom they had been exposed become cured themselves. But we also know that human genetic susceptibility plays a significant role in determining which of the latently or newly infected hosts progress to active disease [14], and we may expect these hosts to have repeated episodes of disease in a hyperendemic area. We hypothesize that different allelic frequencies in genes that are crucial at a number of stages in the immune defence against Mtb will impact the host resistance to this infection and its progression. Within an admixed population such as that investigated here, there will be a wide variety of susceptibility alleles, some derived from different ancestral input [15,16,17] and others representing normal variation in any population [18,19,20,21]. Resistance to TB may encompass resistance to infection, or resistance to progression of disease, and our linkage studies in this community show that two separate loci are involved [22], and we postulated that patients who maintain a zero measure for skin test positivity, are inherently resistant. This cohort would by definition not have the first episode of TB, and would therefore not be included in the present study. Those who are most susceptible will succumb rapidly again after the first infection, while as time passes and this susceptible cohort is "removed" from the pool of patients recovering from TB, the apparent rate of reinfection will go down. This mechanism is supported by Rodrigues et al [23] and Gomes et al, [24],  as fitting the data better than the hypothesis that infection increases susceptibility to reinfection. However, we need to consider another possible explanation for the high rate of reinfection shortly after cure; namely, that the patient's immune system remains compromised for a significant period after bacterial cure, as a result of the disease episode. Cytokine levels and networks may remain disrupted due to their slow recovery after an episode of TB. In patients cured of an episode of TB, the levels of IFN-γ immunoreactivity remain suppressed at the end of therapy [25] and for at least 12 months after the start of therapy [26]. IFN-γ is important in the TH1-type cytokine response, and while a depressed level of IFN-γ activity may point to genetic susceptibility in some patients, the levels at 18 months have recovered to control levels, indicating the time dependence of the effect [26]. The ratios of other cytokine such as TNF-α/IL-10 were significantly increased in TB patients before, during, and also at the end of treatment compared to those of control subjects [25]. Pathogenic mycobacteria can subvert the autophagy/apoptotic pathways, and a number of signalling cascades, and the extent and duration of these effects is not known.
The delicate balance between the survival of the pathogen and the success of the host defence system could be tipped in favour of disease success in the months after an infection, for a combination of all the above reasons [27] and this, in combination with the genetic susceptibility previously discussed, manifests as an early increase in disease after cure.
The first episode of TB has already served to identify those infected persons that are likely to progress to disease, and the public health benefit of monitoring this subset of the population for a short period to prevent the even smaller subset which may be the most genetically susceptible and is at increased risk of a second episode of disease, seems well worth the effort.
Whether such monitoring should be implemented and for how long such monitoring should continue would be best determined by a cost-benefit analysis compared to other types of control measures. This study does not provide definitive answers as to why the rate of actual reinfection soon after cure is so high. However, should monitoring take place, it is likely that the reason a patient has become reinfected so quickly could become evident. This in itself could suggest more efficient measures.