Early Prediction of Treatment Efficacy in Second-Stage Gambiense Human African Trypanosomiasis

Background Human African trypanosomiasis is fatal without treatment. The long post-treatment follow-up (24 months) required to assess cure complicates patient management and is a major obstacle in the development of new therapies. We analyzed individual patient data from 12 programs conducted by Médecins Sans Frontières in Uganda, Sudan, Angola, Central African Republic, Republic of Congo and Democratic Republic of Congo searching for early efficacy indicators. Methodology/Principal Findings Patients analyzed had confirmed second-stage disease with complete follow-up and confirmed outcome (cure or relapse), and had CSF leucocytes counts (CSFLC) performed at 6 months post-treatment. We excluded patients with uncertain efficacy outcome: incomplete follow-up, death, relapse diagnosed with CSFLC below 50/µL and no trypanosomes. We analyzed the 6-month CSFLC via receiver-operator-characteristic curves. For each cut-off value we calculated sensitivity, specificity and likelihood ratios (LR+ and LR−). We assessed the association of the optimal cut-off with the probability of relapsing via random-intercept logistic regression. We also explored two-step (6 and 12 months) composite algorithms using the CSFLC. The most accurate cut-off to predict outcome was 10 leucocytes/µL (n = 1822, 76.2% sensitivity, 80.4% specificity, 3.89 LR+, 0.29 LR−). Multivariate analysis confirmed its association with outcome (odds ratio = 17.2). The best algorithm established cure at 6 months with ≤5 leucocytes/µL and relapse with ≥50 leucocytes/µL; patients between these values were discriminated at 12 months by a 20 leucocytes/µL cut-off (n = 2190, 87.4% sensitivity, 97.7% specificity, 37.84 LR+, 0.13 LR−). Conclusions/Significance The 6-month CSFLC can predict outcome with some limitations. Two-step algorithms enhance the accuracy but impose 12-month follow-up for some patients. For early estimation of efficacy in clinical trials and for individual patients in the field, several options exist that can be used according to priorities.


Introduction
Human African trypanosomiasis (HAT) or sleeping sickness, caused by Trypanosoma brucei gambiense (most common form, West and Central Africa) and rhodesiense (East and Southern Africa), is fatal unless treated. After infection, the disease progresses from the easily treatable haemolymphatic first stage to the meningoencephalitic second stage, when parasites invade the central nervous system.
Patients who receive treatment cannot be considered cured immediately, because the parasite may remain viable and the disease may fully redevelop many months later. A long post-treatment follow-up period is thus required to assess cure [1]. This follow-up time is fixed at 24 months by convention, although in comparative clinical trials it is considered acceptable to measure the efficacy at 18 months [2]. Follow-up consists of control visits, generally every 6 months, at which lymph, blood and cerebrospinal fluid (CSF) are examined. The detection of trypanosomes in any body fluid unequivocally identifies a relapse. Unfortunately, parasites are often not detected early enough to allow for timely re-treatment, and many patients do not adhere to this demanding and invasive follow-up schedule. To better detect relapses and avert the risk of serious sequelae or death, the variation in the number of white blood cells (WBC) in CSF is widely used as a proxy marker of relapse. Other markers of relapse are under investigation and not in routine field use.
Because most HAT patients are located in remote rural areas, the post-therapeutic follow-up is particularly challenging: poverty, distance, bad roads, lack of transportation, subsistence priorities, displacement (sometimes conflict related), add to the fear of the lumbar puncture. As a result, patients' compliance with follow-up decreases markedly after the first assessment at 6 months [3].
Such a long follow-up is a handicap not only for routine patient management but also for therapeutic efficacy studies [4], particularly when a sequence of clinical studies is required (e.g. dose-finding studies). Some time can be saved when a given investigational treatment is assumed to have insufficient efficacy because early failures surpass a pre-defined threshold. However, when the cumulative failure rate is below that threshold, the final outcome (cure or relapse) cannot be predicted.
Research on ways of shortening the follow-up is scarce. One study suggests that HAT patients with <5 CSF leucocytes/µL at 6 months are at low risk of relapse (negative predictive value >0.93, n = 146) [5] and that at 6 and 12 months, patients with ≥50 and ≥20 CSF leucocytes/µL, respectively, are at high risk. Another study tested an algorithm combining 6 and 12 months CSF exams on a cohort of 206 treated patients showing 97.8% specificity and 94.4% sensitivity to predict relapse [6]. Considering that these promising findings originated from relatively small cohorts, recruited each time in one single centre (Bwamanda and Mbuji Mayi, DRC, respectively), and that confirmed and unconfirmed efficacy outcomes (lost to follow-up, deaths during follow-up, etc.) were mixed in the assessment via assumptions, further research is needed on larger datasets and with more restrictive selection criteria.
To meet this goal, we consolidated individual-patient data from 12 sites in Uganda, Sudan, Angola, Central African Republic, Republic of Congo and Democratic Republic of Congo where Médecins Sans Frontières (MSF) had conducted HAT programs, and we selected patients with confirmed diagnosis, confirmed stage, complete follow-up (thus confirmed outcome), and meeting a restrictive, laboratory-confirmed definition of relapse, so as to maximize information certainty. Our analysis aimed at identifying early efficacy indicators using the CSF leucocytes count at 6 and 12 months after treatment.

Ethics statement
The study received ethical clearance from the Médecins Sans Frontières International Ethical Review Board (Geneva, Switzerland). All data analyzed were anonymized from the start.
Using a large pooled dataset from routine MSF gambiense HAT control programs, we selected patients with confirmed second-stage disease and having received second-stage treatment, who completed their follow-up (minimum 22 months) until confirmation of an outcome (cured or relapsed) and who had a CSF leucocytes count performed at 6 months post-treatment. We considered 22 months as complete follow-up because in practice patients coming for control at 22-23 months are not asked to come again at 24 months.
Second stage was defined by the finding of trypanosomes in blood, lymph nodes or CSF, with ≥20 leucocytes/µL in CSF.
We excluded patients who (i) had missing or incoherent data on key variables, or (ii) died during treatment or follow-up, or (iii) were diagnosed with relapse before 6 months or later than 36 months post treatment, or (iv) for the first analysis only: relapsed at 6 months.
Individuals who relapsed before 6 months were excluded because they do not contribute to the objectives of this analysis, and those relapsing after 36 months because they are less certainly distinguishable from reinfections.
Cure was defined as absence of trypanosomes in all body fluids and ≤20 leucocytes in CSF at ≥22 months post-treatment; and relapse as trypanosomes detected in any body fluid or ≥50 CSF leucocytes/µL at any time [7]. Patients diagnosed with relapse without meeting this definition were excluded. We kept the patients who continued on follow-up despite having ≥50 CSF leucocytes/µL and who had a confirmed outcome later (either cure or relapse).
The strict inclusion criteria aimed at strengthening the validity of the results by focusing on patients that provide unequivocal information, using the advantage of having a large cohort.
Melarsoprol treatment included the following regimens: one series of 10 daily injections; 2 or 3 series of 3 injections; and 3 series of 4 injections. Eflornithine included series of either 7 or 14 days, all at 400 mg/kg/day divided in 4 infusions per day. Combination treatment included melarsoprol-eflornithine, nifurtimox-eflornithine and melarsoprol-nifurtimox co-administrations.

Statistical analysis
We used the Wilcoxon test to compare CSF leucocytes between different groups of patients. We plotted the evolution of CSF leucocytes (median, IQR) during the follow-up, overall and by treatment received.
First analysis. We analyzed the relative change (as a percent reduction) of the CSF leucocytes between baseline (pre-treatment) and 6 months, per patient. We also analyzed the absolute count at 6 months independently of the baseline count. We assessed the accuracy of these 2 diagnostic tests to predict relapse using the receiver-operator-characteristic (ROC) curve and we reported the area under the curve (AUC) with its 95% confidence interval (CI) for each test. For each cut-off of the marker, sensitivity, specificity, positive likelihood ratio (LR+) and negative likelihood ratio (LR−) were reported with their respective 95% CI [8][9].
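As an illustration of how these quantities follow from a single cut-off, the minimal Python sketch below tabulates true and false positives and derives sensitivity, specificity, LR+ and LR−. The function name, the toy data and the convention that a count at or above the cut-off is a positive test (predicted relapse) are assumptions made for illustration only; the study's own computations were run in Stata and also included 95% CIs, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    lr_pos: float
    lr_neg: float


def accuracy_at_cutoff(counts, relapsed, cutoff):
    """Accuracy of the rule 'count >= cutoff predicts relapse'.

    counts   : 6-month CSF leucocytes counts (cells/µL), one per patient
    relapsed : True if the patient's confirmed final outcome was relapse
    cutoff   : hypothetical threshold; '>=' is an assumed convention
    Assumes at least one relapsed and one cured patient in the data.
    """
    pairs = list(zip(counts, relapsed))
    tp = sum(1 for c, r in pairs if c >= cutoff and r)
    fn = sum(1 for c, r in pairs if c < cutoff and r)
    fp = sum(1 for c, r in pairs if c >= cutoff and not r)
    tn = sum(1 for c, r in pairs if c < cutoff and not r)

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # LR+ = sensitivity / (1 - specificity); LR- = (1 - sensitivity) / specificity
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else float("inf")
    return Accuracy(sensitivity, specificity, lr_pos, lr_neg)


# Invented toy data, not taken from the study cohort.
toy_counts = [2, 4, 9, 12, 3, 30, 8, 55, 15, 6]
toy_relapsed = [False, False, False, False, False, True, True, True, True, False]

if __name__ == "__main__":
    # Sweep a few candidate cut-offs, as one would along a ROC curve.
    for cutoff in (5, 10, 20):
        print(cutoff, accuracy_at_cutoff(toy_counts, toy_relapsed, cutoff))
```

Sweeping the cut-off over all observed values and plotting sensitivity against 1 − specificity gives the ROC curve referred to above.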
A random-intercept logistic regression was fitted to analyze the effect of the chosen cut-off taking into account several baseline individual characteristics. The threshold p-value to include factors in the initial model was 0.4.
Second analysis. Following the two-step composite algorithm at 6 and 12 months proposed by Mumba et al. [6] (at 6 months, patients with ≤5 leucocytes/µL are considered cured and those with ≥50 leucocytes/µL are considered relapsed; at 12 months, all remaining patients are discriminated with a cut-off at 20 leucocytes/µL), we explored various combinations of cut-off values. The notation we used for the algorithms features the three cut-off values of CSF leucocytes as follows: (i) lower cut-off at 6 months (cure); (ii) upper cut-off at 6 months (relapse); and (iii) unique cut-off at 12 months. In this notation the algorithm of Mumba et al. is written 5-50-20.
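The decision logic of such a two-step algorithm can be sketched in a few lines of Python. This is a hedged illustration rather than the code used in the study: the function and parameter names are hypothetical, and treating a 12-month count at or above the cut-off as relapse is an assumption, since the text only states that remaining patients are discriminated by that cut-off.

```python
def classify_two_step(csf_6m, csf_12m=None, lower_6m=5, upper_6m=50, cutoff_12m=20):
    """Classify a patient with a two-step algorithm written lower-upper-cutoff.

    csf_6m, csf_12m : CSF leucocytes counts (cells/µL) at 6 and 12 months
    lower_6m        : at or below this value at 6 months -> cured
    upper_6m        : at or above this value at 6 months -> relapsed
    cutoff_12m      : assumed convention: at or above this value at
                      12 months -> relapsed, otherwise cured
    Returns 'cured', 'relapsed', or 'pending' when a 12-month count is
    needed but not yet available.
    """
    if csf_6m <= lower_6m:
        return "cured"
    if csf_6m >= upper_6m:
        return "relapsed"
    if csf_12m is None:
        return "pending"
    return "relapsed" if csf_12m >= cutoff_12m else "cured"


if __name__ == "__main__":
    # Default parameters correspond to the 5-50-20 notation.
    print(classify_two_step(3))        # decided at 6 months
    print(classify_two_step(12))       # needs the 12-month visit
    print(classify_two_step(12, 25))   # decided at 12 months
    # Alternative notation 5-30-15, passed explicitly.
    print(classify_two_step(35, upper_6m=30, cutoff_12m=15))
```

Keeping the three cut-offs as parameters makes it straightforward to compare alternative combinations on the same cohort.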

Author Summary
Because Human African trypanosomiasis is fatal, it is crucial for the patient to determine if curative treatment has been effective. Unfortunately this is not possible without a 24-month laboratory follow-up, which is problematic and rarely completed under field conditions. Studies that assessed early indicators have used small cohorts, yielding limited statistical power plus potential bias because of including patients with equivocal outcome. We tackled this problem by pooling a large dataset, which allowed for selecting cases providing strictly unequivocal information, still numerous enough to produce sound statistical evidence. We studied predictors based on the CSF leucocytes count, a laboratory technique already available in the field, evaluating their predictive power at 6 and 12 months post-treatment. We found a predictor at 6 months (10 leucocytes/µL of CSF) that has sub-optimal accuracy but may be valuable in some particular situations, plus two-step algorithms at 6 and 12 months that offer sufficient confidence to shorten the patients' follow-up. Until better biomarkers are identified, these findings represent a significant advance for this neglected disease. Benefits are foreseen both for patients and for overburdened treatment facilities. In addition, research for new treatments can be accelerated by using early predictors.
Early Efficacy Prediction for HAT www.plosntds.org
Stata 10 software (StataCorp, College Station, Texas, USA) was used to perform all the data analysis.

Results
We selected 1822 patients for the first analysis and 2190 for the second analysis (Figure 1); they had been diagnosed between September 1995 and February 2006. Throughout this time period the same diagnostic tools were used. The largest portion of the cohort was from the centers of Omugo, Northern Uganda (44%) and Ibba, Southern Sudan (21%), as these two sites achieved higher follow-up compliance by investing specific resources. Baseline characteristics are shown in Table 1.

Post-therapeutic evolution of the CSF leucocytes count
At pre-treatment, the CSF leucocytes count was not different between the 1460 patients who cured (median 137.5 cells, IQR 65-274) and the 362 who later relapsed (132 cells, IQR 53-270) (Wilcoxon test p = 0.15), whereas at 6 months it was significantly higher among patients who later relapsed (29.5 cells, IQR 11-78) than in patients who cured (4 cells, IQR 2-9) (p<0.001). This difference increased at 12 and 18 months, as expected (Figure 2). The difference was observed in all treatment groups, except at 6 and 12 months post-treatment in patients receiving drug combinations, emerging from 18 months onwards.
The evolution of the CSF leucocytes count was similar in naïve (first-time treated) and non-naïve patients throughout the posttherapeutic follow-up (data not shown).

Prediction of final outcome at 6 months
The ROC analysis showed that the absolute CSF leucocytes count at 6 months was at least as good a predictor of outcome (AUC 0.84) as the percent reduction (AUC 0.81). Because the latter is also the less practical of the two (it requires a bedside calculation involving the initial laboratory results), we did not explore it further. The CSF leucocytes count at 6 months showed the best trade-off between sensitivity and specificity at cut-off values of 10 to 13 leucocytes/µL. The best accuracy was obtained with a cut-off at >10.

The multivariate analysis confirmed, after adjustment for treatment, age and sex, that the 6-month CSF leucocytes count, with a cut-off at 10 cells, was very strongly associated with the risk of relapse (odds ratio = 17.2, 95% CI 12.6-23.5).

Table 3 shows the performance of the two-step algorithms when tested with our large dataset (n = 2190) of selected patients with laboratory-confirmed outcome. In the first line we show the results reported by Mumba et al. [6] on a smaller cohort ("algorithm 5-50-20").
The algorithms 5-40-20 and 5-40-15 also performed well, with confidence intervals overlapping the algorithm 5-50-20. The 5-30-15 algorithm was slightly more sensitive but less specific. The proportion of patients classified as cured who later relapsed ranged from 3.5 to 4.5% in all tested algorithms. The portion of the cohort classified at 6 months ranged from 66.4 to 74.1%, leaving the rest to be classified at 12 months. Of the algorithms tested, the 5-50-20 appeared as the best overall with the highest LR+ and a proportion of false negatives not significantly different from the other algorithms (Figure 3).

Table 1 notes. First analysis: leukocytes at 6 months, excluding relapses at 6 months. Second analysis: leukocytes at 6 months, including relapses at 6 months, two-step algorithms. Combination treatment: within this selected cohort, it included melarsoprol-eflornithine, nifurtimox-eflornithine and melarsoprol-nifurtimox combinations. Coma score: Glasgow Coma Scale assessing the level of consciousness. Interpretation: 3-8 = severe impairment; 9-12 = moderate impairment; 13-14 = mild impairment; 15 = normal [13]. Karnofsky index [14]. doi:10.1371/journal.pntd.0001662.t001

Discussion
The CSF leucocytes count at 6 months showed a good prognostic value for final efficacy outcome. However, a small proportion of patients was wrongly classified. Translated into field patient management, those wrongly classified as relapsed would be unnecessarily re-treated, sometimes with toxic drugs (e.g. melarsoprol if first-line treatment was eflornithine or eflornithine-nifurtimox) and, more importantly, patients wrongly classified as cured would be at risk of death due to HAT relapse. A two-step algorithm, at 6 and 12 months, provided a better classification tool featuring an excellent ability to predict relapses with a lower misclassification rate.
At 6 months, the CSF leucocytes cut-off at 10/µL had the best trade-off between positive and negative likelihood ratios. This indicator can rule out relapse at 6 months post-treatment with a good degree of confidence (0.93 negative predictive value), but its ability to identify true relapses is sub-optimal.
Other cut-off values may be of interest for decision-making in the context of clinical trials, e.g. to continue or suspend enrolment of new participants based on 6-month data (patients already enrolled would always benefit from complete follow-up).
It is important to underline that the relapse rate was 19.9% in our dataset (first analysis), which is much higher than the relapse rate reported with the new and increasingly used nifurtimox-eflornithine combination therapy (NECT) [10,11]. Because positive and negative predictive values depend on the relapse rate, the lower the relapse rate, the lower the positive predictive value for relapse; on the other hand, the negative predictive value will be higher, increasing confidence in the prediction of cure. Reporting likelihood ratios avoids this dependence, since they are independent of the relapse rate. Likelihood ratios, both LR+ and LR−, are among the best ways to measure diagnostic accuracy. In medicine, a test is generally regarded as valuable when the LR+ is >5 or the LR− is <0.2 [9,12].
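The dependence of predictive values on the relapse rate can be made concrete with a short Python sketch based on Bayes' rule. The function name is hypothetical, and the 5% relapse rate is an invented value used only to show the direction of the effect; it is not a figure reported for NECT. With the reported 76.2% sensitivity, 80.4% specificity and 19.9% relapse rate, the calculation gives a negative predictive value close to the 0.93 quoted above.

```python
def predictive_values(sensitivity, specificity, relapse_rate):
    """Return (PPV, NPV) for predicting relapse at a given relapse rate.

    Works on expected proportions of the cohort rather than patient counts.
    """
    true_pos = sensitivity * relapse_rate
    false_neg = (1 - sensitivity) * relapse_rate
    false_pos = (1 - specificity) * (1 - relapse_rate)
    true_neg = specificity * (1 - relapse_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    # Reported accuracy of the 10 cells/µL cut-off at 6 months.
    sens, spec = 0.762, 0.804

    # Relapse rate observed in the first analysis.
    print(predictive_values(sens, spec, 0.199))

    # Hypothetical lower relapse rate, for illustration only:
    # PPV falls and NPV rises, while LR+ and LR- are unchanged.
    print(predictive_values(sens, spec, 0.05))
```

Because the likelihood ratios are built from sensitivity and specificity alone, they stay the same in both calls, which is why they are the preferred summary here.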
Because the CSF leucocytes count at 6 months alone remains insufficiently accurate for outcome determination, we evaluated various two-step algorithms at 6 and 12 months, following the model published by Mumba et al [6].
Our findings therefore confirm that the diagnostic algorithm 5-50-20 performs well to predict post-treatment outcome, allowing for a shorter follow-up period.
Other algorithms can be applied depending on the setting and priority objectives, e.g. clinical trials or individual patient management in settings with poor follow-up compliance such as in conflict areas.
In all cases, patients who are declared cured early by using these predictors should be encouraged to come for control if symptoms reappear later.
Early determination of outcome presents several key advantages for HAT control programs: first, it cuts down on uncured patients remaining infective until eventually detected or dying; second, it reduces the workload and costs of follow-up; third, it facilitates the monitoring of treatment effectiveness. For some patients it is lifesaving or preventive of serious sequelae, for most others it reduces

Strengths and weaknesses
One major strength of this study was the restrictive selection criteria, which minimized information bias that is typically present in HAT studies: most cohorts include important proportions of patients with uncertain or unknown efficacy outcome, due to the difficulties in completing the patients' follow-up.
Another strength was the large sample size, which increases the precision of the findings.
A final strength was the statistical methods used, in particular the logistic regression with a random intercept, which controlled for inter-site heterogeneity.
A weakness arose from the nature of the data used, collected by field routine programs, which is generally of lower quality than data collected prospectively within planned studies.
Another weakness arises from the reference used for "true outcome": a composite definition based on the presence of trypanosomes or a CSF leucocytes count ≥50. The predictors studied are also based on the CSF leucocytes count (at an earlier time) and are therefore not independent from the outcome measurement. In particular when the predictor includes the same value (CSF leucocytes count ≥50, such as in the algorithm 5-50-20) the specificity is to some extent over-estimated.
The marker at the center of our analysis, the CSF leucocytes count, is subject to measurement error, being a manual laboratory technique. However, this particular laboratory exam is regarded as crucial for the patient and it has been the object of great attention in the MSF sites that were included in this study. Internal quality control was implemented in all field laboratories, through blinded double and triple CSF leucocytes counts, showing good levels of consistency in the results (authors' direct field observation, data not published). To our knowledge there are no published works to shed more light into this issue.
The timing of the follow-up assessments was treated via the consolidation of the visit dates into time "tolerance" windows, which are arbitrary groupings (we followed conventional windows) [2]. This interval censoring is an imperfect way of capturing the timing of events: for example, what we treat as the "6 months" leucocytes count in reality happened anywhere between 5 and 9 months, with an uneven spread that tends to concentrate after the 6-month date. This field data distribution can be assumed to correspond well with the reality of the routine programs, but it will fit less well the temporal distribution in clinical trials, which usually have intensive follow-up of patients.

Conclusions
This study provides robust evidence on the value of the CSF leucocytes count to predict, at 6 and 12 months, the efficacy outcome of second-stage T. b. gambiense HAT treatment.
For decision-making on individual patients followed up in the field, our findings confirm the good performance of the two-step algorithm using cut-off values of 5-50-20 leucocytes/µL. Other algorithms can be used depending on the setting.
For the early estimation of efficacy in clinical trials, several options are revealed, both in one step at 6 months and in two steps at 6 and 12 months.