Assessment of WHO criteria for identifying ART treatment failure in Vietnam from 2007 to 2011

Objective We evaluated the sensitivity and specificity of the WHO immunological criteria for detecting antiretroviral therapy (ART) treatment failure in a cohort of Vietnamese patients. We conducted a stratified analysis to determine the effects of BMI, peer support, adherence to antiretroviral (ARV) drugs, age, and gender on the sensitivity and specificity of the WHO criteria. Methods We conducted a retrospective cohort study of 605 HIV-infected patients using data previously collected in a cluster randomized controlled trial. We compared the sensitivity and specificity of CD4+ counts, as a diagnostic test for ART failure, against the gold standard of virologic testing at 12, 18, and 24 months. Results The sensitivity [95% confidence interval (CI)] of the WHO immunological criteria based on a viral load ≥ 1000 copies/mL was 12% (5%-23%), 14% (2%-43%), and 12.5% (2%-38%) at 12, 18, and 24 months, respectively. In the same order, the specificity was 93% (90%-96%), 98% (96%-99%), and 98% (96%-100%). The positive predictive values (PPV) at 12, 18, and 24 months were 22% (9%-40%), 20% (3%-56%), and 29% (4%-71%); the negative predictive values (NPV) at the same time points were 87% (84%-90%), 97% (95%-98%), and 96% (93%-98%). The stratified analysis revealed similar sensitivities and specificities. Conclusion The sensitivity of the WHO immunological criteria is poor, but the specificity is high. Although testing costs may increase, we recommend that Vietnam and other similar settings adopt viral load testing as the principal method for determining ART failure.

Introduction

The 24-month follow-up included CD4+ levels, viral load levels, adherence to ARV, gender, BMI, peer support, and age.

Measures
For the purposes of this analysis, WHO immunologic failure was diagnosed if the participant met any of the following criteria [3,15]:
1. CD4+ count returning to or falling below the pre-therapy baseline level;
2. a 50% decline in CD4+ count from the on-treatment peak value, at least 6 months after ART initiation;
3. a CD4+ count <100 cells/μL after one year of therapy without any increase.
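The three immunological criteria can be expressed as a single check over a patient's CD4+ history. The sketch below is illustrative only: the function name `who_immunologic_failure`, the list of (month, count) pairs as input, evaluating the criteria at the latest measurement, and reading criterion 3 as "every count from month 12 onward is below 100 cells/μL" are all assumptions, not the procedure used in this study.

```python
from typing import List, Tuple

# Hypothetical layout: (months since ART initiation, CD4+ count in cells/uL),
# on-treatment measurements only, sorted by time.
Measurement = Tuple[float, int]


def who_immunologic_failure(baseline: int, on_treatment: List[Measurement]) -> bool:
    """Return True if any WHO immunological criterion is met at the latest
    measurement (an illustrative sketch, not the study's code)."""
    if not on_treatment:
        return False  # nothing to evaluate yet
    month, latest = on_treatment[-1]

    # Criterion 1: CD4+ count returns to or falls below the pre-therapy baseline.
    crit1 = latest <= baseline

    # Criterion 2: >= 50% decline from the on-treatment peak, assessed only
    # once at least 6 months of ART have elapsed.
    peak = max(count for _, count in on_treatment)
    crit2 = month >= 6 and latest <= 0.5 * peak

    # Criterion 3 (assumed operationalization): every count measured from
    # month 12 onward stays below 100 cells/uL.
    after_year = [count for m, count in on_treatment if m >= 12]
    crit3 = bool(after_year) and all(count < 100 for count in after_year)

    return crit1 or crit2 or crit3


# Hypothetical patient: peak of 500 at month 6, then a drop to 240 at month 12,
# which meets criterion 2 but not criterion 1.
print(who_immunologic_failure(100, [(6, 500), (12, 240)]))
```

Keeping the three criteria as separate booleans makes it easy to report which criterion triggered a diagnosis if that breakdown is needed.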
The current Vietnam guidelines define virologic treatment failure as a viral load (VL) > 5000 copies/mL [15], whereas the current WHO guidelines define it as VL ≥ 1000 copies/mL [16]. We analyzed the data using each of these two virologic failure thresholds as the gold standard.
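A minimal sketch of the two gold-standard definitions follows; the function name `virologic_failure` and the guideline labels are hypothetical. Note the strict inequality for the Vietnam threshold and the non-strict one for the WHO threshold: a load of exactly 5000 copies/mL counts as failure only under the WHO definition.

```python
def virologic_failure(viral_load: float, guideline: str) -> bool:
    """Classify a viral load (copies/mL) as treatment failure under one of
    the two thresholds used as gold standards in this analysis."""
    if guideline == "vietnam":
        return viral_load > 5000   # Vietnam guidelines: VL > 5000 copies/mL
    if guideline == "who":
        return viral_load >= 1000  # WHO guidelines: VL >= 1000 copies/mL
    raise ValueError(f"unknown guideline: {guideline!r}")


# Illustrative viral loads only (not study data).
loads = [40, 1000, 3200, 5000, 5001, 25000]
vietnam = [virologic_failure(v, "vietnam") for v in loads]
who = [virologic_failure(v, "who") for v in loads]
print(vietnam)
print(who)
```

Because any load above 5000 copies/mL is also at least 1000 copies/mL, every failure under the Vietnam definition is also a failure under the WHO definition, so the WHO threshold can only yield an equal or higher proportion of failures.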

Statistical analysis
We presented descriptive continuous data as median and interquartile range (IQR) and categorical variables as numbers and percentages. We determined the sensitivity, specificity, and positive and negative predictive values of the immunologic failure criteria for predicting each definition of virologic failure described above at 12, 18, and 24 months after initiation of ART. In addition, we stratified the diagnostic test analysis, under both VL > 5000 copies/mL and VL ≥ 1000 copies/mL, by gender, age, BMI, peer support, and adherence to ARV. The results are presented in the tables with the corresponding 95% confidence intervals (CI), which were based on formulae provided by Simel et al [25].
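For reference, the four indexes come from a two-by-two table of the immunological criteria (test) against a viral load threshold (reference standard). The sketch below computes them with a 95% Wilson score interval. This interval is a stand-in chosen for a dependency-free example; it is not necessarily the Simel et al. formula or the epiR output used in this analysis, so its bounds may differ from those in Table 2. The function names and the counts in the usage lines are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Point estimate and interval for each index of a 2x2 diagnostic table.

    tp/fn: patients with virologic failure who did / did not meet the
    immunological criteria; fp/tn: patients without virologic failure who
    did / did not meet them.
    """
    cells = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n if n else float("nan"), wilson_ci(k, n))
            for name, (k, n) in cells.items()}


# Hypothetical counts, not study data.
for index, (estimate, (low, high)) in diagnostic_accuracy(tp=3, fp=30, fn=7, tn=400).items():
    print(f"{index}: {estimate:.1%} (95% CI {low:.1%}-{high:.1%})")
```

With only a handful of true failures, the interval around sensitivity is wide, which is consistent with the broad confidence intervals reported for sensitivity and PPV.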
BMI was stratified as below or above 18 kg/m². Patients were stratified into groups with and without peer support; peer support involved home-based adherence counseling by fellow HIV-infected peer supporters [24]. Adherence to ARV was stratified into no missed doses and one or more missed doses. Age was split at 32 years (the median age of patients in this study was 31.90 years), and gender into male and female. The analysis was carried out using R software [26], and the confidence intervals were computed with the "epiR" package [27].
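The stratified analysis amounts to building a separate two-by-two table within each stratum. A sketch under stated assumptions: patient records are hypothetical dictionaries with `bmi`, `age`, `immuno_fail`, and `viro_fail` fields, the helper names are illustrative, and a value of exactly 18 kg/m² or 32 years is placed in the upper group, since the text does not say how boundary values were assigned.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical record layout: one dict per patient.
Patient = Dict[str, object]


def stratified_sens_spec(patients: List[Patient],
                         stratum_of: Callable[[Patient], str]) -> Dict[str, Tuple[float, float]]:
    """Sensitivity and specificity of the immunological criteria within each stratum."""
    # Per-stratum counts in the order [tp, fp, fn, tn].
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for p in patients:
        test, truth = bool(p["immuno_fail"]), bool(p["viro_fail"])
        idx = 0 if test and truth else 1 if test else 2 if truth else 3
        counts[stratum_of(p)][idx] += 1
    result = {}
    for stratum, (tp, fp, fn, tn) in counts.items():
        sens = tp / (tp + fn) if tp + fn else float("nan")  # undefined without failures
        spec = tn / (tn + fp) if tn + fp else float("nan")  # undefined without non-failures
        result[stratum] = (sens, spec)
    return result


def by_bmi(p: Patient) -> str:
    return "BMI < 18" if p["bmi"] < 18 else "BMI >= 18"


def by_age(p: Patient) -> str:
    return "age < 32" if p["age"] < 32 else "age >= 32"


# Hypothetical patients, not study data.
patients = [
    {"bmi": 17.0, "age": 28, "immuno_fail": True, "viro_fail": True},
    {"bmi": 17.5, "age": 35, "immuno_fail": False, "viro_fail": True},
    {"bmi": 20.1, "age": 30, "immuno_fail": False, "viro_fail": False},
    {"bmi": 22.4, "age": 40, "immuno_fail": True, "viro_fail": False},
]
print(stratified_sens_spec(patients, by_age))
```

Strata with few or no true failures leave sensitivity undefined or based on one or two patients, which is one reason stratum-level estimates can swing between extremes such as 0% and 50%.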

Results
Table 1 presents the baseline characteristics of the 605 HIV-positive patients included in this study, who ranged from 20 to 56 years of age. Under every definition, fewer patients had ART treatment failure than did not. The virologic criterion VL ≥ 1000 copies/mL yielded the highest proportion of ART treatment failure at each time point.

Diagnostic test analysis
Fig 2 compares the overall WHO immunological criteria at 12, 18, and 24 months after the start of treatment against the two gold standards: treatment failure as defined by the Vietnam guidelines (VL > 5000 copies/mL) and by the WHO guidelines (VL ≥ 1000 copies/mL).

Vietnam guidelines
Using treatment failure at VL > 5000 copies/mL as the gold standard, the overall WHO immunological criteria had a sensitivity of 30% and a specificity of 93% 12 months after the start of treatment. However, among the patients who met the WHO immunological criteria, only 9% actually had treatment failure (PPV), while 98% of those who did not meet the criteria were free of treatment failure (NPV). At 18 months, the sensitivity and specificity were 12.5% and 98%, and the PPV and NPV were 10% and 98%, respectively.
By contrast, at 24 months after treatment initiation the sensitivity was 22% and the PPV was 29%: among patients who met the criteria, 29% had ART treatment failure. All indexes are reported in Table 2.
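The gap between the 93% specificity and the 9% PPV at 12 months follows from Bayes' theorem: when few patients have treatment failure, even a small false-positive rate can produce more false positives than true positives. The sketch below, with a hypothetical helper `predictive_values`, computes PPV and NPV from sensitivity, specificity, and an assumed prevalence; the prevalence values are illustrative, not estimates from this cohort.

```python
from typing import Tuple


def predictive_values(sens: float, spec: float, prevalence: float) -> Tuple[float, float]:
    """PPV and NPV implied by Bayes' theorem for a given prevalence of failure."""
    tp = sens * prevalence              # expected true positives per patient
    fp = (1 - spec) * (1 - prevalence)  # expected false positives
    fn = (1 - sens) * prevalence        # expected false negatives
    tn = spec * (1 - prevalence)        # expected true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity reported at 12 months (VL > 5000 copies/mL);
# the prevalence values are illustrative only.
for prev in (0.02, 0.10, 0.30):
    ppv, npv = predictive_values(0.30, 0.93, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With sensitivity and specificity held fixed, PPV rises and NPV falls as prevalence increases, so the predictive values in Table 2 apply to settings with a failure prevalence similar to this cohort's.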

WHO guidelines
When VL ≥ 1000 copies/mL (WHO guidelines) was used as the gold standard, sensitivity was lower than under the Vietnam threshold at 12 and 24 months and slightly higher at 18 months (Table 2). Specificity was the same at each time point except 24 months after the start of treatment.
As previously mentioned, the diagnostic test analysis was stratified at each of the three time points by gender, age, BMI, peer support, and adherence to ARV. Table 3 summarizes, for each stratum, the performance of the CD4+ test in detecting ART treatment failure against the two gold standards. Similar to the results in Table 2, sensitivities ranged from 0% to 50% and specificities from 92% to 100%.

Discussion
We found that the WHO immunological criteria have very low sensitivity and high specificity. The stratified analysis likewise yielded no results favoring the CD4+ test. Because of their low sensitivity, the criteria could not reliably detect treatment failure. The CD4+ test is therefore a poor diagnostic test for ART failure: patients' immune competence could decline unnoticed as they progress faster toward clinical failure and AIDS. This indicates that the WHO immunological criteria have too low a sensitivity to be used as a first-line screening method. Based on this, we recommend that the WHO base its treatment failure guidelines solely on viral load in resource-limited settings.
To the best of our knowledge, this is the first study in Vietnam to report the sensitivity and specificity of the WHO immunological criteria against the gold standard of viral load testing. Some countries (e.g., Cambodia, India, and Vietnam) take a targeted approach to viral load testing, in which patients are tested only if treatment failure is suspected on the basis of the WHO clinical and immunological criteria [15,28,29]. Although less expensive than routine testing in the short term, this approach risks delaying the identification of treatment failure [16]. Earlier identification of treatment failure, earlier interventions to improve adherence, and a more timely switch to second-line ART could reduce immunological deterioration and prevent the accumulation of resistance mutations [4,30]. This would lower the risk of disease progression, ARV drug resistance, and further HIV transmission [16]. In the long run, a more robust test for treatment failure may be more cost-effective, since delayed identification carries high long-term costs, including more expensive second-line drug regimens and an increased risk of transmitting drug-resistant HIV strains. Viral load testing accurately and precisely identifies treatment failure as well as non-adherence [4]. Such an approach would prevent misdiagnosis of treatment failure and avoid unnecessary switches to a more expensive second-line regimen [10,11,13]. Maintaining low viral loads would also protect partners and children from horizontal and vertical transmission [31,32], and protect patients from progression to AIDS and associated coinfections. In doing so, we can reduce both mortality and healthcare costs in developing countries [33,34].
A major drawback of relying on CD4+ levels to detect treatment failure is that they do not reveal whether the T cells being produced are functional. In patients co-infected with HTLV (human T-lymphotropic virus), CD4+ counts can rise even though many of the CD4+ cells may be nonfunctional [35]. This can mask treatment failure, further delaying effective regimen switches and leading to faster progression to AIDS [35].
Historically, there has been resistance to switching to routine viral load testing because of its high cost [36]. However, cheaper options exist, such as the ExaVir™ Load (a simple reverse transcriptase assay), which performs as well as more expensive viral load tests [37].
Some countries, such as Uganda, have successfully switched to using viral load testing alone to determine treatment failure [38], and others, such as South Africa and Thailand, have implemented routine viral load testing alongside CD4+ tests [39][40][41]. These experiences show that routine viral load testing is feasible, and the WHO should adopt it as the new guideline. We believe that Vietnam, and all countries in general, should follow these steps and update their treatment guidelines to phase out CD4+ tests in favor of viral load testing.
One limitation of our study is the low number of patients with true treatment failure. Also, CD4+ counts were measured by two different hospital laboratories, which could have biased the estimation of immunologic failure. This study did not control for ART treatment during the management of patients: patients found to have treatment failure were assessed for adherence, and those with good adherence whose genotyping showed specific resistance mutations were switched to a different treatment regimen. We were also limited to the variables provided in the dataset. For example, BMI was recorded only as below or above 18 kg/m², so we could not use the standard cutoff between normal and underweight BMI (<18.5 kg/m²). Certain variables were assessed only once during the study: BMI at baseline and adherence to ARV after 24 months. This limits any causal analysis, as we have no information about how these variables changed over time. In addition, the findings should be assessed in a prospective study with a larger sample size to confirm or refute them.
Finally, we hope our study has shed light on the importance of implementing routine viral load testing as the required test for treatment failure in resource-limited settings.