The impact of continuous quality improvement on coverage of antenatal HIV care tests in rural South Africa: Results of a stepped-wedge cluster-randomised controlled implementation trial

Background Evidence for the effectiveness of continuous quality improvement (CQI) in resource-poor settings is very limited. We aimed to establish the effects of CQI on quality of antenatal HIV care in primary care clinics in rural South Africa.

Methods and findings We conducted a stepped-wedge cluster-randomised controlled trial (RCT) comparing CQI to usual standard of antenatal care (ANC) in 7 nurse-led, public-sector primary care clinics—combined into 6 clusters—over 8 steps and 19 months. Clusters randomly switched from comparator to intervention on pre-specified dates until all had rolled over to the CQI intervention. Investigators and clusters were blinded to randomisation until 2 weeks prior to each step. The intervention was delivered by trained CQI mentors and included standard CQI tools (process maps, fishbone diagrams, run charts, Plan-Do-Study-Act [PDSA] cycles, and action learning sessions). CQI mentors worked with health workers, including nurses and HIV lay counsellors. The mentors used the standard CQI tools flexibly, tailored to local clinic needs. Health workers were the direct recipients of the intervention, whereas the ultimate beneficiaries were pregnant women attending ANC. Our 2 registered primary endpoints were viral load (VL) monitoring (which is critical for elimination of mother-to-child transmission of HIV [eMTCT] and the health of pregnant women living with HIV) and repeat HIV testing (which is necessary to identify and treat women who seroconvert during pregnancy). All pregnant women who attended their first antenatal visit at one of the 7 study clinics and were ≥18 years old at delivery were eligible for endpoint assessment. We performed intention-to-treat (ITT) analyses using modified Poisson generalised linear mixed effects models. We estimated effect sizes with time-step fixed effects and clinic random effects (Model 1). In separate models, we added a nested random clinic–time step interaction term (Model 2) or individual random effects (Model 3). Between 15 July 2015 and 30 January 2017, 2,160 participants with 13,212 ANC visits (intervention n = 6,877, control n = 6,335) were eligible for ITT analysis. No adverse events were reported. Median age at first booking was 25 years (interquartile range [IQR] 21 to 30), and median parity was 1 (IQR 0 to 2). HIV prevalence was 47% (95% CI 42% to 53%). In Model 1, CQI significantly increased VL monitoring (relative risk [RR] 1.38, 95% CI 1.21 to 1.57, p < 0.001) but did not improve repeat HIV testing (RR 1.00, 95% CI 0.88 to 1.13, p = 0.958). These results remained essentially the same in both Model 2 and Model 3. Limitations of our study include that we did not establish impact beyond the duration of the relatively short study period of 19 months, and that transition steps may have been too short to achieve the full potential impact of the CQI intervention.

Conclusions We found that CQI can be effective at increasing quality of primary care in rural Africa. Policy makers should consider CQI as a routine intervention to boost quality of primary care in rural African communities. Implementation research should accompany future CQI use to elucidate mechanisms of action and to identify factors supporting long-term success.

Trial registration This trial is registered at ClinicalTrials.gov under registration number NCT02626351.


Page 12
The sample size determination is expressed in terms of absolute risk differences (15 percentage points). The main analysis is in terms of relative risk differences (risk ratios), with absolute differences presented only as sensitivity analyses (page 13). This is inconsistent. There are good reasons for presenting absolute risk differences despite the issues with linear models for probabilities, and I do not see the authors offering a justification for abandoning them as the primary analysis.
Page 13 The two references (authors' references 41 and 42) compare odds ratios and risk ratios, but I do not find in either reference any argument on the relative merits of the log-binomial model over modified Poisson. It seems to me that the log-binomial has the advantage that it fails to converge in situations where the multivariable risk ratio model is inappropriate, leading to impossible predicted values greater than unity. There is some discussion of convergence in Williamson et al. (2013).

Page 21 Another unnecessary primacy claim.
Page 22 Good to see the members of the DSMB acknowledged by name (and good to have such a board even if adverse events were not expected).
Page 35, Table 1, sub-section 6 Is there really no evidence on fidelity? That seems a gap in an otherwise comprehensive trial. This is perhaps more of an issue for the referees of the process evaluation paper.

Points of more substance

Clinical importance
If the minimal clinically important difference was 15 percentage points, as page 12 suggests, then the value of 4.1 reported on page 15 is not clinically important, and it suggests the trial has in fact failed to detect a meaningful effect despite achieving an arbitrary level of statistical significance. Either I have completely misunderstood, or the authors are over-interpreting a small difference.
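One way to see the gap between the powered and observed effects: a risk ratio converts to an absolute gain of p0 × (RR − 1), so the absolute difference implied by the reported RR of 1.38 depends entirely on baseline coverage. The baseline values below are hypothetical, purely to illustrate the arithmetic.

```python
# Back-of-envelope: a risk ratio RR implies an absolute gain of
# p0 * (RR - 1), so the same RR means very different absolute gains
# depending on baseline coverage p0. Baselines here are illustrative.
rr = 1.38  # risk ratio for VL monitoring reported in the abstract

for p0 in (0.10, 0.30, 0.50):      # hypothetical baseline coverage
    abs_gain = p0 * (rr - 1)       # absolute risk difference
    print(f"baseline {p0:.0%}: gain of {abs_gain * 100:.1f} percentage points")
```

On a low baseline, even a statistically significant risk ratio can fall well short of a 15-percentage-point target.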
Treatment of the partially exposed

I am all in favour of intention-to-treat analyses, but is it really wise to count the 20% partially exposed, reported on page 14, as belonging to the arm their clinic was in when they joined the study? In a stepped-wedge design people will move from control to intervention as the study rolls out, even if they attend the same physical clinic.
With any trial of a complex intervention it is hard to know exactly when to assume that the intervention took effect. Assuming it is the moment the intervention started seems quite a strong assumption.
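The partial-exposure point can be made concrete with a toy rollout. This is a minimal sketch in plain Python with invented clinic names, switch steps, and visit patterns: classifying a woman by the arm her clinic was in at her first visit disagrees, for anyone whose visits straddle a rollover, with classifying each visit by the arm the clinic is in at the time of that visit.

```python
# Hypothetical stepped-wedge rollout: clinic -> time step at which it
# switches to the intervention. All names and steps are illustrative.
switch_step = {"A": 3, "B": 5}

# (woman, clinic, time step of visit)
visits = [
    ("w1", "A", 2), ("w1", "A", 3), ("w1", "A", 4),
    ("w2", "B", 1), ("w2", "B", 2),
]

# Rule 1: by the arm the clinic was in at the woman's FIRST visit
first_step = {}
for woman, clinic, step in visits:
    first_step[woman] = min(step, first_step.get(woman, step))

by_enrolment = [first_step[w] >= switch_step[c] for w, c, s in visits]

# Rule 2: by the arm the clinic is in at the time of EACH visit
by_visit = [s >= switch_step[c] for w, c, s in visits]

# w1 enrolled under control, but two of her visits fall after clinic A
# switched -- the two rules classify those visits differently
print(by_enrolment)  # [False, False, False, False, False]
print(by_visit)      # [False, True, True, False, False]
```

Under the enrolment rule, all of w1's exposure after the switch is counted as control time, which is exactly the dilution the review worries about.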

Modelling visits and time
In their analysis section (page 13) the authors state that they used a fixed effect for time step. I do not see how this can work when the women in principle had many visits spanning the study period, and most of them had four or more. This is also tied to the question of whether to include a random effect for each woman. Clinic-level clustering accounts, as far as I can see, for the fact that women within a cluster are more similar than women between clusters. It does not account for the possibility that women who were more likely to have VL monitoring at time i were also more likely to have it at time j.
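The distinction between clinic-level and woman-level clustering can be checked by simulation. This is a minimal sketch with invented variance components, using a logistic model for simplicity rather than the trial's modified Poisson: a shared woman effect makes the same woman's outcomes at two visits correlated even after clinic means (roughly what a clinic random effect absorbs) are removed.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clinics, n_women = 6, 200

clinic_re = rng.normal(0.0, 0.5, n_clinics)[:, None]   # clinic random effect
woman_re = rng.normal(0.0, 1.0, (n_clinics, n_women))  # woman random effect

# Probability of VL monitoring at any visit (logistic for simplicity);
# the same woman effect enters every visit, inducing repeat-visit correlation
p = 1.0 / (1.0 + np.exp(-(-0.5 + clinic_re + woman_re)))
y1 = rng.binomial(1, p)  # outcome at visit i
y2 = rng.binomial(1, p)  # outcome at visit j, same women

# Remove clinic means -- roughly what clinic-level clustering absorbs --
# and check whether the same woman's residuals are still correlated
r1 = (y1 - y1.mean(axis=1, keepdims=True)).ravel()
r2 = (y2 - y2.mean(axis=1, keepdims=True)).ravel()
within_woman_r = np.corrcoef(r1, r2)[0, 1]
print(f"within-woman correlation after removing clinic means: {within_woman_r:.2f}")
```

The residual correlation stays clearly positive, which is the dependence a clinic-only random effect (Model 1) leaves unmodelled and a woman-level random effect (Model 3) is meant to capture.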

Summary
I am perplexed by a number of features in the analysis.
Michael Dewey