Influence of Control Group on Effect Size in Trials of Acupuncture for Chronic Pain: A Secondary Analysis of an Individual Patient Data Meta-Analysis

Background: In a recent individual patient data meta-analysis, acupuncture was found to be superior to both sham and non-sham controls in patients with chronic pain. In this paper we identify variations in types of sham and non-sham controls used and analyze their impact on the effect size of acupuncture.
Methods: Based on literature searches of acupuncture trials involving patients with headache and migraine, osteoarthritis, and back, neck and shoulder pain, 29 trials met inclusion criteria, 20 involving sham controls (n = 5,230) and 18 non-sham controls (n = 14,597). For sham controls, we analysed non-needle sham, penetrating sham needles and non-penetrating sham needles. For non-sham controls, we analysed non-specified routine care and protocol-guided care. Using meta-regression we explored the impact of choice of control on the effect of acupuncture.
Findings: Acupuncture was significantly superior to all categories of control group. For trials that used penetrating needles for sham control, acupuncture had smaller effect sizes than for trials with non-penetrating sham or sham control without needles. The difference in effect size was −0.45 (95% C.I. −0.78, −0.12; p = 0.007), or −0.19 (95% C.I. −0.39, 0.01; p = 0.058) after exclusion of outlying studies showing very large effects of acupuncture. In trials with non-sham controls, larger effect sizes were associated with acupuncture vs. non-specified routine care than vs. protocol-guided care. Although the difference in effect size was large (0.26), it was not significant, with a wide confidence interval (95% C.I. −0.05, 0.57; p = 0.1).
Conclusion: Acupuncture is significantly superior to control irrespective of the subtype of control. While the choice of control should be driven by the study question, our findings can help inform study design in acupuncture, particularly with respect to sample size. Penetrating needles appear to have important physiologic activity. We recommend that this type of sham be avoided.


Introduction
One of the challenges of conducting a non-pharmacological clinical trial is choosing an appropriate control intervention. The simplest control arm is to offer patients routine clinical care without the experimental treatment. This controls for the expected course of the disease. However, the control arm can also be designed to control for other factors, for example, the non-specific effects associated with the time and attention that a patient receives from a clinician.
The choice of control is particularly problematic in acupuncture, which has seen a large increase in published trials in recent years [1]. In an individual patient data meta-analysis of high quality trials conducted by the Acupuncture Trialists' Collaboration [2], acupuncture reduced pain scores by 0.15 to 0.23 standard deviations in comparison to sham (placebo) acupuncture. When the control group did not involve sham, effect sizes ranged from 0.42 to 0.57 [2]. Yet within the general categories of control (those with sham and those without sham) there were marked differences in the exact nature of the intervention received in the control group. For example, trials with sham control included those with acupuncture needles inserted at points not thought to be active, needles that did not penetrate the skin, and non-needle approaches, such as detuned electrical devices. For trials without sham control, the control group in some were simply advised to ''avoid acupuncture''; in other trials, both acupuncture and control groups were offered additional treatment, such as physical therapy for back pain.
In this paper, we aim to conduct an analysis of the Acupuncture Trialists' Collaboration dataset to determine how trial results vary by type of control. Specifically, we sought to determine the extent that effect sizes varied depending on whether needles were used for sham acupuncture, whether they penetrated the skin, and whether they were placed at or away from true acupuncture points. We also sought to determine whether there was variation in the effects of acupuncture associated with controls that did not involve sham, comparing ''routine care'', such as rescue medication made available to patients in both arms of the trial, with ''protocolled care'' where the control treatment was a standard care specified in the study protocol. Establishing effect sizes associated with commonly used types of controls will be of value in informing future clinical trial design for acupuncture, as well as helping the interpretation of published trial results.

Included Trials
Trials included in these analyses were identified through a systematic literature review that has been previously described [2]. The initial search was to November 2008, followed by a subsequent one conducted in December 2010. The searches included trials of acupuncture in four specified chronic pain conditions (non-specific musculoskeletal pain, shoulder pain, osteoarthritis, and chronic headache) where allocation concealment was determined unambiguously to be adequate. For trials of musculoskeletal pain, it was additionally specified that the current episode of pain must be of at least four weeks' duration. The search resulted in the identification of 31 trials.

Data Acquisition
Individual patient data were obtained from 29 trials. Data on the trial-level characteristics of the controls were obtained directly from trialists. Twenty trials with 5,230 patients had controls in the form of a sham acupuncture arm (Table 1), and 18 trials with 14,597 patients had non-sham controls (Table 2).

Outcome
The primary outcome used for this analysis was the primary pain endpoint as defined by the study authors. Where multiple criteria were considered in the primary outcome or if the primary outcome was inherently categorical, we used a continuous measure of pain intensity measured at the same time point as the original primary outcome. To make the various outcome measurements comparable between different trials, the primary endpoint of each was standardized by dividing by pooled standard deviation.
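The standardization step can be illustrated with a minimal sketch. It assumes the usual two-arm pooled standard deviation formula; the function names and the pain scores are hypothetical and are not data from the included trials. The trial-level estimates in this paper came from linear regression on individual patient data, so this simple difference in means is only an approximation of that procedure.

```python
from math import sqrt
from statistics import mean, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two independent samples."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return sqrt(pooled_var)


def standardized_difference(control, treated):
    """Difference in mean pain (control minus treated) in pooled-SD units."""
    return (mean(control) - mean(treated)) / pooled_sd(control, treated)


# Hypothetical end-of-treatment pain scores on a 0-10 scale (not trial data).
control_scores = [6.0, 7.0, 5.0, 8.0, 6.0]
acupuncture_scores = [4.0, 5.0, 3.0, 6.0, 4.0]
effect = standardized_difference(control_scores, acupuncture_scores)
```

With this sign convention, a positive value means lower pain in the acupuncture arm, and values from trials using different pain scales become comparable because each is expressed in standard deviation units.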

Types of Sham Acupuncture Controls
The characteristics we aimed to study in those trials with a sham acupuncture control group included whether or not a needle was used, whether a needle that penetrated the skin was used, whether sham was performed on true acupuncture points or non-acupuncture points, and whether needle insertion was deep or superficial. Information on acupuncture characteristics was obtained from the trial manuscript supplemented by a questionnaire sent to trialists.
Trials were classified as ''needle sham'' if it was reported that either a penetrating or non-penetrating needle was used for sham acupuncture. A non-penetrating needle is a device specially developed for acupuncture research in which the needle retracts into the handle rather than penetrating the skin; however, the pressure of the needle against the skin is a very similar sensation to insertion. ''Non-needle sham'' included trials using non-needle methods of sham acupuncture, such as an inactivated laser or transcutaneous electric nerve stimulation (TENS) device. Needle sham trials were further classified as to whether or not the needle used in the sham acupuncture group penetrated the skin. Penetrating needles were almost always inserted at locations away from true acupuncture points (thereby investigating point location) while non-penetrating sham needles were either applied at the same points as in the true acupuncture group (testing exclusively skin penetration and not location) or at non-acupuncture points (investigating penetration and location simultaneously). For example, in the trial of Linde et al [3], needles were inserted superficially away from true acupuncture points; in contrast, the sham technique in the Kleinhenz et al [4] trial consisted of a special needle that retracted into the handle rather than penetrating through the skin at true acupuncture points.
We initially planned to investigate two other features of sham control: whether the depth of insertion for penetration was categorized by trialists as superficial or deep and whether sham was applied at or away from true acupuncture points. However, as shown in Table 1, only one trial reported using deep insertion in sham acupuncture [5]. For point location, there was strong collinearity with sham technique, with only techniques avoiding skin penetration using true acupuncture points.
As a sensitivity analysis, we re-analyzed the data excluding four trials which were determined by consensus among external reviewers as having an ''intermediate likelihood of unblinding'' [6] [7][8] [9]. However, after excluding these trials, only one remaining trial used non-needle sham acupuncture, limiting our ability to use meta-regression.

Types of Non-sham Controls
Trials that included controls without sham were categorized into two types: ''routine care'' and ''protocolled care.'' Trials were identified as ''routine care'' if patients in both treatment and control groups had access to non-specified care as needed, such as rescue medications or other conventional care, but the use of such treatment was at the discretion of patients and doctors, with no specification in the protocol as to what treatments patients could receive. If protocols proscribed some treatments, such as surgery, but did not make specific recommendations as to allowable treatments, trials were defined as ''routine care''. Control groups where treatment consisted of information or education given to a patient (''attention control'') were also considered to be routine care control groups. Trials were considered to be ''protocolled care'' if the care in the control group was specified in the study protocol. This was typically when the acupuncture group and the usual care control group both received an additional non-acupuncture treatment that was specifically indicated as part of the trial protocol. For example, trials that studied the effect of acupuncture and physical therapy compared to physical therapy alone were categorized as protocolled care.
In two trials there was a disagreement about whether the close specification of medication and other treatment in the control group constituted ''protocolled care'' or an active control group, which are excluded from analyses as per the review protocol [10]. The trial of acupuncture for migraine by Diener et al. [6] had a control group in which patients received standard pharmacological therapy for migraine prophylaxis. However, acupuncture and sham acupuncture groups did not receive prophylactic medication. The trial of acupuncture for lower back pain by Haake et al. [11] offered ''routine management'' up to and including physiotherapy, drugs and exercise to control group patients. Acupuncture and sham acupuncture patients had the same access to rescue medication as the ''routine management'' group, but did not have access to the same physiotherapy sessions or exercise consultations with physicians.

Statistical Methods
We used random-effects meta-regression to test the effect of each characteristic of sham acupuncture on the main effect estimate using the Stata command metareg. This command was also used to run a random-effects meta-regression to test the effect of routine versus protocolled care on the main effect estimate for usual care control groups. The main effect estimate of each trial was determined using linear regression, and the coefficient and standard error for each trial were entered as the dependent variable in the random-effects meta-regression.
A sensitivity analysis was performed excluding three trials by Vas et al. [12][13] [14]. In our initial publication on effect size [2], we reported that these trials had much larger effect sizes than average and that their exclusion resulted in heterogeneity becoming non-significant in the comparisons between acupuncture and sham. The trial of acupuncture for knee osteoarthritis by Berman et al. [5] used a combined insertion and non-insertion method for sham acupuncture. As a sensitivity analysis, we also performed the analysis with this trial reclassified as using non-penetrating needles on true acupuncture points. We also excluded trials where the risk of bias from unblinding was not classed as being low. As a final sensitivity analysis of non-sham controlled trials, we excluded the trials by Haake et al. [11] and Diener et al. [6], for which there was disagreement as to whether the control arm constituted active control, which is not eligible for analysis [10]. All analyses were conducted using Stata 12 (Stata Corp., College Station, TX).
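As an illustration only, the idea behind a random-effects meta-regression with a single binary trial-level characteristic (for example, penetrating versus other sham) can be sketched as follows. The analysis in this paper was run with Stata's metareg; this sketch does not reproduce that command. It swaps in a simpler method-of-moments estimator of the between-trial variance and a normal-approximation p-value, and all function names and input numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wls_binary(y, v, x, tau2):
    """Weighted least squares for y = b0 + b1*x with weights 1/(v + tau2)."""
    w = [1.0 / (vi + tau2) for vi in v]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swxx * swy - swx * swxy) / det
    return b0, b1, sqrt(sw / det)  # intercept, slope, SE of slope


def tau2_moments(y, v, x):
    """Method-of-moments between-trial variance (assumed estimator)."""
    b0, b1, _ = wls_binary(y, v, x, tau2=0.0)
    w = [1.0 / vi for vi in v]
    q_resid = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, yi, xi in zip(w, y, x))
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s2 = sum(wi ** 2 for wi in w)
    s2x = sum(wi ** 2 * xi for wi, xi in zip(w, x))
    s2xx = sum(wi ** 2 * xi * xi for wi, xi in zip(w, x))
    det = sw * swxx - swx ** 2
    trace_term = (swxx * s2 - 2 * swx * s2x + sw * s2xx) / det
    # Two coefficients are estimated, hence len(y) - 2 residual df.
    return max(0.0, (q_resid - (len(y) - 2)) / (sw - trace_term))


def meta_regression(y, se, x):
    """Difference in effect size between trials with x = 1 and x = 0."""
    v = [s ** 2 for s in se]
    tau2 = tau2_moments(y, v, x)
    b0, b1, se_b1 = wls_binary(y, v, x, tau2)
    p = 2 * (1 - NormalDist().cdf(abs(b1 / se_b1)))
    return {"intercept": b0, "difference": b1, "se": se_b1, "tau2": tau2, "p": p}


# Hypothetical trial-level effect sizes and standard errors (not study data).
effects = [0.50, 0.35, 0.60, 0.15, 0.20, 0.10]
std_errors = [0.10, 0.12, 0.15, 0.08, 0.09, 0.11]
penetrating = [0, 0, 0, 1, 1, 1]
result = meta_regression(effects, std_errors, penetrating)
```

Here `result["difference"]` plays the role of the meta-regression coefficient: the estimated difference in standardized effect size between trials with and without the characteristic, with each trial weighted by the inverse of its within-trial variance plus the between-trial variance.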

Sham acupuncture controls
Trial-level characteristics for sham-controlled trials are described in Table 3. The majority of sham-controlled trials (80%) used needle-based sham acupuncture. The number of trials using penetrating or non-penetrating needles was similar: seven trials used non-penetrating needles and nine trials used penetrating needles. All trials using penetrating needles placed these outside true acupuncture points, while only one of seven trials using non-penetrating needles did so. Table 4 shows the effect sizes of sham-controlled acupuncture trials categorized by the type of sham. Acupuncture is significantly superior to sham irrespective of the type of sham control, both in the main analysis and in a sensitivity analysis excluding outlying studies. Table 4 also includes the results of the primary sensitivity analyses that excluded the Vas trials, which we had previously found to be outliers [2]. For example, not only was the effect size of the Vas trial for neck pain [13] about five times greater than the meta-analytic estimate, but between-trial heterogeneity was no longer statistically significant after excluding the Vas trials. Using the same rationale for exclusions, overall we found larger effect sizes were associated with acupuncture vs. non-penetrating sham needles (0.43; 95%CI: 0.01, 0.85) than vs. penetrating sham needles (0.17; 95%CI: 0.11, 0.23) although the difference between groups did not reach conventional levels of statistical significance.
Statistical comparisons between types of sham are given in Table 5, which shows the results of the random-effects meta-regression for sham-controlled trials. While trials that used needles as sham did not differ significantly from trials with non-needle sham (p ≥ 0.2 for all comparisons), there is clear evidence of a greater effect size when acupuncture is compared against non-penetrating sham than when compared to penetrating sham. Trials using a penetrating needle had an effect size of −0.21 (95% C.I. −0.41, −0.01) standard deviations lower than trials that did not use a needle sham (p = 0.036). Trials that used penetrating needles for sham control had smaller effect sizes than those with non-penetrating sham or sham control without needles. The difference in effect size was −0.45 (95% C.I. −0.78, −0.12; p = 0.007). For the sensitivity analysis that excluded the Vas trials, this effect size reduced to −0.19 (95% C.I. −0.39, 0.01; p = 0.058). There were no significant differences between non-penetrating needles and sham techniques that did not involve needling.
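The link between a reported confidence interval and its p-value can be checked with a short sketch. It assumes a symmetric interval built from a normal approximation, which may differ slightly from the distribution used by the meta-regression software; the function name is hypothetical. The input is the reported difference of −0.45 (95% C.I. −0.78, −0.12).

```python
from statistics import NormalDist


def p_from_ci(estimate, lower, upper, level=0.95):
    """Recover the standard error, z statistic and two-sided p-value
    from a symmetric confidence interval (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


# Reported difference between penetrating sham and other sham controls.
se, z, p = p_from_ci(-0.45, -0.78, -0.12)
```

Under these assumptions the standard error is roughly 0.17 and the p-value falls between 0.007 and 0.008, in line with the reported p = 0.007.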
In further sensitivity analyses, reclassification of the Berman trial had little effect on our results. For example, the comparison of  Table S1, Supporting Information).

Non-sham controls
Trial-level characteristics for trials without sham controls are described in Table 2. The majority of these trials (72%) were classified as routine care. Table 6 provides further details of the control groups, separately by pain type. The effect size for acupuncture in trials with routine care control (0.55, 95% CI 0.40, 0.70) was larger than when acupuncture was compared against protocolled care (0.29, 95% CI 0.01, 0.58). Although the difference in effect size was large, it was not significant (difference in effect size = 0.26, 95% CI −0.05, 0.57, p = 0.1). Removing the two studies [6] [11] in the sensitivity analysis had little effect on the effect size estimate (0.25, 95% CI −0.26, 0.76) for the comparison with protocolled care. The difference in effect size between trials utilizing protocolled vs. routine care was also similar (0.29, 95% CI −0.13, 0.72).

Table 6 (excerpt). Details of non-sham control groups.
Waiting list control: Control patients were not permitted to have prophylactic treatment for 12 weeks. All patients were allowed to treat acute headache as necessary (following current guidelines). Classification: Routine.
Berman (2004) [5]. Education-attention control: Patients in this arm attended six two-hour group sessions on arthritis self-management, and received periodic educational materials by mail. Patients in the acupuncture and sham acupuncture arms did not participate in this intervention. Classification: Routine.
Cherkin (2001) [35]. Self-care education: Patients in this group received a book with information about back pain, treatment, improving quality of life and coping with emotional and interpersonal issues surrounding back pain. Patients also received two professionally-produced videos which addressed self-management of back pain and demonstrated exercises. Patients in the acupuncture and massage groups did not receive this educational material. Classification: Routine.
Scharf (2006) [26]. Conservative therapy: Patients in the conservative therapy group had 10 visits with physicians and received prescriptions for either diclofenac (up to 150 mg/day) or rofecoxib (25 mg/day) up to week 23. Patients in this group who had ''partially successful'' results were offered the choice of attending an additional five visits. Patients in the verum acupuncture and sham acupuncture groups were permitted to take up to 150 mg/day of diclofenac for the first two weeks and a total of 1 g of diclofenac during the rest of the study. Patients in both acupuncture groups and in the conservative management group received up to six sessions of physiotherapy. All patients were prohibited from taking any analgesics other than diclofenac and rofecoxib and any corticosteroids. Classification: Protocolled.
Diener (2006) [6]. Standard migraine treatment: Control group patients were treated according to the guidelines of the German Migraine and Headache Society. Patients had six to seven visits in which standard treatment was established. First choice of treatment was beta blockers, followed by flunarizine, and then valproic acid. Acute medication use was permitted in all groups. Classification: Protocolled.
Haake (2007) [11]. Conventional therapy: Patients in the conventional therapy group were treated according to German guidelines. Conventional therapy patients had 10 visits with physician or physiotherapist where physiotherapy, exercise and/or similar treatments were offered. Patients in all three arms were permitted to take NSAIDs up to the maximum daily dose. Classification: Protocolled.
Williamson (2007) [36]. Education and exercise: Patients in the control group were told they were in the ''home exercise'' group and received an exercise and advice leaflet. Classification: Routine.
Witt (2005) [28], [29]. Waiting list control: Patients in the waiting list control group received no acupuncture treatment for eight weeks after randomization. All patients were allowed oral NSAIDs for pain as rescue medication. All patients were prohibited from taking corticosteroids or pain medication that acted on the central nervous system. Classification: Routine.
Witt (2006 - OA) [37], Witt (2006 - LBP) [38]. Conventional treatment: Patients in the control group were not allowed to use any kind of acupuncture during the first three months. All patients were allowed to use additional conventional treatments as needed.
Jena (2008) [39], Witt (2006 - Neck Pain) [40]. Conventional treatment: Patients in the control group were not allowed to use any kind of acupuncture during the first three months. All patients were allowed to use additional conventional treatments as needed.

Principal findings
Acupuncture was significantly superior to sham irrespective of the type of sham control and superior to non-sham control irrespective of whether that constituted routine or protocolled care. That said, there were differences in effect sizes between trials with different control conditions. With regard to the types of sham control, we found that sham controls involving penetrating needles had smaller effect sizes than trials that did not use a needle control or where the needles in the control group did not penetrate the skin. An important implication is that the central estimates from our meta-analysis [2] may have underestimated the effects of acupuncture compared to sham. With regard to non-acupuncture controls, we found evidence that the effect size of acupuncture when compared to protocolled care is smaller than when compared to the less intensive routine care, although differences did not reach statistical significance.
There are two possible explanations for the differences in effect size by type of sham control: bias from unblinding and physiologic activity. It is plausible that penetrating needles are more credible to patients than non-penetrating approaches, such that patients are less likely to give biased responses on pain questionnaires. That said, there is no evidence in favor of such a hypothesis and considerable evidence against. In particular, the most common form of non-penetrating needle used was the ''Streitberger'' needle that has been carefully validated as a credible placebo in an empirical study. Indeed, study participants were unable to distinguish between the Streitberger needle and true acupuncture even when subject to both in crossover fashion [15]. The other explanation for our findings is that penetrating needles have important physiologic activity, that is, inserting an acupuncture needle superficially away from an acupuncture point may be less effective than deep insertion at a correct location, but nonetheless has some therapeutic activity against pain [16] [17].

Relationship to the literature
There has been considerable interest in the literature regarding the appropriate choice of placebo controls for non-pharmacological therapies. One approach has been to investigate trials that included a placebo arm and a no-treatment arm, and then compare outcomes between these two, and in this way explore variations in the impact of the different types of placebo. An example of this is a Cochrane review of placebo controls covering a wide range of trials for different conditions, including some acupuncture trials [18]. In a sub-group analysis the authors found that trials using ''physical placebos'' (including sham acupuncture) were associated with greater placebo effects than trials with pharmacological placebos [18]. This finding is consistent with the results of a trial that was specifically designed to compare a sham device (sham acupuncture) with an inert pill, the sham device being associated with a greater reduction of self-reported pain [19]. These results provide supportive evidence for our finding that different types of sham control lead to different estimates of treatment effects. The data from the above Cochrane review of placebo controls were re-analysed by a different group of authors who observed that sham acupuncture interventions vs. no treatment have larger effects than other ''physical placebos'' vs. no treatment [20]. In a sub-analysis that is similar to what we report in this paper, they found that the standardised effect for acupuncture versus sham was similar for trials using penetrating sham needling (−0.43; 95%CI: −0.59, 0.28) compared to trials using non-penetrating sham (−0.37; 95%CI: −0.70, 0.04) [20]. By contrast, we found significantly smaller effect sizes when acupuncture was compared to sham acupuncture with penetrating needles (0.17; 95%CI: 0.11, 0.23) than when compared to non-penetrating needles (0.43; 95%CI: 0.01, 0.85). The differences might be explained by differences in the trials included (our data involved only chronic pain trials of methodologically high quality) and the greater precision afforded by individual patient data meta-analysis: note that the wide confidence intervals in the Cochrane data are consistent with the main estimates from the current analysis.

Study strengths and limitations
Combining patient data from 29 high-quality trials in a single database provides us for the first time with sufficient power to explore the role of controls in trials of acupuncture for chronic pain, because the power of meta-regression is strongly influenced by the number of trials and their variation. We were unable to address questions as to the depth and location of sham needle placement as only one trial used deep sham needle insertion and all sham-controlled trials that used true acupuncture points avoided penetrating needles. While the difference in effect size between routine and protocolled care is large and in the direction expected, it is associated with wide confidence intervals. Partially, this is due to the wide variety of non-sham controls, and the difficulty we had in categorizing them.
Even with this large dataset we do not have a full understanding of the different physiologic and psychologic effects of sham acupuncture. One limitation within the field generally is that the mechanisms for a persistent effect of acupuncture on chronic pain are incompletely understood and therefore we have no clear idea of whether a sham control inadvertently activates these mechanisms or not. This lack of understanding about the physiological mechanisms of acupuncture limits any firm conclusions we can draw regarding the extent that any of the sham controls discussed above can be considered as a true 'placebo'. Moreover when implementing sham acupuncture trials, the outcome may also be influenced by factors not included in our analysis, such as the believability of the control, prior knowledge of patients about acupuncture, whether the true acupuncture group was treated identically, the extent that practitioners were able to maintain equipoise, and practical implementation issues, such as how carefully the ring that comes with the Streitberger needle [15] was taped in place.

Implications for research
The research question remains the primary determinant of the choice of control. In a strategy document developed with a range of collaborators using consensus methods, a useful distinction has been drawn between efficacy trials that seek to determine whether there are specific effects beyond the placebo in an ideal treatment environment, and effectiveness trials that seek to determine the overall impact of acupuncture in which specific and non-specific effects are combined [21]. Moreover, research questions investigating the value of specific point location need to have sham needles located away from true acupuncture points, while research questions testing skin penetration require non-penetrating sham needle controls applied at the same points as in the true acupuncture group. The choice of a sham acupuncture control needs to be informed by consideration of the likely impact of the sham intervention. In the past, judgments on this have often used expert opinion on putative physiological activity of a sham control, even though we have yet to understand the mechanism(s) of the action of acupuncture [17]. A number of commentators have speculated that penetrating sham needling may be physiologically active and thus be an inappropriate sham control [16]. Our results provide support for this contention, suggesting that needle penetration should be avoided as a sham technique to control for non-specific effects associated with acupuncture in trials involving chronic pain patients. However, sham acupuncture involving penetrating needles may well have a place when addressing questions of point specificity in explanatory trials. We are more cautious with regard to recommending the use of non-penetrating needles. Many forms of Japanese acupuncture use shallow insertion or non-insertion (the toya hari method) [22]. Using non-penetrating needles in controlled trials is not without its challenges: although apparently less active than other types of sham, we cannot assume that non-penetrating needles have complete physiologic inactivity; furthermore, there are practical questions regarding whether to enroll only acupuncture-naïve patients and whether practitioners can maintain equipoise in large trials over reasonable periods of time.
When sham acupuncture is not used, the choice of control is clearly driven by the research question. For instance, in the UK National Health Service (NHS) trial of acupuncture for chronic headache, the study question of Vickers et al was related to the effects of making acupuncture more widely available in primary care, a pragmatic comparison of ''use acupuncture'' and ''avoid acupuncture'' [23]. On the other hand, Foster et al. were interested in the impact of acupuncture when added to an existing rehabilitation program [24]. Yet our findings have clear implications for sample size calculations, with larger sample sizes needed in trials where care in the control arm is carefully specified.
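The sample size implication can be made concrete with a rough sketch. It assumes a two-arm parallel trial with equal allocation, a continuous outcome, and the standard normal-approximation formula for comparing two means; the two-sided 5% significance level, the 80% power and the function name are illustrative assumptions, not choices made in this paper. The inputs are the pooled effect sizes reported above for routine care (0.55) and protocolled care (0.29).

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a standardized
    difference in means (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


n_routine = n_per_arm(0.55)      # acupuncture vs. routine care
n_protocolled = n_per_arm(0.29)  # acupuncture vs. protocolled care
```

Under these assumptions, about 52 patients per arm would be needed against routine care and about 187 per arm against protocolled care, before any allowance for dropout.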

Conclusion
From a large database of individual patient data from high-quality randomized trials, we found acupuncture to be significantly superior to control irrespective of the subtype of control. When compared against sham, trials with penetrating needles reported lower effect sizes for acupuncture than trials with non-penetrating needles or those that used non-needle sham. This suggests that penetrating needles have important physiologic activity, even when inserted superficially away from true acupuncture points. Accordingly, we recommend that this type of sham be avoided. In trials without sham control, we found that the effect size likely depends on the intensity of treatment in the control group, with smaller differences between acupuncture and protocol-guided programs of treatment than between acupuncture and routine care. While the choice of control should be driven by the study question, these findings can help inform study design in acupuncture, particularly with respect to sample size.