Does the rising placebo response impact antihypertensive clinical trial outcomes? An analysis of data from the Food and Drug Administration 1990-2016

Background
Recent studies show that placebo response has grown significantly over time in clinical trials for antidepressants, ADHD medications, antiepileptics, and antidiabetics. Contrary to expectations, trial outcome measures and success rates have not been impacted. This study aimed to see if this trend of increasing placebo response and stable efficacy outcome measures is unique to the conditions previously studied or if it occurs in trials for conditions with physiologically-measured symptoms, such as hypertension.

Method
We evaluated the efficacy data reported in the US Food and Drug Administration Medical and Statistical reviews for 23 antihypertensive programs (32,022 patients, 63 trials, 142 treatment arms). Placebo and medication response, effect sizes, and drug-placebo differences were calculated for each treatment arm and examined over time using meta-regression. We also explored the relationship of sample size, trial duration, baseline blood pressure, and number of treatment arms to placebo/drug response and efficacy outcome measures.

Results
As in trials of other conditions, placebo response has risen significantly over time (R² = 0.093, p = 0.018), while effect size (R² = 0.013, p = 0.187), drug-placebo difference (R² = 0.013, p = 0.182), and success rate (134/142, 94.4%) have remained unaffected, likely due to a significant compensatory increase in antihypertensive response (R² = 0.086, p<0.001). Treatment arms are likely overpowered, with sample sizes increasing over time (R² = 0.387, p<0.0001) and stable, large effect sizes (0.78 ±0.37). The exploratory analysis of sample size, trial duration, baseline blood pressure, and number of treatment arms yielded mixed results unlikely to explain the pattern of placebo response and efficacy outcomes over time. The magnitude of placebo response had no relationship to effect size (p = 0.877), antihypertensive-placebo differences (p = 0.752), or p-values (p = 0.963) but was correlated with antihypertensive response (R² = 0.347, p<0.0001).

Conclusions
As hypothesized, this study shows that placebo response is increasing in clinical trials for hypertension without any evidence of this increase impacting trial outcomes. Attempting to control placebo response in clinical trials for hypertension may not be necessary for successful efficacy outcomes. In exploratory analysis, we noted that despite finding significant relationships, none of the trial or patient characteristics we examined offered a clear explanation of the rise in placebo response and stability in outcome measures over time. Collectively, these data suggest that the phenomenon of increasing placebo response and stable efficacy outcomes may be a general trend, occurring across trials for various psychiatric and medical conditions with physiological and non-physiological endpoints.


Introduction
Although the placebo effect is a powerful tool for the treatment of patients with both psychiatric and physical illnesses, the placebo response as a measurement of these non-pharmacological effects has historically been viewed as a problem in clinical trials [1]. Following the finding by Walsh et al in 2001 [2] that the placebo response in clinical trials for depression was variable and growing, an assumption emerged that such growth in placebo response was likely responsible for the low success rate and poor efficacy outcomes seen in antidepressant trials [3]. However, recent analysis has shown that this assumption is no longer tenable. While the placebo response is still rising significantly, negative impacts on the efficacy outcomes of antidepressant clinical trials have not been observed [4]. Effect size, drug-placebo differences, and success rate have remained stable, due to a parallel increase in drug response. This pattern of rising placebo response and unaffected trial outcomes does not appear to be unique to antidepressant trials; we have also seen it in clinical trials for ADHD medications [5], antiepileptics [6], and antihyperglycemics [7]. In this context, it is important to note that other investigators have questioned whether the placebo response is actually rising in antidepressant trials [8]. Specifically, these authors used a categorical definition of placebo response (number of responders, those with 50% reduction in symptoms from baseline). However, this is a transformed endpoint which is not used by regulatory agencies like the US FDA in their assessment of pharmacological treatments. This categorical assay of placebo response, along with the fact that these authors grouped the trials by five-year intervals, drastically reduces the sensitivity of their analysis. Considering this and other significant divergence in methodological decisions in these investigators' analysis, we conclude from our previous analysis [4] that the magnitude of placebo response, as measured continuously over time in FDA-reviewed clinical trials of antidepressants, has definitely increased.
Given the aforementioned findings, we decided to evaluate if this pattern of rising placebo response and stable efficacy outcomes over time is exclusive to clinical trials of psychiatric conditions like depression or ADHD or medical conditions like epilepsy or diabetes, or if this pattern could be seen in other conditions like hypertension. To evaluate this possibility, we examined efficacy data from the New Drug Approval packets for investigational antihypertensives. We chose hypertension trials because they are prone to a non-trivial placebo response [9][10][11]. Additionally, hypertension trial designs are fairly consistent and they evaluate efficacy over a period of weeks. Most importantly, a systematic analysis of primary-sourced FDA clinical trial efficacy data for hypertension trials has not yet been undertaken, representing a considerable gap in the literature.
While hypertension trials have many design similarities that make them comparable to trials we have analyzed for the aforementioned conditions, it is important to consider that hypertension trials have some notable idiosyncrasies. One such idiosyncrasy we considered stems from the fact that the selected primary efficacy outcome measure is thought to potentially influence hypertension trial outcomes. This is based on the idea that placebo response may vary across different contexts and styles of blood pressure measurement. Studies [12] have suggested that the "white-coat effect" on in-office blood pressure measurement may contribute significantly to the placebo response. Changing the context and increasing data points by using more frequent out-of-office measurements, such as 24-hour ambulatory blood pressure cuffs or in-home self-monitoring, may reduce the statistical noise of normal blood pressure variability. Such techniques may therefore increase reproducibility [13][14][15] and yield lower estimations of placebo response [16][17][18][19]. The adoption of such techniques in the measurement of primary efficacy endpoints in FDA clinical trials has not been quantified.
Additionally, it is important to note that hypertension trials tend to have much larger sample sizes than antidepressant and ADHD trials. Smaller trials (less than 100 patients) are infrequent in the recent history of antihypertensive trials. Given that we have found in previous analysis that the adequacy of statistical power from sample sizes has had a significant effect on the relationship between placebo response and trial outcomes in antidepressant trials, we aimed to explore the impact of large trial sample sizes as it relates to hypertension trial outcomes.
To investigate the placebo response and trial efficacy outcomes for antihypertensives, we evaluated the clinical trial data submitted as proof of efficacy and reviewed by the US Food and Drug Administration for 23 antihypertensive medications between 1990 and 2016. Our hypothesis was that the magnitude of placebo response in clinical trials of antihypertensive medications has increased over time without impacting the effect size, drug-placebo difference, or the success rate of these trials. We presumed that this pattern would occur due to a compensatory increase in the magnitude of response in the antihypertensive treatment group over time. We also explored the relationships of trial duration, number of treatment arms, baseline blood pressure, and sample size to efficacy outcomes and placebo response to see if changes in these variables could adequately explain any changes that occurred over time.

Source: FDA Access Data database
We used the New Drug Approval (NDA) packets published on the US FDA database (http://www.accessdata.fda.gov/) [20] as our source for efficacy data. A benefit of this database is that these data have been unbiasedly reviewed for approval by FDA medical and statistical staff as compared to data from published reports [21]. Additionally, the statistical treatments and presentation of data in these reviews are of sufficient quality, completeness, and comparability such that we could analyze these efficacy data across different types of investigational agents.

Selection of programs
We selected programs for investigational antihypertensive medications (oral agents indicated for treatment of essential hypertension) if their NDAs (which include the FDA medical and statistical reviews of the trials conducted for efficacy evaluation) were available on the FDA database website (http://www.accessdata.fda.gov/).
Programs for which multiple indications were listed were only included if the trials submitted for proof of efficacy used patients diagnosed exclusively with essential hypertension. Combinations (i.e., + HCT) and new formulations (i.e., extended release formulations) were included if the cited trial data were not already included in a New Drug Approval packet for a previous formulation.

Selection of trials/treatment arms
For inclusion in this current analysis, we considered all of the trials reviewed for efficacy from each NDA program that we could access. Of these trials, we included all acute, placebo-controlled trials of approved doses of the investigational oral antihypertensive that were cited in the integrated review of efficacy for approval and met the following PICO criteria: P: adults, aged 18-65 years, with essential hypertension (defined as diastolic blood pressure ≥90 mmHg), inclusive of both male and female patients of all races. I: oral antihypertensive drugs at approved dosing levels. C: placebo pill. O: either diastolic or systolic (whichever was indicated as the primary outcome measure), seated or supine blood pressure measured after a duration of ≥3 weeks and ≤24 weeks after baseline measurements. Studies with incomparable design differences (i.e., relapse prevention studies) were also excluded. The patients in these FDA studies were otherwise healthy or had all other physical illnesses under adequate pharmaceutical control.
It is important to note that sub-therapeutic doses are intentionally included in dose-finding studies in order to demonstrate the lowest effective dose. Because these treatment arms serve a purpose other than to be approved at the dose used, we excluded such treatment arms using unapproved doses of the active medication.
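The selection rules above can be expressed as a simple screening predicate. The following Python sketch is illustrative only: the class and field names are hypothetical and are not taken from the FDA reviews, and the duration window is assumed to be 3 to 24 weeks inclusive.

```python
from dataclasses import dataclass


@dataclass
class CandidateArm:
    """One candidate treatment arm (all field names are hypothetical)."""
    placebo_controlled: bool
    approved_dose: bool
    essential_hypertension_only: bool
    duration_weeks: float
    subgroup_only: bool = False       # e.g. a trial restricted to one sex or age group
    relapse_prevention: bool = False  # example of an incomparable design


def is_eligible(arm: CandidateArm) -> bool:
    """Return True if the arm passes the screening rules sketched here."""
    return (
        arm.placebo_controlled
        and arm.approved_dose                 # sub-therapeutic and unapproved doses are dropped
        and arm.essential_hypertension_only
        and 3 <= arm.duration_weeks <= 24     # assumed inclusive window
        and not arm.subgroup_only
        and not arm.relapse_prevention
    )
```

For example, `is_eligible(CandidateArm(True, True, True, 8))` passes, whereas the same arm at an unapproved dose does not.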

Data collection-Extraction from FDA efficacy review
FDA reviewers conduct independent statistical analysis of efficacy for each treatment arm at different dose levels within a trial. For this reason, we decided to examine treatment arms independently of the trials they were in. Within the NDA, efficacy endpoint analysis is conducted which compares symptom reduction between antihypertensive and placebo treated patients on the pre-specified primary outcome measure. The results from these analyses are typically presented in a table. We extracted in duplicate form the baseline and change scores for both drug and placebo, and the p-value resulting from the comparison of change scores between active treatment and control.
P-values: P-values were recorded in exact form from the endpoint analysis conducted by the FDA statistical reviewer. P-values were reported by the FDA for each individual trial arm comparison of antihypertensive treatment to placebo. For both significant (p < 0.05) and non-significant (p ≥ 0.05) results, we recorded the p-value with all of the decimal places reported rather than the threshold (i.e., p = 0.034 rather than p<0.05). In some cases, only the threshold was reported in the NDA and so we recorded the threshold.
Baseline Scores: Mean blood pressure at the beginning of the trial was reported for placebo and antihypertensive-treated patients. These baseline scores were extracted for each treatment arm.
Drug and Placebo Response: Drug/placebo response was defined as the change in the primary efficacy measure of blood pressure. Change scores for placebo and active treatment represented the reported mean reductions in blood pressure points between start and end scores at the conclusion of the treatment period (Baseline mean BP-Endpoint mean BP). Evaluation of such change scores is what FDA reviewers use to determine the efficacy of investigational agents. For the purposes of this analysis, change scores are expressed as a positive number if treatment reduced blood pressure and are negative if blood pressure increased. Higher change scores indicated greater treatment response.
Trial Arm Success: FDA reviewers use p-value < 0.05 to determine statistical success of a treatment arm comparison of drug to placebo. Treatment arms were denoted as failed if the resulting p-value was ≥ 0.05 for the comparison. The success rate was the number of treatment arms meeting statistical significance as reported by the FDA out of the total number of treatment arms in the trials.
Treatment Arm Sample Size: Sample size was calculated by adding the reported number of Intent-to-treat (ITT) patients from placebo treatment (placebo n) to the number of ITT patients from active treatment (antihypertensive n) to generate a single N value for the total sample in the treatment arm comparison. Each treatment arm had a Sample Size N comprised of placebo cell n and active treatment n.

Efficacy outcome measures
Drug-Placebo Difference: The difference in treatment response between placebo and antihypertensives was calculated by subtracting the placebo change score from the active treatment change score for each treatment arm.
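As a minimal sketch of how the extracted quantities and the drug-placebo difference relate to one another, the definitions above can be written as follows. The class and attribute names are hypothetical, and the numbers in the usage note are invented for illustration rather than drawn from any FDA review.

```python
from dataclasses import dataclass


@dataclass
class ArmComparison:
    """One drug-versus-placebo comparison (hypothetical field names)."""
    placebo_baseline: float  # mean BP at baseline, placebo group (mmHg)
    placebo_endpoint: float  # mean BP at endpoint, placebo group (mmHg)
    drug_baseline: float
    drug_endpoint: float
    placebo_n: int           # ITT patients on placebo
    drug_n: int              # ITT patients on active treatment
    p_value: float

    @property
    def placebo_response(self) -> float:
        # Positive when blood pressure fell during the trial.
        return self.placebo_baseline - self.placebo_endpoint

    @property
    def drug_response(self) -> float:
        return self.drug_baseline - self.drug_endpoint

    @property
    def drug_placebo_difference(self) -> float:
        return self.drug_response - self.placebo_response

    @property
    def total_n(self) -> int:
        return self.placebo_n + self.drug_n

    @property
    def success(self) -> bool:
        # An arm counts as failed when p >= 0.05.
        return self.p_value < 0.05


def success_rate(arms) -> float:
    """Share of treatment arms reaching p < 0.05."""
    return sum(arm.success for arm in arms) / len(arms)
```

With an invented arm in which placebo falls from 100 to 96 mmHg and drug from 101 to 89 mmHg, the placebo response is 4, the drug response is 12, and the drug-placebo difference is 8 blood pressure points.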
Effect Size: We used Hedges' G formula to calculate a standardized effect size for the drug-placebo difference in blood pressure reduction (sometimes referred to as the placebo-subtracted treatment effect). In cases where sufficient variance estimations were reported (i.e., standard deviation or standard error and sample size), we calculated effect size using the typical formula. In cases where no measures of variance were given, we used a workaround method of Corrected Hedges' G formula which uses precise p-values, following Turner et al [20]. As proposed by Turner et al [20] in their study of antidepressant clinical trial data, this method utilizes the Inverse T-score function (TINV) in Microsoft Excel. Precise p-values (most decimal places given) and degrees of freedom are entered into the function to calculate a t-score, which can be transformed to Hedges' G using the following equation: $g = t \times \sqrt{1/n_{\text{drug}} + 1/n_{\text{placebo}}}$. Hedges' G effect size also has a proposed correction for small sample size.
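As an illustration of the two effect-size routes described above, the following Python sketch computes Hedges' g either from standard deviations or from an exact two-tailed p-value. It is not the authors' spreadsheet: the function names are hypothetical, the t quantile is obtained by numerical integration and bisection in place of Excel's TINV, the conversion is assumed to be g = t·sqrt(1/n_drug + 1/n_placebo), and the small-sample correction is assumed to take the common approximate form 1 − 3/(4·df − 1) with df = n_drug + n_placebo − 2.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p-value for |t|, via Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)


def t_from_p(p: float, df: float) -> float:
    """Invert two_tailed_p by bisection (stand-in for Excel's TINV)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, df) > p:
            lo = mid  # t too small: p-value still larger than target
        else:
            hi = mid
    return (lo + hi) / 2


def small_sample_correction(df: float) -> float:
    """Assumed approximate correction factor, 1 - 3 / (4*df - 1)."""
    return 1 - 3 / (4 * df - 1)


def hedges_g_from_sd(mean_diff, sd_drug, sd_placebo, n_drug, n_placebo):
    """Corrected g from the drug-placebo difference and group SDs."""
    df = n_drug + n_placebo - 2
    pooled_sd = math.sqrt(
        ((n_drug - 1) * sd_drug ** 2 + (n_placebo - 1) * sd_placebo ** 2) / df
    )
    return small_sample_correction(df) * mean_diff / pooled_sd


def hedges_g_from_p(p, n_drug, n_placebo):
    """Corrected g recovered from an exact two-tailed p-value."""
    df = n_drug + n_placebo - 2
    t = t_from_p(p, df)
    return small_sample_correction(df) * t * math.sqrt(1 / n_drug + 1 / n_placebo)
```

When both groups share the same standard deviation, the two routes agree, because the t statistic equals the standardized mean difference divided by sqrt(1/n_drug + 1/n_placebo). A threshold such as p<0.05 does not identify a single t value, so the p-value route needs an exact p-value.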

Statistical measures
Statistical measures were generated with IBM Statistical Package for the Social Sciences (SPSS). Simple meta-regression analysis was used to predict treatment response and outcomes based on year of approval and to plot the data over time. Meta-regression with random effects via maximum likelihood was modeled to evaluate potential modifiers of placebo and drug response and efficacy outcome measures.
Trial and patient characteristics: We recorded the duration for each trial as the number of weeks between baseline measurement and the final measurement of blood pressure. We were also able to record the number of treatment arms included in each trial (as originally designed, including active comparators). For example, a trial with placebo, two different dose levels of investigational antihypertensive, and an active comparator arm would have been coded as having four treatment arms. Patient characteristics including proportion of males/females, racial demographics, and mean age could not be statistically analyzed because fewer than two-thirds of the trials reported these measures. Duration, number of treatment arms, baseline blood pressure (severity of hypertension), year of approval, and sample size were entered as potential modifiers in the meta-regression models.
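To make the idea of regressing a response on year of approval concrete, the sketch below fits a weighted straight line and reports its R². It is a deliberately simplified stand-in and not the SPSS procedure used here: it uses fixed, user-supplied weights (for example, treatment arm sample size, which is an assumption) rather than a random-effects model fitted by maximum likelihood, and the function name is hypothetical.

```python
def weighted_linear_fit(x, y, w=None):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared). With w=None all points
    receive equal weight.
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return intercept, slope, r_squared
```

Here `x` would hold years since the first approval and `y` the change score for each treatment arm; the slope is then the estimated change in response per year.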

Results
There were 23 antihypertensive medications (year of approval) that met inclusion for this study: isradipine (1990), eprosartan mesylate (1997) We excluded 11 trials of subgroups including four trials evaluating males only, two trials of elderly patients (+65 yo), and five trials of only racial subgroups. We also excluded four trials with incomparable design differences (ie relapse prevention trials), eight trials with durations < 3 weeks or > 24 weeks, and four using alternative outcome measures (such as cough studies). We also excluded 26 trials that were not placebo-controlled. Exclusion of these trials left 63 trials for analysis.
From these trials, 211 treatment arms reported outcome data. After excluding 69 treatment arms with unapproved dose levels, 142 treatment arms remained for this analysis. Table 1 presents the essential characteristics and raw data of 142 antihypertensive treatment arms organized by year of approval. For 25 of the 142 treatment arms for antihypertensive medications, a p-value threshold was given instead of an exact p-value, and therefore effect size estimates were calculated using the traditional formula for Hedges' G using standard deviation. Additionally, three treatment arms did not have enough data (either an exact p-value or standard deviations) to calculate treatment effect sizes (see Table 1).
As shown in Table 1, 100% (63/63) of the trials used diastolic blood pressure as the primary outcome measure. In-office seated blood pressure measurement was used in the majority of the trials (69.8%; 44/63) while in-office measurement with the patient supine was used in 27.0% (17/63) of the trials. A very small percentage (3.2%; 2/63) used out-of-office, ambulatory blood pressure monitoring as the trial's primary outcome measure.

Placebo and antihypertensive response over time
As can be seen in Fig 1, placebo response appears to be increasing over time. A simple meta-regression was modeled to predict placebo response based on year of approval and significance was found (p = 0.013) with an R² of 0.093. Placebo response increased by 0.131 for each year following 1990.
Antihypertensive response similarly increased with year of approval (see Fig 1) and significance for the meta-regression model was found (p<0.001), with an R² of 0.086. Antihypertensive response increased by 0.193 for each year following 1999.

Efficacy outcomes (effect size, drug-placebo difference, and success rate) over time
The meta-regression analysis revealed that the apparent increase in effect size over time was not significant (see Fig 2) (R² = 0.017, p = 0.119). A lack of significant change over time was also evident in the regression analysis of antihypertensive-placebo response differences (R² = 0.013, p = 0.176). Overall, antihypertensives maintain superiority over placebo by about 7.2 (±3.1) diastolic blood pressure points with a mean effect size of 0.78 (±0.37). The rate of statistical superiority of drug over placebo (success rate, as determined by the statistical analysis of the FDA reviewer) for the treatment arms we analyzed was 94.4% (134/142) and did not change over time.

Placebo response and antihypertensive efficacy outcomes
Placebo response had no relationship to effect size (β = -0.002, R² = 0.0002, p = 0.877), antihypertensive-placebo differences (β = -0.035, R² = 0.001, p = 0.752), or p-values (β = 0.0001, R² = 0.0003, p = 0.963), showing that despite the increase in magnitude of placebo response, there has been no impact on clinical trial efficacy outcomes. However, the rise in placebo response was significantly related to the rise in antihypertensive response (β = 0.965, R² = 0.347, p<0.0001), indicating that as the reduction in blood pressure points went up in the placebo treatment group, the reduction in blood pressure with antihypertensives increased by nearly the same amount.

Relationship of trial characteristics to placebo/drug response and efficacy outcomes
The R² of the model predicting placebo response as a function of the duration, number of treatment arms, placebo baseline blood pressure, and treatment arm sample size was 0.30 (p = 0.0003). Out of these variables, higher placebo baseline blood pressure (β = 0.364, p = 0.0007) and treatment arm sample size (β = 0.008, p = 0.0019) significantly predicted higher placebo response. When examined independently, only sample size remained statistically significant (β = 0.004, R² = 0.072, p = 0.0312).
For the model predicting the efficacy outcome of treatment arm standardized effect sizes (Hedges' G), we entered the duration of the trial, number of treatment arms, treatment arm sample size, and the baseline blood pressure for each treatment arm overall (weighted average of placebo and drug treatment group baselines). The overall model had an R² of 0.421 (p<0.0001). Duration (β = -0.038, p = 0.0003), number of treatment arms (β = 0.059, p<0.0001), and the treatment arm overall baseline blood pressure (β = -0.021, p = 0.044) were significantly related to the treatment effect size. When examined individually, only duration (β = -0.041, R² = 0.07, p = 0.0018) and number of treatment arms (β = 0.042, R² = 0.213, p<0.0001) remained significant, with shorter duration and higher number of treatment arms predicting higher effect size.

Trial characteristics over time
Sample size has increased significantly over time (R² = 0.387, p<0.0001; see Fig 3).

Discussion
This study evaluated clinical trial data from FDA reviews of antihypertensive medications with the aim of testing if the pattern of a rising placebo response and stable efficacy outcomes seen over time in trials of psychiatric [4,5] and medical conditions [6,7] could be seen in clinical trials for hypertension. As hypothesized, antihypertensive clinical trial data showed a similar pattern to the psychiatric and medical trials previously analyzed, wherein placebo response increased significantly over time and outcome measures of effect size, drug-placebo differences, and success rate remained the same, likely due to the parallel and significant increase in active treatment response. As confirmed by the lack of relationship between the magnitude of placebo response and trial efficacy outcomes, it appears that growth in placebo response over time did not have any impact on antihypertensive clinical trial efficacy outcomes.
It is interesting that clinical trials evaluating medications for non-psychiatric conditions using physiologically-measured endpoints, as the hypertension trials do, exhibit the same dramatic increase in placebo response over time as the other conditions, nearly doubling over 25 years of antihypertensive trial history (see Fig 1). Although with retrospective data it is not possible to determine a causal explanation, one possible explanation for the rising placebo response is that given the historical shift towards direct-to-consumer marketing of prescription medications, patients may have higher expectancy for medication effects. Although conceptually sound and based on observed evidence [22], this theory has not been tested prospectively.
Furthermore, it is likely that the nearly 50% increase in drug response (see Fig 1) is due to the additive nature of placebo response, which inherently contributes to the measurement of the overall drug effect. While the overall efficacy of the agents appears stable, as seen by the constant distance between the drug and placebo response and stable effect sizes (see Figs 1 and 2), the proportion of the drug response that represents nonspecific placebo effects has likely increased over time in parallel with the placebo treatment arms. This is supported by our finding of a significant correlation (R² = 0.347, p<0.0001) between the magnitude of blood pressure reduction with placebo and with antihypertensives. While the additive relationship between placebo and drug response in the measure of antihypertensive response is assumed, it is important to test this assumption in light of the attention and concern given to the rising placebo response. What these data show is that there has not been a ceiling effect on the response to antihypertensive agents and that the growth in drug response likely reflects a growth in placebo response. Considered collectively, these findings suggest that efforts to reduce or control the response to placebo in antihypertensive clinical trials may not be necessary for successful efficacy outcomes.
In our exploratory analysis, we evaluated the potential role of the duration, treatment arm sample size, baseline blood pressure, and number of treatment arms on efficacy outcomes, placebo, and drug response. The results were mixed and the trial and/or patient characteristics that surfaced were not reliable. Higher baseline blood pressure did appear to have some potential relationship to higher placebo and drug response. This may be related to increased effects from regression to the mean, with initially more severe cases of hypertension returning to average over the course of the trial. Greater number of treatment arms appeared to have a relationship with higher drug response and effect size, and the number of treatment arms also increased over time. This finding is at odds with previous analyses that have indicated that greater number of treatment arms (as a measure of the likelihood of a patient receiving placebo, or in other words, patient expectations) may increase the placebo response [23,24]. Shorter trial duration appeared to predict higher effect sizes and the average trial duration did not change significantly over the time period examined. Higher treatment arm sample size appeared to predict higher placebo response, although the size of the effect (β = 0.004, R² = 0.072, p = 0.0312) was quite small.
While these findings may inform future trial design, it is important to note that these data are from a multivariate meta-regression based on retrospective analysis, which can be subject to spurious findings. Additionally, all of the factors when examined together accounted for less than half of the variance in effect size and placebo/drug responses, suggesting the influence of variables that we were not able to quantify with these data. Finally, these findings do not offer a coherent explanation for the rise in placebo response and stability in efficacy outcome measures: for example, baseline blood pressure has decreased significantly over time, which, based on the meta-regression findings, should have predicted a decreasing placebo and drug response over time.
What is clear from these data is that these trials are well over-powered to demonstrate the average effect size (0.78 ±0.37; ~50N required between placebo and active treatment for a statistical power of 85%). The mean trial arm sample size (~350N) exceeds the required number of patients by seven times and in some trials, up to 15 times over. The trend of overpowering is continuing as sample size has increased significantly over time (R² = 0.387, p<0.001, see Fig 3), while effect sizes (see Fig 2) and antihypertensive-placebo differences have not changed significantly, indicating that while there has been no demonstration of better or worse efficacy among antihypertensive trials, there has been significantly more patient exposure to the research paradigm.
One possible explanation for this observation is that more recent trials may be designed to serve dually as both efficacy and safety evaluations, requiring greater patient exposure. Another potential explanation is that regulatory and publication agencies may still view trials with ~50N per treatment arm as small, despite the reliably large effect size, and may be reluctant to accept the findings. Overpowering may protect against this bias as well as ensure that the treatment arm will still be successful even if the treatment effect found is smaller than expected. While such overly adequate powering may help explain the fact that trial outcomes have remained unaffected while placebo response has increased, excessive exposure and use of resources should also be considered.
Additionally, while techniques like 24-hr ambulatory blood pressure monitoring have been shown to increase reproducibility [13][14][15] and yield lower estimations of placebo response [16][17][18][19], their adoption as primary efficacy endpoints in FDA clinical trials for hypertension has not been widespread. Only two out of 63 trials (3.2%) used such a technique as a primary outcome measure. Although not prospectively tested in clinical trials, the lower variance in placebo and drug response associated with the use of ambulatory monitoring may require fewer patients to demonstrate equivalent treatment efficacy.
The limitations of this study include the fact that it is a retrospective analysis and that patient-level data are not available in these summary reviews. Additionally, our statistical analysis of trial and patient variables was limited because there was low reporting for patient characteristics and little representation of primary outcome measures other than diastolic, in-office measurement. Finally, these data represent the statistical analyses of FDA reviewers examining efficacy trials for investigational medications that eventually received approval. Therefore, these data do not represent the full spectrum of trials using investigational antihypertensive agents. While selection bias does not occur in the same way that it does in published studies (in that FDA reviewers evaluate all trials conducted for efficacy regardless of positive or negative outcome), there are likely biases stemming from regulatory processes (including the type of statistical analysis and only reporting on trials deemed to be of sufficient quality of conduct and design).
This study provides evidence that the magnitude of placebo response is rising in FDA clinical trials for hypertension, similar to what has been observed in trials for several other medical and psychiatric conditions. As in trials of antidepressants, ADHD medications, antiepileptics, and antihyperglycemics, this rise in placebo response has not negatively impacted efficacy outcomes including standardized treatment effect size, raw drug-placebo difference in blood pressure reduction, and success rate. As expected in adequately powered trials, the drug response has also increased and the magnitude of placebo response shows no relationship to outcomes of effect size, drug-placebo difference, or p-values, suggesting that attempts to control the response to placebo may not be necessary. Among the trial design and patient variables that we could quantify, there was not a clear explanation for the phenomenon of the rise in placebo response and stability in efficacy outcomes over time. These data suggest that the phenomenon of increasing placebo response and stable efficacy outcomes may be a general trend, occurring across trials for various psychiatric and medical conditions with physiological and non-physiological endpoints.