More questions than answers about the outcomes of puberty suppression

Posted by MichaelBiggs on 03 Feb 2021 at 04:18 GMT

The publication of these outcomes, which I urged in March 2019 [1], is welcome. As the last patient in the study commenced GnRHa treatment in April 2015, the outcomes after 12 months of treatment became available in 2016, the outcomes after 24 months in 2017, and the outcomes after 36 months in 2018. The statistical analysis plan is dated October 2019 (Supplement 2). The plan lowers expectations about the effects of GnRHa, and is notably more pessimistic than the experimental protocol granted ethical approval in 2010. The initial protocol, for example, stated that “Early intervention is also associated with a reduction in the gender dysphoria experienced by these adolescents” [2]. By 2019, by contrast, the authors declare: “It is therefore unlikely that GnRHa treatment will result in significant reduction in body dissatisfaction” (Supplement 2, pp. 12-13). Whether it is ethical for researchers to give an experimental drug to children if they expect the drug to provide no relief to their condition deserves reflection.

The paper’s headline finding is that “GnRHa treatment brought no measurable benefit nor harm to psychological function in these young people with GD [gender dysphoria]” (p. 20). The lack of discernible improvement is surprising because children and their parents were enthusiastic about puberty blockers and would have considered themselves fortunate to be the first British adolescents to receive them. This context would have induced a powerful positive placebo response. Placebo response is unfortunately never addressed by practitioners of gender medicine, despite its crucial importance in conditions such as depression and anxiety [3].

The headline finding might partly reflect the choice to combine results for girls and boys, and to test sex differences for only 2 psychological measures out of 26 (Table 6). “Our statistical analysis plan restricted testing all outcomes for differences by sex due to the type 1 error risk,” the authors explain (pp. 20-21). There is no justification, however, for not tabulating results disaggregated by sex, as done by the landmark Dutch study [4] on which the authors’ study was modelled [2], and by the first author’s presentation of preliminary results for the first 30 patients [5]. I have demonstrated across multiple measures that the effect for boys is uncorrelated with the effect for girls, in the preliminary results and likewise in the Dutch study [6]. In both datasets, to take the clearest example, girls’ body image worsened following GnRHa, while boys’ body image improved. By combining both sexes, the paper makes it impossible to discern such patterns. The dataset released by the authors omits the variable for sex.
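
The type 1 error risk that the authors cite can be made concrete with a minimal Python sketch. It assumes the tests are independent and each run at the conventional 5% level; both are simplifying assumptions of mine, not details taken from the statistical analysis plan, and the function name is purely illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent and each uses the same alpha.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


# 2 = measures actually tested for sex differences; 26 = all measures in Table 6.
for k in (2, 26):
    print(f"{k:>2} tests: family-wise error rate = {familywise_error_rate(k):.3f}")
```

The sketch bears only on formal hypothesis tests. Tabulating results by sex involves no test at all, so it adds nothing to this error rate.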

The paper provides little information on self-harm. There are two indexes, one created from the child’s answers and one from a parent’s. Each index sums two questions, each scored as 0, 1, or 2. The authors report only the median and the interquartile range (Table 4). The median is always 0 because most children do not harm themselves. Why not report the mean, as was done in the preliminary results [5]? Or tabulate the frequency? Disaggregating by sex would also be informative, because their preliminary results for the first 30 patients showed that the increase in self-harm—on the question “I deliberately try to hurt or kill myself”—was greater for girls than for boys (the sex difference was statistically significant, p = .014). The dataset released by the authors omits their indexes of self-harm and the questions used to construct them.
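
A short Python sketch shows why the median conceals change in an index like this. The scores below are HYPOTHETICAL, invented purely for illustration; they are not drawn from the study or from the preliminary results. Each score is a sum of two questions scored 0 to 2, so it ranges from 0 to 4.

```python
from collections import Counter
from statistics import mean, median

# HYPOTHETICAL index scores (0-4) for ten children at two time points.
baseline = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
follow_up = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]

for label, scores in (("baseline", baseline), ("follow-up", follow_up)):
    freq = dict(sorted(Counter(scores).items()))  # how many children at each score
    print(f"{label:>9}: median={median(scores)}, mean={mean(scores):.1f}, freq={freq}")
```

In this made-up example the median is 0 at both time points, while the mean and the frequency table both register the increase.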

The paper is scrupulous in minimizing the number of statistical tests on this small sample, comprising only 44 individuals. The authors point out that my analysis [6] of their preliminary results does not adjust for the number of statistical tests, because it replicated the procedure used by the Dutch study [4]. The authors’ critique applies equally to that study, whose sample was almost as small, ranging from 41 to 57 (depending on the measure). Applying the Bonferroni correction, as the authors advocate (p. 9), would also eliminate 3 out of 8 of the positive findings in the Dutch study. Most importantly, the improvement in overall psychological functioning (captured by the Children’s Global Adjustment Scale) and the reduction in depression would no longer be statistically significant (p > .05 / 14). The authors have thus, perhaps inadvertently, undermined a significant portion of the overall evidence supporting the use of GnRHa to treat gender dysphoria.
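
The Bonferroni threshold referred to above (.05 / 14) can be sketched in a few lines of Python. The p-values in the loop are HYPOTHETICAL placeholders chosen to show the mechanics; they are not the values reported in the Dutch study.

```python
ALPHA = 0.05
N_TESTS = 14  # divisor used in the ".05 / 14" threshold above

threshold = ALPHA / N_TESTS


def survives_bonferroni(p_value: float, alpha: float = ALPHA, n_tests: int = N_TESTS) -> bool:
    """True if p_value is still significant after Bonferroni correction."""
    return p_value < alpha / n_tests


print(f"corrected threshold: {threshold:.4f}")

# HYPOTHETICAL p-values, for illustration only.
for p in (0.001, 0.01, 0.04):
    verdict = "significant" if survives_bonferroni(p) else "not significant"
    print(f"p = {p}: {verdict} after correction")
```

A finding with p = .01, comfortably significant at the uncorrected 5% level, fails this stricter threshold.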

The authors argue plausibly that their sample was too small to detect changes in psychological functioning. They do not explain why they failed to include more recent cohorts from their own clinic. Each year since 2015, the Gender Identity Development Service has administered GnRHa to over 50 children aged under 15. The authors therefore now possess data on the effect of puberty suppression after twelve months on at least 250 more adolescents in this age bracket (counting those referred to their endocrine clinic from January 2015 to December 2018). A sample size of around 300 would provide sufficient statistical power to test whether puberty suppression leads to improvement or deterioration.
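
As a rough illustration of what the larger sample would buy, the Python sketch below computes the smallest standardized change detectable with 80% power. It rests on assumptions of mine, not on the authors' analysis plan: a two-sided test at the 5% level on within-patient change scores, a normal approximation, and an effect expressed in units of the standard deviation of the change. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_effect(n: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest standardized mean change detectable with the given power.

    Normal approximation for a two-sided, one-sample test on change scores.
    """
    z = NormalDist()  # standard normal distribution
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) / sqrt(n)


for n in (44, 300):
    print(f"n = {n:>3}: detectable effect is about {minimal_detectable_effect(n):.2f} SD")
```

Under these assumptions, 44 patients can reliably detect only a change of roughly 0.4 standard deviations, whereas 300 patients could detect a change less than half that size.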

The paper confirms the detrimental effect of GnRHa on bone mineral density [7]. At baseline the patients were already about half a standard deviation below the norm for their age and sex (Table 3). After 12 months, they were one standard deviation below the norm; at 24 months, more than one standard deviation below. The paper omits the range of bone mineral density, which is crucial because the serious health risks lie in the lower tail of the distribution. The dataset released with the paper enables this tail to be examined. Considering spine bone mineral density after 24 months of treatment, for example, 7 out of 24 patients recorded density two standard deviations below the norm. Indeed, 4 of them (17%) had a z-score putting them in the lowest 0.1 percentile of the distribution for their sex and age (z-score < -3.09). Bone breakages and fractures should have been treated as adverse events, but the paper does not mention whether they were recorded.
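
The correspondence between z-scores and percentiles used above can be checked with Python's standard library, assuming the reference distribution is normal (which is what a z-score presupposes). The counts 7, 4, and 24 are those quoted in the preceding paragraph.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of the reference population expected to fall below each z-score.
for z in (-1.0, -2.0, -3.09):
    print(f"z < {z:>5}: {std_normal.cdf(z) * 100:.2f}% of the reference population")

# Observed shares among the 24 patients with spine measurements at 24 months.
n_patients = 24
below_minus_2 = 7
below_minus_3_09 = 4
print(f"observed below -2:    {below_minus_2 / n_patients:.0%}")
print(f"observed below -3.09: {below_minus_3_09 / n_patients:.0%}")
```

Comparing the expected and observed shares shows how heavily the patients are concentrated in the lower tail.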

The most important outcome—but the least surprising—is that 43 out of 44 patients continued to cross-sex hormones. It is hard to square this finding with the authors’ claim that “pubertal suppression may be both a treatment in its own right and also an intermediate step in a longer treatment pathway” (p. 22). Considered as a treatment in its own right, the suppression of puberty with GnRHa might be the only treatment provided by the NHS for which there is no objective evidence that the benefits outweigh the risks—as the authors themselves admitted in their statistical plan. The only justification for puberty suppression is to prepare a child for lifelong medicalization with cross-sex hormones and surgeries, with irreversible consequences for sexuality and fertility.

Competing interests declared: I made a formal complaint in 2019 to the Health Research Authority because the authors had not published results after two years, as they promised in their experimental protocol approved in 2011.
I acted as expert witness in the case of Keira Bell and Mrs A versus Tavistock and Portman NHS Foundation Trust. The judgment was handed down in December 2020 (and so this comment is not relevant to any ongoing legal proceedings).