A PRISMA systematic review of adolescent gender dysphoria literature: 1) Epidemiology

It is unclear whether the research literature on adolescent gender dysphoria (GD) provides sufficient evidence to adequately inform clinical decision making. In the first of a series of three papers, this study sought to systematically review published evidence regarding: the prevalence of GD in adolescence; the proportions of natal males/females with GD in adolescence and whether this changed over time; and the pattern of age at (a) onset, (b) referral and (c) assessment. Having searched PROSPERO and the Cochrane library for existing systematic reviews (and finding none), we searched Ovid Medline 1946–October week 4 2020, Embase 1947–present (updated daily), CINAHL 1983–2020, and PsycInfo 1914–2020. The final search was carried out on the 2nd November 2020 using a core strategy including search terms for ‘adolescence’ and ‘gender dysphoria’ which was adapted according to the structure of each database. Papers were excluded if they did not clearly report on clinically-verified gender dysphoria, if they were focused on adult populations, if they did not include original data (epidemiological, clinical, or survey) on adolescents (aged at least 12 and under 18 years), or if they were not peer-reviewed journal publications. From 6202 potentially relevant articles (post de-duplication), 38 papers from 11 countries representing between 3000 and 4000 participants were included in our final sample. Most studies were observational cohort studies, usually using retrospective record review (26). A few compared their samples with normative or population datasets; most (31) were published in the past 5 years. There was significant overlap of study samples (accounted for in our quantitative synthesis). No whole-population studies are available, so prevalence cannot be ascertained. There is evidence of an increase in frequency of presentation to services, and of a shift in the natal sex of referred cases: those assigned female at birth are now in the majority. No data were available on age of onset. Within the included samples the average age was 13 years at referral, 15 years at assessment. All papers were rated by two reviewers using the Crowe Critical Appraisal Tool v1.4 (CCAT). The CCAT quality ratings ranged from 45% to 96%, with a mean of 78%. Almost half the included studies emerged from two treatment centres: there was considerable sample overlap and it is unclear how representative these are of the adolescent GD community more broadly. The increase in clinical presentations of GD, particularly among natal female adolescents, warrants further investigation. Whole population studies using administrative datasets reporting on GD / gender non-conformity may be necessary, along with inter-disciplinary research evaluating the lived experience of adolescents with GD.

Introduction
Gender Dysphoria (GD) is a categorical diagnosis in the Fifth Edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) [1]. It is also used as a general descriptive term referring to a person's discontent with assigned gender. In recent years, GD diagnoses have been increasingly made in child and adolescent services [2-4]. There has been a parallel increase in demand for gender transition interventions, particularly among natal females [2-4], and including pre-adolescents [5]. Such transitions have increasingly involved the use of puberty suppression, cross-sex hormones and surgical procedures, usually in accordance with the so-called 'Dutch model', where intervention is staged in accordance with a young person's age and stage of pubertal development [6]. Calls to improve availability of medical interventions have sometimes been made on the basis of reports of much increased levels of mental health problems, including suicide attempts, among youth with GD [7,8], and claims that the medical procedures referred to would improve mental health outcomes [6,9].
Gender and sex are two terms which are often used interchangeably but that are not synonymous. The sex of a person refers to either male or female status broadly based on the sex chromosomes and genitalia. The word is used in social, medical and legal contexts in most countries to 'categorise' people under the two sexes, as boys/men or girls/women.
The term gender is harder to define, as it reflects how the individual identifies or feels, how a person 'fits' with social norms, activities and attributes that are commonly associated with male or female sex. For many people sex and gender are consistent, but for some, there is discordance between the biological anatomy of the body and gender self-perception/identity. The term 'transgender' is often used to describe this identity, whilst others will relate to terms such as GD or gender incongruence. The literature and common understanding of this area is evolving extremely rapidly, and there is increasing acknowledgement that gender identity may not relate to a binary gender definition at all. Gender does not necessarily reflect sexual orientation, an enduring pattern of emotional, romantic and/or sexual attraction [10]. Gender identity on the other hand is a component of one's personal multi-dimensional sense of self, encompassing moral, ethical, spiritual / religious beliefs [11]. For most people who do not have abnormalities of the external genitalia, sex is documented at birth and, in the majority population, sex and gender are consistent throughout life.
A strict definition, such as in the DSM-5 [1], requires that individuals diagnosed with GD suffer clinically significant distress or impairment in social, school, or other important areas of functioning. A range of terminology is used to describe young people experiencing GD. It is apparent from the literature that the terminology is changing rapidly, and that the terms 'assigned female / male at birth' (AFAB / AMAB) or 'natal fe/male' are commonly used in recent literature. We have chosen to use the term 'natal fe/male' (abbreviated to NF / NM) because it is less cumbersome and is inclusive of those who experience GD and do not identify with either male or female genders (although we acknowledge it excludes many intersex people).
There is currently an intense international debate regarding a number of issues relating to GD [12]. A recent high-profile example involved the Gender Identity Development Service (GIDS) in London, UK, altering its procedures due to a High Court ruling that it would be 'highly unlikely' that children under 13, and 'very doubtful' that 14- and 15-year-olds, could be Gillick competent [13], and therefore they could not consent to puberty suppression treatment [14]. This decision was met with both support and criticism, not least as it has implications for the broader application of the Gillick framework for consent to medical procedures. (It has since been successfully appealed [15].) A lack of good quality evidence has been acknowledged [16]. A recent review by the Swedish Agency for Health Technology Assessment and Assessment of Social Services [17] indicated that there is very little empirical evidence in the field regarding overall GD epidemiology, the association of GD with mental health problems, the rate and types of medical interventions provided, and longer-term outcomes (including outcomes for those not treated medically or surgically).

Scope of the review
This review addresses the first of three sets of questions addressing the current state of evidence on gender dysphoria experienced in adolescence. Our over-arching aim was to establish 'what does the literature tell us about gender dysphoria in adolescence?' We broke this down into seven specific questions:
1. What is the prevalence of GD in adolescence?
2. What are the proportions of natal males / females with GD in adolescence (a) and has this changed over time (b)?
3. What is the pattern of age at (a) onset (b) referral (c) assessment (d) treatment?
4. What is the pattern of mental health problems in this population?
5. What treatments have been used to address GD in adolescence?
6. What outcomes are associated with treatment/s for GD in adolescence?
7. What are the long-term outcomes for all (treated or otherwise) in this population?
The present paper focuses on questions 1, 2, 3a, 3b, and 3c. We shall address question 4 in a second paper, and questions 3d, and 5-7 in a final paper. The methodology below includes the searches conducted for the whole review.
We set out to include any paper offering primary data in response to any of these questions, regardless of the focus of that paper.

Protocol and registration
The systematic review protocol was submitted to PROSPERO on 28th November 2019, and registered on 17th March 2020 (registration number CRD42020162047). An update was uploaded on 2nd February 2021 to include specific detail on age criteria and clinical verification of condition. The review has been prepared according to PRISMA 2020 [18] guidelines (see S1 Checklist).

Eligibility criteria
The volume of non-peer-reviewed literature in initial searches proved so great that we took the decision to only include peer-reviewed journal papers featuring original research data. This decision was made subsequent to initial PROSPERO registration, but prior to full text screening. Complete inclusion criteria were:
• Focused on gender dysphoria or transgenderism;
• Includes data on adolescents (aged 12-17 years inclusive);
• Includes original data (not review paper or opinion piece);
• Peer-reviewed publication (not theses or conference proceedings);
• In English language.

Information sources
We searched PROSPERO and the Cochrane library for existing systematic reviews. We searched Ovid Medline 1946 -October week 4 2020, Embase 1947-present (updated daily), CINAHL 1983-2020, and PsycInfo 1914-2020. After selecting the final sample of articles, the first author used their reference lists as a secondary data source.

Search
The final search was carried out on the 2nd November 2020 using a core strategy which was adapted according to the structure of each database. The core strategy included search terms for 'adolescence' and 'gender dysphoria'. The specific search strategies employed in each database are detailed in Table 1.

Study selection
The study selection process is illustrated in Fig 1. In the first stage of screening, papers were excluded based on their title or abstract if they did not clearly report on gender dysphoria or transgenderism and if they were focused on adult populations. In the second stage of screening, papers were excluded on the basis of title and abstract if they did not include original data (epidemiological, clinical, or survey) on adolescents (aged at least 12 and under 18 years). At both stages papers were retained if there was insufficient information to exclude them.
Full-text files were obtained for the remaining records. Papers were rejected at this stage if they:
• Contained no original data (including literature and clinical reviews, journalistic / editorial pieces, letters and commentaries);
• Included only case studies or selected case series;
• Pertained to conditions other than GD (e.g., disorders of sexual development or HIV);
• Did not include clinically-identified GD (e.g., survey where participants self-identify, with no clinical contact);
• Pertained to populations other than those with GD (e.g., LGBTQ more broadly);
• Pertained to populations including or restricted to those aged 18 years or older. This included papers where adolescents and adults were included in the same sample, but adolescents were not separately reported (in many cases age range was not reported and so a 'balance of probabilities' assessment had to be made based on the reported mean age);
• Pertained to populations restricted to those aged under 12 years of age. This included papers where adolescents and children were included in the same sample, but the majority of participants were clearly under 12 (based on mean or median age);
• Featured practitioners, not patients, as participants;
• Referred only to conference proceedings;
• Were written in a non-European language (e.g., Turkish);
• Could not be obtained (including due to being published in non-English language journals, or in theses).
Following initial full text screening, all remaining papers were assessed by a second reviewer to reduce the risk of inclusion bias. Where reviewers reached a different conclusion, discussion took place to reach consensus. If agreement could not be reached, a third reviewer was consulted, and discussion used to reach consensus amongst all three reviewers.
Data extracted from eligible papers were tabulated and used in the quantitative and qualitative synthesis. The following information was recorded: sample size; natal sex; age (years); dates of data collection; study design; study location. Given the limited number of specialist treatment centres globally, we assessed how many of the included papers featured the same or overlapping samples.

Quality assessment
All papers were rated by two reviewers using the Crowe Critical Appraisal Tool v1.4 (CCAT [19]). CCAT is suitable for a range of methodological approaches, assessing papers in terms of eight categories: Preliminaries (overall clarity and quality); Introduction; Design; Sampling; Data collection; Ethical matters; Results; Discussion. Each category is rated out of 5 and all eight categories summed to give a total out of 40 (converted to a percentage). In the present review, each paper was then assigned to one of five categories, based on the average rating of the reviewers, where a rating of 0-20% was coded 1 (poorest quality), and 81-100% coded 5 (highest quality). Inter-rater reliability was shown to be very good (k = 0.93, SE = 0.05).
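The scoring and banding steps described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the procedure actually used: the function names and example scores are hypothetical, the three intermediate bands (21-40%, 41-60%, 61-80%) are assumed by interpolation from the two end bands stated in the text, and the agreement statistic shown is the standard unweighted Cohen's kappa, which may differ from the exact variant the reviewers computed.

```python
import math


def ccat_percent(category_scores):
    """Convert eight CCAT category scores (each 0-5) to a percentage of 40."""
    if len(category_scores) != 8:
        raise ValueError("expected one score for each of the eight categories")
    if any(not 0 <= s <= 5 for s in category_scores):
        raise ValueError("each category is rated from 0 to 5")
    return sum(category_scores) / 40 * 100


def quality_band(percent):
    """Map a percentage to a 1-5 quality band.

    Assumption: equal 20-point bands (0-20 -> 1, ..., 81-100 -> 5).
    Only the first and last bands are stated in the text.
    """
    return max(1, math.ceil(percent / 20))


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of papers given the same band by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal rate, per category.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores from two reviewers for one paper.
reviewer_1 = [4, 4, 4, 4, 4, 3, 4, 4]
reviewer_2 = [4, 4, 4, 4, 4, 4, 4, 4]
average = (ccat_percent(reviewer_1) + ccat_percent(reviewer_2)) / 2
print(quality_band(average))
```

Under these assumed bands, the reported mean rating of 78% falls in band 4 and the lowest reported rating of 45% in band 3.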

Data collection process
Data were extracted from the papers using the CCAT form (https://conchra.com.au/wpcontent/uploads/2015/12/CCAT-form-v1.4.pdf) by two reviewers per paper and compiled by the first author (LT). Any data missing from forms was extracted by LT. Once compiled, instances of overlap between papers (i.e., if the same sample was described in two papers) were identified and tabulated, and the final sample for each question defined.

Number of studies included, retained and excluded
The PRISMA diagram in Fig 1 provides details of the screening and exclusion process. The searches returned 8655 results, reduced to 6202 following de-duplication. Titles and abstracts were screened by one reviewer (LT): 4659 records were excluded after initial screening and a further 699 on second stage title / abstract screening. This left 553 eligible for full text screening. An initial screening (LT) of full texts reduced the number of records to 155. Forty-eight papers were included in the final dataset, of which 38 included data for the present paper. Full characteristics of included studies are provided in Table 2.
The largest numbers of samples came from the Netherlands (n = 10) and the USA (n = 10), followed by the UK (n = 7), Canada (n = 5), Belgium (n = 2), Finland (n = 2), Germany (n = 1), Israel (n = 1), Australia (n = 1), Italy (n = 1), Switzerland (n = 1) and Turkey (n = 1) (note two papers together described six samples, hence the total is 42). The Netherlands data all pertained to the same centre and research group. All seven of the UK samples came from the same Gender Identity Development Service (GIDS: Tavistock & Portman NHS Trust) in London, and three of the Canadian papers came from the same Transgender Youth Clinic in Toronto. Accordingly, not all 42 samples are necessarily mutually exclusive. Overlapping samples were not always acknowledged, and so where overlap may have occurred (based on location, setting, age and date variables) this has been noted and has been taken into account in any analysis. Fig 2 provides a graphical representation of overlap between samples and indicates which papers contributed data to which analyses. Based on the reported information, in total we estimate between 3000 and 4000 adolescents assessed at specialist centres for GD between 1980 and 2019 were included in the 38 papers.
Most studies were observational cohort studies, usually using retrospective record review (n = 26). A few studies included comparison to a normative sample or given population norms (n = 6). All but one paper was published within the past ten years (2011 or later) and all but seven in the past five years (2016 or later). Only five papers explicitly included data from before 2000 (a further six may have included pre-2000 data but did not report dates). All papers included both NM and NF participants, all studies reported the proportion of NM and NF participants in their sample, and most included age data (with age at assessment being the most widely reported) (see Table 2 and Fig 2).
Twenty-four samples were reported to have met clinical diagnostic criteria for GD / GID, usually using one of the DSM manuals (5/28 did not state which criteria were applied). The remaining 18 samples did not report whether participants met diagnostic criteria, but were included on the basis of being established patients within a specialist treatment centre, either in active assessment or treatment (n = 14) or were the result of secondary data mining where ICD 9/10 codes and appropriate keywords were used to establish likely GD (n = 4).
A substantial group of papers narrowly missed inclusion criteria, mostly on the age criterion and some on the verified GD criterion, and were not included in the final sample of reviewed papers. We documented characteristics of all studies excluded at the final full text screen in Table 5.

Overall findings based on included studies
1. What is the prevalence of GD in adolescence?
It is not possible to address this question from the existing literature. Whilst a number of surveys exist that would allow one to make estimates of prevalence, none were conducted using whole population samples. Further, we chose to focus this review only on papers where GD had been clinically verified as we were interested in adolescents seeking and considered eligible for intervention. Given that Amsterdam and London centres are the only specialist centres in their respective countries, both of which have state-funded health systems, it would not be unreasonable to use their data as a likely indication of incidence / prevalence. The figures reported in de Graaf et al. [20] give the largest and most recent sample from these 2 locations (252 and 610 respectively), but these are sub-samples of the clinic population for whom data were available, and they do not comment on prevalence as a proportion of the population. It is possible to say there has been an increase in adolescents presenting for treatment in recent years. For example, Chen et al. [21] report the majority of their sample (73.6%) presented for treatment in the final two years of a 13 year period (2002-2015).

PLOS GLOBAL PUBLIC HEALTH
Epidemiology of adolescent gender dysphoria: Systematic review

2. What are the proportions of natal males / females with GD in adolescence and has this changed over time?
All included studies featured data on participants' natal sex, usually at the time of first being assessed by a specialist gender clinic. A simple pooling of proportions from all the papers indicates 36% were natal males and 64% natal females. Restricting our analysis only to those studies we could be certain had distinct samples (and aiming to select the largest / widest date range within, see Fig 2), the proportions remained similar at 37% natal male and 63% natal female.
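As a rough illustration of the pooling described above, the sketch below sums natal-sex counts across samples and then repeats the calculation on samples flagged as distinct. It reflects one reading of 'simple pooling' (summing counts rather than averaging per-paper percentages), which the text does not specify. The function name, the `distinct` flag and all counts are hypothetical and are not taken from the included papers.

```python
def pooled_proportions(samples):
    """Pool natal-sex counts across samples.

    samples: iterable of (natal_male_count, natal_female_count) pairs.
    Returns (proportion_natal_male, proportion_natal_female).
    """
    samples = list(samples)
    total_nm = sum(nm for nm, _ in samples)
    total_nf = sum(nf for _, nf in samples)
    total = total_nm + total_nf
    if total == 0:
        raise ValueError("no participants to pool")
    return total_nm / total, total_nf / total


# Hypothetical records: counts are invented for illustration only.
records = [
    {"paper": "A", "nm": 30, "nf": 70, "distinct": True},
    {"paper": "B", "nm": 10, "nf": 40, "distinct": True},
    {"paper": "C", "nm": 20, "nf": 30, "distinct": False},  # overlaps with A
]

# Pooling across every sample, regardless of overlap.
all_nm, all_nf = pooled_proportions((r["nm"], r["nf"]) for r in records)

# Sensitivity check restricted to samples known to be distinct.
distinct_nm, distinct_nf = pooled_proportions(
    (r["nm"], r["nf"]) for r in records if r["distinct"]
)

print(f"all samples: {all_nm:.0%} NM, {all_nf:.0%} NF")
print(f"distinct only: {distinct_nm:.0%} NM, {distinct_nf:.0%} NF")
```

Running both calculations side by side mirrors the comparison in the text, where restricting to distinct samples changed the pooled proportions only slightly.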
One paper addressed the question of a recent shift in natal sex ratio directly. Chiniara et al. [22] conducted a within-sample analysis of their 2014-2016 referred participants in Toronto and found no change in that short time period. They also compared the natal sex ratio in their sample to those previously published and found a shift in more recent years (1:3 favouring natal females vs 0.8-0.9:1 in earlier studies).
Although only a few papers in our sample addressed the question directly, we used pooled data to explore whether there is evidence of a shift in recent years to more natal boys or girls seeking assessment / treatment. Papers were grouped into three categories according to the date range that samples were assessed: pre-2000; 2001-2010; 2011 onward. This was challenging as most studies were retrospective chart reviews covering wide date ranges from the late 1980s to beyond 2010. However, it is possible to say that the proportion of natal males is slightly lower (30%) in those papers featuring participants assessed only from 2011 onwards (ten papers). These data are summarised in Table 2.

3. What is the pattern of age at (a) onset (b) referral and (c) assessment?
Only one of the included papers focused specifically on age of onset: Matthews et al. [23] reported a mean age of onset of 6.80 years (SD 3.9) (range 1-15) among 168 referrals to the London GIDS. Six papers reported explicitly on age of referral (Costa et [20]). One paper (Becerra-Culqui et al., 2018 [28]) used medical records and took participants' age from 'the first evidence of transgender and/or gender nonconforming status' based on the presence of certain keywords in medical notes. Some papers (e.g., Chen et al., 2016 [21]) were not explicit about whether they were reporting age at referral or assessment, but usually it could be inferred that the reported age was at assessment.
Age was most usually reported at point of assessment or intervention (see Table 3). Not all papers reported full age data: a mean and standard deviation was usually given, but not always a range. The pooled mean age of assessment was 15.1 years (SD 1.0) and the range (from fewer papers) was 6.0-18.0 years.
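The text does not state how the pooled mean was calculated, so the sketch below shows two common options under that caveat: a sample-size-weighted mean of the study means, and an unweighted mean and standard deviation of the study means. The function names and the example values are hypothetical and are not drawn from Table 3.

```python
import statistics


def weighted_mean_age(study_means):
    """Sample-size-weighted mean of per-study mean ages.

    study_means: list of (mean_age_years, sample_size) pairs.
    """
    total_n = sum(n for _, n in study_means)
    if total_n <= 0:
        raise ValueError("total sample size must be positive")
    return sum(mean * n for mean, n in study_means) / total_n


def unweighted_summary(study_means):
    """Unweighted mean and SD of the per-study mean ages (needs >= 2 studies)."""
    means = [mean for mean, _ in study_means]
    return statistics.mean(means), statistics.stdev(means)


# Hypothetical per-study values: (mean age at assessment, sample size).
studies = [(14.0, 100), (16.0, 300)]

print(weighted_mean_age(studies))
print(unweighted_summary(studies))
```

The two approaches diverge when large and small studies report different mean ages, which is one reason explicit reporting of the pooling method would help readers interpret a summary figure.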

Quality assessment
The CCAT quality ratings ranged from 45% to 96%, with a mean of 78%. Most papers achieved an overall rating of 4 (good) or 5 (very good), with strengths and weaknesses within certain discrete categories; most papers achieved good ratings in the 'preliminaries' and 'introduction' categories, whereas the 'ethics' and 'discussion' categories were most likely to include lower ratings: 17 and 16 papers respectively achieved ratings below 4. In total, only one paper was rated as 3 (moderate quality): Cohen-Kettenis & Van Goozen (2002) obtained low ratings across most categories, due to unclear sampling and diagnostic information, lack of information to permit replication, and conclusions which are not supported by the findings. Of the remainder, 16 were rated as high quality (4), and 21 as very high quality (5; see Table 4). There was no relationship between the year of publication and quality rating (r = 0.2).

Discussion
This systematic review synthesises the current evidence regarding the age and natal sex of adolescents presenting to specialist services and assessed as having gender dysphoria (GD). Based on 38 papers meeting inclusion criteria, there is evidence of an increase in frequency of presentation to services since 2011, and of a shift in the natal sex of referred cases: those assigned female at birth are now in the majority. Within these samples the average age of referral was 13 years, and the average age of assessment was 15 years. This review is the first of its kind to focus on adolescent samples where diagnostic criteria for GD were met, or significant GD features were clinically verified.
Although other good quality review papers have been published [12,17,29-31], they have tended not to apply a systematic review methodology or have taken a broader scope in their inclusion criteria. We believe this is the first systematic review focused only on adolescents aged under 18 years and on clinically-verified samples taking into account likely study overlap.
Due to a lack of population-based research including cases of clinically-verified GD, this review was unable to report overall prevalence of GD (although it would not be unreasonable to use the Amsterdam and London data to make a good estimate). At present, the only means of estimating prevalence is to use population-based survey data, which carries risk of respondent bias (and such papers were excluded from our sample). Some studies used administrative records to ascertain samples of adolescents with GD. The reliability of this method is dependent on GD being accurately recorded, and on administrative data systems having universal coverage. These criteria could not be met by the samples included in the present review and should remain a focus for development in future research. This review confirmed that the increase in referrals and the shift in sex-ratio that has been observed more widely in survey data and referred populations is also present in clinically-verified samples. We were able to report data from relatively few papers, however. Given the size of the literature, it would be useful if more studies clearly reported or clearly differentiated samples according to the stage of identification / referral / assessment participants had reached. It is clear that many of the samples reported in this review began as much larger samples with significant attrition before completing assessment and / or intervention. Not all papers fully reported either attrition figures or reasons for drop-out. Most papers were cross-sectional descriptions of cohorts of clinic patients with no longer term follow-up or comparison with control or normative samples, which limits the external generalisability of the findings.
We were unable to report on age of onset of GD as this was rarely reported. Where it was reported, it was based on patient / parent recall and difficult to verify clinically (e.g., Matthews et al. [23] report a lower age of onset of 1 year). Age of referral also proved difficult to report on as most samples that could be classed as clinically-verified were further along the line in their clinical journey. Whilst some papers reported age at referral, it was not always clear whether 'referral' meant the age at which a young person first sought help and was referred to a specialist service, or the age at which they had their first contact with that service. Given that most treatment takes place at national specialist centres, waiting lists may mean that age of first specialist contact may be two or three years beyond age at first referral. This will vary according to the economic basis of national health systems.
Age at assessment data reported in this review suggests that young people may not be receiving specialist support when they need it. The Dutch model, developed by the Center of Expertise on GD in the Netherlands where GD interventions have been pioneered, considers age 12 years to be the lower threshold for puberty suppression treatment, and age 16 as the threshold for cross-sex hormone treatment [6]. Participants in our samples had an average age of 15 years at the time of assessment, and so are likely to have already undergone significant pubertal changes associated with their natal sex. It may be that young people are first presenting to services once pubertal changes have begun and GD has become established, but delays between first presentation to services and being seen in a specialist service may also be important. It is possible that critical windows of opportunity for intervention are being missed. Our third review paper will focus on the data regarding age at intervention and related outcomes in more detail. More transparent reporting of age data and consistent use of terminology would allow us to better understand the clinical landscape and could inform rational service development.

Strengths and limitations
This review has strength in the broad search strategy and thorough hand screening process applied. There are methodological limitations which need to be considered. The application of strict inclusion criteria in a rapidly growing field, with new findings emerging on an almost daily basis, means it is impossible to be completely up to date. For example, Zucker and Aitken (2019) [32] recently confirmed the shift in sex ratio of transgender adolescents in a large meta-analysis, but this was not included as it is, at present, a conference proceeding and not a peer-reviewed paper. The broad initial search criteria led to the need for some narrowing of criteria following initial screening (but prior to full-text screening). The addition of parameters regarding type of publication, upper age of participants, and the clinical verification of GD naturally narrowed the pool of papers and therefore may have meant papers with important findings have been excluded (for example, if a paper included an upper age limit of 21 even though the majority were younger than 18). We recorded all papers that only narrowly missed inclusion on the age criterion; one such paper [33] had an upper age range of 18.03, requiring its exclusion. These were otherwise good papers that we would like to have been able to include. However, it was important that we consistently apply our a priori criteria at every stage of screening, even when it meant that important papers only very narrowly missing inclusion had to be excluded. To apply flexibility only at the final screen presented a risk to the integrity of screening at earlier stages. We do not wish to give the impression that the importance of these papers has not been considered. There are several non-systematic reviews available that will include these high-profile papers. Our objective was to take in the totality of the literature and then examine the state of the evidence once strict criteria were applied.
Due to the scale of the overall review, no formal hand searching was included, although we did check whether any relevant texts cited in our review of papers had been included in our sample, and they had (either excluded at an earlier stage or included in the final sample). We opted to use a quality assessment tool for studies of diverse designs (CCAT). This allowed all papers to be rated using the same system, but also involved reviewers having to make subjective ratings rather than apply a strictly quantifiable checklist. This may have led to issues with quality, such as over-statement of the significance of findings, not being sufficiently prominent. The quality of the literature is mixed, and we were unable to clearly answer all the research questions. Although this presents a limitation to this review, it also constitutes an important finding in and of itself.
Although we were able to include 38 papers from a range of countries in this review, almost half arose from two well-established treatment centres: those in Amsterdam and London. The Amsterdam team has led the way in developing assessment and treatment protocols for GD and provides a wealth of data over a long period (since 1996 within the included papers), and the London GIDS is a hub for the whole of the UK now dealing with hundreds of referrals per year. This presents the advantage of being able to observe the adolescent GD population over a long period of time, assessed using the same or similar tools, and within a relatively stable social context. It is not clear, however, what proportion of young people experiencing GD have access to these national specialist centres and how many may be accessing private facilities or self-medicating with hormones obtained via other routes: we do not know how representative these samples are. Another disadvantage is that most of the papers included in this review are likely to include data from the same samples of participants, also limiting generalisability. The overlap between samples was rarely overtly stated, and there is a risk that readers may add greater weight to collective findings than is warranted. The samples included here may also represent those most severely affected not only by GD but by poor mental wellbeing more generally. The second paper in this series will focus specifically on the evidence regarding psychological distress / psychiatric comorbidity in young people with GD.

Conclusion
GD is an area of growing prominence and therefore generates a growing literature. The observed increase in referrals, particularly in NF adolescents, warrants further investigation. Whilst improvements in availability of services and diagnostic practices are likely to have contributed to this, there has undoubtedly been a shift in cultural attitudes leading to gender nonconformity being more acceptable. The role of de-stigmatisation in the experiences of young people with GD and their decisions to seek support should be explored within the context of mental wellbeing more broadly. It is clear that this is a particularly vulnerable population often presenting as psychologically complex cases: without good epidemiological data we cannot begin to elucidate the lived reality of GD and ensure that intervention / support is equitable, appropriate and timely, and minimises harm. This review has been limited by heterogeneity in recording and reporting practices, and by limited representation beyond national, publicly funded clinical services. Clinical research centres should gather data prospectively on all referrals with full informed consent and document their assessment protocols, treatments and outcomes. Whole population studies using administrative datasets reporting on GD / gender non-conformity may be necessary to gain a clear understanding of the epidemiology of clinical GD, along with inter-disciplinary research evaluating the lived experience of adolescents with GD.