Low-frequency variation near common germline susceptibility loci is associated with risk of Ewing sarcoma

Background Ewing sarcoma (EwS) is a rare, aggressive solid tumor of childhood, adolescence, and young adulthood associated with pathognomonic EWSR1-ETS fusion oncoproteins that alter transcriptional regulation. Genome-wide association studies (GWAS) have identified 6 common germline susceptibility loci but, owing to the limited number of genotyped cases of this rare tumor, have not investigated low-frequency inherited variants with minor allele frequencies below 5%. Methods We investigated the contribution of rare and low-frequency variation to EwS susceptibility in the largest EwS genome-wide association study to date (733 EwS cases and 1,346 unaffected controls of European ancestry). Results We identified two low-frequency variants on chromosome 20, rs112837127 and rs2296730, that were associated with EwS risk (OR = 0.186 and 2.038, respectively; P-value < 5 × 10⁻⁸) and located near previously reported common susceptibility loci. After adjusting for the most strongly associated common variant at the locus, only rs112837127 remained a statistically significant independent signal (OR = 0.200, P-value = 5.84 × 10⁻⁸). Conclusions These findings suggest that rare variation residing on common haplotypes is an important contributor to EwS risk. Impact Our results motivate future targeted sequencing studies to comprehensively evaluate low-frequency and rare variation around common EwS susceptibility loci.

Background Ewing sarcoma (EwS) is a rare bone or soft tissue tumor that predominantly occurs in the second decade of life [1]. The cell of origin of EwS is unknown; current evidence indicates that EwS likely arises from mesoderm- or neural crest-derived mesenchymal stem cells [2,3]. The overall age-adjusted incidence of EwS is 0.128 per 100,000 population, and individuals of European ancestry are at approximately 9-fold higher risk than African Americans and Asians/Pacific Islanders (0.155 per 100,000 in White individuals versus 0.017 in Asians/Pacific Islanders and 0.017 in African Americans) [4]. This disparity in incidence by ancestry suggests that germline susceptibility contributes importantly to EwS risk. A defining feature of EwS tumors is a somatically acquired translocation between EWSR1 (22q12) and a member of the ETS transcription factor family, most commonly FLI1 (11q24) (85% of cases) [5–7]. The resulting fusion oncoproteins are aberrant, potent transcriptional regulators that bind GGAA microsatellites and ETS-like motifs, converting them into strong enhancers, and promote cellular transformation by deregulating key target genes in cell cycle control, migration, and apoptosis pathways [7–12]. Aside from recurrent EWSR1-ETS fusions, most EwS tumors display remarkably low somatic mutation rates [1,13–16].
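The approximately 9-fold difference follows from the ancestry-specific incidence rates quoted above (per 100,000 population). Because the rates for Asians/Pacific Islanders and African Americans are identical, a single ratio covers both comparisons:

```latex
\mathrm{RR}_{\text{White vs. Asian/Pacific Islander}}
  = \mathrm{RR}_{\text{White vs. African American}}
  = \frac{0.155}{0.017} \approx 9.1
```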

PLOS ONE
The EWSR1-ETS fusion provides a molecularly distinct phenotype for genomic characterization, despite small case sample sizes. Previous genome-wide association studies (GWAS) have identified 6 common genetic susceptibility loci associated with EwS risk (1p36.22, 6p25.1, 10q21, 15q15, 20p11.22, and 20p11.23) [17]. The number of identified loci is notable given the small sample sizes and suggests that a homogeneous phenotype, as defined by the fusion oncoprotein, may aid in identifying germline associations. Variants at these loci have elevated odds ratios (OR > 1.7), which are high for cancer GWAS and striking given that EwS rarely occurs in familial cancer predisposition syndromes [18]. Most EwS susceptibility loci reside near GGAA microsatellites, where germline variation may disrupt local binding of EWSR1-ETS fusion oncoproteins, suggesting that germline-somatic interactions could be important for EwS susceptibility. As a proof of concept, such a germline-somatic interaction has been demonstrated for EGR2, the EwS susceptibility gene at the chr10 locus [11].
Despite recent efforts to characterize the genetic architecture of EwS, no study has yet investigated the contribution of low-frequency variants (minor allele frequency (MAF) < 0.05) to EwS risk. The high locus-to-case discovery ratio of previous EwS GWAS and the large effect sizes of common EwS susceptibility loci led us to ask whether current EwS case series would be sufficient to detect associations between rare or low-frequency variants and EwS risk. We systematically scanned the genome for well-imputed, low-frequency variants associated with EwS susceptibility in the largest collection of genotyped EwS cases to date (733 EwS cases and 1,346 controls) [17].

Study populations
The study population for the current association analysis has been described previously [17]. In brief, EwS cases were obtained from five sources: a study published by Postel-Vinay et al. [19], the Institut Curie, the Childhood Cancer Survivor Study (CCSS), the Center for Cancer Research (CCR) at the National Cancer Institute (NCI), and the NCI Bone Disease and Injury Study [20]. The ancestry of these EwS cases was estimated with SNPWEIGHTS using SNPs suitable for inferring population structure [21]. EwS cases with less than 80% European ancestry were excluded, resulting in a combined set of 733 EwS cases. A total of 1,346 principal-component-matched, cancer-free controls were selected for the final analysis from the NCI Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial [22], the American Cancer Society Cancer Prevention Study II [23], and the Spanish Bladder Cancer Study [24], and were combined with the controls previously used by Postel-Vinay et al. [19]. Each study participant provided informed consent, and approval to conduct this research was granted by the Institutional Review Boards of Institut Curie and the National Cancer Institute, as well as of the 26 participating CCSS institutions.

Genotyping and quality control
For the Postel-Vinay study, DNA from tumor tissue, blood, and bone marrow was isolated using proteinase K lysis followed by phenol-chloroform extraction. Genomic DNA was [...]. All genotyping was performed according to standard manufacturer protocols. In brief, whole-genome amplification (WGA) was performed on 400 ng of DNA, and the amplified DNA was fragmented, precipitated, resuspended, and hybridized to the designated arrays. Probes were then extended by a single base, using the captured DNA as template, with fluorophore-conjugated nucleotides. Arrays were scanned on the iScan system (Illumina), and SNPs were called with GenomeStudio (Illumina). Downstream quality control excluded samples with abnormal heterozygosity rates, sex discordance, completion rates below 95%, or unexpected relatedness (IBD > 10%).
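The sample-level exclusions described above can be sketched as a simple filter. This is a minimal illustration, not the pipeline actually used: the `Sample` record, the `sample_qc` function, and the input format are hypothetical. The 95% completion and 10% IBD cut-offs come from the text and the 80% European-ancestry cut-off from the study population description, whereas the ±3 SD heterozygosity rule and the rule for choosing which member of a related pair to drop are assumptions, since the text does not define them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Sample:
    """Hypothetical per-sample QC summary."""
    sample_id: str
    call_rate: float          # fraction of SNPs successfully called
    heterozygosity: float     # observed heterozygosity rate
    reported_sex: str         # sex recorded by the study
    genetic_sex: str          # sex inferred from X/Y genotypes
    european_ancestry: float  # estimated European ancestry proportion


def sample_qc(samples, ibd_pairs, min_call_rate=0.95, max_ibd=0.10,
              het_sd=3.0, min_european=0.80):
    """Return IDs of samples passing QC, in input order.

    ibd_pairs: iterable of (id_a, id_b, ibd_proportion) tuples.
    """
    het_values = [s.heterozygosity for s in samples]
    het_mean, het_sd_obs = mean(het_values), pstdev(het_values)

    passed = []
    for s in samples:
        if s.call_rate < min_call_rate:
            continue  # completion rate below 95%
        if abs(s.heterozygosity - het_mean) > het_sd * het_sd_obs:
            continue  # heterozygosity outlier (assumed +/- 3 SD rule)
        if s.reported_sex != s.genetic_sex:
            continue  # sex discordance
        if s.european_ancestry < min_european:
            continue  # less than 80% European ancestry
        passed.append(s.sample_id)

    # For each unexpectedly related pair still present, drop the member with
    # the lower call rate (an assumed tie-breaking rule).
    keep = set(passed)
    by_id = {s.sample_id: s for s in samples}
    for id_a, id_b, ibd in ibd_pairs:
        if ibd > max_ibd and id_a in keep and id_b in keep:
            drop = id_a if by_id[id_a].call_rate < by_id[id_b].call_rate else id_b
            keep.discard(drop)
    return [sid for sid in passed if sid in keep]
```

With real data, heterozygosity rates would typically be computed by a genotype QC tool and passed in; the sketch only encodes the decision rules.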

Genotype imputation was performed separately in three sets: (1) the Postel-Vinay study, (2) the CCSS EwS cases and matched controls, and (3) all remaining NCI and Institut Curie samples. All samples were pre-phased with SHAPEIT [25] and imputed with IMPUTE2 [26] using the 1000 Genomes Project Phase 3 reference panel [27], yielding 16,367,531 SNPs, of which 10,216,839 were low-frequency variants (MAF < 0.05).
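To make the MAF < 0.05 cut-off concrete, the sketch below computes a minor allele frequency from imputed allele dosages (expected alternate-allele counts between 0 and 2 per sample) and flags low-frequency variants. It assumes dosages have already been derived from the imputation output; the function names and the dictionary input are hypothetical, and a real pipeline would also filter on imputation quality.

```python
def minor_allele_frequency(dosages):
    """MAF from per-sample alternate-allele dosages (each between 0 and 2)."""
    if not dosages:
        raise ValueError("no samples")
    alt_freq = sum(dosages) / (2 * len(dosages))
    # The minor allele may be either the reference or the alternate allele.
    return min(alt_freq, 1.0 - alt_freq)


def low_frequency_variants(dosages_by_variant, threshold=0.05):
    """Return variant IDs whose MAF falls below the threshold."""
    return [
        variant
        for variant, dosages in dosages_by_variant.items()
        if minor_allele_frequency(dosages) < threshold
    ]
```

Taking the minimum of the two allele frequencies matters because the rarer allele at a site can be the reference allele.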

PCR validation of genotypes
Imputed genotypes for the three EwS-associated low-frequency or rare variants (rs78119607, rs112837127, and rs2296730) were validated by allele-specific TaqMan assays (Thermo Fisher Scientific) at CGR following standard manufacturer protocols. The 325 samples used for validation were selected based on imputed genotype, study, and the amount of available DNA.

Statistical analysis
For each variant, we report an estimate of the odds ratio (OR), its 95% confidence interval (CI), and a P-value (p_MH) from a Mantel-Haenszel test in which subjects are stratified by study (e.g., CCSS, Postel-Vinay) and, where stated, by genotype at linked neighboring variant(s). Because we focused on less common variants, we used a dominant model (i.e., genotype defined as presence versus absence of the rare allele) and an exact conditional test (mantelhaen.test(exact = TRUE)) [28,29]. We used p_MH < 5 × 10⁻⁸ to define genome-wide significance in the initial scan and p_MH < 0.05/1,684 ≈ 2.97 × 10⁻⁵ for conditional tests, where 1,684 is the number of SNPs with MAF < 0.05 and R² > 0.004 with any of the 6 previously identified SNPs. Potential interactions between low-frequency and common SNPs were examined using logistic regression models with case-control status as the outcome and the low-frequency SNP, the common SNP, and their interaction term as predictors. All statistical tests were two-sided and were performed in R v3.6.2 [28]. We did not investigate associations between the significant variants and clinical characteristics because limited clinical data were available for the participating EwS cases.
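The stratified, dominant-model analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code: it computes the classical Mantel-Haenszel pooled odds ratio, whereas R's `mantelhaen.test(exact = TRUE)` reports the conditional maximum-likelihood estimate of the common odds ratio together with an exact conditional P-value, so the two estimates are usually close but not identical. The record format and function names are hypothetical. The last function shows the Bonferroni arithmetic for the conditional tests (0.05 divided by the 1,684 candidate SNPs).

```python
from collections import defaultdict


def is_carrier(minor_allele_count):
    """Dominant coding: carrier of at least one minor allele."""
    return minor_allele_count >= 1


def mantel_haenszel_or(records):
    """Classical Mantel-Haenszel pooled odds ratio under a dominant model.

    records: iterable of (stratum, is_case, minor_allele_count) tuples. The
    stratum can be a study label or, for a conditional analysis, a tuple such
    as (study, genotype_at_common_variant).
    """
    # Per stratum: [case carriers, case non-carriers,
    #               control carriers, control non-carriers]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, is_case, count in records:
        row = 0 if is_case else 2
        col = 0 if is_carrier(count) else 1
        tables[stratum][row + col] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator  # raises ZeroDivisionError if undefined


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level for a Bonferroni correction."""
    return alpha / n_tests


threshold = bonferroni_threshold(0.05, 1684)  # about 2.97e-05
```

Keying strata by study alone corresponds to the unconditional scan; adding the neighboring common variant's genotype to the key gives a conditional analysis in the spirit of the one described above.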

Results
Our analysis identified three putative low-frequency (MAF < 0.05) imputed variants associated with EwS risk, which we advanced to the validation studies described below. The variants were located at 1q23.3, 20p11.23, and 20p11.22 (Table 1, Fig 1, and S1 Fig) and were tagged by rs78119607, rs112837127, and rs2296730, respectively. The MAF among controls of European ancestry ranged from 0.001 for rs78119607 to 0.046 for rs2296730, with minor-allele odds ratios ranging from 0.18 to 16.64 (Table 2). The odds ratio for the minor A allele of rs112837127 (OR = 0.18) suggested a potentially protective effect, indicating that low-frequency variation can in some instances reduce susceptibility to EwS.
To validate the imputed genotypes of the three associated low-frequency and rare variants, we first examined the imputation quality scores (S1 Table) and allele distributions (S2 Table) across the three study populations and did not observe significant heterogeneity among them. To further confirm the findings, we designed allele-specific TaqMan assays for the three variants and genotyped a subset of 325 samples from the EwS GWAS with remaining DNA available. As shown in S2 Fig, the imputed genotypes for rs112837127 and rs2296730 were replicated with concordance rates of 98.46% and 100%, respectively. The imputed genotype for rs78119607 did not replicate, as the TaqMan assay called no minor alleles, suggesting poor imputation of this variant with the 1000 Genomes Project reference panel despite imputation quality scores above 0.43 (S1 Table).
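A concordance rate such as those reported above is the percentage of assayed samples whose imputed genotype matches the assay call. The sketch below assumes, hypothetically, that imputed genotypes are summarized as the most probable of the three genotype probabilities (a "best-guess" call) and that both sources are keyed by sample ID; the text does not state how imputed genotypes were converted for comparison.

```python
def best_guess_genotype(probs):
    """Most probable genotype (0, 1 or 2 minor alleles) from imputed probabilities."""
    return max(range(3), key=lambda g: probs[g])


def concordance_rate(imputed_probs, assay_calls):
    """Percentage of samples whose best-guess imputed genotype matches the assay.

    imputed_probs: sample ID -> (P(0), P(1), P(2)); assay_calls: sample ID ->
    genotype called by the assay. Samples missing from either source are skipped.
    """
    shared = imputed_probs.keys() & assay_calls.keys()
    if not shared:
        raise ValueError("no samples genotyped by both methods")
    matches = sum(
        best_guess_genotype(imputed_probs[s]) == assay_calls[s] for s in shared
    )
    return 100.0 * matches / len(shared)
```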
The two validated low-frequency variants on chromosome 20, rs112837127 and rs2296730, lie near two previously identified common EwS susceptibility variants, rs6106336 and rs6047482. We tested the low-frequency variants for linkage disequilibrium (LD) with the common variants in 1000 Genomes Project European populations using the LDmatrix tool in LDlink (Fig 2) [30,31]. rs112837127 showed no evidence of LD with either nearby common variant (R² (EUR) = 0.005 with rs6106336; R² (EUR) = 0.023 with rs6047482) or with the other low-frequency variant (R² (EUR) = 0.003 with rs2296730). In contrast, rs2296730 showed moderate LD with the common variant rs6106336 (R² (EUR) = 0.311) but not with rs6047482 (R² (EUR) = 0.006). Estimates of D′, a normalized measure of LD that indicates whether alleles are co-inherited on the same haplotypes, suggest that both low-frequency variants (rs112837127 and rs2296730) are transmitted on haplotypes of the common rs6106336 variant (S3 Fig): the minor A allele of rs112837127 is transmitted with the major T allele of rs6106336 (D′ (EUR) = 1.0), and the minor G allele of rs2296730 is transmitted with the minor G allele of rs6106336 (D′ (EUR) = 0.772). To test whether the two low-frequency variants tag independent EwS association signals, we calculated odds ratios and P-values for association with EwS with and without conditioning on the neighboring common variants. In conditional analyses, rs112837127 remained associated with EwS independently of the neighboring common variants (OR = 0.20, 95% CI = 0.09-0.40, P-value = 5.84 × 10⁻⁸; Table 2). Consistent with the R² analysis, the association of rs2296730 was attenuated after conditioning on rs6106336, with a lower odds ratio and a larger P-value (OR = 1.61, 95% CI = 1.16-2.24, P-value = 3.50 × 10⁻³; Table 2), reflecting its correlation with the common variant.
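The contrast between D′ = 1.0 and R² near zero for rs112837127 follows from how the two statistics are defined. The sketch below computes D, D′ (as |D|/Dmax) and r² from haplotype and allele frequencies for two biallelic SNPs. The example frequencies are hypothetical, chosen only to mimic a rare allele that occurs exclusively on one haplotype background at a common SNP; they are not estimates from this study, and LDlink's own calculations may differ in detail.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic SNPs.

    p_a: frequency of allele A at the first SNP
    p_b: frequency of allele B at the second SNP
    p_ab: frequency of haplotypes carrying both A and B
    """
    d = p_ab - p_a * p_b  # deviation from independent assortment
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0  # |D| / D_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies: a 2% minor allele found only on haplotypes carrying
# an allele with 60% frequency at a nearby common SNP.
d, d_prime, r2 = ld_stats(p_ab=0.02, p_a=0.02, p_b=0.6)
print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r2:.4f}")
# prints: D = 0.0080, D' = 1.00, r^2 = 0.0136
```

Because r² scales with the product of the allele frequencies, a rare allele can sit entirely on one haplotype background (D′ = 1) while remaining almost uncorrelated with the common SNP, which is why conditioning on the common variant barely changes its association.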
Finally, we tested for SNP-SNP interactions between rs2296730 and rs6106336 (p = 0.568), rs2296730 and rs6047482 (p = 0.319), and rs112837127 and rs6106336 (p = 0.538) and found no significant evidence of interaction.
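The interaction models can be illustrated in their simplest form. The text does not state how the SNPs were coded or whether covariates were included, so this sketch assumes, hypothetically, that both SNPs are coded as binary carrier indicators and that no covariates are used. Under that assumption the logistic model with two binary predictors and their product is saturated, so the interaction coefficient equals the log of the ratio of the carrier odds ratios and has a closed-form Wald standard error; the function name and input format are hypothetical.

```python
import math


def interaction_test(counts):
    """Wald test of the G1 x G2 interaction in a saturated logistic model.

    counts[(g1, g2)] = (n_cases, n_controls) for binary carrier indicators
    g1, g2 in {0, 1}. With two binary predictors plus their product, the model
    fits each cell's log-odds exactly, so the interaction coefficient is
    logit(1,1) - logit(1,0) - logit(0,1) + logit(0,0).
    """
    def log_odds(g1, g2):
        cases, controls = counts[(g1, g2)]
        return math.log(cases / controls)

    beta = log_odds(1, 1) - log_odds(1, 0) - log_odds(0, 1) + log_odds(0, 0)
    # Variance of a log ratio of odds ratios: sum of reciprocal cell counts.
    se = math.sqrt(sum(1 / cases + 1 / controls for cases, controls in counts.values()))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return beta, se, p
```

With additive coding or study covariates the model is no longer saturated and would need an iterative fit (for example with a standard logistic regression routine); the closed form above applies only to the binary, covariate-free case.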

Discussion
We report an analysis of low-frequency variants, well imputed from common genotyped variants, in a large EwS case series to investigate the contribution of low-frequency variants to the genetic architecture of EwS susceptibility. We found evidence for associations of two low-frequency variants (rs112837127 and rs2296730) with EwS risk, and one of them, rs112837127, was associated independently of a nearby common germline susceptibility variant. Our findings suggest that, in addition to common germline susceptibility variants, low-frequency variants contribute to genetic susceptibility to EwS. Germline variants associated with lower cancer risk are less commonly reported but not unheard of. Previously, three SNPs located near base excision repair genes were found to be inversely associated with Wilms tumor risk [32]. SNPs in the vitamin D receptor gene have also been linked to decreased prostate cancer risk in African American men [33], and rs1866074 near the thymine DNA glycosylase gene was reported to be associated with lower colorectal cancer risk [34]. The minor allele of rs112837127 is most prevalent in British and Finnish populations, in which its frequency can exceed 5%, and it has not been observed in African or East Asian populations [35]. This SNP is located in a long terminal repeat region 2.7 kb upstream of the non-coding RNA LINC00237, which has been found to drive self-renewal of tumor-initiating cells by binding and stabilizing β-catenin [36]. Interestingly, activation of the Wnt/β-catenin pathway has been shown to antagonize the transcriptional activity of EWS-ETS fusion proteins in Ewing sarcoma cells [37].
Whether the minor allele of rs112837127 tags a haplotype with altered LINC00237 expression remains to be investigated. As EwS is a rare sarcoma of young people, it is not unexpected that low-frequency variation contributes to EwS susceptibility. Although EwS may be an exceptional case of a rare, well-defined malignancy with high associated odds ratios, our study suggests that examining low-frequency and rare germline associations in existing rare cancer case series could be fruitful despite limited sample sizes. Additionally, our study provides an example in which common germline susceptibility loci discovered by GWAS may harbor synthetic associations with rare and low-frequency variants [28]. Such synthetic associations may be particularly important for EwS susceptibility, as it is plausible that common, low-frequency, and rare variation at GGAA microsatellites interacts to affect the binding of EWSR1-FLI1 fusion oncoproteins and alter the regulation of downstream genes in core EwS regulatory pathways. In EwS, common variant associations may therefore highlight germline susceptibility regions in which low-frequency and rare variation plays an important role in altering risk. A limitation of our study is the lack of validation in an independent cohort, as well as the lack of targeted sequencing of the associated regions to identify potential causal variants that could be examined functionally in vitro. Another limitation is the limited availability of clinical and demographic data, which prevented us from describing possible associations between the identified variants and clinical characteristics. As EwS is a rare tumor, few large case series exist for genomic investigation, and larger study populations will be essential to confirm these new associations.
As future germline association studies investigate the genetic architecture of EwS, systematic interrogation of low-frequency variant associations with a range of sequencing and statistical methods will be essential to accelerate our understanding of EwS susceptibility.