Risk of Ovarian Cancer and Inherited Variants in Relapse-Associated Genes

Background: We previously identified a panel of genes associated with outcome of ovarian cancer. The purpose of the current study was to assess whether variants in these genes correlated with ovarian cancer risk. Methods and Findings: Women with and without invasive ovarian cancer (749 cases, 1,041 controls) were genotyped at 136 single nucleotide polymorphisms (SNPs) within 13 candidate genes. Risk was estimated for each SNP and for overall variation within each gene. At the gene level, variation within MSL1 (male-specific lethal-1 homolog) was associated with risk of serous cancer (p = 0.03); haplotypes within PRPF31 (PRP31 pre-mRNA processing factor 31 homolog) were associated with risk of invasive disease (p = 0.03). MSL1 rs7211770 was associated with decreased risk of serous disease (OR 0.81, 95% CI 0.66–0.98; p = 0.03). SNPs in MFSD7, BTN3A3, ZNF200, PTPRS, and CC2D1A were inversely associated with risk (p<0.05), and there was increased risk at HEXIM1 rs1053578 (p = 0.04, OR 1.40, 95% CI 1.02–1.91). Conclusions: Tumor studies can reveal novel genes worthy of follow-up for cancer susceptibility. Here, we found that inherited markers in the gene encoding MSL1, part of a complex that modifies histone H4, may decrease risk of invasive serous ovarian cancer.


Introduction
Worldwide, there are approximately 125,000 deaths each year due to ovarian cancer [1]; increased understanding of factors related to its outcome and etiology should reduce the burden of this disease. We previously reported results of tumor mRNA expression studies which suggested that altered expression of a particular set of genes predicted response to chemotherapy among women with advanced-stage high-grade epithelial ovarian cancer [2]. These genes included SF3A3, MFSD7 (formerly known as FLJ22269), ID4, BTN3A3, OSGIN2 (formerly known as C8orf1), FARP1, PRKCH, C15orf15, ZNF200, MSL1 (formerly known as LOC339287), HEXIM1 (formerly known as HIS1), PTPRS, CC2D1A (formerly known as FLJ20241), and PRPF31. Expression levels differed among tumors from women with differing outcomes; namely, in combination, expression of these genes predicted early relapse (<21 months) after optimal surgery and platinum-paclitaxel chemotherapy with an accuracy of 86% and positive predictive value of 95% [3,2].
The etiology of ovarian cancer is known to be complex and, at least in part, includes inherited susceptibility factors. Mutations in BRCA1, BRCA2, MLH1, and MSH2 account for approximately 50% of familial ovarian cancer [4,5], and remaining cases with a family history are likely due to combinations of multiple alleles conferring low- to moderate-penetrance susceptibility [6,7] such as variants in BNC2 [8] and, possibly, TP53 [9], CDKN2A [10], CDKN1B [10], and AURKA [11]. As a complement to genome-wide searches, a useful approach for the identification of additional low-risk alleles is the study of highly informative inherited variants in candidate genes identified from tumor expression studies. To assess whether variation in genes with differing expression levels by outcome influenced risk of ovarian cancer, we conducted a case-control analysis of inherited variants in genes in the predictive model mentioned above [2] as well as in HTRA1 (encoding the serine protease HtrA1), which we have shown is down-regulated in a majority of ovarian tumors [12,13] and has a key role in apoptosis [13]. As the first examination of germline variation in this novel set of genes (Table 1), we aimed to more broadly elucidate their role in epithelial ovarian carcinogenic processes.

Results
Demographic, reproductive, lifestyle, and tumor characteristics of 749 epithelial invasive ovarian cancer patients and 1,041 controls are described in Table 2. Generally, the expected distributions of risk factors were observed; a greater proportion of patients than controls had never used oral contraceptives (p<0.001), had used hormone therapy (p<0.001), were nulliparous (p = 0.01), and had a first or second degree family history of ovarian cancer (p<0.001). Such factors associated with risk of ovarian cancer were included as covariates in all genetic analyses.
Gene-level and selected SNP-level association-testing results are shown in Table 3 and Table 4, respectively. Associations for the full set of SNPs examined are displayed in Table S1. Of genes examined in relation to risk of invasive ovarian cancer and risk of invasive serous ovarian cancer, only global variation in the MSL1 and PRPF31 genes was associated at p<0.05. MSL1 gene-level principal components (summarized combinations of genotypes based on two SNPs) were associated with risk of invasive serous disease (p = 0.03). At one of the two SNPs in this gene, rs7211440 (r² = 0.53 with rs17678694; Figure S1), carriage of the minor allele was associated with reduced risk of both invasive and invasive serous disease (OR 0.85, 95% CI 0.72-1.00, p = 0.05; OR 0.81, 95% CI 0.66-0.98, p = 0.03, respectively, Table 4), suggesting that this SNP is the primary driver of MSL1's gene-level serous association.
Outside of the two genes associated at the gene level (MSL1 and PRPF31), a small number of SNPs in six other genes were associated at the single-SNP level (p<0.05, Table 4), including three SNPs associated with reduced risk of both invasive and invasive serous disease (in CC2D1A, MFSD7, and ZNF200). The strongest SNP-level association was observed at the nonsynonymous rs2305777 (T801M) in the NF-κB activating gene CC2D1A (invasive OR 0.84, 95% CI 0.72-0.99, p = 0.03; serous invasive OR 0.76, 95% CI 0.62-0.99, p = 0.005). Based on sequence conservation across species [14], this SNP is predicted to be relatively undamaging to protein function. Because this SNP was not correlated with other genotyped SNPs (r² < 0.12, Figure S1) and did not tag other common HapMap SNPs at r² > 0.9, sequencing may be required to clarify the meaning of this SNP association. Similarly, MFSD7 rs6840253 was associated with a reduction of ovarian cancer risk (invasive OR 0.81, 95% CI 0.66-1.0, p = 0.05; serous invasive OR 0.76, 95% CI 0.58-0.98, p = 0.03), as was ZNF200 rs186493 (invasive OR 0.83, 95% CI 0.71-0.82, p = 0.02; invasive serous OR 0.82, 95% CI 0.68-0.98, p = 0.03). Both are promoter region SNPs (within 5 kb 5′ upstream) and independent of other HapMap SNPs at r² > 0.9. Because each is correlated at r² > 0.6 with other genotyped SNPs, additional genotyping of modestly correlated SNPs could help elucidate these associations. Three SNPs were associated only with reduced risk of invasive serous disease (in PTPRS and BTN3A3), and one SNP was associated with increased risk of invasive serous disease (in HEXIM1). Thus, while results were generally similar in both case groups, the often greater statistical significance of results among women with serous disease, despite reduced power due to a 40% smaller sample size, suggests heterogeneity by histologic subtype.

Discussion
Knowledge about the genetics of ovarian cancer is expanding rapidly. Because ovarian cancer is the most lethal gynecologic cancer, discovery of inherited factors related to its etiology and outcome may assist in the development of important targeted prediction and therapeutic strategies. Recent work has clarified the roles of long-standing candidate SNPs in the progesterone receptor, retinoblastoma, p53, and cell cycle genes [10,15,11,9] and enabled genome-wide association studies [8]. Analysis of highly informative variants in selective sets of novel genes complements these candidate SNP and genome-wide association studies, providing improved coverage in high-priority regions based on known tumor biology [16]. Here, we selected novel candidate genes based on prior evidence of their association with time to relapse of ovarian cancer, and we chose comprehensive sets of variants [2,13]. Our primary result is that variation in MSL1 was related to risk of serous invasive ovarian cancer; notably, minor alleles at rs7211440 correlated with decreased risk (OR 0.81, 95% CI 0.66-0.98, p = 0.03). Previously, increased expression of MSL1 was correlated with earlier time to relapse [2]; additional validation of the prognostic model and replication of the etiologic association are warranted.
MSL1 encodes one of five proteins that form the highly conserved MSL complex, which has enzymatic capabilities as a histone acetyltransferase (HAT) [17]. HATs modify a variety of histone domains through acetylation, which, along with other coactivators, regulates histone and chromatin activation and influences gene expression [18]. The MSL complex specifically acetylates lysine residue 16 on histone H4 (H4-Lys16), which plays a crucial role in regulating chromatin folding and silencing of gene expression [19,20,17]. Knock-out models indicate that absence of the MSL complex leads to malfunctions during the S phase of the cell cycle, resulting in errors in DNA replication [17]. Additionally, loss of monoacetylation of H4-Lys16 and aberrant functioning of H4 are hallmarks of cancer cells. Our data suggest that inherited variation in MSL1 may impact risk of invasive serous ovarian cancer and are consistent with findings that irregular H4 modifications may cause errors in chromatin folding and gene expression and are widespread in cancer phenotypes [21]. With suggestive SNPs in genes for p53 and CDKN2A [10,9], which also regulate histone modification, evidence for a role of inherited risk factors related to histones is accumulating.
Strengths of this work include the use of two case-control study populations (from Mayo Clinic and Duke University), advanced SNP selection techniques (e.g., high level of required correlation among alleles, inclusion of putative-functional SNPs, and selection of multiple tagSNPs in large LD bins), and excellent genotyping quality. Our assessment of risk for serous invasive disease suggested a degree of genetic heterogeneity by histologic subtype; however, we suggest caution in interpretation of results (particularly single-SNP results in the absence of gene-level significance) due to the relatively large number of tests performed. We also note that no results are statistically significant after adjustment for multiple testing using a conservative Bonferroni correction. Thus, replication of our results is warranted to confirm these associations. Avenues for future research extending from this expression-based candidate gene work include analysis of other histone regulatory genes and detailed assessment of functional mechanisms. More immediately, examination of promising SNPs within MSL1 among a larger set of serous invasive patients and controls and additional fine-scale mapping [16] will assist in clarification of the importance of these genes to ovarian cancer susceptibility. This should, in turn, inform translation of such findings to the clinical management of women at increased risk for ovarian cancer.

Study Participants
Subjects participated in two ongoing case-control studies of epithelial ovarian cancer initiated in January 2000 at Mayo Clinic (Rochester, MN) and in May 1999 at Duke University (Durham, NC). Details of the study design have been described elsewhere [22][23][24]. Briefly, a total of 749 women with histologically confirmed invasive epithelial ovarian cancer and 1,041 controls without ovarian cancer and without bilateral oophorectomy were recruited from the two study sites (Table 2; site-specific characteristics provided in Table S3). At Mayo Clinic, ovarian cancer cases (patients) were over 20 years of age with histologically confirmed incident epithelial ovarian cancer and enrolled in the study within one year after diagnosis. All cases seen in the gynecologic or medical oncology units who lived in the six-state region that defines the primary service population of Mayo Clinic (Minnesota, Iowa, Wisconsin, Illinois, North Dakota, and South Dakota) were invited to participate. Controls were recruited from among women seen for general medical examinations and frequency-matched to patients on age and region of residence (i.e., state, county). At Duke University, patients were women with histologically confirmed primary epithelial ovarian cancer, between 20 and 74 years of age, and identified within a 48-county region using the North Carolina Central Cancer Registry.
Controls were identified using list-assisted random digit dialing and frequency-matched to patients on race, age, and county of residence. No exclusions based on ethnicity were made. Participants provided written informed consent, and protocols were approved by the Mayo Clinic and Duke University Institutional Review Boards.

Data and Biospecimen Collection
Information on potential risk factors was collected through in-person interviews at both sites using similar questionnaires. DNA was extracted from 10 to 15 mL fresh venous blood using the Gentra AutoPure LS Puregene salting out methodology (Gentra, Minneapolis, MN). DNA from Duke University participants was transferred to Mayo Clinic for whole-genome amplification (WGA) with the REPLI-g protocol (Qiagen Inc, Valencia CA), which we have shown yields highly reproducible results [25]. DNA concentrations were adjusted to 50 ng/µl prior to genotyping and verified using the PicoGreen dsDNA Quantitation kit (Molecular Probes, Inc., Eugene OR). Samples were bar-coded to ensure accurate and reliable processing.

SNP Selection
We identified tagSNPs within five kb of each candidate gene using the algorithm of ldSelect [26] to bin pair-wise correlated SNPs at r² ≥ 0.90 with minor allele frequency (MAF) ≥ 0.05 in the HapMap CEU population (Utah residents with ancestry from northern and western Europe) [27]. HapMap data were used because, in November 2007, they were more informative for these genes than data from Perlegen Sciences [28], Seattle SNPs (http://pga.mbt.washington.edu), and NIEHS SNPs (http://www.egp.gs.washington.edu). One tagSNP per bin was selected if fewer than ten SNPs were in an LD bin, and two tagSNPs per bin were selected in LD bins with ten or more SNPs. Among tagSNPs, SNPs were chosen to maximize Illumina SNP score (a measure of predicted genotyping success) and then MAF. FARP1 (307 kb) and PRKCH (229 kb) required over 100 tagSNPs each and were excluded from the study for cost-efficiency. For the remaining genes (Table 1), 117 tagSNPs and 19 putative-functional SNPs (within 10 kb upstream, 5′ UTR, 3′ UTR, or non-synonymous from Ensembl version 34 with European-American MAF ≥ 0.05 and Illumina SNP score ≥ 0.6) were selected. Thus, a total of 136 SNPs in these 13 candidate genes were genotyped.
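The binning step described above can be illustrated with a short sketch. This is a simplified greedy procedure in the spirit of ldSelect, not the ldSelect program itself; the function names, the dictionary layout for pairwise r² values, and the assumption that SNPs have already been filtered to MAF ≥ 0.05 are illustrative choices. The subsequent ranking of tag candidates by Illumina SNP score and MAF is not shown.

```python
def greedy_ld_bins(snps, r2, threshold=0.90):
    """Greedy binning of SNPs by pairwise r^2 (simplified, ldSelect-style sketch).

    snps: list of SNP identifiers (assumed already filtered on MAF).
    r2:   dict mapping frozenset({a, b}) -> pairwise r^2; missing pairs count as 0.
    Returns a list of (bin_members, tag_candidates) tuples.
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = list(snps)
    bins = []
    while remaining:
        # Seed the bin with the SNP linked to the most unbinned SNPs.
        seed = max(remaining, key=lambda s: sum(linked(s, t) for t in remaining))
        members = [t for t in remaining if linked(seed, t)]
        # A tag candidate must exceed the threshold with every member of its bin.
        tags = [s for s in members if all(linked(s, t) for t in members)]
        bins.append((members, tags))
        remaining = [t for t in remaining if t not in members]
    return bins


def tags_per_bin(bin_size):
    """Rule stated in the text: one tagSNP for bins of <10 SNPs, two otherwise."""
    return 1 if bin_size < 10 else 2
```

In this sketch a SNP with no partner above the threshold forms a singleton bin and tags itself, which is why large genes with many weakly correlated SNPs can require many tagSNPs.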

Genotyping
As part of a larger study, genotyping of 2,176 DNA samples (897 genomic and 1,279 WGA, including 129 duplicates) from 2,047 unique study participants was performed at Mayo Clinic along with 65 laboratory controls. We used the Illumina GoldenGate BeadArray assay and BeadStudio software for automated clustering and calling according to a standard protocol [29]. Of 2,047 participants genotyped, 44 samples (2.1%) failed (call rate <90%), and 213 participants (10.4%) were found to be ineligible or have borderline disease and were excluded; thus 1,790 participants (including 749 patients with invasive disease and 1,041 controls) were analyzed here. A total of 1,152 SNPs for a variety of projects were attempted; 25 failed SNPs included 15 (1.3%) with call rate <90%, nine (0.8%) with poor clustering, and one (<0.1%) with unresolved replicate or Mendelian errors in genomic DNA. We assessed departures from Hardy-Weinberg equilibrium (HWE) in self-reported white, non-Hispanic controls with a Pearson goodness-of-fit test or, in the case of SNPs with a MAF <5%, a Fisher exact test, and we excluded SNPs with MAF <0.01 (N = 64, 5.6%) or HWE p-value <0.0001 (N = 11, 1.0%), leaving 1,052 SNPs for analysis. For WGA DNA, an additional 20 SNPs (1.7%) were excluded due to one or more of the above criteria. Among the 13 candidate genes studied, 134 of 136 SNPs were successfully genotyped in genomic samples, and all but six of these were also genotyped successfully in WGA samples (Table S4).
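As an illustration of the SNP-level quality control described above, the following minimal sketch computes a Pearson goodness-of-fit test for HWE from control genotype counts and applies the stated MAF and HWE exclusion thresholds. The function names are hypothetical, and the sketch uses the Pearson test for every SNP; it does not implement the Fisher exact test that the study used for SNPs with MAF <5%.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Arguments are genotype counts among controls. Returns (chi_square, p_value).
    Assumes both alleles are observed; a monomorphic SNP would divide by zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def passes_snp_qc(n_aa, n_ab, n_bb, min_maf=0.01, min_hwe_p=1e-4):
    """Apply the exclusion thresholds quoted in the text (MAF <0.01, HWE p <0.0001)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    maf = min(freq_a, 1.0 - freq_a)
    if maf < min_maf:
        return False
    _, p_value = hwe_chi_square(n_aa, n_ab, n_bb)
    return p_value >= min_hwe_p
```

The MAF check runs first so that monomorphic or near-monomorphic SNPs are rejected before the chi-square statistic is computed.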

Statistical Methods
Distributions of demographic and clinical variables were compared between patients and controls using chi-square tests or t-tests, and estimates of pair-wise LD between SNPs were obtained using Haploview software, version 4.1 [30]. Genetic association analyses (described below) were adjusted for study site, age, body mass index, hormone therapy, oral contraceptive use, number of live births, age at first live birth, and population structure principal components, which accounted for the possibility of population stratification using an approach similar to that described previously [31]. Population structure principal components were created using 2,517 SNPs from this and prior genotyping panels [23]; scatter plot matrices by self-reported race indicated that the first four population structure principal components reasonably approximated racial differences across individuals and were thus included as covariates in all models (Figure S2).
Associations with ovarian cancer risk were assessed using logistic regression of SNPs, gene-level principal components, and gene-level haplotypes. For SNPs, odds ratios (OR) and 95% confidence intervals (CI) were estimated separately for heterozygous and homozygous minor allele genotypes, using the homozygous major allele genotype as the referent group. We included eight SNPs with HWE p-value <0.05 (Table S4) because of acceptable genotype cluster plots, the large number of tests (Bonferroni-corrected p-value ≤ 3.7×10⁻⁴), and no assumption of HWE for single-SNP analysis. Formal genotypic tests of association were carried out assuming an ordinal (log-additive) effect using simple tests for trend. Within each gene, we used a principal component analysis to create orthogonal linear combinations of the SNP minor allele count variables (including genotypes imputed using the MACH software package [32]) to provide an alternate and equivalent representation of the collection of SNPs as a whole. The resulting smallest subset of gene-level principal components that accounted for at least 90% of the SNP variability was included in regression models, and gene-specific associations were evaluated using a multiple degree of freedom likelihood ratio test. Gene-centric haplotype-based association analyses were conducted using posterior probabilities of all possible haplotypes for an individual (excluding SNPs with HWE p-value <0.05), conditional on the observed genotypes. The expectation-maximization algorithm was used to estimate haplotypes [33] and create haplotype design variables ranging from 0 to 2. Because of the imprecision involved in low-frequency haplotypes, we excluded haplotypes with an estimated frequency of less than ten. Assessments of risk among common haplotypes tested the simultaneous effects of all haplotypes combined in logistic regression; individual haplotype associations used the most common haplotype as reference.
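To make the single-SNP estimates concrete, the sketch below computes genotype-specific odds ratios with Wald 95% confidence intervals from raw genotype counts, using the homozygous major allele genotype as the referent, and a Bonferroni threshold. It is an unadjusted illustration only: the study estimates came from covariate-adjusted logistic regression, and the function names and the choice of 136 tests as the Bonferroni denominator are assumptions made here (0.05/136 ≈ 3.7×10⁻⁴, which agrees with the threshold quoted above).

```python
import math


def odds_ratio_wald(case_exp, control_exp, case_ref, control_ref, z=1.96):
    """Unadjusted odds ratio and Wald CI from a 2x2 table (all counts > 0)."""
    odds_ratio = (case_exp * control_ref) / (control_exp * case_ref)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / case_exp + 1 / control_exp + 1 / case_ref + 1 / control_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def genotype_odds_ratios(cases, controls):
    """cases/controls: counts ordered (major homozygote, heterozygote, minor homozygote)."""
    return {
        "het": odds_ratio_wald(cases[1], controls[1], cases[0], controls[0]),
        "hom_minor": odds_ratio_wald(cases[2], controls[2], cases[0], controls[0]),
    }


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests
```

Because the interval is symmetric on the log scale, a reported OR should always lie between its lower and upper confidence limits.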
All statistical tests were two-sided, and unless otherwise indicated, were carried out using SAS software (SAS Institute, Inc., Cary, NC).
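The rule used for gene-level principal components in the Statistical Methods (retain the smallest subset accounting for at least 90% of SNP variability) can be sketched as follows. The function name is hypothetical, and the eigenvalues are assumed to come from a principal component analysis of the minor allele count variables performed elsewhere; the study itself used SAS, not this code.

```python
def components_for_variance(eigenvalues, min_fraction=0.90):
    """Smallest number of leading components whose eigenvalues reach min_fraction
    of the total variance (eigenvalues need not be pre-sorted)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= min_fraction:
            return k
    return len(ordered)
```

The number of components returned is the degrees of freedom of the corresponding gene-level likelihood ratio test.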