Associations between Gene Expression Variations and Ovarian Cancer Risk Alleles Identified from Genome Wide Association Studies

Functional genetic variations play important roles in shaping phenotypic differences among individuals through affecting gene expression, and thus, very likely to influence disease susceptibility, such as cancer susceptibility. One critical question in this era of post-genome wide association studies (GWAS) is how to assess the functional significance of the genetic variations identified from GWAS. In the current study, with lymphoblastoid cell lines (LCLs) from 74 non-related women with familial ovarian cancer and 47 unrelated controls matched on gender and race, we explored the associations between seven ovarian cancer risk variants identified from GWAS (rs3814113 on 9p22.2, rs2072590 on 2q31, rs2665390 on 3q25, rs10088218, rs1516982, rs10098821 on 8q24.21, and rs2363956 on 19p13) and whole genome mRNA expression profiles. We observed 95 significant trans-associations at a permutation level of 0.001. Compared to the other risk variants, rs10088218, rs1516982, and rs10098821 on 8q24.21 had the greatest number of significant associations (25, 16, and 38, respectively). Two possible cis-associations were observed between rs10098821 and c-Myc, and rs2072590 and HS.565379 (Permutated P = 0.0198 and 0.0399, respectively). Pathway enrichment analysis showed that several key biological pathways, such as cell cycle (P = 2.59×10−06), etc, were significantly overrepresented. Further characterization of significant associations between mRNAs and risk alleles might facilitate understanding the functions of GWAS discovered risk alleles in the genetic etiology of ovarian cancer.


Introduction
Recently, genome wide association studies (GWAS) have successfully identified a number of genetic variations which confer risk to human cancer [1][2][3]. However, most of the risk variants identified from GWAS reside in intergenic, intronic, and other non-coding regions of the genome [4]. Therefore, the observed associations have yet to be translated into a full understanding of the genes and genetic elements mediating disease susceptibility. How to study the functional significance of these GWAS hits poses a big challenge in this post-GWAS era. One of the options might be the investigation of the genetics of gene expression. Several landmark studies have unequivocally shown that many transcripts in the human genome are influenced by inherited variation [5][6][7][8][9]. Functional genetic variation, which leads to gene expression changes, may play a critical role in determining phenotypic differences among individuals, and thus, is very likely to influence disease susceptibility. As such, studying the associations between genetic variation and gene expression could potentially help prioritize fine-mapping efforts and provide a shortcut to disease biology.
Epithelial carcinoma of the ovary is one of the most common gynecologic malignancies in women [10]. Family history is the strongest risk factor for ovarian cancer. Compared to a 1.6% lifetime risk of developing ovarian cancer in the general population, women with one first-degree relative with ovarian cancer have a 5% risk. Familial clustering with an autosomal dominant pattern of inheritance (hereditary ovarian cancer) results from germ-line mutations in putative tumor suppressor genes (TSGs), such as the BRCA1/2 and MLH1/MSH2 genes [11][12][13][14]. However, known mutations in BRCA1/2 and mismatch repair (MMR) genes can only explain a small part of the familial aggregation of ovarian cancer (5-13%). This suggests that other genetic events may contribute to familial ovarian cancers. Several GWAS have been done in ovarian cancer and several risk variants have been identified, including rs3814113 on 9p22, rs2072590 on 2q31, rs2665390 on 3q25, rs10088218, rs1516982, rs10098821 on 8q24, and rs2363956 on 19p13 [1][2][3]. However, the functional significance of these risk variants is largely unknown. Thus, studying the associations between gene expression and ovarian cancer risk alleles identified from GWAS might help connect risk variants to their putative target genes/transcripts and biological pathways.
To study the associations between gene expression and ovarian cancer risk alleles, we obtained the whole genome mRNA expression profiles in 121 non-redundant lymphoblastoid cell lines (LCLs) derived from 74 non-related familial ovarian cancer patients who are non-carriers of known BRCA1/2 and MMR gene mutations, as well as 47 non-cancer unrelated family controls. We genotyped seven ovarian cancer risk variants discovered from GWAS in these 121 cell lines and studied their associations with gene expression variations. To our knowledge, this is the first genome-wide study to evaluate the associations between mRNA expression variations in LCLs of familial ovarian cancer cases and GWAS discovered ovarian cancer risk alleles [1][2][3].

Results
Lymphoblastoid cell lines were derived from the blood samples of 74 non-related women with familial ovarian cancer and 47 unrelated cancer-free controls recruited for the GRFOCR (see Methods). Gene expression profiles were generated using the Illumina human HT-12 v3 Expression BeadChips. We filtered the processed data to include genes with expression above the background in at least 25% of the samples (n = 121). A total of 10,435 mRNA genes were retained for further analysis.
For each sample, the seven variants identified from three ovarian cancer GWAS were genotyped using the StepOnePlus TM Real Time PCR system and Assays-on-Demand SNP Genotyping products (see Methods). We assessed the potential implications of these GWAS-discovered variants in ovarian cancer, by performing association analysis to analyze the correlations between mRNA expression variations and variant genotypes. Significant associations were identified by evaluating the relationships between variations of mRNA expression levels (with age and case-control status adjusted) and variant genotypes through 10,000 permutations. The number of significant associations at permutation level threshold of 0.05, 0.01 and 0.001 was summarized in Table 1. The list of selected top-ranked significant associations (permutated P # 0.001 and r 2 $0.095) is shown in Table 2. One of the most significant associations is observed between rs10098821 and IER3 gene (permutated P,0.0001). IER3, is a stress-inducible immediate early response gene, whose functions include cell proliferation and apoptosis regulation. It has been found that this gene is proapoptotic in the development of ovarian cancer [15]. rs10098821 explains about 13% of the variation in IER3's expression level as measured by adjusted r 2 .
Interestingly, the three variants from the 8q24 locus, namely rs10098821, rs10088218 and rs1516982, had the largest significant associations among all seven variants. At the 0.05 permutation threshold, the number of significant associations with these three variants was 959, 821 and 618. The number was 251, 194 and 139 at the more stringent permutation threshold of 0.01, and 38, 25 and 16 at the threshold of 0.001. These three variants share a number of significant mRNA gene expression associations. At the 0.05 permutation threshold, three hundred and twelve mRNAs, which account for 33% of the mRNA correlated with rs10098821, 38% of mRNA correlated with rs10088218, and 50% of mRNA correlated with rs1516982, are correlated with all three variants ( Figure S1). For example, levels of FANCE (Fanconi anemia, complementation group E) expression is significantly associated with rs1516982 (permutated P = 8.0610 24 , adjusted r 2 = 10.3%), rs10098821 (permutated P = 0.0037, adjusted r 2 = 7.0%) and rs10088218 (permutated P = 0.0312, adjusted r 2 = 3.4%), but none of the other four SNPs ( Figure S2).
We observed two possible cis-associations in which the variant genomic location is within 1 Mb around the probe targeting gene. One cis-association is between rs10098821 and c-Myc gene, which is 806 kb away from the variant (permutated P = 0.0198, Figure 1), and the other is between rs2072590 and HS.565379, which is 697 kb away from the variant (permutated P = 0.0399, not shown). rs10098821 explained approximately 4.0% of the variation in c-Myc expression as measured by adjusted r 2 . Individuals with T variant alleles have statistically significantly lower expression of c-Myc compared to ones without T variant alleles. rs2072590 explained about 4.4% of the variation in HS.565379 expression. HS.565379 has been found to show tissuespecific expression in uterus and uterine tumor based on ESTbased gene expression profiling [16].
Then, we investigated whether there are any significant associations between these seven variants and known ovarian cancer risk genes, including BRCA1/2, MMR genes, p53, etc. We didn't observe any significant association between these variants and the BRCA1/2 genes. However, we found several significant associations between the variants and the MMR genes and the p53 gene ( Figure S3). For example, we found the expression level of the MLH1 gene is significantly associated with rs2072590, a variant on the 2q31 loci (permutated P = 0.0049, Figure 2). rs2072590 explained about 8.3% of the variation in MLHL1's expression level. The expression of the p53 gene is significantly associated with rs2665390 (permutated P = 0.018, adjusted r 2 = 0.036), rs1516982 (permutated P = 0.028, adjusted r 2 = 0.035), and rs10088218 (permutated P = 0.049, adjusted r 2 = 0.025). Additionally, the expression of the MSH5 gene is significantly associated with rs2363956 (permutated P = 0.0056, adjusted r 2 = 0.075).

Discussion
The genetic etiology of familial ovarian cancer is still a mystery. Known mutations in BRCA1/2 and MMR genes can only explain a small part of the familial aggregation of ovarian cancer. The results from recent GWAS studies have identified several common genetic variants conferring risk for ovarian cancer [1][2][3]. However, most of these variants are not in protein-encoding regions, so the functional significance of these variants is largely unknown. The current study presents an attempt to dissect the genetic susceptibility of familial ovarian cancer, as well as elucidate the potential functional significance of the identified risk variants from GWAS. Specifically, we investigated the associations between seven significant variants identified from ovarian cancer GWAS and global mRNA expression.
As expected, we have observed a larger number of distant (trans-) than local (cis-) associations. Among the two identified cis-associations, the association between rs10098821 at 8q24 and c-Myc is particularly interesting. Common variants at 8q24 have previously been shown to confer susceptibility to multiple cancer phenotypes, including prostate, colorectal, breast and bladder cancers [18][19][20][21][22][23], and previous functional studies have suggested that common variants in this region may be associated with transcriptional regulation of c-Myc [24][25]. Most risk associations at 8q24 are located 59 of c-Myc, but the three most significant SNPs for ovarian cancer lie in an apparent gene desert which is .700 kb 39 of c-Myc, suggesting either that c-Myc might not be the target susceptibility gene for ovarian cancer or that variants in this region are also capable of distant regulation of c-Myc. In a previous study [2], Goode et al compared c-Myc expression in 48 normal ovarian epithelial cell lines between individuals without rs10098821 variant alleles and ones with at least one rs10098821variant alleles. Using GAPDH as the reference mRNA, they found that the ones without rs10098821 variant alleles had higher c-Myc expression than ones with at least one rs10098821 variant alleles (Median of relative expression: 0.97 vs 0.62). However, the difference didn't reach statistical significance (P = 0.43). Similar to their findings, we have observed that individuals without rs10098821 variant alleles had significantly higher levels of c-Myc expression compared to ones with at least one rs10098821 variant alleles (permutated P = 0.0198). As we have indicated above, rs10098821 is 39 of MYC and lies about 0.8 Mb away. How this SNP might affect MYC expression is still unclear. Using these identified significant associations in the pathway analysis, we have found that the genes significantly associated with GWAS discovered ovarian cancer risk alleles are enriched in several key biological pathways, such as cell cycle, cellular response to stress/damage, energy metabolism, transcriptional factor binding, etc. Interestingly, most known familial ovarian cancer genes (i.e., BRCA1/2 and MMR) are key players in these key pathways. For example, it has been demonstrated that BRCA1 is the key regulator in sensing DNA stress/damage and subsequently promoting cell cycle arrest [26]. Although our association analysis cannot pinpoint the exact functions of these GWAS discovered variants, it provides a list of potential biological pathways for which one could focus on in future analysis.
There are several limitations to this study. First, many mRNAs are expressed in a tissue-restricted manner. The results from LCLs in this study are likely to represent a small subset of mRNA expression variations. Also, our ability to study the genetics of mRNA expression is limited by the fact that we only investigated seven variants in the analysis, although these seven variants have been associated with ovarian cancer risk in recent GWASs. Second, the effects on transcript abundance may be subtle and therefore below the sensitivity threshold of the microarray platform, and the sample size in our study is relatively small. Third, there is a concern about what the results actually mean when measuring expression in non-tumor tissue at a single point in time. The ultimate goal of our study is to identify the inherited genetic determinants of mRNA expression in normal tissues rather than somatic alterations of mRNA gene expression in tumor tissues. Studies have been shown that at least part of the mRNA gene expression is genetically determined. Therefore, even at a single time-point in non-tumor tissue, what we have observed from this study still provides useful information about how mRNA expression is genetically regulated. Forth, certain effects may only be revealed in certain contexts, such as perturbation of a particular pathway, and may occur through changes in gene transcripts mediated by alterations in microRNAs or non-coding RNAs rather than through direct effects on genes. In these cases, alternative assays will be required to implicate these genes. Finally, the significant associations are not further functionally characterized since all of the top associations are trans-associations. So far, there is still lack of established experimental methods to assess transregulation between SNPs and gene expression.
To the best of our knowledge, this study provides the first assessment of the expression level variation of mature human mRNAs in LCLs from familial ovarian cancer patients and healthy unrelated controls. Further studies are needed to identify the genetic causes and biological consequences related to the identified significant associations. Significant associations identified in this study may potentially facilitate better understanding of the genetic etiology of familial ovarian cancer.

Study Population
This study has been approved by the Institutional Research Board (IRB) of Roswell Park Cancer Institute. Written informed consents have been obtained from all study subjects. Data and samples from women with ovarian cancer and their relatives who were cancer-free were obtained from the Gilda Radner Familial Ovarian Cancer Registry (GRFOCR). Seventy-four non-related women with familial ovarian cancer were included in this study as the cases. They were identified from families with inherited ovarian cancer in which at least two first or second degree relatives had epithelial ovarian cancer diagnosed at any age. All of the women were noncarriers of BRCA1/2 or MLH1/MSH2 mutations. Over time, different methods have been used to determine the mutation status of BRCA1/2 in GRFOCR samples. For samples collected before 2002, mutation status was determined by screening all exons and intron/exon splice junctions of BRCA 1/2 by a combination of SSCP and HD analysis. Additionally, exon 11 of BRCA1 was assayed by the protein truncation test for stop codon generating mutations. If alterations were found, the altered fragment was sequenced. Since 2002, sequencing of exons and splice junctions was used. In the last 5 years, all samples (old and new) not showing a mutation were assayed for BRCA1 large-scale rearrangements. The cancer-free controls of GRFOCR were family relatives of the cases, including mothers, sisters, nieces, etc. However, in this study, we chose to use unrelated controls. Unrelated controls are women who are not relatives of any cases used in this study. Forty-seven unrelated controls were included. The cases and controls were matched on gender and race. All of the cases and controls were white women. The median age at cancer diagnosis for the 74 cases was 47 (ranging from 21 to 85), while the median age for the 47 controls at enrollment in GRFOCR was 58 (ranging from 26 to 89). All study subjects donated blood samples when they were enrolled in the GRFOCR. LCLs were established by EBV transformation using the isolated lymphocytes from the blood samples. The study was approved by the institutional IRB board.

Lymphoblastoid Cell Lines (LCLs) Culture and RNA Extraction
LCLs were maintained in RPMI 1640 (GIBCO BRL) media supplemented with 15% fetal calf serum and antibiotics at 37uC, 5% CO 2 atmospheric condition and 95% humidity. Total cellular RNAs were isolated from LCLs using TRIzol reagent according to the protocols provided by the manufacturer (Invitrogen Corp., Carlsbad, CA, USA). Purified RNAs were further processed to remove any contaminating DNA (DNA-free kit, Ambion, Inc., Austin, TX, USA). The quality and quantity of the RNA was evaluated by 260/280 ratio using NanoDrop spectrophotometry (NanoDrop ND-1000 Technologies Inc.) and Agilent 2100 Bioanalyzer (Agilent Technologies).

Gene Expression Microarray
Two hundred nanograms of total RNA from each sample were labeled and hybridized on Illumina human HT-12 v3 Expression BeadChips according to the manufacturer's recommendations (Illumina Whole-Genome Gene Expression Guide). The expression profiles have been deposited in NCBI's Gene Expression Omnibus (GEO) with accession number GSE37582.

Statistical Analysis
The raw intensity of the Illumina human HT-12 v3expression array was scanned and extracted using BeadScan, with the data corrected by background subtraction in the GenomeStudio module. The lumi package in the R-based Bioconductor Package was used to normalize the log 2 transformed intensity data by using the Quantile normalization algorithm. For data quality control, we excluded the probes with detection P value.0.05 (the P values were generated in BeadStudio software) in at least 25% (n = 121) of the samples. A total of 10,435 mRNA genes passed the quality control step and were used for downstream analysis. The association of SNP genotype with residuals of expression level adjusted for age and casecontrol status was calculated using linear regression model as described before (27). Ten thousand permutations of the expression phenotypes relative to SNP genotypes were performed (28)(29). To derive P-values adjusted for multiple testing, we determined the percentage of times out of 10,000 permutations that the observed P-value was exceeded in the permuted data analysis.  between log 2 residuals of FANCE expression levels (adjusted for age and case-control status) and genotype of the rs1516982 (top, permutated P = 8.0610 24 , adjusted r 2 = 10.3%), rs10098821 (middle, permutated P = 0.0037, adjusted r 2 = 7.0%) and rs10088218 (bottom, permutated P = 0.0312, adjusted r 2 = 3.4%).