Polymorphisms in Stromal Genes and Susceptibility to Serous Epithelial Ovarian Cancer: A Report from the Ovarian Cancer Association Consortium

Alterations in stromal tissue components can inhibit or promote epithelial tumorigenesis. Decorin (DCN) and lumican (LUM) show reduced stromal expression in serous epithelial ovarian cancer (sEOC). We hypothesized that common variants in these genes associate with risk. Associations with sEOC among Caucasians were estimated with odds ratios (OR) among 397 cases and 920 controls in two U.S.-based studies (discovery set), 436 cases and 1,098 controls in Australia (replication set 1) and a consortium of 15 studies comprising 1,668 cases and 4,249 controls (replication set 2). The discovery set and replication set 1 (833 cases and 2,013 controls) showed statistically homogeneous (Pheterogeneity≥0.48) decreased risks of sEOC at four variants: DCN rs3138165, rs13312816 and rs516115, and LUM rs17018765 (OR = 0.6 to 0.9; Ptrend = 0.001 to 0.03). Results from replication set 2 were statistically homogeneous (Pheterogeneity≥0.13) and associated with increased risks at DCN rs3138165 and rs13312816, and LUM rs17018765: all ORs = 1.2; Ptrend≤0.02. The ORs at the four variants were statistically heterogeneous across all 18 studies (Pheterogeneity≤0.03), which precluded combining. In post-hoc analyses, interactions were observed between each variant and recruitment period (Pinteraction≤0.003), age at diagnosis (Pinteraction = 0.04), and year of diagnosis (Pinteraction = 0.05) in the five studies with available information (1,044 cases, 2,469 controls). We conclude that variants in DCN and LUM are not directly associated with sEOC, and that confirmation of possible effect modification of the variants by non-genetic factors is required.


Introduction
Ovarian cancer is the most lethal gynecologic cancer, with 21,650 new cases and 15,520 deaths in the U.S. in 2008 [1]. Most (>95%) ovarian cancers are epithelial in origin, affecting cells on the surface of the ovary [2], which are separated from the underlying ovarian stromal tissue by a basal lamina. The stroma is the supportive framework of biologic tissue consisting of an extracellular matrix (ECM) and soluble growth factors that mediate epithelial-stromal interactions and regulate intercellular communication [3]. Activation of oncogenes and inhibition of tumor suppressor genes in the epithelium were previously considered to be the only alterations required for the development of epithelial cancers [4]; however, alterations in stromal components that disrupt normal cell functions can lead to morphologic changes that manifest as tumors through perturbation of the epithelium [3]. For example, radiation-induced changes in the stromal microenvironment have been shown to contribute to neoplastic progression of initiated mammary epithelial cells in vivo [5], and may include processes that activate transforming growth factor-beta (TGF-β) and initiate ECM remodelling [6,7].
The ECM is composed of different proteins: decorin and lumican are members of the small leucine-rich proteoglycan family that bind to collagen in the stroma and are involved in matrix assembly and structure, and in the control of cell proliferation [8]. The expression of both decorin and lumican is altered in various cancers [9,10], including serous epithelial ovarian cancer [11][12][13][14]. Conceivably, factors that alter epithelial-stromal interactions or the cross-talk among growth factors like TGF-β may also influence expression and/or activity of decorin or lumican, or vice versa. Such factors may include inherited genetic susceptibility. This could be particularly germane to decorin, which binds to TGF-β and serves as a regulatory control for TGF-β release and activation [15].
In view of the important role of the stroma in epithelial cancers and the role of decorin and lumican in tumorigenesis, we tested the hypothesis that inherited variation in DCN and LUM may influence the risk of serous epithelial ovarian cancer in 18 independent study populations: a discovery set that included studies from Mayo Clinic (MAY) and the North Carolina Ovarian Cancer (NCO) study, replication set 1 from Australia (AUS), and replication set 2 comprised of 12 matched studies from the Ovarian Cancer Association Consortium (OCAC).

Results
The distributions of selected covariates between cases and controls in the discovery set and replication set 1 are listed in Table 1. Covariates were distributed similarly between the discovery set and replication set 1, including the proportion of serous carcinomas across tumor stage. The MAFs for the 10 tagSNPs in the discovery set ranged from 0.08 to 0.29 among controls and were similar in replication set 1 for those SNPs in common (Table S1).
In the discovery set, decreased risks were associated with serous epithelial ovarian cancer under both co-dominant and ordinal models at DCN rs3138165, DCN rs13312816, DCN rs516115 and LUM rs17018765 (all four SNPs: P trend = 0.06) (Table 2). Associations at all SNPs interrogated in the discovery set are in Table S2. No statistically significant associations were found in haplotype analyses (Table S3).
In replication set 1, decreased risk associations were found under both co-dominant and ordinal models at DCN rs3138165 (P trend = 0.009), DCN rs13312816 (P trend = 0.01) and LUM rs17018765 (P trend = 0.008) but not at DCN rs516115 (P trend = 0.20) or DCN rs741212 (P trend = 0.61) (Table 2). Imputed genotypes tended to assume similar risk associations as typed SNPs, likely from high LD among these variants (Figure S1). The squared correlation between imputed and true genotypes was 0.73 for DCN rs3138165, 0.73 for DCN rs13312816, and 0.68 for LUM rs17018765.
Associations at these four SNPs were tested further in the OCAC replication set 2, and DCN rs3138165 and rs13312816, and LUM rs17018765 showed statistically significant increased risks (Figure 1, Table S4). Within the OCAC replication set 2, the ORs were statistically homogeneous (P heterogeneity ≥0.13), but not when combined with the discovery set and replication set 1 (P heterogeneity = 0.001 to 0.03). Heterogeneous ORs were not due to errors in allele coding.
Information on age at diagnosis (or age at interview for controls) and on years of study recruitment was available for all studies and was used to test for SNP interactions in post-hoc analyses in an effort to explain the OR heterogeneity across studies (Table 3). For example, the interaction between DCN rs3138165 and age group was suggestive (P interaction with age = 0.04) and per-minor allele associations were highest among women aged <40 years (OR = 2.1; P = 0.01; 104 cases) and lowest among women aged ≥70 years (OR = 0.8; P = 0.24; 510 cases). Results were similar for the three other SNPs (P interaction with age = 0.07 to 0.08; data not shown). Associations at DCN rs3138165, stratified by period of recruitment, are shown in Figure 2A, and in Figures S2A, S3A and S4A for the three other SNPs. The per-minor allele summary OR was 1.3 (P = 0.01; 1,007 cases) for studies with a median year of recruitment before 2000, and 0.9 (P = 0.07; 1,494 cases) after 2000 (all SNPs: P interaction with period = 0.002 to 0.01). Because of the modifying effects of period of recruitment and potentially of age, we performed a sensitivity analysis by excluding those case-only studies that were not matched on age and year of recruitment to controls from other studies. As shown in Figure 2B for DCN rs3138165, and Figures S2B, S3B and S4B for the three other SNPs, the per-minor allele summary ORs were relatively unchanged for studies with a median year of recruitment before 2000 (OR = 1.3; P = 0.03; 612 cases), and after 2000 (OR = 0.8; P = 0.01; 1,340 cases) (all SNPs: P interaction with period = 0.002 to 0.003). However, there was a 20% change in the coefficient for the interaction term. Likewise, changes in the coefficient for the interaction term were 19% for DCN rs13312816, 38% for DCN rs516115 and 16% for LUM rs17018765, consistent with the definition of an important change-in-estimate effect [16]. This is particularly true for DCN rs516115, which showed no statistically significant association in the OCAC replication set 2.

Discussion
We could not confirm the associations of SNPs in two stromal genes, DCN and LUM, with the risk of serous epithelial ovarian cancer, the most common histological type of ovarian cancer, using a multi-stage replication approach within the OCAC. We found decreased risks at four SNPs in the discovery set and replication set 1, and increased risks in a larger sample of cases in the OCAC replication set 2. The heterogeneity in associations across studies was statistically significant and was explained, in part, by the period of case recruitment, with the four SNPs imparting up to a 30% increased risk for diagnoses before the year 2000 and up to a 20% decreased risk after the year 2000. Weaker interactions were seen with age at diagnosis and with diagnosis year in post-hoc analyses. Non-statistically significant modifying effects of OC use, parity and BMI were also observed.
Age-adjusted incidence rates of epithelial ovarian cancer have been trending lower in most of North America and Europe since the 1990s [17,18], and since our gene pool is not changing over such a short period, we speculated that our results reflect changes in the environment. As expected, there was no statistically significant association of diagnosis year with DCN rs3138165, although there were significant associations with several of the covariates and one of these, age at menarche, remained statistically significant with diagnosis year in the multivariable-adjusted model. Our findings may suggest that temporal changes in risk factors are modifiers of inherited susceptibility in DCN and LUM. However, we cannot exclude the role of unmeasured factors that are related to diagnosis year or to study site, or that our findings are due to chance. Several of the 15 studies in replication set 2 are new to OCAC, and epidemiological variables for subjects have not yet been submitted to the central Data Coordinating Center. We were, therefore, under-powered to evaluate gene-environment interactions and can only speculate that age, increasing obesity [19], changing trends in OC hormone preparation and use [20,21], or increasing age at pregnancy/decreasing parity [22] may be obvious candidates for future testing of temporal changes that may modify risk associations of these SNPs. Each of these hormonally-related factors is associated with ovarian cancer [23][24][25][26]. Studies examining the response of normal ovarian epithelium to hormonal factors reported that macaque primates receiving progestin alone had higher frequencies of apoptotic ovarian epithelial cells compared to control animals or those receiving estrogen alone [27]. Furthermore, the protective effect of parity may be from exposure to pregnancy hormones such as progesterone that have been speculated to clear the ovarian epithelium of precancerous cells [28].
In the macaque study, the increase in apoptotic cells correlated highly with a shift in expression from TGF-β1 isoform to TGF-β2 and -β3 isoforms in the ovarian surface epithelium [29]. TGF-β isoforms appear to be regulated by a variety of steroid hormones in a tissue-specific manner (reviewed in [29]). In contrast, ovarian carcinomas are frequently resistant to TGF-β-mediated growth inhibition [30,31] and express higher levels of TGF-β1 and TGF-β3, and lower levels of TGF-β2, than normal human ovarian specimens [30], the significance of which is unclear.
Decorin and lumican have multiple biological roles including control of cell proliferation [32]. Interestingly, decorin belongs to the family of secretory glycoproteins known as latent TGF-β-binding proteins (LTBPs) that sequesters the pro-hormone or latent form of TGF-β and prevents it from interacting with its signaling receptors, TβRI and TβRII [15,33]. LTBPs may facilitate the secretion, storage or activation of latent TGF-βs and serve as a reservoir for concentrated delivery of TGF-βs to receptors [15]. TGF-βs [33] and decorin [34] have been implicated as potent tumor suppressors; however, the diverse array of cellular processes regulated by TGF-βs seems to depend on the microenvironment: for example, promoting apoptosis and inhibiting epithelial growth in normal cells and promoting proliferation and angiogenesis in various cancer models [15,35]. The link between progesterone, TGF-βs and decorin is particularly intriguing within the context of our SNP-environment associations. However, this investigation was not designed to examine SNP-environment interactions and the power to detect significant effect modification with the available sample size was low.
We also compared our results to unpublished findings from two recent genome-wide association (GWA) studies of ovarian cancer, but there was no clear support for associations at the SNPs. For example, the four SNPs were not associated statistically with serous epithelial ovarian cancer (ORs = 0.93-0.96; P = 0.28-0.76). The discrepancies in results likely reflect similar challenges in interpretation as the OCAC results and underscore the importance of understanding the distribution of individual-level environmental exposures in genetic studies [37].
The strengths of this study include the multi-stage replication strategy, representing 2,501 total serous epithelial ovarian cancer cases. To reduce the impact of population stratification, our analyses were restricted to known or presumed Caucasians. Although one study (SRO) consisted of mostly Caucasians, our results were unchanged when this study was excluded in sensitivity analyses. The characteristics of the samples from the discovery set and replication set 1 were similarly distributed, as was the period of recruitment, thus reducing the impact of effect modification on the SNP-disease associations in these three studies. By restricting the samples to serous epithelial ovarian cancers, we reduced etiologic heterogeneity that may exist among different histological types of ovarian cancer [38]. Finally, we used statistical techniques to impute untyped SNPs as an efficient approach to include these SNPs in a combined analysis of samples from the discovery set and replication set 1.
The major limitation of this investigation is the absence of epidemiological information for most of the OCAC studies included in this report. Thus, our findings in the post-hoc analyses, while intriguing, require a tempered interpretation. Although MAFs of SNPs were generally similar across OCAC studies, occasionally a 1.5 to 2-fold difference was observed, which might suggest population structure influences on associations. Furthermore, we genotyped tagSNPs, which are likely only proxies for the putative causal SNP(s).
In summary, our multi-stage replication investigation suggests that SNPs in DCN and LUM are not associated with serous epithelial ovarian cancer. Verification of possible effect modification by age and other unconfirmed temporal effects is underway in an OCAC investigation of 10,000 cases and 10,000 controls.

Ethics statement
Participants in all the studies provided written informed consent and each site's institutional review board approved the study.

Discovery set
The discovery set consisted of a combination of two individual studies of epithelial ovarian cancer from MAY and NCO in the United States. Details of the study protocols have been published previously [39]. Briefly, participants included Caucasians and African-Americans enrolled between June 1999 and March 2006 (see Table S5 for detailed study descriptions). In both studies, cases were newly diagnosed, histologically-confirmed, either borderline or invasive, and enrolled within one year of diagnosis. Controls had at least one intact ovary, no history of ovarian cancer and were frequency matched to cases on age (5-yr age categories), race and state of residence.

Replication set 1
The Australian Ovarian Cancer Study recruited cases diagnosed between January 2002 and June 2006 from surgical treatment centers and cancer registries throughout Australia [40]; recruitment through the New South Wales and Victorian Cancer Registries was conducted under the Australian Cancer Study [41] (together, they form the AUS study). Controls were population-based and were randomly selected from the Australian electoral roll and frequency matched to cases on age and state of residence (Table S5).

Ovarian Cancer Association Consortium (OCAC) replication set 2
Fifteen studies from Belgium (BEL), Canada (OVA), Denmark (MAL, PVD), Finland (HOC), Germany (GER, HAN-HJO, HAN-HMO), Netherlands (NTH), the United Kingdom (SEA, SOC, SRO, UKO) and the United States (LAX, UCI), comprising 4,536 primary epithelial ovarian cancer cases and 4,622 controls for whom genotype data were available, were used as a second replication set (Table S5). Most were case-control studies, although some of these studies (PVD, SOC, SRO and LAX) consisted of cases-only and were matched by region within country to unique controls from other OCAC studies. Thus, SOC cases were matched to UKO controls, SRO cases to UKO and SEA controls, LAX cases to UCI controls and PVD cases to MAL controls, resulting in 12 matched studies for analysis.
Information on established risk factors (reproductive history, family history of cancer, medical history, and lifestyle habits) including diagnosis year was collected in the discovery set and replication set 1 and was available for two replication set 2 studies (GER, UCI).

SNP selection, genotyping and quality control (QC)
Discovery set. Tag single nucleotide polymorphisms (SNPs) were chosen from unrelated Caucasian samples within HapMap Consortium's release 22 [42] as previously described [43], and also for their predicted likelihood of successful genotyping using the Illumina GoldenGate™ assay. We identified six tagSNPs from among 22 DCN SNPs and seven tagSNPs from among 16 LUM SNPs with minor allele frequency (MAF) ≥0.05 and pairwise linkage disequilibrium (LD) of r² ≥0.8. One tagSNP in each gene was predicted to assay poorly (design score = 0), was a singleton in its bin and could not be replaced. This left five tagSNPs in DCN (rs3138165, rs516115, rs10492230, rs741212, and rs3138268) and six tagSNPs in LUM (rs17714469, rs1920790, rs2268578, rs10859110, rs10745553, and rs17018765), including five putative functional SNPs, for genotyping. The SNPs were located within, and 5 kb upstream and downstream of, each gene region. The two genes comprise a contiguous segment on chromosome 12 of approximately 80 kb.
The 11 tagSNPs were genotyped as part of a larger investigation of 1,152 SNPs in a variety of pathways using the Illumina GoldenGate™ assay and Illumina BeadStudio software [44]. Genotyping was attempted on 897 DNA samples from MAY and 1,279 samples from NCO (total = 2,176 including 129 duplicate samples) and 65 laboratory controls. Case status and duplicate samples were blinded to laboratory personnel who performed the genotyping. Of these samples, we excluded 44 with call rates <90% and Illumina QC (GenCall) scores <0.25, and 22 ineligible or mislabelled samples, resulting in 1,981 unique samples that were successfully genotyped. The sample call rate was 99.74% and the concordance for duplicate samples was 99.99%. DCN rs3138268, a nonsynonymous SNP, was monomorphic and was excluded from further analyses. The remaining 10 tagSNPs were genotyped successfully.
Figure 2 (caption fragment). Associations represent ORs (95% CI) for individual study (squares) and study-adjusted pooled (diamonds) estimates. Models are ordinal genetic risk models. HAN-HJO and HAN-HMO were combined for presentation. P het refers to the P value for heterogeneity in odds ratios among studies. doi:10.1371/journal.pone.0019642.g002
Replication set 1. Three tagSNPs in DCN (rs13312816, rs516115, and rs741212) were genotyped as part of a larger assay of 1,536 SNPs in AUS (LUM SNPs were not genotyped). Genotyping was attempted on 1,674 samples using the Illumina GoldenGate™ assay and Illumina BeadStudio software [44]. One non-template control and two DNA samples per 96-well plate were blindly duplicated (n = 18). Samples with call rates <95% and SNPs with call rates <98% were excluded. SNPs with GenTrain scores (a metric of genotype clustering) <0.5 were manually checked and adjusted according to Illumina guidelines. Greater than 97% of SNPs passed this initial QC and >84% of all SNPs passed all QC criteria, resulting in 550 cases and 1,101 controls (93% Caucasians) with genotype data on 1,292 SNPs, including the three tagSNPs in DCN included in this analysis.
Imputation. SNPs genotyped in the discovery set were not necessarily the same SNPs genotyped in replication set 1 and vice versa (e.g., only DCN rs516115 and DCN rs741212 were genotyped in both datasets). Genotypes at SNPs showing significant associations with ovarian cancer in either dataset were imputed so that datasets could be combined for analysis. Thus, we imputed DCN rs13312816 in the discovery set and DCN rs3138165 and LUM rs17018765 in replication set 1 using the MACH software [45]. Briefly, genotype data from the discovery set and replication set 1 were combined with the phase II HapMap data for Caucasian samples and the unobserved genotypes were then inferred probabilistically using a hidden Markov model [45].
OCAC replication set 2. We genotyped four SNPs (DCN SNPs rs3138165, rs13312816 and rs516115 and LUM rs17018765) showing significant associations in the discovery set and replication set 1 in the 12 matched OCAC studies using the Fluidigm EP1 system (Fluidigm, San Francisco, CA) at a central laboratory. Genotyping was performed on 96.96 dynamic arrays in a run of 96 SNPs using inventoried and Custom Assay-by-Design TaqMan probes (Applied Biosystems, Foster City, CA). Genotyping used 10 ng DNA following the manufacturer's conditions using the pre-amplification protocol. Analysis was performed using Genotyping SNP Analysis software. Samples with call rates <80% were excluded immediately. The following criteria were used as measures of acceptable genotyping for each SNP and each matched study set: (i) ≥2% sample duplicates included, (ii) concordance for duplicate samples ≥96% and overall concordance for duplicate samples across all SNPs ≥98%, (iii) pass rate per plate of >90%, (iv) <25% overall failed plates, (v) overall SNP call rate by study ≥95%, and (vi) a difference in call rate between cases and controls of <5%. Studies failing one of these criteria were excluded for that particular SNP, resulting in 8,886 unique samples (4,419 cases and 4,467 controls) that were successfully genotyped. Excellent concordance (100%) in genotype calls was found between study samples and those of 95 HapMap genotyped DNAs (Coriell, Camden, NJ, USA).
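The six acceptance criteria above can be expressed as a single check per SNP and matched study set. The following is a minimal sketch only: the record type `SnpStudyQC`, its field names, and the reduction of "pass rate per plate" to one summary value are assumptions made for illustration and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class SnpStudyQC:
    """Hypothetical QC summary for one SNP in one matched study set (fractions, 0-1)."""
    duplicate_fraction: float             # share of samples run in duplicate
    duplicate_concordance: float          # duplicate concordance for this SNP
    overall_duplicate_concordance: float  # duplicate concordance across all SNPs
    plate_pass_rate: float                # assumed: lowest pass rate over plates
    failed_plate_fraction: float          # share of plates that failed overall
    snp_call_rate: float                  # SNP call rate within the study
    case_call_rate: float
    control_call_rate: float


def passes_qc(m: SnpStudyQC) -> bool:
    """Apply criteria (i)-(vi) as listed in the text; all must hold."""
    return (
        m.duplicate_fraction >= 0.02                               # (i)
        and m.duplicate_concordance >= 0.96                        # (ii)
        and m.overall_duplicate_concordance >= 0.98                # (ii)
        and m.plate_pass_rate > 0.90                               # (iii)
        and m.failed_plate_fraction < 0.25                         # (iv)
        and m.snp_call_rate >= 0.95                                # (v)
        and abs(m.case_call_rate - m.control_call_rate) < 0.05     # (vi)
    )
```

A study set failing any one criterion would be dropped for that SNP only, which matches the per-SNP exclusion described above.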
For all studies, genotyping quality was further assessed using tests for Hardy-Weinberg equilibrium (HWE). SNPs with significant deviations from HWE in Caucasian controls (0.001 < P < 0.05) were assessed and excluded if the clustering was suboptimal. SNPs with HWE P < 0.001 were excluded from analysis.
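The text does not state which HWE test was used. As an illustration, the sketch below applies a 1-degree-of-freedom chi-square goodness-of-fit test to genotype counts in controls and then maps the P value onto the two thresholds given above. The function names and the example counts are hypothetical; an exact test is a common alternative when minor allele counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df Hardy-Weinberg goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip cells with zero expectation (monomorphic SNP) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def hwe_decision(p_value):
    """Map an HWE P value onto the handling rules described in the text."""
    if p_value < 0.001:
        return "exclude"
    if p_value < 0.05:
        return "review clustering"
    return "keep"


# Illustrative counts only: 100 AA, 200 AB, 100 BB is exactly in HWE.
chi2, p_value = hwe_chi_square(100, 200, 100)
print(chi2, p_value, hwe_decision(p_value))
```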

Statistical Analysis
We restricted analyses to subjects who were self-reported or presumed Caucasian and cases with invasive epithelial ovarian cancer of serous histology, resulting in a final sample size of 1,317 participants in the discovery set (397 cases and 920 controls), 1,534 participants in replication set 1 (436 cases and 1,098 controls) and 5,917 participants in OCAC replication set 2 (1,668 cases and 4,249 controls). Genotypes were used to estimate allele frequencies and pair-wise LD between SNPs was estimated with r² values using Haploview version 4.1 [46].
We estimated odds ratios (OR) and 95% confidence intervals (CI) at each SNP using unconditional logistic regression under both co-dominant and ordinal genetic models. In the discovery set only, we also estimated haplotype frequencies for each gene and tested the global statistical significance (P < 0.05) for haplotype association [47]. Individual haplotype associations evaluated the risk of serous epithelial ovarian cancer compared to all other haplotypes combined.
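To make the ordinal (per-minor-allele) model concrete, the sketch below fits logit(P(case)) = b0 + b1·g by Newton-Raphson, where g is the number of minor alleles (0, 1 or 2), and reports exp(b1) with a Wald 95% CI. It is an unadjusted illustration in plain Python, not the SAS analysis used in the study: it omits the adjustment for study or region, and the function names and genotype counts are hypothetical.

```python
import math


def _score_and_information(b0, b1, cases, controls):
    """Score vector and 2x2 information matrix for logit(p) = b0 + b1 * g."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for g in (0, 1, 2):
        n_total = cases[g] + controls[g]
        mu = 1.0 / (1.0 + math.exp(-(b0 + b1 * g)))  # fitted P(case | g)
        resid = cases[g] - n_total * mu
        w = n_total * mu * (1.0 - mu)
        s0 += resid
        s1 += g * resid
        i00 += w
        i01 += w * g
        i11 += w * g * g
    return s0, s1, i00, i01, i11


def per_allele_or(cases, controls, n_iter=50):
    """Return (OR, lower, upper) per minor allele with a Wald 95% CI.

    cases and controls are (n0, n1, n2): counts carrying 0, 1, 2 minor alleles.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        s0, s1, i00, i01, i11 = _score_and_information(b0, b1, cases, controls)
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    _, _, i00, i01, i11 = _score_and_information(b0, b1, cases, controls)
    se = math.sqrt(i00 / (i00 * i11 - i01 * i01))  # sqrt of Var(b1)
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


# Illustrative counts only (not study data).
print(per_allele_or(cases=(300, 120, 16), controls=(700, 350, 48)))
```

When only two genotype classes are observed, the model is saturated and the estimate reduces to the familiar cross-product odds ratio of the 2x2 table, which gives a convenient check on the fit.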
Prior to combining data, statistical tests of heterogeneity in the ORs between studies were evaluated. Where heterogeneity existed, statistical significance of interaction in post-hoc analyses was assessed with the Wald test in models that included a product term for the ordinal coding of genotype and categories of age or period of recruitment (before the year 2000 or after the year 2000 based on the median year of the recruitment duration for each study, Figure 3) while adjusting for study. Among the five studies (AUS, GER, MAY, NCO and UCI) with detailed information on covariates, we also examined SNP interactions with diagnosis year, oral contraceptive (OC) use, parity, body mass index (BMI), menopausal status, age at menarche and family history of breast or ovarian cancer in first degree relatives. Missing observations were represented as a separate category within each variable. Associations representing the ordinal genetic model at each SNP were then stratified by the covariate. All models were adjusted for region of residence (discovery set only) or study. Additional adjustment for age categories did not alter associations, so models are presented without age.
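The text does not name the heterogeneity test. One widely used choice, shown here purely as an assumption, is Cochran's Q computed from per-study log odds ratios and their standard errors, referred to a chi-square distribution with k - 1 degrees of freedom for k ≥ 2 studies. The helper `chi2_sf` and the example inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Chi-square survival function for integer df >= 1 (closed-form recurrence)."""
    y = x / 2.0
    if df % 2 == 0:
        shape, sf = 1.0, math.exp(-y)               # df = 2
    else:
        shape, sf = 0.5, math.erfc(math.sqrt(y))    # df = 1
    # Each step raises the gamma shape parameter by 1, i.e. df by 2.
    while shape < df / 2.0:
        sf += y ** shape * math.exp(-y) / math.gamma(shape + 1.0)
        shape += 1.0
    return sf


def cochran_q(log_ors, std_errs):
    """Return (Q, df, p_value) for heterogeneity among study-specific log ORs."""
    weights = [1.0 / se ** 2 for se in std_errs]    # inverse-variance weights
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    return q, df, chi2_sf(q, df)


# Illustrative inputs only: two studies with ORs of 0.8 and 1.2.
print(cochran_q([math.log(0.8), math.log(1.2)], [0.1, 0.1]))
```

A small P value from such a test argues against pooling the study-specific estimates, which is the situation the paper reports across all 18 studies.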
All statistical tests were two-sided with an alpha level <0.05 considered statistically significant, and were implemented with SAS (SAS Institute, NC).

Supporting Information
Figure S1 Linkage disequilibrium blocks for tagSNPs in DCN and LUM. Analysis is based on total number of controls from the discovery set and replication set 1. The two genes comprise a contiguous segment on chromosome 12 of approximately 80 kb. Numbers in the squares on the LD block indicate the correlation (r²) between SNPs. * indicates SNPs that were significantly associated with serous ovarian cancer. (TIF)

Figure S2 Sensitivity analysis for DCN rs13312816 and serous epithelial ovarian cancer stratified by case recruitment period. Forest plots represent ORs (95% CI) for individual study (squares) and study-adjusted pooled (diamonds) estimates. Models are ordinal genetic risk models. HAN-HJO and HAN-HMO were combined for presentation. P het refers to the P value for heterogeneity in odds ratios among studies. (TIF)

Figure S3 Sensitivity analysis for DCN rs516115 and serous epithelial ovarian cancer stratified by case recruitment period. Forest plots represent ORs (95% CI) for individual study (squares) and study-adjusted pooled (diamonds) estimates. Models are ordinal genetic risk models. HAN-HJO and HAN-HMO were combined for presentation. P het refers to the P value for heterogeneity in odds ratios among studies. (TIF)

Figure S4 Sensitivity analysis for LUM rs17018765 and serous epithelial ovarian cancer stratified by case recruitment period. Forest plots represent ORs (95% CI) for individual study (squares) and study-adjusted pooled (diamonds) estimates. Models are ordinal genetic risk models. HAN-HJO and HAN-HMO were combined for presentation. P het refers to the P value for heterogeneity in odds ratios among studies. (TIF)

Table S1 SNP and location, HWE test P-value and MAF for variants in DCN and LUM in 920 controls in the discovery set and 1,098 controls in replication set 1. (DOC)

Table S2 Odds ratios (OR) and 95% confidence intervals (CI) for the association between genetic polymorphisms in DCN and LUM and serous epithelial ovarian cancer risk among 1,317 Caucasian subjects in the discovery set. (DOC)