SEPP1 Influences Breast Cancer Risk among Women with Greater Native American Ancestry: The Breast Cancer Health Disparities Study

Selenoproteins are a class of proteins containing a selenocysteine residue, many of which have been shown to have redox functions, acting as antioxidants to decrease oxidative stress. Selenoproteins have previously been associated with risk of various cancers and redox-related diseases. In this study we evaluated possible associations between breast cancer risk and survival and single nucleotide polymorphisms (SNPs) in the selenoprotein genes GPX1, GPX2, GPX3, GPX4, SELS, SEP15, SEPN1, SEPP1, SEPW1, TXNRD1, and TXNRD2 among Hispanic/Native American (2111 cases, 2597 controls) and non-Hispanic white (NHW) (1481 cases, 1586 controls) women in the Breast Cancer Health Disparities Study. Adaptive Rank Truncated Product (ARTP) analysis was used to determine both gene and pathway significance with these genes. The overall selenoprotein pathway was not significantly associated with breast cancer risk (PARTP = 0.69), and only one gene, GPX3, was of borderline significance for the overall population (PARTP = 0.09) and marginally significant among women with 0-28% Native American (NA) ancestry (PARTP = 0.06). The SEPP1 gene was statistically significantly associated with breast cancer risk among women with higher NA ancestry (PARTP = 0.002) and contributed to a significant pathway among those women (PARTP = 0.04). GPX1, GPX3, and SELS were associated with Estrogen Receptor-/Progesterone Receptor+ status (PARTP = 0.002, 0.05, and 0.01, respectively). Four SNPs (GPX3 rs2070593, GPX4 rs2074451, SELS rs9874, and TXNRD1 rs17202060) significantly interacted with dietary oxidative balance score after adjustment for multiple comparisons to alter breast cancer risk. GPX4 was significantly associated with breast cancer survival only among those with the highest NA ancestry (PARTP = 0.05). Our data suggest that SEPP1 alters breast cancer risk among women with higher levels of NA ancestry.


Introduction
Selenium is a trace element essential for health that has been suggested to play a preventive role for a variety of chronic diseases, including various cancers [1][2][3][4][5][6]. This is likely due to the role of selenium as a constituent of various proteins, known as the selenoproteins [2,7]. Selenoproteins are a class of approximately 25 proteins containing a selenocysteine (SEC) residue. SEC is a cysteine analogue, where sulfur has been replaced by a selenium atom, synthesized from serine bound to a tRNA [8]. The incorporation of SEC is a complex process requiring an in-frame UGA codon, which normally functions as a stop codon but is recognized as a SEC codon with the aid of a stem loop called a SEC insertion sequence (SECIS) and several other trans-acting factors [8,9].
In this study we evaluated single nucleotide polymorphisms (SNPs) in several selenoprotein coding genes for an association with breast cancer: glutathione peroxidase 1 (GPX1), glutathione peroxidase 2 (GPX2), glutathione peroxidase 3 (GPX3), glutathione peroxidase 4 (GPX4), SELS, SEP15, SEPN1, SEPP1, SEPW1, thioredoxin reductase 1 (TXNRD1), and thioredoxin reductase 2 (TXNRD2). These selenoproteins were selected for analysis because their functions have been characterized and many of them have been associated with risk of various types of cancer and/or oxidative stress [2][3][4][5][6][7][9][11][12][13][14]. The glutathione peroxidases (GPX1, GPX2, GPX3, and GPX4) primarily function to reduce oxidative stress by detoxifying hydrogen peroxide and other organic peroxides [2,13]. SELS is involved in inflammatory response [10]. SEP15 is a protein located primarily in the endoplasmic reticulum and plays a role in protein folding [2]. SEPN1 plays a role in redox homeostasis and protects against oxidative stress [15]. SEPP1 acts as a selenium transport protein, a heavy-metal chelator, and an antioxidant [10]. SEPW1 is a highly conserved protein that acts as an antioxidant protecting against oxidative stress [10]. The thioredoxin reductases (TXNRD1 and TXNRD2) are antioxidants that reduce the oxidized form of thioredoxin, an important regulator of redox-controlled cell functions and redox balance [12]. While many of these selenoprotein-coding genes have been associated with certain cancers, their association with breast cancer remains unclear.
In addition to evaluating SNPs in these genes for an association with breast cancer risk and survival, we also evaluated associations by level of Native American (NA) genetic ancestry. Higher levels of NA ancestry have been associated with reduced breast cancer risk [16,17] and serum selenium concentration has been shown to differ amongst racial and ethnic groups [18]. We also evaluated breast cancer associations by menopausal status and estrogen receptor (ER) and progesterone receptor (PR) status. Additionally, we evaluated breast cancer associations by dietary oxidative balance score (DOBS) since selenoprotein genes may interact with dietary factors that influence oxidative stress.

Ethics Statement
All participants signed informed written consent prior to participation and each study was approved by the Institutional Review Board for Human Subjects at the participating institutions: University of Utah, University of Arizona, University of Colorado, University of New Mexico, Comisión de ética, and Institutional Review Board of the Cancer Prevention Institute of California.

Study Design
The Breast Cancer Health Disparities Study includes participants from three population-based case-control studies, the 4-Corners Breast Cancer Study, the Mexico Breast Cancer Study, and the San Francisco Bay Area Breast Cancer Study [17], who completed an in-person interview and who had a blood or mouthwash sample available for DNA extraction. In the 4-Corners Breast Cancer Study, participants were between 25 and 79 years of age with a histologically confirmed diagnosis of in situ (n=341) or invasive (n=1492) cancer between October 1999 and May 2004; controls were selected from the target populations of cases living in Arizona, Colorado, New Mexico, and Utah and were frequency matched to cases on ethnicity and 5-year age distribution [19]. Participants from the Mexico Breast Cancer Study were between 28 and 74 years of age. Eligible cases in Mexico were women diagnosed with either a new histologically confirmed in situ or invasive breast cancer between January 2004 and December 2007 at 12 participating hospitals from three main health care systems; controls were randomly selected from the same catchment area as the cases and frequency matched to cases based on 5-year age distribution, membership in health care institution, and place of residence. The San Francisco Bay Area Breast Cancer Study included women aged 35 to 79 years from the San Francisco Bay Area diagnosed with a first primary histologically confirmed invasive breast cancer between April 1995 and April 2002; controls were identified by random-digit dialing and frequency-matched to cases based on the expected race/ethnicity and 5-year age distribution [20,21].

Data Harmonization
Data were harmonized across all study centers and questionnaires as previously described [17]. Women were classified as either pre-menopausal or post-menopausal based on responses to questions on menstrual history. Women who reported still having periods during the referent year (defined as the year before diagnosis for cases or before selection into the study for controls) were classified as pre-menopausal. Center-specific definitions were used to define postmenopausal women. Women were classified as postmenopausal if they reported either a natural menopause or if they reported taking hormone therapy and were still having periods or were at or above the 95th percentile of age for those who reported having a natural menopause (i.e., > 12 months since their last period). This age at menopause was site-specific by ethnicity: 58 years for NHW and 56 for Hispanic/Native women from the 4-Corners Breast Cancer Study; 54 for the Mexico Breast Cancer Study; and 55 for NHW and 56 for Hispanic women from the San Francisco Bay Area Breast Cancer Study. A dietary oxidative balance score (DOBS) that included nutrients with anti- or pro-oxidative balance properties was developed as previously reported [22]. Anti-oxidants included in the score were vitamin C, vitamin E, beta carotene (data for beta carotene were not available for Mexico), folic acid, and dietary fiber; alcohol was treated as a pro-oxidant. Nutrients per 1000 calories were evaluated and quartiles of intake were based on study-specific distributions. Long-term alcohol consumption was classified into three levels: the top 25th percentile of consumption, all other drinkers, and non-drinkers. Referent year alcohol consumption was used for those women who did not have long-term alcohol measurements. 
In creating the DOBS, participants were assigned values of zero for low levels (first quartile) of exposure to anti-oxidants or high exposure to pro-oxidants (fourth quartile), one for intermediate levels (second and third quartiles) of exposure, and two for high levels (fourth quartile) of exposure to anti-oxidants and low exposure (first quartile) to pro-oxidants.
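The point assignment described above can be illustrated with a minimal Python sketch. The text gives only the per-component values (0, 1, or 2). Summing the components into a single score, mapping non-drinkers to the low pro-oxidant category, and skipping unavailable nutrients (such as beta carotene for Mexico) are assumptions made for illustration. The function and key names are hypothetical.

```python
from typing import Dict

# Points for anti-oxidant intake quartiles: Q1 -> 0, Q2-Q3 -> 1, Q4 -> 2.
ANTIOXIDANT_POINTS = {1: 0, 2: 1, 3: 1, 4: 2}

# Points for alcohol (pro-oxidant), reversed so that high exposure scores 0.
# Assumption: non-drinkers are treated as the low-exposure category.
ALCOHOL_POINTS = {"top_25th_percentile": 0, "other_drinker": 1, "non_drinker": 2}


def dobs(nutrient_quartiles: Dict[str, int], alcohol_level: str) -> int:
    """Sum component points into one score (summation is an assumption).

    nutrient_quartiles maps an anti-oxidant name to its study-specific
    intake quartile (1-4). Nutrients without data are simply left out.
    """
    score = 0
    for nutrient, quartile in nutrient_quartiles.items():
        if quartile not in ANTIOXIDANT_POINTS:
            raise ValueError(f"quartile for {nutrient} must be 1-4")
        score += ANTIOXIDANT_POINTS[quartile]
    if alcohol_level not in ALCOHOL_POINTS:
        raise ValueError(f"unknown alcohol level: {alcohol_level}")
    return score + ALCOHOL_POINTS[alcohol_level]


example = dobs(
    {"vitamin_c": 4, "vitamin_e": 1, "beta_carotene": 2,
     "folic_acid": 3, "dietary_fiber": 4},
    "non_drinker",
)
```

Under these assumptions a higher score reflects greater anti-oxidant and lower pro-oxidant exposure.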

Genetic Data
DNA was extracted from either whole blood (n=7287) or mouthwash (n=634) samples. Whole Genome Amplification (WGA) was applied to the mouthwash-derived DNA samples prior to genotyping. A tagSNP approach was used to characterize variation across candidate genes. TagSNPs were selected using the following parameters: linkage disequilibrium (LD) blocks were defined using a Caucasian LD map and an r² = 0.8; minor allele frequency (MAF) >0.1; range = -1500 bps from the initiation codon to +1500 bps from the termination codon; and 1 SNP/LD bin. We distinguished European and NA ancestry in the study population by using 104 Ancestry Informative Markers (AIMs) [17]. A multiplexed bead array assay format based on GoldenGate chemistry (Illumina, San Diego, California) was used for genotyping. A genotyping call rate of 99.93% was attained (99.65% for WGA samples). We included 132 blinded internal replicates representing 1.6% of the sample set. The duplicate concordance rate was 99.996% as determined by 193,297 matching genotypes among sample pairs. In the current analysis we evaluated GPX1 (2 SNPs), GPX2 (4 SNPs), GPX3 (3 SNPs), GPX4 (1 SNP), SEPP1 (2 candidate SNPs and 1 tagSNP), SELS (2 SNPs), SEP15 (4 SNPs), SEPN1 (5 SNPs), SEPW1 (3 SNPs), TXNRD1 (7 SNPs), and TXNRD2 (20 SNPs). A description of these genes and SNPs is shown in online Table S1. SEP15 rs9433110 was not analyzed since it was not in Hardy-Weinberg Equilibrium (HWE) among NHW participants. Online Table S2 shows minor allele frequency (MAF) and HWE by ancestry groups. It should be noted that in most instances MAF showed a trend across ancestry groups, and in some instances, such as SEPP1 rs6865453, the major and minor alleles were reversed between the most European and the most Native ancestry groups.
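For readers unfamiliar with the HWE and MAF checks mentioned above, the following Python sketch shows one common way to compute them from genotype counts. The text does not state which HWE test was used. The 1-df Pearson chi-square goodness-of-fit test here is an assumption, and the function names are hypothetical.

```python
import math
from typing import Tuple


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Frequency of the less common allele from genotype counts AA, AB, BB."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions.

    Returns (chi-square statistic, p value). Assumed test; an exact test
    may be preferable when genotype counts are small.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # allele A frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP whose p value falls below a chosen threshold within a reference group (here, NHW participants) would be flagged as departing from HWE.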

Tumor Characteristics and Survival
Data for survival and ER/PR tumor status were available for cases from the 4-Corners Breast Cancer Study and the San Francisco Bay Area Breast Cancer Study.
Information on stage at diagnosis, months of survival after diagnosis, cause of death, and ER and PR status were available from cancer registries in Utah, Colorado, Arizona, New Mexico, and California. Information on ER and PR status of tumors was available for 1019 (69%) NHW and 977 (75%) Hispanic/NA cases. Surveillance Epidemiology and End Results (SEER) summary disease stage, based on three codes of local, regional, and distant, was used. Data on survival and ER and PR tumor status were not available from the Mexico Breast Cancer Study.

Statistical Methods
Genetic Ancestry Estimation. The program STRUCTURE was used to compute individual ancestry for each study participant assuming two founding populations [23,24]. A three-founding population model was assessed but did not fit the population structure with the same level of repeatability and correlation among runs as the two-founding population model. Participants were classified by level of percent NA ancestry. Assessment across categories of ancestry was done using cut-points based on the distribution of genetic ancestry in the control population. These cut-points, 0-28%, >28-70%, and >70-100%, maximized power within all three ancestry groups to assess ancestry-specific associations. Genetic ancestry was used as a continuous variable when included in the models to adjust for possible confounding. SNP Associations. Genes and SNPs were assessed for their association with breast cancer risk by strata of menopausal status and genetic ancestry in the whole population and by ER/PR status for the 4-Corners Breast Cancer Study and the San Francisco Bay Area Study. Statistical analyses were performed using SAS version 9.3 (SAS Institute, Cary, NC) unless otherwise noted. Logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals (CI) for breast cancer risk associated with SNPs, adjusting for age, study center, genetic ancestry, body mass index (BMI, kg/m²) during referent year, and parity. The generalized logit link function was used when estimating breast cancer risk by ER/PR status. Associations with SNPs were assessed assuming co-dominant models. Based on the initial assessment, SNPs which appeared to have a dominant or recessive mode of inheritance were evaluated with those inheritance models in subsequent analyses. 
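The ancestry stratification described above reduces to a simple rule on the estimated percent NA ancestry. A minimal Python sketch follows. The function name is hypothetical, and the input is assumed to be a percentage between 0 and 100 as produced by an ancestry estimation program.

```python
def ancestry_stratum(percent_na: float) -> str:
    """Assign a participant to one of the three NA ancestry strata.

    Cut-points follow the text: 0-28%, >28-70%, and >70-100%.
    """
    if not 0.0 <= percent_na <= 100.0:
        raise ValueError("percent NA ancestry must be between 0 and 100")
    if percent_na <= 28.0:
        return "0-28%"
    if percent_na <= 70.0:
        return ">28-70%"
    return ">70-100%"
```

The strata are used only for stratified analyses. When ancestry is a model covariate, the continuous value is used instead.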
In stratified analyses, tests for interaction were calculated using 1 degree of freedom (df) Wald chi-square tests; p values based on 4-df Wald tests measure the overall SNP (treated as continuous) association with breast cancer risk by ER/PR status. Adjustments for multiple comparisons within the gene used the step-down Bonferroni correction (i.e., Holm method), taking into account the correlated nature of the data using the SNP spectral decomposition method proposed by Nyholt [25] and modified by Li and Ji [26].
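The multiple-comparison adjustment described above can be sketched in Python as follows. The effective number of independent tests uses the Li and Ji formula applied to the eigenvalues of the SNP correlation matrix, which are assumed to have been computed elsewhere. How that number enters the Holm procedure is not spelled out in the text. Substituting it for the raw number of SNPs, as done here, is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def effective_tests_li_ji(eigenvalues: Sequence[float]) -> float:
    """Effective number of independent tests (Li and Ji formula).

    Each eigenvalue contributes 1 if its absolute value is at least 1,
    plus the fractional part of its absolute value.
    """
    m_eff = 0.0
    for lam in eigenvalues:
        x = abs(lam)
        m_eff += (1.0 if x >= 1.0 else 0.0) + (x - math.floor(x))
    return m_eff


def holm_adjust(p_values: Sequence[float], m_eff: float) -> List[float]:
    """Step-down Bonferroni (Holm) adjustment using m_eff tests.

    Assumption: m_eff replaces the SNP count, and the multiplier never
    drops below 1. Adjusted p values are returned in the input order.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    adjusted = [0.0] * len(p_values)
    running_max = 0.0
    for rank, idx in enumerate(order):
        multiplier = max(m_eff - rank, 1.0)
        running_max = max(running_max, min(1.0, multiplier * p_values[idx]))
        adjusted[idx] = running_max  # enforce monotone adjusted p values
    return adjusted
```

For two perfectly correlated SNPs (eigenvalues 2 and 0) the effective number of tests is 1, so no penalty is applied. For two uncorrelated SNPs (eigenvalues 1 and 1) it is 2, matching the ordinary Holm correction.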
Survival Analysis. Survival months were calculated based on month and year of diagnosis and month and year of death or date of last contact by the cancer registries. Associations between SNPs and risk of dying of breast cancer among primary invasive cases were evaluated using Cox proportional hazards models to obtain multivariate hazard ratios (HR) and 95% CI among all cases and by genetic ancestry strata. The upper two ancestry strata were combined to evaluate survival by ancestry groups since survival data were not available for the Mexico study site. In the analysis of breast cancer survival, individuals were censored when they died of causes other than breast cancer or were lost to follow-up. In addition to the minimal adjustments for age, study center, genetic ancestry, BMI during referent year, and parity, models were also adjusted for SEER summary stage.
ARTP Analysis. We used the adaptive rank truncated product (ARTP) method that is based on a highly efficient permutation algorithm to determine the significance of association of each gene and of the pathway with breast cancer overall, by genetic ancestry, and by ER/PR strata. The gene p values were generated using the ARTP package in R, permuting outcome status 10,000 times while adjusting for age, BMI during referent year, and genetic ancestry [27,28]. The ARTP method was also applied to survival data using Cox proportional hazard models in R with an additional adjustment for SEER summary stage to generate p values based on likelihood ratio tests. The survival outcome (i.e., vital status and survival months) was permuted 10,000 times in R. We report both pathway and gene p values (P ARTP ).
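To make the logic of the gene-level P ARTP concrete, the sketch below follows the general adaptive rank truncated product idea: combine the k smallest SNP p values for several candidate k, convert each combination to a permutation p value, take the minimum, and calibrate that minimum against the same permutations. It is a simplified illustration in Python, not the ARTP R package used in the study. It assumes the SNP p values for the observed and permuted outcomes have already been computed, and the function names and truncation points are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def rtp_statistic(p_values: Sequence[float], k: int) -> float:
    """Log of the product of the k smallest p values (smaller = stronger)."""
    return sum(math.log(p) for p in sorted(p_values)[:k])


def artp_p_value(p_matrix: Sequence[Sequence[float]],
                 truncation_points: Sequence[int]) -> float:
    """Adaptive rank truncated product p value for one gene.

    p_matrix[0] holds the SNP p values for the observed outcome and
    p_matrix[1:] those obtained after permuting the outcome. All p values
    must be greater than zero.
    """
    n = len(p_matrix)
    min_p: List[float] = [1.0] * n
    for k in truncation_points:
        stats = [rtp_statistic(row, k) for row in p_matrix]
        ordered = sorted(stats)
        for b, w in enumerate(stats):
            # Share of all statistics at least as extreme as this one.
            level = bisect_right(ordered, w) / n
            min_p[b] = min(min_p[b], level)
    # Calibrate the observed minimum against the permutation minima.
    return sum(1 for m in min_p if m <= min_p[0]) / n
```

With 10,000 permutations, as in the study, the smallest attainable p value under this scheme is 1/10,001. A pathway-level value can be obtained in the same way by treating gene-level p values as the inputs.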
Multiple genes and SNPs in our analysis showed an association with breast cancer survival (Table 4). GPX4 was significantly associated with better breast cancer survival among those with the highest NA ancestry (P ARTP = 0.05). GPX4 GT/TT rs2074451 showed a marginal inverse association with survival among women with >28% NA ancestry (p=0.055; p adj = 0.055; p interaction=0.06). Several SNPs in TXNRD2 were associated with survival prior to adjustment for multiple comparisons. Additionally, TXNRD2 AA rs3788314 and TXNRD2 CT/TT rs4333017 showed significant interaction across genetic ancestry (p int =0.035 and p int =0.017, respectively).

Discussion
Based on ARTP results, GPX3 was borderline statistically significantly associated with breast cancer risk for all women and for women with lower levels of NA ancestry specifically. SEPP1 showed a statistically significant interaction by NA ancestry, with a strong association with breast cancer risk among women with higher NA ancestry. Some differences in association were observed by ER/PR tumor status. GPX1, GPX3, and SELS were significantly associated with ER-/PR+ tumors and SELS was significantly associated with ER+/PR- tumors. GPX4 was significantly associated with survival among those with higher NA ancestry and SEPW1 was marginally associated with survival among women in the low NA ancestry group. Although we hypothesized that DOBS would modify associations with selenoprotein genes, only one SNP in GPX4, SELS, and TXNRD1 that interacted with DOBS remained statistically significantly associated with breast cancer after adjustment for multiple comparisons.
Among the genes evaluated in this study, only GPX3 showed a borderline significant association overall as determined by the P ARTP , and GPX3 rs8177447 was statistically significant after adjustment for multiple comparisons. This SNP is in high LD with rs3792797 and has been associated with Barrett's Esophagus [29] (see Table 5 for a comparison of the findings of this study with other information on SNPs). GPX3 is one of multiple glutathione peroxidases, all of which are selenoproteins that play a role in catalyzing the reduction of hydrogen peroxides to minimize oxidative stress, which can damage cells [10]. This enzyme acts as an efficient antioxidant in the plasma and has previously been linked to other diseases associated with oxidative stress [10]. Our study indicates that in addition to an important role in development of these cancers, GPX3 may also play a role in the development and progression of breast cancer.
Other studies have indicated that glutathione peroxidases may be associated with breast cancer risk, specifically GPX1 [14,30], a cytosolic antioxidant [31]. A meta-analysis of six case-control studies of the Pro198Leu polymorphism (rs1050450) in GPX1 did not see an association with breast cancer risk in Caucasians, although it did see a strongly increased risk of breast cancer among African women [32]. Likewise, Cox and colleagues did not see an association between this SNP and breast cancer risk [33]. In a recent study by Meplan, GPX1 rs1050450 was shown to interact with hormone therapy to alter risk of breast cancer [34]. Our study did not show an association between GPX1 and breast cancer risk. We did, however, detect an association between GPX1 and ER-/PR+ breast tumors, but this represents a small group of women and could be a chance finding.
SEPP1 SNPs have been associated with a variety of cancers, including prostate [35,36], lung [2], and colorectal [3,37] cancer. Therefore, we evaluated three candidate SEPP1 SNPs that have been associated with oxidative stress and cancer [6,38] and are in high LD with other functional SNPs (see Table 5). SEPP1 is the major selenoprotein in plasma, acting as a selenium transport protein [13]. SEPP1 has been shown to behave as an antioxidant [13] and estrogen has been shown to increase hepatic SEPP1 concentrations [39], providing biological support for the observed associations between SEPP1 SNPs and cancer risk. Support for SEPP1 as an antioxidant comes from earlier findings that in human plasma the SEPP1 protein is involved in the degradation of peroxynitrite, which plays a role in inflammatory toxicity [31]. Additionally, associations between serum selenium levels and thioredoxin reductase activity have been correlated with SEPP1 rs3877899 [40], thereby establishing a further link between SEPP1 and the antioxidant activities of selenoproteins. Our analysis failed to find an association between breast cancer and our candidate SNP, SEPP1 rs3877899, a nonsynonymous coding SNP that was previously linked to breast cancer [26]. In our analysis of two other SEPP1 SNPs, rs230812 and rs6865453, we found an association with breast cancer risk among women with higher NA ancestry. SEPP1 rs230812 is in high LD with rs230813 and rs230819 (see Table 5), which have been associated with oxidative stress [6,38]. SEPP1 was the only gene significantly associated with breast cancer risk among women with greater NA ancestry, and this risk was significantly different from that observed for women with low NA ancestry. In this study, a large percentage of women with greater NA ancestry were part of the Mexico Breast Cancer Study and it is possible that differences in selenium levels in food could exist between those women and women in the United States. If women from Mexico had lower serum selenium, it is possible that SEPP1 could have a greater effect on risk.
We found that some selenoproteins were associated with tumor ER/PR status. Based on our analysis of gene P ARTP , GPX1, GPX3, and SELS were associated with ER-/PR+ tumor status, while SELS was also associated with ER+/PR- tumors. Additionally, several individual SNPs were associated with ER/PR status after adjusting for multiple comparisons. An earlier study found that glutathione peroxidase expression was associated with PR- status, as well as increased patient mortality [41]. The differences in results may be explained by the fact that the earlier study looked solely at expression levels of glutathione peroxidases, while our study looked at gene and SNP interactions without looking at expression of proteins.
Other studies have shown links between selenoproteins and estrogen. GPX1 messenger RNA has been shown to be upregulated in the presence of estrogen [42]. TXNRD1 has been shown to be an important modulator of estrogen signaling through the estrogen receptor response elements [43].
We also observed associations between selenoprotein SNPs and survival. Notably, GPX4 rs2074451 showed marginally significant interaction with NA ancestry and having a T allele was associated with decreased likelihood of dying from breast cancer among women with >28% NA ancestry. This is in agreement with a study by Udler and colleagues [42], who reported that GPX4 rs757229 and rs713041 were associated with a greater risk of all-cause mortality after diagnosis with breast cancer. GPX4 rs2074451 is highly correlated with these SNPs (see Table 5). Additionally, three SNPs in TXNRD2 (rs3788314, rs3788317, and rs4333017) showed significant differences in survival by ancestry. TXNRD2 has been associated with oxidative stress and our previous analysis of dietary factors of oxidative stress found the strongest associations among women with higher NA ancestry [22]. Udler did not observe a significant association between any of the TXNRD1 or TXNRD2 SNPs and breast cancer survival [42].
Given their role as antioxidants and mediators of oxidative stress, we evaluated selenoprotein SNPs for interactions with DOBS and found that GPX3 rs2070593, GPX4 rs2074451, SELS rs9874, and TXNRD1 rs17202060 showed significant interactions with DOBS after adjusting for multiple comparisons. Oxidative stress and high levels of reactive oxidative species have been suggested to play an important role in breast cancer development because free radicals damage DNA, thereby decreasing genomic integrity [44]. Our observed interactions with DOBS indicate that high DOBS may reduce breast cancer risk in individuals with high-risk genotypes.
The selenoprotein genes analyzed in this study were selected due to previous studies reporting on their roles in regulating oxidative stress and/or carcinogenesis; however, the majority of the genes and SNPs had not been studied in relation to breast cancer. Table 5 compares SNPs associated with breast cancer risk and survival in our study to those reported in the literature to alter cancer risk, influence oxidative stress, or influence gene expression. Selenium levels have been associated with breast cancer [45], along with SNPs in GPX1 [14] and GPX4 [46]. We did not find evidence for an association with breast cancer risk for these particular glutathione peroxidases, yet we found that GPX3 was associated marginally with breast cancer risk and SEPP1 was associated with risk among women with higher NA ancestry. GPX4 was associated with breast cancer survival. Glutathione peroxidases carry out similar functions and have similar mechanisms; they contain a conserved catalytic triad of Sec, Gln, and Trp that acts by sequential oxidation and reduction of the Sec residue during catalysis [10]. The primary difference between the different glutathione peroxidases appears to be tissue distribution and cellular location; therefore, it is highly likely that multiple glutathione peroxidases participate in the antioxidant defense system against oxidative damage. Our study has two primary limitations. First, we only evaluated a subset of the known selenoproteins and did not evaluate any DIOs or SPS2s. Since these selenoproteins, along with others, play a role in limiting oxidative stress, it is possible that they may be associated with breast cancer risk. Our second limitation was in our analysis of ER/PR status and survival, where we were unable to include data from Mexico. While our study population was large, the sample sizes for the different ER/PR subtypes were small, thereby decreasing the statistical power of our analysis. 
Nevertheless, our study has multiple strengths: our analytic approach that evaluated the pathway as a whole, and our analysis of genes beyond individual SNPs via P ARTP . These strengths allowed us to show that both GPX3 and SEPP1 were associated with breast cancer; these associations warrant further study in other populations. An additional strength is our genetically admixed population that allowed us to evaluate associations across the spectrum of European to Native ancestry.
While we observed few significant associations between selenoprotein genes and breast cancer risk, GPX3 was marginally significant among women with lower NA ancestry and SEPP1 was statistically significant among women with higher NA ancestry. Additionally, several genes were associated with ER/PR status. While we hypothesized that selenoprotein genes would interact with DOBS, only four SNPs significantly interacted with DOBS after adjustment for multiple comparisons. In conclusion, this study provides limited support for an association between selenoprotein genes and risk of breast cancer.