Epistasis Analysis for Estrogen Metabolic and Signaling Pathway Genes on Young Ischemic Stroke Patients

Background Endogenous estrogens play an important role in the overall cardiocirculatory system. However, there are no studies exploring the hormone metabolism and signaling pathway genes together on ischemic stroke, including sulfotransferase family 1E (SULT1E1), catechol-O-methyl-transferase (COMT), and estrogen receptor α (ESR1). Methods A case-control study was conducted on 305 young ischemic stroke subjects aged ≦ 50 years and 309 age-matched healthy controls. SULT1E1 -64G/A, COMT Val158Met, ESR1 c.454−397 T/C and c.454−351 A/G genes were genotyped and compared between cases and controls to identify single nucleotide polymorphisms associated with ischemic stroke susceptibility. Gene-gene interaction effects were analyzed using entropy-based multifactor dimensionality reduction (MDR), classification and regression tree (CART), and traditional multiple regression models. Results COMT Val158Met polymorphism showed a significant association with susceptibility of young ischemic stroke among females. There was a two-way interaction between SULT1E1 -64G/A and COMT Val158Met in both MDR and CART analysis. The logistic regression model also showed there was a significant interaction effect between SULT1E1 -64G/A and COMT Val158Met on ischemic stroke of the young (P for interaction = 0.0171). We further found that lower estradiol level could increase the risk of young ischemic stroke for those who carry either SULT1E1 or COMT risk genotypes, showing a significant interaction effect (P for interaction = 0.0174). Conclusions Our findings support that a significant epistasis effect exists among estrogen metabolic and signaling pathway genes and gene-environment interactions on young ischemic stroke subjects.


Introduction
Previous population-based epidemiological studies reported that stroke incidence is lower in women during midlife than in either older women or men [1,2]. The relatively high risk of ischemic stroke in women with premature or early menopause has drawn attention to the role of estrogen in cardiovascular disease. In addition to experimental studies demonstrating the protective roles of estrogens in many forms of cardiovascular and cerebrovascular diseases [3][4][5], a large volume of epidemiological and observational findings also indicates that exposure to endogenous estrogen may protect against stroke in premenopausal women [6,7]. The beneficial effects of estrogen on the vascular system are well established and include enhancing nitric oxide (NO) production and vascular relaxation [8,9], accelerating endothelial growth factor after vascular injury [4,10], and improving serum lipid concentrations [11][12][13]. Because abundant evidence suggests that estrogen plays an important role in the overall cardiocirculatory system, understanding the estrogen metabolic and signaling pathway in relation to vascular disease may shed light on the role of estrogen in ischemic stroke pathogenesis. In this study, we focused on 3 genes involved in steroid hormone metabolism and signaling: sulfotransferase family 1E (SULT1E1), catechol-O-methyl-transferase (COMT), and estrogen receptor α (ESR1). The sulfotransferase enzyme encoded by SULT1E1 (OMIM600043) catalyzes the sulfate conjugation of estrone (E1), 17β-estradiol (E2), catecholestrogens, and 2-methoxyestradiol [14][15][16], which is a major pathway for estrogen metabolism in humans [15,17]. The human SULT1E1 gene is approximately 20 kb in length, consists of eight exons, and maps to chromosome 4q13 [18]. Twenty-three polymorphisms have been identified in the SULT1E1 gene, most with low allele frequencies except -64G/A in exon 1 and three others in introns [19].
Recent research further indicated that SULT1E1 -64G/A (rs3736599) in the promoter region is positively correlated with endometrial cancer risk [20,21]. COMT encodes catechol-O-methyltransferase, a phase II enzyme that catalyzes the inactivation of the major metabolites of estrogen [22]. A functional single nucleotide polymorphism (SNP) in exon 4 of the COMT gene (OMIM16790), which maps to chromosome 22q11, has been identified (Val158Met, rs4680); the Met (A) allele results in a 3- to 4-fold decrease in COMT enzyme activity [23]. Several lines of evidence demonstrate that COMT Val158Met is significantly associated with breast cancer [24,25]. It is biologically reasonable to hypothesize that women who carry the mutant COMT Met allele may have a higher risk of ischemic stroke. Beyond metabolism, estrogens exert their effects by binding to specific estrogen receptors, α (encoded by ESR1) and β (encoded by ESR2). The human ESR1 gene (OMIM133430) is located on chromosome 6q25 and comprises 8 exons and 7 introns. The ESR1 c.454−397 T/C (rs2234693) and c.454−351 A/G (rs9340799) polymorphisms in intron 1 are the most widely studied, and these two SNPs have been associated with ischemic stroke [26,27], cardiovascular disease [28,29], and atherosclerosis [30].
Although the role of estrogen in cerebrovascular pathophysiology and ischemia is an important area of ongoing investigation, to our knowledge no study has examined the estrogen metabolism and signaling pathway genes together. The present study aimed to determine whether the SULT1E1 -64G/A, COMT Val158Met, ESR1 c.454−397 T/C, and c.454−351 A/G polymorphisms are associated with ischemic stroke of the young, and to further explore gene-gene and gene-environment interactions in young ischemic stroke patients.

Methods
The study was approved by the institutional review boards of Taipei Medical University and the participating hospitals, including National Taiwan University Hospital, Shuang Ho Hospital, Chi-Mei Medical Center, Shin Kong Wu Ho-Su Memorial Hospital, Tri-Service General Hospital, Wanfang Hospital, Lotung Poh-Ai Hospital, and Taipei Veterans General Hospital. Written informed consent was obtained from all participants and/or their relatives.

Study Subjects
In this study, 305 ischemic stroke subjects aged ≤ 50 years were recruited from 2005 to 2010, including 217 males and 88 females. Details of the participants' enrollment have been described elsewhere [31]. In brief, this case-control study was conducted by the Formosa Stroke Genetic Consortium (FSGC) in Taiwan. FSGC is a platform for hospital collaborations on studies related to the molecular biology of cerebrovascular diseases. A standard operation manual of FSGC was established by an expert panel including 5 stroke neurologists and 3 epidemiologists after a series of consensus conferences. All staff from the participating hospitals were trained on the standard procedure of case enrollment, including the structured questionnaire and blood sample collection. All collaborating hospitals have participated in the FSGC since 2005. The diagnostic criteria of stroke have been described in a previous study [32]. Ischemic stroke was defined as an onset of focal neurological deficit with signs or symptoms persisting longer than 24 hours, with or without acute ischemic lesion(s) on brain CT, or with acute ischemic diffusion-weighted imaging lesion(s) on MRI that corresponded to the clinical presentation. Transient ischemic attack (TIA) was defined as a transient focal neurologic deficit of ischemic cause that resolves within 24 hours. The subtypes of ischemic stroke were categorized according to the Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria [33]. A total of 2736 healthy subjects were recruited as possible controls, including 1637 individuals from a community-based prospective study of the nutrition health education program in Taipei City [34] and 1099 subjects who underwent physical examinations at TMUH during 2008-2009. All participants were recruited if they provided written informed consent. Among them, 53 subjects with prevalent stroke were excluded. From the remaining 2683 candidates, 309 subjects were randomly selected as age-matched controls.
The distributions of age, gender, and education level were similar between the selected subjects and the remaining candidates.

Data Collection and Assessments
Venous blood was drawn from all participants for biochemical tests, including cholesterol (CHOL), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), fasting glucose, and estradiol level. Blood samples were collected when patients were confirmed as having ischemic stroke and when controls agreed to participate in the study. Fasting serum CHOL, TG, HDL-C, and glucose concentrations were measured by an automatic analyzer (UniCel DXC 800, Beckman). Low-density lipoprotein cholesterol (LDL-C) was calculated using the Friedewald formula [35]. Estradiol was measured by radioimmunoassay (RIA). The lower limit of quantitation of estradiol level was 2 pg/ml. Duplicate samples were included for 10% of the subjects for quality control purposes. Samples were labeled in such a way that laboratory personnel were unaware of the case-control status of the samples and the identity of the duplicates. Serum estradiol levels were log-transformed to approximate a normal distribution before association analyses. In this study, high or low serum estradiol level was defined relative to the median level of healthy controls, which was 1.68 for females and 1.38 for males. The anthropometric measurements were taken by trained technicians. Body mass index (BMI) was defined as body weight (kg) divided by the square of height (m²). Waist-to-hip ratio (WHR) was computed as waist circumference divided by hip circumference. Obesity was defined as waist circumference ≥80 cm for females and ≥90 cm for males, or BMI >27 kg/m².
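The derived measures in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: lipid values are assumed to be in mg/dL (the form of the Friedewald formula that divides TG by 5), the function names are hypothetical, and the example values are not study data.

```python
def friedewald_ldl(chol, hdl, tg):
    """Estimate LDL-C (mg/dL) as total cholesterol - HDL-C - TG/5.

    Assumes all inputs are in mg/dL; the estimate is generally
    considered unreliable when TG is 400 mg/dL or higher.
    """
    if tg >= 400:
        raise ValueError("Friedewald estimate is unreliable for TG >= 400 mg/dL")
    return chol - hdl - tg / 5.0


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def whr(waist_cm, hip_cm):
    """Waist-to-hip ratio."""
    return waist_cm / hip_cm


def is_obese(sex, waist_cm, bmi_value):
    """Obesity as defined in the text: waist >= 80 cm (female) or
    >= 90 cm (male), or BMI > 27 kg/m^2."""
    waist_cutoff = 80.0 if sex == "female" else 90.0
    return waist_cm >= waist_cutoff or bmi_value > 27.0


# Hypothetical example values, not taken from the study.
ldl = friedewald_ldl(chol=200, hdl=50, tg=150)
print(ldl, round(bmi(70, 1.75), 2), is_obese("female", 82, 22.0))
```

Either criterion alone is sufficient for the obesity flag, which mirrors the "or" in the definition above.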

DNA Collection and Genotyping
Genomic DNA was extracted from EDTA-anticoagulated peripheral blood leukocytes by the phenol/chloroform method and then stored at −80°C until use. Genotyping was carried out by polymerase chain reaction (PCR) and restriction fragment length polymorphism (RFLP). Genotyping assays were performed using a standard protocol in a total volume of 40 µl, and 1 to 2 ng of sample DNA was used per assay. Genotyping was performed by laboratory technicians blinded to the case-control status. As a quality control, we repeated the genotyping on 10% of the samples, and the concordance rate for replicate samples was 100%. The overall genotyping success rates were >98%.

Statistics
The demographic and health characteristics of the study subjects were analyzed using Student's t-test and the Chi-square test. Hardy-Weinberg equilibrium (HWE) was assessed by a goodness-of-fit Chi-square test among controls to examine possible genotyping error for each SNP. Haplotype estimation was restricted to individuals for whom complete genotype data were available across all polymorphic sites, and the highest probability haplotypes estimated using the expectation maximization (EM) algorithm of SAS/Genetics 9.2 (SAS Institute, Cary, NC) were assigned to each study participant. Logistic regression models were used to estimate adjusted odds ratios (ORs) and 95% confidence intervals (CIs) for determining putative high-risk genotypes of each SNP for ischemic stroke. The models were adjusted for age, gender, education level, history of hypertension, diabetes mellitus, and dyslipidemia, obesity, and cigarette smoking status as potential confounders. To explore high-order gene-gene interactions in susceptibility to young ischemic stroke, we used, in addition to the traditional multiple logistic regression model, the multifactor dimensionality reduction (MDR) software (version 2.0 beta) with the MDR-permutation testing (MDRPT) software (version 1.0 beta), and classification and regression tree (CART) analysis. The MDR method selects important combinations of variables on the basis of entropy measures for evaluating the information gain (IG) associated with attribute interactions [36]. The patterns of entropy recapitulate the main and/or interaction effect for each model. The CART analysis creates a decision tree that depicts how well each genotype variable predicts patient-control status [37]. Statistical analyses were conducted using the SAS package, version 9.2 (SAS Institute, Cary, NC), and R software (version 2.15.0). All statistical tests were two-sided.
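As an illustration of the Hardy-Weinberg goodness-of-fit check described above, a minimal Python sketch follows. It is not the SAS procedure used in the study: the function name and the genotype counts are hypothetical, and the p-value assumes one degree of freedom, the usual choice for a biallelic SNP.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Goodness-of-fit chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, not taken from the study.
chi2, p_value = hwe_chi_square(120, 140, 49)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value in controls is what the text interprets as little evidence of genotyping error.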

Results
The basic characteristics of the study subjects are shown in Table 1. The average age and the distribution of gender were similar between ischemic stroke patients and healthy controls. Cases had a higher prevalence of hypertension, diabetes mellitus, cigarette smoking, and alcohol drinking than controls. Mean BMI, WHR, fasting glucose, LDL-C, and TG were significantly higher in cases than in controls, while HDL-C was lower in cases than in controls. To determine the contribution of SULT1E1, COMT, and ESR1 to ischemic stroke risk, we examined whether the genotypic and allelic distributions of the genes differed between the 305 cases and 309 controls (301 cases and 308 controls had complete results for all 4 SNPs). The frequencies of all 4 SNPs in the controls agreed with those expected under Hardy-Weinberg equilibrium, suggesting that genotyping error was relatively unlikely. The genotype and allelic analyses for the SNPs yielded no significant differences. However, the permutation test and test for trend showed that the COMT Val158Met polymorphism was significantly associated with young ischemic stroke susceptibility among females. The results are shown in Table 2. We also found that female patients who carried COMT Met alleles had a significantly higher risk of large artery atherosclerosis (data not shown). MDR was used to analyze gene-gene interaction models in young ischemic stroke subjects. The two- to four-way gene-gene interaction models are listed in Table 3. The combination of COMT Val158Met and SULT1E1 -64G/A exhibited the highest testing-balanced accuracy and high cross-validation consistency, especially for females. Figure 1 depicts the interaction maps of all genes based on entropy measures among individual variables for females. A strong interaction effect was found between SULT1E1 -64G/A and COMT Val158Met, with an IG value of 2.05%. The maps for all subjects and for males are shown in Figure S1.
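To make the entropy-based interaction measure more concrete, the following Python sketch computes one common definition of the information gain for a pair of attributes: the mutual information of the joint attribute with case-control status, minus the mutual information of each attribute alone. This is an assumption about the general technique, not a reproduction of the MDR software; how that software scales the reported IG percentage is not specified in the text, so the sketch returns raw bits, and all names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def interaction_information_gain(a, b, status):
    """IG(A;B;C) = I(A,B;C) - I(A;C) - I(B;C).

    Positive values suggest synergy (interaction) between attributes A and B
    with respect to the class C; negative values suggest redundancy.
    """
    joint = list(zip(a, b))
    return (mutual_information(joint, status)
            - mutual_information(a, status)
            - mutual_information(b, status))


# Toy data: status depends only on the combination of the two attributes.
snp_a = [0, 0, 1, 1]
snp_b = [0, 1, 0, 1]
status = [0, 1, 1, 0]
print(interaction_information_gain(snp_a, snp_b, status))
```

In the toy data neither attribute is informative alone, so all of the information about status comes from the pair, which is the pattern an interaction map is meant to highlight.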
In the CART analysis, the initial split of the root node was COMT Val158Met, indicating that COMT was the strongest risk factor for young ischemic stroke among all the SNPs. Further inspection of the classification tree structure suggested distinct interaction patterns for subjects with the Val/Val genotype and those with the Val/Met and Met/Met genotypes. Among participants with the Val/Val genotype, SULT1E1 -64G/A was the strongest risk factor, and the combination of the COMT Val/Val genotype and the SULT1E1 A/A genotype exhibited the highest risk of ischemic stroke, with patients making up 73.5% of this group (OR, 6.57; 95% CI, 2.55-16.94; p < 0.0001) (Figure 2 and Table 4). In addition, the logistic regression also showed a two-way interaction (Table 4; for all subjects, p = 0.0171; for females, p = 0.0473). We further analyzed the interaction effect between serum estradiol level and the COMT and SULT1E1 genes on young ischemic stroke (Table 5). Relative to the reference group of subjects with a high estradiol level who carried the COMT Val/Val genotype and the SULT1E1 G allele, those whose estradiol level was low and who carried either the COMT Met allele or the SULT1E1 A/A genotype had a 6.13-fold risk of ischemic stroke, showing a significant joint effect on the risk of young ischemic stroke (P for interaction = 0.0174).
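For readers less familiar with how an odds ratio and its confidence interval are formed, a minimal Python sketch of the unadjusted calculation from a 2x2 table is given here. It assumes a Wald interval on the log-odds scale; the study's reported estimates may come from adjusted logistic regression models, so the sketch is not expected to reproduce them, and the counts below are hypothetical.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    z: normal quantile (1.96 for an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from Table 4.
or_, lower, upper = odds_ratio_wald_ci(20, 10, 10, 20)
print(f"OR = {or_:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")
```

An interval of this form is symmetric around the point estimate on the log scale, so the geometric mean of its bounds equals the odds ratio; the reported interval of 2.55-16.94 has a geometric mean of about 6.57, consistent with that property.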

Discussion
To the best of our knowledge, this study is the first to examine the association between the estrogen metabolism and signaling pathway genes SULT1E1, COMT, and ESR1 and ischemic stroke of the young. The estrogen metabolism polymorphism COMT Val158Met was significantly associated with the risk of young ischemic stroke among females. Furthermore, we used multiple analytic strategies to systematically examine the interactions among these genes. Despite their different analytic strategies, the MDR and CART methods consistently showed a strong gene-gene interaction between SULT1E1 -64G/A and COMT Val158Met on the risk of young ischemic stroke. Traditional multiple logistic regression also showed a significant interaction effect between SULT1E1 -64G/A and COMT Val158Met on the development of young ischemic stroke.
Although SULT1E1, a gene encoding an estrogen-metabolizing enzyme, may contribute to individual differences in the biotransformation of this steroid hormone, no relationship between SULT1E1 -64G/A and ischemic stroke was observed in our study. Owing to the low allele frequencies of the three nonsynonymous SNPs among the 23 polymorphisms of SULT1E1 identified by Adjei et al. [19], we selected -64G/A, located in the promoter region, which might influence the estrogen sulfotransferase enzyme, as the candidate SNP in our study. We also found that subjects with the SULT1E1 -64G/A AA genotype had significantly lower serum estradiol levels than G carriers among healthy controls in our study, especially among females (Table S1). However, the controversial results concerning the association between the SULT1E1 -64G/A polymorphism and cancers might be due to the uncertain function of this variant, which might also explain the non-significant results found in this study [1,20,25].
COMT is an important enzyme in the degradation of both catecholamines and estrogens. A non-synonymous G to A base change, the COMT Val158Met polymorphism, reduces COMT activity, which may impair vascular health in several ways [38,39]. Several clinical conditions such as preeclampsia [40], hypertension [41,42], and heart disease [42,43] have been reported to be associated with this SNP. In addition, growing evidence supports that 2-methoxyestradiol (2-ME), a natural estrogen metabolite produced by COMT, has potent antiproliferative and antiangiogenic capacity [38,39] and is directly involved in redox-regulated signaling as a pro-oxidant [44]; thus it could be a possible mechanism of protection against the development of atherosclerosis. These studies therefore support our finding that subjects with the COMT Met allele had a significantly higher risk of young ischemic stroke among females after 1000 permutation tests. Estrogens influence multiple organ systems, including the cardiovascular, reproductive, and skeletal systems, by binding to specific estrogen receptors located within the nuclei of target cells [45]. Numerous epidemiological and experimental studies indicate the protective roles of estrogens in many forms of cardiovascular and cerebrovascular diseases [46,47]. However, most studies have focused on the association between ESR1 variants and cardiovascular disease, with conflicting results [27][28][29][30][48][49][50][51][52], which might be due to differences in study design. Although our findings, including genotype and haplotype analyses (Table S2 and Table S3), reveal no statistically significant risk for ischemic stroke, a gene-environment interaction effect between the ESR1 C-A haplotype and serum estradiol level on young ischemic stroke was observed (P for interaction = 0.0348, Table S4).
The possible mechanism might be that the transcription factor ERα interacts directly with specific promoter sequences comprising 15-bp inverted palindromes, known as estrogen response elements (EREs), located in the regulatory region of target genes, via binding of 17α-estradiol to its classical receptor ERα [53].
In the present study, the power and the possibility of false positives must be considered. For the relevant range of minor allele frequencies (22-38%), a post hoc power calculation using the Power and Sample Size Program (version 3.0.43) indicated nearly 80% power to detect an odds ratio (OR) of 1.6 [54]. In addition, multiple testing is a major concern of this study. Genotype and allelic analyses for each of the 4 SNPs yielded no significant association with ischemic stroke of the young after the Bonferroni correction. Because the excessively conservative Bonferroni correction may reduce power, we also performed a test based on 10,000 random permutations, in which the association between the risk of young ischemic stroke and COMT Val158Met among females remained significant.
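The contrast between a Bonferroni threshold and a permutation-based p-value can be illustrated with a short Python sketch. Several choices here are assumptions made for illustration rather than the study's procedure: the test statistic (absolute difference in mean risk-allele count between cases and controls), the 0.05 significance level, the function names, and the toy data.

```python
import random


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


def permutation_p_value(allele_counts, is_case, n_perm=10000, seed=0):
    """Permutation p-value for a genotype-status association.

    allele_counts: risk-allele count (0, 1, or 2) per subject
    is_case: True for cases, False for controls
    The statistic (an assumption of this sketch) is the absolute difference
    in mean allele count between cases and controls.
    """
    rng = random.Random(seed)

    def statistic(labels):
        cases = [g for g, y in zip(allele_counts, labels) if y]
        controls = [g for g, y in zip(allele_counts, labels) if not y]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = statistic(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)                    # break any genotype-status link
        if statistic(labels) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data, not taken from the study.
genotypes = [2] * 10 + [0] * 10
status = [True] * 10 + [False] * 10
print(bonferroni_threshold(0.05, 4))
print(permutation_p_value(genotypes, status, n_perm=2000))
```

With 4 SNPs and an assumed level of 0.05, the Bonferroni threshold is 0.0125 per test, whereas the permutation approach estimates the null distribution of the statistic directly from the data.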
A major strength of our study is that gene-gene interactions were consistently identified by both MDR and CART analyses. The results were also confirmed by the logistic regression approach when controlling for confounding variables simultaneously. By converting multiple variables into a single variable, the MDR method improves statistical power and allows efficient identification of potential gene-gene interactions in relatively small samples [55]. In addition, MDR reduces the chance of type I errors arising from multiple testing through its cross-validation and permutation testing procedures. The CART analysis is a nonparametric, decision tree-based data mining strategy for identifying specific combinations of genetic factors related to disease, and it requires no assumption of a genetic model. Recent research has suggested that utilizing multiple complementary analytic approaches can increase the statistical power to identify possible gene-gene interactions [56].
This study has some limitations. First, the sample size is relatively small because of the difficulty of enrolling young ischemic stroke patients. Thus, further studies in larger populations are required to validate the findings. Second, because of a limited budget, we used a candidate approach to select SNPs, focusing on functional variants. Compared with more advanced genome-wide association studies, we may have missed some signals that were not genotyped in the current study. Nevertheless, we cannot rule out causal markers in the genes we studied. Finally, menstrual status was acquired only for some subjects when the estradiol level was measured.
Therefore, misclassification may have occurred when females who were in the ovulation stage were included in the high estradiol level group used as the reference. However, this misclassification is non-differential and would tend to dilute the odds ratio and bias the result toward the null.
In conclusion, these data indicate that the COMT Val158Met polymorphism is significantly associated with ischemic stroke risk among females and suggest that the gene-gene interaction between the SULT1E1 -64G/A and COMT Val158Met polymorphisms plays a more important role than the individual factors in the development of young ischemic stroke. Moreover, a lower estradiol level could increase the risk of young ischemic stroke for those who carry either COMT or SULT1E1 risk genotypes.