Gender Differences in Genetic Risk Profiles for Cardiovascular Disease

Background Cardiovascular disease (CVD) incidence, complications and burden differ markedly between women and men. Although there is variation in the distribution of lifestyle factors between the genders, they do not fully explain the differences in CVD incidence and suggest the existence of gender-specific genetic risk factors. We aimed to estimate whether the genetic risk profiles of coronary heart disease (CHD), ischemic stroke and the composite end-point of CVD differ between the genders. Methodology/Principal Findings We studied, in two Finnish population cohorts and using the case-cohort design, the association between common variation in 46 candidate genes and CHD, ischemic stroke, CVD, and CVD-related quantitative risk factors. We analyzed men and women jointly and also conducted genotype-gender interaction analysis. Several allelic variants conferred disease risk for men and women jointly, including rs1801020 in coagulation factor XII (HR = 1.31 (1.08–1.60) for CVD, uncorrected p = 0.006, multiplicative model). Variant rs11673407 in the fucosyltransferase 3 gene was strongly associated with waist/hip ratio (uncorrected p = 0.00005) in joint analysis. In interaction analysis we found statistical evidence of variant-gender interaction conferring risk of CHD and CVD: rs3742264 in the carboxypeptidase B2 gene, p(interaction) = 0.009 for CHD, and rs2774279 in the upstream stimulatory factor 1 gene, p(interaction) = 0.007 for CHD and CVD, showed strong association in women but not in men, while rs2069840 in the interleukin 6 gene, p(interaction) = 0.004 for CVD, showed strong association in men but not in women (uncorrected p-values). Also, two variants in the selenoprotein S gene conferred risk for ischemic stroke in women, p(interaction) = 0.003 and 0.007. Importantly, we identified a larger number of gender-specific effects for women than for men.
Conclusions/Significance A false discovery rate analysis suggests that we may expect half of the reported findings for combined gender analysis to be true positives, while at least a third of the reported genotype-gender interaction results are true positives. The asymmetry in positive findings between the genders could imply that genetic risk loci for CVD are more readily detectable in women, while for men they are more confounded by environmental/lifestyle risk factors. The possible differences in genetic risk profiles between the genders should be addressed in more detail in genetic studies of CVD, and more focus on female CVD risk is also warranted in genome-wide association studies.


Introduction
According to world statistics for 2006, cardiovascular diseases (CVD) are responsible for 30% of all deaths globally, and are the leading cause of death amongst non-communicable diseases. Cardiovascular diseases are also responsible for 10% of the global burden of disease [1]. Differences in CVD incidence, complications and burden exist between men and women. Women are afflicted with cardiovascular disease at an older age than men, and many risk variables for coronary heart disease (CHD) and stroke have different distributions in men and women [2][3][4][5][6][7]. However, the differences in lifestyle factors do not fully explain the differences in CVD incidence between the genders [2]. Genetic factors also contribute to CHD and stroke susceptibility [8][9][10]. A recent large population-based prospective study suggested that heritability of ischemic stroke was greater in women than men [11]. Some of the traditional CVD risk factors also have high heritability [10], some of which show gender differences [12]. A large scale study of CVD traits in a Sardinian population showed that for several traits in which heritability estimates differed by gender, for example weight and hip circumference, the heritability was larger among women [13]. The evidence for gender differences in trait heritabilities implies possible gender-gene interaction in the etiology of these traits [12].
The effect of genetic variables on CHD and ischemic stroke has been studied for several decades, yet there are only a few consistent risk factors identified to date [10,12,14-19]. These genetic studies include a few large-scale candidate gene studies, as well as numerous smaller studies, and very recently several genome-wide association studies. Most of the large-scale candidate gene studies published so far on CHD or stroke have performed combined analyses of both genders, using gender as a covariate [20][21][22][23][24][25]. In a Japanese case-control study of myocardial infarction, men and women were analyzed separately, and the significant results obtained for men and women were for different variants [26], indicating different genetic risk factors. In a large-scale genetic association study of the metabolic syndrome among CHD patients, McCarthy and colleagues identified several variants which displayed significant genotype-gender interaction [27]. In recent genome-wide association studies of CHD [14-17,19] and ischemic stroke [28], the association results were reported for the combined study sample of both genders.
We estimated the effect of genetic variation on CHD, ischemic stroke and the composite end-point of CVD in two prospectively followed population cohorts. Our study had a case-cohort design on the FINRISK-92 and -97 cohorts participating in the MORGAM Project [29]. We selected 46 genes for study as putatively involved in cardiovascular pathobiology, based on their function, previous association with cardiovascular disease, and/or relevant phenotype in animal models. These genes represent a selected array of pathways, including lipid and energy metabolism, inflammation, coagulation, and thrombosis. We assessed the risk associated with common variation in each gene and CHD, ischemic stroke, and CVD while the cohort setting allowed us to control for classic CVD risk factors. We also assessed whether the variants affect relevant quantitative traits that are related to CVD risk: lipid and C-reactive protein (CRP) levels, blood pressure, body mass index (BMI), and waist/hip ratio (WHR). Our previous analysis of candidate genes like upstream stimulatory factor 1 (USF1) and Selenoprotein S (SEPS1, SELS, or SELENOS) mainly showed genetic effects in women [30,31]. In this study, we therefore proceeded with a formal genotype-gender interaction analysis for all variants, and show that for several of the associated variants, there is evidence for statistical interaction between gender and genotype.

FINRISK cohort description
FINRISK surveys are carried out every 5 years to assess the prevalence and risk factors of CVD in Finland [32]. Baseline information on all randomly sampled individuals includes anthropometric measurements, serum lipids, blood pressure and questionnaire data on CVD risk factors. Information on fatal and non-fatal coronary and stroke events and all-cause mortality during the follow-up period is obtained from national registers. We utilized the FINRISK-92 cohort (n = 5999) and FINRISK-97 cohort (n = 8141), which have been followed up for 10 and 7 years, respectively. On these large cohorts, we conducted a case-cohort study, as previously described in detail [29,31,33-35]. The cohorts constituted respondents to surveys of independent random samples of the same geographically defined population. The resulting few overlaps were identified on the basis of personal ID codes, unique to every resident of Finland, and removed from the FINRISK-97 case-cohort set to ensure there was no overlap between the sets used for the analyses.
We initially studied the FINRISK-92 case-cohort set, which consisted of a total of 190 incident CHD cases, 66 incident ischemic stroke cases, 219 individuals with a history of either CHD or stroke event, 276 individuals who died during the follow-up, and a random sample (sub-cohort) of 398 individuals from the cohort. We also analyzed a second case-cohort set selected from the FINRISK-97 cohort, for genes associated with risk for CHD, ischemic stroke, the composite end-point of CVD or all-cause mortality, or strongly associated with quantitative traits in the FINRISK-92 case-cohort set. This sample included 210 incident CHD cases, 84 incident stroke cases, 436 individuals with a history of either CHD or stroke event, 352 individuals who died during the follow-up, and 407 sub-cohort individuals. The sub-cohort was a sex- and geographic-region-stratified random sample, drawn from each of the original cohorts with unequal sampling probabilities so that the age distribution was similar to the cases. The selection procedure for the cases and the sub-cohort, and the exact diagnostic criteria used for CHD and ischemic stroke have been described in detail previously [29,31,33-35]. The case-cohort sets included in this study are described in Tables 1 and 2. All participants gave informed consent. In 1992 it was not yet customary to ask for written consent, thus only oral informed consent exists for that survey. In 1997 written informed consent was obtained from all survey participants. The law governing the National Public Health Institute of Finland permits the Institute to also use the samples from the 1992 survey for public health research. The study was approved by the Ethics Committee of the National Public Health Institute of Finland and conformed to the principles expressed in the Declaration of Helsinki.

Quality control of DNA samples
We implemented several quality control measures to minimize errors associated with DNA sample handling and DNA quality, and excluded a total of 19 samples chosen as cases or in the sub-cohort. These 19 individuals are not included in Table 1. A gender-specific PCR test identified a total of 9 samples (0.4%) that had a different gender than expected, and they were subsequently excluded from the study. We also verified that the DNA sample was of good quality by testing five highly polymorphic microsatellite markers for each sample. In these analyses, one sample was found to be contaminated and was excluded. DNA samples with low DNA yield (<7.5 µg of genomic DNA) as measured by fluorescent label PicoGreen (Invitrogen, Carlsbad, CA, USA) were subjected to whole genome amplification before genotyping, followed by additional quality control checks [36]. A total of five samples were excluded due to biased whole genome amplification, and a further 4 samples were excluded due to extremely low quantities of DNA, which were insufficient for whole genome amplification.

Variant selection
For each gene, we aimed to genotype a set of variants that would capture the common variation present in the gene, as well as variants that have been previously associated with CVD or related traits. For the majority of the genes, haplotype-tagging single nucleotide polymorphism (SNP) variants were selected from the SeattleSNPs database (http://pga.gs.washington.edu/). The SeattleSNPs project has resequenced the genes using 24 Centre d'Etude du Polymorphisme Humain DNA samples, and tag SNPs have been selected using LDSelect, an algorithm that is based on the linkage disequilibrium (LD) statistic r² [37]. We selected tag SNPs from each multi-SNP bin with a frequency >10%. For genes that were not included in the SeattleSNPs sequencing project, we selected variants from public databases (Celera, dbSNP), at approximately 5 kb distance from one another, giving priority to variants with known frequency information. Once HapMap phase I data were available, we selected additional variants to better capture the common variation in these genes. More detailed information about gene cladistics, sequence and haplotype structure was available for the apolipoprotein E (APOE), lactase (LCT), and lipin 1 (LPIN1) genes, and here variant selection was based on previously published sequencing and haplotype analysis [38][39][40][41]. A full list of the variants selected for study and successfully genotyped (see below) is provided in Table S1.
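The LD statistic r² that underlies the tag SNP selection can be illustrated with a minimal sketch. It uses the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)), with D = pAB − pA·pB. The function name and the example frequencies are hypothetical and are not taken from the study data or from LDSelect itself.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Return r^2 between two bi-allelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical frequencies: the two alleles always travel together ...
print(ld_r_squared(0.30, 0.30, 0.30))
# ... versus alleles that are statistically independent
print(ld_r_squared(0.09, 0.30, 0.30))
```

Two variants with r² close to 1 carry nearly the same information, which is why a single tag SNP per bin is sufficient to represent the bin.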

Variant genotyping
Variant genotyping was done using several genotyping platforms (Table S1). Approximately 5.5% of the genotypes were created with an in-house developed method of allele-specific primer extension on microarrays, as previously described [36]. Approximately 93.0% of the genotypes were produced with the MassARRAY System (Sequenom, San Diego, CA, USA), either with the homogeneous Mass Extension (hME) reaction or iPLEX reaction, using the protocols recommended by the manufacturer with these modifications: hME reactions were carried out with 5-7.5 ng of DNA and for the majority of the variants, the hME extension reaction was run using TERMIPol DNA polymerase (Solis Biodyne OÜ, Tartu, Estonia) [42] instead of Thermo-Sequenase (GE Healthcare Life Sciences, Chalfont St. Giles, UK). The two APOE variants that define the epsilon genotypes (rs429358 and rs7412) were genotyped on the MassARRAY with a modified protocol as previously described [43] (full protocol available from authors upon request). Three of the variants were genotyped with other platforms: rs4340 was genotyped by a PCR assay followed by separation on 2% agarose gel with ethidium bromide staining, and rs28665122 and rs3216183 were genotyped with TaqMan (Applied Biosystems, Foster City, CA) [30,44]. For 100 samples where an inadequate amount of genomic DNA was available, the DNA was amplified with the GenomiPhi DNA amplification kit (GE Healthcare Life Sciences), as previously described [36].
Before genotyping the FINRISK case-cohort samples we genotyped all variants on 60 anonymous Finnish trio samples and 180 unrelated control samples. The FINRISK samples were genotyped in plates containing 2% negative control samples, 2% known duplicate samples, and 5% blind duplicate samples to allow assessment of genotyping quality. The disease status of each individual genotyped was unknown to the genotyping laboratory and samples from cases and sub-cohort individuals were distributed on the plates independently of the disease status. All genotypes were manually reviewed for various quality control aspects as previously described [36,42,45]. The genotyping success rate for each variant included in the analysis was >90%, with an average genotyping success of 95.3%. Among the 27,522 successful blind duplicate genotypic pairs, we detected 37 genotypic inconsistencies (99.87% concordance between genotypes). All variants included in analyses were in Hardy-Weinberg equilibrium (HWE) in the sub-cohort sample (p>0.01). A single Mendelian error was identified for 3 variants among the 60 trio samples (rs1926446, rs3212478, and rs1081106). However, since the genotypes for these variants were in HWE and no errors were detected among known and blind duplicates, these variants were included in the analysis.
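The HWE check for a bi-allelic variant (a Pearson chi-square test with 1 degree of freedom, as specified under Statistical analysis) can be sketched as follows. This is an illustrative sketch only: the function name and the genotype counts are hypothetical, and it does not cover the exact test that the study applied when a genotype group had fewer than 5 individuals.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for HWE at a bi-allelic variant (1 df).

    Takes observed genotype counts and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    if p <= 0 or q <= 0:
        raise ValueError("variant is monomorphic; HWE test is undefined")
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of the chi-square distribution with 1 df:
    # P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical sub-cohort genotype counts
chi2, p_value = hwe_chi_square(210, 420, 175)
print(chi2, p_value, "passes HWE filter" if p_value > 0.01 else "fails HWE filter")
```

With the threshold used in the study, a variant would be retained when the returned p-value exceeds 0.01.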

Statistical analysis
Genotype frequencies in sub-cohort individuals were tested for deviation from HWE using Pearson's chi-square test statistics with 1 degree of freedom for bi-allelic variants and 3 for three-allelic variants, applying a threshold of p<0.01. For variants in which one of the genotype groups had fewer than 5 individuals, HWE was calculated using an exact test. Allele segregation within trio families was analyzed with the PedCheck program [46]. Pair-wise LD between the variants in each gene, haplotype frequencies, and haplotype tags were assessed with Haploview software version 3.32 [47]. For variants in high LD with each other (r² > 0.95), only one of the results is shown. Time-to-event analysis was used to assess whether any of the tested allelic variants had an effect on the incidence of CHD, ischemic stroke, or CVD. The effects under recessive, dominant and multiplicative models of individual variants were tested using the proportional hazards regression model where the case-cohort design was taken into account by applying a modification of the Prentice weighting [48], with the non-case sub-cohort members and sub-cohort cases before events weighted with the inverses of their individual inclusion probabilities to account for the oversampling of cases [34]. Estimation of model parameters and standard errors was carried out in the R statistical environment, using the coxph function of the package survival and its robust variance estimator.
We adjusted for classic CVD risk factors: smoking, high density lipoprotein-cholesterol (HDL-C), non-HDL-cholesterol, history of diabetes, BMI, and hypertension, as well as geographic region (western Finland, northern Finland, and eastern Finland), and cohort (and gender for combined analysis in women and men). Age was used in the models as the time scale. We fitted two types of models. In the first model, men and women from both cohorts were analyzed jointly, as described above. In the second model, we carried out a test for genotype-gender interaction, defined as a departure from multiplicative, dominant or recessive model, using similar regression models and testing the null hypothesis of equality of genotype effect parameters between men and women. We report results in which the variant genotype specific p-value is ≤0.01 for either men or women. We verified that these results do not stem from a single cohort by testing the null hypothesis of equality of genotype effect parameters between FINRISK-92 and FINRISK-97 cohorts, using a similar regression model. For variants that conferred a risk at p<0.05 for CHD, we also studied the association in prevalent CHD cases (documented or self-reported myocardial infarction or unstable angina pectoris at baseline), using healthy sub-cohort subjects as controls. The analysis of prevalent cases was carried out using logistic regression, again with inverse sampling probability weighting, and using age, cohort and geographic region, and gender as covariates for the combined analysis of men and women. Analysis of haplotype effects was done for two variants of the F12 gene that were not in very high LD with each other and were both associated at p<0.01 with CHD and CVD.
Haplotype analysis was done with an additive model, in which the common haplotype (containing the 'non-risk' alleles) was used as reference, and modeling an additive effect for the other haplotypes, in a weighted Cox proportional hazards model, applying the same weighting scheme and covariates that were used for single variant analysis, and using the PHREG procedure implemented in SAS version 9.1.3 SP4. Haplotype uncertainty was taken into account using multiple imputations, where a sample of haplotypes was obtained using Phase 2.1.1 software and the analysis was repeated for each sampled haplotype pair.
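The idea behind the genotype-gender interaction test, namely testing equality of the genotype effect parameters between men and women, can be illustrated with a simplified Wald-type comparison of two log hazard ratios. This is a sketch under stated assumptions and not the regression-based test fitted in the study: it assumes the two estimates come from independent strata, and the function name and numerical inputs are hypothetical.

```python
import math


def interaction_z_test(beta_women, se_women, beta_men, se_men):
    """Wald test of H0: beta_women == beta_men.

    Assumes the two log hazard ratios are estimated independently,
    so the variance of their difference is the sum of the variances.
    Returns (z, two_sided_p).
    """
    diff = beta_women - beta_men
    se_diff = math.sqrt(se_women ** 2 + se_men ** 2)
    z = diff / se_diff
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical example: HR = 1.8 in women and HR = 1.0 in men
z, p = interaction_z_test(math.log(1.8), 0.20, math.log(1.0), 0.15)
print(z, p)
```

Because the statistic depends on the signed difference of the two coefficients, it responds both to effects of opposite direction and to effects of the same direction but different size.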
Additionally, we tested whether allelic variants were associated with quantitative traits measured at baseline in sub-cohort individuals without a history of CVD. The lipid variables studied were: serum total cholesterol, HDL-C, triglycerides, and low density lipoprotein-cholesterol (LDL-C). LDL-C was calculated from measured values of total cholesterol, HDL-C and triglycerides using Friedewald's formula and excluding individuals with triglyceride value >4.0 mmol/l. Additional variables studied were mean blood pressure (average of systolic and diastolic blood pressure, each value based on two subsequent measurements), high sensitivity CRP, BMI, and WHR. Association of the variants with baseline measurements was tested using standard linear regression, employing additive, dominant, and recessive models, while adjusting for cohort, age, geographic region, and gender. Tests for genotype-gender interaction, defined as a departure from additive, dominant or recessive model, were carried out using similar regression models and testing the null hypothesis of equality of genotype effect parameters between men and women. Individuals using lipid lowering medication were excluded from the analyses of lipid variables, and individuals using drugs for hypertension were excluded from the analysis of blood pressure. We used logarithmic transformation for CRP and triglycerides. We verified that the results reported do not stem from a single cohort by testing the null hypothesis of equality of genotype effect parameters between FINRISK-92 and FINRISK-97 cohorts, using a similar regression model.
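The LDL-C calculation described above can be sketched as follows, assuming the standard mmol/l form of Friedewald's formula, LDL-C = total cholesterol − HDL-C − triglycerides/2.2. The divisor 2.2 is the conventional value for mmol/l units and is not stated in the text; the function name and example values are hypothetical.

```python
def friedewald_ldl(total_chol, hdl, triglycerides, tg_limit=4.0):
    """Estimate LDL-C (mmol/l) with Friedewald's formula.

    All inputs are in mmol/l. Returns None when triglycerides exceed
    tg_limit, mirroring the exclusion of individuals with
    triglycerides > 4.0 mmol/l.
    """
    if triglycerides > tg_limit:
        return None
    return total_chol - hdl - triglycerides / 2.2


# Hypothetical individual
print(friedewald_ldl(5.5, 1.4, 1.1))
# Excluded because triglycerides are above 4.0 mmol/l
print(friedewald_ldl(6.0, 1.0, 4.5))
```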
For genes in which two or more variants (not in perfect LD) were associated at p<0.01 with a given quantitative trait, we also performed haplotype analysis to discern which allelic haplotype might be contributing to variation in the trait. Haplotype tagging variants were identified with the Haploview software version 3.32 using default settings. Analyses with the haplotype-tagging variants were performed with the haplo.stats package of the R statistical software [49], using the function haplo.glm with an additive model, and adjusting for age, cohort, geographic region and gender. The haplo.glm function estimates haplotype frequencies with the EM algorithm and calculates for each haplotype a linear regression coefficient and p-value, comparing each haplotype to a base haplotype, defined as the most common haplotype. Rare haplotypes (frequency <0.05) were combined with the base haplotype for this analysis. The global p-value for haplotype effect coefficients was calculated for the null hypothesis of no effect for any haplotype.
For the initial analyses of the FINRISK-92 case-cohort alone, time-to-event analyses and quantitative trait analyses were done as previously described [30,31,33], analyzing women and men both separately and together. We did not perform formal gender-genotype interaction analysis or haplotype analysis at this stage.
In reporting the findings, we used a cut-off value of 0.01 for the p-values and reported uncorrected p-values. The cut-off value of 0.01 corresponds to posterior odds 6:1 of a finding being a true signal when we expect to see two signals among the 27 independent genes and our power is 70% (see The Wellcome Trust Case-control Consortium's 2007 paper for details) [19]. The effect of multiple testing was addressed with standard Q-Q-plots for the individual test statistics and with false discovery rate (FDR) analysis [50,51]. The tail-area FDR statistic for a group of tests can be interpreted as the expected proportion of null results given the observed test statistics. The analysis was carried out using the R package "fdrtool" [52]. The method used for power simulations is described in more detail elsewhere [34]. The reported results are for both cohorts combined, for tests of the null hypothesis of no genotype effects (or no genotype-gender interaction) at 1% significance level. While simulating genotype-gender interaction we assumed no genotype effects for men while varying the effect for women.
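The posterior odds quoted for the 0.01 cut-off can be reproduced approximately with the usual relation posterior odds = prior odds × power / significance level. The sketch below assumes that "two signals among the 27 independent genes" translates to prior odds of 2:25; that reading, and the function name, are assumptions made for illustration.

```python
def posterior_odds(expected_signals, n_tests, power, alpha):
    """Posterior odds that a result significant at level alpha is a true signal.

    prior odds = expected true signals : expected nulls
    posterior odds = prior odds * power / alpha
    """
    prior_odds = expected_signals / (n_tests - expected_signals)
    return prior_odds * power / alpha


# Two expected signals among 27 genes, 70% power, cut-off 0.01
print(posterior_odds(2, 27, 0.70, 0.01))
```

Under these assumptions the result is about 5.6, which is consistent with the roughly 6:1 odds stated above.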

Study outline
The case-cohort sets from the FINRISK-92 (10 year follow up, 57,858 person-years) and FINRISK-97 (7 year follow up, 54,577 person-years) population cohorts [31] are presented in Tables 1 and 2. The list of genes and the number of variants successfully genotyped for each gene are presented in Table 3, and detailed information on all variants is presented in Table S1. In addition to known CVD candidate genes, we explored the effect of variation in the LCT gene on CVD risk and CVD related quantitative traits, because of previous findings of reduced triglyceride and cholesterol values in individuals with lactose malabsorption [53,54]. We also studied one novel gene, apolipoprotein B mRNA editing enzyme (APOBEC2), which is located directly under a linkage peak (lod score of 4.44) for total cholesterol in our linkage study of 5775 individuals from twin families from the GenomEUtwin project (www.genomeutwin.org). Individual results of the analysis for several of the genes have already been published: USF1, thrombomodulin (THBD), SEPS1, coagulation factor V (F5), protein C (PROC), and intercellular adhesion molecule 1 (ICAM1) [30,31,33,44]. We include these genes here to provide a more complete picture of the observed difference in genetic susceptibility between men and women, and because formal genotype-gender interaction analysis was not reported for any of the genes in our previous publications.
The study outline is presented in Figure 1. Initially, we studied the 46 genes in the FINRISK-92 case-cohort set. We selected for further study in the FINRISK-97 sample 27 genes in which one or more variants showed an association with CHD, ischemic stroke, CVD, total mortality, or any of the quantitative traits in the FINRISK-92 cohort, either in women or men separately, or in combined analyses. The selection criterion was 60% FDR. A total of 172 variants were thus typed also in the FINRISK-97 case-cohort samples, as indicated in Table 3 and Table S1, and analyzed using the combined FINRISK-92 and FINRISK-97 case-cohort sets. Power simulations are presented in Figures S1 and S2. For time-to-event analysis (Figure S1), the combination of the two cohorts has 88% power to detect a dominant gene main effect on CVD risk of 1.8 in men at p = 0.01, 39% power to detect a similar effect in women, and 96% power to detect this effect size when analyzing women and men together, given a risk allele frequency of 0.2 and assuming a proportional hazards model. For a higher allele frequency the power is somewhat higher. For gene-gender interaction analysis, our study sample has power to detect only large differences in risk effects at p = 0.01, for example 38% power to detect a difference of HR = 1.0 versus HR = 1.8 for allele frequency of 0.4. For quantitative traits (Figure S2), combining both cohorts provides a power of 75% for detecting a 0.3 standard deviation difference at allele frequency of 0.2 in men at p = 0.01, while the power is much lower for the smaller study sample of women. For gene-gender interaction analyses the power is >85% only for large differences in the effects, for example no effect in men and a coefficient of 0.6 in women.

Time-to-event analysis results
Analysis of both genders jointly. Time-to-event analysis was used to assess the association between variants and CHD, ischemic stroke and the composite end-point of CVD. Results with p≤0.01 from combined analysis of both cohorts and both genders are shown in Table 4. The estimated FDR for the set of all association tests (including tests for quantitative traits) with p≤0.01 is 53%. These analyses identified variants in angiotensin II receptor type 1 (AGTR1), APOE, carboxypeptidase B2 (CPB2), and coagulation factor XII (F12) as conferring risk of CHD. The two variants of the F12 gene also conferred risk of CVD, as did one variant of the fibrinogen alpha chain (FGA) gene. Haplotype analysis for the two F12 variants, rs4976691 and rs1801020, in which carriers of the specific 'risk' haplotypes (CA, CG, or GA for rs4976691 and rs1801020, respectively) were compared to individuals homozygous for the non-risk haplotype GG, did not reveal stronger association with CHD or CVD than analysis of single variants. For ischemic stroke, only one SEPS1 variant, rs7178239, was associated at p≤0.01 in the combined analysis of both genders, but only the women contributed to this effect (see below). The most consistent result was for CHD association with the F12 variant rs1801020 (men and women combined, p = 0.005 for additive model), which also conferred risk at the p<0.05 level for CHD in both women and men when analyzed separately. The rest of the variants showed association at the p<0.05 level in only one gender. We tested whether the results were driven by only one of the cohorts by assessing genotype-cohort interaction, and observed no interaction at p<0.05, suggesting that the results are similar in both cohorts. Variant rs440446 of APOE showed association at p<0.05 also in both cohorts separately, while the rest of the variants showed association at p<0.05 in one cohort only, though a similar trend was observed in the other cohort.
Gender-genotype interactions. We performed gender-genotype interaction analysis to identify variants that showed different genetic effects in women and men. This test is sensitive to both effect direction and effect size. The variants that gave interaction p-value ≤0.01 and were associated with CHD, ischemic stroke or the composite end-point of CVD at p≤0.01 in either women or men in combined analysis of both cohorts are presented in Figure 2 and Table S2. The estimated FDR for the set of all interaction tests with p≤0.01 is 70%, but by using the additional criterion of association p-value ≤0.01 in at least one of the genders, the actual FDR is likely to be smaller. The gender-genotype interaction analysis supports our previous findings for USF1 and SEPS1 variants in which the disease risk was limited to women [30,31], providing gender-genotype interaction p-values <0.01 for the USF1 variant rs2774279 and for two SEPS1 variants, rs4965814 and rs9874. For the USF1 variant rs2774279, the results were also at p<0.05 for women in each cohort separately. Furthermore, for rs2774279 we also found evidence for association when analyzing prevalent female CHD cases in both cohorts combined (odds ratio of 1.58, 95% CI 1.04-2.40, p = 0.03). We identified variants in additional genes which showed gender-genotype interaction: CPB2 and coagulation factor XIII, A1 polypeptide (F13A1) conferred gender-specific risk in women for CHD, another variant in CPB2 conferred risk for CVD, and F5 for ischemic stroke; and for men, interleukin 6 (IL6) for CVD. The data obtained with F5 variant rs970741 are based on relatively small groups, with only 12 women incident stroke cases carrying the protective allele, and the result should be interpreted with caution. Genotype-cohort interaction analysis showed that none of the gender-specific results emerge from a strong effect in only one of the cohorts but rather both cohorts contribute to the result.
For the purpose of future meta-analyses, we provide data for all variants analyzed in both cohorts showing genotype-specific hazard ratios for men and women separately and the number of individuals and person-years in each genotype group (Tables S3a-c).
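For meta-analytic use, a reported ratio estimate and its 95% confidence interval are commonly converted back to a log-scale coefficient and standard error. The sketch below shows that conversion, assuming a symmetric Wald interval on the log scale; the function names are hypothetical. It is applied to the odds ratio reported above for rs2774279 in prevalent female CHD cases (1.58, 95% CI 1.04-2.40).

```python
import math


def log_effect_and_se(estimate, ci_low, ci_high, z_crit=1.959964):
    """Recover the log-scale effect and its standard error from a ratio
    estimate (HR or OR) and its confidence interval.

    Assumes a Wald interval that is symmetric on the log scale;
    z_crit = 1.959964 corresponds to a 95% interval.
    """
    beta = math.log(estimate)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    return beta, se


def two_sided_p(beta, se):
    """Two-sided Wald p-value for H0: beta == 0."""
    return math.erfc(abs(beta / se) / math.sqrt(2))


beta, se = log_effect_and_se(1.58, 1.04, 2.40)
print(beta, se, two_sided_p(beta, se))
```

The p-value recovered this way is close to the reported p = 0.03; small differences are expected from rounding of the published interval.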
We tested which of the variants conferring a CHD risk at p<0.05 were also associated with CHD in the prevalent cases. In addition to USF1 variant rs2774279, the T allele of variant rs2073658 of USF1 also conferred risk in both incident and prevalent female cases (HR = 1.62, 95% CI 1.04-2.52, p = 0.03 for incident cases, and odds ratio = 1.87, 95% CI 1.26-2.76, p = 0.002 for prevalent cases, additive model, T risk allele). A variant in the APOBEC2 gene, rs2395754, was associated with CHD in both prevalent and incident male cases (HR = 1.45, 95% CI 1.04-2.02, p = 0.03 for incident cases, and odds ratio = 1.43, 95% CI 1.06-1.94, p = 0.02 for prevalent cases, C allele homozygotes compared to T allele carriers).

Quantitative trait analysis results
We tested whether any of the 172 variants was associated with the CVD-related quantitative traits: total cholesterol, HDL-C, LDL-C, triglycerides, CRP, BMI, WHR and mean blood pressure, analyzing the sub-cohort individuals without a history of CVD at baseline examination. The results showing association in the combined data analyses of both genders at significance level of p<0.01 are shown in Table 5. The estimated FDR for the set of all association tests (including tests for time-to-event responses) with p≤0.01 is 53%. We identified 3 variants displaying effect differences between the cohorts using genotype-cohort interaction analysis (interaction p-value <0.05), and they were removed. APOE variant rs440446, conferring risk for CHD in time-to-event analysis (Table 4), was associated with triglyceride values, and FGA variant rs2070018 was associated with mean blood pressure, with heterozygotes having the highest blood pressure values. None of the other variants associated with CHD, ischemic stroke, or CVD at p≤0.01 in women and men combined was associated at p<0.01 with the quantitative traits tested here. However, we identified several interesting associations with each of the traits studied, as discussed below.
The strongest association identified for quantitative traits in the combined analysis of women and men was between the fucosyltransferase 3 (FUT3) variant rs11673407 and WHR. For men the additive model gave a p-value of 0.00006; for women the association was weaker, but in the same direction (p = 0.07). Haplotype analysis for WHR in men using the FUT3 variants rs874232, rs778986, and rs11673407 identified haplotype CAG as the only one associated with WHR, compared to base haplotype TAA (p = 0.00008) (Table S4a), suggesting that the true causal variant is not one of these 3 variants. Another strong association was found between a rare synonymous CRP variant, rs1800947, and CRP levels in men (p = 0.0001, recessive model).
The LCT variants were associated with total cholesterol and LDL-C in the combined data: the lactase non-persistence genotype (defined as minor allele homozygotes for variant rs4988235) was associated with higher cholesterol values. As with the FUT3 variant, the association was stronger for men (for total cholesterol, p = 0.003 and p = 0.005 for variants rs4988235 and rs6719488, respectively, and for LDL-C, p = 0.002 and p = 0.0005 for variants rs4988235 and rs6719488, respectively), and in women the association was weaker but in the same direction. Haplotype analysis in men using the 3 haplotype-tagging variants rs2304371, rs6719488, and rs4988235 implied that haplotype GGG, tagged by the G allele of variant rs2304371, was the one associated with both traits, p = 0.003 for total cholesterol and p = 0.005 for LDL-C (compared to base haplotype ATA) (Table S4b). Sub-cohort men homozygous for the G allele of rs2304371 had the highest LDL-C values, 4.02 mmol/l (n = 14), compared to 3.74 mmol/l for the GA genotype (n = 109) and 3.55 mmol/l for the AA genotype (n = 243), p = 0.014 for the additive model. Variants rs6719488 and rs2304371 are located in the LCT gene itself, while the lactase non-persistence variant is located 14 kb upstream of the LCT gene. The LCT locus on chromosome 2q21.3 is known to have been under strong selection during human evolution, with the lactase persistence allele varying in frequency between populations and even between geographic regions [55]. We observed no differences in allele frequencies of the lactase persistence genotype between the geographic regions studied here (G allele frequency 0.46 in Western Finland and 0.44 in Eastern Finland).
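To make the additive model concrete, the genotype counts and mean LDL-C values reported above for rs2304371 in sub-cohort men can be turned into a G allele frequency and an unadjusted per-allele slope. This is an illustrative sketch only: it uses the three group means from the text, ignores the covariate adjustment used in the study, and all variable names are hypothetical.

```python
# rs2304371 in sub-cohort men, as reported in the text:
# (number of G alleles, number of men, mean LDL-C in mmol/l)
groups = [
    (2, 14, 4.02),   # GG
    (1, 109, 3.74),  # GA
    (0, 243, 3.55),  # AA
]

n_total = sum(n for _, n, _ in groups)

# G allele frequency: each man carries two alleles.
g_freq = sum(dose * n for dose, n, _ in groups) / (2 * n_total)

# Unadjusted additive model: least-squares slope of LDL-C on G allele count.
# With only group means available, weighting each mean by its group size
# gives the same slope as an individual-level simple regression.
mean_dose = sum(dose * n for dose, n, _ in groups) / n_total
mean_ldl = sum(n * y for _, n, y in groups) / n_total
s_xy = sum(n * (dose - mean_dose) * (y - mean_ldl) for dose, n, y in groups)
s_xx = sum(n * (dose - mean_dose) ** 2 for dose, n, _ in groups)
per_allele_slope = s_xy / s_xx

print(f"G allele frequency: {g_freq:.3f}")
print(f"Unadjusted per-allele LDL-C difference: {per_allele_slope:.3f} mmol/l")
```

Under these assumptions the G allele frequency is about 0.19 and each additional G allele corresponds to roughly 0.21 mmol/l higher LDL-C. The p-value of the additive model cannot be reproduced from the group means alone, because it also depends on the within-genotype variance.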
Variants that showed different effects on CVD-related quantitative traits in women and men are shown in Table 6, using an interaction p-value cut-off of ≤ 0.01 and an association cut-off of p < 0.01 in either women or men, in the combined analysis of both cohorts. The estimated FDR for the set of all interaction tests with p ≤ 0.01 is 70%, but the additional criterion of an association p-value < 0.01 in at least one of the genders makes the actual FDR smaller than this upper limit of 70%. As with disease risk, variants in different genes were associated with the traits in women and in men. In women, variants in the fibrinogen genes (FGA and FGG) were associated with HDL-C. Interestingly, none of the genes in lipid pathways were associated with lipid variables in women at p < 0.01. For weight-related variables, variants showing a gender-specific effect were identified only in women. USF1 variant rs2774279, which was associated with CHD and CVD risk, was also associated with BMI in women, though risk allele carriers had lower BMI. Women with the risk allele also had lower values of CRP. Three variants in the ICAM1 gene were associated with WHR in women. Haplotype analysis did not reveal any ICAM1 haplotypes associated more strongly with the trait than single alleles. The largest number of gender-genotype interactions was identified for CRP levels in women.
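The text above does not spell out how the FDR was estimated. One simple and commonly used estimate, shown here purely as an assumption, divides the number of false positives expected if every null hypothesis were true (number of tests times the p-value cut-off) by the number of tests actually observed below the cut-off. Because it treats all tests as null, it behaves as an upper limit, which is in line with the description of 70% as an upper bound. The function name and the counts in the example are hypothetical and are not the study's numbers.

```python
def estimated_fdr_upper_bound(n_tests, alpha, n_significant):
    """Simple FDR estimate: expected null hits divided by observed hits.

    Assumes every test is a true null (conservative), so the result is
    an upper bound on the estimated proportion of false discoveries.
    """
    if n_significant == 0:
        return 0.0  # no discoveries, so no false discoveries
    expected_false = n_tests * alpha
    return min(1.0, expected_false / n_significant)

# Hypothetical illustration: 1000 tests, cut-off 0.01, 25 results below it.
example_fdr = estimated_fdr_upper_bound(n_tests=1000, alpha=0.01, n_significant=25)
print(f"Estimated FDR upper bound: {example_fdr:.2f}")
```

Requiring a second criterion, such as an association p-value < 0.01 in at least one gender, reduces the set of reported findings to those with additional support, which is why the effective FDR among reported results can be lower than this bound.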
For men, the APOBEC2 variant rs2395754, which was associated with CHD in both incident and prevalent cases, was also associated with cholesterol variables. Men carrying the risk allele had higher levels of LDL-C, p = 0.001. In men, a variant in the serpin peptidase inhibitor, clade E, member 1 gene was also associated with mean blood pressure. Apart from these findings, very few male-specific results at p < 0.01 were identified, as shown in Table 6. The strongest associations with lipids for men were for variants that also showed the same trend in women, as discussed above.

Discussion
The hormonal environment as well as tissue-specific gene expression is known to differ significantly between the genders in vertebrates. For many human diseases, gender-dependent differences in the progression and extent of disease have been explained by sex hormones. These hormones may differentially affect gene expression in somatic tissues, thus leading to gender-specific susceptibility to disease [56]. For cardiovascular disease too, critical determinants of gender differences are sex steroid hormones and their receptors [57]. They interact with and activate, together with other proteins, genes that are possibly involved in CVD pathogenesis in the endothelial and smooth muscle cells [2,57]. Sex steroid hormones are also expressed in the liver and regulate lipid levels, mostly through hepatic effects on lipoprotein metabolism [57].
Table 4. Results with p ≤ 0.01 for coronary heart disease and cardiovascular disease for the variants studied, analyzing men and women together.
Although women and men differ in various aspects related to CHD and ischemic stroke [2,3,4,6,7], the difference in genetic effects on disease and its risk factors between women and men remains largely unexplored territory [12]. Recent genome-wide association studies also do not address this issue [14,15,16,17,19,58]. In this candidate gene study we explored the genetic risk profiles for CHD, ischemic stroke and the composite end point of CVD in men and women, as well as the effect of the specific genetic variants on CVD-related quantitative risk factors. Our case-cohort study was based on two prospective cohorts from the relatively homogeneous Finnish population, the sub-cohort representing a random subsample of the original cohort. Detailed information on CVD risk factors recorded before the occurrence of CVD events allowed us to control for confounding factors, such as smoking, lipid levels, blood pressure and obesity, while the inclusion of two separate cohorts allowed for the verification of results. We identified variants in several genes as conferring disease risk for both men and women jointly, while other variants showed evidence for a gender-specific effect. We also identified variants that were associated with quantitative CVD risk factors in both men and women combined, and other variants that showed evidence for gender-genotype interaction. A recent review of gender differences in genetic effects has suggested three criteria for appropriately documented gender differences: (1) The genetic effect is based on the same genetic contrast in both genders; (2) Different genetic subsets in the 2 genders are not compared; and (3) Evidence for a nominally statistically significant gender-gene interaction exists [59]. 
Our study fulfils all these criteria for the genetic variants showing different effects in men and women. However, studies that replicate these results in larger study samples would be required to confirm or refute the gender-specific associations presented here. With pooling of information across the latest genome-wide association studies [14,16,17,19,28], there is ample opportunity to test for the presence of gender-genotype interactions behind CHD and ischemic stroke at a genomic level.
Figure 2. Gender-specific association between variants and coronary heart disease, ischemic stroke, and cardiovascular disease. Results for gender-genotype interaction at p < 0.05, and association in either women or men at p ≤ 0.01 (uncorrected p-values). Allele information: allele 1/allele 2, the minor allele is underlined. Multiplicative model: 11 > 12 > 22, dominant model: 11+12 vs 22, recessive model: 11 vs 12+22. Variants showing high pair-wise LD: CPB2 rs3581419 and rs3742264 (r² = 0.827), SEPS1 rs496581 and rs7178239 (r² > 0.7), and SEPS1 rs9874 and rs7178239 (r² > 0.7). Detailed information is found in Table S2. doi:10.1371/journal.pone.0003615.g002
Table 5. Results with p < 0.01 for associations between variants and quantitative traits as measured at baseline examination in sub-cohort subjects free of CVD at baseline, women and men combined.
In this study, we identified variants in CPB2, F13A1 and LPIN1 as contributing to female-specific risk for CHD and/or CVD, in addition to a variant in USF1 which we have previously reported [31]. Other variants of USF1 have also been reported as showing significant gender-genotype interaction for triglycerides and BMI in familial combined hyperlipidemia families [60]. For ischemic stroke, we identified a variant in F5 as conferring gender-specific risk, in addition to our previously reported association between SEPS1 variants and ischemic stroke in women [30]. Importantly, we identified a larger number of gender-specific effects for women than for men. For men, only one variant, in the IL6 gene, was associated with CVD at p ≤ 0.01 and interaction p < 0.01. The asymmetry in positive results is similar to a previous large-scale candidate gene study of the metabolic syndrome, in which genetic effects were stronger in women [27]. This is also consistent with the larger heritability estimates for stroke and several CVD-related traits in women [11,13]. 
These results suggest that genetic effects on CVD risk may be more readily detectable in women, while for men the genetic effects are more confounded by environmental/lifestyle risk factors.
The most consistent result we identified when analyzing women and men jointly was for a variant in the F12 gene, rs1801020. This promoter variant is located in the untranslated exon 1 of the gene, and the T allele was found to be less common in patients with acute coronary syndrome compared to patients with stable coronary artery disease [61]. In our study sample, in which the A (= T) allele was associated with risk of CHD and CVD, the study setting was very different, and therefore the results are not readily comparable. Variants in the F12 gene were not present on the Affymetrix 500K and Illumina 300K chips that have been used for the recent genome-wide association studies. The strongest association for a quantitative trait was between WHR and an intronic variant of the FUT3 gene, rs11673407. The associated variant is not one of the four variants previously associated with the Lewis blood phenotype [62] (rs778986 studied here), which have been reported to be associated with several CVD-related risk factors [63].
Two of the genes we selected for this study, LCT and APOBEC2, have not previously been associated with the molecular pathogenesis of cardiovascular disease. We found an association between LCT variants and both total and LDL cholesterol. Haplotype analysis implied that the associated variants are in the LCT gene itself, and not necessarily related to the lactase persistence variant upstream of the gene. The C allele of the exonic variant rs2304371, which was associated with the highest cholesterol values, is the ancestral allele, present in other mammals and located in a highly conserved region. We also found that a variant in APOBEC2 conferred risk of CHD in men and was associated with higher levels of LDL-C. APOBEC2 belongs to the cytidine deaminase superfamily, and is closely related to APOBEC1 [64]. APOBEC1 mediates the editing of apolipoprotein B mRNA [65]. APOBEC2 is expressed exclusively in heart and skeletal muscle [64], and its function is still largely unknown.
To summarize, we have identified several variants of relevant candidate genes that may confer risk of CHD, ischemic stroke or CVD and/or associate with quantitative CVD risk factors in a gender-specific manner, and other variants which probably confer risk in both women and men. The identified disease associations and quantitative trait associations had uncorrected p-values ≤ 0.01 for both genders combined, and on the basis of the FDR analysis we expect that half of the findings are true positives. For interaction analysis, we may expect that at least one third of the reported results are true positives. However, the FDR analysis for the interaction analysis is conservative, because it does not account for the additional criterion we used of an association p-value < 0.01 for the trait itself in either men or women. Thus, we are convinced that some of the results represent a real effect of variants on disease/trait, although they require replication in other studies. In addition, our study had low power to detect genetic effects with HR < 1.8 or coefficient < 0.3; thus, some of the variants we have studied that show no genetic effect might represent false negative results. The possible differences in genetic risk profiles between the genders should be addressed in more detail in genetic studies of CVD, and more focus on female CVD risk is also warranted in genome-wide association studies.

Figure S1 Power simulations for time-to-event analysis for risk allele frequencies of 0.2 and 0.4, combining both cohorts, using a p-value cut-off of 0.01 and assuming for interaction analysis (IA) no effect for men while testing different effect values for women. The lines connect different value points and are not interpolations. HR = hazard ratio. Found at: doi:10.1371/journal.pone.0003615.s005 (7.13 MB TIF)
Figure S2 Power simulations for quantitative trait analysis using BMI as an example, testing risk allele frequencies of 0.2 and 0.4, combining both cohorts, using a p-value cut-off of 0.01 and assuming for interaction analysis (IA) no effect for men while testing different effect values for women. The lines connect different value points and are not interpolations. Regression coefficients are given in standard deviation scale. BMI = body mass index. Found at: doi:10.1371/journal.pone.0003615.s006 (7.13 MB TIF)