Genetic Variation and Reproductive Timing: African American Women from the Population Architecture Using Genomics and Epidemiology (PAGE) Study

Age at menarche (AM) and age at natural menopause (ANM) define the boundaries of the reproductive lifespan in women. Their timing is associated with various diseases, including cancer and cardiovascular disease. Genome-wide association studies have identified several genetic variants associated with either AM or ANM in populations of women of largely European or Asian descent. The extent to which these associations generalize to diverse populations remains unknown. Therefore, we sought to replicate previously reported AM and ANM findings and to identify novel AM and ANM variants using the Metabochip (n = 161,098 SNPs) in 4,159 and 1,860 African American women, respectively, in the Women’s Health Initiative (WHI) and Atherosclerosis Risk in Communities (ARIC) studies, as part of the Population Architecture using Genomics and Epidemiology (PAGE) Study. We replicated or generalized one previously identified variant for AM, rs1361108/CENPW, and two variants for ANM, rs897798/BRSK1 and rs769450/APOE, to our African American cohort. Overall, generalization of the majority of previously identified variants for AM and ANM, including LIN28B and MCM8, was not observed in this African American sample. We identified three novel loci associated with ANM that reached significance after multiple testing correction (LDLR rs189596789, p = 5×10−08; KCNQ1 rs79972789, p = 1.9×10−07; COL4A3BP rs181686584, p = 2.9×10−07). Our most significant AM association was upstream of RSF1, a gene implicated in ovarian and breast cancers (rs11604207, p = 1.6×10−06). While most associations were identified in either AM or ANM, we did identify genes suggestively associated with both: PHACTR1 and ARHGAP42. The lack of generalization, coupled with the potentially novel associations identified here, emphasizes the need for additional genetic discovery efforts for AM and ANM in diverse populations.


Introduction
Age at menarche (AM) and age at natural menopause (ANM) are components of the reproductive lifespan in women. Timing of these reproductive milestones is associated with various diseases and cancers such as type 2 diabetes, cardiovascular disease, endometrial and breast cancers, as well as with fertility issues [1][2][3][4][5][6][7][8][9].
Both cross-sectional and longitudinal studies have shown an overall decline in age of menarche in US girls from the 1960s to the 1990s [10][11][12][13][14][15][16]. These studies have also shown clear differences in age of sexual maturation of European Americans compared to African Americans, with African American girls attaining menarche earlier than European American girls [13]. Childhood obesity, higher in African American adolescents than other groups, has been linked to the earlier timing of menarche observed compared with European Americans [13][17][18][19]. A genetic component for the timing of menarche has been investigated in numerous twin and large population studies, with heritability estimates ranging from 0.49 in the Fels Longitudinal Study to 0.72 in the Breakthrough Generations Study [11][20][21][22].
Similar to the timing of age at menarche, the age at which natural menopause occurs is affected by multiple factors [23]. Active smoking is consistently associated with earlier menopause; however, the effects of exposure to other carcinogens and endocrine disruptors have not been completely elucidated [24][25][26]. Diet and obesity are also suggested to impact the timing of natural menopause [27,28]. Based on twin studies and mother-daughter pairs, the heritability of age at natural menopause has been estimated to be between 44% and 63% [29][30][31]. Family history of the timing of these reproductive events is a strong predictor of both AM and ANM [29][30][31].
Genetic and environmental factors that determine AM and ANM have been considered in numerous studies, but many of these studies have conflicting or unreplicated results [32][33][34]. Furthermore, the majority of these studies have been performed in cohorts of largely European or Asian descent [35][36][37][38][39]. As a result, generalization of genetic associations with AM/ANM to other race/ethnicities is lacking. A recent review has noted the absence of studies with non-European-descent ethnicities and suggests expanding future studies to include other race/ethnicities [40] to identify genetic factors that influence AM/ANM across all populations. Replication of known ANM loci identified in European-descent women has been demonstrated in Hispanic women in the Women's Health Initiative (WHI); however, to our knowledge, there has been no genome-wide association study (GWAS) or generalization study published to date on AM or ANM with an African American cohort [41].
In this study, we used data from the Metabochip genotyping array to characterize previously identified variants associated with menarche and menopause in African Americans in a combined cohort of African-American women from the Women's Health Initiative (WHI) and Atherosclerosis Risk in Communities (ARIC) studies [42] as part of the Population Architecture using Genomics and Epidemiology (PAGE) Study [43]. The Metabochip array is based on the Illumina iSelect platform and contains approximately 200,000 single nucleotide polymorphisms (SNPs) consisting of GWAS index variants and fine-mapping common and less common variants for GWAS-identified regions relevant to metabolic and cardiovascular traits [43,44]. Using current GWAS and candidate gene literature as a guide, we attempted to generalize previously identified menarche and menopause SNPs and gene regions identified in European-descent populations to African Americans in the PAGE Study. We then sought to identify novel SNPs associated with AM and/or ANM.

Study Participants
Women participants from two cohorts of the PAGE Study [42], the Atherosclerosis Risk in Communities Study (ARIC) and the Women's Health Initiative (WHI), were included in these analyses. ARIC is a population-based prospective study of cardiovascular diseases and their causes in ~16,000 men and women aged 45-64 at baseline [45]. Participants were recruited in Forsyth County, NC; Jackson, MS; Minneapolis, MN; and Washington County, MD. From this group, 2,070 women, all of self-reported African American ancestry and with information on reproductive timing, were selected for study. The WHI is a long-term national health study investigating the leading causes of mortality and frailty in post-menopausal women in the United States, including heart disease, breast and colorectal cancer, and osteoporotic fractures [46]. A subset of 2,455 self-reported African American women, selected based on consent to use DNA and availability of DNA, blood lipids, and glucose and insulin measurements, were included in this study. The appropriate institutional review board at each participating study site approved all procedures, and written informed consent was obtained from all participants.

Definition of Age at Menarche and Age at Natural Menopause
Age at menarche was defined as the age in years when menstrual periods started, with extreme values pooled in groups of 9 years or less and 17 years or older. Age at natural menopause was defined as the age at which cessation of regular menstrual periods due to the body's natural aging process occurred. In ARIC, women were asked, "Was your menopause natural or the result of surgery or radiation?" Only women who indicated natural menopause were included. Women in WHI who underwent hysterectomy, oophorectomy, or hormone replacement therapy before the onset of natural menopause were excluded. In both studies, women reporting age at natural menopause <40 years were excluded; women reporting age at natural menopause >60 years were censored at age 60. All women included in the present study were post-menopausal.
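The phenotype definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the function names and the `(time, event_observed)` return convention are hypothetical, chosen to match the usual input of a time-to-event analysis.

```python
from typing import Optional, Tuple


def code_age_at_menarche(age_years: int) -> int:
    """Pool extreme values: 9 years or less -> 9, 17 years or older -> 17."""
    return max(9, min(17, age_years))


def code_age_at_natural_menopause(
    age_years: Optional[float], natural: bool
) -> Optional[Tuple[float, bool]]:
    """Return (time, event_observed), or None if the woman is excluded.

    Hypothetical convention: event_observed is False for censored records.
    """
    if age_years is None or not natural or age_years < 40:
        return None  # missing age, non-natural menopause, or ANM < 40: excluded
    if age_years > 60:
        return (60, False)  # ANM > 60: censored at age 60
    return (age_years, True)
```

Under these assumptions, a reported ANM of 62 becomes a record censored at 60, while a reported ANM of 38 is dropped from the analysis.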

Genotyping
Genotyping was performed on the Metabochip, a custom Illumina iSelect genotyping chip designed to genotype SNPs associated with metabolic traits and cardiovascular disease [43,44]. The array also includes 2,207 SNPs associated at genome-wide significance with any trait published in the NHGRI GWAS catalog as of August 1, 2009. For each of these GWAS-identified SNPs, an additional proxy SNP with r² > 0.90 in the CEU HapMap II dataset, plus up to four additional SNPs with r² > 0.5 in the YRI HapMap II dataset, were also included on the array. Lastly, SNPs selected to fine-map regions of interest related to metabolic traits, copy number variant-tagging SNPs, Major Histocompatibility Complex (MHC) SNPs, SNPs on the X and Y chromosomes, mitochondrial DNA SNPs, and "wildcard" SNPs were also targeted, for a total of approximately 200,000 SNPs. Of these, 161,098 (81.9%) passed quality control filters for tests of Hardy-Weinberg Equilibrium (p > 1×10−7) and genotyping efficiency (>95% call rate). There was no filter for minor allele frequency. The design and performance of this genotyping chip in this African American sample have been described in detail elsewhere [43].
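As a rough illustration of the two quality control filters, the sketch below keeps a SNP only if its Hardy-Weinberg p-value exceeds 1×10−7 and its call rate exceeds 95%. The record layout, the SNP identifiers, and the per-SNP values are made up for the example; no minor allele frequency filter is applied, in line with the text.

```python
HWE_MIN_P = 1e-7       # Hardy-Weinberg equilibrium threshold from the text
MIN_CALL_RATE = 0.95   # genotyping efficiency threshold from the text


def passes_qc(hwe_p: float, call_rate: float) -> bool:
    """True if a SNP clears both QC filters (no MAF filter is applied)."""
    return hwe_p > HWE_MIN_P and call_rate > MIN_CALL_RATE


# Hypothetical per-SNP summary records, for illustration only.
snps = [
    {"snp": "snp_a", "hwe_p": 0.42, "call_rate": 0.998},
    {"snp": "snp_b", "hwe_p": 3e-9, "call_rate": 0.999},  # fails HWE
    {"snp": "snp_c", "hwe_p": 0.10, "call_rate": 0.930},  # fails call rate
]

kept = [s["snp"] for s in snps if passes_qc(s["hwe_p"], s["call_rate"])]
```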

Statistical Analysis
All participants self-reported African American ancestry. To adjust for potential population stratification, we used the principal components method implemented in Eigenstrat [47]. We excluded any ancestry outliers further than eight standard deviations away from the mean for the first ten principal components determined by EIGENSOFT.
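The outlier rule can be illustrated with a single-pass simplification: flag any sample lying more than eight standard deviations from the mean on any of the first ten principal components. This is only a sketch of the stated criterion, not EIGENSOFT's own procedure; the function name and the toy scores are hypothetical.

```python
from statistics import mean, stdev


def ancestry_outliers(pcs, n_pcs=10, max_sd=8.0):
    """Return indices of samples beyond max_sd SDs on any of the first n_pcs PCs.

    pcs is a list of per-sample lists of principal component scores.
    """
    flagged = set()
    for k in range(n_pcs):
        column = [row[k] for row in pcs]
        mu, sd = mean(column), stdev(column)
        if sd == 0:
            continue  # a constant component cannot define outliers
        for i, value in enumerate(column):
            if abs(value - mu) > max_sd * sd:
                flagged.add(i)
    return flagged


# Toy data: 199 typical samples and one extreme sample on the first PC.
toy = [[1.0 if i % 2 == 0 else -1.0, 0.0] for i in range(199)]
toy.append([1000.0, 0.0])
outliers = ancestry_outliers(toy, n_pcs=2)
```

Note that with a sample standard deviation, a single point can only exceed eight SDs when the sample is reasonably large, which is why the toy data uses 200 samples.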
Linear regression was performed assuming an additive genetic model to test for associations between individual SNPs and the outcome of age at menarche in years. We examined two models for menarche: 1) a minimally adjusted model that accounted only for study site and principal components, and 2) a fully adjusted model that included study site, year of birth, principal components, and body mass index (BMI) at ascertainment, with the understanding that BMI at ascertainment may be a poor proxy for BMI at age of menarche. Age at menarche was self-reported many years later at the time of examination; such recall has been shown to be fairly accurate [48]. We studied one model for natural menopause using Cox proportional hazards regression for time-to-event (natural menopause) analysis, which adjusted for study site, principal components, and year of birth. Women with a missing age at menopause, an age at menopause <40 years, or hysterectomy, oophorectomy, or hormone replacement therapy after age 40 but prior to menopause were excluded from the study. Women who had menopause at >60 years were censored at age 60. A fixed-effects meta-analysis was then performed using METAL to obtain effect size and standard error (SE) estimates [49]. All analyses were carried out in either METAL or the R software package, and data were plotted using LocusZoom [50,51]. Statistical power to detect an expected association was estimated in Quanto [52] assuming the observed sample size and coded allele frequency in this African American cohort and the genetic effect size previously reported in the literature.
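To make the meta-analysis step concrete, the following sketch combines per-study effect estimates with inverse-variance weights, the standard fixed-effects estimator. The text does not state which weighting scheme was used in METAL, so inverse-variance weighting is an assumption here, and the per-study numbers in the usage lines are illustrative rather than taken from the study.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis.

    Returns the pooled effect, its standard error, the z statistic,
    and a two-sided p-value from the normal distribution.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Illustrative per-study estimates for one SNP (hypothetical values).
beta, se, z, p = fixed_effects_meta(betas=[-0.10, -0.06], ses=[0.04, 0.04])
```

With equal standard errors the pooled effect is the simple average of the two study estimates, and the pooled standard error shrinks by a factor of √2.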
The overall goal of the study was to test for SNPs associated with AM and/or ANM using the Metabochip in African Americans from the WHI and ARIC studies. We sought to generalize genes, gene regions (400 kb upstream and downstream of a gene of interest), and SNPs described in previous GWAS and candidate gene studies of AM and ANM to our population of African American women. We tested all SNPs in the regions regardless of linkage disequilibrium (LD) with the index SNP, although we only considered a test of association generalized if the tested SNP was identical to the index SNP or in strong LD with the index variant in HapMap CEU samples. For each candidate gene, we plotted results of single SNP tests of association using LocusZoom and examined regions 400 kb upstream and downstream of the gene/gene region of interest. Tests of association were considered significant for generalization at a liberal threshold of p < 0.05. For previously reported variants not genotyped in our study, we identified SNPs in LD with our directly genotyped SNPs [53] and reported results from our minimally adjusted model (Model 1) for the proxy SNPs.
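The generalization rule can be written as a small predicate. This is a sketch under stated assumptions: strong LD is taken to mean r² > 0.70 in HapMap CEU, the cutoff reported with the results; the requirement that the direction of effect match the prior report is inferred from how the results are described rather than stated as a formal criterion; and the function name is hypothetical.

```python
def generalizes(p_value, r2_with_index, same_direction,
                alpha=0.05, min_r2=0.70):
    """True if a tested SNP counts as generalizing a reported association.

    r2_with_index is 1.0 when the tested SNP is the index SNP itself.
    same_direction is an assumed extra requirement (see lead-in).
    """
    return p_value < alpha and r2_with_index > min_r2 and same_direction


# rs9385399, a perfect proxy for rs1361108 (values from the text).
cenpw = generalizes(p_value=0.01, r2_with_index=1.00, same_direction=True)
# rs8113016, in LD with rs897798 (values from the text).
brsk1 = generalizes(p_value=0.03, r2_with_index=0.72, same_direction=True)
# A nominally significant SNP with an opposite direction of effect
# (hypothetical values) does not generalize under this rule.
opposite = generalizes(p_value=0.02, r2_with_index=1.00, same_direction=False)
```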
In addition to generalization, we sought to discover novel SNP-trait associations using the entire Metabochip. Significance in this discovery phase was defined as p < 3.1×10−7, after Bonferroni correction (0.05/161,098). Because this threshold is highly conservative given the correlation among the SNPs on the Metabochip, we also defined an arbitrary suggestive significance level as p < 1×10−4 in the discovery phase.
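The threshold arithmetic is easy to verify. The short sketch below recomputes the Bonferroni cutoff from the number of SNPs passing quality control and sorts p-values into the two discovery tiers; the `classify` helper and its labels are hypothetical.

```python
N_TESTS = 161_098          # SNPs passing quality control
ALPHA = 0.05

bonferroni = ALPHA / N_TESTS   # approximately 3.1e-7
SUGGESTIVE = 1e-4              # arbitrary suggestive level from the text


def classify(p_value: float) -> str:
    """Assign a discovery-phase p-value to a significance tier."""
    if p_value < bonferroni:
        return "significant"
    if p_value < SUGGESTIVE:
        return "suggestive"
    return "not significant"
```

For example, the reported ANM association at LDLR (p = 5×10−08) falls in the significant tier, while the top AM association (p = 1.59×10−06) is only suggestive.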

Study Population
A total of 4,159 and 1,860 African American female participants met the study definitions for AM and ANM, respectively, and both PAGE studies were represented roughly equally (Table 1). In ARIC, the mean age at menarche was 12.9 years, which was slightly greater than the mean age at menarche in WHI (12.6 years) (Table 1). For ANM, the WHI group had a slightly later average onset than ARIC (Table 1).
Overall, 11/21 (52%) SNPs previously identified for AM from earlier studies and 15/42 (36%) from the Elks et al. meta-analysis were directly genotyped or in strong LD (r² > 0.70 in the CEU panel of HapMap) with those genotyped (Table 2 and Table S1, respectively), and one generalized to this African American cohort: rs9385399, in LD with previously reported rs1361108 (r² = 1.00, p = 0.01) (Table S1). Representative results of tests of association and LD in this African American sample are given for CYP19A1, FTO, LIN28B, and CYP1B1, genes previously associated with AM (Figure 1) [35,36,54,60]. Three SNPs in LIN28B were included on the Metabochip (rs314277, rs4946651, and rs7759938), and while the direction of genetic effect was consistent with previous reports, all failed to reach statistical significance in this sample (p > 0.30). Four additional SNPs in LD with these LIN28B SNPs were also not significant. At the 9q31 locus, rs7861820 and rs4452860, both located downstream of TMEM38B, had betas opposite to prior reports [36,37]. Neither SNP nor their proxy SNPs were significant at p < 0.05. Similarly, SNPs in LD (rs1856142 and rs605765) with previously associated variants in and around FSHB were not significantly associated with AM in this African American sample, though rs605765 (β = −0.06) had the same direction of effect and comparable magnitude as rs1782507 (β = −0.07) [59]. Results obtained under our fully adjusted model (Model 2) were similar to those of Model 1 and are available in Table 2.
We also examined SNPs associated with AM that were reported in a recent meta-analysis performed by Elks et al. for the ReproGen Consortium [35]. Of the forty-two SNPs associated with AM in Elks et al., we detected an association with rs9385399 (p = 0.01), located downstream of CENPW, which is a perfect proxy (r² = 1.00) for previously associated variant rs1361108, and the only SNP to generalize to our African American sample. We also identified an association of rs2947411 with AM (p = 0.02; Table S1), though the directions of effect were opposite. One additional SNP, rs4929923 (p = 0.06), nearly reached the significance threshold and had a similar magnitude and direction of effect compared with the previous report. Overall, AM SNPs from previously published studies of European-descent women, including the Elks et al. meta-analysis, did not generalize to our PAGE African American population.
Overall, 14/23 (40%) SNPs previously identified for ANM via GWAS and 6/20 SNPs from the Stolk et al. meta-analysis were directly genotyped on the Metabochip or were in strong LD (r² > 0.70) in the CEU panel of HapMap. Of the tested SNPs in these regions/genes, 1/12 (8%) generalized to this African American sample: rs8113016 (Table 3). This SNP, located in an intron of TMEM150B/TMEM224 and downstream of BRSK1, is in LD with previously reported rs897798 (r² = 0.72) and was associated with ANM in our sample (p = 0.03). An intronic APOE variant, rs769450, was associated with ANM (p = 0.03), though the nonsynonymous APOE rs7412 was not (p = 0.55); these SNPs are not in LD with each other (r² = 0.04). In BRSK1, no previously reported SNPs were genotyped in our study; however, directly genotyped intronic TMEM150B rs4806660 was in very strong LD with intronic BRSK1 rs1172822 (r² = 0.98). BRSK1 rs1168309, in strong LD with rs2384687 (r² = 0.85), was not associated with ANM in this African American sample (p = 0.59).
Three of the twenty SNPs recently identified by Stolk et al. as associated with ANM were directly genotyped on the Metabochip. Two of the three genotyped SNPs (rs2303369 and rs2153157) had the same directions of effect, though the magnitudes were smaller. Of the remaining 17 SNPs not directly targeted by the Metabochip, three were in strong LD (HapMap CEU r² ranging from 0.86 to 0.91) with the SNPs identified by Stolk et al.: rs1176133, rs4668368, and rs12593363. For seven SNPs, no proxy SNP could be identified on the Metabochip (Table S2). Of the twenty SNPs identified in the Stolk et al. meta-analysis and directly or indirectly represented on the Metabochip, none were associated with ANM in this African American sample (Table S2).

Age at Menarche: Discovery
We tested all SNPs genotyped on the Metabochip for an association with AM adjusted for study site and principal components (Model 1) and adjusted for study site, year of birth, principal components, and body mass index (Model 2) (Table 4). After accounting for multiple testing (p < 3.1×10−7), no SNPs were significantly associated with AM in either model (Table S3). The most significant SNP in both models was rs11604207 (Model 1: p = 1.59×10−6; Model 2: p = 1.82×10−6), which is located upstream of RSF1, a gene encoding a chromatin remodeling protein implicated in ovarian and breast cancers [67][68][69] (Table S3).
Two genes were suggestively associated with both ANM and AM at a nominal significance threshold. PHACTR1 was suggestively associated with AM (rs73725617; Table S3) and ANM (rs117124693; Table 4). Though the direction of effects was similar for each SNP in PHACTR1, the SNPs are not in LD with each other. Likewise, SNPs in ARHGAP42, located at the 11q22.1 locus, were suggestively associated with AM (rs11224447; Table S3) and ANM (rs11224401; Table 4), but are not in LD with each other, though the direction of effects was the same.

Discussion
Here we demonstrated the use of the Metabochip genotyping array to identify SNPs associated with AM and ANM in a sample of African American women. Previous GWAS for AM and ANM have been performed in primarily European-descent populations; generalization to diverse populations has largely been lacking [72]. Our study is the first, to our knowledge, to consider these traits in a large African American cohort. We were able to generalize only one previously identified variant for AM and two variants for ANM to our African American cohort [AM: rs1361108 (via proxy rs9385399); ANM: rs897798 and rs769450]. Overall, however, we were unable to generalize the majority of significant associations for previously identified SNPs associated with AM, including LIN28B or the 9q31 locus, or with ANM, including MCM8 or TMEM150B/TMEM224, which have recently been identified in several GWAS of European-descent women. Our inability to replicate earlier findings in our African American sample may have, in part, resulted from scant Metabochip coverage of these regions. The emphasis of the Metabochip on genes involved in lipid metabolism and cardiovascular traits is evident when comparing coverage in the FTO region (1053 SNPs) to the LIN28B region (28 SNPs).
In the discovery phase of our AM analysis, none of our results reached genome-wide significance. However, the ANM analysis yielded three associations that were significant after multiple testing correction. Broadly, we demonstrate the ability to potentially uncover new variants associated with age at natural menopause in our African American cohort using the Metabochip. Several studies have shown relationships between a woman's reproductive milestones (AM, ANM, parity) and menstrual characteristics and risk for breast cancer, endometrial cancer, and ovarian cancer [73][74][75][76][77] and chronic diseases such as diabetes, osteoporosis and cardiovascular disease (briefly [78][79][80]). The most significant result in the ANM analysis was a SNP located upstream of LDLR (rs189596789), which encodes a low density lipoprotein receptor implicated in familial hypercholesterolemia. KCNQ1 (rs79972789) also reached genome-wide significance in our ANM analysis. Numerous variants in KCNQ1 have also been implicated in type 2 diabetes in several populations, though none were in linkage disequilibrium with rs79972789 [81][82][83][84][85][86]. Recently, Buber et al. evaluated the role of menopausal hormonal changes in cardiac events in women with mutations in KCNQ1 and congenital long-QT syndrome (LQTS) [87] and determined that the onset of menopause was associated with an increase in the risk of cardiac events in LQTS women. Though not significant, suggestive AM associations included LPL and CYP4F22, which are associated with type 2 diabetes and lipid metabolism (rs1372339, rs4922116, rs1273516), and TMEM18 (rs2947411), associated with obesity and body mass index [88,89]. These ANM associations and suggestive AM associations with genes involved in cardiovascular function, lipid metabolism, and type 2 diabetes concur with research showing later AM lowers obesity and diabetes risk while earlier ANM increases risk for cardiovascular disease, obesity and insulin resistance [90,91].
Different pathways appear to be involved in the initiation and cessation of menses. Prior GWAS and linkage studies performed in European-descent or Asian populations for AM and ANM show little concordance with specific genes (reviewed in [5]). Our analysis is consistent with this observation. Only PHACTR1 and ARHGAP42 SNPs were suggestively significant in both our AM and ANM analyses. PHACTR1 is a phosphatase and actin regulator which has been implicated in coronary artery disease [92,93]. Its role in menarche and menopause is yet to be determined. ARHGAP42, a Rho GTPase activating protein, has not yet been evaluated for a role in menarche or menopause. A GWAS-identified intronic ARHGAP42 variant, rs633185, is associated with blood pressure [94], but this variant is not in strong LD with ARHGAP42 variants suggestively associated with either AM or ANM in this study. A recent study by Lu et al. found SNPs in both TNFSF11 and TNFRSF11A significant for AM and ANM [58]. SNPs genotyped on the Metabochip were in weak LD with the reported SNPs and failed to reach significance in this African American sample. Given the role that both TNFSF11 and TNFRSF11A play in atherosclerosis, osteoporosis and the development of lactation glands in pregnancy, further investigation on the influence of these genes in AM and ANM is warranted [95,96].
The Metabochip was designed to be a cost-effective method of genotyping approximately 200,000 metabolic and cardiovascular SNPs and SNPs in other useful regions of the genome, such as the HLA region and the X and Y chromosomes. Overall, median SNP density on the Metabochip is approximately one SNP per 370 bases [43]. This coverage appears sufficient to replicate some loci associated with both cardiovascular or metabolic traits and AM/ANM. However, we found instances of previously identified genes for AM/ANM with little or no Metabochip coverage (CYP1B1, LIN28B, ESR2, and BRSK1), which may have impacted our results. Additionally, prior studies that identified SNPs associated with AM and ANM were performed primarily in European-descent cohorts. Though our study included over 4,000 African American women, we had limited power to identify significant associations in most previously identified loci, which may explain why we failed to detect the same associations identified in European-descent GWAS. For specific tests of association, our power was impacted by sample size and by minor allele frequencies. For example, the allele frequency for rs7861820 in this African American cohort was 0.11 compared to a higher frequency observed in HapMap CEU (0.57; Table S4). Interestingly, we were adequately powered (>98%) to generalize the intronic LIN28B SNP, rs314277, with AM in our sample, yet failed to find an association with this SNP or with SNPs in strong LD with it. Metabochip performance in non-European populations was recently evaluated in a pilot study in African American PAGE participants [43]. In this pilot study, Buyske et al. demonstrated that the majority (89%) of SNPs targeted by the Metabochip passed rigorous quality control with high call rates [43]. Using lipid traits as an example, Buyske et al. demonstrated that Metabochip data can be used to replicate known GWAS-identified SNP-trait relationships.
Furthermore, the pilot study demonstrated that Metabochip data can be used to fine-map GWAS-identified regions to uncover potential novel index SNPs specific to African Americans in an established locus for that trait. Fine-mapping data for AM/ANM were not included in the Metabochip content. While we were able to use the Metabochip to identify potentially novel SNP-trait relationships for AM/ANM, additional fine-mapping efforts of other loci already implicated for these traits are needed. Furthermore, additional studies in general are warranted for diverse (non-European descent) populations using the Metabochip or other arrays designed for fine-mapping. Admixture in the African American population and its associated decreased LD compared to European Americans challenge identification of trait-associated SNPs. Targeted fine-mapping, such as use of the Metabochip, may be more appropriate in some circumstances than GWAS to evaluate specific SNPs and regions associated with particular traits.
Although the Metabochip was designed for genotyping of cardiovascular and metabolic SNPs, this study demonstrates the feasibility of utilizing such a targeted chip to identify SNP associations with age at menarche and age at natural menopause. We identified potentially novel associations with AM/ANM at loci implicated in cardiovascular traits, obesity and cancer. This may result from pleiotropic loci or may suggest that the AM/ANM timing mechanisms influence underlying disease processes. With numerous genes implicated in both metabolic and cardiovascular phenotypes and both AM and ANM, further studies will allow us to consider how specific genes may influence the reproductive lifespan in women.