The genetic underpinnings of variation in ages at menarche and natural menopause among women from the multi-ethnic Population Architecture using Genomics and Epidemiology (PAGE) Study: A trans-ethnic meta-analysis

Current knowledge of the genetic architecture of key reproductive events across the female life course is largely based on association studies of women of European descent. The relevance of known loci for age at menarche (AAM) and age at natural menopause (ANM) in diverse populations remains unclear. We investigated 32 AAM and 14 ANM previously identified loci and sought to identify novel loci in a trans-ethnic array-wide study of 196,483 SNPs on the MetaboChip (Illumina, Inc.). A total of 45,364 women of diverse ancestries (African, Hispanic/Latina, Asian American and American Indian/Alaskan Native) in the Population Architecture using Genomics and Epidemiology (PAGE) Study were included in cross-sectional analyses of AAM and ANM. Within each study, we conducted linear regressions of SNP associations with self-reported or medical record-derived AAM or ANM (in years), adjusting for birth year, population stratification, and center/region, as appropriate, and meta-analyzed results across studies using multiple meta-analytic techniques. For both AAM and ANM, we observed more directionally consistent associations with the previously reported risk alleles than expected by chance (binomial p-values ≤ 0.01). Eight densely genotyped reproductive loci generalized significantly to at least one non-European population. We identified one trans-ethnic array-wide SNP association with AAM and two with ANM that have not been described previously. Additionally, we observed evidence of independent secondary signals at three of six AAM trans-ethnic loci. Our findings support the transferability of reproductive trait loci discovered in European women to women of other races/ethnicities and indicate the presence of additional trans-ethnic associations at both novel and established loci. These findings suggest the benefit of including diverse populations in future studies of the genetic architecture of female growth and development.


Introduction
Age at menarche (AAM) and age at natural menopause (ANM) are important events in the reproductive lifespan of a woman. Menarche, the initiation of the female menstrual cycle, occurs at 12 years of age on average [1,2]. In the United States (US), mean AAM is lower for African and Mexican American women, and higher for non-Hispanic women of European descent [2,3]. Yet, epidemiologic data on the average AAM of Asian American, Native Hawaiian and American Indian/Alaskan Native women are generally lacking. An earlier AAM has been associated with early-life obesity and with risk for a variety of diseases, including breast and endometrial cancer, diabetes, and coronary heart disease [4-6].
Menopause, the cessation of the menstrual cycle that signifies the end of the reproductive lifespan, occurs at 51 years on average, with the majority of women experiencing a natural onset of menopause (not surgically or drug-induced) sometime between ages 45-55 years [7].
Similar to AAM, race/ethnicity appears to be an independent predictor of ANM in the US, with African and Mexican American women having earlier ANM than non-Hispanic women of European and Japanese descent [8,9]. Epidemiologic investigations of ANM in other US racial/ethnic groups are still needed. Earlier ANM is influenced by smoking status and can confer increased risk for cardiovascular disease and osteoporosis later in life, whereas later ANM can increase the risk of hormone-related female cancers, such as breast and endometrial cancers [10,11].
[…] National Heart, Lung, and Blood Institute (T32HL007055), from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (T32-HD007168), and the Carolina Population Center (P2C-HD050924). SFC was supported by the American Heart Association Go Red for Women Strategically Focused Research Network Grant 16SFRN27940007. KEN was supported by R01-DK089256; 2R01HD057194; U01HG007416; R01DK101855, and AHA grant 13GRNT16490017. The Population Architecture Using Genomics and Epidemiology (PAGE) program was funded by the National Human Genome Research Institute (NHGRI), supported by U01HG004803 (CALiCo), U01HG004798 (EAGLE), U01HG004802 (MEC), U01HG004790 (WHI), and U01HG004801 (Coordinating Center), and their respective NHGRI ARRA supplements. The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. The complete list of PAGE members can be found at the PAGE website (http://www.pagestudy.org). The data and materials included in this report result from a collaboration between the following studies: The "Epidemiologic Architecture for Genes Linked to Environment (EAGLE)" was funded through the NHGRI PAGE program (U01HG004798 and its NHGRI ARRA supplement). The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's BioVU, which was supported by institutional funding and by the Vanderbilt CTSA grant UL1 TR000445 from NCATS/NIH. The Vanderbilt University Center for Human Genetics Research, Computational Genomics Core provided computational and/or analytical support for this work. The Multiethnic Cohort study (MEC) characterization of epidemiological architecture was funded through NHGRI (HG004802, and HG007397) and the NHGRI PAGE program (U01HG004802 and its NHGRI ARRA supplement). The MEC study was funded through the National Cancer Institute (CA164973, R37CA54281, R01 CA 063464, P01CA33619, U01CA136792, and U01CA98758). Funding support for the "Epidemiology of putative genetic variants: The Women's Health Initiative" study was provided through the NHGRI PAGE program (U01HG004790 and its NHGRI ARRA supplement). The WHI program was funded by the National Heart, Lung, and Blood Institute; NIH; and U.S. Department of Health and Human Services through contracts N01WH22110, 24152, 32100-2, 32105-6, 32108-9, 32111-13, 32115, 32118-32119, 32122, 42107-26, 42129-32, and 44221. Funding support for the Genetic Epidemiology of Causal Variants Across the Life Course (CALiCo) program was provided […]
For both AAM and ANM, population-level changes have been observed in the US over the last century: average AAM has decreased [1] and average ANM has increased [12]. These trends may reflect shifts in the racial/ethnic composition of women currently living in the US, secular trends in the prevalence of obesity or smoking, or other environmental conditions supportive of a longer average female reproductive lifespan.
Given the racial/ethnic differences in AAM and ANM in the US, there remains significant interest in identifying the genetic factors that influence the timing of these reproductive events in diverse populations. Numerous candidate gene and genome-wide association studies (GWAS) have been performed for AAM and ANM and, as a result, more than 360 and 40 loci have been associated with AAM and ANM, respectively [13-23]. Although the vast majority of these studies included only women of European descent in their discovery and validation samples, more recent GWAS have begun to include women of African (up to ~18,000 women) and East Asian ancestry (up to ~16,000 women), but these have not discovered any additional loci [24-28]. Recent generalizability studies have also begun to include these populations, as well as Hispanic/Latina, Native Hawaiian, and American Indian/Alaskan Native women [29-31], to more fully describe the transferability and allele frequency heterogeneity of established AAM and ANM loci and to discover novel race/ethnicity-specific loci.
Recently developed methods for trans-ethnic meta-analysis allow researchers to combine several populations while accounting for heterogeneity between racial/ethnic groups [32,33]. Previous genetic epidemiologic research indicates that trans-ethnic meta-analyses improve the power to discover variants of low and moderate effect sizes and may reveal allelic heterogeneity at known genetic loci [17,26,27]. Additionally, because patterns of linkage disequilibrium differ across populations, trans-ethnic approaches may help narrow the interval of interest around loci discovered in European-descent populations. The Population Architecture using Genomics and Epidemiology (PAGE) Study, a consortium of ancestrally diverse genetic studies from the US, is well positioned to investigate the genetics of complex traits within a trans-ethnic context [34].
Herein, we analyzed the roughly 200,000 SNPs genotyped on the MetaboChip (Illumina, Inc., San Diego, CA, USA), a high-density genotyping array of primarily cardiometabolic loci [35], for association with reproductive milestones in the ancestrally diverse participants of the PAGE Study [34]. Given the known overlap between the genetic underpinnings of AAM and related cardiometabolic traits [22], the MetaboChip provides a densely genotyped resource for searching for novel reproductive associations and for broadly investigating the overlap of cardiometabolic and reproductive traits. Using race/ethnicity-stratified meta-analyses (20,398 African American, 15,856 Hispanic/Latina, 8,572 Asian American, and 538 American Indian/Alaskan Native women) and a trans-ethnic modified random-effects meta-analysis of up to 42,826 ancestrally diverse women, we sought to (i) establish how many index AAM and ANM SNPs previously described in European-descent populations also generalize to diverse racial/ethnic groups of women in the PAGE Study, (ii) refine the trans-ethnic localization of these loci, and (iii) identify novel AAM or ANM associations on the MetaboChip.

Study participants and phenotyping
The PAGE Study was designed to generalize and estimate common genetic effects across multiple ancestral populations [34]. Briefly, the first phase of the PAGE Study comprised a coordinating center and four large study sites/consortia [Causal Variants Across the Life Course (CALiCo), Epidemiologic Architecture for Genes Linked to Environment (EAGLE), Multiethnic Cohort (MEC), and Women's Health Initiative (WHI)]. We provide a detailed description of each study included in this analysis in S1 Text. The datasets generated as part of the PAGE Study can be accessed through the dbGaP repository (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000356). All studies in this analysis obtained Institutional Review Board approval and written informed consent from all participants, with the exception of EAGLE BioVU, which obtained Institutional Review Board approval to follow an opt-out consent process, described in detail separately [36,37].
AAM (onset of first menses) and ANM (cessation of regular menses), in years, were self-reported by questionnaire or obtained from medical records [38]. AAM and ANM were harmonized across studies, as reported previously [30]. Detailed descriptions of the pseudo-continuous coding and outlier exclusions are provided in S2 Text.

Genotyping and imputation
The custom Illumina, Inc. iSELECT array, the MetaboChip, genotyped 196,483 autosomal SNPs, including high-density genotyping of 257 regions associated with cardiometabolic traits as of 2009 [35]. As described in S2 Text, MetaboChip SNPs were imputed for three studies [MEC SIGMA, BioME, and WHI African Americans [39]]. Additionally, we excluded SNPs with low minor allele frequency (MAF < 0.1%) or with deviations from Hardy-Weinberg equilibrium (HWE; p-value < 1x10^-6). Additional information on the specific implementation of HWE filtering and other SNP-level quality control procedures is provided in S2 Text.
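As a rough illustration of the two SNP-level filters above, the following Python sketch computes the MAF from genotype counts and applies a 1-df chi-square test of Hardy-Weinberg equilibrium. The function names are hypothetical, and the chi-square test is an assumption made for simplicity: the studies' actual HWE implementation, described in S2 Text, may use an exact test instead.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB) at a biallelic autosomal SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)

def hwe_chisq_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square goodness-of-fit test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no HWE test possible
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(stat / 2.0))  # survival function of chi-square, 1 df

def passes_snp_qc(n_aa, n_ab, n_bb, min_maf=0.001, min_hwe_p=1e-6):
    """Keep SNPs with MAF >= 0.1% and HWE p-value >= 1e-6 (thresholds from the text)."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= min_maf
            and hwe_chisq_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)

# Hypothetical genotype counts for three SNPs.
print(passes_snp_qc(360, 480, 160))  # in HWE, common: kept
print(passes_snp_qc(500, 0, 500))    # no heterozygotes: fails HWE
print(passes_snp_qc(999, 1, 0))      # MAF 0.05%: fails MAF filter
```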
Forty-six index SNPs (or, if an index SNP was unavailable on the MetaboChip, a proxy SNP with r² ≥ 0.8 in the 1000 Genomes CEU sample) had been previously associated with either AAM or ANM and represented distinct genetic loci (r² < 0.2) (S1 Table). These SNPs included both of the AAM loci and five of the ANM loci known when the MetaboChip was designed, including the two strongest and most widely generalizable AAM and ANM signals to date (LIN28B and MCM8) [22,23]. Additionally, seven of the previously associated SNPs were located within six densely genotyped loci (S2 Table) that were associated with AAM or ANM after the initial design of the MetaboChip.
[…] in collaboration with MESA investigators. Support for MESA is provided by contracts HHSN268201500003I, N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420, UL1-TR-001881, and DK063491. MetaboChip genotyping data was supported in part by grants and contracts R01HL98077, N02-HL-64278, HL071205, UL1TR001881, DK063491, RD831697, and P50 ES015915. Additional support was provided by MESA Family, which is funded by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258, R01HL071259, by the National Center for Research Resources, Grant UL1RR033176, and the National Center for Advancing Translational Sciences, Grant UL1TR001881. Although the research described in this manuscript has been funded in part by the United States Environmental Protection Agency through RD831697 to the University of Washington, it has not been subjected to the Agency's required peer and policy review and therefore does not necessarily reflect the views of the Agency and no official endorsement should be inferred. Assistance with phenotype harmonization, SNP selection and annotation, data cleaning, data management, integration and dissemination, and general study coordination was provided by the PAGE Coordinating Center (U01HG004801-01 and its NHGRI ARRA supplement). The National Institute of Mental Health also contributed support for the Coordinating Center. Contributions made by JRM to this manuscript were performed while a graduate student at Vanderbilt University. Write InSciTe, LLC, did not play a role in the study design and data analysis of this work. Write InSciTe provided support to JRM in the form of salary at the time of manuscript preparation and revision, but did not play any other role in the study design, data collection and analysis, decisions to publish or about the specific content of the manuscript. Analytic contributions made by JMJ to this manuscript were performed as a Postdoctoral Research Fellow at the Icahn School of Medicine at Mount Sinai. Illumina Inc. provided support in the form of salary for JMJ at the time of manuscript revisions, but Illumina did not play a role in the study design, data collection and analysis, decision to publish or the preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section. NHGRI collaborators (LH) assisted in the study design, analysis, and preparation of the manuscript. All other funders had no role in study design, data […]
S2 Text provides additional information on the following person-level exclusions. Briefly, we identified and excluded individuals with high inbreeding coefficients (F > 0.15) [40], and either excluded one woman from each first-degree relative pair [41] or modeled relatedness using generalized estimating equations [42] and linear mixed models [43]. We generated principal components using Eigensoft for each study [44,45] and excluded ancestral outliers [46]. We also excluded samples with phenotype-genotype sex discordance, low person-level call rate (<95%), or excessive heterozygosity.
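The inbreeding-coefficient and call-rate exclusions can be sketched as follows, assuming a method-of-moments estimate of F of the kind reported by PLINK's `--het` option, which contrasts observed and expected homozygous genotype counts. The function name and input layout are hypothetical, and this is not necessarily how each PAGE study computed F; the cutoffs (F > 0.15, call rate < 95%) are the ones stated above.

```python
def person_qc(genotypes, allele_freqs, max_f=0.15, min_call_rate=0.95):
    """Return (F, call_rate, keep) for one woman.

    genotypes: allele counts (0, 1, 2) per SNP, or None when the call is missing.
    allele_freqs: frequency of the counted allele per SNP in the study sample.
    """
    call_rate = sum(g is not None for g in genotypes) / len(genotypes)
    # Use called genotypes at polymorphic SNPs only.
    called = [(g, p) for g, p in zip(genotypes, allele_freqs)
              if g is not None and 0.0 < p < 1.0]
    n_called = len(called)
    obs_hom = sum(1 for g, _ in called if g != 1)
    exp_hom = sum(1.0 - 2.0 * p * (1.0 - p) for _, p in called)
    if n_called == exp_hom:
        raise ValueError("no informative SNPs for estimating F")
    # Method-of-moments F: excess homozygosity relative to HWE expectation.
    f = (obs_hom - exp_hom) / (n_called - exp_hom)
    keep = call_rate >= min_call_rate and f <= max_f
    return f, call_rate, keep

# Hypothetical woman genotyped at 100 SNPs with allele frequency 0.5.
print(person_qc([1] * 50 + [0] * 25 + [2] * 25, [0.5] * 100))
```

A strongly negative F would instead flag excess heterozygosity; the heterozygosity cutoff used by the studies is not stated in the text, so the sketch does not apply one.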
Only study- and race/ethnicity-specific samples of 50 or more women were carried forward to statistical analyses; these are summarized descriptively in S3 and S4 Tables. Collectively, the studies represent 20,398 African American, 15,856 Hispanic/Latina, 8,572 Asian American, and 538 American Indian/Alaskan Native women who provided informed consent (or, in the case of BioVU, did not opt out of the approved research) and had complete information on genetics, reproductive phenotypes and covariates.

Statistical modeling and analyses
Within each study, associations between MetaboChip SNPs and reproductive traits were modeled under an additive genetic model, adjusting for birth year, principal components and, where applicable, center and type 2 diabetes case/control status. Within each racial/ethnic group, we then performed fixed-effect inverse-variance weighted meta-analyses using METAL (release 2011-03-25) [47] to estimate race/ethnicity-specific effects for SNPs whose associations were informed by more than half of the maximum race/ethnicity-specific sample size (n = 123,493-157,710 SNPs). Additional information on our data visualization and post-hoc power analyses is provided in S1 Text.
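For readers unfamiliar with the within-group step, the following sketch shows a standard fixed-effect inverse-variance weighted meta-analysis of per-study estimates, the weighting METAL applies under its standard-error-based scheme, together with Cochran's Q and I² as heterogeneity summaries. It is a minimal illustration with hypothetical inputs, not a re-implementation of METAL.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis of per-study estimates."""
    weights = [1.0 / (se * se) for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    # Cochran's Q and I^2 summarise between-study heterogeneity.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta": beta, "se": se, "z": z, "p": p, "Q": q, "I2": i2}

# Hypothetical per-study effects (years per allele) and standard errors.
print(ivw_fixed_effect([0.12, 0.05, 0.20], [0.04, 0.06, 0.08]))
```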
Array-wide significance for novel SNPs was defined as a Bonferroni-corrected p-value < 2.5x10^-7 (0.05/196,483) to account for the total number of autosomal SNPs on the MetaboChip (n = 196,483). We considered an observed association directionally consistent if the trait-decreasing allele in the trans-ethnic analysis was the same as the trait-decreasing allele in previous report(s). Furthermore, we used a binomial test (binomial p-value < 0.05) to assess whether we observed more directional consistency than expected by chance (i.e., assuming that 50% of all tests would be consistent by chance alone). Generalization of previously described reproductive loci at index SNPs (or their proxies) to our samples was declared if 1) our estimate was directionally consistent with previous reports, and 2) the SNP association had a p-value < 0.0016 for AAM or < 0.0036 for ANM, corresponding to a Bonferroni correction for the number of independent AAM (n = 32) and ANM (n = 14) loci tested (0.05/32 and 0.05/14, respectively). SNPs located within densely genotyped reproductive loci that were not index SNPs or their proxies were considered significant if their p-value fell below a threshold Bonferroni-corrected for the number of independent signals within the given locus (independent signals pruned to r² < 0.2 in ARIC African Americans), resulting in p-value thresholds ranging from 9.0x10^-5 to 2.8x10^-4 (S4 Table). Within each densely genotyped locus, statistically significant race/ethnicity-specific lead SNPs (i.e., those with the lowest p-values in the locus) were considered potentially independent of the index SNP, and to warrant additional conditional analyses, if they were in moderate to low linkage disequilibrium (LD; r² < 0.5 in the 1000 Genomes CEU sample) with it.
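The significance thresholds and the directional-consistency test above reduce to simple arithmetic. The sketch below reproduces the three Bonferroni thresholds from the stated numbers of tests and implements a binomial sign test against a 50% chance of consistency. The one-sided form of the test and the example count (10 of 14 loci consistent) are assumptions for illustration, not values reported in this study.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests

def binomial_sign_test(n_consistent, n_total, p_chance=0.5):
    """One-sided P(X >= n_consistent) for X ~ Binomial(n_total, p_chance)."""
    return sum(math.comb(n_total, k) * p_chance ** k * (1 - p_chance) ** (n_total - k)
               for k in range(n_consistent, n_total + 1))

array_wide = bonferroni_threshold(0.05, 196_483)  # ~2.5e-7, novel SNPs
aam_loci = bonferroni_threshold(0.05, 32)         # ~0.0016, AAM generalization
anm_loci = bonferroni_threshold(0.05, 14)         # ~0.0036, ANM generalization

# Hypothetical example: 10 of 14 ANM loci directionally consistent.
p_binom = binomial_sign_test(10, 14)
print(array_wide, aam_loci, anm_loci, p_binom)
```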
At each densely genotyped reproductive locus, we used publicly available 1000 Genomes reference samples to estimate the number of SNPs (and their base-pair positions) in high LD (r² ≥ 0.8) with the previously reported AAM and ANM index SNPs in the European (CEU), African (YRI), Hispanic/Latino (MXL, PUR, CLM), and Asian (JPT) reference populations. The percentage reduction in the putative interval of interest was then calculated by contrasting the populations with the smallest and largest LD blocks associated with these index SNPs (S5 Table).
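The percent-reduction calculation can be illustrated with a short sketch. It assumes the interval for each reference population is the base-pair span from the leftmost to the rightmost SNP in strong LD (r² ≥ 0.8) with the index SNP, index SNP included; the positions below are invented, and the exact interval definition used for S5 Table may differ.

```python
def ld_interval_bp(positions):
    """Base-pair span covered by the index SNP and its strong-LD partners."""
    return max(positions) - min(positions)

def percent_interval_reduction(positions_by_population):
    """Contrast the populations with the largest and smallest LD intervals."""
    spans = {pop: ld_interval_bp(pos) for pop, pos in positions_by_population.items()}
    largest = max(spans, key=spans.get)
    smallest = min(spans, key=spans.get)
    if spans[largest] == 0:
        return largest, smallest, 0.0
    reduction = 100.0 * (spans[largest] - spans[smallest]) / spans[largest]
    return largest, smallest, reduction

# Hypothetical positions (bp) of SNPs with r2 >= 0.8 with one index SNP.
example = {
    "CEU": [1_000, 12_500, 30_000, 41_000],
    "YRI": [20_000, 24_000],
    "MXL": [15_000, 22_000, 33_000],
}
print(percent_interval_reduction(example))  # ('CEU', 'YRI', 90.0)
```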

[…] collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: Two coauthors were affiliated with commercial organizations (JRM at Write InSciTe, LLC, and JMJ at the Genotyping Arrays Division at Illumina, Inc.) at the time of the completion of this manuscript. These affiliations did not alter our adherence to PLOS ONE policies on sharing data and materials. Therefore, the coauthors collectively have declared that no competing interests exist.

Modified random-effects meta-analysis
Trans-ethnic meta-analyses of AAM and ANM were conducted using a modified random-effects meta-analysis of study- and race/ethnicity-specific results, as implemented in Metasoft by Han and Eskin [48]. This approach applies a likelihood ratio test in which heterogeneity is allowed only under the alternative hypothesis of association (random effects), whereas the null hypothesis assumes no effect and therefore no heterogeneity (a fixed null effect) [48]. We excluded American Indian/Alaskan Native women from the trans-ethnic meta-analyses because of their small sample size relative to the combined trans-ethnic sample of the other racial/ethnic groups (1% for both AAM and ANM). For SNP associations informed by more than half of the maximum trans-ethnic sample size for the given trait, we estimated modified random-effects associations across up to 22 AAM and 23 ANM study subsamples of African, Hispanic/Latina and Asian ancestry.
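To make the modified random-effects test concrete, the following pure-Python sketch follows the likelihood-ratio formulation described by Han and Eskin: under the null, the effect and the heterogeneity are both zero, while under the alternative the mean effect μ and between-study variance τ² are estimated by maximum likelihood. Two simplifications are assumptions of this sketch rather than Metasoft behavior: τ² is found by a grid search followed by golden-section refinement, and the p-value uses the asymptotic 50:50 mixture of χ² distributions with 1 and 2 degrees of freedom, whereas Metasoft can use tabulated p-values that are more accurate when few studies are combined. The function name `han_eskin_re2` is hypothetical.

```python
import math

def _loglik(betas, variances, mu, tau2):
    """Normal log-likelihood of study estimates given mean mu and heterogeneity tau2."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * (v + tau2)) + (b - mu) ** 2 / (v + tau2)
        for b, v in zip(betas, variances)
    )

def _profile(betas, variances, tau2):
    """For fixed tau2, the MLE of mu is the weighted mean with weights 1/(v + tau2)."""
    weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return mu, _loglik(betas, variances, mu, tau2)

def han_eskin_re2(betas, ses, grid_size=200, n_refine=100):
    """Return (mu_hat, tau2_hat, likelihood-ratio statistic, asymptotic p-value)."""
    variances = [se * se for se in ses]
    ll_null = _loglik(betas, variances, 0.0, 0.0)
    # Coarse grid over tau2 >= 0 (including 0), denser near zero.
    upper = 10.0 * (max(b * b for b in betas) + max(variances))
    grid = [upper * (i / grid_size) ** 2 for i in range(grid_size + 1)]
    scores = [_profile(betas, variances, t)[1] for t in grid]
    k = max(range(len(grid)), key=scores.__getitem__)
    lo, hi = grid[max(k - 1, 0)], grid[min(k + 1, grid_size)]
    # Golden-section search for the maximum between the neighbouring grid points.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(n_refine):
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if _profile(betas, variances, c)[1] >= _profile(betas, variances, d)[1]:
            hi = d
        else:
            lo = c
    tau2 = max([grid[k], (lo + hi) / 2.0], key=lambda t: _profile(betas, variances, t)[1])
    mu, ll_alt = _profile(betas, variances, tau2)
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # Asymptotic null distribution: 0.5 * chi2(1 df) + 0.5 * chi2(2 df).
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0)) + 0.5 * math.exp(-stat / 2.0)
    return mu, tau2, stat, p_value

# Hypothetical per-study estimates with opposite directions of effect.
print(han_eskin_re2([0.3, -0.3, 0.0], [0.05, 0.05, 0.05]))
```

In this example, the mean effect is zero while the estimated between-study variance is about 0.0575, so the test rejects the null of no effect in any study even though a fixed-effect estimate would be null. This is the kind of heterogeneous association that the modified random-effects model is designed to retain.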
Secondary signal analysis. Next, we tested for the presence of statistically significant secondary signals using the approximate conditional method in Genome-wide Complex Trait Analysis (GCTA, version 64) [49,50], using the same trans-ethnic reference samples as above to estimate trans-ethnic LD patterns. Adjusting for the significant lead trans-ethnic SNP at each locus, we contrasted the unconditional and approximate conditional p-values of the SNPs within the region. If an unconditional SNP association was suggestive (p-value < 0.05) and not heterogeneous across race/ethnic groups in the trans-ethnic modified random-effects analysis, but became array-wide or Bonferroni-significant after adjusting for the lead SNP in the region, we took this as evidence of a secondary signal in the region. This procedure was repeated until no additional significant conditional SNP associations arose.
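GCTA's approximate conditional analysis fits a joint model from summary statistics and reference LD. A much simpler two-SNP version, shown here only as an illustration, conditions the z-score of a candidate SNP j on a lead SNP i using their signed LD correlation r, assuming both z-scores come from the same samples with equal sample size and small effects. The sketch also encodes the secondary-signal decision rule described above. The function names and example values are hypothetical, and this is not the GCTA implementation.

```python
import math

def conditional_z(z_j, z_i, r):
    """Approximate z-score of SNP j after conditioning on lead SNP i.

    Assumes equal sample size for both SNPs, small effects, and that r is the
    signed genotype correlation between the two SNPs (|r| < 1).
    """
    return (z_j - r * z_i) / math.sqrt(1.0 - r * r)

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_secondary_signal(p_unconditional, heterogeneous, p_conditional, threshold):
    """Rule from the text: suggestive, not heterogeneous, significant once conditioned."""
    return p_unconditional < 0.05 and not heterogeneous and p_conditional < threshold

# Hypothetical example: candidate z = 2.5, lead z = 8.0, signed LD r = -0.4.
z_cond = conditional_z(2.5, 8.0, -0.4)
print(round(z_cond, 2),
      is_secondary_signal(two_sided_p(2.5), False, two_sided_p(z_cond), 2.5e-7))
```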

The epidemiology of ages at menarche and natural menopause
Our final analytic samples comprised 44,367 and 17,100 women with AAM and ANM information, respectively, from four broad racial/ethnic groups (Table 1). The biobank studies (EAGLE BioVU, BioME) and HCHS/SOL represented a wide range of ages (S3 and S4 Tables). The median age was lower, and the median birth year more recent, in the AAM samples than in the ANM samples. In both the AAM and ANM analytic samples, obesity prevalence at examination was highest in African American women and lowest in East Asian women (47% versus 10% weighted prevalence). MEC Native Hawaiian, Hispanic/Latina, and WHI American Indian/Alaskan Native women had intermediate obesity prevalence estimates (37-45%). In the ANM analytic samples, the prevalence of current cigarette smoking at examination was highest in Native Hawaiian (20%) and African American (14%) women and lowest in the other Asian samples (6%; weighted prevalence). American Indian/Alaskan Native and Hispanic/Latina women had intermediate smoking prevalence estimates (10-11%).

Generalization of previously reported reproductive trait associations
In our trans-ethnic AAM analyses of women of African, Hispanic/Latina and Asian descent, we generalized the association between LIN28B and AAM at array-wide significance (S1 Fig). Even though genotyping in the region is sparse on the MetaboChip, the strongest SNP association in the region (rs7759938) was at a previously published European-descent index SNP [13,17] and was directionally consistent with the previously reported risk allele (T). This SNP association was significant in the African American, Hispanic/Latina, and Asian American samples after adjusting for the number of independent loci tested with AAM, and was directionally consistent in American Indian/Alaskan Native women (Table 2).
In addition, we observed Bonferroni-significant evidence of generalization to diverse racial/ethnic groups at two other AAM loci. The index/proxy AAM SNPs at NUCKS1 and TMEM38B were most strongly associated in the Hispanic/Latina subsample, were also significant in the trans-ethnic meta-analysis, and were directionally consistent with the previously reported risk allele in all race/ethnic groups (Table 2).
In our trans-ethnic ANM analyses, we observed evidence of generalization to diverse populations at two ANM loci after accounting for the number of independent loci tested with ANM. The proxy SNP at BRSK1 and the selected index SNP at MCM8 were significantly associated in Hispanic/Latinas (trait-decreasing allele frequency [TDAF] 34% and 83%, respectively; p-values < 3x10^-3; Table 2). However, the BRSK1 and MCM8 associations were neither significant in the trans-ethnic meta-analysis nor directionally consistent with the trait-decreasing allele across all race/ethnic groups.
* When the index SNP was not genotyped on the MetaboChip, the proxy SNP in tight linkage disequilibrium (r² ≥ 0.8 in 1000 Genomes pilot 1 CEU) with the lowest p-value in the African American sample was chosen to represent the index signal. If more than one SNP represented the same locus (within 500 kb of each other) on the MetaboChip, only SNPs with r² < 0.2 in ARIC African Americans (or HCHS/SOL Hispanic/Latinos, when missing) were included in this table, giving preference to index SNPs and, in most cases, to SNPs reported in multiple citations. The decreasing (coded) allele and previous effect size for proxies were assigned assuming that the risk index SNP would have a similar allele frequency (either minor or major) and effect as the selected proxy SNP.
** Modified random-effects trans-ethnic meta-analysis across three racial/ethnic groups (African, Hispanic/Latina, and Asian Americans). American Indian/Alaskan Native samples were not included due to their relatively small sample size.
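The proxy-selection and allele-alignment conventions in the table note above can be written as small helper functions. The sketch assumes alleles are reported on the same strand and that each candidate proxy is a record holding its r² with the index SNP (1000 Genomes CEU) and its African American p-value; the SNP identifiers, field names, and values are hypothetical.

```python
def choose_proxy(candidates, min_r2=0.8):
    """Pick the tightly linked proxy (r2 >= min_r2) with the lowest African American p-value."""
    eligible = [c for c in candidates if c["r2_with_index"] >= min_r2]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c["p_afr"])["snp"]

def trait_decreasing_allele(coded_allele, other_allele, beta):
    """Allele whose copies lower the trait under an additive model."""
    return coded_allele if beta < 0 else other_allele

def directionally_consistent(coded_allele, other_allele, beta, reported_decreasing):
    """True if the new trait-decreasing allele matches the previously reported one."""
    return trait_decreasing_allele(coded_allele, other_allele, beta) == reported_decreasing

candidates = [
    {"snp": "rsPROXY1", "r2_with_index": 0.95, "p_afr": 0.010},
    {"snp": "rsPROXY2", "r2_with_index": 0.85, "p_afr": 0.001},
    {"snp": "rsPROXY3", "r2_with_index": 0.50, "p_afr": 1e-5},  # too weakly linked
]
print(choose_proxy(candidates))                        # rsPROXY2
print(directionally_consistent("T", "C", -0.08, "T"))  # True
```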
The first index SNP at MCM8, rs236114 [19], was in moderate LD (1000 Genomes AMR r² = 0.36) with another SNP, rs16991615 [15,18,23], which was Bonferroni-significantly associated with ANM both in Hispanic/Latinas (TDAF 94%, p-value = 7.14x10^-6) and in the trans-ethnic sample (TDAF 97%, p-value = 1.91x10^-6; S2 Fig), and was directionally consistent across all race/ethnic groups. Yet within the 9 Hispanic/Latina studies, this SNP association showed evidence of effect heterogeneity (p-value heterogeneity = 3.43x10^-4), which S3 Fig suggests was driven by ANM-increasing effects among the MEC and MEC SIGMA type 2 diabetes cases; these effects were inconsistent with the previously reported ANM-decreasing allele and with the direction of effect observed in the WHI Native American/American Indian subsample. Although this apparent effect heterogeneity could be due to chance, it may also reflect differences in relevant pre-menopausal environments or health status in these specific MEC subsamples (e.g., gene-environment interactions). Additionally, approximate conditional analyses suggested that the significant association between rs16991615 and ANM at MCM8 may be independent of rs236114 in our sample of Hispanic/Latinas (conditional p-value = 6.2x10^-4).
Next, across all 32 AAM and 14 ANM loci on the MetaboChip, we assessed the directional consistency between our race/ethnicity-specific and trans-ethnic results and the previously reported risk-associated alleles (S1 Table). The number of directionally consistent SNP associations with AAM exceeded our expectation in all race/ethnic groups (binomial p-values < 0.01) and in the trans-ethnic results (binomial p-value = 1.2x10^-6), with the exception of American Indian/Alaskan Native women (binomial p-value = 0.11). For ANM, the number of directionally consistent SNP associations also exceeded the number expected by chance in all race/ethnic groups (binomial p-values ≤ 0.03) and in the trans-ethnic results (binomial p-value = 0.01), with the exception of African American women (binomial p-value = 0.18).

Generalization at densely genotyped reproductive trait loci
Three of the six densely genotyped reproductive loci, SEC16B, BDNF and FTO, generalized to the trans-ethnic sample at a lead SNP that was in moderate LD with at least one previously reported index SNP for AAM or ANM (r² > 0.2; Table 3 and Fig 1). At SEC16B, the lead Hispanic/Latina SNP (rs78368018-A; MAF 0.3%) was significant after Bonferroni correction and also in moderate LD with the index SNP (rs633715-C; trans-ethnic MAF 14.9%). However, the lead SNP showed nominal evidence of effect heterogeneity across three studies of Hispanic/Latinas (p-value heterogeneity = 0.03, Table 2), driven by a subsample of the MEC (S4 Fig). Patterns of LD at BDNF and FTO revealed that the AAM signal aligned more closely with the primary BMI signal than with other independent BMI signals previously reported at these loci [51,52]. At four additional, albeit non-significant, densely genotyped loci, LD patterns revealed that our lead AAM or ANM SNPs were not independent of the previously reported index SNPs (S2 Table).
Lastly, we used publicly available information on the LD blocks tagged by the AAM and ANM index SNPs to narrow the putative interval around the loci that generalized to Hispanic/Latinas. Specifically, the percent reduction in the base-pair interval of interest (based on the location of SNPs in strong LD, r² ≥ 0.8, with the index SNP of interest) was 77-96% across five AAM loci (SEC16B, TRIM66, BDNF, GPRC5B, FTO; S5 Table); in the case of TMEM18, this approach pointed to one other SNP (rs7559547). For the ANM signal at FNDC4, the reduction in the base-pair interval of interest was less dramatic (28%). The largest LD blocks were found in the 1000 Genomes CEU, whereas the smallest LD blocks were found in either the YRI or AMR reference populations.

Trans-ethnic array-wide associations
In our trans-ethnic meta-analyses, we observed evidence of novel array-wide significant (p-value < 2.5x10^-7) associations with AAM at CUX2, and with ANM at FRMD5 and GPRC5B. The frequencies of the lead SNPs at these three loci varied widely across studies, but on average these were low-frequency SNPs (Table 4) in weak LD with most SNPs in the region (r² < 0.2; Fig 2A-2C). As shown in S6 Fig, the estimated effect of each novel SNP was stronger in Hispanic/Latinas than in the other racial/ethnic groups and, in the case of GPRC5B, showed evidence of heterogeneity among Hispanic/Latinas (Table 4). Each lead SNP was observed predominantly in one ancestral group: African (rs76455660 at CUX2, MAF = 1.7% in 1000 Genomes AFR; rs184476190 at GPRC5B, MAF = 0.9% in AFR) or Asian (rs116961834 at FRMD5, MAF = 6.6% in 1000 Genomes EAS). Several other race/ethnic samples were filtered out due to low frequency (MAF < 0.1%), which yielded analytic sample sizes of 61-75% of the total trans-ethnic sample for the given trait.
At SEC16B, variation in the frequency of the SNP representing the possible AAM secondary signal (rs114548967, with G as the trait-decreasing allele; conditional p-value = 2.18x10^-4) was driven by African and Hispanic/Latina ancestries; its frequency ranged from 0.3% to 18.6% across the 12 samples contributing to the trans-ethnic meta-analysis. At BDNF, the Bonferroni-significant secondary signal (rs113940328-C; conditional p-value = 3.90x10^-5) was monomorphic in 1000 Genomes EUR, but its MAF ranged from 0.4% to 7.4% across the 15 samples contributing to the trans-ethnic meta-analysis. This secondary SNP was independent of the previously established BMI primary and secondary signals (r² < 0.01 with these SNPs in AFR) [51,52]. At CUX2, the array-wide significant secondary signal (rs10849931-C; conditional p-value = 1.05x10^-7) was in weak LD with a SNP previously described for coronary artery disease (rs886126, r² = 0.5 in 1000 Genomes EUR) [53] and in weak LD with the other previously described trait-associated SNPs in the region (r² < 0.2). However, unlike the primary signal in this region (rs76455660-T, which is monomorphic in 1000 Genomes EUR), the lead conditional SNP was present in all race/ethnic groups, with substantial variation in allele frequency across the 21 ancestry- and study-specific groups analyzed jointly (9.3% to 62.3%).
* Bonferroni correction for the number of SNPs with r² < 0.2 in the region in MetaboChip data from ARIC African Americans (n = 1,419 males, n = 2,332 females), using a 50-SNP window shifted by 5 SNPs at each iteration.
** Strongest SNP marker in the modified random-effects trans-ethnic meta-analysis across three race/ethnic groups (African, Hispanic/Latina, and Asian Americans).
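The locus-specific Bonferroni correction described in the table note above counts SNPs that survive pairwise LD pruning (r² < 0.2, 50-SNP windows shifted by 5 SNPs). A simplified greedy version, assuming a precomputed r² matrix for SNPs in genomic order, is sketched below. Tools such as PLINK's `--indep-pairwise` may use more refined rules for choosing which SNP of a correlated pair to drop (for example, based on allele frequency), so counts from this sketch need not match those used in the study; the function names and example matrix are hypothetical.

```python
def prune_ld(r2, window=50, step=5, max_r2=0.2):
    """Greedy sliding-window LD pruning; returns indices of retained SNPs.

    r2: square matrix (list of lists) of pairwise r2 for SNPs in genomic order.
    A SNP is dropped when its r2 with an earlier retained SNP in the same window
    is >= max_r2, so retained SNPs have pairwise r2 < max_r2 within each window.
    """
    n = len(r2)
    keep = [True] * n
    start = 0
    while start < n:
        end = min(start + window, n)
        for i in range(start, end):
            if not keep[i]:
                continue
            for j in range(i + 1, end):
                if keep[j] and r2[i][j] >= max_r2:
                    keep[j] = False  # simplification: always drop the later SNP
        if end == n:
            break
        start += step
    return [i for i in range(n) if keep[i]]

def locus_threshold(r2, alpha=0.05, **kwargs):
    """Bonferroni threshold for one densely genotyped locus."""
    return alpha / len(prune_ld(r2, **kwargs))

# Hypothetical locus with four SNPs; SNPs 0 and 1 are in strong LD.
r2_example = [
    [1.00, 0.90, 0.05, 0.05],
    [0.90, 1.00, 0.05, 0.05],
    [0.05, 0.05, 1.00, 0.05],
    [0.05, 0.05, 0.05, 1.00],
]
print(prune_ld(r2_example), locus_threshold(r2_example))
```

Inverting the relationship and assuming α = 0.05, the reported thresholds of 9.0x10^-5 to 2.8x10^-4 correspond to roughly 179 to 556 retained SNPs per locus.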

Discussion
Our trans-ethnic meta-analysis of reproductive traits has expanded our understanding of how reproductive trait loci discovered in women of European descent transfer to other race/ethnic groups. First, across all 32 AAM and 14 ANM loci on the MetaboChip, we observed more directionally consistent trans-ethnic associations than expected by chance (binomial p-values of 1.2×10⁻⁶ and 0.01, respectively). Second, we generalized six AAM loci (NUCKS1, LIN28B, TMEM38B, SEC16B, BDNF, FTO) and two ANM loci (BRSK1, MCM8) to African, Hispanic/Latina, Asian or American Indian/Alaskan Native women. At each of these loci, our trans-ethnic risk alleles were directionally consistent with previous reports among women of European descent [13,15,17–19]. This suggests that much of the currently known genetic architecture for AAM and ANM appears to be transferable across ancestrally diverse racial/ethnic groups. Additionally, we conducted an array-wide analysis of AAM and ANM and identified array-wide significant SNPs at three novel loci (AAM: CUX2; ANM: FRMD5, GPRC5B). These SNPs were most frequent in populations of African ancestry (CUX2, GPRC5B) and Asian ancestry (FRMD5). Previous studies have associated variation in CUX2 with type 1 diabetes, coronary artery disease, atrial fibrillation and, most recently, AAM [22,53–55]. Even so, our novel signals appear to be distinct from these and are infrequent in populations of European descent. According to HaploReg v4.1, the novel SNPs at CUX2 and FRMD5 are both predicted to be enhancers in the brain. This is consistent with the modulatory effect that neurotransmitters and gonadal hormones may have on each other [56]. A SNP in FRMD5 has previously been associated with triglycerides [57]. However, that SNP lies 6 kb downstream of our SNP and is in weak LD with it in 1000 Genomes CHB+JPT (r² = 0.01). Common genetic variation near GPRC5B has previously been associated with both BMI and AAM [20,51,52], but not with ANM. In HaploReg, the lead SNP, which is intronic to GPRC5B, was annotated with several features:
predicted histone marks;
promoter and/or enhancer activity in brain and ovary tissue;
DNase activity in ovary and several other tissues;
binding affinity for neuron-restrictive silencer factor.
* Strongest SNP marker in the modified random-effects trans-ethnic meta-analysis, informed by studies from up to three race/ethnic groups (African, Hispanic/Latina, and Asian American women). P-values represent modified Han and Eskin p-values. Race/ethnic fixed-effect estimates and heterogeneity p-values are shown for illustrative purposes only, because the modified random-effects meta-analysis was run on studies separately.
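The directional-consistency comparison reported at the start of this discussion can be framed as a one-sided binomial (sign) test. Under the null hypothesis, each locus matches the previously reported direction with probability 0.5, independently of the other loci. The Python sketch below rests on that assumption. The function names and example counts are hypothetical and are not the counts behind the p-values reported above.

```python
from math import comb


def count_consistent(observed_betas, reported_signs):
    """Count loci whose observed effect points the same way as the reported effect.

    observed_betas: per-locus effect estimates for the previously reported allele.
    reported_signs: +1 or -1, the direction reported in prior European-descent studies.
    """
    return sum(1 for beta, sign in zip(observed_betas, reported_signs) if beta * sign > 0)


def binomial_sign_test(n_consistent, n_total, p_null=0.5):
    """One-sided P(X >= n_consistent) for X ~ Binomial(n_total, p_null).

    Under the null, each locus independently matches the reported direction
    with probability p_null (0.5 corresponds to a coin flip).
    """
    if not 0 <= n_consistent <= n_total:
        raise ValueError("n_consistent must lie between 0 and n_total")
    return sum(
        comb(n_total, k) * p_null**k * (1 - p_null) ** (n_total - k)
        for k in range(n_consistent, n_total + 1)
    )


# Hypothetical example: 10 of 14 loci agree in direction with prior reports.
p_value = binomial_sign_test(10, 14)
```

In this sketch, an estimate of exactly zero counts as inconsistent. Other tie-handling conventions are equally defensible, and correlated loci would violate the independence assumption.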
Our results, however, have several limitations. First, the sample sizes available for the AAM and ANM analyses were imbalanced. Second, only a relatively low proportion of the AAM and ANM loci established to date in studies of European descent women were available on the MetaboChip (9% and 32%, respectively). Third, our exclusion of extreme values restricted our analytic sample size, although it allowed us to report on the common genetic causes of normal variation in AAM and ANM. A key strength of this study is our use of the MetaboChip as a common dense-genotyping array across all PAGE studies, with consistent genotype calling and quality control applied to each study. Nonetheless, our ascertainment of AAM and ANM relied primarily on self-report. Lastly, the available sample of racial/ethnic minority women who experienced these reproductive milestones is only a fraction of the samples of European descent women studied previously [22]. Because genetic data on minority women are relatively scarce, our sample sizes were insufficient for three purposes. We could not detect associations with low-frequency variants (MAF<5%) in most of our analyses. We also could not seek independent replication of our novel significant findings (Table 4), or systematically explore the role of gene-environment interactions in our findings. Yet we observed enrichment of directional consistency (S1 Table). This suggests that, given sufficient power or more comprehensive genotyping arrays, additional AAM and ANM loci may be significantly associated with reproductive traits in minority women. Larger samples of diverse women are needed to investigate all currently known AAM and ANM loci and to establish statistical significance. Such samples would also allow the magnitude of the novel genetic effects on reproductive traits to be estimated more precisely.
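The limitation for low-frequency variants (MAF<5%) noted above can be illustrated with a standard approximation for the power of a one-degree-of-freedom additive test of a quantitative trait. This Python sketch is not the power calculation used in the study. The sample size, per-allele effect, trait standard deviation and significance threshold in the example are hypothetical, chosen only to show how power falls as MAF drops. The threshold is the conventional genome-wide 5×10⁻⁸, assumed here.

```python
from math import sqrt
from statistics import NormalDist


def additive_snp_power(n, maf, beta, sd_trait, alpha):
    """Approximate two-sided power of a 1-df additive SNP test for a quantitative trait.

    Assumptions (illustrative only): Hardy-Weinberg equilibrium, so the allele
    dosage variance is 2 * maf * (1 - maf); a small per-allele effect `beta`
    in trait units; and a normal approximation to the test statistic.
    """
    norm = NormalDist()
    # Approximate fraction of trait variance explained by the SNP
    var_explained = 2 * maf * (1 - maf) * beta**2 / sd_trait**2
    # Expected value of the z statistic under the alternative
    ncp = sqrt(n * var_explained)
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # P(|Z| > z_crit) when Z ~ N(ncp, 1)
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


# Hypothetical inputs: 30,000 women, 0.1-year per-allele effect,
# trait SD of 1.5 years, significance threshold of 5e-8.
for maf in (0.01, 0.02, 0.05, 0.10, 0.30):
    print(f"MAF={maf:.2f}  power={additive_snp_power(30_000, maf, 0.1, 1.5, 5e-8):.3f}")
```

With the effect size held fixed, the variance explained scales with 2·MAF·(1−MAF). Rarer variants therefore need proportionally larger samples to reach the same power.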
Our findings expand the range of race/ethnic groups to which previously reported reproductive trait associations may be generalized, although often at a different SNP within the previously reported association signal. The widely replicated LIN28B association had already been observed in African, Hispanic/Latina and East Asian studies, among others [24,26,28,31]. We generalized it to a more diverse group of Asian ancestries, including Native Hawaiian women from the MEC (p-value = 1.0×10⁻¹⁰). We also generalized it to American Indian/Alaskan Native women from WHI, albeit at nominal significance (p-value = 0.01). In addition, we extended evidence of the NUCKS1 association with AAM beyond women of European descent to a trans-ethnic sample of women for the first time. At NUCKS1, effects in all race/ethnic groups in our study were directionally consistent with previous reports. However, the association reached nominal significance only in African American and Hispanic/Latina women (p-value = 0.02; Table 2).
Several studies have previously generalized reproductive trait associations at TMEM38B (AAM), BRSK1 and MCM8 (ANM) to African, Hispanic/Latina and East Asian ancestries [24–26,28,31]. Our study, however, is the first to investigate heterogeneity both within and across populations with distinct ancestries. Our heterogeneous findings at MCM8 (S3 Fig) illustrate why within-group heterogeneity should be investigated in future studies of populations with European admixture, such as Hispanic/Latinos [25]. Similar to previous work, we found that MCM8 also replicated in our sample of Hispanic/Latinas [31], even though it did not generalize to any other racial/ethnic group. This finding, and the generalization of several other loci to Hispanic/Latinas, may reflect their European admixture [58]. Alternatively, the genetic architecture of reproductive traits may be less similar between European women and the other race/ethnic groups analyzed herein. Using the densely genotyped regions of the MetaboChip, we also showed how diverse samples can help identify potential independent signals and putative variants or regions of interest for future functional follow-up (S5 Table). For example, we observed evidence that the MCM8 region may harbor two independent signals for ANM in Hispanic/Latinas [23]. Yet the role of ancestral differences or environmental exposures/interactions in the observed findings warrants further research [25].
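The race/ethnic fixed-effect estimates and heterogeneity p-values shown alongside the trans-ethnic results can be reproduced in outline, and so can the heterogeneity discussed above for MCM8. One common approach is inverse-variance fixed-effect pooling combined with Cochran's Q test. The Python sketch below assumes that approach. It does not implement the modified random-effects (Han and Eskin) method used for the trans-ethnic p-values. The helper names and example estimates are hypothetical.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses the regularized upper incomplete gamma function Q(df/2, x/2), built up
    from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y) with the recurrence
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / gamma(s + 1).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    s, q = (0.5, erfc(sqrt(y))) if df % 2 else (1.0, exp(-y))
    while s < df / 2.0:
        q += y**s * exp(-y) / gamma(s + 1)
        s += 1.0
    return q


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect meta-analysis plus Cochran's Q heterogeneity test."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching betas and ses for at least two groups")
    weights = [1.0 / se**2 for se in ses]
    w_sum = sum(weights)
    beta_fe = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se_fe = sqrt(1.0 / w_sum)
    # Two-sided p-value from the normal approximation
    p_assoc = erfc(abs(beta_fe / se_fe) / sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    p_het = chi2_sf(q, len(betas) - 1)
    return beta_fe, se_fe, p_assoc, q, p_het


# Hypothetical per-group estimates (years per allele) and standard errors.
groups = {"AfA": (-0.10, 0.04), "H/L": (-0.12, 0.05), "AsA": (0.02, 0.06)}
result = fixed_effect_meta([b for b, _ in groups.values()], [s for _, s in groups.values()])
```

A small heterogeneity p-value flags groups whose estimates disagree more than their standard errors allow. Random-effects methods are designed for exactly that situation.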
Lastly, our Bonferroni-significant AAM associations at BDNF and FTO were in moderate to strong LD (r²>0.2; Fig 1B and 1C) with the previously reported putative variants of the primary association signals. Secondary signals have not previously been described at SEC16B for BMI [52]. At BDNF, our trans-ethnic LD estimates suggest that the secondary AAM signal we observe is distinct from BMI secondary signals (r²<0.2; Fig 1A). These findings suggest that studying AAM may yield insights into the genetic architecture of growth and development beyond those gained from studying BMI alone. Studies using various methodologies have shown that genetic signals for growth and development co-localize, supporting shared genetic underpinnings of early life growth and development in both females and males [59–63]. A recent study highlighted the extent of overlapping genetic loci involved in these interrelated traits [64]. It found that the genetics of age at first birth was positively correlated with the genetics of birth weight, AAM and age at voice breaking. The same study found negative correlations with the genetics of smoking, BMI and ANM. The current PAGE Study did not have the data needed to disentangle genetic effects on AAM and BMI in early and late life. Nonetheless, an increasing body of work suggests that early life growth can influence both puberty and downstream cardiometabolic consequences [22]. Future trans-ethnic research should use longitudinal data or causal inference methods to further decompose the complex relationship between the genetics of growth and development across the life course.
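The LD comparisons in this discussion use r², the squared correlation between two variants. Examples include r²=0.01 between the FRMD5 SNPs and the r²>0.2 and r²<0.2 cut-offs. The study's values came from reference-panel and trans-ethnic LD estimates. The Python sketch below instead assumes unphased individual genotypes coded as allele dosages (0, 1, 2) and computes the squared Pearson correlation of dosages. This is a common approximation that can differ from haplotype-based r². The 0.2 cut-off mirrors the text. Classifying a value of exactly 0.2 as "low" is an arbitrary choice made here.

```python
from math import sqrt  # noqa: F401  (kept for users extending to r rather than r^2)


def dosage_r2(g1, g2):
    """Squared Pearson correlation between two SNPs' allele dosages (0/1/2)."""
    n = len(g1)
    if n != len(g2) or n < 2:
        raise ValueError("need two dosage vectors of equal length >= 2")
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return cov**2 / (v1 * v2)


def ld_class(r2, threshold=0.2):
    """Label an r^2 value using the 0.2 cut-off from the text."""
    return "moderate-to-strong" if r2 > threshold else "low"


# Hypothetical dosages for two SNPs in six individuals.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 0, 0, 2]
label = ld_class(dosage_r2(snp_a, snp_b))
```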

Conclusions
To our knowledge, our study is the first trans-ethnic analysis of female reproductive traits. Future trans-ethnic meta-analyses should include large, diverse samples with dense genotyping for three purposes:
1) to fine-map the reproductive trait association signals described herein;
2) to examine the joint role of functional genetic variants and environmental risk factors;
3) to describe genetic risk factors for extreme AAM or ANM and predict their effects on the reproductive windows of women of diverse race/ethnic groups.
Our findings support the relevance of multiple reproductive loci to racially/ethnically diverse groups of women. They also point to a complex genetic architecture underpinning female growth and development across the life course.
through the NHGRI PAGE program (U01HG004803 and its NHGRI ARRA supplement). The following studies contributed to this manuscript and were funded by the following agencies. The Atherosclerosis Risk in Communities Study (ARIC) was carried out as a collaborative study supported by National Heart, Lung, and Blood Institute contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C), R01HL087641, R01HL59367 and R01HL086694; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. The Coronary Artery Risk Development in Young Adults Study (CARDIA) is supported by contracts HHSN268201300025C, HHSN268201300026C, HHSN268201300027C, HHSN268201300028C, HHSN268201300029C, and HHSN268200900041C from the National Heart, Lung, and Blood Institute (NHLBI), the Intramural Research Program of the National Institute on Aging (NIA), and an intra-agency agreement between NIA and NHLBI (AG0005). The Hispanic Community Health Study/Study of Latinos (SOL) was carried out as a collaborative study supported by contracts from the National Heart, Lung, and Blood Institute (NHLBI) to the University of North Carolina (N01-HC65233), University of Miami (N01-HC65234), Albert Einstein College of Medicine (N01-HC65235), Northwestern University (N01-HC65236), and San Diego State University (N01-HC65237). Additional support was provided by 1R01DK101855-01 and 13GRNT16490017. The following Institutes/Centers/Offices contributed to the HCHS/SOL through a transfer of funds to the NHLBI: National Center on Minority Health and Health Disparities, the National Institute of Deafness and Other Communications Disorders, the National Institute of Dental and Craniofacial Research, the National Institute of Diabetes and Digestive and Kidney Diseases, the National Institute of Neurological Disorders and Stroke, and the Office of Dietary Supplements. The Mount Sinai BioMe Biobank was supported by The Andrea and Charles Bronfman Philanthropies. The Hypertension Genetic Epidemiology Network (HyperGEN) study was supported by National Heart, Lung, and Blood Institute contracts HL086694 and HL055673. MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI)
(CALiCo) Consortium, including the Atherosclerosis Risk in Communities (ARIC) Study, Coronary Artery Risk Development in Young Adults (CARDIA), the Hispanic Community Health Study/Study of Latinos (HCHS/SOL); Epidemiologic Architecture for Genes Linked to Environment (EAGLE)-accessing the Vanderbilt University Medical Center's biorepository (BioVU); Multiethnic Cohort (MEC); the Women's Health Initiative (WHI)], and additional collaborating studies [The Hypertension Genetic Epidemiology Network (HyperGEN) Study, the MEC-Slim Initiative in Genomic Medicine for the Americas Type 2 Diabetes Consortium (MEC-SIGMA), Multi-Ethnic Study of Atherosclerosis (MESA), and Mount Sinai School of Medicine BioBank (BioME)]
Abbreviations: AfA = African American, AI/AN = American Indian/Alaskan Native, AsA = Asian American, BP = Base pair, Chr = Chromosome, Freq = Frequency for coded decreasing allele, GWAS = Genome-wide association study, H/L = Hispanic/Latina, Pop = Racial/ethnic group or trans-ethnic analysis, TE = Trans-ethnic modified random effects, MA = Minor Allele, N = Sample Size, SE = Standard Error, SNP = Single nucleotide polymorphism.
https://doi.org/10.1371/journal.pone.0200486.t002
Significant SNP associations below the locus-specific Bonferroni p-value threshold are shown in bold. Nominally significant heterogeneity p-values (p<0.05) are shown in italics. Trans-ethnic p-values are modified Han and Eskin p-values. All SNPs are oriented on the positive strand, and positions are based on Build 37. Abbreviations: AfA = African American, AsA = Asian American, BP = Base pair, Chr = Chromosome, Freq = Frequency for coded decreasing allele, GWAS = Genome-wide association study, H/L = Hispanic/Latina, Pop = Racial/ethnic group or trans-ethnic analysis, TE = Trans-ethnic modified random effects, MA = Minor Allele, N = Sample Size, Ref = Reference, SE = Standard Error, SNP = Single nucleotide polymorphism.
https://doi.org/10.1371/journal.pone.0200486.t003
Secondary signal analysis. As shown in S7 Fig, three AAM loci (SEC16B, BDNF, CUX2) had suggestive evidence of secondary signals. These secondary signals were in low LD (r²<0.2) with the primary AAM signals observed in our unconditional trans-ethnic analyses.
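The secondary-signal analysis is summarized above only through S7 Fig. A common way to check whether a second SNP carries a signal independent of the primary SNP is a conditional regression that adds the primary SNP as a covariate. The Python sketch below assumes that approach, additive dosage coding and individual-level data. It is a hypothetical illustration, not necessarily the procedure behind S7 Fig. In a meta-analysis, such conditioning would usually be done within each study before the results are combined. The helper names `invert`, `ols` and `conditional_test` are illustrative.

```python
from math import sqrt


def invert(m):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    k = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols(y, columns):
    """Ordinary least squares with an intercept; returns (coefficients, standard errors)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    coefs = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(X[i][a] * coefs[a] for a in range(p)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)
    ses = [sqrt(sigma2 * inv[a][a]) for a in range(p)]
    return coefs, ses


def conditional_test(trait, secondary, primary, covariates=()):
    """(beta, se) of the secondary SNP without and with adjustment for the primary SNP."""
    base = list(covariates)
    b0, s0 = ols(trait, [secondary] + base)
    b1, s1 = ols(trait, [secondary, primary] + base)
    return (b0[1], s0[1]), (b1[1], s1[1])
```

If the secondary SNP's effect largely disappears after adjustment, it is probably tagging the primary signal. If it persists, that is consistent with an independent secondary signal.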

Fig 2. Regional plots of the novel array-wide significant age at menarche (Panel A: CUX2) and natural menopause loci (Panels B,C: FRMD5, GPRC5B) using a modified random-effects trans-ethnic meta-analysis of more than 31,000 women, and showing independence from previously published cardiometabolic SNP associations (shown in gray if missing).https://doi.org/10.1371/journal.pone.0200486.g002