Genome-Wide Association Study of Parity in Bangladeshi Women

Human fertility is a complex trait determined by gene-environment interactions in which genetic factors represent a significant component. To better understand inter-individual variability in fertility, we performed one of the first genome-wide association studies (GWAS) of common fertility phenotypes, lifetime number of pregnancies and number of children in a developing country population. The fertility phenotype data and DNA samples were obtained at baseline recruitment from individuals participating in a large prospective cohort study in Bangladesh. GWAS analyses of fertility phenotypes were conducted among 1,686 married women. One SNP on chromosome 4 was non-significantly associated with number of children at P <10-7 and number of pregnancies at P <10-6. This SNP is located in a region without a gene within 1 Mb. One SNP on chromosome 6 was non-significantly associated with extreme number of children at P <10-6. The closest gene to this SNP is HDGFL1, a hepatoma-derived growth factor. When we excluded hormonal contraceptive users, a SNP on chromosome 5 was non-significantly associated at P <10-5 for number of children and number of pregnancies. This SNP is located near C5orf64, an open reading frame, and ZSWIM6, a zinc ion binding gene. We also estimated the heritability of these phenotypes from our genotype data using GCTA (Genome-wide Complex Trait Analysis) for number of children (hg 2 = 0.149, SE = 0.24, p-value = 0.265) and number of pregnancies (hg 2 = 0.007, SE = 0.22, p-value = 0.487). Our genome-wide association study and heritability estimates of number of pregnancies and number of children in Bangladesh did not confer strong evidence of common variants for parity variation. However, our results suggest that future studies may want to consider the role of 3 notable SNPs in their analysis.

Human fertility is a complex trait determined by gene-environment interactions in which genetic factors represent a significant component. To better understand inter-individual variability in fertility, we performed one of the first genome-wide association studies (GWAS) of common fertility phenotypes, lifetime number of pregnancies and number of children in a developing country population. The fertility phenotype data and DNA samples were obtained at baseline recruitment from individuals participating in a large prospective cohort study in Bangladesh. GWAS analyses of fertility phenotypes were conducted among 1,686 married women. One SNP on chromosome 4 was non-significantly associated with number of children at P <10 -7 and number of pregnancies at P <10 -6 . This SNP is located in a region without a gene within 1 Mb. One SNP on chromosome 6 was non-significantly associated with extreme number of children at P <10 -6 . The closest gene to this SNP is HDGFL1, a hepatoma-derived growth factor. When we excluded hormonal contraceptive users, a SNP on chromosome 5 was non-significantly associated at P <10 -5 for number of children and number of pregnancies. This SNP is located near C5orf64, an open reading frame, and ZSWIM6, a zinc ion binding gene. We also estimated the heritability of these phenotypes from our genotype data using GCTA (Genome-wide Complex Trait Analysis) for number of children (h g 2 = 0.149, SE = 0.24, p-value = 0.265) and number of pregnancies (h g 2 = 0.007, SE = 0.22, p-value = 0.487). Our genome-wide association study and heritability estimates of number of pregnancies and number of children in Bangladesh did not confer strong evidence of common variants for parity variation. However, our results suggest that future studies may want to consider the role of 3 notable SNPs in their analysis.

Introduction
Fertility i.e., the number of children that a woman will have in her life, is a relevant aspect of health, for individual family planning, clinical care, and for understanding long-term population growth. Challenges to fertility can arise from genetic abnormalities, in addition to infectious or environmental agents, delayed childbearing, behavior, and certain diseases [1].
Relative to the birth rate in developed countries, the fertility rate in Bangladesh has been historically high, largely because of a pattern of early childbearing and socioeconomic factors [2][3][4][5][6][7]. Following a decline in fertility in Bangladesh in the late 1970s and 1980s from 6.3 to 3.4 births per woman, the fertility rate plateaued at 3.3 for about ten years during the 1990s and then resumed its decline during the early 2000s [2][3][4][5][6][7]. According to the 2011 Bangladesh Demographic and Health Survey, the total fertility rate for Bangladeshi women aged 15-49 years is 2.7 births per woman, including a rate of 2.4 in urban areas and 2.7 in rural areas. Human fertility is a complex trait determined by gene-environment interactions in which genetic factors represent a significant component [8][9]. Studies on fertility candidate genes have identified alleles involved in the occurrence of reproductive system diseases causing infertility or subfertility [10][11][12][13]. For example, in recent years, a new physiological role in human fertility regulation has emerged for the tumor-suppressor p53 gene (P53) as the P53 Arg72Pro polymorphism has been associated with recurrent implantation failure in humans [14]. However, a significant proportion of human infertility remains unexplained. The emerging evidence indicates that fertility genes represent a set of genes whose role is changing, and acquiring clinical relevance, possibly due to their interaction with the prevalent environmental context, such as changing reproductive patterns (e.g., birth control, family planning, delayed childbearing, and spacing birth order). Although some polymorphic variants (alleles) have been identified in fertility genes, how these alleles affect human fertility and interact with the reproductive environment remains unclear [15][16].
To better understand inter-individual variability in fertility, we performed one of the first genome-wide association study (GWAS) of common female fertility phenotypes in humans in a large rural population of married Bangladeshi women who were not selected for impaired fertility. The study population has not altered their reproductive behavior as the western women. Although family planning campaigns have been implemented in rural Bangladesh the average fertility is still significantly higher in this population than in middle-and high-income countries.

Study description
The DNA samples genotyped in this study were obtained at baseline recruitment from individuals participating in the Health Effects of Arsenic Longitudinal Study (HEALS) [17]. GWAS analyses of fertility phenotypes were conducted using questionnaire responses and single nucleotide polymorphism (SNP) data on 1,691 women randomly selected from the HEALS study. HEALS [17] is a prospective investigation of health outcomes associated with arsenic exposure through drinking water in a cohort of adults in Araihazar, Bangladesh, a rural area southeast of the capital city, Dhaka. Between October 2000 and May 2002, we recruited healthy married individuals (age 18-75 years) who were residents of the study area for at least five years and primarily consumed drinking water from a local well. We enumerated 65,876 individuals residing in Araihazar, from which using a population-based sampling frame, 11,746 men and women were enrolled. During 2006-2008, additional recruitment of 8,287 participants using the same methodologies from the same underlying source population expanded the cohort size to over 20,000 individuals. A subset of women married 2 years or longer in the cohort with GWAS data were included in this analysis (N = 1,686).
At baseline, trained study physicians conducted in-person interviews and clinical evaluations and collected spot urine and blood samples from participants in their homes using structured protocols. The questionnaire collected extensive information on demographic and lifestyle factors [17]. This included age, television ownership, medication use (including hormonal contraception use), smoking status, study stage, and enrollment year. Arsenic exposure was also assessed in the study population using two primary measures, drinking water and urinary arsenic concentrations [17].
The study protocol was approved by the Institutional Review Boards of The University of Chicago, Columbia University, and the Bangladesh Medical Research Council. Informed consent was obtained from all participants. Because of the prevalence of illiteracy in our study population, consent was obtained verbally and documented by study physician. The consent procedure was approved by the three Institutional Review Boards who provide oversight for this project.

Fertility phenotypes
The fertility phenotypes, including number of children and number of pregnancies, were measured during the baseline in-person interviews. Only women were asked to report on the number of living children and the number of pregnancies. The number of pregnancies variable included deaths, abortions and stillbirths, whereas the children variable only reflected number of live children. Although correlated at 0.99, both fertility phenotypes were included in the analysis as they could measure different fertility associated issues. We created an additional variable, 6 versus 1 children, to assess the extremes of fertility from the interview data. Hormonal contraception reflects current use of any type (oral, injectable, etc.) of hormonal contraception.
Genotyping and quality control DNA was extracted from clot blood using Flexigene DNA kit (Cat # 51204) from Qiagen. Concentration and quality of all extracted DNA were checked by Nanodrop 1000. As starting material, 250 ng of DNA was used and genotyping was conducted using Illumina HumanCytoSNP-12 v2.1 chips with 299,140 markers. Standard Illumina protocol was used for scanning the chips on the BeadArray Reader and processing the image data in BeadStudio software to generate genotype calls.
Standard GWAS QC was conducted among the 299,140 SNPs. SNPs were excluded with (1) poor call rates (<95%), (2) monomorphic SNPS, and (3) HWE p-values <10 -10 . This QC resulted in individuals with high-quality genotype data for 259,597 SNPs. Imputation was performed using MaCH on the basis of the HapMap 3 Gujarati Indians in Houston (GIH) population (Build 36). We also implemented the following QC exclusion criteria for SNPs post-imputation: (1) MAF <0.005 and (2) SNP imputation score <0.3. Both genotyped and imputed SNPs were included in these analyses, which yielded 1,211,988 million SNPs after QC procedures. All QC was performed using PLINK [18].

Statistical methods
Despite population-based sampling, the study population, being stable with geography-defined residence, included some related individuals. Standard GWAS analysis methods are not appropriate for related individuals due to lack of independence of genotypes. The EIGENSTRAT program revealed that all subjects in this Bangladeshi sample were clustered together and could not be assigned into subgroups, indicating that there was no significant population stratification within the sample [19]. EMMAX (Efficient Mixed Model Association Expedited) was also used to account for cryptic relatedness. EMMAX was originally developed to account for known relationships among highly inbred mouse strains, has recently been shown to reliably account for related sample structure, including relationships among cases [20]. EMMAX uses a variance component method that utilizes an empirically estimated relatedness matrix to model the correlation between genotypes of all cases and controls [19]. EMMAX, therefore, supplies the statistical control necessary to remove excess inflation in test statistics due to the known non-independence of cases and the possible non-independence of controls. EMMAX performs a linear mixed model regression for each marker that includes the previously estimated relatedness matrix as a covariate. EMMAX reports the Armitage trend test score for each marker and associated p value [21]. We consider SNPs to be genome-wide significant if the significance exceeds a P-value of 5 × 10 −8 [22]. However, we present findings for SNPs that reached P <10 -5 to consider whether our strongest associations are consistent with previous studies [23]. We show QQ plots of the p values resulting from the analysis of all SNPs in an EMMAX transformed distribution of Χ 1 2 random variables plotted against 1 million randomly generated Χ 1 2 observations. The QQ plot demonstrates close agreement between the two distributions, indicating that population stratification and cryptic relatedness are adequately controlled. Furthermore, the inflation parameter calculated with the transformed p values and the randomly generated observations is 1.03, which is also within the generally accepted limit (i.e. less than 1.05) [24]. The final models were adjusted for age, study stage, TV ownership, hormonal contraceptive use, enrollment year, in addition to being adjusted for relatedness using the EMMAX procedure described above. We did not adjust for urinary arsenic exposure or smoking status in our final models as adjustment did not result in a substantive change. We also conducted a sensitivity analysis, excluding women who report current use of hormonal contraceptives. The models exclduing hormonal contraceptive users were adjusted for age, study stage, TV ownership, enrollment year, and relatedness.
For the number of children and pregnancy phenotypes, we estimated heritability using Genome-wide Complex Trait Analysis software (GCTA) [25]. Similar to GEMMA, this software estimates a relatedness matrix based on the pairwise genetic covariance. We estimated h g 2 , the amount of variance in the trait that was explained by the interrogated SNPs. For h g 2 , GCTA fits the given phenotype with a linear mixed model, while it uses the estimated relatedness matrix as the variance term, to estimate the variance explained by all SNPs.

Results
The characteristics of the 1,686 women included in this analysis are summarized in Table 1. They averaged 34.7 years in age and a body mass index of 20.1. All women in the study population were married. The average number of pregnancies and children per woman was 3.1 (slightly higher than the reported national average for rural women). 438 (26.0%) women reported use of hormonal contraceptives.

Genome-Wide Association Analysis
The quantile-quantile (Q-Q) plots showing the distribution of p-values for number of pregnancies and number of children in the overall study population (including hormonal contraceptive users) is presented in Fig. 1A. The observed p-values for number of pregnancies matched the expected values over the range of 1.0<-log10(P)<4.0. The very slight departure was observed at the extreme tail (-log10(P)>4.5) of the distribution of test statistics for number of pregnancies, suggesting that the associations identified are likely due to true variants rather than potential biases such as genotyping error, sample relatedness, or potential population stratification. Similar findings were observed for the Q-Q plot of number of children and extreme number of children ( Fig. 1B and 1C) where the observed p-values for number of children matched the expected values over the range of 1.0<-log10(p)<4.0 and the departure was observed at the extreme tail (-log10(P)>4.5) of the distribution of test statistics.
We also present our top SNPs in Table 2 excluding hormonal contraceptive users. The most strongly associated SNP for both number of children (P = 5.24E-07) and number of pregnancies (P = 8.88E-07) was rs696888. Rs696888 is located near C5orf64, an open reading frame, and ZSWIM6, a zinc ion binding gene (Fig. 3). The top SNP identified in the overall study population (rs100009124) has the second strongest association for number of children (5.24E-07) and reached P<10 -5 for number of pregnancies ( Table 2).

Heritability
We estimated heritability for the parity traits by estimating h g 2 , which is the upper limit of the amount of phenotypic variance we could have expected to explain with our GWAS. The estimated the heritability of these phenotypes from our genotype data for number of children was h g 2 = 0.149 (SE = 0.24, p-value = 0.265) and the estimated heritability for number of pregnancies was h g 2 = 0.007 (SE = 0.22, p-value = 0.487). We subsequently did not find evidence that the SNPs interrogated by our genotyping and imputation were associated with the phenotypic variation in the continuous parity-related traits of this study population.

Discussion
In one of the first investigations of this hypothesis using a GWAS approach, we identified a borderline significant association for rs100009124 and number of children when corrected for multiple comparisons in our analysis. The same SNP was the top (but non-significant) association for number of pregnancies. This SNP appears to be in a gene desert on chromosome 4. When we excluded hormonal contraceptive users, the association with rs100009124 remained a top (but non-significant) association for number of children and number of pregnancies, but the strongest association for both the number of children and number of pregnancies was in rs696888, a SNP on chromosome 5 located near C5orf64 and ZSWIM6. In the analysis of extreme number of children, the SNP on chromosome 6 that was the top (but non-significant) association appears to be closest to the gene HDGFL1.
Consideration of the nearest gene and gene function for the top SNPs identified in this analysis was limited. The only borderline significant SNP identified in this project is located in a region without a gene within 1 Mb-a gene desert. However, there is non-coding RNA (LINC00613) within *100kb. To our knowledge, there is no previous report of this SNP in the fertility literature. For the analyses excluding hormonal contraception users, the genes of interest appeared to be C5orf64 (an open reading frame) and ZSWIM6 (a zinc ion binding gene). Neither of these genes has been indicated in the fertility literature previously. However, the gene near the SNP associated with extreme number of children is HDGFL1, a hepatoma-derived growth factor. This gene has been reported previously to be associated with mild intellectual disability in an investigation of the genetic factors in father-daughter incest [23]. At present, there are a modest number of mutations, particularly in genes expressed in the hypothalamus, pituitary, gonads, and outflow tract are known to cause infertility in humans [24]. Most of these disrupt normal puberty and subsequently cause infertility. However, in our dataset, none of the known genes involved in fertility appeared to play a role. In GWAS investigation of fertility in animal studies, particularly in cows and bulls, investigators have reported top SNPs located close to or in the middle of genes with functions related to male fertility, such as the sperm acrosome reaction, chromatin remodeling during the spermatogenesis, and the meiotic process during male germ cell maturation [26][27]. However, we did not collect information on children or pregnancies for males in our study population and our findings are subsequently restricted to females.
This study was conducted in a cohort recruited originally to understand the health effects of arsenic. There is evidence in that arsenic exposure alters female reproductive physiology [28]. However, when we considered arsenic exposure in our parity models, we did not observe an effect. In addition, a limitation of this GWAS of parity is that information on known reproductive issues was not elicited. We also focus on our analysis on females as fertility information was not collected from males, but it is possible that male fertility impacts the ability to reproduce as well. The collection of additional fertility related data would enhance the suitability of the parity phenotypes reported here for GWAS analyses.
As investigation of this hypothesis is novel, the findings presented here offer some insight into genetic variation and parity. Although our null GWAS doesn't rule out substantial heritability as it's possible that many very weak effects could account for parity variation, overall, our findings are consistent with a low heritability of phenotypic variability for parity. In summary, our genome-wide association study of number of pregnancies and number of children in Bangladesh did not confer strong evidence of common variants for parity variation. However, our results suggest that future studies may want to consider the role of specific SNPs on chromosomes 4, 5 and 6 in their analysis.