Systematic Evaluation of Genetic Variants for Polycystic Ovary Syndrome in a Chinese Population

To date, eleven genome-wide significant (GWS) loci (P < 5×10−8) for polycystic ovary syndrome (PCOS) have been identified through genome-wide association studies (GWAS). Some of the risk loci have been selected for replication and validated in multiple ethnicities; however, few previous studies have investigated all loci. Evaluating all the GWAS variants together would give a more informative profile of the variance they explain. Thus, we systematically analyzed all 17 single nucleotide polymorphisms (SNPs) mapping to the 11 GWAS loci in an independent sample set of 800 Chinese subjects with PCOS and 1110 healthy controls. Variants rs3802457 in the C9orf3 locus (P = 5.99×10−4) and rs13405728 in the LHCGR locus (P = 3.73×10−4) were significantly associated with PCOS after strict Bonferroni correction in our data set. Further haplotype analysis indicated that in the C9orf3 block (rs4385527 and rs3802457), the GA haplotype played a protective role in PCOS (controls vs cases: 8.7% vs 5.0%, P = 9.85×10−6, OR = 0.548, 95% CI = 0.418–0.717), while the GG haplotype was associated with an increased risk of PCOS (controls vs cases: 73.6% vs 79.2%, P = 3.41×10−5, OR = 1.394, 95% CI = 1.191–1.632). Moreover, the directions of effect for all SNPs were consistent with previous GWAS reports (P = 1.53×10−5). Polygenic score analysis demonstrated that these 17 SNPs significantly predicted case-control status in our samples (P = 7.17×10−9), and together they explained about 2.40% of the variance. Our findings support C9orf3 and LHCGR variants as susceptibility loci for PCOS.


Introduction
Polycystic ovary syndrome (PCOS) is a complex metabolic and endocrine disorder in reproductive-age women with a prevalence of approximately 5%-10% [1,2]. The syndrome is defined by clinical or biochemical hyperandrogenism (HA), oligomenorrhea/amenorrhea (O) and polycystic ovaries (PCO) on ultrasonography [3]. It is associated with obesity, infertility and metabolic complications including impaired glucose tolerance (IGT), insulin resistance (IR) and dyslipidemia. It is also associated with an increased risk of endometrial cancer, type 2 diabetes (T2D) and cardiovascular diseases [4,5,6], all of which have a detrimental impact on women's health.
Although the pathogenesis of the disorder has not been completely elucidated, previous epidemiologic studies have suggested that PCOS has a strong genetic background [7]. Two genome-wide association studies (GWAS) conducted in the Han Chinese population indicated that common variants located in 11 genomic regions (the first GWAS: THADA, LHCGR and DENND1A loci; the second GWAS: FSHR, C9orf3, INSR, HMGA2, YAP1, RAB5B/SUOX, TOX3 and SUMO1P1 loci) were associated with PCOS [8,9]. Several studies in European ancestry cohorts provided further evidence for association between PCOS and variants from the LHCGR, FSHR, THADA, YAP1 and DENND1A loci [10,11,12,13]. Moreover, a recent follow-up study replicated four of the PCOS susceptibility loci (DENND1A, THADA, FSHR and INSR) in a European cohort, and its risk score analysis indicated that the susceptibility loci identified in the Chinese GWAS play an important role in the etiology of PCOS across ethnicities [14].
These prior GWAS and follow-up studies have improved our understanding of the pathophysiology of PCOS. However, to our knowledge, systematic studies that fully determine the effects of the previously reported genome-wide significant variants in an independent sample have rarely been conducted. Here, we sought to replicate the associations of all the genetic variants identified in the two Chinese GWAS in an independent Han Chinese cohort, which would present a more informative profile of the previously reported variants. Additionally, a previous study suggested that some of these variants might be correlated with certain phenotypes (such as hypersecretion of testosterone and luteinizing hormone, and insulin resistance) in PCOS patients [15]. Thus, in this study, we also investigated the correlations between the phenotypes of PCOS and the PCOS susceptibility variants.

Subjects
The cohort consisted of 800 cases with PCOS and 1,110 normal controls. All the women with PCOS were recruited at the Department of Obstetrics and Gynecology of the First Affiliated Hospital of Anhui Medical University, and the healthy control data were collected from hospitals (physical examination centers) and community surveys. Medical history and basic characteristics (origin, age, height and weight) were obtained by clinical visits and surveys. All the subjects were of central Han Chinese origin. All the PCOS cases were diagnosed by at least two gynecologists according to the Revised 2003 Consensus on Diagnostic Criteria [3], and thus met any two of the following three criteria: oligo- or anovulation, clinical and/or biochemical signs of hyperandrogenism, and polycystic ovary morphology. The body mass index (BMI) was calculated using the following formula: weight (kg)/height (m)². For the PCOS patients, hormone levels including follicle-stimulating hormone (FSH), luteinizing hormone (LH), total testosterone (T) and prolactin (PRL) were measured by chemiluminescence immunoassays. In addition, 75-g oral glucose tolerance tests (OGTT) were conducted in the cases. Plasma glucose levels at 0 min and 2 hours after OGTT were measured using the oxidase method, and insulin levels were measured by chemiluminescence immunoassays. The homeostasis model assessment of insulin resistance (HOMA-IR) was calculated as: fasting plasma glucose (mmol/l) × fasting insulin (mIU/ml)/22.5. Clinical characteristics are displayed in Table 1.
This study was conducted in accordance with the tenets of the Declaration of Helsinki and its amendments. Approval was received from the Ethics Committee of Anhui Medical University. Written informed consent was obtained from the subjects after they had been given a complete description of the study.
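The two derived measures defined above can be sketched in a few lines of Python. This is only an illustration of the stated formulas; the function names and the example values are hypothetical, and the units are taken as written in the text.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / (height_m ** 2)


def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin: float) -> float:
    """HOMA-IR: fasting glucose (mmol/l) x fasting insulin / 22.5.

    The insulin unit is the one given in the text; it is not converted here.
    """
    return fasting_glucose_mmol_l * fasting_insulin / 22.5


# Hypothetical example values, not taken from the study data.
example_bmi = bmi(60.0, 1.60)
example_homa = homa_ir(5.0, 9.0)
```

With these made-up inputs, the BMI is about 23.4 and the HOMA-IR is 2.0.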

Genotyping
The 17 previously reported independent single nucleotide polymorphisms (SNPs) from 11 loci were genotyped in this study. Genotyping was performed using the ligase detection reaction-polymerase chain reaction (LDR-PCR) method. For each plate, one sample was randomly selected to be genotyped twice, and all the non-missing genotypes were consistent. The genotyping success rate for all SNPs was over 95%, and the genotype distributions in both cases and controls were in Hardy-Weinberg equilibrium (HWE).
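As an illustration of what an HWE check involves, the sketch below applies a one-degree-of-freedom chi-square goodness-of-fit test to the genotype counts of a single biallelic SNP. It is an assumption-laden stand-in and not the procedure used in the study: the function name and the counts are hypothetical, and PLINK's own HWE test may differ (for example, by using an exact test).

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set is in perfect equilibrium,
# the second has no heterozygotes at all.
chi2_ok, p_ok = hwe_chi_square(25, 50, 25)
chi2_bad, p_bad = hwe_chi_square(50, 0, 50)
```

Under the P > 0.05 criterion, the first hypothetical SNP would be retained and the second would be flagged as deviating from HWE.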

Statistical analysis
HWE analysis was conducted using PLINK [16], and a P value > 0.05 was considered consistent with HWE. We analyzed the association between the 17 SNPs and PCOS using an additive logistic regression model. To eliminate the potential effect of BMI, BMI was included as a covariate. Haplotype analyses of the THADA, FSHR, C9orf3 and DENND1A genes were performed with the SHEsis software, available online at http://analysis.bio-x.cn/myAnalysis.php. Haplotypes were generated using the expectation-maximization algorithm, and haplotype frequencies were compared by chi-square analysis. Given the prior evidence of association with PCOS for the SNPs, adjustment for multiple comparisons was performed using Bonferroni correction. After correction, a P value < 2.9×10−3 (0.05/17) was considered significant. Linear regression analyses with BMI as a covariate were used to test association between the SNPs and PCOS phenotypes (hormone levels: T, FSH, LH and PRL; and glucose homeostasis: fasting glucose, 2-hour postprandial glucose, fasting insulin, 2-hour insulin and HOMA-IR). All phenotypes with non-normal distributions were logarithmically transformed. The analyses were also carried out using SNPTEST [17]. A Bonferroni-corrected P value of 1.47×10−3 (0.05/34; accounting for 17 SNPs against 2 trait categories: hormones and glucose homeostasis) was considered statistically significant in genotype-phenotype analyses.
Polygenic scoring analysis was conducted as follows: the risk profiles of the 17 SNPs from the previous reports were used to generate scores with the PLINK "--score" function. For each individual, the number of reference alleles (0, 1 or 2) at each SNP was multiplied by the score for that SNP and summed across SNPs, and the average score per non-missing SNP was then calculated. Case-control status was predicted by logistic regression on the polygenic scores, and Nagelkerke's R2 was used to estimate the variance explained. We conducted a sign test using the binomial distribution, comparing the direction of the ORs of the 17 SNPs between the current study and previous GWAS reports. The P value was generated under the null hypothesis (H0: p = 0.50).
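The two significance thresholds above follow directly from dividing the nominal 0.05 level by the number of tests. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the two P values are the ones reported for rs3802457 and rs13405728.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 17 SNPs tested for association with PCOS.
snp_threshold = bonferroni_threshold(0.05, 17)         # about 2.9e-3
# 17 SNPs x 2 trait categories in the genotype-phenotype analyses.
trait_threshold = bonferroni_threshold(0.05, 17 * 2)   # about 1.47e-3

# Reported single-SNP P values for the two significant variants.
reported = {"rs3802457": 5.99e-4, "rs13405728": 3.73e-4}
significant = {snp: p < snp_threshold for snp, p in reported.items()}
```

Both reported P values fall below the 17-test threshold, which is consistent with the two loci being described as significant after correction.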
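The scoring rule and the sign test described above can be sketched as follows. This is a simplified illustration that follows the wording of the text and is not PLINK's implementation; the function names, SNP labels and weights are hypothetical. The sign test is written as a two-sided exact binomial test with p = 0.5, which is an assumption about how the reported P value was obtained.

```python
import math
from typing import Mapping, Optional


def polygenic_score(allele_counts: Mapping[str, Optional[int]],
                    weights: Mapping[str, float]) -> float:
    """Average of (allele count x SNP score) over non-missing SNPs."""
    total = 0.0
    n_used = 0
    for snp, weight in weights.items():
        count = allele_counts.get(snp)
        if count is None:  # missing genotype: skip this SNP
            continue
        total += count * weight
        n_used += 1
    if n_used == 0:
        raise ValueError("no non-missing SNPs for this individual")
    return total / n_used


def sign_test_p(n_consistent: int, n_total: int) -> float:
    """Two-sided exact binomial sign test under H0: p = 0.5."""
    k = max(n_consistent, n_total - n_consistent)
    tail = sum(math.comb(n_total, i) for i in range(k, n_total + 1))
    return min(1.0, 2.0 * tail / 2 ** n_total)


# Hypothetical weights and genotypes for one individual (snp3 is missing).
weights = {"snp1": 0.5, "snp2": -0.2, "snp3": 0.1}
genotypes = {"snp1": 2, "snp2": 1, "snp3": None}
score = polygenic_score(genotypes, weights)

# All 17 SNPs directionally consistent with the previous GWAS.
p_sign = sign_test_p(17, 17)
```

With all 17 of 17 SNPs in the same direction, the two-sided test gives 2 × 0.5^17 ≈ 1.53×10−5, which agrees with the P value reported for directional consistency.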

The Basic Information of the Study Population and the Selected SNPs
The clinical characteristics of the PCOS patients and controls are presented in Table 1. The average age was 26.5 years in the cases and 26.0 years in the controls. Mean BMI was 23.5 in the cases and 20.4 in the controls. Hormone levels and glucose homeostasis were measured only in the cases, and the average values are listed in Table 1. Basic information on the 17 SNPs is shown in Table 2. All SNPs were in Hardy-Weinberg equilibrium in the study population.
To assess the polygenic signal from the previous GWAS report adapted to our samples, we conducted a polygenic scoring analysis. We used the published GWAS data set as the training set.
In this systematic study, four blocks were produced for further haplotype analysis, in the THADA, FSHR, C9orf3 and DENND1A loci. After strict Bonferroni correction, a P value less than 0.0125 (0.05/4) was considered significant. As shown in Table 4, the frequency of the GA haplotype (rs4385527 and rs3802457 of the C9orf3 gene) was significantly higher in the controls than in the cases (8.7% vs 5.0%, P = 9.85×10−6). This indicated that the GA haplotype might be a protective factor against PCOS (OR = 0.548, 95% CI = 0.418–0.717). In contrast, the frequency of the GG haplotype was significantly higher in the cases than in the controls (79.2% vs 73.6%, P = 3.41×10−5), which indicated that the GG haplotype is a risk factor for PCOS (OR = 1.394, 95% CI = 1.191–1.632). Some other haplotypes showed marginally significant differences between the controls and the PCOS cases (P values marked with # in Table 4); however, these differences did not reach statistical significance after strict Bonferroni correction.
The haplotype blocks comprised the following SNPs: THADA, rs12468394, rs13429458 and rs12478601; FSHR, rs2268361 and rs2349415; C9orf3, rs4385527 and rs3802457; DENND1A, rs10818854, rs2479106 and rs10986105. In Table 4, P values less than 0.0125 are marked in bold, and P values less than 0.05 but more than 0.0125 are marked with #.
Association analyses of the 17 SNPs against quantitative traits were conducted in the PCOS group (S1 Table). None of the SNPs showed significant association with the quantitative traits after correction for multiple testing.

Discussion
PCOS is regarded as a complex genetic disease whose etiology has not been sufficiently elucidated. Previous GWAS identified 11 loci (comprising 17 independent SNPs) associated with PCOS at P < 5×10−8 [8,9]. McAllister et al. [18] predicted that the candidate genes from the PCOS GWAS might comprise a hierarchical signaling network that influences theca cell hormone biosynthesis. This could open a new era of genetic diagnostics for PCOS. In this study, we aimed to confirm the effects of all these variants on PCOS in an independent sample. We found that the effects of all 17 SNPs were directionally consistent with those in the previous Chinese GWAS [9] and in other ethnicities [12,13]. Even after strict Bonferroni correction, two of the 11 previously reported susceptibility loci (LHCGR and C9orf3) demonstrated significant association with PCOS in our sample. Further polygenic score analysis showed that these SNPs collectively accounted for approximately 2.40% of the variance in PCOS, suggesting that they play an important role in its etiology.
To our knowledge, this is the first systematic replication study of all the reported SNP variants from the GWAS with strict statistical correction (Bonferroni correction). This conservative approach should limit false-positive findings and supports the validity of the results. Polygenic scores that included the SNPs not individually significant (P > 2.9×10−3 or P > 0.05) still revealed association with PCOS, indicating that risk score analysis can detect signal from SNPs that individually lack the power to show association.
In the present study, after strict Bonferroni correction, the A allele of rs3802457 (C9orf3 gene) was significantly associated with PCOS (P = 5.99×10−4). The effect size was similar to that in the previous GWAS (OR: 0.62 vs 0.77). Haplotype analysis can identify chromosomal regions where sequencing might be performed to search for functional variants. To our knowledge, this is the first haplotype analysis of the C9orf3 gene (rs4385527 and rs3802457) in such a large cohort of Chinese PCOS patients. The results revealed the GG haplotype of C9orf3 as a risk factor for PCOS, whereas the GA haplotype might be a protective factor against PCOS. C9orf3, also known as aminopeptidase O, encodes a member of the M1 zinc aminopeptidase family that catalyzes the removal of an amino acid from the amino terminus of a protein or peptide. It has been reported to play an important role in the generation of angiotensin IV [19,20]. However, the function of C9orf3 has rarely been discussed. Kerns et al. reported that variants within C9orf3 were associated with the development of erectile dysfunction in African-American men who had received radiotherapy for prostate cancer [21]. Arefi et al. [22] considered that the renin-angiotensin system (RAS) plays a vital role in the pathogenesis of PCOS, including insulin resistance. According to a previous study, renin activity combined with other clinical variables was more sensitive for diagnosing women with PCOS [23]. Therefore, we hypothesize that C9orf3 variants might lead to abnormalities of the renin-angiotensin system that cause or aggravate PCOS. Although rs3802457 is located in an intron, several possible mechanisms could explain an alteration of C9orf3 function, such as splicing variation, microRNA regulation or undetected copy-number variation (CNV).
The LHCGR gene encodes the receptor for LH and choriogonadotropin (hCG), two structurally homologous glycoproteins; the receptor belongs to the family of G-protein coupled receptors [24]. LHCGR is expressed on theca cells, differentiated granulosa cells (GCs) and luteal cells of the ovary. It is well known that the midcycle LH surge induces follicular maturation, ovulation and luteinization during the menstrual cycle, and that hCG is required for the maintenance of a successful pregnancy [25,26]. An LHCGR variant (rs13405728) was identified as a PCOS susceptibility locus in the first Chinese GWAS [8]. In our study population, the G allele frequency of rs13405728 was significantly higher in the controls than in the PCOS cases (24.6% vs 19.2%, P = 3.73×10−4), and the effect size was very similar to that in the former GWAS (OR: 0.72 vs 0.71). Meanwhile, a genotype-phenotype correlation analysis in another Chinese population also revealed that the LHCGR variant rs13405728 was associated with the oligo-ovulation or anovulation phenotype of PCOS [15]. Our results, combined with the previous reports, provide strong support for the relationship between LHCGR gene variants and PCOS in the Chinese population. Interestingly, because rs13405728 is rare in Europeans compared with Chinese, most of the previous replication analyses in European cohorts failed to find significant evidence of association between rs13405728 and PCOS [10,11,13,27]. However, one mapping study in a European ancestry cohort identified another two SNPs in the LHCGR locus that were significantly associated with PCOS [12]. Thus, evidence from different ethnicities supports LHCGR variants as an important genetic factor in the pathogenesis of PCOS.
Although the C9orf3 and LHCGR variants showed the most significant associations with PCOS in the present results, the ORs of these two variants were not notably larger than those of the other variants. This raises the possibility that they emerged as the most significant by statistical chance, even with the current sample size.
In summary, we confirmed two (LHCGR and C9orf3) of the 11 previously identified PCOS loci through a systematic variant association study in an independent Chinese cohort. Haplotype analysis of the C9orf3 gene revealed the GG haplotype as a genetic risk factor for PCOS. However, the detailed mechanisms by which LHCGR and C9orf3 contribute to PCOS need further clinical investigation and functional studies in vitro and in vivo.

Supporting Information
S1 Table. Association analysis of quantitative traits with genotype in women with PCOS, adjusted for BMI. (DOC)