Phenotype and Tissue Expression as a Function of Genetic Risk in Polycystic Ovary Syndrome

Genome-wide association studies and replication analyses have identified (n = 5) or replicated (n = 10) DNA variants associated with risk for polycystic ovary syndrome (PCOS) in European women. However, the causal gene and underlying mechanism for PCOS risk at these loci have not been determined. We hypothesized that analysis of phenotype, gene expression and metformin response as a function of genotype would identify candidate genes and pathways that could provide insight into the underlying mechanism for risk at these loci. To test the hypothesis, subjects with PCOS (n = 427) diagnosed according to the NIH criteria (< 9 menses per year and clinical or biochemical hyperandrogenism) and controls (n = 407) with extensive phenotyping were studied. A subset of subjects (n = 38) underwent a subcutaneous adipose tissue biopsy for RNA sequencing and were subsequently treated with metformin for 12 weeks with standardized outcomes measured. Data were analyzed according to genotype at PCOS risk loci and adjusted for the false discovery rate. A gene variant in the THADA locus was associated with response to metformin and metformin was a predicted upstream regulator at the same locus. Genotype at the FSHB locus was associated with LH levels. Genes near the PCOS risk loci demonstrated differences in expression as a function of genotype in adipose including BLK and NEIL2 (GATA4 locus), GLIPR1 and PHLDA1 (KRR1 locus). Based on the phenotypes, expression quantitative trait loci (eQTL), and upstream regulatory and pathway analyses we hypothesize that there are PCOS subtypes. FSHB, FHSR and LHR loci may influence PCOS risk based on their relationship to gonadotropin levels. The THADA, GATA4, ERBB4, SUMO1P1, KRR1 and RAB5B loci appear to confer risk through metabolic mechanisms. The IRF1, SUMO1P1 and KRR1 loci may confer PCOS risk in development. The TOX3 and GATA4 loci appear to be involved in inflammation and its consequences. The data suggest potential PCOS subtypes and point to the need for additional studies to replicate these findings and identify personalized diagnosis and treatment options for PCOS.


Introduction
Polycystic ovary syndrome (PCOS) is the most common endocrinopathy in reproductive age women, affecting 7-10% of this group. Cardinal features include irregular menstrual cycles, hyperandrogenism and polycystic ovarian morphology [1]. Obesity and insulin resistance are also common, along with increased risk of diabetes, metabolic syndrome and other cardiovascular diseases [2,3]. Despite the detrimental impact of the disorder on women's health, the etiology remains poorly understood.
Twin studies suggest that genetic influences explain over 70% of PCOS pathogenesis [4]. Therefore, genome-wide association studies have been performed to uncover DNA variants associated with PCOS [5][6][7][8][9]. There are now 16 published variants that mark loci associated with PCOS in women of Han Chinese and European ethnicity. It is not clear which variants in the loci are causal for increased risk of PCOS nor which genes in the loci are marked by the genotyped variants. Insights have been gained previously through association with quantitative phenotype traits and PCOS subphenotypes [7,9,10]. Insight has also been gained through examination of gene expression in tissues of interest in other diseases, including type 2 diabetes [11][12][13][14].
We therefore hypothesized that the relationship between PCOS risk loci and quantitative phenotypic traits would illuminate the underlying PCOS features affected at each locus. We also hypothesized that gene expression and associated patterns would identify candidate genes and pathways that could provide insight into the risk mechanism at the loci. The data suggest relationships between PCOS risk variants and gonadotropins, reproductive, metabolic and developmental pathways.

Gene Expression
The number of reads was 30.3X10 6 ±4.9 X10 6 , uniquely aligned reads was 23.5X10 6 ±4.5 X10 6 (77±8%), and the number of exonic reads was 9.6X10 6 ±3.5 X10 6 (40±11% of the unique reads). Skin and fat RNA expression clusters did not overlap and were therefore analyzed separately. Four samples did not cluster with their respective tissue types and were removed from the analysis (3 fat and 1 skin).

Upstream Regulators
The top 5 predicted upstream regulators for each variant are shown in Table 1. The entire list of upstream regulators is shown in S1 Table. The most common upstream regulators included TGFB1 (THADA, FSHR, LHCGR, DENND1A, KRR1, TOX3) and the immune and inflammatory regulators tumor necrosis factor, interleukins and interferon gamma (THADA, FSHR, LHCGR, GATA4/NEIL2, DENND1A, ERBB3, KRR1, TOX3). Other upstream regulators included mitochondrial genes and AMPK upstream at the ERBB4 locus, important for energy production. Transcription factors HOXA9, TP53 and GATA2 were upstream regulators of SUMO1P1 and KRR1, respectively. ERBB2 was an upstream regulator of THADA and KRR1.

Canonical Pathways
The expression patterns at the majority of the loci overlapped with several canonical pathways involved in the immune and inflammatory response ( Table 2 and S2 Table). In addition, the expression pattern at rs12478601 in fat (THADA locus) overlapped with a canonical pathway involved in adipogenesis. In skin, canonical pathways included cholesterol biosynthesis pathways.
The expression pattern for rs2178575 (ERBB4 locus) in skin overlapped with canonical pathways involved in mitochondrial dysfunction, oxidative phosphorylation and MAPK signaling. The expression pattern for r2268361 (FSHR locus), rs13405728 (LHCGR locus) and rs1795379 (KRR1 locus) overlapped with a canonical pathway involved in axonal guidance. The expression pattern at two loci overlapped with canonical pathways involved in ErbB signaling (rs13164856; IRF1 locus) and ErbB2 signaling (rs11031005; FSHB locus).

Associated Networks
The expression pattern at rs12478601 (THADA locus) in skin and fat, rs804279 (GATA4 locus) and rs1795379 (KRR1 locus) in fat and rs2268361 (FSHR locus) in skin predicted involvement in reproductive diseases ( Table 3). The expression pattern at rs11031005 (FSHβ locus) in fat predicted a relationship to endocrine function and the pattern for rs2178575 (ERBB4 locus) in skin predicted a relationship to metabolic disease.
All of the loci were associated with at least one network involved in cell processes, including cell-to-cell signaling, movement, cell function and maintenance, cell growth, proliferation and development and cell death and survival. In addition, 8 loci were associated with networks predicted to involve embryonic development (THADA, ERBB4, FSHR, RAD50, DENND1A, FSHβ, KRR1, SUMO1P1). Finally, 7 loci were associated with cholesterol or lipid biosynthesis (THADA, FSHR, LHCGR, RAD50, FSHβ, RAB5B, SUMO1P1).

Discussion
The risk attributed to genetic loci associated with PCOS can be understood only through fine mapping to locate the causal risk allele and identification of the candidate genes affected. Phenotype, both baseline and interrogated, and expression data from adipose and skin of women with PCOS suggest candidate genes and pathways that predispose to PCOS at the genetic risk loci. Specifically, phenotype data from current and previous studies suggest risk in relation to  gonadotropins and glucose metabolism. Interrogation with metformin points to a locus with type 2 diabetes risk. The expression data point to the importance of growth factors and inflammatory and immune pathways in upstream regulation and effector pathways including reproductive, endocrine, metabolism and inflammatory pathways. The differences in potential upstream regulators and networks affected in the case of each variant point to potentially differing etiologies for PCOS risk at the distinct risk loci. The phenotype insights are most helpful to ascertain. The rs11031006 variant is associated with LH levels and the LH:FSH ratio, as we previously reported for the rs12294104, which is in linkage disequilibrium [7,16]. The variant is located upstream of FSHβ and larger studies demonstrate that FSH levels are also suppressed in the presence of the variant [7,17]. Therefore, the variant may act as an upstream suppression site lowering FSH levels, with the resulting free α subunit pairing with LHβ and increasing LH levels [18,19]. FSHβ expression was not evaluable in adipose or skin and the effect of the variant could not be examined in these tissues. The expression pattern in relation to the variant did predict involvement in endocrine development and disease. Finally, the overlap between the expression pattern at the FSHβ locus and ERBB2 signaling is supported by the relationship between gonadotropins, estradiol and HER-2 upregulation in breast cancer patients [20].
The rs12478601-C variant, identified initially in women of Han Chinese ethnicity and replicated in European women, was found in the only PCOS risk locus that predicted any defined response to metformin [6,21,22]. An independent THADA locus has been extensively evaluated and is associated with a lower insulin sensitivity in one study and decreased beta cell responsiveness [23]. Further, the expression pattern overlaps with the canonical adipogenesis pathway. Taken together, the data point to a relationship between the THADA locus in PCOS and the independent causal association with insulin resistance and metabolism [7].
There were few variants associated with gene expression at the loci identified. In carriers of the rs804279-A risk allele (GATA4 locus), B cell non-receptor Src tyrosine kinase expression (BLK) was increased, a gene responsible for B-cell receptor signaling and B-cell development. It is a susceptibility gene for multiple autoimmune phenotypes. Interestingly, BLK protein also stimulates insulin synthesis and secretion in response to glucose and enhances expression of β cell transcription factors [24]. One hypothesis would be that the increased expression is a response to insulin resistance in PCOS. NEIL2 was nominally decreased at the same locus, a gene that plays a role in gene maintenance and protection from DNA mutagenesis and the inflammatory response [25]. Upstream regulators at the locus were also dominated by inflammatory and immune genes. Taken together, the increase in expression of an autoimmune risk gene and trend toward decreased expression of an inflammatory suppression gene may point to an inflammatory risk profile in PCOS.

Phenotype and Gene Expression in PCOS
At rs1795379 (KRR1 locus), there was increased expression of GLIPR1 and PHLDA1. PHLDA1 is the pleckstrin homology like domain family A member 1, a proline-histidine rich nuclear protein that plays an important role in the anti-apoptotic effects of insulin-like growth factor-1 (IGF-1) [26]. Interestingly, PHLDA1 is induced by IGF-1 and insulin upregulates IGF-1 receptors in conditions with insulin resistance, as in PCOS [27]. GLIPR1, or GLI pathogenesis related 1, has proapoptotic activity in prostate cancer cells, is expressed at high levels in the testes and may have a role in sperm-oocyte interactions [28]. The increased gene expression points to possible relationships to insulin resistance and fertility at the locus, as suggested by the overlap in expression as a function of the rs1795379 and pathways involved in reproduction.
A variant located in an intron of DENND1A demonstrated a nominal decrease in DENND1A expression although it was no longer significant after correction for multiple testing [10,21]. Although rs2479106 was not initially replicated, it was the variant replicated in the recent metaanalysis GWAS of data from European women [8], but was not associated with expression and not in linkage disequilibrium with rs10986105. Upstream regulators of rs10986105 include several factors important in reproductive pathways. For example, JUN is induced by GnRH and TGFB1 is important for reproductive regulation and has been implicated in PCOS [29,30].
Previous studies have demonstrated differential gene expression in adipose tissue at the RAB5B and INSR PCOS risk loci [31]. The gene expression patterns at these two loci did not replicate in the current study. The loci with significant expression data were found in new loci that were not assessed in the previous study [31].
The IPA upstream regulatory analytic identified upstream transcriptional regulators that can explain observed gene expression differences as a function of genotype at each locus. The IPA network analysis integrates upstream regulators with downstream effects that may explain what is happening upstream and how it influences a phenotype or functional outcome downstream. The most common upstream regulators belong to the immunomodulatory and inflammatory axis. Previous studies have examined the association between polycystic ovary syndrome risk and variants in 80 inflammatory genes and found no association [32]. However, the study did not examine pathways that may influence PCOS through PCOS risk loci identified by GWAS, as in the current study. The second most common upstream regulators included growth factors, notably TGFB1. TGFB1 is important for reproductive regulation and has been implicated in PCOS [29,30]. The networks affected are mainly related to cell growth, maintenance and embryonic development. These upstream regulators and networks may be specific to adipose and skin tissue sources. Adipose tissue and skin have previously been demonstrated to share the most cis-effects in the regulatory regions [12]. Study of additional tissues may provide differing expression patterns and further insight into the mechanism of risk at these PCOS loci.
The analysis pathways also suggest the hypothesis that these loci confer risk for PCOS through distinct mechanisms. The FSHB, FSHR and LHCGR loci influence PCOS risk based on their relationship to gonadotropin levels, as demonstrated in the current study and previous studies [7,9,15]. The THADA, GATA4, ERBB4, SUMO1P1 and KRR1 loci appear to confer risk through metabolic mechanisms. The RAB5B locus also appears to confer risk through metabolic mechanisms, particularly through glucose metabolism, as described previously [15]. KRR1 and DENND1A expression patterns suggest reproductive function. The IRF1, SUMO1P1 and KRR1 loci may confer PCOS risk through a developmental mechanism. The TOX3 and GATA4 loci appear to be involved in inflammation and its consequences.
The main limitation is the small sample size for PCOS-specific adipose and skin tissue expression. These studies will need replication in additional, larger PCOS cohorts. Other limitations include the possibility that the tissues examined in the current study are not those that are critical for functional expression of all genes. We also used subcutaneous adipose, which may not reflect important expression differences in visceral adipose, although it has been demonstrated to be similar for a small number of relevant genes [33,34] associated previously with metabolic risk in PCOS. The subcutaneous adipocytes are also larger in women with PCOS and had lower lipoprotein lipase activity, suggesting a qualitative abnormality [35]. Ideally, ovarian tissue and pituitary/hypothalamic tissue should be studied next. By examining only cis-eQTLs we are potentially missing trans-eQTL effects [12]. However, given the small sample size there is concern for overcalling relationships. Finally, we did not examine controls because the hypothesis addressed possible eQTL effects in women with PCOS.
Phenotype data demonstrate a strong relationship between the risk alleles and gonadotropin levels. The data also demonstrate candidate genes at two PCOS risk loci marked by GATA4/NEIL2 and KRR1. Expression patterns as a function of genotype point to specific influences on PCOS risk at each locus spanning reproductive, metabolic and inflammatory pathways and response to metformin maps to a locus that has a metabolic profile. These data suggest the hypothesis that PCOS can be subdivided into differing risk patterns that may be able to pinpoint the underlying etiology as a function of genotype.

Study Subjects
Subjects were of European ethnicity, aged 18-45 years old and with PCOS defined by the NIH criteria, i.e. irregular menses (< 9 menstrual periods/yr) and clinical or biochemical hyperandrogenism (n = 427) [36]. Clinical hyperandrogenism was defined as a Ferriman Gallwey (FG) score greater than 9, the upper 95% confidence limit for the Boston-based control populations. Biochemical hyperandrogenism was defined as an androgen level greater than the 95 percent confidence limits in control subjects with regular, ovulatory menstrual cycles: testosterone >63 ng/mL (2.8 nmol/L), DHEAS >430 μg/dL (1.16 μmoL/L) or androstenedione levels >3.8 ng/mL (0.13 nmol/L). Subjects with non-classic congenital adrenal hyperplasia, hypothyroidism, elevated prolactin levels, diabetes, Cushing syndrome and primary ovarian insufficiency were excluded [37]. Control subjects (n = 407) consisted of women aged 18-45 years with regular menses, between 21 and 35 days, and no hyperandrogenism [37]. Subjects were on no hormonal medication for at least 3 months and no medications that influence insulin, inflammation, or lipid levels for at least one month. A subset of subjects, meeting all criteria for PCOS as described above, were also studied in a protocol in which metformin was administered (n = 38) [36].

Protocol
All PCOS subjects were studied !10 days after their last menstrual period and after a 12 hour fast [37]. Subjects underwent a detailed history; physical exam including measurement of waist circumference at the umbilicus and hip circumference at the widest diameter; a pelvic ultrasound (Phillips, 5 MHz convex array transducer); and blood samples for DNA, lipids, glucose, insulin, gonadotropin and sex-steroid levels. LH and FSH levels were obtained at ten minute intervals to calculate an average gonadotropin concentration. Using data from blood samples collected every 10 min over 12 h in women with PCOS and ovulatory controls [38], we documented that the mean LH secretion from 12 h of frequent blood samples correlates well with the value obtained from the mean of three samples collected from 0800-0820 h (r = 0.92, p<0.01) [37].
The subset of 38 subjects with PCOS also underwent an intravenous glucose tolerance test (IVGTT), with glucose 0.3 g/kg administered at time 0 and regular human insulin 0.03 U/kg injected at 20 minutes [22]. They subsequently underwent a subcutaneous adipose biopsy under local anesthesia with 1% lidocaine. Adipose tissue and skin, which was saved for 16 subjects, were rinsed with sterile 0.9% NaCl solution, placed on ice and transported for immediate freezing in liquid nitrogen.
After the initial visit, subjects started treatment with metformin ER 500 mg per day, with the dose increasing by 500 mg every two weeks to a final dose of 1500 mg/day, which was administered for a total of 12 weeks. Subjects returned every two weeks for anthropomorphic measurements, estradiol and progesterone levels, and a pelvic ultrasound to monitor folliculogenesis. Subjects returned for additional visits if follicle size indicated impending ovulation. Compliance was determined by questioning at the biweekly visits. After 12 weeks of metformin ER 1500 mg/day, subjects were admitted to the Massachusetts General Hospital Clinical Research Center to repeat the study as outlined above. Plasma metformin levels were measured in the fasting state at the study conclusion before the final IV glucose tolerance test. Clinical parameters were previously reported [22].
The study was approved by the Partners Human Research Committee. All subjects provided written, informed consent.

Genotyping, RNA Sequencing and Statistical Analysis
Patient DNA was isolated from whole blood according to manufacturer's specifications (Qiagen, USA) and genotyped using the HumanOmniExpress BeadChip (Illumina, San Diego). Associations between five SNPs not previously examined in our data (Table 1 and S3 Table) and PCOS phenotypes were evaluated [7][8][9].
Linear regression using an additive genetic model was used to test for association of PCOS risk variants with 30 log-transformed quantitative traits in the combined sample of PCOS cases and controls in the discovery Boston cohort. A p value < 0.0003 was considered significant after Bonferroni correction for 12 evaluable gene variants (S3 Table) and 15 independent traits (FSH, LH, prolactin, 17-OH progesterone, testosterone, cholesterol, sex hormone binding globulin (SHBG), estradiol, blood pressure, pulse, thyroid stimulating hormone (TSH), body mass index (BMI), fasting glucose, fasting insulin and ovarian volume), with other variables highly correlated.
Associations between genetic variants and response to metformin were analyzed using Chi square [39] and one way ANOVA. To correct for multiple testing, a p value of 0.0021 was considered significant to correct for comparisons of 2 independent outcomes (change in baseline glucose and glucose mediated glucose disposal, and testosterone and ovulation) and 12 evaluable variants.
SNPs from all loci associated with PCOS in European and Chinese GWAS and available on the HumanOmniExpress BeadChip were examined in relation to adipose and skin expression [5][6][7][8][9].
Skin (n = 10) and adipose tissue (n = 33) were isolated, tissue homogenized and RNA extracted (RNeasy or RNeasy lipid kits, Qiagen, USA). RNA libraries were prepared using Illumina TruSeq strand RNA sample prep with RiboZero treatment and sequencing was performed using the Illumina HiSeq 50 cycle single-read sequencing.
Transcript annotations for hg19 (build 74) were downloaded from Ensembl. Splice junction sequences were generated using the USeq (v8.8.2) MakeTranscriptome application using a radius of 46. Novoindex (2.8) was used to create a transcriptome index using the combined splice junction and hg19 chromosome sequences.
Reads were aligned to the transcriptome index described above using Novoalign (v2.08.01), allowing up to 50 alignments for each read. USeq (v8.8.2) SamTranscriptomeParser was used to convert the coordinates of reads aligning to splice junctions to genomic space. Reads aligning to multiple locations in the genome with equal confidence were discarded.
Ensembl Build 74 transcript annotations were collapsed into gene annotations using USeq (v8.8.2) MergeUCSCGeneTable. USeq (v8.8.8) DefinedRegionDifferentialSeq was used to generate read counts for each gene. DESeq2 was used to test for differential expression in any of the genotypes using the likelihood ratio test (LRT) option [40]. Genes with significant adjusted p-values were used in Ingenuity Pathway Analysis (IPA 1 , Ingenuity Systems, QIAGEN) and for eQTL expression analysis. For eQTL expression analysis, all genes within an LD block defined by an r 2 = 0.1 were considered. In addition, expression in fat and skin for all genes within 1 MB of the variant of interest were examined, which encompassed the LD block defined above. Pathway analysis was performed using IPA 1 , which creates algorithmically generated pathways from expression data and compares the ratio of overlap to known pathways. The upstream regulator data is generated by determining differentially expressed genes between genotypes and comparing them to genes regulated by the upstream molecules to determine the significance of enrichment of the gene expression data for the genes downstream of an upstream regulator. For both pathway and upstream regulator analysis, significance is calculated using the Fisher exact test.