Evaluation of Five Candidate Genes from GWAS for Association with Oligozoospermia in a Han Chinese Population

Background: Oligozoospermia is one of the severe forms of idiopathic male infertility. However, its pathology is largely unknown, and few genetic factors have been defined. Our previous genome-wide association study (GWAS) identified four risk loci for non-obstructive azoospermia (NOA).
Objective: To investigate the effect of potentially functional genetic variants at these loci (common, less-common and rare variants) on spermatogenic impairment, especially oligozoospermia.
Design, Setting, and Participants: A total of 784 individuals with oligozoospermia and 592 healthy controls were recruited to this study between March 2004 and January 2011.
Measurements: We conducted a two-stage study to explore the association between oligozoospermia and new markers near NOA risk loci. In the first stage, we used next generation sequencing (NGS) in 96 oligozoospermia cases and 96 healthy controls to screen for oligozoospermia-susceptibility variants. Next, we validated these variants in a larger cohort of 688 cases and 496 controls using SNPscan, a high-throughput single nucleotide polymorphism (SNP) genotyping method.
Results and Limitations: In the first stage, we observed seven oligozoospermia-associated variants (rs3791185 and rs2232015 in PRMT6, rs146039840 and rs11046992 in SOX5, rs1129332 in PEX10, rs3197744 in SIRPA, and rs1048055 in SIRPG). In the validation stage, rs3197744 in SIRPA and rs11046992 in SOX5 were associated with an increased risk of oligozoospermia, with odds ratios (OR) of 4.62 (P = 0.005, 95% CI 1.58-13.47) and 1.82 (P = 0.005, 95% CI 1.20-2.76), respectively. Further investigation in larger populations and functional characterization are needed to validate our findings.
Conclusions: Our study provides evidence of independent oligozoospermia risk alleles driven by variants in the potentially functional regions of genes discovered by GWAS. Our findings suggest that integrating sequence data with large-scale genotyping will serve as an effective strategy for discovering risk alleles in the future.


Introduction
Infertility is a common reproductive disease, and male factor infertility accounts for half of this problem [1,2]. Genetic causes are responsible for 10-15% of severe male infertility, including chromosome number defects, Y chromosome microdeletions and autosomal chromosome mutations [3,4]. Despite enormous progress in the understanding of human reproduction, about 50% of cases are still defined as idiopathic infertility because their causes are unknown [5].
A significant proportion of idiopathic male infertility is accompanied by abnormal semen quality, mainly oligozoospermia. Genetic variants of genes involved in spermatogenesis may be associated with spermatogenic impairment [6]. Identification of potentially functional genetic variation in spermatogenesis will improve our understanding of idiopathic infertility etiology and will contribute to the development of targeted therapies. However, to date, only a few genetic variants have been associated with oligozoospermia.
More recently, genome-wide association studies (GWAS) have investigated idiopathic male infertility [7], but the interpretation of the results was limited by the small sample size. Our subsequent GWAS with a larger sample size identified four susceptibility loci associated with non-obstructive azoospermia (NOA) in Han Chinese [8]. The genes implicated at these four susceptibility loci were PRMT6 (protein arginine methyltransferase 6), PEX10 (peroxisomal biogenesis factor 10), SOX5 (SRY-related HMG-box gene 5), SIRPA (signal regulatory protein, α-1) and SIRPG (signal regulatory protein, β-2). Although these loci showed evidence of association with NOA in Han Chinese men, it is unknown whether they also contribute to susceptibility to oligozoospermia.
Rapid technological advances in next generation sequencing (NGS) have made it possible to discover all possible genetic variations in the entire genome, including not only the common alleles detected by GWAS but also rare, causal variants. To investigate potentially causal variants of the candidate genes for oligozoospermia, we carried out a two-stage study using NGS in the discovery phase and SNPscan in the follow-up validation phase.

Study subjects
The study was approved by the Ethics Review Board of Nanjing Medical University. The protocol and consent form were approved by the Institutional Review Board of Nanjing Medical University prior to the study. All participants provided written informed consent to take part in this study. We performed a two-stage case-control analysis. The first stage included 96 men with idiopathic infertility and oligozoospermia and 96 healthy controls. In the second stage, 688 oligozoospermia cases and 496 controls were recruited. Some cohorts within the sample sets have been reported in previously published data [9,10]. The patients were recruited from the Center of Reproductive Medicine between March 2004 and January 2011 and were diagnosed with infertility in the absence of infertility in their wives. Patients were selected on the basis of a comprehensive andrological examination, including medical history and physical examination, hormone analysis, karyotype, and Y chromosome microdeletion screening. Patients with a history of orchitis, obstruction of the vas deferens, chromosomal abnormalities, or microdeletions of the azoospermia factor (AZF) region on the Y chromosome were excluded. All controls had normal reproductive function and were drawn from the early pregnancy registry of the same hospitals; their wives were in the first trimester of pregnancy and were confirmed to have delivered healthy babies 6-8 months later. After completing a questionnaire, each subject donated 5 ml of blood, which was used for genomic DNA extraction, and an ejaculate for semen analysis. Semen analysis for sperm concentration and motility was performed following World Health Organization criteria [11].

Solexa sequencing
The exons and promoters of the candidate genes were amplified by polymerase chain reaction (PCR) in 48 overlapping fragments, using the primer pairs shown in Table S1. After PCR, a library of fragmented DNA sequences was created for each participant using DNase I (Fermentas Life Sciences) and purified using the QIAquick purification kit (QIAGEN). After DNA end-repair and A-tailing, the total DNA was ligated to the PE Adapter oligo mix with T4 DNA Ligase, followed by 2% TBE PAGE gel purification with size selection. The purified DNA was used directly for cluster generation and sequencing on the Illumina Solexa Sequencer according to the manufacturer's instructions. After image analysis and base calling, we received the primary data in FASTQ format. The subsequent processing steps were summarizing data production, evaluating sequencing quality, calculating the length distribution of reads and filtering out contaminated reads. To identify single-nucleotide variants (SNVs) and indels, clean reads were aligned against hg19, and SNVs and indels were called using Samtools.

Follow-up genotyping by SNPscan sequencing
Selected SNPs were genotyped using a custom-by-design 48-Plex SNPscan™ Kit (Cat#: G0104; Genesky Biotechnologies Inc., Shanghai, China). This kit was developed according to patented SNP genotyping technology by Genesky Biotechnologies Inc., which is based on double ligation and multiplex fluorescence PCR [12]. To validate the genotyping accuracy of the SNPscan™ Kit, 5% duplicate samples were analyzed by single nucleotide extension using the Multiplex SNaPshot Kit (Applied Biosystems Inc., Foster City, CA, USA), and the concordance rates were more than 99%.
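The concordance check described above can be illustrated with a minimal sketch. The sample identifiers and genotype calls below are hypothetical, and the function name `concordance_rate` is introduced here for illustration only; it is not part of the SNPscan or SNaPshot software.

```python
def concordance_rate(primary, duplicate):
    """Fraction of samples typed by both methods whose genotype calls agree.

    Both arguments map sample ID -> genotype string such as "GT".
    Allele order is ignored, so "GT" and "TG" count as the same call.
    """
    shared = [s for s in primary if s in duplicate]
    if not shared:
        raise ValueError("no samples were genotyped by both methods")
    matches = sum(sorted(primary[s]) == sorted(duplicate[s]) for s in shared)
    return matches / len(shared)


# Hypothetical calls for four duplicated samples (illustration only).
snpscan_calls = {"s1": "GG", "s2": "GT", "s3": "TT", "s4": "GG"}
snapshot_calls = {"s1": "GG", "s2": "TG", "s3": "TT", "s4": "GT"}
rate = concordance_rate(snpscan_calls, snapshot_calls)
```

In this toy input, three of the four duplicated samples agree; a real validation set would need far more samples to support a concordance estimate above 99%.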

Identification of coding variants in biological candidate genes
Using NGS, we identified a total of 287 genetic variations in our candidate genes in 96 cases with oligozoospermia and 96 healthy controls. Because of the exploratory nature of the analysis, a P-value < 0.2 was considered statistically suggestive. We used genotypic and other association models to obtain the minimum P-value for each variant, and identified 59 of the 287 genetic variations as suggestive. Among these, seven variants (rs1129332 in PEX10, rs3791185 and rs2232015 in PRMT6, rs3197744 in SIRPA, rs1048055 in SIRPG, and rs146039840 and rs11046992 in SOX5) were predicted to be potentially functional and were chosen for replication in the follow-up study. The genotype distribution of the selected variants is presented in Table 1.
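The minimum-P screening across association models can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the text names only "genotypic and other" models, so the dominant, recessive and allelic models used here are assumed, the test is a plain Pearson chi-square without continuity correction, and the genotype counts are hypothetical.

```python
import math


def chi2_stat(table):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # skip empty columns instead of dividing by zero
                stat += (observed - expected) ** 2 / expected
    return stat


def chi2_pvalue(stat, df):
    """Upper-tail chi-square probability, using closed forms for df = 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("this sketch supports only df = 1 or df = 2")


def model_pvalues(cases, controls):
    """P-value per model; inputs are (ref_hom, het, alt_hom) genotype counts."""
    r0, r1, r2 = cases
    s0, s1, s2 = controls
    tables = {
        "genotypic": ([[r0, r1, r2], [s0, s1, s2]], 2),
        "dominant": ([[r0, r1 + r2], [s0, s1 + s2]], 1),
        "recessive": ([[r0 + r1, r2], [s0 + s1, s2]], 1),
        "allelic": ([[2 * r0 + r1, r1 + 2 * r2], [2 * s0 + s1, s1 + 2 * s2]], 1),
    }
    return {name: chi2_pvalue(chi2_stat(t), df) for name, (t, df) in tables.items()}


# Hypothetical stage-one counts for one variant (96 cases, 96 controls).
pvals = model_pvalues(cases=(40, 44, 12), controls=(50, 40, 6))
best_model = min(pvals, key=pvals.get)
is_suggestive = pvals[best_model] < 0.2  # exploratory threshold from the text
```

Taking the smallest P-value over several models raises the chance of false positives, which is consistent with treating the first stage as exploratory and confirming the selected variants in an independent second stage.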

Associations between oligozoospermia-predisposed variants and spermatogenic impairment
In the second stage, we genotyped these seven variants in 496 controls and 688 cases. Their associations with oligozoospermia are shown in Table 2. Most SNPs were common (MAF > 5%); only rs146039840 in SOX5 was rare (MAF ≤ 2%). Among these variants, rs3197744 (G>T) and rs11046992 (G>A) were associated with oligozoospermia. The TT genotype of rs3197744 in the 3'-UTR of SIRPA increased the risk of oligozoospermia with an OR of 4.62 (95% CI 1.58-13.47), compared with the GG genotype. The genotype frequencies of rs11046992 in SOX5 were 41.13% (GG), 46.66% (GA) and 12.06% (AA) in the cases and 48.79% (GG), 43.15% (GA) and 7.86% (AA) in the controls. Logistic regression analysis revealed that the rs11046992 AA genotype was associated with a significantly increased risk of oligozoospermia, compared with the GG genotype (P = 0.005, OR = 1.82, 95% CI 1.20-2.76).
For the other SNPs, no significant differences in genotype distribution were identified between the case and control groups.
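As a worked check of the rs11046992 estimate, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval for the AA versus GG comparison. The genotype counts are an assumption: they are reconstructed by multiplying the reported percentages by 688 cases and 496 controls and rounding, since the counts themselves are given only in Table 2. The study used logistic regression, so this crude calculation is a plausibility check rather than a reproduction of the authors' analysis.

```python
import math


def odds_ratio_wald_ci(case_risk, case_ref, ctrl_risk, ctrl_ref, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2 x 2 table.

    'risk' is the count carrying the risk genotype and 'ref' the count
    carrying the reference genotype, in cases and controls respectively.
    """
    odds_ratio = (case_risk * ctrl_ref) / (case_ref * ctrl_risk)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(1 / case_risk + 1 / case_ref + 1 / ctrl_risk + 1 / ctrl_ref)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Assumed counts reconstructed from the reported percentages (rounded):
# cases:    AA = 12.06% of 688 -> 83, GG = 41.13% of 688 -> 283
# controls: AA =  7.86% of 496 -> 39, GG = 48.79% of 496 -> 242
or_aa, ci_low, ci_high = odds_ratio_wald_ci(83, 283, 39, 242)
print(f"OR = {or_aa:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

With these reconstructed counts, the crude estimate rounds to the same OR and interval reported for the AA genotype, 1.82 (1.20-2.76).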

Discussion
Whether the NOA-associated genes identified in our previous GWAS also contribute to oligozoospermia was unknown. We addressed this issue by deep exon sequencing and large-scale genotyping across the five genes discovered by GWAS. In the discovery stage, we identified seven potentially functional genetic variants, and in the second stage, we validated the associations of SIRPA-rs3197744 (G>T) and SOX5-rs11046992 (G>A) with the risk of oligozoospermia.
SIRPA, a member of the signal regulatory protein family, is a membrane glycoprotein belonging to the immunoglobulin (Ig) superfamily [13,14], and is especially abundant in macrophages, dendritic cells, neutrophils, and neurons [15,16,17,18]. Up-regulation of SIRPA suppresses growth factor receptor and growth hormone receptor signaling [14,19,20]. SIRPA also regulates NF-κB activity, which renders cells resistant to TNF-mediated apoptosis [21]. Polymorphisms in SIRPA have been reported to modulate engraftment of human hematopoietic stem cells [22]; however, a related role in oligozoospermia has not been documented. In this study, we found that rs3197744, which is located in the 3' UTR, significantly increased the risk of spermatogenic impairment (P = 0.005). MicroRNAs are believed to down-regulate gene expression by mRNA cleavage or translational repression through base pairing with the 3' UTR of the messenger RNAs (mRNAs) of target genes. Rs3197744 may alter binding to microRNAs, which might in turn regulate gene expression [23]. We used MicroSNiPer to predict the effects of this SNP on putative microRNA targets [24,25]. As shown in Fig. 1, we found that the rs3197744 substitution may disrupt the binding of miR-4277 to the 3' UTR of SIRPA, and may increase the binding of miR-148a, miR-148b and miR-506. These alterations may change the expression level of SIRPA and hence modify susceptibility to oligozoospermia.
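The idea that a single-nucleotide change in a 3' UTR can remove or create a microRNA binding site can be illustrated with a simplified seed-match check. This is not the MicroSNiPer algorithm. The microRNA and UTR sequences below are made up for illustration and are not the real sequences of miR-4277, miR-148a or SIRPA, and the sketch assumes that a site is a perfect match to microRNA positions 2-8.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def seed_target_site(mirna_rna):
    """DNA-alphabet target site complementary to miRNA positions 2-8."""
    seed = mirna_rna.upper().replace("U", "T")[1:8]
    # The mRNA site is the reverse complement of the seed.
    return seed.translate(COMPLEMENT)[::-1]


def apply_snv(sequence, index, ref, alt):
    """Return the sequence with the ref base at a 0-based index changed to alt."""
    if sequence[index] != ref:
        raise ValueError("reference base does not match the sequence")
    return sequence[:index] + alt + sequence[index + 1:]


# Made-up miRNA and 3' UTR fragment (illustration only, not real sequences).
toy_mirna = "AUGCCGUAAGCUUAGGCAUCGA"
utr_ref = "GGCCATACGGCATTGA"
utr_alt = apply_snv(utr_ref, index=9, ref="G", alt="T")

site = seed_target_site(toy_mirna)
site_in_ref = site in utr_ref  # the reference allele carries the seed match
site_in_alt = site in utr_alt  # the G>T change falls inside the site
```

In this toy case the G>T substitution falls inside the seed match and removes it. The same comparison run against other seeds could show a site being gained, which is the kind of change predicted in Fig. 1.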
SOX5 is a member of the SOXD gene family, which includes three genes: SOX5, SOX6, and SOX13 [26]. SOX proteins are transcription factors with a high mobility group (HMG) box DNA-binding domain similar to that of the sex-determining region Y (SRY) protein [27,28]. The SOX5 gene encodes two major proteins, the full-length 84-kDa SOX5 (L-SOX5) and the 48-kDa SOX5 (S-SOX5). The S-SOX5 protein is expressed in tissues with motile cilia, suggesting a role for this transcription factor in the formation of motile cilia [29]. The ZNF230 gene, which can be induced by SOX5, is a recently cloned gene that is transcribed only in fertile male testis and may be related to human spermatogenesis [30]. In this study, we found that rs11046992, located in the upstream region of SOX5, significantly increased the risk of spermatogenic impairment (P = 0.005). It may affect transcription factor binding sites, but the exact molecular mechanisms are unknown.
In conclusion, our two-stage genetic association study provided evidence that two SNPs in the five previously GWAS-identified genes were associated with the risk of oligozoospermia. These findings may be useful in understanding male infertility etiology, but more epidemiological and functional studies are still needed to validate them.

Supporting Information
Table S1 The forward (F) and reverse (R) primers for multiplex competitive amplification. (DOCX)