Identification of Putative Biomarkers for the Early Stage of Porcine Spermatogonial Stem Cells Using Next-Generation Sequencing

To identify putative biomarkers of porcine spermatogonial stem cells (pSSCs), total RNA sequencing (RNA-seq) analysis was performed on 5- and 180-day-old porcine testes and on pSSC colonies that were established under low temperature culture conditions as reported previously. In total, 10,184 genes were selected using Cufflink software, followed by a logarithm and quantile normalization of the pairwise scatter plot. The correlation rates of pSSCs compared to 5- and 180-day-old testes were 0.869 and 0.529, respectively and that between 5- and 180-day-old testes was 0.580. Hierarchical clustering data revealed that gene expression patterns of pSSCs were similar to 5-day-old testis. By applying a differential expression filter of four fold or greater, 607 genes were identified between pSSCs and 5-day-old testis, and 2118 genes were identified between the 5- and 180-day-old testes. Among these differentially expressed genes, 293 genes were upregulated and 314 genes were downregulated in the 5-day-old testis compared to pSSCs, and 1106 genes were upregulated and 1012 genes were downregulated in the 180-day-old testis compared to the 5-day-old testis. The following genes upregulated in pSSCs compared to 5-day-old testes were selected for additional analysis: matrix metallopeptidase 9 (MMP9), matrix metallopeptidase 1 (MMP1), glutathione peroxidase 1 (GPX1), chemokine receptor 1 (CCR1), insulin-like growth factor binding protein 3 (IGFBP3), CD14, CD209, and Kruppel-like factor 9 (KLF9). Expression levels of these genes were evaluated in pSSCs and in 5- and 180-day-old porcine testes. In addition, immunohistochemistry analysis confirmed their germ cell-specific expression in 5- and 180-day-old testes. These finding may not only be useful in facilitating the enrichment and sorting of porcine spermatogonia, but may also be useful in the study of the early stages of spermatogenic meiosis.


Introduction
Spermatogenesis in mammals begins at puberty and continues throughout life. The process is initiated and maintained by a small population of spermatogonial stem cells (SSCs), attached to the basement membrane of the seminiferous tubules [1]. The identity of the spermatogonial stem cells has not been fully elucidated, but they have the potential to colonize and restore spermatogenesis in germ cell depleted recipient testes [2,3].
In the present study, we measured the relative expression level of genes in different stages of testicular development by using sequencing-based tools in in vitro cultured porcine SSCs (pSSCs); 5-day-old porcine testicular cells, which are the source of in vitro cultured pSSCs; and 180-day-old porcine testicular cells, which contain relatively few spermatogonia and more differentiating cells. We hypothesized that analyzing the relative expression of genes in different stages of testicular development may help identify genes with spermatogonia-specific expression that may be biomarkers of pSSCs. Similar approaches using RNA sequencing-based transcriptome analysis have been used to identify biomarkers in breast [20], colorectal [21], thyroid [22], and metastatic prostate cancer [23]. This high-throughput sequencing technology can identify rare transcripts, isoform genes due to alternative splicing, and differential transcription levels [24]. In addition, the sequencing-based quantitative analysis used in this technology compensates for the lower sensitivity of conventional microarrays in detecting the expression of rare mRNAs and false positives caused by cross-hybridization of highly similar sequences [25,26].
In this study, we performed high-throughput RNA sequencing to profile the gene expression changes in pSSCs, as well as in 5-day and 180-day-old porcine testes (pTestes). By analyzing these transcriptomes, we identified a number of expressed genes specific to pSSCs and spermatogenesis.

Materials and Methods
In-vitro culture of pSSCs Five-day-old pTestes were obtained from crossbred piglets (Landrace × Large White Yorkshire) from the Sam-Woo breeding farm, Yang Pyoung, Korea. Castration was not conducted as part of this study but was performed in accordance with the farm management plan. pTestes (180-day-old) from Landrace × Large White Yorkshire piglets were donated by the Daesung Corporation (slaughter house), Chung Ju, Korea. The derivation of pSSCs and in-vitro culture method was described in our previous study [16]. Briefly, testes were collected from 5-day-old piglets, dissociated with collagenase IV, and meshed using a 40-μm mesh. The red blood cells (RBCs) were removed using RBC lysis buffer. The isolated cells were seed onto 0.2% gelatincoated cell culture dish and incubated 31°C in 5% CO 2 . Stempro-34 medium was used for pSSC culture with various growth factors.

RNA extraction and reverse transcription-polymerase chain reaction (RT-PCR)
RNA was extracted from pSSCs and total cells of 5-and 180-day-old pTestes using Qiagen RNA extraction kit according to the manufacturer's instructions (Qiagen, Venlo, Netherlands). cDNA was synthesized from 1 μg of total RNA using RT-PCR premix kit (iNtRON, Seongnam, South Korea). Target gene PCR amplification was performed with 30 cycles of 30 s at 95°C, 30 s at 55°C, and 30 s at 72°C. All primer sets are listed in S1 Table. Real-time PCR The relative amounts of putative pSSC-specific marker genes mRNAs were estimated in duplicate samples by fluorescence and quantified using Qiagen Rotor gene PCR detection system. The reaction was initiated in a total volume of 20 μL, which contained 10 ng of cDNA and 1 pM of each primer, in a reaction buffer containing iQ SYBR Green Supermix (170-8880; Bio-Rad Laboratories). The cycle threshold (Ct) values were normalized against beta-2-myoglobin (B2M) gene expression. The results were expressed as target gene expression relative to control gene expression. PCR amplification was performed with 40 cycles of 20 s at 95°C, 20 s at 55°C, and 20 s at 72°C with the same primer sets used for real-time PCR.

RNA isolation and RNA library construction for high-throughput sequencing
For RNA library construction, total RNA was extracted from passage 2-3 of pSSCs, 5-, and 180-day-old porcine testes. Quality and quantity of total RNA were determined using a Nano-Drop 1000 spectrophotometer (Thermo Scientific, DE, USA) and Bioanalyzer 2100 (Agilent Technologies, Carlsbad, CA, USA). Illumina-compatible libraries were constructed using a Tru-Seq RNA Library Preparation Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions. Briefly, messenger RNA was purified by poly A selection of total RNA. Singlestranded cDNAs were synthesized with random hexamer primers by using the chemically fragmented and converted mRNA; then, complementary cDNA strands were synthesized to generate double-stranded (ds) cDNAs for the construction of the TruSeq library. The short ds-cDNA fragments were connected using sequencing adapters, and appropriate fragments were separated by agarose gel electrophoresis. Finally, the RNA libraries were constructed using PCR amplification; the RNA libraries were quantified by qPCR, according to the qPCR Quantification Protocol Guide (San Diego, CA, USA), and were verified using the Agilent Bioanalyzer 2100.
To analyze gene expression levels and to identify alternative spliced transcripts, the RNA-Seq reads were mapped to the Sus scrofa genome by using TopHat [27], which can report split-read alignments across splice junctions, and gene expression was determined using the Cufflinks software [28] using default settings. The reference genome sequence of Sus scrofa (ssc_ref_S-scrofa10.2) and annotation data were obtained from the NCBI website. The transcript counts at the isoform level were calculated, and relative transcript abundances were measured in units of FPKM (fragments per kilobase of exon per million fragments mapped) by using Cufflinks. In addition, novel transcript and novel alternative splicing transcripts were identified in each sample by using the Cufflinks reference annotation based transcript assembly (RABT) method, which allows for the discovery of reference transcripts and novel transcripts on using the -g option.

Statistical analysis
Experimental reproducibility. The RNA sequencing was performed only once. Additionally, 3 biological and 4 technological replicates were performed for RT-PCR, real-time PCR, and immunohistochemistry analyses. The real time RT-PCR data were analyzed by one-way analysis of variance using SPSS statistical package ver. 21.0 for Windows. A t-test was performed for comparisons between control and experimental groups. All data are expressed as the mean ± standard deviation. The null hypothesis was rejected when the probability was P < 0.05. mRNA expression. Raw data were calculated as Fragments Per Kilobase of exon model per Millon mapped fragments (FPKM) for each transcript in each sample by Cufflinks software. We excluded transcripts where zeroed FPKM values existed for more than one sample. We added one to the FPKM value of the filtered transcript to facilitate log10 transformation. Filtered data was transformed by logarithm and normalized by the quantile normalization method. For each transcript, we calculated the fold change between case and control and determined the significant genes using a filter of |fold change| 4.
Hierarchical clustering. Hierarchical clustering analysis was also performed using complete linkage and Euclidean distance as a measure of similarity to display the expression patterns of differentially expressed genes (DEGs), which were satisfied with a filter of |fold change|4.
Multidimensional scaling. We used the multidimensional scaling (MDS) method to visualize the similarities among group pairs. MDS converts the structure in the similarity matrix to a simple geometrical picture displayed as a scatter plot. The larger the dissimilarity between two samples, the further apart the points representing the experiments will be in the plot. We applied to the Euclidean distance as the measure of the dissimilarity.

Immunohistochemistry
For immunohistochemistry analysis, 5-day old pTestes were fixed in Bouin's solution overnight at 4°C. Samples were subsequently washed in 70% to 100% (v/v) ethanol, embedded in paraffin, sliced into 5 μm-thick sections and mounted onto glass slides. The mounted pTestis tissue was rehydrated using xylene and 100% to 50% ethanol. To determine the expression of matrix metallopeptidase 9 (MMP9), insulin-like growth factor binding protein 3 (IGFBP3), CD14 and CD209 in the testis cells, antigen was retrieved by boiling for 30 min in tris-ethylenediamine tetra acetic acid (EDTA) solution. The antigen-retrieved tissues were stained with a 1:100 dilution of MMP9 (SC-10737; Santa Cruz Biotechnology), CD14 (Lab made, rabbit polyclonal) and CD209 (Lab made, rabbit polyclonal) antibodies followed by goat anti-rabbit IgG-HRP (SC-2004; Santa Cruz Biotechnology) and with a 1:100 dilution of IGFBP3 antibody (SC-6004; Santa Cruz Biotechnology) followed by donkey anti-goat IgG-HRP (SC-2020; Santa Cruz Biotechnology). The peroxidase substrate detection kit (SK-4100; Vector Laboratories) was used for detection of the putative pSSC marker genes according to the manufacturer's instruction. Mouse anti-human PGP9.5 IgG (7863-1004; AbD Serotec) was used for co-immunostaining new putative porcine SSC markers with PGP9.5. Alexa Polyclonal antibody production CD14 and CD209 primary antibodies were synthesized from AbFrontier (Seoul, South Korea). Peptides were designed with "NH 2 -GNPYMDPEALQHQEDPMAS-COOH" amino acid sequence near the C-terminal region and "NH 2 -DPKEPEEKTWTGPVLVERC-COOH" near the N-terminal region of CD14 and CD209, respectively. The immunization, determination of titer, and purification methods are described in supplementary information.

Sampling of pSSCs
The goal of this study was to identify the biomarkers of porcine spermatogonial stem cell that can be used to understand the biological and physiological study of spermatogenesis and early differentiation of spermatogonia. Based on our previous study [16], highly purified populations of pig spermatogonial stem cells were successfully isolated and their gene expression profiles were analyzed. Alkaline phosphatase (AP) staining of established pSSCs colonies in 31°C culture conditions showed the strong reactivity (Fig 1A), and the colonies strongly expressed PGP9.5 (Fig 1C). In addition, pSSCs colonies showed stronger expression of PGP9.5, PLZF, α6, and β1 integrins than 5-and 180-day-old pTestes ( Fig 1B). Based on data presented in [16]  RNA sequencing of samples from pSSC, 5-, and 180-day old porcine testes To identify the similarity of gene expression among the mRNAs from pSSCs, 5-, and 180-dayold pTestes, pairwise scatter plots were utilized. The correlation value was 0.0869 between pSSCs and 5-day-old pTestis and 0.580 between 5-day and 180-day-old pTestis (Fig 2A). The hierarchical clustering result showed that pSSCs were grouped with 5-day-old pTestis ( Fig 2B  and 2C).
Differentially expressed genes in pSSC, 5-, and 180-day old porcine testes In the first step of the principal gene analysis, 18309 (of the total 28493) transcripts with zero FPKMs were excluded (Fig 3A). Of the remaining 10184 transcripts, 607 and 2118 transcripts had a 4 fold difference in gene expression in the pSSCs compared to cells from 5-and 180-day-old pTestes, respectively (Fig 3B). Among the differentially expressed genes, 293 transcripts were upregulated and 314 transcripts were downregulated in 5-day-old pTestis cells compared with pSSCs, whereas 1106 transcripts were upregulated and 1012 were downregulated in 180-day-old pTestis cells compared with 5-day-old pTestis cells (Fig 3B).
The 20 most upregulated and downregulated genes from the comparison of pSSCs to 5-dayold pTestes (Table 1) and 5 to 180-day-old pTestes ( Table 2) are presented (excluding uncharacterized genes). Additional upregulated (4 fold) and downregulated (<4 fold) genes in each comparison analysis are listed in the supplementary information.
Interestingly, most of the upregulated genes in 180-day-old pTestes compared to 5-day-old pTestes are related to spermatogenic differentiation, including spermatocytogenesis and spermiogenesis of germ cells. For examples, protamine 1 and 2 are expressed only during the post meiotic stages [29] and bind to the phosphate backbone of DNA, and the ratio of these proteins during spermatogenesis is important to determine the shape of the sperm head. An imbalance of this ratio causes subfertility or infertility [30]. Transition protein 2 (TNP2) is a key molecule during histone to protamine replacement. Defects of TNP2 in testes showed an abnormal development of the sperm head with acrosomal defects, impaired migration of the spermatozoa in the female genital tract, and defects of normal fertilization by loss of ability to penetrate the zona pellucida [31,32]. Spermatogenic germ cell development is a predominant activity rather than testes growth in 180-day-old pTestes, therefore increased expression of these genes is predictable, suggesting that the transcriptome analysis in this study to identify differential gene expression is valid for identifying new putative gene expression markers in pSSCs.

Expression of putative SSC markers
Among the 20 highly expressed genes in pSSCs, the expression patterns and levels of MMP1, MMP9, GPX1, CD14, CD209, IGFBP3, KLF9, and CCR1 were evaluated by RT-PCR and real time RT-PCR in pSSC, 5-, and 180-day-old pTestes. The band intensities of these genes were greater in pSSCs than in 5-and 180-day-old pTestes ( Fig 4A). In addition, real time RT-PCR confirmed the higher relative expression level of these genes in pSSCs than in 5-and 180-dayold pTestes ( Fig 4B). To identify the expression and localization of these proteins in 5-and 180-day-old pTestis tissues, immunohistochemistry with antibodies for CD14, CD209, and IGFBP3 antibodies was performed. Due to the lack of species-specific compatibility of commercial antibodies for MMP1, MMP9, GPX1, KLF9, and CCR1 to porcine, only CD14, CD209, and IGFBP3 were tested. CD14, CD209, and IGFBP3 were expressed specifically in spermatogonia ( Fig 5). However, the CD209 antibody also stained cells in the interstitial area. In porcine testes, three types of germ cells were detected in the basement membrane: cells that stained for both PGP9.5 and CD209 (white arrow), cells that stained for only PGP9.5 (green arrow), and cells that expressed only CD209 (red arrow) (Fig 5B). In addition, expression of MMP9 in 5-day old testes was also determined (S1 Fig). Together with RT-PCR and real time RT-PCR data, immunohistochemistry clearly confirmed the increased expression of these genes in spermatogonia in 5-and 180-day-old pTestes, suggesting that these genes can be used as putative biomarkers of the early stage of porcine spermatogonia. Interestingly MMPs are among the highly expressed genes in pSSCs. MMPs are a family of enzymes playing an essential role in proteolytic degradation of the extracellular matrix and are involved in matrix remodeling in relation to embryonic developmental processes, inflammation and tumor invasion and metastasis [33][34][35][36]. MMP2 and MMP9 are known to bind to heparan sulfate proteoglycans, and CCR1, one of the highly expressed genes in pSSCs. CCR1 + myeloid cells appear to enhance tumor invasion by producing metalloproteinases, MMP9 and MMP2 [37]. Both MMP9 and MMP2 are positioned to interact with the cell surface adhesion molecules or receptors and to regulate the turnover of these molecules [38]. MMP9 is abundantly expressed in both Sertoli cells and gonocytes and MMP1 is localized to Sertoli cells in human fetal testis [39]. However, in the present study, MMP9 expression was observed specifically in spermatogonia of 5-day-old pTestes and in in vitro cultured pSSCs. In rodent testes, colocalization of MMP2, laminin γ3, and β1 integrin was observed in the basal ectoplasmic region and activated when the spermatid detached and migrated from the epithelium [40,41]. Interestingly, treatment with specific MMP2 and MMP9 inhibitors delayed detachment of spermatid by inhibition of proteolysis of laminin [40]. Proteolysis of laminin through MMPs may be essential in migration of meiotic germ cells. In rodent testes, proximal extracellular matrix (ECM), which is mainly composed of type IV collagen, laminins, heparin sulfate proteoglycan and entactin, constituted the basement membrane of seminiferous tubule, and contact with Sertoli cells and spermatogonia [42]. Taken together, high MMP expression levels in both in vitro cultured pSSCs and spermatogonia in 5-day-old pTestes may play a role in the migration of spermatogonia by degradation of the surrounded ECM, because, in vivo, gonocytes and spermatogonia are initially located at a distance from the basement membrane of spermatic cord and seminiferous tubule at early ages, but they move to settle on the basement membrane before puberty.
In addition, MMP1 can degrade insulin like growth factor binding proteins (IGFBPs) into fragments with low affinity for insulin like growth factor (IGF), thus the bioactivity of IGFs to cell surface IGF receptors [43]. Although the precise role of IGF axis and their regulatory mechanism in germ cell development, expressions of IGFs have been reported in human Sertoli cells and primary spermatocytes [44] and equine testes [45]. In addition, Total RNA Sequencing to Identify the Biomarkers of pSSC Total RNA Sequencing to Identify the Biomarkers of pSSC spermatogonia can be an important source of testicular IGFs that may regulate Sertoli cells through a paracrine factor [46], and spermatogonia proliferation and differentiation as an autocrine factor [47].
In addition, urokinase plasminogen activator (u-PA) inhibited the activities of IGFBP3 by proteolysis and increased the bioactivities of IGFs [48,49]. Therefore expression of MMPs, u-PA, and IGFBP3 in pSSCs may be relevant to regulation of IGFs function in the early stages of spermatogenesis.
CD14, a 55-kDa glycoprotein found on the cell surface of myeloid cells [50], is also known as a receptor for lipopolysaccharides (LPS) or a complex of LPS and LPS binding protein [51][52][53] and regulates LPS-induced cell activation [54]. CD14 mRNA was upregulated in male reproductive organs, including testis, epididymis, and seminal vesicles by LPS-treatment [55]. Although the role and mechanism of CD14 in testes are not identified yet, flow cytometric analyses confirmed the expression of CD14 on a subpopulation of cryptorchidism testis cells in enriched SSC [56]. In addition, the antioxidant enzymes, especially glutathione peroxidase-1 (GPX1), were upregulated in in vitro cultured pSSCs. Additionally, GPX1 was also upregulated in type A spermatogonia in rodents [57].
CD209 is a type II membrane protein with a C-type lectin extracellular domain [58]. CD209 plays an important role in establishing the initial contact between dendritic cells and the resting T lymphocytes through its recognition of intercellular adhesion molecule-3. Immunofluorescence analysis revealed that CD209 is only expressed on a small percentage of CD14 positive blood cells, whereas it is highly expressed on immature dendritic cells in peripheral tissues and in vitro derived monocyte-derived dendritic cells [58,59]. The staining patterns were almost identical to those observed in the present study (Fig 5B).
In conclusion, transcriptome analysis of in vitro cultured pSSCs, 5-, and 180-day-old pTestis cells were conducted to discover putative biomarkers for porcine spermatogonia and spermatogonia stem cells. Among the highly expressed genes in pSSCs compared to 5-day-old pTestis cells, the expression ofMMP9, MMP1, GPX1, CCR1, IGFBP3, CD14, CD209, and KLF9 was evaluated by RT-PCR and real time RT-PCR to confirm that expression of these genes was specific to pSSCs. In addition, localization and expression of CD14, CD209, and IGFBP3 were determined by immunohistochemistry in spermatogonia of 5-and 180-day-old pTestes, suggesting that newly identified putative biomarkers for porcine spermatogonia can be useful to study early events of spermatogenesis and to enrich spermatogonia in vitro.