Seasonal differences in the testicular transcriptome profile of free-living European beavers (Castor fiber L.) determined by the RNA-Seq method

The European beaver (Castor fiber L.) is an important free-living rodent that inhabits Eurasian temperate forests. Beavers are often referred to as ecosystem engineers because they create or change existing habitats, enhance biodiversity and prepare the environment for diverse plant and animal species. Beavers are protected in most European Union countries, but their genomic background remains unknown. In this study, gene expression patterns in beaver testes and the variations in genetic expression in breeding and non-breeding seasons were determined by high-throughput transcriptome sequencing. Paired-end sequencing in the Illumina HiSeq 2000 sequencer produced a total of 373.06 million of high-quality reads. De novo assembly of contigs yielded 130,741 unigenes with an average length of 1,369.3 nt, N50 value of 1,734, and average GC content of 46.51%. A comprehensive analysis of the testicular transcriptome revealed more than 26,000 highly expressed unigenes which exhibited the highest homology with Rattus norvegicus and Ictidomys tridecemlineatus genomes. More than 8,000 highly expressed genes were found to be involved in fundamental biological processes, cellular components or molecular pathways. The study also revealed 42 genes whose regulation differed between breeding and non-breeding seasons. During the non-breeding period, the expression of 37 genes was up-regulated, and the expression of 5 genes was down-regulated relative to the breeding season. The identified genes encode molecules which are involved in signaling transduction, DNA repair, stress responses, inflammatory processes, metabolism and steroidogenesis. Our results pave the way for further research into season-dependent variations in beaver testes.

Introduction Animal Ethics Committees (local approvals: SGGW/11/2010 and UWM/87/2012/DTN). Beavers were captured by the same group of qualified staff over a period of two years (2011)(2012), in the same period of the year, during two different stages of reproductive activity: in April-'breeding period' (n = 4) and November-'non-breeding', sexual silence (n = 4). Every experimental group consisted of different individuals. The animals were placed in cages and transported to the laboratory of the Research Station for Ecological Agriculture and Preservation of Native Breeds of the Polish Academy of Sciences in Popielno. Each beaver was weighed in the cage, and the weight of an empty cage was subtracted. Beavers were anesthetized by injection of xylazine (3 mg/kg BW; Sedazin1, Biovet Puławy, Poland) and ketamine (15 mg/ kg BW; Bioketan, Vetoquinol Biowet, Poland) and sacrificed by exsanguination. The testes were sampled, frozen immediately in liquid nitrogen and stored at -80˚C until total RNA isolation.

RNA isolation and transcriptome sequencing
Total RNA for transcriptome sequencing was isolated from four testicular compartments (n = 2 for each experimental group) using the Qiagen RNeasy Kit and the Qiagen RNase-Free DNase Set for cleanup (Qiagen, USA) according to the manufacturer's recommendations. The concentration and purity of total RNA isolated from testicular tissues were analyzed with the Tecan Infinite M200 plate reader (Tecan Group Ltd, Switzerland), and RIN values were evaluated in the Agilent Bioanalyzer 2100 (Agilent Technology, USA). Total RNA was isolated using an OpenExome kit (Poland) to prepare sequencing libraries. The cDNA library was developed based on total RNA with the TruSeq Stranded mRNA LT Sample Prep Kit v3, using the appropriate index adapters for each sample. The concentration of mRNA ranged from 305 to 348 ng/μl. Libraries were pooled and sequenced to produce 2 x 100 bp paired-end (PE) reads in the Illumina HiSeq 2000 high-throughput sequencer. All samples were sequenced in a single lane with human reference mRNA as positive control. Libraries were quantified with the use of the KAPA library quantification kit.

De novo transcriptome assembly
The quality of raw paired-end reads was evaluated with the use of the FastQC software v.0.10.0 (www.bioinformatics.babraham.ac.uk). Adaptor sequences were removed from raw sequencing reads with the AlienTrimmer tool in default mode [18]. Low-quality reads (PHRED <20) were removed from the dataset. Trimmed sequences from 4 testis samples were de novo assembled in the Trinity software v. r20140717 [19] with a 20-core processor and a 90 GB RAM server of the Regional IT Center (Olsztyn, Poland). The reconstructed contigs were used to develop a reference transcriptome of Castor fiber with a minimum transcript length of 500 nt. De novo assembly was performed with the Trinitystats.pl script, and the assembly statistics included average contig length, GC content, total assembled bases and N50 parameters. Duplicate transcripts were removed from the reference transcriptome in the CD-HIT-EST v.4.6 software (http://weizhonglab.ucsd.edu/cd-hit/) to minimize data redundancy (by clustering sequences at 95% identity) [20]. The Benchmarking Universal Single-Copy Orthologs (BUSCO) v.1.1 tool [21] was used to check the completion of transcriptome assembly based on the percentage of transcripts identified as putative core eukaryotic genes (CEGs).

Comprehensive transcriptome analysis
The assembled transcripts with FPKM > 2 in at least two probes were used in a comprehensive analysis because this approach proved to be highly effective in other studies [22,23]. Selected contigs were blasted (blastx, cut-off E value of 1e-5) against five ENSEMBL databases of protein sequences from Mus musculus, Rattus norvegicus, Homo sapiens, Ictidomys tridecemlineatus and Dipodomys ordii. The highly expressed transcripts (FPKM >2) were aligned with the NCBI database of rodent proteins with the use of the CloudBlast (blastx-fast) tool implemented in BLAST2GO to obtain genes specific for beaver testes [24]. Transcripts were annotated to three main gene ontology categories: biological process (BP), cellular component (CC) and molecular function (MF). The search for conserved protein motifs and the prediction of protein families were performed in the InterProScan v.5 program [25] for each highly expressed transcript.
The obtained beaver contigs were compared with the rat genome to determine the abundance and distribution of transcripts. Contigs with FPKM >2 and longer than 500 bp were used as search queries in blastx against the NCBI nr protein database to identify testicular unigenes. Unigenes were annotated with the BLAST tool against the nr database with cut-off Evalue of 10 −5 . The obtained unigenes were mapped onto the Rattus norvegicus genome, a model organism that is most closely related to the beaver, with the use of the GMAP software [26]. Each unigene was compared with the rat genome. Based on this alignment, the reciprocal best-hits method with a 100 kb window was used to determine the distribution of beaver transcripts across the rat genome. The RBH graph was generated in the Circos program [27].

Analysis of gene expression
Trimmed paired-end reads of each sample were realigned to a non-redundant beaver transcriptome in the Bowtie software [28]. Mapped contigs were filtered to remove transfrags with low expression levels. Non-normalized read counts and TMM normalized FPKM were calculated in the eXpress software [29] to estimate transcript expression levels.
The expression of contigs with absolute log fold-change (logFC) of 2 or higher and the corrected p-value (FDR) of less than 0.05 was significantly different. Two statistical methods were used to confirm differentially expressed transcripts in the testes (autumn vs. spring): DESeq [30] and edgeR [31]. Differentially expressed contigs were aligned to the NCBI nr protein database with blastx (cut-off E-value of 10 −5 ) to identify the coding DNA sequence (CDS). Differentially expressed genes (DEGs) were visualized in a heatmap plot with gplots in R package (http://www.r-project.org/). All DEGs were compared against the Homo sapiens database in the KEGG Orthology Based Annotation System (KOBAS) for metabolic pathway and Gene Ontology enrichment analyses with p-value < 0.05 and q-value < 0.9.
RNA isolation, cDNA synthesis and qRT-PCR for validation of transcriptome sequencing data Total RNA was isolated from beaver testes collected during breeding (n = 4) and non-breeding seasons (n = 4) using the A&A mini column kit (A&A Biotechnology, Poland) with a DNase treatment step. The concentration and the amount of the isolated RNA were determined spectrophotometrically (Infinite M200 PRO, Tecan, Switzerland), and RNA integrity was verified on 1.5% agarose gel. One μg of RNA was reverse transcribed into cDNA with a total volume of 20 μL with the use of the QuantiTect1 Reverse Transcription Kit (Qiagen, USA). The reaction included two steps: preliminary genomic DNA elimination (2 min at 42˚C) and main reverse transcription (30 min at 42˚C). The reaction was terminated by incubating the samples for 3 min at 95˚C in a thermal cycler (Eppendorf, Germany).
To validate transcriptome sequencing results, the expression of differentially regulated genes, such as PRRKG1, DSG2, SMARCA2, BMX and AGT, was determined by qRT-PCR with the use of the 7300 PCR System and the Power SYBR Green Master Mix (Applied Biosystems, USA). The constitutively expressed ACTB and GAPDH genes were used as reference (housekeeping) genes according to a previously described procedure [32,33]. Specific primer sequences (S1 Table) for amplifying target and reference genes were designed in the Primer Express Software 3 (Applied Biosystems). PCR reaction mixtures with a final volume of 25 μL consisted of cDNA (20 ng for target genes and 5 ng for reference genes), 400 nM of the primers, 12.5 μL of the Power SYBR Green PCR Master Mix (Applied Biosystems, USA), and RNase-free water. The qRT-PCR reaction conditions were as follows: enzyme activation and initial denaturation at 95˚C for 10 min, followed by 40 cycles of denaturation at 95˚C for 15 s, annealing for 1 min at 60˚C for all tested genes. In no template controls (NTC), cDNA was replaced with RNase-free water. The efficiency of qRT-PCR for each tested cDNA was determined at 100%. All samples were run in duplicates. The specificity of amplification was tested at the end of the reaction by analyzing the melting-curve. The relative expression of the tested genes was calculated with the use of the comparative cycle threshold method (ΔΔCt) described by Schmittgen & Livak [34], where ΔΔCt is obtained by subtracting ΔCt of the geometric mean, calculated from the Ct value of both reference genes (GAPDH and ACTB), from the corresponding ΔCt of each experimental sample of the target gene. The group with the lowest expression was selected as the calibrator. The results of qRT-PCR were processed statistically in the Statistica software (Statoft Inc. Tulsa, USA) with the use of the Student's t-test and were expressed as means ± SEM. The results were regarded as statistically significant at p < 0.05.

De novo transcriptome assembly
A total of 380.22 (2 x 190.11) million of paired-end reads were sequenced in the Illumina HiSeq 2000 sequencer for the de novo assembly of the transcriptome of Castor fiber testicular tissue (Table 1).
After processing of raw reads (adaptor sequences were cut and low-quality reads with Phred value < 20 were removed), 373.06 million (98.12%) clean reads were obtained and assembled de novo using the Trinity software. Assembly results are summarized in Table 2. The assembly was performed on 260,311 contigs. To minimize redundancy, contigs with sequence similarity higher than 95% were clustered in 231,045 non-redundant contigs grouped into 130,741 unigenes. Each unigene was described by a group of related transcripts included in the same de Bruijn graph, and each cluster was regarded as a single gene. Non-redundant contigs ranged from 500 nt to 20,720 nt, with N50 value (the length of the contig describing half of the sum of the lengths of all contigs) of 1,734 nt. The longest transcript was identified as fibrous sheath interacting protein 2 with 99% coverage and 69% sequence identity with Marmota marmota (using blastx, NR Rodentia). More than 100 k contigs were longer than 1000 nt. A total of 39,657 unigenes with ORFs longer than 300 nt were detected.

Comprehensive analysis of beaver testicular transcriptome
In the set of all non-redundant contigs, 88% (2663/3023) complete and 4.4% (136/3023) fragmented core eukaryotic (vertebrate) genes were found. More than 92% of the predicted housekeeping genes confirmed the completeness of transcriptome assembly. The whole set of clean reads was aligned back (with MAPQ >30) to the non-redundant Castor fiber transcriptome, and 88.7% (Cf_a1) to 91.5% (Cf_s2) of reads were successfully mapped ( Table 2). These data were used for transcriptome profiling (comprehensive analysis) and differential expression analyses. Gene counts and TMM normalized FPKM values were calculated with the eXpress tool, and transcript expression levels were estimated. To remove weakly expressed transcripts from assembly datasets, only contigs with FPKM > 2 in at least 50% of the samples were used in comprehensive analysis. In the group of 26,228 unigenes that fulfilled the above requirement, the longest isoforms were extracted and blasted (cut-off Evalue of 10e-5) with five protein Ensembl databases. The blastx search revealed 14,734, 14,696, 14,043, 13,072 and 14,065 highly expressed homologs for Rattus norvegicus, Ictidomys tridecemlineatus, Mus musculus, Dipodomys ordii and Homo sapiens, respectively. In this group, 258 corresponding homologs were identified exclusively in rat, 33 in squirrel, 16 in mouse, and 1 in kangaroo rat. The remaining 172 beaver transcripts were not found in Rodentia, and they had only human homologs (Fig 1).
Highly expressed transcripts were compared with the rat genome (the best described Rodentia genome with the closest phylogenetic relationship with Castor fiber). A total of 15,774 unigenes were annotated using the NCBI nr database. The results of contig mapping to the rat genome are visualized in Fig 2. Approximately 60.14% of highly expressed transcripts longer than 500 bp with FPKM>2 were mapped to orthologs localized on the rat genome. A comprehensive analysis revealed that transcriptome assembly had a broad representation and could be used as the reference transcriptome for Castor fiber testicular tissue.
In the cellular component category, the highest number of transcripts matched the term integral component of membrane (793), cytoplasm (713), nucleus (641) and extracellular exosome (579). The molecular function category consisted of 1885 terms which were assigned to 3819 unigenes. In this category, the majority of unigenes (457) were associated with ATP binding (Fig 3).
The majority of functional annotation results (3,699 out of 8,188) matched Ictidomys tridecemlineatus. InterPro domain annotations showed functional information for 13,069 unigenes. Most of the unigenes annotated in the InterProScan were found in protein domain and

Seasonal differences in beaver testicular transcriptome
A comparison of the testes harvested during breeding (April) and non-breeding (November) seasons revealed 152 transcripts (including 42 protein coding genes) that were significantly differentially expressed (log2FC >1; FDR 0.05) (Fig 4). The remaining non-annotated differentially expressed contigs were putative noncoding RNAs or novel genes. Of these, 37 genes were up-regulated and 5 genes were down-regulated in beaver testes from the non-breeding season relative to the samples obtained in the breeding season ( Table 3).

Comparison of RNA-Seq and qRT-PCR values
An analysis of mRNA abundance in genes evaluated by qRT-PCR revealed a correlation with RNA-Seq results (Fig 5). The mRNA levels PRKG1, DSG2, SMARCA2 and BMX were higher and AGT expression was lower in November than in April.

Discussion
To the best of our knowledge, this is the first study providing a comprehensive analysis of transcriptome profiles of European beaver testes with the use of the RNA-Seq technique. It is also the first study to describe variations in transcriptome profiles between breeding and nonbreading seasons. The RNA-Seq technique is a powerful tool for generating large-scale transcriptome data that were useful for presenting differentially expressed genes in various species, tissues or cell types. According to some studies, Illumina sequencing data are replicable with relatively little technical variation [35,36].
In the current study, we obtained a total of 373.06 million high-quality reads (98.12% Q2 bases). De novo assembly of contigs yielded 130,741 unigenes which can be useful in future studies of functional genomics in the European beaver. The average length of unigenes with N50 value of 1,734 was 1,369.3, and it was higher than that reported by other researchers [37][38][39][40]. The above indicates that our transcriptome sequencing results were successfully assembled and that they are of high quality.
A comprehensive analysis of the testicular transcriptome revealed more than 26,000 highly expressed unigenes which exhibited the highest homology with Rattus norvegicus and Ictidomys tridecemlineatus genomes and lower homology with other rodent species such as Dipodomys ordii or Mus musculus. More than 8,000 highly expressed beaver genes were found to be involved in fundamental biological processes, cellular components or molecular pathways. The largest group of genes encoding biological processes comprises genes responsible for the regulation of metabolic processes, transcription and regulation of transcription. A set of genes encoding male reproductive functions was also identified. This group comprises genes controlling the development and proliferation of Sertoli cells, spermatoid differentiation, sperm development, sperm motility and the steroid hormone mediated signaling pathway. Some testisspecific markers involved in beaver reproductive functions could be omitted if they did not meet stringent search criteria, i.e. if they did not show at least a two-fold change in the expression level.
The present study revealed 42 genes that were differentially regulated in the testes during the breeding season than in the non-breeding season. Surprisingly, during the non-breeding season (November), a much larger group of genes (37) was significantly more highly expressed, and a smaller group of genes (5) was less expressed than during the breeding season (April). The up-regulation of a larger number of genes in mouse testes after androgen withdrawal was determined in a microarray analysis by Petrusz et al. [41] and Sadate-Ngatchou et al. [42]. In the present study, the observed variations in the transcript profiles of beaver testes could be linked with plasma androgens concentrations. However, in our previous study of  beavers, plasma testosterone levels did not differ significantly between the analyzed reproductive seasons [16].
In the group of differentially regulated high-impact genes, 18 genes encoded signaling molecules, 7 genes encoded transcription factors and DNA repair molecules, 9 genes encoded cell surface proteins, 10 genes encoded stress response or inflammatory process, 12 genes encoded ion channels and 4 genes encoded extracellular matrix components. Only the most interesting genes are briefly discussed below.
Several up-regulated transcripts from the non-breading season could be involved in the regulation of spermatogenesis in beaver testes, including VLDLR which belongs to the low-density lipoprotein receptor (LDLR) family. It has been reported that VLDLR overexpression in germ cells disrupted spermatogenesis in transgenic mice [43], which suggests that the VLDLR transgene inhibits the development of sperm cells. Similar conclusions could be drawn from our results which showed markedly higher transcript levels in beaver testes during the non-breeding season when spermatogenesis is limited.
SMAD proteins are intracellular mediators of the transforming growth factor-β (TGFβ). These transcription factors are implicated in bone morphogenetic protein (BMP) and activin signaling, they are highly involved in the regulation of male fertility, and they influence germ and somatic cells during fetal and postnatal life [44,45]. In 15-day-old mice, the SMAD5 transcript was identified by in situ hybridization in spermatogonia, whereas a weak signal was detected in spermatocytes. In adult testes, the signal was strongest in spermatogonia and spermatocytes and weaker in round spermatids and in Sertoli cells [46]. Interestingly, our findings revealed up-regulation of the SMAD5 transcript in beaver testes during the non-breeding season, which could indicate that the BMPs/SMADs pathway could play a role in seasonally dependent reproductive activity of the beaver.
The present findings indicate seasonally dependent changes in the expression of genes encoding transporter proteins. The expression of SLC7A1 and KCNMB2 (BK channels) in beaver testes was significantly higher in November. The SLC7A1 gene mediates the transport of cationic amino acids across cell membrane. It has been reported that rat Sertoli cells express SLC7A1 which relies on the NA+-independent transport system to deliver arginine or other cationic amino acids to germ cells [47]. The KCNMB2 gene is a calcium-gated potassium channel that has been described in various mammalian endocrine cells, including human and hamster testicular Leydig cells [48]. It has been suggested that these channels contribute to Leydig cell steroidogenesis. Interestingly, the presence of iberiotoxin, a specific channel blocker, did not induce changes in testosterone production by hamster Leydig cells in vitro under basal conditions, but a significant increase in testosterone levels was reported when hCG was added to culture media [48]. According to Gong et al. [49], these ion channels could also play an important role in spermatogenesis control. Sperm cells could possess a Ca2+-activated K+ channel which has been implicated in sperm activation and gamete interaction [50]. The red points represent the contigs of p adjusted < 0.05 and log2 fold change > 1. (C)-represents the Venn diagram of the quantity of DEGs calculated by both methods: edgeR and DESeq (log2 fold change > 1, p adjusted < 0.05). Values in the "up" line denote the number of up-regulated genes, values in the "down" line denote the number of down-regulated genes, and values in the "all" line denote the total number of down-and up-regulated genes. (D)-represents the matrix of Pearson correlation coefficients calculated for 152 contigs designated as significantly expressed using the DESeq statistical method. Blue color represents high correlation and red-low samples correlation. Samples are clustered by both rows and columns. Cyan and red bars indicate the samples grouped by seasons. E-represents the heat map of differences in the expression of 42 genes between spring and autumn. The rows represent selected genes; in each column, individual samples are grouped into two seasons (upper dendrogram): spring (red bar) and autumn (cyan bar). The colors and intensity of fields in the heat map indicate transformed fold change (log2 fold change values from -3,767 to 3,767): red-underexpression, green-overexpression. The gene order was generated automatically in the dendrogram (left side of the graph).
https://doi.org/10.1371/journal.pone.0180323.g004 Transcriptome profile of beaver testes above findings indicate that changes in the membrane potential of germ cells could be an important element of the signaling mechanism. The large heparan sulfate proteoglycan (HSPG2, perlecan) is an extracellular matrix component that is normally expressed in the basement membrane (BM) underlying epithelial and endothelial cells. It was detected in the seminiferous tubule BM and in the interstitium, where the protein was localized around blood vessels and Leydig cells [51]. HSPG2 could also play an important role in self-renewal and differentiation of spermatogonial stem cells in the testes [51,52].
Receptor tyrosine kinase ErbB4 has been detected on circulating human monocytes and neuronal macrophages, which points to its involvement in inflammatory processes. ErbB4 was also highly expressed in mouse Sertoli cells and in germ and Leydig cells [53]. Interestingly, targeted inactivation of ErbB4 in Sertoli cells induced a reduction in testis size, decreased testicular androgen production and delayed spermatogenesis [53].
The down-regulated transcripts identified in our study encode proteins that could be involved in stress responses and inflammatory processes. IRAK1BP1 is an inhibitory component of TLR, IL-1 and TNFR-related pathways [54,55]. Overexpression of IRKA1BP1 in macrophages increased the expression of anti-inflammatory IL-10, but decreased the synthesis of proinflammatory IL-6. IRKA1BP1 also contributes to LPS-induced tolerance by influencing NF-B [55].
In this study, the identified down-regulated genes also included angiotensinogen gene (AGT) which is a part of the rennin-angiotensin system. In rat testes, only renin mRNA was detected, whereas both renin and angiotensinogen mRNA were found in mouse testes [56]. The AGT transcript was identified as one of the up-regulated genes (2.64-fold increase, microarray analysis) in the testes of transgenic mice carrying the androgen-binding protein (ABP) gene [41]. The importance of the angiotensin-converting enzyme (ACE, a component of the rennin-angiotensin system) for sperm functions has been demonstrated in male mice by targeting genes that lack somatic ACE but retain testicular ACE [57]. Interestingly, we found that the LNPEP (IRAP) gene, which is involved in the rennin-angiotensin system, was up-regulated in beaver testes outside the breeding season. The LNPEP gene encodes zinc-dependent aminopeptidase, cleaves vasopressin, oxytocin and bradykinin, and catalyzes the final step in the conversion of angiotensinogen to angiotensin IV. The results of our study indicate that angiotensin could be produced locally in beaver testes, and they provide important insights into the biology of the renin-angiotensin system in this species.
In summary, this is the first study to generate a representative testicular transcriptome of the European beaver with the use of the powerful RNA-Seq technique. It is also the first study to describe seasonally dependent variations in the transcriptome profiles of beaver testes. We identified a set of 42 genes encoding molecules that are involved in signal transduction, DNA repair, stress responses, inflammatory processes, metabolism and steroidogenesis. Our findings pave the way for further research into the processes that occur in beaver testes during various periods of reproductive activity.  Table. Highly expressed genes (FPKM >2) involved in the biological processes. Gene Ontology terms related to spermatogenesis, steroid hormone mediated signaling pathway, sperm motility, male gonad development, Sertoli cell development, spermatid development, spermatid differentiation, sperm axoneme assembly and Sertoli cell proliferation using the Blastx-fast algorithm in the Blast2GO tool. The columns contain the following information: "SeqName"-the ID of contigs, "Description"-full protein names, "Length"-contig length values, "sim mean"-average similarity between the aligned sequences. (DOCX) S3 Table. Highly expressed unigenes (FPKM >2) assigned to 116 KEGG pathways. (DOCX) S4 Table. Highly expressed genes (FPKM >2) associated with the steroid hormone biosynthesis KEGG pathway (Blastx-fast algorithm in the Blast2GO tool). The columns contain the following information: "SeqName"-the ID of contigs, "Description"-full protein names, "Length"-contig length values, "sim mean"-average similarity between the aligned sequences.  Table. Functional annotation of differentially expressed genes based on GO, KEGG and Reactome human databases. The columns contain the following information: "Term"-names of terms in each database, "Database"-database name, "ID"-the id of terms in each database, "Input number"-number of assigned unigenes, "Background number"-number of all genes assigned to a given term in each database, "p-Value","Corrected p-Value"-statistical significance of enrichment, "Input"-short names of assigned DEGs. (XLS)