Comprehensive Identification of Sexual Dimorphism-Associated Differentially Expressed Genes in Two-Way Factorial Designed RNA-Seq Data on Japanese Quail (Coturnix coturnix japonica)

Japanese quail (Coturnix coturnix japonica) reach sexual maturity earlier, breed rapidly and successfully, and cost less and require less space than other birds raised commercially. Given the value of this species for food production and experimental use, more studies are necessary to determine chromosomal regions and genes associated with gender and breed-differentiation. This study employed Trinity and edgeR for transcriptome analysis of next-generation RNA-seq data, which included 4 tissues obtained from 3 different breeding lines of Japanese quail (random bred control, heavy weight, low weight). Differentially expressed genes shared between female and male tissue contrast groups were analyzed to identify genes related to sexual dimorphism as well as potential novel candidate genes for molecular sexing. Several of the genes identified in the present study as significant sex-related genes have been previously found in avian gene expression analyses (NIPBL, UBAP2), and other genes found differentially expressed in this study and not previously associated with sex-related differences may be considered potential candidates for molecular sexing (TERA, MYP0, PPR17, CASQ2). Additionally, other genes likely associated with neuronal and brain development (CHKA, NYAP), as well as body development and size differentiation (ANKRD26, GRP87) in quail were identified. Expression of homeobox protein regulating genes (HXC4, ISL1) shared between our two sex-related contrast groups (Female Brain vs. Male Brain and Ovary vs. Testis) indicates that these genes may regulate sex-specific anatomical development. Results reveal genetic features of the quail breed and could allow for more effective molecular sexing as well as selective breeding for traits important in commercial production.


Introduction
Japanese quail (Coturnix coturnix japonica) have the fastest growth rate of all species in the Phasianidae family of birds, which includes species bred and raised primarily for human consumption such as chickens and turkeys [1]. Females reach sexual maturity at around 5-6 weeks on average, which is extremely early compared to the five-month maturation period of chickens. These features make the breed attractive not only for commercial production and genetic improvement, but also for research on reproduction and sex [2]. The feed and space required per bird is less for quail than for chickens or turkeys, making them excellent for commercial purposes [3]. In addition, the breed is valuable for experimental purposes [4]. Quail are commonly used as an experimental model for other species of commercial and non-commercial poultry and share many characteristics and behaviors with domestic chicken [5]. The birds lay eggs almost indefinitely under photoperiods longer than~12 h and possess the ability to adapt and breed successfully in laboratory conditions [6]. Domestic quail produce eggs equivalent to their body weight every 19 days, with an egg-to-body weight ratio higher than that of the chicken. Females are reported to reach sexual maturity at 40 days, have a growing mortality of 5-20 percent, hatch ability of 60-70 percent, a fertility of 75-90 percent, and have a productive life of one year.
Given the demand for quail for commercial production and experimental use, more studies are necessary to determine chromosomal regions and genes associated with gender and breeddifferentiation in the species. There are many biological strategies for understanding and using genetic molecular information. Two major approaches to comprehensive transcriptome profiling exist-microarray and next generation sequencing (NGS)-based RNA-seq. The microarray chip-based approach is the widest used for identifying numerous genes, simultaneously. Several exploratory gene expression studies have been performed on quails using microarrays [7,8]. However, this method does not provide a quantitative assessment of transcript abundance, which could affect the appropriate identification of suitable gene candidates for sexing. In turn, RNA-seq methods have recently become more popular for this line of research, and several comparative studies of the two approaches have shown that RNA-seq approaches are both more reliable and reproducible [9]. While many studies are routinely performed using RNAseq, there have been no previous attempts to profile the quail transcriptome. This may be due to the absence of an official quail genome with transcriptome annotation that can be used as reference for transcript NGS sequence assembly, which involves mapping the overlapping reads that are generated onto reference genomic sequences. While official reference genomes and transcriptomic information exist on public databases for avian species such as zebra finch and chicken, only a recently developed DNA reference genome exists for quail [10]. In cases like this, a Trinity read assembly-based approach is suggested for the analysis of RNA-seq data without reference genomic and transcriptomic information [11].
In this paper, comprehensive RNA-seq profiling of Japanese quail was performed using 12 samples that include two factors (different quail lines and tissues) and a two-way factorial analysis. Model-based methodology has been developed for the RNA-seq which makes it possible to consider several related experimental factors [12]. By considering multiple factors, new undiscovered genes can be detected. Here, Trinity and edgeR were used to analyze transcriptome data obtained from 4 tissues [male brain (MB), female brain (FB), ovary (O), and testis (T)] and 3 different lines of Japanese quail (random bred control [RBC], low weight [LW], and heavy weight [HW]) with the initial goal of identifying significant sex related genes.

Material and Methods
Animal handling and tissue sampling Samples were collected from 12 total animals, 4 from each of the three Japanese quail breeding lines: randomly bred control (RBC), heavy weight (HW), and low weight (LW). These lines were selectively bred over 80 generations for their body weight. Individuals were reared at The Ohio State University Poultry Facility, Columbus, OH. The animal care and experimental procedures followed protocol (2013A00000041) approved by the OSU Institutional Animal Care and Use Committee. Individuals were housed in cages specifically designed for quail with light provided 14 hours a day. Quail were sacrificed at 3-months of age during their light cycle. Samples from the brain, testis, and ovary were collected, snap-frozen in liquid nitrogen, and stored in a freezer at -80 degrees for processing. To extract total RNA from the ovary, large yolks were removed before homogenization. The whole testis and brain were used to extract total RNA.

Procedures of RNA sample preparation
To generate RNA-seq reads, we applied Illumina-based NGS sequencing. Total RNA was prepared from tissues, quantitated using Nanodrop spectrophotometer (Thermo Scientific), and quality-assessed with the RNA 6000 Nano assay kit (Agilent) using a Bioanalyser 2100 (Agilent). NGS sequencing libraries were generated from one microgram of total RNA using the Truseq RNA Sample Prep Kit (Illumina) according to the manufacturer's protocol. Briefly, RNA was purified using oligo-T attached magnetic beads. After purification, the total poly A + RNA was fragmented into small pieces (the average insert size for paried-end libraries was approximately 350bp). The segmented mRNA fragments were reverse transcribed. These fragments were purified with a QiaQuick PCR extraction kit, resolved with EB buffer, and linked to sequencing adapters. Adjoining distinct MID tags separated each cDNA library. The resulting libraries were then paired-end sequenced (2x101bp) with the Illumina HiSeq-2000 system. Finally, complete paired-end sequences were obtained as individual fastq files (forward and reverse) from output images with the CASAVA V 1.8.2 base calling software (https://support. illumina.com/sequencing/sequencing_software/casava.html) with ASCII Q-score offset 33.

Read assembly pipeline
Unfortunately, no official reference genome for Japanese quail exists. In this situation, there are effectively two representative approaches of estimating gene expression using RNA-seq. One approach is to map reads to a related species. In the case of quail, the closest reference genome would be that of the chicken. This approach can only measure conserved regions in both species and is therefore not suitable for precisely profiling the quail's transcriptome. The second, and more suitable, approach when a reference genome is not available is to reconstruct the genome/transcriptome by assembling reads to one another, creating a de novo assembly. We employed the Trinity pipeline for 'non-model' organisms, mapping the reads into assembled consensus without genome reference for measurement of transcriptome levels. Trinity enables the identification of transcript isoforms in non-model species. Prior to assembly, we employed Trimmomatic [13] for removal of adapter sequences. Next, we performed assembly and mapping with the following specific parameters: (1) In order to make reference consensus and assemble the transcriptome, we combined left and right reads, respectively, in paired-end reads using 12 samples; (2) Using combined left and right reads, we created the assembled transcriptome by Trinity.pl with 'min_ker_cov_2' option given the large number of reads; (3) The read mapping onto the assembled transcriptome was performed using bowtie2 [14] through the tophat2 interface [15]; (4) For the abundance estimation, we employed the RSEM tool [16]; (5) Gene annotation was performed using TransDecoder, implemented within Trinity [17]. Furthermore, we applied several developer-recommended gene annotation tools such as blastx, blastp, and hmmscan; (6) Using Trinotate, we extracted information of annotated genes with a default option. In order to define known transcriptomic regions, we used Trinotate (http:// trinotate.github.io) with default options.
Statistical analysis of RNA-seq data for detecting differentially expressed genes (DEGs) In order to normalize gene expression in each sample, we used the TMM normalization method, implemented using edgeR [18]. We used a two-way factorial experimental design, in which 12 RNA-seq samples were obtained following two factors: tissue (FB, MB, O, and T) and line (RBC, LW, and HW). A two-way analysis of deviance (ANODEV) model has been shown to be effective in simultaneously considering the effect of factors for RNA-seq analysis. We applied this approach for our RNA-seq data. A more detailed statistical model is presented below: From left to right, columns represent intercept (baseline), LW, RBC, MB, O, and T, respectively. For easier understanding of the structure of explanatory variables, the linear predictor is simplified and represented as follows: ,where i = {"RBC", "HW", "LW"} and j = {"FB", "MB", "O", "T"}. As shown in (Eq 1) and (Eq 2), we applied a negative binomial-based GLM for detecting DEGs. In the design matrix (Eq 2), "HW" and "FB" are baseline. Therefore, by setting the values in the contrast matrix for interested parameters, we can perform statistical tests for detecting DEGs from certain hypotheses (i.e. using the contrast matrix = (1, -1, 0, 0, 0, 0), we can detect DEGs between HW vs. LW). We performed DEG tests with the following null hypotheses: (1) Effects of breeding line are zero; all organisms have identical gene expression; (2) Tissue effects are zero; all tissues have identical gene expression; (3) The difference between two specific groups is zero. These diverse statistical tests allowed comprehensive identification of DEGs in several conditions by considering two explanatory factors. The P-values from the whole tests on the model were adjusted by the Banjamini and Hochberg method [19] for controlling multiple test errors. In addition, we employed the DAVID functional annotation tool [20] for gene set enrichment analysis.

Clustering analysis of RNA-seq data for investigating relationship of samples
In order to investigate the impact of tissue and line on gene expression and to understand the relationship between these two factors we employed hierarchical clustering, an unsupervised learning method, to unfold the structure of relationships among samples. In clustering analysis, the optimal k (number of clusters) is unknown. In the machine learning field, Silhouette scores have widely been used to estimate the optimal number of clusters [21,22]. Generally, this score is determined by assuming that a well-organized cluster will have a smaller distance value within-cluster and a larger distance value in between clusters. We employed the cluster package within R to calculate the Silhouette score [23]; an individual score is generated for each sample, and we used an average of these scores to evaluate clustering. Many clustering methods for segregation of samples exist based on similarity such as K-means, hierarchical, etc.; for this study, we chose to use hierarchical clustering with two types of correlation based distance measures, Pearson and Spearman correlation coefficients, for understanding the similarity structure of the sample. While the Pearson correlation coefficient is widely used, it often results in different values depending on the distribution of the targeted two vectors. Therefore, we employed two alternate types of correlation measures in this study. In addition, when analyzing tree type similarity structure through hierarchical clustering, an agglomeration method should be used for calculation of the distance between clusters including several samples and the other cluster. We used an average method for determination of representative distance within clusters in this paper.
Technical validation of DEGs detected from RNA-seq analysis using qRT-PCR As RNA-seq data was generated under a non-biological replicated design, technical validation was performed in order to verify analysis results. We performed qRT-PCR experiments with 6 RBC-breed samples of brain tissue (n = 3, male and female, respectively) for 5 significantly detected DEGs-CHD1, NIPBL, VIP, HXC4, and MYP0. Total RNA was isolated from 4 different tissues using the TRIzol reagent (Invitrogen Carlsbad, CA, USA) according to manufacturer's instructions. We used Applied Biosystems Step-One Real-Time PCR system, EvaGreen dye (Biotium, Hayward, CA, USA) with β-actin (ACTB) as the control gene in order to determine relative gene expression (ΔCt). Primers for 5 DEGs are listed in S3 Appendix. In addition, a two-group t-test was employed to determine gene significance.

Results
De novo RNA-seq assembly and mapping of reads A total of 534,610 transcriptome contigs were assembled using the Trinity pipeline. The average and median lengths of assembled contigs were 1,417 bp and 542 bp, respectively. There were 757,777,223 total assembled bases. The summary statistics of the reads aligned to assembled contigs is shown in Table 1. We generated assembled contigs with sufficient depth coverage using 12 whole samples without considering differences between breeding lines. This resulted in 4,502 annotated genes and associated transcription levels (read counts of known transcriptome regions) after filtering out non-expressed genes from all groups. In addition, we filtered results for transcripts having the highest length among duplicated annotated genes. A total of 3,503 unique annotated genes remained after filtering through this (above) criteria; these filtered genes were used in analyses to identify DEGs.
Significant genes between quail lines using contrast analysis upon twoway factorial designed model First, we identified DEGs among the RBC, LW, and HW breeding lines under the null hypothesis, H 0 : Breed ¼ 0, for detection of DEGs differentially expressed in one or all lines. We identified a total of 205 significant (FDR-adjusted P-value < 0.05) DEGs. We then performed downstream analysis of assembled transcripts to explore the biological mechanisms of these DEGs. Results of gene-set clustering analysis using DAVID revealed significant enrichment of the extracellular matrix (ECM) related cluster, which includes extracellular region (3.1E-5), proteinaceous extracellular matrix (3.8E-3), and extracellular matrix (4.6E-3) pathways among others. To further investigate line differences, we performed pairwise comparison on the three quail lines using a contrast matrix with a GLM. We observed 102, 136, and 72 DEGs in the HW vs. LW, HW vs. RBC, and LW vs. RBC contrasts respectively (Fig 1A). Results of the pairwise test on the entire set of genes are listed in S1 Appendix. Here, we focused on the interpretation of difference between HW and LW lines, as these lines display a drastic phenotypic difference in terms of body size. Therefore, we expected detection of genes involved in regulating body size in the DE-list obtained from the comparison of HW and LW lines. We performed clustering analysis using edgeR on TMM-normalized values with whole genes in order to clarify the relationship between quail lines and tissue samples. The optimal number of clusters was estimated as 3 using the Silhouette scoring method with two types of correlation coefficients (Fig 1B). From this number, we visualized the tree with a cutoff of k = 3. HW and RBC lines have similar gene expression pattern compared to the LW line as shown in (Fig 1C and 1D). In addition, samples were clearly distinguished based on the tissue except for brain tissues; FB and MB. Across hierarchical clustering analyses, RBC-FB and RBC-MB were found to be clustered together. This indicates that gene expression is very similar in female and male quail brains. For more detailed examination of this phenomenon, we performed a statistical test for comparison between tissues.   Table 2. We previously observed a less significant effect in the FB and MB contrast compared to the others through clustering analysis. For this reason, the number of DEGs identified between FB and MB is also smaller than that of other groups, as shown in (Fig 2A-2D). In addition, the comparison between O and T showed the greatest number of DEGs. In general, the effect of differences among tissues on gene expression was greater than that of differences between breeding lines. Although we ran all possible pairwise comparisons among all tissues, our focus is primarily on the sex-related difference between male vs. female brain (MB vs FB) and testis vs. ovary (T vs O). As shown in (Fig 2E and 2F), 60 genes were commonly identified as DEGs between the two contrasts. In this heatmap, we observed that very different gene expression patterns between MB vs MB or T vs O. These genes would be strong candidate genes for establishing an understanding of sex-differentiation in quail for purposes of molecular sexing, filling a gap in the research previously conducted on this species.

Sexual dimorphic genes in quail species
Analysis and interpretation of DEGs shared between female and male tissues (FB vs. MB and O vs. T) contrast groups provided insight into sex-related differences. Analysis therefore advanced the primary goal of this study-the identification of potential novel candidate genes for molecular sexing in quail. The link between CHD1 and sex chromosomes has been well established. CHD1 is used as a universal target for molecular sexing in birds, and has been studied extensively in multiple avian species [24,25]. Two homologous CHD1 genes, CHD1-W and CHD1-Z, have been mapped onto each W or Z chromosome, the heterogametic sex chromosomes of birds [26]. While sexing is typically performed through PCR-amplification of these two genes in many species of birds, several concerns regarding this method make identification of species-specific sex related genes particularly salient. Polymorphisms of CHD have been identified in auklets but in no other species; without knowledge of the existence of polymorphisms, the probability of incorrectly assigning sex is high [27]. CHD1 did not appear as a significant differentially expressed sex-related gene in this analysis, further confirming that novel markers are needed for the quail breed. Several of the DEGs we identified as significant sexrelated genes have been previously found in avian gene expression analyses. UBAP2, a DEG shared between the Female Brain vs. Male Brain and Ovary vs. Testis contrast groups, has been found to be sex-related, specifically in chickens [28]. In a study of sexually dimorphic DEGs in chicken brains before gonadal differentiation, a UBAP2 gene located on chromosome W in the female embryonic brain was found up regulated. Other presumably sexrelated genes have been found in chickens by genetic mapping to sex chromosomes. The chicken genome assembly contains 8 genes on the W-chromosome with homologous sequences on the Z-chromosome-ATP5A1, CHD1, HINT, KCMF1, NIPBL, SMADS, SPIN, and UBAP2 [29]. Recently, 2 other genes have been mapped to the W-chromosome in chickens as well-ZFR and ZNF532 [30]. NIPBL, found to be sex-related in these studies on chickens, was also found as a differentially expressed gene with a statistical significance by the present analyses in quail. Apart from UBAP2 and NIPBL, the other significantly expressed genes shared between our two sex-related contrasts have not yet been confirmed as linked to avian sexual dimorphism; we propose that these genes are specific to quail and can be used for species-specific sexing.
A recent study investigated the evolution of avian sex chromosomes using retroposon insertions and random insertion/deletions in order to reconstruct gametologous gene trees [31]. Trees were reconstructed from three sex-related genes to elucidate their evolution-CHD1, NIPBL, and ATP synthase a-subunit isoform 1 (ATP5A1). The goal was to determine when the Z and W chromosomes differentiated, as they are highly differentiated in most extant birds. CHD1 was found to differentiate in the neognathae ancestors; meanwhile, divergence of the previously mentioned NIPBL occurred in the neoavian ancestor. In the third gene they analyzed, ATP5A1 (not found significant in the present analyses), differentiation occurred independently in 3 different types of birds-chickens, the screamer, and duck-related bird ancestors. This research and analysis of divergence is important for knowing which genes can be used in sexing tests in particular species of birds. The three insertions postulated to cover almost all species of living birds. However, species-specific information for quail can allow for more accurate and accessible molecular sexing.
Two genes from the top list of significant DEG's, VIP and ERVV1, appear to be linked to sex and sexual dimorphism in mammals, although no previous research has been performed in the avian species. Gene VIP has been associated with sexual dimorphism in chickens [32] and rats [33,34]. Although ERVV1's function is unknown, placenta-specific expression has been observed, indicating that it may be sex-related [35]. Also interesting is the appearance of two homeobox protein-regulating genes in the top DEG list-HXC4 and ISL1. Homeoboxes are DNA-sequences that regulate patterns of anatomical development; these results suggest that HXC4 and ISL1 may be responsible for anatomical sex differences between male and female quail.
DEGs related to body size of quail lines, identified through comparison of HW and LW Statistical comparison of the three breeding lines (RBC, LW, and HW) resulted in a total of 205 significant (FDR-adjusted P value < 0.05) DEGs. Gene ontology information on this gene list was obtained using DAVID. The ECM related cluster was significantly enriched in this analysis. In a previous transcriptomic study on chickens [36], comparison of microarray data from slow-growing and fast-growing birds revealed the significant impact of the ECM-receptor interaction pathway on chicken muscle development. The ECM-receptor interaction was expected to impact size differentiation in LW and HW quail, which may involve considerable genetic and regulatory changes at organismal level. Five genes were found to be significantly associated with the ECM-receptor interaction pathway: CHAD, FINC, LAMC2, SV2C, and SDC4. Knowledge of genes associated with size differentiation not only deepens our understanding of valuable genetic features of the quail species, which are not well understood, but also provides knowledge and potential for selective breeding strategies to increase body size and maximize profits in commercial production of quail. Table 3 lists the top 10 most significant DEGs between HW and LW. The choline kinase alpha (CHKA) gene, only expressed in brain and testis, was significantly differentially expressed between HW and LW (Fig 2E and 2F). Previous studies have shown that this gene impacts neuronal differentiation, causing neurite outgrowth [37] as well as influencing development of the brain in rodent models [38]. Given these reports and our results, it appears that CHKA may play a significant role in brain size development in the quail species as well.  Remarkably, the differentially expressed ANR26 protein-coding gene (ANKRD26) has been highly associated with body size in mice [39,40], making it a potential candidate gene for determining body development in quail and other bird species. The G protein-coupled receptor 87 (GRP87), a lysophosphatidic acid (LPA) receptor, was differentially expressed only in the testis tissue of LW quail. LPA affects skeletal development [41,42]. GPR87 may be also a potential candidate gene for regulating bone growth that eventually contributes to different body size between HW and LW, since these G coupled receptors prevent apoptosis and increase cell proliferation. NYAP, a gene that is typically expressed in developing neurons [43], was found differentially expressed between HW and LW in whole tissues. NYAP was found to be overexpressed in the brain of female LW quails compared to samples from both male and female HW quails. The number of differential expressed genes between HW and LW associated with brain and neuronal develop indicate that there may be several significant developmental differences between HW and LW varieties of quail which should be explored further.

Conclusion
In summary, two of the genes identified here as significant sex-related genes, NIPBL and UBAP2, have been established in previous studies as related to sex-differentiation in various avian species. The appearance of these genes as differentially expressed indicates that our contrast and experimental design allows for accurate identification of transcriptomic markers related to sex in quail. The other differentially expressed genes (e.g. TERA, MYP0, PPR17, CASQ2), which have not been previously linked to sex differentiation in neither avian nor other species, may be specific to sexual dimorphism in quail and should be further explored. Evolutionary analysis of the sex-related genes identified as significant can be used to study their history and determine when the Z and W chromosomes differentiated. Analysis of expression of homeobox protein regulating genes in the sex-related contrast group (Female Brain vs. Male Brain and Ovary vs. Testis contrast groups) reveals genes (e.g. HXC4, ISL1) which may regulate sex-specific anatomical development. In addition, and more importantly, genes can be used as tools for molecular sexing to improve streamlining of a process that is now used non-specifically for all birds but has potential for optimization and specification in quail.

Accession Numbers
We generated raw RNA-seq and processed count data, which are available at GSE64961 in the GEO database.
Supporting Information S1 Appendix. Output tissue.