Identification of Differentially Expressed Genes in Breast Muscle and Skin Fat of Postnatal Pekin Duck

Lean-type Pekin duck is a commercial breed that has been obtained through long-term selection. Investigation of the differentially expressed genes in breast muscle and skin fat at different developmental stages will contribute to a comprehensive understanding of the potential mechanisms underlying the lean-type Pekin duck phenotype. In the present study, RNA-seq was performed on breast muscle and skin fat at 2-, 4- and 6-weeks of age. More than 89% of the annotated duck genes were covered by our RNA-seq dataset. Thousands of differentially expressed genes, including many important genes involved in the regulation of muscle development and fat deposition, were detected through comparison of the expression levels in the muscle and skin fat of the same time point, or the same tissue at different time points. KEGG pathway analysis showed that the differentially expressed genes clustered significantly in many muscle development and fat deposition related pathways such as MAPK signaling pathway, PPAR signaling pathway, Calcium signaling pathway, Fat digestion and absorption, and TGF-beta signaling pathway. The results presented here could provide a basis for further investigation of the mechanisms involved in muscle development and fat deposition in Pekin duck.


Introduction
Pekin duck is a world-famous species for its fast growth, but its breast muscle yield is lower than that of other lean-type ducks [1]. Work carried out by the Chinese Academy of Agricultural Sciences since the 1990s has produced a new strain of lean-type Pekin duck with increased carcass skeletal muscle yield and decreased carcass fatness. This new strain of lean-type Pekin duck passed the national certification awarded by the Chinese State Variety Approval Committee of livestock and poultry in 2004. However, the potential mechanisms underlying increased muscle development and decreased fat deposition in lean-type Pekin ducks is unclear to date.
In birds, there are no significant changes in muscle fiber numbers during postnatal development [2,3]. Instead, the postnatal muscle mass is increased by increasing the size of the muscle cells, a process referred to a hypertrophy that is controlled by both anabolic and catabolic mechanisms [4]. Among the complex hypertrophy regulating network, the insulin-like growth factor 1 (IGF1) signaling pathway plays a crucial role in promoting hypertrophy by activating tyrosine kinases, which activate phosphoinositide 3-kinase (PI3K)/Akt signaling [5,6]. Conversely, forkhead box O (FOXO) family proteins inhibit hypertrophy of muscle fibers through suppression of the PI3K/Akt pathway [7].
In addition, adipose tissue mass is controlled by a balance of cell proliferation and an increase in fat cell size, known as hyperplasia and hypertrophy, respectively [8]. Multiple hormones and growth factors collaborate to regulate adipocyte differentiation and deposition [9]. Growth hormone (GH) has been shown to stimulate preadipocytes to undergo adipogenesis by priming the cells for the poliferative effect of IGF1 [10], a hormone which, in addition to insulin, is believed to be involved in adipocyte differentiation [11,12]. In addition, it is believed that the expression of peroxisome proliferator-activated receptora (PPARa) and CCAAT/enhancer binding protein a (C/EBPa) are important in the maintenance of the differentiated state of adipocytes [11]. Given the complexity involved in regulating skeletal muscle development and fat deposition, identification of the differentially expressed genes (DEGs) in duck breast muscle and skin fat is a critical first step to understanding the function of these genes.
In the past few years, next-generation high-throughput DNA sequencing techniques have provided fascinating opportunities in the life sciences and dramatically improved the efficiency and speed of gene discovery and DEGs exploration [13]. Previous studies have confirmed that the relatively short reads produced by Illumina sequencing can be effectively assembled and used for gene discovery and comparison of gene expression profiles [14,15].
Identification of DEGs has been performed in many vertebrate species, including some bird species such as chicken [16,17], goose [18], turkey [19] and zebra finch [20,21]. Recently the duck (Anas platyrhynchos) genome sequence was finished [22] and the draft genome is now publicly available (http://www.ensembl.org/ Anas_platyrhynchos/Info/Index). The duck genome will greatly improve the accuracy of duck RNA-seq analysis and will largely promote the identification and functional exploration of DEGs in duck.
Here we constructed six mRNA libraries. Three libraries from Pekin duck breast muscle at two-, four-and six-weeks of age (W2, W4 and W6, respectively), and three from Pekin duck skin fat at W2, W4 and W6. By high throughput RNA sequencing and subsequent bioinformatics analysis, we identified DEGs between Pekin duck breast muscle and skin fat samples. The results presented here could provide a basis for further functional investigation of DEGs between breast muscle and skin fat in Pekin duck.

Ethics Statement
All procedures of the present study were approved by the welfare committee of the Institute of Animal Science, Chinese Academy of Agricultural Sciences. All surgeries were performed according to recommendations proposed by the European Commission (1997), and all efforts were made to minimize the suffering of animals.

Birds and Tissue Sample Collection
Thirty 1 day old Z5 Pekin duck (lean-type) full-sibs were selected randomly from the Pekin duck breeding farm at the Chinese Academy of Agricultural Science (Beijing, China), where they were raised under normal conditions. The dietary nutrient levels provided at different stages are listed in Table 1. Breast muscle and skin fat samples were collected from four healthy ducks selected at each time point (denoted as M2, M4, M6 and F2, F4, F6, respectively) for RNA isolation. The ducks were controlled for body weight by selecting ducks with an body weight within 100 g, 150 g, and 200 g of the average body weight of raised ducks at two-, four-, and six-weeks of age, respectively. All samples were snap-frozen in liquid nitrogen and stored at 280uC.

RNA isolation, Library Preparation and Illumina Sequencing
Tissue samples were sent to BGI for RNA isolation, library preparation and Illumina sequencing. Breifly, total RNA was isolated from all breast muscle and skin fat samples using the RNAiso plus kit (Takara) following the manufacturer's instruc-tions. The total RNA samples were mixed in equimolar ratio to generate an RNA pool for each tissue and time point (M2, M4, M6 and F2, F4, F6). The RNA quality was analyzed by 1.0% agarose gel electrophoresis and spectrophotometric absorption at 260 nm in a Nanodrop ND-1000 Spectrophotometer. All RNA samples were treated with DNase I recombinant (Roche). The mRNA was separated from 6 mg of total RNA using oligo (dT) magnetic beads and fragmented into short fragments using the fragmentation buffer. First strand cDNA was synthesized from the short mRNA fragments using a random hexamer-primer, purified and dissolved in EB buffer for end repair and single nucleotide A

Quality control and filtering of raw reads
Quality control was first performed on the primary sequencing data produced by Illumina HiSeq 2000. We then carried out adapter trimming and filtering of the initial reads to decrease the data noise. Specifically, the reads were trimmed and filtered to remove: i) adapter contamination, ii) reads in which unknown bases were more than 5% of the read and iii) low quality reads(Q score ,20). The filtered reads (high-quality reads) were used for downstream analysis.

Alignment of high-quality reads to reference
The high-quality reads were aligned to the duck reference transcriptome using SOAPaligner/SOAP2 [23] with no more than 2 mismatches. Basic alignment statistics were performed, and the distribution of the reads on the reference transcriptome was determined to evaluate the randomness. Subsequent gene coverage was calculated to determine the percentage of a gene covered by reads.

Analysis of DEGs
DEGs between samples were determined based on the reads per kilobase per million reads (RPKM) method [24]. The RPKM value of for each gene with a minimum RPKM value of 0.001 was compared between each tissue and time point using the method described by Audic and Claverie [25]. The false discovery rate (FDR) [26], which was used to determine the threshold of the P value in multiple tests and analyses, was set at less than 0.001 to judge the significance of gene expression differences.      We firstly mapped all DEGs to GO terms in the database (http://www.geneontology.org/) and calculated gene numbers for every GO term. Significantly enriched GO terms were determined using hypergeometric distribution based on 'GO:: TermFinder' (http://smd.princeton.edu/help/GO-TermFinder/ GO_TermFinder_help.shtml). To further understand the biological functions of the DEGs, KEGG [27] was used to perform pathway enrichment analysis.

Real time quantitative polymerase chain reaction (qRT-PCR)
Real time quantitative PCR (qRT-PCR) was performed using the SYBR PrimeScript RT-PCR Kit (TaKaRa) with SYBR Green dye to validate specific gene transcription, RNA-Seq data and variations in gene expression among individuals. The RNA used for qRT-PCR was prepared in the same manner as the total RNA extraction and DNase I treatment described above. A reference gene (b-actin) was used as a control for detecting the expression levels of these genes. The primer pairs used for qRT-PCR are listed in Table 2 and the RPKM values of the selected 18 genes are present in Table 3. The qRT-PCR reactions were carried out with an iCycler IQ5 Multicolor Real-Time PCR Detection System (Bio-Rad). The qRT-PCR reaction contained 1 mL of cDNA template, 12.5 mL of SYBR Premix ExTaq, 9.5 mL of sterile water, and 1 mL of each gene-specific primer. Thermal cycling parameters were 1 cycle at 95uC for 2 min, 40 cycles of 95uC for 15 s, and 60uC for 34 s. Dissociation curve analysis was done after each real time reaction to ensure that there was only one product. The qRT-PCR analysis of each sample was done in triplicate.
The relative gene expression level was determined by the comparative cycle threshold (C T ) method [28]. The DC T value was calculated by subtracting the target C T of each sample from the b-actin C T value.

Data deposition
Data described in this study is available in the NIH Short Read Archive (SRA) under accession number SRX393261.

Results and Discussion
Sequencing, quality control and filtering of initial reads RNA-seq data were obtained from three breast muscle samples (M2, M4, and M6) and three skin fat samples (F2, F4, and F6). Around 31,47 million initial reads were obtained from each sample, with a total of 228 million initial reads generated. After quality control and adapter removal, 29,45 million high-quality reads for each sample and a total of 218 million high-quality reads were available for further analysis (Table S1). Overall, ,95.76% of the raw reads were classified as high-quality reads. This indicates that the data are sufficient in sequencing depth and read quality and that the data are well suitable for the analysis of DEGs in breast muscles and skin fats of Pekin duck.

Alignment of high-quality reads against the duck reference transcriptome
After quality control and filtering of initial reads, we aligned the high-quality reads obtained above to the reference transcriptome (ftp://ftp.ensembl.org/pub/pre/fasta/cdna/anas_platyrhynchos/ Anas_platyrhynchos.duck_1.cdna.fa.gz).
The statistical results for read alignment against the duck transcriptome are summarized in Table 4, 5, 6, 7, 8, 9, 10. For the   [29,30], and slightly lower than the results of Li et al. in chickens [31]. This may be due to the fact that the duck genome is only a draft, and requires more work to improve it to the level of chicken or human. For excellent RNA-seq data, the reads should randomly distribute along the transcriptome [32]. In this study, we first standardized a read location on a gene to a relative position and then counted the number of reads in each relative position to evaluate read randomness. The read distribution along the duck transcriptome is shown in Fig. 1. Overall, most reads were aligned at the body of the genes for each of the six sample datasets, indicating that the mRNA used for sequencing in this study was randomly broken into segments.
The average expression of each mapped gene was calculated using the RPKM method [24]. For the total dataset, 14,709 genes, representing 89.4% of the annotated duck genes expressed in our dataset. The number of expressed genes ranged from 12,731 (M6) to 13,886 (F2 ; Table 11). Furthermore, analysis of the coverage of each gene was performed to determine if a gene was fully covered (.90%) by the reads. Around 71% of the annotated duck genes were fully covered by reads for the total dataset, and 58% to 63% were fully covered for each sample (Fig. 2).
Because genetic variation between individuals can be extensive and inevitable, we selected 12 ducklings from a group of 30 fullsibs raised under the same conditions in order to reduce the variation observed within groups. In addition, to determine the variation in gene expression among samples within the groups studied, we performed real-time quantitative PCR on 10 randomly selected genes (Fig. 3). The results showed no significant differences among samples within each group, indicating genetic variation among individuals is not significant. In addition, to verify the accuracy of the RNA-seq data, we compared the RPKM and relative expressions of another five randomly selected genes using qRT-PCR (Fig. 4). The expression trends obtained by using qRT-PCR were highly consistent with the RPKMs of the five genes, suggesting the RNA-seq data are accurate.

Analysis of DEGs
High-throughput sequencing technology is rapidly becoming the standard method for measuring RNA expression levels [24]. One of the main goals of these experiments is to identify DEGs. Adipocytes share a common mesenchymal cell origin with skeletal muscle cells [33,34]. However unlike muscle cells, a lack of exposure to certain signaling factors [35] triggers the differentiation of cells into fat cells. Therefore, investigating the differences in gene expression levels between breast muscles and skin fat at multiple time points of development will be very helpful in exploring the molecular mechanisms underlining the lean-type duck phenotype. Only genes that met the following criteria were treated as DEGs: expression between two conditions differs by more than 1 fold change and FDR ,0.001. In total, hundreds to thousands of genes were differentially expressed between the various tissues and time points. Generally, the number of DEGs between breast muscle and skin fat were much larger than between time points within tissues (Fig. 5, Table S2). More DEGs showed higher expression in skin fat than in breast muscle, suggesting that gene transcription in skin fat is more active than that in breast muscle. When comparing DEGs in F6 with that of F4 and F2, 694 and 772 genes showed higher expression, while 2095 and 1895 genes showed lower expression, respectively. The large number of DEGs between tissues and time points indicate that there are huge differences in gene transcription required for breast muscle and skin fat development. According to the RPKM values of DEGs at  Differentially Expressed Genes of Duck PLOS ONE | www.plosone.org different tissues and time points, the DEGs could be divided into three expression patterns: those whose expression levels increase in skin fat and decrease in breast muscle with duck growth, those whose expression levels decrease in skin fat and increase in breast muscle with duck growth, and those whose expression levels did not follow any obvious pattern. Given that the breast muscle growth rate increases from 2 to 6 weeks of age, the first type of DEGs may be inhibitors of breast muscle development, while the second type may be promoters.

Gene ontology (GO) and KEGG pathway analysis of DEGs
The GO project is a collaborative effort to develop and use ontologies to support biologically meaningful annotation of genes and their products [36]. In this study, we performed GO analysis to describe the properties of DEGs in duck breast muscle and skin fat and screened for significantly enriched GO terms (P,0.05) for each sample pair (Table S3). For cellular components, DEGs in different sample pairs were significantly enriched (P,0.05) into various GO terms, include contractile fiber between F2 and F4, F2 and F6, and F4 and F6; membrane and membrane part between M2 and F2; cytoplasmic part and cytoplasm between M6 and F6, and plasma membrane between M2 and M6. For biological processes, the DEGs between F2 and F4 and between F2 and F6 were significantly enriched into muscle development related terms (muscle cell development, striated muscle cell development, muscle fiber development, muscle system process, muscle structure development, striated muscle cell differentiation and muscle contraction). In addition, the DEGs between F4 and F6, M2 and F2 and M2 and M6 were extremely significantly enriched (P, 0.01) multicellular organismal processes and single-multicellular organism processes.
KEGG is a "computer representation" of the biological system [37] and the KEGG database can be utilized for modeling and simulation, browsing and retrieval of data. In this work, we performed KEGG pathway analysis of DEGs to determine the significantly enriched pathways (P,0.05) for each sample pair (Table S4). The DEGs were significantly enriched into some pathways of muscle development or fat deposition. For example, the DEGs between F2 and F4, F2 and F6, F4 and F6 and M2 and F2 were enriched significantly enriched into the MAPK signaling pathway, a major regulator of skeletal muscle development [38]. The DEGs in all sample pairs excepting between F2 and F4 and between F2 and F6 were significantly enriched into PPAR signaling pathway, which is able to increase recruitment, proliferation and differentiation of preadipocytes leading ultimately to improved adipose tissue [39]. The DEGs between M2 and F2 and M4 and F4 were significantly enriched into wnt signaling pathway, which has been implied to be a molecular switch that governs adipogenesis [40]. Other pathways involved in muscle development and fat deposition that were significantly enriched by DEGs in one or more sample pairs include fat digestion and absorption, phosphatidylinositol signaling system, GnRH signaling pathway, fatty acid metabolism and p53 signaling pathway. A number of pathways enriched into by DEGs are involved in muscle development and fat deposition, suggesting that muscle development and fat deposition are major events that require expression of different genes.    The formation of a specific phenotype is a result of the interaction of a certain genotype with the environment, and environmental factors exert their influence on phenotype through genetic or epigenetic mechanisms [31]. The genetic effects perform a natural role in this process. In this study, a number of important genes for muscle development and fat deposition were differentially expressed between various tissues and time points. They include insulin-like growth factor 1 receptor (IGF1R), IGF1, myostatin (MSTN), transforming growth factor beta 3 (TGFb3), transforming growth factor beta-induced 1 (TGFb1), TGFbinduced factor homeobox 1 (TGIF1), forkhead box O6 (FOXO6), forkhead box O3 (FOXO3) and the members of the PPAR signaling pathway and the MAPK signaling pathway, which are described in detail below.
To validate these genes, we performed qRT-PCR for each gene mentioned above and compared the expression level with the RPKM value of each gene (Fig. 6, Table 3). It has been reported that IGF1 can stimulate the development of muscle mass by increasing protein synthesis while decreasing proteolysis and myogenesis [41]. In this report, the expressions of IGF1 and IGF1R increased in breast muscles with duck growth, suggesting the breast muscle development increases from 2 to 6 weeks of age. The expressions of MSTN, TGFb3, TGFbI, TGIF1, FOXO6 and FOXO3 were all decreased in the breast muscle with increasing age, but fluctuated in the skin fat samples. MSTN is a member of the transforming growth factor b superfamily of secreted growth factors that negatively regulates skeletal muscle size [42]. In mature adult muscle, TGF-b negatively affects skeletal muscle regeneration by inhibiting satellite cell proliferation, myofiber fusion, and expression of some muscle-specific genes [43] and TGF-b1 induces the transformation of myogenic cells into fibrotic cells after injury [44]. In addition, inhibition of FoxO transcriptional activity by expression of a dominant negative (DN) FOXO allele prevents at least half of disuse-induced muscle fiber atrophy in vivo [45,46], and muscle-specific over expression of FOXO1 or FOXO3a is sufficient to cause skeletal muscle atrophy in vivo [47,48]. A number of inhibitors for muscle mass growth showed a trend for decreased expression over time in this study, indicating that the development of breast muscle increases with the age of the duck.

DEGs in the MAPK signaling pathway
Notably, the MAPK signaling pathway contained many DEGs in the breast muscle and skin fat sample pairs (M2-VS-F2, M4-VS-F4 and M6-VS-F6; Table S4). We therefore summarized the DEGs in the pathway in Table S5.
The MAPK pathway is a chain of proteins in the cell that communicates signals from a receptor on the surface of the cell to the DNA in the nucleus of the cell. The MAPKs are responsible for a variety of processes including the transcriptional activation of cytokines, chemokines and other inflammatory mediators [49]. The p38 MAPK signaling pathway, one module of the MAPK protein family, is a major regulator of skeletal muscle development [38]. In this study, we observed 214 DEGs in the MAPK signaling pathway between breast muscle and skin fat sample pairs. Some members of this pathway, which are crucial for skeletal development, were identified as highly expressed in breast muscle compared to skin fat. For example, it is reported that serum response factor (SRF) is required for satellite cell-mediated hypertrophic muscle growth [50]. The expression of SRF in breast muscle was significantly higher than in skin fat in this study. Also, the expression of voltage-dependent calcium channel gamma subunit 1 (CACNG1), a regulator of skeletal muscle strength [51], was enriched in breast muscle (RPKM .500) compared to skin fat (RPKM ,7). In addition, it has been implied that fibroblast growth factor (FGF) plays a prominent role in regulating muscle hypertrophy [52] and inducing hypertrophy and angiogenesis in hibernating myocardium [53]. In this work, many FGFs, such as FGF13, FGF1, FGF6, FGF4 and FGF10 were identified as DEGs, and most of them were highly expressed in breast muscle compared to skin fat. The results above indicate that the phenotypic formation of lean-type Pekin duck may be due to differences in gene expression between breast muscle and skin fat.
In summary, we identified and analyzed DEGs in breast muscle and skin fat of Pekin duck using RNA-seq data. To our knowledge, this is the first time that RNA-seq analysis with duck reference genome has been done in ducks. The current work provides important basic information for further comprehensive investigation of the mechanisms involved in muscle development and fat deposition of duck.