Identification of the Novel Candidate Genes and Variants in Boar Liver Tissues with Divergent Skatole Levels Using RNA Deep Sequencing

Boar taint is the unpleasant odour of meat derived from non-castrated male pigs, caused by the accumulation of androstenone and skatole in fat. Skatole is a tryptophan metabolite produced by intestinal bacteria in gut and catabolised in liver. Since boar taint affects consumer’s preference, the aim of this study was to perform transcriptome profiling in liver of boars with divergent skatole levels in backfat by using RNA-Seq. The total number of reads produced for each liver sample ranged from 11.8 to 39.0 million. Approximately 448 genes were differentially regulated (p-adjusted <0.05). Among them, 383 genes were up-regulated in higher skatole group and 65 were down-regulated (p<0.01, FC>1.5). Differentially regulated genes in the high skatole liver samples were enriched in metabolic processes such as small molecule biochemistry, protein synthesis, lipid and amino acid metabolism. Pathway analysis identified the remodeling of epithelial adherens junction and TCA cycle as the most dominant pathways which may play important roles in skatole metabolism. Differential gene expression analysis identified candidate genes in ATP synthesis, cytochrome P450, keratin, phosphoglucomutase, isocitrate dehydrogenase and solute carrier family. Additionally, polymorphism and association analysis revealed that mutations in ATP5B, KRT8, PGM1, SLC22A7 and IDH1 genes could be potential markers for skatole levels in boars. Furthermore, expression analysis of exon usage of three genes (ATP5B, KRT8 and PGM1) revealed significant differential expression of exons of these genes in different skatole levels. These polymorphisms and exon expression differences may have impacts on the gene activity ultimately leading to skatole variation and could be used as genetic marker for boar taint related traits. However, further validation is required to confirm the effect of these genetic markers in other pig populations in order to be used in genomic selection against boar taint in pig breeding programs.


Introduction
Boar taint is the offensive odour or taste that can be evident during the cooking or eating of porcine meat derived from noncastrated male pigs. It is preliminary caused by the accumulation of androstenone and skatole in the adipose tissues. Skatole is a metabolite of tryptophan which is produced by intestinal bacteria such Clostridium and Bacteroides genera in gut and metabolised in the liver (reviewed by Wesoly and Weiler [1]). It has a fecal-like odor and unlike the smell of androstenone, the vast majority of people are able to detect the smell of skatole. It is well known that high concentrations of male sex steroids such as androstenone prevent the enzymes responsible for skatole metabolism resulting in the reduction of skatole metabolism in liver and accumulation in adipose tissue [2]. Notably, the most common practice to prevent this smell is the castration of male piglets. But castration is undesirable due to ethical and economical concerns [3,4] and castration of piglets is announced to be banned in the European Community by 2018 [5] creating an urgent need to develop alternative methods to prevent tainted meat.
In pigs, skatole is absorbed by the intestinal mucosa into the portal vein and passes through the liver where it is efficiently metabolised. Three major metabolites of skatole isolated from pigs are 6-sulfatoxyskatole (MII), 3-hydroxy-3-methyloxindole (MIII) and 3-methyl indole [6]. Among these skatole metabolites, MII is secreted in plasma and urine as a sulphate conjugate, and MIII is found to be related to the skatole levels in fat [6]. It has been demonstrated that the liver has a potential capacity to extract skatole from blood [7]. However, in boars a proportion of skatole passes the liver without being metabolised and accumulates in the adipose tissue that is responsible for tainted meat [6]. With the aim to identify candidate genes, a number of quantitative trait loci (QTL) analysis have been conducted for skatole in purebred and crossbred pig populations [8,9]. Several QTL for skatole were identified on different pig chromosomes such as on SSC6, SSC7, SSC12, SSC13, SSC14 and SSCX in different pig populations [9,10]. A few studies performed polymorphism and association analysis of selected genes [11,12] and a study was devoted to perform a genome wide association [13] for skatole in pigs. In this regard, the genes coding for enzymes of the cytochrome family received considerable interest due to their role in skatole metabolism, such as cytochrome P4502E1 (CYP2E1) is the main hepatic enzyme involved in the metabolism of skatole in the liver [14]. Significant associations have been identified for the polymorphism of CYP2E1 and CYP21 genes, and these genes are reported to reduce skatole levels in pigs [7]. A mutation in the coding region of CYP2A6 was found to be associated with high level skatole in fat [15].
Several studies are devoted to indentify the genes and pathways involved in the androstenone metabolism in liver [16,17,18] but to the authors' knowledge, no study was devoted to perform a global transcriptome analysis for divergent skatole levels in boar fat as well as to identify the pathways that might be involved in skatole metabolism in liver. RNA-Seq is a recently developed next generation sequencing technology for transcriptome profiling that boosts identification of novel and low abundant transcripts [19]. It could be used to analyse changes in gene expression across the entire transcriptome [19,20]. RNA-Seq also provides evidence for identification of splicing events, polymorphisms and different family isoforms of transcripts [21]. Therefore, the major aim of this study was to elucidate the genes and pathways involved in skatole metabolism in liver tissue using RNA deep sequencing technology. For this purpose, we performed differential expression analysis of genes in liver samples from boars with high skatole (HS) and low skatole (LS) in their backfat. Additionally, gene polymorphism analysis and differential exon usage analysis were also performed for the differentially expressed genes.

Analysis of RNA Deep Sequencing Data
We sequenced cDNA libraries from 6 samples from liver tissues (3 from HS in backfat and 3 from LS in backfat) using Illumina HiSeq 2000 as a part of our previous work [16]. The details of the sequencing are mentioned by Gunawan et al [16] and the raw sequencing data is deposited in GEO database and available under the accession id GSE44171. The sequencing produced clusters of sequence reads with maximum of 100 base-pair (bp). After quality control and filtering, the total number of reads for liver samples ranged from 11.8 to 39.0 million with a median of 22.8 million. Total number of reads for each group of liver sample and the number of reads mapped to reference sequences are shown in Table 1. In case of liver from LS group 43% to 74.4% of the total reads were aligned to reference sequence whereas, in case of liver from the HS group 61.3% to 84% were aligned.

Differential Gene Expression Analysis
Differential gene expression from livers of boars with HS and LS levels in backfat were calculated from the raw reads using the R package DESeq [22]. The significance scores were corrected for multiple testing using Benjamini-Hochberg correction. We used a negative binomial distribution based method implemented in DESeq to identify differentially expressed genes (DEGs) in the liver with divergent (HS and LS in backfat) skatole levels. A total of 448 DEGs were selected from the differential expression analysis using the criteria p adjusted ,0.05 and log2 fold change.1.5 (Table S1). In the liver tissues, 383 genes were found to be highly expressed in high skatole group whereas, 65 genes were found to be highly expressed in low skatole group (Table S1). The range of log2 fold change values for DEGs was from 26.79 to 5.82. Heatmaps ( Figure 1) illustrate the top 30 up and top 30 down regulated genes identified in the liver tissues from HS and LS boars. The top 30 up and down regulated genes identified in the liver tissues with different skatole levels along with log FC and p values are listed in Table 2. The differential expression analysis of our data revealed both novel transcripts and common genes which were previously identified in various gene expression studies. The novel transcripts from our analysis and commonly found genes are mentioned in detail in the discussion section.

Biological Function Analysis for DEGs
To investigate gene functions and to uncover the common processes and pathways among the selected DEGs, Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, www. ingenuity.com) was used. In the liver samples, out of 448 DEGs, 300 were assigned to a specific functional group based on the information from IPA ( Figure 2). A large proportion (67.0%) of the DEGs from liver tissues in the high skatole group belong to the Gene Ontology (GO) category metabolic processes. The enriched GO metabolic processes include small molecule biochemistry, protein synthesis, carbohydrate metabolism, DNA replication, recombinant and repair, energy production and lipid metabolism. Other enriched GO categories include post translation modification and amino acid metabolism. The genes classified into each functional group are listed in the Table 3. IPA assigned 68 DEGs between high and low skatole group liver samples to six different canonical pathways. Canonical pathway analysis identified remodelling of epithelial adherens junctions and TCA cycle as the dominant pathways which play regulatory roles in the metabolic pathways ( Figure 3). Other pathway categories including super pathways of methionine degradation, mitochondrial dysfunction, UDP-N-acetyl-D-galactosamine biosynthesis and cystein were also enriched ( Figure 3). The genes assigned to these pathways in the liver with high and low skatole levels are presented in Table 4.

Validation of Selected DEGs with Quantitative Real Time PCR (qRT-PCR)
In order to validate the RNA-Seq results, on the basis of differential expressions and functions related to skatole, a total of 10 genes (ATP5B, DHRS4, GSTO2, IDH3B, HSD17B2, KRT8, PGM1, PRDX1, SDHD and SLC22A7) were selected and quantified using qRT-PCR. For this purpose, the same samples as used in the deep sequencing were used. Comparison of qRT-PCR data for 10 selected genes showed qualitative concordance of the expression with the RNA-Seq results ( Figure 4A). To further validate the expression of selected genes more robustly, new grouping of independently high (n = 3) and low (n = 3) skatole were done among the remaining 94 pigs. The mRNA expressions of selected genes also showed similar pattern of expression in this new groups ( Figure 4B). Gene expression values for qRT-PCR were normalized using housekeeping genes PPIA and GAPDH [23].

Differential Exon Usages Analysis
In order to explore the expression pattern of different exons of a gene between HS and LS boars, selected DEGs were analysed for differential exon expression [24]. Distribution of differential exon events and selected differential exon usage genes are shown in figure 5A and 5B, respectively. We identified 126 (P-adj,0.05) differential exon expressions in 66 DEGs (Table S2) that showed differential usage of exons between high and low skatole. Selected differential exon usage identified in DEGs for liver samples are given in Table 5. It is important to note that some genes showed more than one variable splicing. We found that about 34.8% of the alternative spliced genes underwent multiple differential exon usage events ( Figure 5A), illustrating the complexity of porcine transcriptome. Figure 6A, 6B and 6C show an example of differential exon expression for three genes (ATP5B, KRT8 and PGM1) which were shown above to be associated with the skatole  level. The first and second exon of the ATP5B gene showed significantly higher expressions in the low skatole ( Figure 6A) group. The first and fifth exon of the KRT8 gene showed significantly higher expression levels in the low skatole than in high skatole group (Fig. 6B). Figure 6C showed that the 12 th exon of the PGM1 gene was expressed significantly higher in the LS than in HS group of boars.  In the liver samples, 427 gene polymorphisms were identified in 107 DEGs (Table S3). Selected polymorphisms identified in DEGs for liver samples are given in Table 6. The distribution of SNPs number and selected SNPs used for validation is shown in figure 7A and 7B, respectively. We found that about 68.4% of genes had multiple polymorphisms ( Figure 7A). Read counts for individual samples for identified polymorphisms in liver tissues are given in Table S4. In order to validate the SNP results, on the basis of functional SNP and functions related to skatole, a total of 6 SNP were selected for association study ( Figure 7B and Table S5). We have selected SNPs in ATP5B, KRT8, PGM1, CYP4A25, SLC22A7 and IDH1 to validate their segregation (Table S5) and association in our population (n = 100). Out of 6 SNP, five SNPs were found to be associated with skatole levels in our (n = 100) population (Table S5). Polymorphisms in ATP5B (g.23661024 T.C), KRT8 (g.18670859.A), PGM1 (g.137174784C.A), SLC22A7 (g.43833000 G.A) and IDH1 (g.122862530 C.T) were associated with skatole level (Table 7).

Analysis of RNA-Seq Data
The present study describes the transcriptome profiles of liver tissue from boars with high (HS) and low (LS) skatole content in the backfat by using RNA-Seq. To the best of our knowledge, this study provides the first insight into the transcriptome signature in liver tissues by using RNA-Seq that might be involved in the skatole metabolism. Using the whole transcriptome sequencing technique, we were able to identify the levels of differentially Table 3. Functional categories and corresponding genes those were over expressed in liver from high skatole boars.  expressed genes and to associate these genes with divergent skatole levels in terms of boar taint. According to the mapping results, the average number of reads was 22.85 million reads and on an average 65.5% of the reads was categorized as mapped reads corresponding to exon reads ( Table 1). The proportion of reads mapped to exons of annotated genes was in accordance the previous studies [25,26,27] in pig liver transcriptome (60.2-74.9%), but was higher than that reported by Esteva-Codina et al. [28] (44.1%) in porcine male gonad and Gunawan et al. [16] (40.8%-56.63%) in boar livers with divergent androstenone levels.
The percentage of annotated reads varies from 15.6% to 74.9% in porcine transcriptome studies [16,25,26,27,28,29], supporting our results. The differences between mapping percentages might be due to several factors such as primer biases, GC content, dinucleotide fragmentation sites, independent cell types, laboratory protocols and the selection of reference genome build for annotations [30]. Another factor is that the current reference transcriptome assembly might not cover all the transcribed mRNA [31] and consequently low abundant transcripts or rare alternative splicing isoforms are less likely to be mapped to transcriptome assembly [28]. Illumina sequencing data have been described as replicable with relatively little technical variation [32]. Therefore, the findings of this study clearly demonstrated the power of RNA-Seq and provide further insights into the transcriptome of liver tissue at a finer resolution in skatole divergent boars.

Differential Gene Expression Analysis
In this study, 448 genes were differentially regulated in liver tissues with divergent skatole levels (Table S1). The top two up regulated gene in the liver sample were SERPINA12 with log2 fold change 5.81 and KRT8 with log2 fold change 4.32 (Table 2).
SERPINA12 is identified from visceral adipose tissues of rats, an animal model for obesity and type 2 diabetes [33]. It is reported that SERPINA12 plays an important role in the progression of obesity and insulin resistance [34]. Wada et al. [35] categorized the serine proteases as to be the genes for small hormone-like molecules such as corticosteroid and thyroid hormones. SER-PINF2, another member of the same family is involved in 2aminoacetophenone metabolism which is an important intermediate product of phase 1 skatole metabolism. Notably, the skatole metabolism is divided into two phases: an oxidative phase 1 metabolism and a conjugative phase 2 metabolism. During phase I, skatole is degraded to several intermediate products.  the most diversified family of IF proteins with its type I (K9-22) and type II (K1-K8) classes [36]. KRT8/KRT18 IFs can modulate the adhesion, size and cell-cycle progression of hepatic cells, in association with differential plectin/receptor of activated C kinase 1 (RACK1) [37]. The function of highly polymorphic KRT8 in the skatole metabolism in the liver is not quite clear. However, this gene mapped close to a region on SSC5 affecting skatole and indole levels [38]. The top two down regulated gene in higher skatole group were LOC100737759 with log2 fold change 26.79 and LOC100625674 with log2 fold change 25.88. Till now, it is difficult to identify either the actual gene name or function through orthologous database or BLAST sequence similarity searches.

Keratins (Ks), the intermediate filaments (IFs) of epithelial cells, constitute
The NCBI database referred LOC100737759 as 'putative 2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline decarboxylase-like' which might be involved in amino acid metabolism. Similarly, LOC100625674 is referred as 'cadherin-18-like' which is a calcium-dependent cell adhesion protein and preferentially contributes in connecting cells. Cadherin-18 is a dominant gene belonging to the remodelling of epithelial adherens junction pathway [39] which is found to be a dominant pathway in our study.
There are similarities between gene expression differences found with RNA-Seq and those reported in previous transcriptome studies in pigs [17,18,38,40]. Similar to the previous studies [41,42], cytochrome P450 superfamily genes were found to be differentially regulated in skatole catabolism. Phase I skatole metabolism includes mostly oxidative reactions usually performed by the membrane bound cytochrome P450 system [43]. Cytochrome P450 isoenzymes are the main enzymes playing roles in phase 1 skatole metabolism, where skatole is degraded to several intermediate products including such as indole-3-carbinol(I3C), 2aminoacetophenone (2AAP) and 3-metyloxyindole (3MOI) (details reviewed by [1,44,45]). Notably, CYP4A24, CYP4A25 and CYB4B24 were found to be up-regulated in the high skatole Duroc6F2 population in this study which is in agreement with previous results for a Landrace population [8]. Another cytochrome genes family found to be differentially expressed in our transcriptome analysis is cytochrome subunit 5A (COX5A) coding for cytochrome oxidases and previously reported to be associated with skatole levels [7,13,46]. Ramos et al. [13] reported that the cytochrome subunit family COX4I1 gene, coding for a subunit of a cytochrome oxidase, is involved in the regulation of porcine skatole metabolism. Additionally, the cytochrome subunit 8C (COX8C) gene, coding for a subunit of cytochrome oxidase is also reported to be involved in the regulation of porcine skatole metabolism [13,46]. The COX5A gene on SSC7 at position 63.3 Mb maps to an already identified QTL region between 61.5 and 69.6 Mb which is known to affect skatole levels in Duroc and Landrace population. The main enzymes of phase 2 skatole metabolism are UGT (uridine 59-diphospho-glucuronosyltransferase) and SULT1A1 (sulfotrasferase) [47]. UGT is a family of conjugation enzymes and UGT1A3 is found to be differentially regulated in this study ( Table 2). Members of the UGT enzyme family such as UGT1A5 and UGT2A1 are reported to be differentially regulated in porcine transcriptome analysis for androstenone [18]. Different groups of transferases including glutation S transferase omega 2 (GSTO2) and glutathione S- transferase mu 2 (GSTM2) were found to be differentially regulated in this study (Table 2). Glutahione S-transferases (GSTs) are functionally diverse enzymes mostly known to catalyse conjugation reactions of endogenous substances, haem, fatty acids, xenobiotics and products of oxidative processes [48]. In most cases, the effect of conjugation is a decrease of biological activity and increased excretion of these metabolites from the body [49]. During phase 2 metabolism, the water solubility of the skatole metabolism is increased to facilitate excretion via urine [6,50]. The GSTs, reported to transport different molecules [48], might indicate that the GSTO transports the skatole to the tissues. It could be speculated that GSTO2 might be involved in the excretion of skatole from the porcine body.
Interestingly in this study, four members of SLC family genes such as SLC22A7, SLC25A1 and SLC25A25 were found to be up-regulated and SLC9A4 was found to be down regulated in the high skatole group ( Table 2). The solute carrier family (SLC) are important proteins in the regulation of body iron homeostasis and skeletal muscle contains a large proportion of body iron implying the importance of SLC for meat quality traits [51]. The SLC25A1 gene encodes a transporter protein which is responsible for the movement of citrate across the mitochondrial inner membrane [52]. Two members of SLC family (SLC22A13 and SLC22A14) genes were reported to be candidates for taint compounds and sex steroids in pigs [8]. The SLC superfamily is comprised of transporter families involved in the cellular uptake and secretion of endogenous molecules. The substrate panel of SLC22As includes endogenous compounds like tryptophan metabolites and sulphated steroids [53]. Skatole results from a multistep degradation of tryptophan by microbial activity, mainly in the hind gut of the pigs [1,45]. High concentration of skatole requires a high amount of tryptophan for microbial degradation [1]. Our results show that among the differentially expressed gene in live tissues, genes playing roles in ATP binding (ATP5A1, ATP5B, ATP5D), isocitrate dehydrogenases (IDH1 and IDH3B) and acetyl-CoA (ACSL5, ACOX1) pathways were enriched in functional categories such as in the small molecule biochemistry, protein synthesis, carbohydrate metabolism and energy production ( Table 3). In accordance with these results, Ramayo-Caldes et al. [27] also showed that small molecule biochemistry and energy production are members of the enriched GO categories for DEGs in the liver samples.
Pathway analysis of DEGs (Table 4) showed similar patterns with GO analysis and remodelling epithelial of adherens junctions and tricarboxylic (TCA) pathway were found to be the most dominant pathways in this study. Additionally, the mithochon-dria1 dysfunction and UDP-N-acety1-D-ga1actosamine biosynthesis pathways were found to be enriched in our study (Figure 3). A total of eight genes (ACTB, ARPC3, ARPC1A, TUBA1A1, TUBA1B, TUBA1C, TUBB2A and TUBB4B) belonging to the remodelling epithelial of adherens junctions pathway are identified in this study and may be involved in the skatole metabolism in the liver ( Table 4). The intercellular adherens junctions (AJ) are specialized sub-apical structures that function as principle mediators of cell-cell adhesion [39]. Their assembly-disassembly is dynamic and stringently regulated during tissue morphogenesis and homeostasis [54]. The TCA cycle, found to contain six genes (DLST, IDH3B, MDH1, MDH2, OGDHL and SDHD) is involved in the most important metabolic steps in the mitochondria. The TCA cycle is a catabolic pathway of aerobic respiration and the main source of ATP needed to maintain homeostasis, is produced by oxidation of pyruvate in the TCA cycle [55]. Another over represented canonical pathway in high skatole liver group was the cystein biosynthesis pathway which was previously reported to be responsible for amino acid metabolism in mammalian liver [56]. This pathway includes adenosylhomocysteinase (ACHY), cystathionase (CTH) and FtsJ RNA methyltransferase homolog 1 (FTSJ1) genes (Table 4) which are involved in the amino acid metabolism. These deep sequencing results indicated for the first time the TCA cycle and the cystein biosynthesis to be possibly involved in the metabolism of skatole in porcine liver.

Differential Exon Expression Analysis
Since an additional important advantage of RNA deep sequencing is the detection of differential exon usage events [25], we used the RNA deep sequencing data to characterize and compare the patterns of differential exon expressions variation in high and low skatole levels. RNA deep sequencing technology    provide valuable information regarding alternative and novel splice variants reflecting more complex mechanism of RNA regulation. A previous study by Moe et al. [7] showed that a number of genes involved in RNA processing and translation are differentially expressed between boar taint compounds. This study extends these observations by identifying a number of genes with differential exon expression between high and low skatole level boars. Chen et al. [25] reported that about 18.8% of the annotated genes showed differential exon usage events in pigs with divergent meat quality traits. This study revealed differential  level of exon expression for ATP5B, KRT8 and PGM1 genes in low skatole in comparison to high skatole group suggesting that differential processing of RNA could be associated with the regulation of skatole level.

Gene Variation Analysis
In addition to the transcriptome quantification, RNA-Seq technology provides valuable information regarding gene polymorphisms which could be directly correlated with the relevant phenotype. Several holistic gene expression analyses have been performed for boar taint compounds by using microarray or Real-Time PCR technology [17,18,40]. Our study extends these observations by correlating differentially regulated genes with associated polymorphisms. Gene polymorphisms in the exonic regions might have direct effect on the expression of transcripts and the identified polymorphisms from RNA deep sequencing may give additional insight to the variation in the skatole levels. This study revealed 45 SNPs in 8 highly polymorphic DEGs from liver samples ( Table 6). Two highly polymorphic genes KRT8 and ATP5B containing five and three SNPs are mapped close to a region on SSC5 at 18.6 Mb and 23.6 Mb, respectively (Table 6). Several QTL regions incorporating the KRT8 location are reported to affect skatole levels in pigs [38]. On SSC6, we identified 13 polymorphisms in the gene PGM1 at position 137.1 Mb, six polymorphisms in gene CYP4A25 at position 152.1 Mb and a set of 7 polymorphisms mapped to the gene PRDX1 at position 153.2Mb. Several QTL have been identified previously for skatole on SSC6 [9,13,57]. On this region of interest, Ramos et al. [13] reported that several SNP markers located close on the region on SSC6 that were significantly associated with skatole levels. Genes coding for cytochrome family have previously been shown to be associated with skatole levels [12,58] and are mapped on SSC6. These genes are known to be involved in phase I metabolism of skatole [58,59] implying that these genes could be important positional and functional candidate for boar taint compounds.
Two polymorphism were identified on SSC7 at position 43.8 Mb, mapped to the gene SLC22A7 and a SNP was identified on SSC7 at position 80.5 Mb on the gene DHRS4 (Table 6).
Grindflek et al. [8] identified an androstenone related QTL region on SSC7 between region 33.6-41.9 Mb and 80.8-88.3 which is in close proximity to the polymorphisms on gene SLC22A7 and DHRS4, respectively. In addition, a suggestive QTL for skatole is identified on SSC7 in the Yorkhsire pigs [8], overlapping the region harbouring our genes indicating that these markers could be very important for skatole levels. In this study, eight SNPs were identified on IDH1 gene at position 122.8 Mb on SSC15; yet to the best of our knowledge, no QTL related with skatole trait has been reported in this region. However, a highly significant QTL with a wide confidence interval from 42.5-70.7 Mb is reported on SSC15 in Norwegian Landrace and Duroc affecting the boar taint compounds including androstenone and skatole [8]. Moreover, this region is very rich for several genes involved in cytochrome P450 family and sulfotransferase family activity which are the key enzymes in both of the phases of skatole degradation [1,8]. Therefore, fine mapping and detailed study of the genes on this region could be interesting.
The selected polymorphisms in genes ATP5B, KRT8, PGM1, SLC22A7 and IDH1 were found to be associated with the phenotype skatole level in this study (Table 7). To the best of our knowledge, no study investigated association of the highly polymorphic ATP5B, KRT8, PGM1, SLC22A7 and IDH1 genes with boar taint compounds before. Xu et al. [60] reported an association for a SNP in exon 8 (g.75 G.A) in the ATP5B gene with the meat quality traits such as ratio lean to fat, fat meat percentage, intramuscular fat content and intramuscular water content. The ATP5B gene encodes the catalytic subunit of mitochondrial ATP synthesis complex and catalyzes the ratelimiting step of ATP formation in eukaryotic cells [61]. ATP5B probably plays a key role in the porcine skeletal muscle development and may provide further insight into the molecular mechanisms responsible for breed-specific differences in meat quality [60]. However, this study implies that in addition to the meat quality traits, this gene could be an important candidate for boar taint trait. The function of highly polymorphic KRT8 is associated to pathological processes in liver but the involvement in boar taint is not quite clear. Mutation in KRT8 is reported to be involved in human liver disease [62]. However, this gene maps close to a QTL region on SSC5 affecting skatole and indole levels [38] warranting more studies about this gene and polymorphisms with regards to the boar taint. The gene PGM1 is involved in glucose metabolism pathway and Lefaucheur et al. [63] found higher expression of genes in glycolytic pathways including this gene in the Large White which is in agreement with high glycolytic and low oxidative metabolism muscle tissues. However, no study has been published so far to unravel the involvement of this gene in boar taint compounds metabolism. The gene SLC22A7 is involved in the sodium-independent transport and excretion of organic anions and the substrate panel of SLC22As includes important endogenous compounds like tryptophan metabolites and sulphated steroids [53]. Skatole results from a multistep degradation of tryptophan by microbial activity, mainly in the hind gut of the pigs (reviewed by Wesoly and Weiler [1]). Therefore, the marker identified on SLC22A7 could be a valuable SNP for boar taint. IDH1 is the most important isocitryte dehydrogenase in the citrate and fatty acid synthesis that is related to energy metabolism and tissue morphology [46]. Energy metabolism is represented by glycolysis and glycogenolyis. It is well established that leaner pigs have a lower ability to synthesize fatty acids combined with greater mobilization, which results in adipose depots with more unsaturated lipids [64]. The pigs with higher metabolism rate such as fatter pigs like Large White and Duroc have higher androstenone and skatole levels than the lean breeds with lower energy metabolism like Pietrain [65]. It is worth to mention that the polymorphisms identified in this study are mostly synonymous and three SNPs from each synonymous and non-synonymous category are validated in this study (Table 6). However, these polymorphisms are suggested to be validated in other porcine populations before considering in selection breeding. The androstenone and skatole levels is correlated (r = 0.27) in the 100 Duroc F2 male pigs used in this study. Grindflek et al. [66] reported r = 0.32-0.36 in 1533 Norwegian Landrace, Strathe et al. [67] found r = 0.37 in 920 Danish intact male, and Windig et al. [68] detected r = 0.37 in 6072 finishing pigs composed of different sire and dam lines. It could be shown that correlation could vary according to the breed and number of animals.

Conclusion
Here we showed the whole genome expression differences in liver tissues for varying skatole levels in backfat of boars. RNA-Seq provided a high resolution map of transcriptional activities and genetic polymorphisms in this tissue. However, due to incomplete porcine annotations, only around 65.5% of the total reads could be mapped to annotated references. The improvements in pig genome annotations may lead to a better coverage and detailed understanding of genetic and functional variants such as novel transcripts, isoforms, sequence polymorphisms and non-coding RNAs. On the basis of number of the DEGs, our results confirm Table 8. Details of primers used for qRT-PCR analysis and genotyping. regulation of transcriptome activity in the liver tissue for skatole degradation. This study proposed candidate genes such as SERPINA12, KRT8, CYP4A25, COX5A, SLC22A7, PRDX1 and HSD17B2 that might be involved in the liver for skatole metabolism. Importantly, most of the DEGs are functionally related to pathways involved in boar taint and incorporated within the published QTL positions affecting boar taint compounds. Furthermore, various gene polymorphisms were detected in the liver DEGs and their associations are validated with skatole levels. Potential polymorphisms and association were identified for selected mutations in selected DEGs such as ATP5B, KRT8, PGM1, SLC22A7 and IDH1. In addition, differential exon usage analysis of three genes (ATP5B, KRT8 and PGM1) revealed significant differential expression of exons of these genes in the pigs with divergent skatole levels. This transcriptome, polymorphisms and alternative splicing analysis using RNA deep sequencing combined with association analysis revealed potential candidate genes affecting boar taint compound. It is speculated that these polymorphisms could be used as markers for boar taint related traits. However, further validation is required to confirm the effect of these genetic markers in other pig populations.

Animals and Phenotype
Tissue samples and phenotypes were collected from the Duroc6F 2 cross animals. F 2 was created by crossing F 1 animals (Leicoma 6 German Landrace) with the Large White pig breed. Duroc6F 2 boars were on average 116 days old and had on average 90 kg live weight at slaughter. All pigs were slaughtered in a commercial abattoir called Landesanstalt für Schweinezucht -LSZ Boxberg. Slaughterhouse management gave the necessary permissions for the tissue and organ collection. Carcass and meat quality data were collected according to guidelines of the German performance test [69]. As described in Gunawan et al. [16], tissue samples from liver were frozen in liquid nitrogen immediately after slaughter and stored at 280uC until used for RNA extraction. Fat samples were collected from the neck and stored at 220uC until used for skatole measurements. For the quantification of skatole an in-house gas-chromatography/mass spectrometry (GC-MS) method was applied as described previously [70]. Pigs having a fat skatole level less than 0.25 mg/g and greater than 0.25 mg/g were defined as low and high skatole samples, respectively [67,71]. Six boars were selected from a pool of 100 pigs and the average skatole value for these selected animals were 0.2760.20 mg/g. RNA was isolated from the liver tissues of 3 pigs with (HS, high skatole group) high (0.4560.08 mg/g) and 3 pigs with (LS, low skatole group) low levels of skatole (0.0960.02 mg/g). Notably, these six boars were among the ten boars which have been previously used for androstenone study [16]. Among the ten pigs used in androstenone study, six pigs were found with extremely high and low skatole levels and were considered for this study. There is correlation between the androstenone and skatole levels (r = 0.27) in the 100 Duroc F2 pigs used in this study. Furthermore, these 100 boars were used for association study (Table S5). Total RNA was extracted using RNeasy Mini Kit according to manufacturer's recommendations (Qiagen). Total RNA was treated using oncolumn RNase-Free DNase set (Promega) and quantified using a spectrophotometer (NanoDrop, ND8000, Thermo Scientific). RNA quality was assessed using an Agilent 2100 Bioanalyser and RNA Nano 6000 Labchip kit (Agilent Technologies).

Library Construction and Sequencing
Details of the library construction and sequencing procedures were described previously by Guanwan et al. [16]. The library preparations were sequenced on an Illumina HiSeq 2000 at GATC Biotech AG (Konstanz, Germany). All sequences were analysed using the CASAVA v1.7 (Illumina, USA). As described in Gunawan et al. [16], the deep sequencing data have been deposited in NCBI SRA database and are accessible through GEO series accession number GSE44171 (http://www.ncbi.nlm. nih.gov/geo/query/acc.cgi?acc = GSE44171).

Genome Reference and Mapping
The first step data analysis was the quality control and filtering step. In this step, PCR primers identified in the raw reads using the FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/ fastqc/) quality control application and bad quality sequences with a Phred score of ,20 were trimmed off. In this study, the raw reads after quality control were mapped to NCBI Sscrofa10.2 genome build using RNA-seq read mapper TopHat [72]. TopHat is a ''splice aware'' mapper that uses Bowtie short read aligner [73] for aligning the raw reads to the genomes and further analyses these mapping results for splice junction discovery. After mapping the raw reads to the genome build BEDTools utilities [74] was used to compute the coverage of raw reads to Sscrofa10.2 gene positions for each sample. The expression table thus created was further used in the analysis of differentially expressed genes.

Differential Gene Expression Analysis
The differential gene expression analysis was designed to contrast the difference in the expression of genes between high and low skatole samples. For differential gene expression analysis with raw count data, the R package DESeq was used [75]. The normalization procedure in DESeq handles the differences in the number of reads in each sample. For this purpose, DESeq first generates a fictitious reference sample, with read counts defined as the geometric mean of all the samples. The reads count for each gene in each sample is divided by this geometric mean to obtain the normalized counts. To model the null distribution of the count data, DESeq follows an error model that uses the negative binomial distribution, with variance and mean linked by local regression. The method controls type-I error and provides good detection power [75]. After analysis using DESeq, DEGs were filtered based on p-adjusted value,0.05 and fold change $1.5 [76]. Additionally, the gene expression data was also analyzed using a Generalized Linear Model (GLM) function implemented in DESeq to calculate both within and between group deviances. As a sanity checking and filtration step, we cross matched the results from both analysis (p-adjusted #0.05 and fold change $1.5 criteria and GLM analysis) and only those genes which appeared to be significant in both the tests (p-value #0.05), were selected for further analysis [16]. The results of GLM analysis are given in Table S6.

Differential Exon Expression Analysis
In addition to analyzing the genes that are differentially expressed between high and low skatole samples, the differential expression levels of exons of the same genes between the different phenotype samples were also analysed. For this purpose, we used the R package DEXSeq [24]. The mapped read count data were converted into exon ''counting bins'' as described in [24]. In the next step, the algorithm normalized sequencing depths for all the samples as described in [75]. In the final step, Generalized Linear Models (GLMs) were employed by the algorithm for each counting bin to test for differential expression between phenotype samples. After the analysis, differentially used exons were filtered using the criteria p-adjusted value,0.05.

Pathways and Networks Analysis
A list of the DEGs was uploaded into the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, www.ingenuity.com) to identify relationships between the genes of interest and to uncover common processes and pathways. Networks of the genes were then algorithmically generated based on their connectivity. The 'Functional Analysis' tool of the IPA software was used to identify the biological functions most significant to the data set. Canonical pathway analysis was also utilized to identify the pathways from the IPA library of canonical pathways that were most significant to the data set. We used ''Benjamini-Hochberg'' multiple testing correction to calculate a p-value determining the probability that each biological function or canonical pathway assigned to the data set. The significance levels of p-values obtained for every biological function or canonical pathways were corrected for multiple testing using ''Benjamini-Hochberg'' correction.

Gene Variation Analysis
In this analysis, SNP calling was performed on the mapping files generated by TopHat algorithm using samtools mpileup command and associated algorithms [77]. From the variants so generated, only those variants with a minimum Root Mean Square (RMS) mapping quality of 20 and a minimum read depth of 100 were selected for further analysis. In the final step, the selected variants were cross-checked against dbSNP database to identify mutations that are already studied. In order to find out the differentially expressed genes that also harboured sequence polymorphisms, we crosschecked and filtered with the chromosomal positions of these variants against those of DEGs and retained only those variants which mapped to DEG the chromosomal positions. By this way, we were able to isolate a handful of mutations that mapped to DEGs from many thousands of identified potential sequence polymorphisms. In the next step, to understand whether these identified polymorphisms segregate either in only one sample group (high skatole or low skatole group) or in both groups (high and low skatole group), we calculated the read/coverage depth of these polymorphisms in all the samples [16]. The identified SNPs were furthermore classified as synonymous or non-synonymous using the GeneWise software (http:// www.ebi.ac.uk/Tools/psa/genewise/last accessed 21.03.2013) by comparing between protein sequence and nucleotides incorporated SNP position [78]. The results of this analysis are detailed in the results section and read coverage for individual samples are given in Table S4.

Quantitative Real-Time PCR (qRT-PCR) Analysis
For qRT-PCR experiment, total RNA from liver samples were isolated from the 6 boars used for deep sequencing. Additionally, RNA was isolated from the similar tissues of 6 independent boars with divergent skatole level among the remaining 94 boars. cDNA were synthesised by reverse transcription PCR using 2 mg of total RNA, SuperScript II reverse transcriptase (Invitrogen) and oligo(dT)12 primer (Invitrogen). Gene specific primers for the qRT-PCR were designed by using the Primer3 software [79]. Detailed information for primers used in this study was given in Table 8. In each run, the 96-well microtiter plate contained each cDNA sample and no-template control. The qRT-PCR was conducted with the following program: 95uC for 3 min and 40 cycles 95uC for 15 s/60uC for 45 s on the StepOne Plus qPCR system (Applied Biosystem). For each PCR reaction 10 ml iTaqTM SYBRH Green Supermix with Rox PCR core reagents (Bio-Rad), 2 ml of cDNA (50 ng/ml) and an optimized amount of primers were mixed with ddH 2 O to a final reaction volume of 20 ml per well. All samples were analysed twice (technical replication) and the geometric mean of the Ct values were further used for mRNA expression profiling. The geometric mean of two housekeeping genes GAPDH and PPIA were used for normalization of the target genes. The delta Ct (DCt) values were calculated as the difference between target gene and geometric mean of the reference genes: (DCt = Ct target 2Ct housekeeping genes ) as described in Silver et al. [80]. Final results were reported as fold change calculated from delta Ct-values.

Validation of SNP and Association Study
For the validation of association, six SNPs from the highly polymorphic DEGs as well as the genes known to be involved in the skatole metabolism were selected (Table 7). Genotyping in 100 boars were performed by PCR-RFLP method. In brief, a working solution with a final concentration of 50 ng/ml DNA was prepared and stored at 4uC for further analysis. Polymerase chain reactions (PCR) were performed in a 20 ml volume containing 2 ml of genomic DNA, 1 6 PCR buffer (with 1.5 mM MgCl 2 ), 0.25 mM of dNTP, 5 pM of each primer and 0.1 U of Taq DNA polymerase (GeneCraft). The PCR product was checked on 1.5% agarose gel (Fischer Scientific Ltd) and digested by using the appropriate restriction enzyme (Table 8). Digested PCR-RFLP products were resolved in 3% agarose gels. Details of GenBank accession numbers, primers sequences, annealing temperature and SNP position used in this study are listed in Table 8. Statistical analyses were performed using SAS 9.2 (SAS Institute Inc, Cary, USA). Effects of slaughter age, husbandry system (pen) as well as genotype on boar taint compound skatole were assessed with a fixed effect model (ANOVA) using PROC GLM. For all models, fixed effects included genotype and pen (group. individual) and age of slaughter was fitted as a covariate for boar taint compound skatole. Due to the skewed nature of skatole, data were transformed with natural logarithm before ANOVA to achieve normality. Least square mean values for the loci genotypes were compared by t-test and p-values were adjusted by the Tukey-Kramer correction [81].

Supporting Information
Table S1 Differentially expressed genes in liver from boars with high and low skatole in backfat.

(XLS)
Table S2 Differential exon expression in liver from boars with high and low skatole in backfat.