RNA Deep Sequencing Reveals Novel Candidate Genes and Polymorphisms in Boar Testis and Liver Tissues with Divergent Androstenone Levels

Boar taint is an unpleasant smell and taste of pork meat derived from some entire male pigs. The main causes of boar taint are the two compounds androstenone (5α-androst-16-en-3-one) and skatole (3-methylindole). It is crucial to understand the genetic mechanism of boar taint to select pigs for lower androstenone levels and thus reduce boar taint. The aim of the present study was to investigate transcriptome differences in boar testis and liver tissues with divergent androstenone levels using RNA deep sequencing (RNA-Seq). The total number of reads produced for each testis and liver sample ranged from 13,221,550 to 33,206,723 and 12,755,487 to 46,050,468, respectively. In testis samples 46 genes were differentially regulated whereas 25 genes showed differential expression in the liver. The fold change values ranged from −4.68 to 2.90 in testis samples and −2.86 to 3.89 in liver samples. Differentially regulated genes in high androstenone testis and liver samples were enriched in metabolic processes such as lipid metabolism, small molecule biochemistry and molecular transport. This study provides evidence for transcriptome profile and gene polymorphisms of boars with divergent androstenone level using RNA-Seq technology. Digital gene expression analysis identified candidate genes in flavin monooxygenease family, cytochrome P450 family and hydroxysteroid dehydrogenase family. Moreover, polymorphism and association analysis revealed mutation in IRG6, MX1, IFIT2, CYP7A1, FMO5 and KRT18 genes could be potential candidate markers for androstenone levels in boars. Further studies are required for proving the role of candidate genes to be used in genomic selection against boar taint in pig breeding programs.


Introduction
Boar taint is an off-odor and off-flavor meat trait, mainly caused by high levels of androstenone, skatole and/or indole in adipose tissue [1]. The taint has been described as being similar to urine and manure and may occur in meat from uncastrated sexually mature male pigs [2]. Consumers commonly show a strong aversion to tainted meat. Currently, surgical castration of male piglets is a common practice in many countries to produce taintfree porcine meat [3]. However, castration is undesirable due to ethical and economical concerns [4] and rearing entire males instead of castrates has a number of advantages including higher efficiency, leaner carcasses and lower faecal and urinary nitrogen losses [5]. By 2018, castration of piglets is going to be banned in the European Community [6]. Consequently, there is an urgent need to develop alternative methods to prevent tainted meat. In literature, it has been mentioned that lowering the slaughter weight or choosing a definite breed can reduce the boar taint [7], however, these could lead to some economical drawbacks. Skatole is a derivative of tryptophan produced in the hindgut of pigs by intestinal bacteria. The level of intestinal skatole production is mainly dependent on nutritional factors and no genetic control has been demonstrated so far [8]. On the other hand, for androstenone high heritability estimates (h 2 = 0.25 to 0.87) and differences between sire lines have been reported [9;10;11]. Consequently molecular breeding seems to be a promising way to produce pigs without boar taint.
Androstenone is synthesized in the testis from pregnenolone [8;12;13], in relation with sexual development. It is mainly degraded in liver and deposited in adipose tissue because of its lipophilic properties [14]. Metabolism of androstenone is presented in two phases: phase I consists metabolism by hydrogenation and phase II consists metabolism by sulfoconjugation in testis or in liver [8;14;15;16]. Therefore, in theory, high levels of androstenone in fat can be dedicated to a high intensity of testicular synthesis and/or a low intensity of liver degradation [8]. This phenomenon is mainly controlled by enzymes and regulatory proteins such as cytochrome P450 and hydroxysteroid sulfotransferase family. Cytochrome P450s (CYPs) act as mono-oxygenases, with functions ranging from the synthesis to the degradation of endogenous steroid hormones [17]. Androstenone synthesis is initiated by cleavage of cholesterol to produce pregnenolone. This reaction is catalysed by the enzyme CYP11A [8]. Formation of 16androstene steroids from pregnenolone is orchestrated by CYB5 which causes overproduction of 16-androstene steroids in testis [18;19]. Two other cytochrome P450 enzymes CYP17 and CYP21 have also been investigated for the involvement in steroidogenesis [8]. 3-b-hydroxysteroid dehydrogenase (3b-HSD) enzyme encoded by HSD3B gene [20] reduces androstenone to b-androstenol in pig liver microsomes [14]. The 16-androstene steroids in the liver and testis are sulfoconjugated by hydroxysteroid sulfotransferase (SULT2A) [16;21].
A number of quantitative trait loci (QTL) and genome-wide association analysis have been conducted for androstenone in the purebred and crossbred pig populations [2;22;23;24;25;26]. Gene expression analysis has been used to identify candidate genes related to the trait of interest. Several candidate genes have been proposed for divergent androstenone levels in different pig populations by global transcriptome analysis in boar testis and liver samples [27;28;29]. Functional genomics provides an insight into the molecular processes underlying phenotypic differences [30] such as androstenone levels. RNA-Seq is a recently developed next generation sequencing technology for transcriptome profiling that boosts identification of novel and low abundant transcripts [31]. RNA-Seq also provides evidence for identification of splicing events, polymorphisms, and different family isoforms of transcripts [32]. The major aim of this study was to elucidate the genes involved in androstenone metabolism in testis and liver tissues using RNA-Seq technology. For this purpose, we analyzed differential expression of genes between high and low androstenone sample groups and polymorphisms that appear on the differentially expressed genes.

Analysis of RNA-Seq Data
We sequenced cDNA libraries from 10 samples per tissue using Illumina HiSeq 2000. The sequencing produced clusters of sequence reads with maximum 100 base-pair (bp) length. After quality filtering the total number of reads for testis and liver samples ranged from 13.2 million (M) to 33.2 M and 12.1 M to 46.0 M, respectively. There was no significant difference in the number of reads from low and high androstenone samples (p = 0.68). Total number of reads for each tissue group and the number of reads mapped to reference sequences are shown in Table 1 and 2. In case of testis 42.20% to 50.34% of total reads were aligned to reference sequence whereas, in case of liver 40.8% to 56.63% were aligned.

Differential Gene Expression Analysis
Differential gene expression for testis and liver with divergent androstenone levels were calculated from the raw reads using the R package DESeq [33]. The significance scores were corrected for multiple testing using Benjamini-Hochberg correction. We used a negative binomial distribution based method implemented in DESeq to identify differentially expressed genes (DEGs) in testis and liver with divergent androstenone levels. The smear plots for differential expression between high and low androstenone levels in testis and liver are given in Figure S1. A GLM analysis (implemented in DESeq package) was also done on the same data set to identify genes with a significant difference between within group deviance and between group deviances. Finally DEGs were selected based on criteria p adjusted ,0.05 and fold change $1.5 from first analysis and p adjusted ,0.05 in GLM analysis (Table S1). A total of 46 and 25 DEGs were selected from the differential expression analysis for testis and liver tissues respectively (Table 3 and 4). In testis tissues, 14 genes were found to be highly expressed in high androstenone group whereas, 32 genes were found to be highly expressed in low androstenone group. In the liver tissue, 9 genes were found to be highly expressed in high androstenone group whereas, 16 genes were found to be highly expressed in low androstenone group (Table 3 and

Biological Function Analysis for DEGs
To investigate gene functions and to reveal the common processes and pathways among the selected DEGs, Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, www. ingenuity.com) was used. In testis samples, out of 46 DEGs 39 were assigned to a specific functional group based on the information from IPA ( Figure 2). A large proportion (84.7%) of the DEGs from testis high androstenone group fell into Gene Ontology (GO) categories such as molecular transport, small molecule biochemistry, amino acid metabolism, embryonic development, carbohydrate metabolism, lipid metabolism and reproductive system development and function ( Figure 2). The genes classified into each functional group are listed in the Table 5.
For the liver androstenone samples, out of 25 DEGs, 22 could be assigned to a specific functional group based on the information from IPA (Figure 3). A large proportion (88.0%) of the DEGs from liver high androstenone group was enriched with GO functional categories such as amino acid metabolism, small molecule biochemistry, cellular development, lipid metabolism, molecular transport, cellular function and maintenance and cellular growth and proliferation ( Figure 3). The genes classified into each functional group are listed in Table 6.

Validation of Selected DEGs with Quantitative Real Time PCR (qRT-PCR)
In order to validate the RNA-Seq results, on the basis of differential expressions and functions related to androstenone, a total of 10 genes were selected and quantified using qRT-PCR. ARG2, CYP2C33, MSMO1, EDN1 and CYP2B22 genes from testis samples and IP6K1, BTG3, CYP7A1, FMO5 and HSD17B2 genes from liver samples were selected. For this purpose, the same samples used in the deep sequencing were used. Comparison of qRT-PCR data for 10 selected genes showed complete concordance of expression with the RNA-Seq results ( Figure 4A and 4B). To further validate the expression of selected genes more robustly, new grouping of independently high (n = 5) and low (n = 5) androstenone are done among the remaining 90 pigs. The mRNA expressions of selected genes showed similar pattern of expression in this new groups ( Figure 4C and D). Gene expression values for qRT-PCR were normalized using housekeeping genes PPIA and GAPDH [34].

Gene Variation Analysis
In total 222,225 and 202,249 potential polymorphism were identified in high and low androstenone testis groups. Among these identified polymorphisms, 8,818 in high androstenone group and 8,621 in low androstenone group were global polymorphisms with reference and accession identifiers in dbSNP database. Similarly in liver high and low androstenone samples 169,181 and 164,417 potential polymorphisms were identified. There were 6,851 global polymorphisms in high androstenone liver sample and 6,436 global polymorphisms in low androstenone liver sample.
Polymorphisms identified in DEGs for testis and liver samples are given in Table 7 and Table 8. In the testis samples 12 gene polymorphisms were identified in 8 DEGs (Table 7). Additionally our results of deep sequencing in limited number of animals revealed that mutations for the genes CD244 and ARG2 were specific for high androstenone testis tissues, whereas mutations in genes IFIT2, DSP and IRG6 were specific for low androstenone testis samples. Furthermore, we have selected SNPs in IRG6, DSP, MX1 and IFIT2 genes to validate their segregation and association in our population (Table S3 and Table S4). Polymorphisms in IRG6 (g.118838598G.A), MX1 (g.144420441C.T) and IFIT2 (g.106102335 G.T) were associated with androstenone level (Table 9).
Thirty six mutations were identified in 11 DEGs in liver samples (Table 8). Variation in HAL gene was specific for high androstenone liver samples whereas FMO5, HIST1H4K and TSKU gene variations were specific for low androstenone liver samples (Table 8). Read counts for individual samples for identified polymorphisms in testis and liver tissues are given in Table S2. Additionally, we have validated SNPs in highly polymorphic genes CYP7A1, KRT18 and FMO5 and their association in our population (Table S3 and Table S4). The SNP in CYP7A1 (g.77201533 A.G), KRT18 (g.16788495 G.A) and FMO5 (g.104473018 G.A) were found to be associated with the phenotype androstenone level (Table 9).

Analysis of RNA-Seq Data
The present study describes the transcriptome profiles of testis and liver for androstenone by using RNA-Seq. To the best of our knowledge this study provides the first comprehensive insight into the transcriptome of androstenone metabolism in testis and liver tissue by using RNA-Seq. Using the whole transcriptome sequencing technique, we were able to identify the levels of differentially expressed genes and associate these genes with divergent androstenone levels in terms of boar taint. Our findings clearly demonstrated the power of RNA-Seq and provide further insights into the transcriptome of testis and liver in androstenone at a finer resolution. Illumina sequencing data have been described as replicable with relatively little technical variation [35]. Although 45% to 50% ( Table 1 and 2) of the fragments do not map to annotated exons in our study, we were able to identify genes associated with divergent androstenone levels. Porcine annotation is incomplete, as evidenced by read mapping annotation. The percentage of annotated reads varies from 15.6% to 60.8% in similar porcine transcriptome studies [36;37;38]. The differences between mapping percentages might be due to several factors such as primer biases, GC content, dinucleotide fragmentation sites, independent cell types and laboratory protocols [39;40]. Another factor is that the current reference transcriptome assembly might not cover all transcribed mRNA and consequently low abundant transcripts or rare alternative splicing isoforms are less likely to be mapped to transcriptome assembly [38].

Differential Gene Expression and Gene Polymorphism Analysis in Testis
In this study, 46 genes were differentially regulated in testis with divergent androstenone levels ( Table 3). Our findings of differential gene expression are in accordance with the current understanding of androstenone metabolism as well as the previous findings in functional studies. In our study, the most up and down regulated genes DKK2 and KRT82 were found to be novel genes related to androstenone metabolism. Dickkopf-related protein 2 is encoded by the DKK2 gene which was identified as the highest up regulated gene in our study (Table 3). DKK2 can act as either an agonist or antagonist of Wnt/beta-catenin signalling [41]. WNT signalling in the testis has not been well understood, however it has been shown to play an important role in proliferation and selfrenewal of mouse and human spermatogonia [42]. Mutation in bcatenin leads to the over activity of b-catenin in Sertoli cells caused testicular cord disruption, germ cell depletion, and inhibition of Müllerian duct regression suggesting that inhibition of b-catenin signalling is essential for Sertoli cell and testicular cord maintenance and germ cell survival [43]. Baes et al. [44] found that breeding against androstenone may have slightly adverse effects on semen quality. In the light of these external references it could be speculated that up regulation of DKK2 gene in this study may have antagonistic effect on Wnt/beta-catenin signalling pathway which has shown to cause negative effects on sperm production. KRT82 was the highest down regulated gene in high androstenone testis tissues in our study ( Table 3). The protein encoded by this gene is a member of the keratin gene family which contains at least 54 functional keratin genes in humans [45]. Keratin-related genes are known to be affected by androgen exposure, especially by Dihydrotestosterone (DHT) exposure [46]. Relation of DHT with androsterone has been shown by Rizner et al. [47]. These literature evidences show that down-regulation of KRT82 gene is the end result of high androgen metabolism in testis and not directly involved in androsterone synthesis.
There are similarities between gene expression differences found with RNA-Seq and those reported in previous microarray studies in porcine testis and liver tissues [27;28;29]. Grindflek et al. [48] and Moe et al. [29] reported cytochrome P450 superfamily genes to be differentially regulated in their investigated testis samples. In our study, other members of cytochrome P450 family genes were found to be differentially regulated in addition to genes reported by these previous studies. Our findings showed that genes CYP4B1, CYP4A11 and CYP2C33 were up regulated and gene CYP2B22 was down regulated (Table 3). Among these genes, CYP4A11 was enriched in Gene Ontology categories molecular transport, small molecule biochemistry and lipid metabolism and gene CYP4B1 was revealed in small molecule biochemistry (Table 5). In accordance with our results, Moe et al. [29] also showed lipid metabolism to be one of the enriched GO categories for DEGs in testis samples.
In addition to transcriptome quantification, RNA-Seq technology provides valuable information regarding gene polymorphisms which could be directly correlated with the relevant phenotype. Several holistic gene expression analyses have been performed for boar taint compounds by using microarray or Real-Time PCR technology [27;28;29]. Our study extends these observations by correlating differentially regulated genes with associated polymorphisms. Gene polymorphisms in the exonic regions might have direct effect on the expression of transcripts and connecting our identified polymorphisms from RNA deep sequencing with GWAS studies may give additional insight to variation in the androstenone levels. Results in our study revealed 12 mutations in androstenone testis samples (Table 7). On SSC3 four polymorphism were identified, two at 35 Mb (insertion) on gene HBA2,   . Cross matching the chromosomal positions of these SNPs with data from dbSNP database showed that one of the SNPs (at position 106,102,335) has already been annotated in the SNP database (dbSNP ID: rs80925743). A QTL region for androstenone was identified on the same chromosome between 87.9 cM and 108.7 cM by Lee et al. [24] and the SNPs identified in our study fit into this previously identified androstenone QTL region.

Differential Gene Expression and Gene Polymorphism Analysis in Liver
Twenty five genes were found to be differentially regulated in liver tissue with divergent androstenone levels ( Table 4). The top two up regulated genes in our liver sample were LOC100512122 with log fold change 3.89 and LOC100511195 with log fold change 3.57. However, we were not able to identify either the gene names or function through orthologue databases or BLAST sequence similarity searches. As a result, the functions of these genes cannot be discussed in detail here.
IP6K1 was the third highest up regulated in our liver samples. Inositol hexakisphosphate kinase 1 (IP6K1) is a member of the inositol phosphokinase family which encodes protein responsible for the conversion of inositol hexakisphosphate (InsP6) to diphosphoinositol pentakisphosphate (InsP7/PP-InsP5) [50]. Chakraborty et al. [51] have shown that targeted deletion of IP6K1 in mice liver has increased Akt and mTOR signalling and decreased GSK3b signalling. Since this gene is highly expressed in liver, several factors including the diet of the sample population might have a larger impact on the expression of this gene. At this point, we are not able to pinpoint the effect of this gene (IP6K1) on androstenone metabolism in liver.
In our liver sample, HSD17B2 was the highest down regulated gene with fold change 22.86 (Table 4). Hydroxysteroid (17-beta) dehydrogenase 2 (HSD17B2) regulate the availability of testosterone and androstenedione in tissues by catalysing interconvertion of active and inactive forms of steroids [52]. Gene expression studies by Moe et al. [28] have also shown the down regulation of  HSD17B2 gene in liver sample. Moreover, different members of the HSD enzyme family (HSD17B4, HSD17B11 and HSD17B13) were found to be differentially regulated in Duroc and Norwegian Landrace populations [28].
Our results showed that cytochrome P450 family gene CYP7A1 is differentially regulated in liver samples. CYP7A1 is the rate-limiting enzyme in the synthesis of bile acid from cholesterol. The conversion of cholesterol to bile acid is the major pathway for cholesterol metabolism [30]. CYP7A1 is a cytochrome P450 heme enzyme that oxidizes cholesterol using molecular oxygen. Downregulation of CYP7A1 causes reduced fat catabolism in liver which may lead to higher fat accumulation and androstenone level due to Table 6. Functional categories and corresponding DEGs in high androstenone liver tissues.  the dynamic relationship between androstenone in plasma and adipose tissue [53]. Gene expression profiles by Moe et al. [28] have also shown cytochrome P450 family genes to be differentially regulated in liver samples. Another gene family found to be differentially expressed in our transcriptome analysis is flavin-containing monooxygenases (FMOs) gene family. The FMO family of enzymes converts lipophilic compounds into more polar metabolites and decreases activity of the compounds [54]. In the study conducted by Moe et al. [28], using microarray analysis, FMO1 is reported to be upregulated in higher androstenone pigs. In contrast, FMO5 was found to be down-regulated in high androstenone liver samples in our study. Since androstenone is a lipophilic compound, we speculate that androstenone level was negatively correlated with FMO5 activity. Since androstenone is a lipophilic compound, we speculate that androstenone level was negatively correlated with FMO5 activity.
Among the differentially expressed genes in liver, the gene CDKN1A was enriched in GO categories amino acid metabolism, small molecule biochemistry, lipid metabolism and molecular transport. The differentially expressed gene CYP7A1 was enriched in GO categories such as small molecule biochemistry, lipid metabolism and molecular transport. Gene Ontology functional analysis by Moe et al [28] has also shown that the GO categories lipid metabolism and amino acid metabolism were enriched.
Gene polymorphism analysis has shown that there were thirty six mutations in 11 DEGs in liver samples (Table 8). Eight SNPs were identified on SSC4 at position 77 Mb (Table 8) which were mapped to gene CYP7A1. In close adjacency to this region Quintanilla et al. [25] identified an androstenone related QTL at position 72 cM. An additional SNP was identified on SSC4 at position 104 Mb mapped to gene FMO5 however, this position was not mapped as androstenone related QTL region by any previous studies. Our results have also shown that this SNP at position 104,473,018 has been already reported in dbSNP database (dbSNP ID: rs80837900) ( Table 8). Six polymorphisms on SSC5 at position 16.7 Mb were mapped to gene KRT8. Out of these identified polymorphisms, four SNPs were previously mapped to dbSNP database (Table 8). Another set of 6 polymorphisms on SSC 5 at position 16.7 Mb mapped to the gene KRT18. Three SNPs among these six polymorphisms were already identified and reported in dbSNP database (Table 8).
Grindflek et al. [2] detailed an androstenone QTL region on SSC5 between 20.4 and 22.2 Mb in close proximity to our reported polymorphisms (Table 8). An insertion gene polymorphism at position 82 Mb on the same chromosome mapped to HAL gene was also indentified in our study. On SSC 7 we identified 3 SNPs, one at position 22 Mb on gene HIST1H4K and two at position 36 Mb mapped to gene CDKN1A. One of the SNP mapped to CDKN1A at position 36.9 Mb has already been reported in dbSNP database (dbSNP ID: rs80964639). An androstenone QTL region on SSC7 between position 33.6 and 88.3 Mb was already described by Grindflek et al. [2]. Two SNPs on CDKN1A identified in our study falls into this previously mentioned QTL region. Genome wide association study by Grindflek et al. [2] described androstenone related QTL region on SSC9 at position 7.5 to 8.0 Mb. We report an SNP on the same chromosome at position 10 Mb, close to the previously reported QTL region (Table 8). In addition, we obtained an insertion polymorphism mapped to NNMT gene at position 40 Mb. We identified two polymorphisms at 38 Mb on SDS gene in the vicinity of the androstenone QTL region on SSC14 at 37 cM [25]. Furthermore, our analysis revealed 7 additional SNPs on SSC14 at 101 Mb on MBL2 gene. Lee et al. [24] described a QTL on SSC14 at position 87.9 to 108.7 cM for androstenone in the Large White 6 Meishan crossbred population.
Selected polymorphisms in genes IRG6, MX1, IFIT2, FMO5, CYP7A1 and KRT18 were found to be associated with the phenotype androstenone level in this study (Table 9). An association study was performed for a SNP (g.494 A.G) in the FMO5 gene but no statistical relation could be detected with the off flavour score in the Berkshire x Yorkshire resource population [55]. Location of IFIT2 gene on SSC14 incorporated the QTL affecting androstenone in Yorkshire pig [56] and subjective pork flavour in Large White and Meishan pigs [24]. MX1 is an interesting candidate gene for disease resistance in farm animals [57] but this study first identifies association with boar taint compounds. No study investigated association of CYP7A1 with boar taint compounds. Some study reported association of this gene with plasma cholesterol in pigs [58]. Boar taint is related to the adipose tissues since lean pigs have low boar taint compounds [59]. The function of highly polymorphic KRT18 is relating to  Table 9). a Root mean square phred score. doi:10.1371/journal.pone.0063259.t007 pathological processes in liver but involvement in boar taint is not quite clear. However, this gene maps close to a region on SSC5 affecting androstenone in pigs [2].

Conclusion
Here we showed whole genome expression differences for varying androstenone levels in testis and liver tissues. RNA-Seq provided high resolution map of transcriptional activities and genetic polymorphisms in these tissues. However, due to incomplete porcine annotations, only around 50% of the total reads could be mapped to annotated references. The improvements in pig genome annotations may lead to better coverage and detailed understanding of genetic and functional variants such as novel transcripts, isoforms, sequence polymorphisms and noncoding RNAs. Integration of high throughput genomic and genetic data (eQTL) with proteomic and metabolomic data can provide additional new insight into common biological processes and interaction networks responsible for boar taint related traits.
On the basis of number of DEGs, our results confirm that transcriptome activity in testis is higher in comparison to liver tissue for androstenone biosynthesis. These results also show that the entire functional pathway involved in androstenone metabolism is not completely understood and through this study, we propose additional functional candidate genes such as, DKK2 and CYP2B22 in testis and IP6K1 and HSD17B2 in liver for androstenone metabolism. Importantly, most of the DEGs are in QTL positions functionally related to pathways involved in boar taint. Furthermore, various gene polymorphisms were also detected in testis and liver DEGs and associations are validated with androstenone levels. Potential polymorphisms and association were identified in DEGs such as IRG6, MX1, and IFIT2 in testis and CYP7A1, FMO5 and KRT18 in liver. This transcriptome and polymorphisms analysis using RNA deep sequencing combining with association analysis has revealed potential candidate genes affecting boar taint compound. It is speculated that these polymorphisms could be used as biomarkers for boar taint related traits. However, further validation is required to confirm the effect of these biomarkers in other animal populations.

Animals and Phenotypes
Tissue samples and phenotypes were collected from the Duroc 6 F 2 cross animals. F 2 was created by crossing F 1 animals (Leicoma 6 German Landrace) with Large White pig breed. Duroc 6 F 2 boars were on average 116 days old and had on average 90 kg live weight when slaughtered. All pigs were slaughtered in commercial abattoir, called Landesanstalt für Schweinezucht -LSZ Boxberg. Slaughterhouse management gave the necessary permissions for the tissue and organ collections. Animals were bred and growth, carcass and meat quality data were collected according to guidelines of the German performance test [60]. Tissue samples from testis and liver were frozen in liquid nitrogen immediately after slaughter and stored at 280uC until used for RNA extraction. Fat samples were collected from the neck and stored at 220uC until used for androstenone measurements. For the quantification of androstenone an in-house gas-chromatography/mass spectrometry (GC-MS) method was applied as described previously [61]. Pigs having a fat androstenone level less than 0.5 mg/g and greater than 1.0 mg/g were defined as low and high androstenone samples, respectively. Ten boars were selected from a pool of 100 pigs and the average androstenone value for these selected animals was 1.3660.45 mg/g. Notably, these 100 boars were used for association study (Table S3 and Table S4). RNA was isolated from testis and liver of 5 pigs with extreme high (2.4860.56 mg/g) and 5 pigs with extreme low levels of androstenone (0.2460.06 mg/g). Total RNA was extracted using RNeasy Mini Kit according to manufacturer's recommendations (Qiagen). Total RNA was treated using on-column RNase-Free DNase set (Promega) and quantified using spectrophotometer (NanoDrop, ND8000, Thermo Scientific). RNA quality was assessed using an Agilent 2100 Bioanalyser and RNA Nano 6000 Labchip kit (Agilent Technologies).

Library Construction and Sequencing
Full-length cDNA was obtained from 1 mg of RNA, with the SMART cDNA Library Construction Kit (Clontech, USA), according to the manufacturer's instructions. Libraries of amplified RNA for each sample were prepared following the Illumina mRNA-Seq protocol. The library preparations were sequenced on an Illumina HiSeq 2000 as single-reads to 100 bp using 1 lane per sample on the same flow-cell (first sequencing run) at GATC Biotech AG (Konstanz, Germany). All sequences were analysed using the CASAVA v1.7 (Illumina, USA). The deep sequencing data have been deposited in NCBI SRA database and are accessible through GEO series accession number GSE44171 (http://www.ncbi.nlm.nih.gov/geo/query/acc. cgi?acc = GSE44171).

Reference Sequences and Alignment
Two different reference sequence sets were generated from NCBI Sscrofa 9.2 assembly: (1) the reference sequence set generated for differential expression analysis comprised of RefSeq mRNA sequences (cDNA sequences) and candidate transcripts from NCBI UniGene database (Sscrofa). (2) For gene variation analysis a different reference sequence set, generated from whole genome sequence (chromosome assembly) was used. During Table 9. Genotype and association analysis of selected candidate genes with androstenone.

Differential Gene Expression Analysis
The differential gene expression analysis was designed to contrast the difference in the expression of genes between high and low androstenone samples. For differential gene expression analysis with raw count data a R package DESeq was used [33]. The normalization procedure in DESeq handles the differences in the number of reads in each sample. For this purpose, DESeq first generates a fictitious reference sample, with read counts defined as the geometric mean of all the samples. The read counts for each gene in each sample is divided by this geometric mean to obtain the normalized counts. To model the null distribution of the count data, DESeq follows an error model that uses the negative binomial distribution, with variance and mean linked by local regression. The method controls type-I error and provides good detection power [33]. After analysis using DESeq, DEGs were filtered based on p-adjusted value [63] 0.05 and fold change $1.5. Additionally, the gene expression data was also analyzed using a Generalized Linear Model (GLM) function implemented in DESeq to calculate both within and between group deviances. As sanity checking and filtration step, we cross matched the results from both analysis (p-adjusted #0.05 and fold change $1.5 criteria and GLM analysis) and only those genes which appeared to be significant in both the tests (p-value #0.05), were selected for further analysis. The results of GLM analysis are given in Table  S2.

Gene Variation Analysis
For gene variation analysis the mapping files generated by aligning the raw reads to reference sequence set (2) were used. All the downstream analysis was performed using Genome Analysis Toolkit (GATK) [64] and Picard Tools (http://picard.sourceforge. net/). The Genome Analysis Toolkit (GATK) was used for local realignment incorporating Sscrofa 9.2 converted SNPs which was described in the previous section. SNPs were furthermore classified as synonymous or non-synonymous using the GeneWise software (http://www.ebi.ac.uk/Tools/psa/genewise/last accessed 21.03.2013) by comparing between protein sequence and nucleotides incorporated SNP position [65]. Covariate counting and base quality score recalibration were done using the default parameters suggested by GATK toolkit. The re-aligned and recalibrated mapping files were grouped according to tissue and phenotype categories. Variant calling was performed for each group using GATK UnifiedGenotyper [64]. To find out the differentially expressed genes that also harboured sequence polymorphisms, we filtered the results from UnifiedGenotyper with chromosomal positions of DEGs and retained only those which mapped to DEG chromosomal positions. By this way, we were able to isolate a handful of mutations that mapped to DEGs from many thousands of identified potential sequence polymorphisms. Additionally to understand whether these identified polymorphisms segregate either in only one sample group (high androstenone or low androstenone group) or in both the groups (high and low androstenone group) we calculated the read/ coverage depth of these polymorphisms in all the samples. The results of this analysis are detailed in the results section and read coverage for individual samples are given in Table S2.

Pathways and Networks Analysis
A list of the DEGs was uploaded into the Ingenuity Pathway Analysis (IPA) software (Ingenuity Systems, www.ingenuity.com) to identify relationships between the genes of interest and to uncover common processes and pathways. The 'Functional Analysis' tool of the IPA software was used to identify the biological functions that were most significant to the data set [5].

Quantitative Real-Time PCR (qRT-PCR) Analysis
For qRT-PCR experiment, total RNA from testis and liver samples were isolated from the 10 boars used for deep sequencing. Additionally, RNA was isolated from the similar tissues of 10 independent boars with divergent androstenone level among the remaining 90 boars. cDNA were synthesised by reverse transcription PCR using 2 mg of total RNA, SuperScript II reverse transcriptase (Invitrogen) and oligo(dT)12 primer (Invitrogen). Gene specific primers for the qRT-PCR were designed by using the Primer3 software [66]. Detailed information for primers used in this study was given in Table 10. In each run, the 96-well microtiter plate contained each cDNA sample and no-template control. The qRT-PCR was conducted with the following program: 95uC for 3 min and 40 cycles 95uC for 15 s/60uC for 45 s on the StepOne Plus qPCR system (Applied Biosystem). For each PCR reaction 10 ml iTaqTM SYBRH Green Supermix with Rox PCR core reagents (Bio-Rad), 2 ml of cDNA (50 ng/ml) and an optimized amount of primers were mixed with ddH 2 O to a final reaction volume of 20 ml per well. All samples were analysed twice (technical replication) and the geometric mean of the Ct values were further used for mRNA expression profiling. The geometric mean of two housekeeping genes GAPDH and PPIA were used for normalization of the target genes. The delta Ct (DCt) values were calculated as the difference between target gene and geometric mean of the reference genes: (DCt = Ct target 2 Ct housekeeping genes ) as described in Silver et al. [67]. Final results were reported as fold change calculated from delta Ct-values. Details of primers which were used for qRT-PCR study are shown in Table 10.

Validation of SNP and Association Study
Seven SNPs were selected covering both the testis and liver samples for further validation and association study (Table 9). Genotyping in 100 boars were performed by PCR-RFLP method. In brief, a working solution with a final concentration of 50 ng/ml DNA was prepared and stored at 4uC for further analysis. Polymerase chain reactions (PCR) were performed in a 20 ml volume containing 2 ml of genomic DNA, 1 6 PCR buffer (with 1.5 mM MgCl 2 ), 0.25 mM of dNTP, 5 pM of each primer and 0.1 U of Taq DNA polymerase (GeneCraft). The PCR product was checked on 1.5% agarose gel (Fischer Scientific Ltd) and digested by using the restriction enzyme (Table 10). Digested PCR-RFLP products were resolved in 3% agarose gels. Details of GenBank accession numbers, primers sequences, annealing temperature and SNP position used in this study are listed in Table 10. Statistical analyses were performed using SAS 9.2 (SAS Table 10. Details of primers used for qRT-PCR analysis and genotyping. Institute Inc., Cary, USA). Effects of slaughter age, husbandry system (pen) as well as genotype on boar taint compound androstenone were assessed with fixed effect model (ANOVA) using PROC GLM. For all models, fixed effects included genotype and pen (group, individual), and age of slaughter was fitted as a covariate for boar taint compound androstenone. Due to the skewed nature of the androstenone, data were transformed with natural logarithm before ANOVA to achieve normality. Least square mean values for the loci genotypes were compared by t-test and p-values were adjusted by the Tukey-Kramer correction [68;69;70]. Figure S1 The smear plots for differential expression between high and low androstenone levels in testis and liver.