RNA-Seq Approach for Genetic Improvement of Meat Quality in Pig and Evolutionary Insight into the Substrate Specificity of Animal Carbonyl Reductases

Changes in meat quality traits are strongly associated with alterations in postmortem metabolism which depend on genetic variations, especially nonsynonymous single nucleotide variations (nsSNVs) having critical effects on protein structure and function. To selectively identify metabolism-related nsSNVs, next-generation transcriptome sequencing (RNA-Seq) was carried out using RNAs from porcine liver, which contains a diverse range of metabolic enzymes. The multiplex SNV genotyping analysis showed that various metabolism-related genes had different nsSNV alleles. Moreover, many nsSNVs were significantly associated with multiple meat quality traits. Particularly, ch7:g.22112616A>G SNV was identified to create a single amino acid change (Thr/Ala) at the 145th residue of H1.3-like protein, very close to the putative 147th threonine phosphorylation site, suggesting that the nsSNV may affect multiple meat quality traits by affecting the epigenetic regulation of postmortem metabolism-related gene expression. Besides, one nonsynonymous variation, probably generated by gene duplication, led to a stop signal in porcine testicular carbonyl reductase (PTCR), resulting in a C-terminal (E281-A288) deletion. Molecular docking and energy minimization calculations indicated that the binding affinity of wild-type PTCR to 5α-DHT, a C21-steroid, was superior to that of C-terminal-deleted PTCR or human carbonyl reductase, which was very consistent with experimental data, reported previously. Furthermore, P284 was identified as an important residue mediating the specific interaction between PTCR and 5α-DHT, and phylogenetic analysis showed that P284 is an evolutionarily conserved residue among animal carbonyl reductases, which suggests that the C-terminal tails of these reductases may have evolved under evolutionary pressure to increase the substrate specificity for C21-steroids and facilitate metabolic adaptation. Altogether, our RNA-Seq revealed that selective nsSNVs were associated with meat quality traits that could be useful for successful marker-assisted selection in pigs and also represents a useful resource to enhance understanding of protein folding, substrate specificity, and the evolution of enzymes such as carbonyl reductase.


Introduction
The pig, a major source of meat-based protein, is an economically important livestock animal. For the sake of improving genetic traits of economic importance, pigs have been bred using artificial selection. Among such traits, meat quality traits in particular have been included in genetic improvement programs because mechanisms controlling the traits are strongly associated with various internal intrinsic factors such as meat color, firmness, wetness, and intramuscular fat contents that are primarily determined by the postmortem metabolism, regulated by the altered functionality of native proteins during conversion of muscle to meat [1,2], which is in turn affected by the coordinated inheritance of genetic variations [3,4]. Accordingly, genetic variations have been an important resource for marker-assisted selection of genetic markers for pork meat quality traits.
Single nucleotide variations (SNVs) especially can affect meat quality development by altering or disrupting various metabolic pathways depending on where the variations are located in the genome, such as in regions encoding protein-coding or noncoding regions. For example, variations in non-coding regions often affect gene splicing, CpG methylation, transcription factor binding, or the sequence of noncoding RNA, leading to altered levels of gene expression [5,6,7,8,9]. Noticeably, variations in protein-coding regions give rise to nonsynonymous change in the amino acid sequence of the encoded protein and thus, are more likely to affect the protein structure and function [10,11], probably leading to the critical effects on a phenotype of interest such as meat quality trait [12]. In addition, it has been estimated that 20-30% of nsSNVs affect protein function [13,14]. Accordingly, nonsynonymous single nucleotide variation (nsSNV) has been a subject of recent interest in studies of the protein structure and function [10,11] and even in genetic improvement of a phenotype in human and livestock [15,16,17,18]. Therefore, the abundance of nsSNV data can facilitate the identification of protein folding changes through structural and phylogenetic comparisons and reveal evolutionarily conserved sites that may be critical for protein function. Furthermore, structural and functional information can provide important insights into associations between nsSNVs and meat quality traits.
To obtain massively unbiased information on nsSNVs, wholegenome resequencing is a useful method [19,20], but it is very expensive and time consuming. Recently, there have been several examples of next-generation transcriptome sequencing (RNA-Seq) for nsSNV discovery. The RNA-Seq technique was first conducted by Chepelev et al. to identify SNVs in expressed exons of the human genome [21] and, thereafter, was applied to the discovery of SNVs in the bovine milk transcriptome and subjected to further validation by genotyping analysis [22]. From both of these studies, RNA-Seq was also demonstrated to be a cost-effective and timesaving strategy for the systematic identification of nsSNVs in the expressed regions of the genome. Moreover, the strategy is capable of significantly increasing the efficiency of selectively identifying specific trait-related nsSNVs when RNA samples from a specific trait-related tissue or cell are used for RNA-Seq [21,22]. Thus, this may be a reasonable method for the selective discovery of porcine nsSNVs that affect meat quality-related metabolism, despite the lack of a complete genome sequence for the pig.
In this study, porcine RNA-Seq analysis was carried out to discover nsSNVs significantly associated with pork meat quality traits. For this, porcine liver RNA samples were chosen to selectively identify metabolism-related nsSNVs, since liver tissue produces diverse metabolic enzymes. Using this approach, several nsSNV candidates were identified, and, among these, a few candidates were validated by genotyping analysis. Furthermore, it was assessed whether or not they were significantly associated with meat quality traits. In addition, this approach, which integrates structural and phylogenetic analyses, was applied to porcine testicular carbonyl reductase (PTCR), which catalysis the NADPH-dependent reduction of aldehydes or ketones on a large number of carbonyl compounds, including steroid hormones [23].
Taken collectively, our approach integrating RNA-Seq and genotyping analyses revealed that selective nsSNVs are associated with meat quality traits and that they may be useful for successful marker-assisted selection in pigs. The RNA-Seq approach should also provide a useful resource for understanding protein folding, substrate specificity, and enzyme evolution, when combined with structural and phylogenetic approaches.

Identification of the porcine liver transcriptome by RNA-Seq
To discover nsSNVs related to postmortem metabolism, transcriptome sequencing (RNA-Seq) was performed using total RNA from liver tissues of four genetically different porcine breeds, Berkshire, Duroc, Landrace, and Yorkshire. The analysis included a total of 97,524,874 trimmed reads, and an average of 24,381,219 reads was obtained for each porcine sample. The trimmed reads were further assembled and mapped to UniGene, an annotated pig transcriptome assembly. According to the mapping results, among the total trimmed reads, 52,417,707 reads, ,54% as a total mapping percentage, were categorized as mapped reads, corresponding to exon reads. Next, RPKM (reads per kilobase per million mapped reads) values (Mortazavi et al. 2008) were used to identify the total number of genes expressed in the porcine liver. Using RPKM threshold values greater than 1, 11,667 expressed genes were detected in the porcine liver samples (21.77% of the 53,600 total annotated genes for pig in the UniGene transcriptome assembly).

Detection and characterization of nsSNV candidates in porcine liver transcriptome reads
Subsequently, the mapped reads were subjected to transcriptomic SNV discovery in four pig breeds, which was undertaken on the basis of exon reads showing nucleotide variation rates of greater than 30% among genes with a coverage of more than 30 reads. Using this strategy, 18,969 SNV candidates were identified in 4,248 genes, but location information in the gene sequence was available for only 3,623 of these candidates. This information was obtained using the NCBI database, with searches based on the start codons of these gene coding sequences (Table S1 and  Table 1); the UTRs contained 2,168 (59-UTR, 35; 39-UTR, 2,133) SNV candidates and the ORFs contained 1,455 SNV ones. Noticeably, of the 1,455 candidates in the ORFs, 580 ones were nonsynonymous (Table S2). When genes including the 580 nsSNV ones were further classified into various functional categories, many metabolism-or immune-related genes were found to contain nsSNV candidates (Fig. 1), a reasonable result for RNAs extracted from liver tissues.

Validation of nsSNV candidates detected by porcine liver RNA-Seq
To confirm the presence of different alleles in the nsSNV loci found by RNA-Seq, 45 of the nsSNV candidates for which genomic locations in the pig genome assembly (SGSC Sscrofa9.2/ susScr2) were known were selected and genotyped. First, a total of 437 pigs of the pure Berkshire line that had been bred under the same conditions, were selected randomly and slaughtered in 10 batches when their body weight reached 110 kg. Subsequently, genomic DNA was isolated from whole blood cells and subjected to genotyping analysis using the Illumina VeraCode GoldenGate system together with BeadStudio software (Illumina) which was used for genotype clustering and calling. As shown in Table 2, this genotyping revealed high call rates (.90%) and more than 0.01 minor allele frequencies (MAFs) at 27 SNV candidates. In addition, the genotype distributions for 33 SNV candidates in this sample population were in agreement with Hardy-Weinberg equilibrium (HWE) (P.0.05).

Discovery of nsSNVs associated with multiple pork meat quality traits
Among the nsSNVs validated above, 18 nsSNVs showing high call rates (.90%) and more than 0.01 MAFs, the genotype distributions of which were in HWE (p.0.05), were further subjected to association analysis with pork meat quality traits. For this, meat samples from the 437 pigs in Table 2 were used for meat quality evaluation. Meat quality traits such as backfat thickness, carcass weight, meat color, drip loss, cooking loss, shear force, water holding capacity, post-mortem pH, and chemical compositions (fat, protein, collagen, and moisture) were collected for statistical analysis, as described in the 'Materials and Methods' section. As shown in Table S3, 15 nsSNVs were significantly associated (p,0.01 or p,0.05) with meat quality traits under the codominant model. Especially, 10 nsSNVs showed significant associations with more than three kinds of traits (Table 3). In particular, ch7:g.22112616A.G SNV was associated significantly with multiple meat quality traits, such as backfat thickness, meat color (yellowness), drip loss, shear force, water holding capacity and postmortem pH 24 hr under the codominant model and was found in the gene encoding the histone H1.3-like protein, a component of chromatin. The porcine H1.3-like protein is highly homologous to human H1.3 (Acc. No. NP_005311.1) with a sequence identity of about 92.3%. Human H1.3 is reported to contain post-translational modification sites, such as phosphorylation, acetylation, methylation, and ubiquitination sites, which are responsible for the epigenetic regulation of gene expression [24,25]. The ch7:g.22112616A.G SNV creates a single amino acid change (Thr/Ala) at the 145 th residue of porcine H1.3-like. This residue is very close to the putative 147 th threonine phosphorylation site [26]. Thus, our data suggest that ch7:g.22112616A.G SNV affects multiple meat quality traits by affecting the epigenetic regulation of postmortem metabolismrelated gene expression. This hypothesis is further supported by previous studies showing that epigenetic transcriptional regulation by chromatin modification has broad effects on economic traits in pig [27,28,29]; for example, a C1354T SNV site in the KIAA1717 gene, encoding an H3-K4-specific methyltransferase, showed significant associations with meat quality traits [29]. Taken collectively, our data show that porcine liver RNA-Seq is a useful approach for the selective discovery of nsSNVs associated with meat quality traits, although subsequent integrative approaches are necessary to gain structural and evolutionary insights into the precise associations of nsSNVs with meat quality traits.

Identification of a duplicated gene nucleotide variation that result in premature stop codons
The nsSNVs may be either missense or nonsense, which results in a single amino acid replacement or a premature stop codon,   Table 2. List of nsSNVs validated in porcine liver genes using RNA-Seq and a genotyping assay. respectively. Particularly, the nonsense variation can alter the stability and function of proteins by leading to the truncation of an amino acid chain and thus can cause some genetic disorders including human diseases [30,31,32,33,34].
Noticeably, several genes were identified that included a stop codon caused by a nonsynonymous single nucleotide change. As shown in Table 4, the stop codons were discovered by RNA-Seq analysis in four genes encoding porcine testicular carbonyl  reductase 1 (PTCR), type III receptor tyrosine kinase, mannose receptor C type 1 and aldo-keto reductase family 1 C1-like, where they probably caused C-terminal truncations. We focused on PTCR and further validated the nonsense SNV locus in PTCR by performing both genotyping for the pig population and Sanger sequencing of its partial genomic DNA and full length cDNA, including the variation site. The genotyping showed only G type, no allelic variation, at the locus of PTCR gene in the population (Table 2). However, the sequencing of T-vector-cloned PCR fragments revealed that the gene has different base type (G/T) in an exonic region ( Fig. 2A) and, in particular, at the 951 st nucleotide of the PTCR transcript (Fig. 2B). The different base type (G/T) is likely to be generated by gene duplication during evolution, and thus, it is considered as the paralogous sequence variation, especially a duplicated gene nucleotide variation (DNV) [35]. The DNV base (T) is expected to create a nonsynonymous single base variation (GAG(Glu)RTAG(stop)) in the genomic and transcript regions of PTCR that would lead to a deletion of the Cterminal region from E281 to A288. To further confirm the Cterminal deletion, each of the PTCR genes with a different DNV (G/T) was cloned into a plasmid vector for the production of histagged (about 1 kD) fusion protein and expressed in E. coli BL21. After IPTG induction, SDS-PAGE revealed that the PTCR gene possessing the nonsynonymous single base variation (G.T) was expressed as a protein with a size (about 31 kDa) smaller than that (about 32 kD) of wild-type PTCR (Fig. 2C). This is very consistent with a result in previous report [36]; PTCR/20B-HSD proteins were identified as two bands, 31 kDa and 30 kDa proteins from porcine testis, through western blot analysis, but the reason for detection of a minor 30 kDa protein remains unclear [36]. Thus, our results strongly support that the minor 30 kDa protein, detected in porcine testis, is a C-terminal-deleted PTCR, expressed from the PTCR gene having a paralogous sequence variant (T), probably generated by gene duplication. Taken together, our data suggest that the nonsynonymous variation (G.T), generated by gene duplication, leads to a deletion of the C-terminal region (E281 to A288) of PTCR, strongly indicating that the pig testicular genome endogenously produces both wildtype and C-terminal-deleted PTCR, 31 kDa and 30 kDa proteins, respectively.

Molecular docking study of wild-type and C-terminaldeleted PTCRs and human carbonyl reductase
To understand the structural change caused by nonsense single nucleotide variation, we focused on PTCR for three reasons. First, PTCR is useful for the study of active sites that contribute to substrate specificity because of its capacity to bind a wide range of substrates, such as androgens, progestins, prostaglandins, and even a large number of xenobiotics [23]. Second, because its crystal structure has been resolved at high resolution [37], PTCR provides a fine reference structure from which simulations can be generated for subsequent molecular docking studies. Third, in the PTCR structure, the C-terminal fragment from E281 to A288 is reported to be in the vicinity of the active site [37]. This fragment is absent in human carbonyl reductase despite the high homology (about 85%) between PTCR and human carbonyl reductase [23,37], which suggests that the PTCR-unique C terminus might provide a clue to the differential substrate specificity between porcine and human carbonyl reductases [23].
Accordingly, molecular docking simulations using 5a-dihydrotestosterone (DHT), a C 21 -steroid, were used to correlate differences in substrate binding between wild-type pork PTCR, C-terminal (E281 to A288)-deleted PTCRs, and human carbonyl reductase with differences in their structures. For the molecular docking study, the 2.30 Å crystal structures of PTCR (PDB ID: 1N5D) and human carbonyl reductase (PDB ID: 1WMA) bound with NADPH were used. PTCR and human carbonyl reductase have both been reported to contain the Tyr-Lys-Ser catalytic triad (S139, Y193, and K197) at their active sites, which is involved in the transfer of hydrogen from NADPH to the substrate [37,38,39]. The best-docked conformations were selected according to fitness score and the closest distance between the substrate carbonyl group and the Y193 hydroxyl group that is proposed to be the proton donor for electrophilic attack reactions [39,40]. The refined conformations of wild-type and C-terminal-deleted PTCRs and human carbonyl reductase were obtained using energy minimization calculations (Fig. 3). In wild-type PTCR, the carbonyl group of 5a-DHT forms two hydrogen bonds with the Y193 hydroxyl group and the M234 sulfur atom. However, in the C-terminal-deleted PTCR, only one hydrogen bond between the carbonyl group and the Y193 hydroxyl group was observed, and human carbonyl reductase showed no hydrogen bond between these residues. In addition, P284, a C-terminal tail (E281-A288) residue, forms hydrogen bond interactions with the hydroxyl group of 5a-DHT in wild-type PTCR. Although both of the PTCR complexes (but not human carbonyl reductase) have similar hydrophobic interactions and identical hydrogen bond interactions between the carbonyl group of 5a-DHT and the Y193 hydroxyl group, stronger charged interactions with 5a-DHT were observed in the wild-type PTCR compared with the C-terminaldeleted PTCR (Table 5). These results suggest that the differences in the expression pattern between wild-type and C-terminaldeleted PTCRs are caused by interaction differences, especially with respect to hydrogen bond interactions between the backbone oxygen atom of P284 and the hydroxyl group of 5a-DHT. This comparative structural analysis of the three complexes revealed stronger hydrogen bond interactions in wild-type PTCR than in C-terminal-deleted PTCR and human carbonyl reductase. Furthermore, our simulation results are strongly supported by the previous report that the deletion of 12 C-terminal residues, including the fragment from E281 to A288, affects steroid metabolism [41]: kinetic comparisons between wild-type and Cterminal-deleted PTCRs revealed that the deletion led to the reduced binding affinity with various steroids, including 5adihydrotestosterone (DHT), testosterone and progesterone, resulting in the low enzyme efficiency of PTCR for steroids [41]. Based on these, it can be concluded that wild-type PTCR may have better substrate (5a-DHT) binding affinity than C-terminaldeleted PTCR and human carbonyl reductase.

Functional and evolutionary insights into the conservation of the C-terminal tail in animal carbonyl reductases
PTCR homologs are conserved in most animals, from fish to human (Fig. 4A). Particularly, PTCR is highly homologous to human carbonyl reductase, with a sequence identity of about 85% [23]. However, PTCR and human carbonyl reductase exhibit a large difference in their substrate specificities; among the compounds used to determine the substrate specificity of human carbonyl reductase, quinones and ketoaldehydes were preferentially reduced, whereas PTCR reduced menadione (one of the quinones) at a lower rate than that of human carbonyl reductase [23,42]. Steroid hormones such as 5a-DHT are reduced more readily by PTCR than by human carbonyl reductase, suggesting a greater role for PTCR in the reduction of C 21 -steroids than in the carbonyl reduction of a variety of carbonyl compounds [23]. In addition, the deletion of 12 C-terminal residues from PTCR led to the decreased binding affinity with C 21 -steroids, resulting in the low enzyme efficiency of PTCR for the steroids [41]. Moreover, these enzymatic properties of wild-type and C-terminal-deleted PTCRs are strongly supported by the molecular docking simulations which revealed that wild-type PTCR has a higher 5a-DHT-binding affinity than C-terminal-deleted PTCR, similar to human carbonyl reductase. Noticeably, PTCR contains 13 additional amino acid residues at its C terminus, compared with human carbonyl reductase [23], and a similar C-terminal tail was discovered in the carbonyl reductases of animals such as cattle, horse, and dog (Fig. 4B). Sequence alignment also showed that the P284 residue, mediating the hydrogen bond interaction between PTCR and 5a-DHT ( Fig. 3 and Table 5), is highly conserved in the C-terminal tails of carbonyl reductase homologs from cattle, horse, and dog (Fig. 4B).
Although further biochemical study will be necessary to confirm whether P284 is indeed a critical site for the 5a-DHT-binding affinity of PTCR, the present study indicates that this residue is evolutionarily conserved and important for the specific binding of PTCR to the steroid hormone 5a-DHT. Therefore, in response to evolutionary pressure to increase substrate specificity for C 21steroids, animal carbonyl reductases, including PTCR, may have Figure 2. Confirmation of the nonsense variation at the genomic, transcript, and protein levels for PTCR. The partial genomic DNA (A) and full length cDNA (B) of PTCR, including the nonsense variation locus, were obtained by PCR using specific primers as described in the 'Methods' section and subcloned into the pGEM T easy vector for Sanger sequencing. Boxes indicate different variants (G/ T) in an exonic region of genome (A) and at nucleotide 951 of the transcript (B). Expression of PTCR genes with different variants (G/T) was induced in E. coli BL21 by IPTG, and total extracts were loaded onto 12% and 20% SDS-polyacrylamide gels (C). Arrows indicate the His-tagged PTCR fusion proteins of different sizes, about 32 kD or 31 kD, which correspond to PTCR(WT) and PTCR(DCterm), respectively. doi:10.1371/journal.pone.0042198.g002 evolved their C-terminal tails by generating a paralogous sequence variant through gene duplication.
In conclusion, the objective of this study was to selectively discover nonsynonymous single nucleotide variations that are valuable genetic markers for the improvement of economic traits in pig and to highlight the potential use of such variations as a resource for gaining insights into the changes in protein structure that are associated with the gain or loss of protein functions during the curse of evolution. In the present study, porcine metabolismrelated genes were identified that included massive nonsynonymous single nucleotide variations. Further, these genes were validated to include different nsSNV alleles with significant associations with various meat quality traits. Moreover, the study shows that nonsynonymous changes may have critical effects on protein structural and functional changes during evolutionary adaptation. For example, a nonsense sequence variation in PTCR provided a structural and evolutionary explanation for the specificity of animal carbonyl reductases for an androgen, 5a-DHT. Taken collectively, our integrative approaches may provide useful resources for not only marker-assisted selection in the pig industry but also for protein structure databases to enhance understanding of protein folding, substrate specificity, and the evolution of enzymes.

Ethics statement
The Animal Care and Use Committee of GNTECH (Gyeongnam National University of Science and Technology) specifically waived the need for consent because no ethics committee approval  of study is required for the slaughter of farm animals in the Republic of Korea. However, pigs used in this study were slaughtered in accordance with the guidelines on animal care and use established by the Animal Care and Use Committee of GNTECH and with the Korea Animal Protection Act and related law. In detail, pigs weighing approximately 110 kg were transported to an abattoir near the experimental station. They were slaughtered by stunning with electrical tongs (300 volts for 3 s) after 12 h of feed restriction. The shocked pigs were exsanguinated while being hanged.

RNA-Seq library preparation
Total RNAs were isolated from liver tissues of the four pigs, consisting of Berkshire, Duroc, Landrace, and Yorkshire breeds, using TRI-Reagent (Molecular Research Center, Cincinnati, OH, USA) according to the manufacturer's instructions, and mRNA was isolated and purified using an RNA-Seq sample preparation kit (Illumina, Inc., San Diego, CA). The mRNA was fragmented and used as a template for first-and second-strand cDNA synthesis. After adapters were ligated to the ends of the double-stranded cDNA, a 300625 bp fragment size was selected by gel excision and each sample was individually sequenced on an Illumina GAII analyzer.

RNA-Seq analysis and detection of SNV candidates
Using a sliding window method, the reads, obtained by the above sequencing, are further trimmed according to their base qualities: with a specific base sliding window, if the average quality value for this window is higher than a threshold (T = 20), then the window sliding continues, whereas, if it is lower than the threshold, then this stretches of sequence are trimmed out from the original read. Thus, our RNA-Seq finally included a total of 97,524,874 trimmed reads. The trimmed reads were further assembled and mapped to the UniGene, an annotated pig transcriptome assembly (http://www.ncbi.nlm.nih.gov/UniGene/UGOrg.cgi?TAXID = 9823), by performing alignments using BWA software [43]. At the mapping procedure, we allowed up to two mismatches per 32 nucleotides of the trimmed reads. Of these, 52,417,707 reads (,54%) were categorized as mapped reads, corresponding to exon reads, and the read coverage in exon regions was obtained by BEDtools [44]. In addition, RPKM (reads per kilobase per million mapped reads) values were generated, according to a previous report [45], and were used to identify the total number of genes expressed in the porcine liver. Using RPKM threshold values greater than 1, 11,667 expressed genes were detected in porcine liver samples; 21.77% of the 53,600 total pig annotated genes in the UniGene transcriptome assembly. Finally, the mapped reads were subjected to exonic SNV discovery, which was achieved on the basis of exon reads showing nucleotide variation rates of .30% among genes with greater than 30 read coverage. Synonymous and nonsynonymous SNVs were identified on the basis of NCBI information regarding the start codon, and the functional classification of genes, containing nonsynonymous SNV candidates, was carried out through Gene Ontology and KEGG pathway analyses using DAVID web tool [46]. For running the tool, we used the 11,667 expressed genes as input data and the DAVID gene IDs, based on NCBI Accession IDs, in the porcine background supplied by DAVID. Among functional annotations of the expressed genes, those of the 229 genes, found to include the 580 nonsynonymous SNV candidates (Table 1), were further classified according to functions shown in Fig. 1.

SNV validation
A total of 437 pigs of pure Berkshire line, bred under the same conditions, were selected randomly and slaughtered in 10 batches when their body weight reached 110 kg. Subsequently, genomic DNAs were isolated from individuals' whole blood cells and subjected to SNV genotyping analysis. The Illumina VeraCode GoldenGate Assay kit (Illumina, San Diego, CA) was used to perform the genotyping of nsSNVs, according to the manufacturer's instructions. Primer information for the genotyping of 45 nsSNVs is summarized in Table S4. Genotype clustering and calling were performed using BeadStudio software (Illumina). Meanwhile, the genotyping of an nsSNV in PTCR was carried out through the sequencing of PCR products obtained with genomic DNAs used in the above multiplex genotyping. In detail, PCR products for the genomic region including the nsSNV were obtained using the following primers: 59-CCCAGGGTGGGT-GAGAACTGACAT-39 and 59-TTATGCATTGACCCAGGG-39 as forward and reverse primers, respectively. Subsequently, they were subjected to sequencing analysis using Applied Biosystems 3130xl DNA sequencer (Applied biosystems, Foster City, CA) to obtain genotype calls.

Association analysis with meat quality traits
Pork meat quality traits such as backfat thickness, carcass weight, meat color, drip loss, cooking loss, shear force, waterholding capacity, postmortem pH and chemical composition (fat, protein, collagen, and moisture) were evaluated and subjected to statistical analysis, as described previously [12,47].
To clarify the associations between genotype and meat quality traits for each of the 18 SNVs, statistical analysis was performed with SAS version 9.1.3 (SAS Institute Inc., Cary, NC). Only SNVs showing high call rates (.90%) and MAFs greater than 0.01 and whose genotype distributions were in HWE (p.0.05) were subjected to statistical analysis. To verify significant differences (p,0.01 and p,0.05) between the genotypic frequencies of traits, the Mann-Whitney and Student's t tests and the ANOVA and Kruskal-Wallis tests were used for the dominant and recessive models and the codominant model, respectively.
Validation of the nonsense variation locus of PTCR at the genomic, transcript and protein levels To validate the nonsense variation of PTCR, classical Sanger sequencing was performed on exonic and transcribed regions, including the nonsense variation, because of the unreliability of PTCR genomic information. PCR products for the exonic and transcript regions were obtained using the following primers: 59-ATCACAGAGGAGGAGCTG-39 and 59-TTATGCATT-GACCCAGGG-39 as forward and reverse primers, respectively, for the exonic region, and 59-ATGTCTTCCAACACTCGAG-39 and 59-TTATGCATTGACCCAGGG-39 as forward and reverse primers, respectively, for the transcript region. Subsequently, they were subcloned into a pGEM-T Easy vector (Promega) for sequencing.
To confirm that the C-terminal deletion resulted from the nonsense variation, each of the PTCR genes, having different variants (G/T), was expressed in E. coli BL21. For this, two types (G/T) of PTCR inserts were prepared by PCR and subsequent Bam HI and Hind III digestion and were subcloned into the pP RO EX HTb vector for the production of his-tagged fusion protein, generating two types of pP RO EX HTb-PTCR clones. E. coli BL21 transformed with each of the clones was subjected to IPTG induction and total extracts were loaded onto 12% and 20% SDS-polyacrylamide gels. Finally, the gels were observed by Coomassie brilliant blue staining, and after excision of bands from the gel and protein digestion, PTCR proteins were identified by MALDI-TOF mass spectrometry.

Molecular docking simulation and energy minimization calculation
Molecular docking simulations were carried out to investigate the binding mode of 5a-DHT into PTCR (PDB ID: 1N5D) and human carbonyl reductase (PDB ID: 1WMA) using the GOLD 4.1 program (Genetic Optimization for Ligand Docking) from Cambridge Crystallographic Data Center, UK [48]. The program utilizes a genetic algorithm for docking flexible ligands into protein binding sites [49]. The GOLD program uses a genetic algorithm (GA) in combination with scoring functions to predict binding conformations. The number of GA runs was set to 30 iterations on each flexible ligand and, to increase accuracy, early termination was not allowed. The GOLD fitness score was adopted to rank order for proper ligand conformations. The ligand binding site radius was taken as 20 Å around the center of the site between the C-terminal tail and NADPH. All other parameters were maintained by default values. Energy minimization (EM) calculations were performed to refine the docked structures using DS 2.5 (Accelrys Inc., San Diego, USA). The Smart Minimizer EM algorithm was used to relax the conformation and remove any steric overlap that would serve to produce bad contacts from initial structures. The algorithm parameters used were as follows: 10,000 max steps, 0.01 kcal/mol RMS Gradient, and distance-dependent dielectrics for the implicit solvent model.

Supporting Information
Table S1 Exonic SNPs obtained by pig liver RNA-Seq analysis. Y, L, B and D indicate Yorkshire, Landrace, Berkshire and Duroc, respectively. Syn and non-syn represent synonymous and nonsynonymous SNPs, respectively. The * indicates a stop codon. (XLS) Table S3 Association analysis of non-synonymous SNVs with pork meat quality traits. To verify significant differences (p,0.01 and p,0.05) of traits between genotypic frequencies, the Mann-Whitney and Student's t tests and the ANOVA and Kruskal-Wallis tests were used for the dominant and recessive models and the codominant model, respecively. (XLS)