
Whole-genome resequencing reveals collagen-related genes in Kele pigs

  • Yu Dan Zhang ,

    Contributed equally to this work with: Yu Dan Zhang, Wei Yuan

    Roles Data curation, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliations Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region, Ministry of Education, Guizhou University, Guiyang, Guizhou Province, China, Key Laboratory of Animal Genetics, Breeding and Reproduction, Guiyang, Guizhou Province, China, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China

  • Wei Yuan ,

    Contributed equally to this work with: Yu Dan Zhang, Wei Yuan

    Roles Data curation, Formal analysis, Investigation, Methodology, Software

    Affiliation College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China

  • Huan Bi,

    Roles Investigation, Methodology

    Affiliation Guizhou Agricultural Vocational College, Guiyang, Guizhou, China

  • Xiao Yang,

    Roles Formal analysis, Investigation

    Affiliations Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region, Ministry of Education, Guizhou University, Guiyang, Guizhou Province, China, Key Laboratory of Animal Genetics, Breeding and Reproduction, Guiyang, Guizhou Province, China, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China

  • Yi Yu Zhang,

    Roles Resources, Supervision

    Affiliations Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region, Ministry of Education, Guizhou University, Guiyang, Guizhou Province, China, Key Laboratory of Animal Genetics, Breeding and Reproduction, Guiyang, Guizhou Province, China, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China

  • Wei Chen

    Roles Conceptualization, Funding acquisition, Project administration, Resources, Supervision

    chenweigzu@163.com

    Affiliations Key Laboratory of Animal Genetics, Breeding and Reproduction in the Plateau Mountainous Region, Ministry of Education, Guizhou University, Guiyang, Guizhou Province, China, Key Laboratory of Animal Genetics, Breeding and Reproduction, Guiyang, Guizhou Province, China, College of Animal Science, Guizhou University, Guiyang, Guizhou Province, China

Abstract

Objective

To verify the accuracy of the collagen-related SNP loci of Kele pigs identified by whole-genome resequencing, and to identify collagen-related genes in Kele pigs, so as to lay a foundation for further molecular selection.

Methods

Based on whole-genome resequencing, candidate genes related to collagen traits in Kele pigs were screened and annotated. Through KEGG and GO enrichment analysis of the differential genes, we selected four genes that may affect collagen traits in Kele pigs, namely COL9A1, COL6A5, COL4A3 and COL4A4. Fourteen specific SNP sites were then randomly selected from the four genes for verification by Sanger sequencing, and finally RT-qPCR was used to measure the expression levels of these genes in different tissues of Kele pigs.

Results

Sequencing yielded 241.04 G of clean data, with a Q30 of 93.96% and an average coverage depth of 9.04×. SNP annotation of Kele pigs identified 4,570 high-impact mutation sites that could result in loss of protein function, with SNPs primarily distributed in intronic and exonic regions. There were also 132,256 medium-impact mutation sites that could affect protein properties and 318,150 low-impact mutation sites. INDEL annotation revealed 17,806 high-impact mutation sites that could result in loss of protein function, 4,740 medium-impact mutation sites that could affect protein properties, 19,298 low-impact mutation sites and 14,197,763 modifier-impact mutation sites. Real-time fluorescence quantitative PCR showed that the expression levels of the collagen-related genes COL9A1 and COL6A5 were higher in skin than in other tissues, and that the expression levels of COL4A4 and COL4A3 were higher in kidney than in other tissues. SNP verification showed that the 14 randomly selected SNP sites were identical to those screened by whole-genome resequencing.

Conclusion

A total of 307 genes related to collagen traits were identified, including COL9A1, COL6A5, EP300, SOS2 and EPO. COL9A1 and COL6A5 were significantly expressed in the skin tissue of Kele pigs, and COL4A4 and COL4A3 were significantly expressed in the kidney tissue. The mutations at the 14 randomly selected loci in the four genes were consistent with the results of the whole-genome resequencing analysis, indicating that the specific SNP marker information obtained by whole-genome resequencing can serve as a basis for analyzing collagen traits in Kele pigs. Our results support further research on the regulation of collagen traits in Kele pigs and the future development and utilization of the breed.

Introduction

Collagen is the predominant protein in animals [1], accounting for 30% of total body protein [2]. It contains 18 types of amino acids, including 7 essential amino acids, making it highly nutritious. As a component of the connective tissue in muscle, collagen is widely distributed among muscle fibers and around muscle bundles, forming a fine fibrillar network [3]. Studies have indicated a negative correlation between meat shear force and the content and solubility of muscle soluble collagen: a higher soluble collagen content and greater solubility are associated with lower shear force values, indicating greater tenderness. Meat tenderness can therefore be evaluated by measuring the collagen content of muscle [4], and collagen is one of the most important factors affecting meat quality [5]. According to Li Qiannan [6], several signal transduction pathways are associated with collagen, including TGF-β/Smad, PI3K/Akt, MAPK, Wnt, NF-κB, integrin and JAK/STAT. Ricard-Blum [7] described 26 further types of collagen, including collagen VIII, in addition to collagens I, II, III, IV, V, VI and VII. Among the collagen genes, COL9A1 encodes the α1 chain of type IX collagen, the main component of type IX collagen [8, 9]. COL6A5 encodes a component of type VI collagen [10]. Type VI collagen (COL6) is found in most connective tissues and consists of the α1(VI), α2(VI), α3(VI), α4(VI), α5(VI) and α6(VI) chains, which are encoded by different genes (COL6A1, COL6A2, COL6A3, COL6A4, COL6A5 and COL6A6) [11]. COL4A3 and COL4A4 encode the α3 and α4 chains of type IV collagen, respectively [12]. The Kele pig is considered one of the top pig breeds in China.
It originates from Hezhang, Guizhou, and is found in northwestern Guizhou as well as in Xuanwei and Qujing in Yunnan. The breed is recognized as a valuable local genetic resource in Guizhou and is documented in 《Chinese Livestock and Poultry Genetic Resources · Pig Annals》 [13–15]. As a characteristic local breed of Guizhou, the Kele pig is known for its fine meat quality and excellent flavor [16]. It is therefore important to investigate the genes that regulate collagen traits in Kele pigs. Based on whole-genome resequencing, this study screened differential genes involved in collagen regulation in Kele pigs. The study also examined the expression levels of four genes (COL9A1, COL6A5, COL4A4 and COL4A3) in seven tissues of Kele pigs: skin, longissimus dorsi muscle, heart, liver, spleen, lung and kidney. This research lays a foundation for further exploration of the regulation of collagen traits in Kele pigs.

Experimental samples and methods

Experimental samples

Skin, longissimus dorsi muscle, heart, liver, spleen, lung, kidney and other tissues of Kele pigs were collected at the Kele pig breeding farm in Hezhang County, Bijie City, Guizhou Province. The tissue samples were first treated with DEPC water and then frozen in liquid nitrogen in sterile, enzyme-free cryogenic tubes. They were subsequently transferred to an ultra-low temperature freezer at -80°C until DNA and RNA extraction.

Experimental methods

Extraction of genomic DNA and construction of library

Genomic DNA was extracted by the phenol-chloroform method. Ear tissue samples were collected, cut into pieces and added to a centrifuge tube containing lysis buffer. The DNA was purified on an agarose column to remove impurities such as protein and RNA, extracted with phenol/chloroform and precipitated with ethanol. Remaining impurities were removed with a wash buffer, and the genomic DNA was dissolved in TE buffer. DNA concentration and A260/280 values were measured, and TE buffer was added as necessary to adjust the concentration to 50–300 ng/μL. Agarose gel electrophoresis was then used to assess the quality of the DNA samples. For library construction, the genomic DNA was fragmented by sonication, followed by chemical end repair and ligation to Illumina sequencing primers. The resulting DNA fragments were amplified by PCR and screened by gel electrophoresis, and only fragments of the target length were retained and purified. The purified DNA fragments were then inserted into the vector, and the library was transformed into E. coli by electrotransformation. After screening and amplification, the library was complete. The resulting genomic library can be used for high-throughput sequencing to obtain whole-genome sequence information.

Variation detection and filtering based on whole genome resequencing

Wuhan Gene Read Technology Co., Ltd. was commissioned to perform whole-genome resequencing of the Kele pigs. The sequencing platform was the Illumina HiSeq 2500, the sequencing strategy was PE125, and the target depth was 10×. The raw sequencing data were filtered with fastp v0.19.10 to remove adaptors and low-quality sequences, and the filtered reads were aligned to the porcine reference genome Sscrofa11.1, downloaded from the NCBI database, using bwa v0.7.17-r1188. The alignments were sorted with samtools v1.16.1. After sorting, Picard MarkDuplicates v2.18.29-SNAPSHOT was used to remove duplicate reads, yielding a BAM file for each individual for variant detection. The HaplotypeCaller tool in GATK v4.0 (https://gatk.broadinstitute.org/hc/en-us) was then run on the BAM file of each Kele pig. The gVCF files of the individual pigs were merged into a single population gVCF file containing the variation across the entire population. SNPs and INDELs were filtered with the VariantFiltration tool in GATK 4.0. The quality control filter parameters for SNPs and INDELs were QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0.
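The hard-filter expression above flags a variant when any one of its six conditions is true. The following is a minimal Python sketch of that logic, not the GATK implementation: the function name, the example annotation values and the decision to skip missing annotations are all assumptions made for illustration.

```python
# Thresholds copied from the filter expression quoted in the text.
# A variant is flagged when ANY condition holds (the "||" in the expression).
HARD_FILTERS = {
    "QD": ("<", 2.0),
    "MQ": ("<", 40.0),
    "FS": (">", 60.0),
    "SOR": (">", 3.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def failed_filters(annotations):
    """Return the names of the annotations that trip a threshold.

    `annotations` maps annotation names to floats. Missing annotations are
    skipped, which is an assumption of this sketch rather than a statement
    about how GATK behaves.
    """
    failed = []
    for name, (op, cutoff) in HARD_FILTERS.items():
        value = annotations.get(name)
        if value is None:
            continue
        if (op == "<" and value < cutoff) or (op == ">" and value > cutoff):
            failed.append(name)
    return failed


# Hypothetical annotation values for two sites.
good_site = {"QD": 25.1, "MQ": 60.0, "FS": 1.2, "SOR": 0.8,
             "MQRankSum": 0.1, "ReadPosRankSum": 0.3}
bad_site = {"QD": 1.5, "MQ": 60.0, "FS": 75.0, "SOR": 0.8}

print(failed_filters(good_site))  # empty list: the site is kept
print(failed_filters(bad_site))   # names of the tripped thresholds
```

A site with an empty result would be retained; any non-empty result means the site would be flagged by the expression.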

Assessment of effects of SNPs and INDELs along with extraction of their associated genes

The variants were annotated with SnpEff v5.0e, and all SNPs were assigned to exons, introns, gene upstream regions, splice sites and other genomic regions according to their positions. SNPs in exon regions were further classified as synonymous or non-synonymous variants. The parameters were "download -c ./snpEff.config -v ebola_zaire Sscrofa11.1.99" and "-c ./snpEff.config -ud 2000 -csvStats -htmlStats". INDELs were annotated in the same way as SNPs and were divided into insertions and deletions according to their location in the genome.
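SnpEff writes its annotations into the ANN field of the VCF INFO column, with the impact class (HIGH, MODERATE, LOW or MODIFIER) as the third pipe-separated sub-field. The sketch below shows one way the impact classes could be tallied; the helper name and the truncated toy records are hypothetical and do not come from the study's data.

```python
from collections import Counter


def impacts_from_info(info):
    """Yield the impact class of every annotation in a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # Several annotations for one variant are comma-separated.
            for annotation in entry[len("ANN="):].split(","):
                fields = annotation.split("|")
                if len(fields) > 2:
                    yield fields[2]  # Allele | Annotation | Impact | ...


# Truncated, made-up INFO strings used only to exercise the parser.
info_lines = [
    "DP=9;ANN=T|missense_variant|MODERATE|GENE_A",
    "DP=11;ANN=A|stop_gained|HIGH|GENE_B,A|intron_variant|MODIFIER|GENE_C",
    "DP=8;ANN=C|synonymous_variant|LOW|GENE_A",
]

impact_counts = Counter(
    impact for info in info_lines for impact in impacts_from_info(info)
)
print(impact_counts)
```

Because one variant can carry several annotations, counts produced this way are counts of annotations and can exceed the number of variant sites.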

GO and KEGG analyses of genes associated with SNPs and INDELs

KEGG analysis was performed with the R package clusterProfiler v4.0, which queries the API of the KEGG database (https://www.kegg.jp/) directly, and the results were visualized with the package's built-in dotplot function. GO analysis was performed with the R package org.Ss.eg.db v3.16.0 (http://bioconductor.org/packages/release/BiocViews.html#___OrgDb) and likewise visualized with the dotplot function of clusterProfiler. Final visualization was carried out with the OmicShare online cloud tools (https://www.omicshare.com/tools/Home/Soft/cog).
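Over-representation analyses of this kind rest on a hypergeometric test: given a background of annotated genes, how surprising is it that a given number of candidate genes fall in one pathway? The sketch below computes that tail probability in plain Python. It is an illustration of the test only, not the clusterProfiler code, and the function name and gene counts are invented for the example.

```python
from math import comb


def enrichment_p_value(background, in_pathway, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: annotated genes in the universe
    in_pathway: background genes that belong to the pathway
    selected:   candidate genes being tested
    overlap:    candidate genes that fall in the pathway
    """
    total = comb(background, selected)
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, k) * comb(background - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 20 background genes, 5 in the pathway,
# 4 candidates selected, 2 of them in the pathway.
p_toy = enrichment_p_value(background=20, in_pathway=5, selected=4, overlap=2)
print(f"p = {p_toy:.4f}")
```

In practice the p-values for all tested pathways would also be adjusted for multiple testing before any pathway is called enriched.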

PCR based validation of 14 SNPs and their associated genes

To verify the Kele pig-specific SNP sites, 14 sites were randomly selected from the previously screened specific SNP sites (Table 1). Genomic DNA was extracted from ear tissue samples of 50 Kele pigs with a genomic DNA extraction kit (Thermo GeneJET). Specific primers were designed with Primer Premier 5.0 from the 100 bp of sequence on either side of each SNP site (Table 2). The primers were synthesized by Qingke Biotechnology Co., Ltd. The 50 μL PCR amplification system contained 2 μL of DNA template, 2 μL each of upstream and downstream primers, 25 μL of 2×EasyTaq PCR SuperMix and 19 μL of ddH2O. The PCR program was pre-denaturation at 94°C for 2 min; 33 cycles of denaturation at 94°C for 30 s, annealing for 30 s (annealing temperatures are shown in Table 2) and extension at 72°C for 30 s; a final extension at 72°C for 2 min; and storage at 4°C. Products were checked by 1.0% agarose gel electrophoresis, and correctly identified PCR products were sent to Qingke Biotechnology Co., Ltd. for Sanger sequencing.
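Primer design starts from the sequence 100 bp on either side of each SNP. A minimal sketch of that extraction step follows, assuming a 1-based SNP coordinate on an in-memory sequence string; the function name and the toy sequence are hypothetical, and the actual primer design was done in Primer Premier 5.0.

```python
def flanking_window(sequence, snp_pos, flank=100):
    """Return (left, allele, right) around a 1-based SNP position.

    The flanks are clipped where the SNP lies close to either end
    of the sequence.
    """
    if not 1 <= snp_pos <= len(sequence):
        raise ValueError("snp_pos is outside the sequence")
    idx = snp_pos - 1  # convert to a 0-based index
    left = sequence[max(0, idx - flank):idx]
    right = sequence[idx + 1:idx + 1 + flank]
    return left, sequence[idx], right


# Toy 400 bp sequence standing in for a reference region.
toy_sequence = "ACGT" * 100
left, allele, right = flanking_window(toy_sequence, 201)
print(len(left), allele, len(right))
```

The three returned pieces can be joined into a single template for a primer design tool, with the SNP base marked so that primers are not placed on it.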

RT-fluorescent qPCR based validation of role of SNP associated genes in collagen expression

RNA extraction and cDNA synthesis.

RNA was extracted by the Trizol method. Skin, longissimus dorsi muscle, heart, liver, spleen, lung and kidney samples were collected, fully ground and added to a centrifuge tube containing Trizol. Chloroform was added to remove protein, the mixture was centrifuged, and the upper aqueous phase was transferred to a new centrifuge tube. Isopropyl alcohol was added to precipitate the RNA, the mixture was centrifuged, and the supernatant was discarded. The RNA was then washed with 75% ethanol, dried and dissolved in DEPC water. RNA purity was evaluated from its concentration and A260/A280 values. The extracted RNA was reverse-transcribed into cDNA and stored at -80°C for future use.

Primer design and synthesis.

Based on the sequences of the COL9A1 gene (NC_010443.5), COL6A5 gene (NC_010455.5), COL4A4 gene (NC_010457.5) and COL4A3 gene (NC_010457.5) in the GenBank database, we used Primer Premier 5.0 to design one pair of real-time fluorescence quantitative PCR primers for each gene. The primers were synthesized by Qingke Biotechnology Co., Ltd., and their sequences are shown in Table 3.

Real-time fluorescence quantitative PCR detection.

Using the cDNA from each tissue as a template, real-time fluorescence quantitative PCR was performed on an ABI QuantStudio™ 7 Flex system. The 10 μL reaction system consisted of 0.2 μL of 10 μmol/L forward and reverse primers, 1 μL of cDNA and 5 μL of 2×Es Taq MasterMix (Dye), with water added to a total volume of 10 μL. The reaction conditions were an initial denaturation at 95°C for 5 min, followed by 45 cycles of denaturation at 95°C for 10 s, annealing at 56°C for 30 s and extension at 72°C for 30 s.
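The text does not state how relative expression was calculated from the Ct values, nor which reference gene was used. One widely used option is the 2^-ΔΔCt method, and the sketch below assumes it purely for illustration; the function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change by the 2^-ddCt method (assumed here, not stated in the text).

    ct_target / ct_reference: Ct of the gene of interest and of the
        reference gene in the tissue being examined.
    ct_*_calibrator: the same two Ct values in the calibrator tissue.
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Made-up Ct values: the target amplifies 3 cycles earlier (relative to the
# reference gene) in the examined tissue than in the calibrator tissue.
fold_change = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold_change)
```

The method assumes roughly equal and near-100% amplification efficiency for the target and reference primers, which would need to be checked before the formula is applied.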

Results

DNA quality testing of samples of Kele pig

Fig 1 shows that the extracted genomic DNA of the Kele pigs produced single, clear bands at high concentration, and the DNA from all 5 samples met the requirements for library construction and sequencing. Table 4 shows that the OD260/OD280 values of the extracted genomic DNA ranged between 1.8 and 2.0, indicating good-quality DNA samples.

Fig 1. DNA quality test results of Kele pigs.

https://doi.org/10.1371/journal.pone.0311417.g001

Table 4. Results of DNA OD value of Kele pig.

https://doi.org/10.1371/journal.pone.0311417.t004

Sequencing data quality analysis

After rigorous filtering of the sequencing data, high-quality clean data were obtained. The output was summarized statistically (Table 5), including data yield, sequencing error rate, Q20 content, Q30 content, GC content and other relevant parameters. The five samples generated a total of 241.8 G of raw data, an average of 48.36 G per sample. After filtering, the clean data totaled 241.04 G, an average of 48.2 G per sample. The Q30 of every filtered sample was above 90%, which meets the quality control standard, so the sequencing data could be used for the subsequent analysis.
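The per-sample averages follow directly from the reported totals. The short check below reproduces them and also derives the fraction of raw data retained after filtering, a figure that is not given in the text and is computed here only from the two reported totals.

```python
# Totals reported in the text, in G of sequence data.
raw_total_g = 241.8
clean_total_g = 241.04
n_samples = 5

raw_mean_g = raw_total_g / n_samples      # average raw data per sample
clean_mean_g = clean_total_g / n_samples  # average clean data per sample
retained_fraction = clean_total_g / raw_total_g

print(f"raw mean:   {raw_mean_g:.2f} G per sample")
print(f"clean mean: {clean_mean_g:.2f} G per sample")
print(f"retained:   {retained_fraction:.2%} of raw data")
```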

Comparison with the reference genome

The filtered clean data were aligned to the pig reference genome (Sscrofa11.1). The mapping rate of each individual to the reference genome ranged from 93% to 98%, and the average sequencing depth of the Kele pig samples was 9.04×. The average mapping rate of the five individuals exceeded 90%, which satisfies the criteria for resequencing analysis. The data obtained from the resequencing therefore meet the requirements for subsequent analysis.

Detection results of SNPs and INDELs in Kele pigs

According to Table 6, a total of 50,040,8.74 million SNP loci and 12,037,063 SNPs in gene regions were obtained from samples Kel1, Kel2, Kel3, Kel4 and Kel5. The majority of the SNPs were distributed in intronic and exonic regions. In addition, 14,216,261 INDEL loci and 3,375,354 gene-region INDEL loci were identified. A total of 7146.2 million SNPs and 99,992 INDELs were identified within exon regions.

Table 6. Detection results of SNPs and INDEL in Kele pig samples.

https://doi.org/10.1371/journal.pone.0311417.t006

SNPs annotation results of Kele pigs

The SNPs in the annotation results were tallied by degree of impact. The analysis revealed 4,570 high-impact mutation sites that could result in loss of protein function and 132,256 medium-impact mutation sites that could affect protein properties in Kele pigs. In addition, 318,150 low-impact mutation sites were identified. Detailed results are given in Table 7.

Table 7. SNP annotation results of Kele pigs classified by degree of impact.

https://doi.org/10.1371/journal.pone.0311417.t007

The SNPs in the annotation results were also summarized by functional class. A total of 24,971,814 SNPs were annotated, of which 246,426 were synonymous mutations, accounting for 64.67% of the functional mutations. Non-synonymous mutations totaled 134,584, representing 35.33% of the functional mutations. Among the non-synonymous mutations, 132,872 were missense mutations that change the encoded amino acid (34.87% of the functional mutations), and 1,712 were nonsense mutations that introduce a stop codon and prematurely terminate peptide synthesis (0.44% of the functional mutations). Detailed results are given in Table 8.
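The percentages above are shares of the coding-region functional classes (synonymous plus non-synonymous), not of all annotated SNPs. The sketch below recomputes them from the reported counts. Recomputed values agree with the reported ones to within about 0.01 percentage point, and the small differences in the last digit presumably reflect rounding in the original table.

```python
# Counts reported in the text.
synonymous = 246_426
missense = 132_872
nonsense = 1_712

non_synonymous = missense + nonsense
functional_total = synonymous + non_synonymous


def share(count):
    """Percentage of the functional (coding) mutations."""
    return 100 * count / functional_total


functional_shares = {
    "synonymous": share(synonymous),
    "non-synonymous": share(non_synonymous),
    "missense": share(missense),
    "nonsense": share(nonsense),
}

for name, value in functional_shares.items():
    print(f"{name:15s} {value:6.2f}%")
```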

Table 8. SNPs annotation results classified by mutation function in Kele pigs.

https://doi.org/10.1371/journal.pone.0311417.t008

As illustrated in Fig 2, Kele pigs carried a total of 61,012,906 transitions (substitutions between nucleotides of the same class) and 25,399,662 transversions (substitutions between nucleotides of different classes). The Ts/Tv ratio was 2.4021, indicating that transitions were considerably more frequent than transversions.
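A substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise. The sketch below encodes that rule and recomputes the Ts/Tv ratio from the two counts reported above; the function name is an illustrative choice.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(ref, alt):
    """Return "Ts" for a transition and "Tv" for a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= PURINES | PYRIMIDINES:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "Ts" if same_class else "Tv"


# Counts reported in the text.
transitions = 61_012_906
transversions = 25_399_662
ts_tv_ratio = transitions / transversions

print(substitution_class("G", "A"), substitution_class("G", "C"))
print(f"Ts/Tv = {ts_tv_ratio:.4f}")
```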

Fig 2. Base changes of SNPs variation in Kele pigs.

https://doi.org/10.1371/journal.pone.0311417.g002

INDELs annotation results of Kele pigs

The INDELs in the annotation results were tallied by degree of impact. There were 17,806 high-impact mutation sites that could result in loss of protein function and 4,740 medium-impact mutation sites that could affect protein properties in Kele pigs. In addition, there were 19,298 low-impact mutation sites and 14,197,763 modifier-impact mutation sites. Detailed results are given in Table 9.

Table 9. Results of INDELs annotations classified by impact degree in Kele pigs.

https://doi.org/10.1371/journal.pone.0311417.t009

The insertions and deletions in the INDEL annotation results were also counted. A total of 7,041,172 insertion and deletion markers were identified in Kele pigs, comprising 4,031,333 insertions and 3,009,839 deletions. Detailed results are given in Table 10.
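As a quick consistency check, the insertion and deletion counts can be summed and expressed as shares of the total. The shares are not reported in the text and are derived here only from the counts above.

```python
# Counts reported in the text.
insertions = 4_031_333
deletions = 3_009_839

indel_total = insertions + deletions
insertion_share = insertions / indel_total
deletion_share = deletions / indel_total

print(f"total INDEL markers: {indel_total:,}")
print(f"insertions: {insertion_share:.2%}, deletions: {deletion_share:.2%}")
```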

Table 10. Results of INDELs annotation by mutation type.

https://doi.org/10.1371/journal.pone.0311417.t010

Gene enrichment analysis of the selected region of the Kele pig population

KEGG enrichment analysis was conducted on the genes within the selected regions. It revealed six enriched collagen-related pathways: the PI3K-Akt, Jak-STAT, MAPK, Wnt, NF-κB and transforming growth factor-β (TGF-β) signaling pathways. A total of 475 candidate genes with pathway annotations were identified (Table 11). GO enrichment analysis of the genes within the selected regions yielded a total of 30 collagen-related terms. The GO and KEGG enrichment results are presented in Figs 3 and 4, respectively. Notably, the collagen-related genes COL9A1 (NC_010443.5), COL6A5 (NC_010455.5), COL4A4 (NC_010457.5) and COL4A3 (NC_010457.5) were identified in both the GO and KEGG enrichment analyses. These findings suggest that these genes are potential candidate collagen-encoding genes for further study.

Fig 3. Gene GO enrichment in selected regions of Kele pig population.

https://doi.org/10.1371/journal.pone.0311417.g003

Fig 4. KEGG gene enrichment in selected regions of Kele pig population.

https://doi.org/10.1371/journal.pone.0311417.g004

Table 11. Part of KEGG enrichment pathways and genes regulating collagen.

https://doi.org/10.1371/journal.pone.0311417.t011

SNP sites validation

Fourteen sites were randomly selected for validation: g.50509272G>C, g.50509314G>T, g.50509226C>T, g.2117893T>C, g.2117929T>G, g.2118081C>T, g.128531420C>G, g.128531424C>T, g.128531440G>A, g.128531456T>G, g.84423643G>A, g.84423793C>T, g.84423810G>A and g.84423811C>A. All 14 SNP sites were identical to the SNP sites screened by whole-genome resequencing. The verified SNPs were found only in a specific population, as expected (Figs 5–8).
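Checking Sanger results against the resequencing calls amounts to comparing position, reference base and alternate base for each site. The sketch below parses site labels of the kind listed above and reports which resequencing calls are confirmed. The parser tolerates stray spaces and mixed case, which is a design choice of this sketch; the function names are hypothetical and only two of the 14 sites are used as an example.

```python
import re

# "g." + position (spaces allowed) + reference base + ">" + alternate base.
SNP_PATTERN = re.compile(
    r"^g\.\s*([\d\s]+?)\s*([ACGT])\s*>\s*([ACGT])$", re.IGNORECASE
)


def parse_snp(label):
    """Turn a label such as "g.50509272G>C" into (position, ref, alt)."""
    match = SNP_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized SNP label: {label!r}")
    position = int(re.sub(r"\s+", "", match.group(1)))
    return position, match.group(2).upper(), match.group(3).upper()


def confirmed_sites(wgs_labels, sanger_labels):
    """Return the resequencing calls that Sanger sequencing reproduces."""
    sanger = {parse_snp(label) for label in sanger_labels}
    return [label for label in wgs_labels if parse_snp(label) in sanger]


wgs_calls = ["g.50509272G>C", "g.2117893T>C"]
sanger_calls = ["G. 50509272G > C", "G. 2117893T > C"]
confirmed = confirmed_sites(wgs_calls, sanger_calls)
print(f"{len(confirmed)} of {len(wgs_calls)} sites confirmed")
```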

Fig 5. Peak map of SNPs locus with variation in Sanger sequencing of COL9A1 gene.

https://doi.org/10.1371/journal.pone.0311417.g005

Fig 6. Peak map of SNPs locus with variation in Sanger sequencing of COL6A5 gene.

https://doi.org/10.1371/journal.pone.0311417.g006

Fig 7. Peak map of SNPs locus with variation in Sanger sequencing of COL4A4 gene.

https://doi.org/10.1371/journal.pone.0311417.g007

Fig 8. Peak map of SNPs locus with variation in Sanger sequencing of COL4A3 gene.

https://doi.org/10.1371/journal.pone.0311417.g008

The expression of related genes in different tissues

The expression levels of COL9A1, COL6A5, COL4A4, and COL4A3 in skin, muscle, heart, liver, spleen, lung, and kidney tissues of Kele pigs were investigated using RT-qPCR. The results indicated that the expression levels of COL4A4 and COL4A3 in Kele pig kidney tissues were higher than those in other tissues. Additionally, the expression of genes COL9A1 and COL6A5 in the skin tissue of Kele pigs was also found to be higher than that in other tissues (Fig 9).

Fig 9. Expression levels of COL9A1, COL6A5, COL4A4 and COL4A3 genes in different tissues of Kele pigs.

"*" indicates a significant difference (p<0.05); "**" indicates an extremely significant difference (p<0.01).

https://doi.org/10.1371/journal.pone.0311417.g009

Discussion

Based on the sequencing data and the reference genome, a total of 50,040,8.74 million SNP loci and 14,216,261 INDEL loci were identified. In this study, the variants were predominantly distributed in intergenic and intronic regions. Intergenic variation may affect gene expression, post-transcriptional RNA modification, transcription factor binding and splicing. Variation within introns may influence splicing events, thereby altering the amino acid sequence and ultimately the structure and function of the encoded protein. Ryu et al. [17] identified stroke-related SNPs in the intergenic region between FOXF2 and FOXQ1 in zebrafish. These SNPs were found to regulate enhancer activity and the expression of the vascular stability regulator FOXF2, thereby influencing stroke risk in both human cells and zebrafish. High-impact mutations can lead to loss of protein function, whereas medium-impact mutations may alter protein performance. Low-impact mutations are unlikely to affect protein function, and modifier variants have no direct effect on gene or protein function. The number of high-impact INDEL mutation sites in Kele pigs was 17,806; such variants can be highly disruptive to gene or protein function. Among the functional classes of SNPs, non-synonymous mutations can alter the amino acid sequence and thereby the structure and function of the protein. In this study, a total of 134,584 non-synonymous mutations were identified in Kele pigs. In addition, base substitutions, in which a single base in a DNA sequence is replaced by another, can also result in significant changes in gene function. If both bases are purines or both are pyrimidines, the substitution is a transition (Ts); if one is a purine and the other a pyrimidine, it is a transversion (Tv).
This study identified 25,399,662 transversion sites in Kele pigs. Liu et al. [18] demonstrated that a single non-synonymous mutation in the E-protein coding sequence 9 of the Zika virus can significantly increase neurovirulence in vivo. Similarly, Matsumoto et al. [19] found that non-synonymous mutations in the bovine SPP1 gene may affect muscle development, leading to increased carcass weight in C/T animals. Non-synonymous mutations can therefore alter protein structure and function with either beneficial or detrimental effects. Further investigation of the high-impact, non-synonymous and transversion sites identified in this study can provide a better understanding of gene function and regulatory mechanisms in Kele pigs.

INDEL refers to the insertion or deletion of bases in a DNA sequence. These variations can affect the function, structure and genetic characteristics of the genome and play a crucial role in genetic breeding. Niu et al. [20] discovered that a 12-base insertion in the 3'UTR of the CISH gene in Landwhite pigs could affect the susceptibility of piglets to diarrhea. Mi et al. [21] found that three INDEL sites (L-13, L-16 and L-19) in CFAP43, which encodes a cilia- and flagella-associated protein, were significantly correlated with growth traits such as chest depth in goats. In this study, a total of 7,041,172 insertion-deletion markers and 17,806 high-impact mutation sites were identified in Kele pigs. The discovery of these INDEL sites is important for understanding the structure and function of the Kele pig genome and provides valuable information for gene editing and genetic modification.

We used Sanger sequencing to verify the Kele pig-specific loci screened previously. This verification approach has been reported in other studies. For example, Zhu Tao et al. [22] selected a total of 56 ducks from 7 populations (Cherry Valley duck, Beijing duck, Maple Leaf duck, Jinding duck, Shaoxing duck, Shanma duck and Gaoyou duck) and conducted whole-genome resequencing. They screened 686 SNPs and INDELs present in specific populations and randomly selected 7 SNP sites for verification. The results were consistent with expectations, and the population-specific SNPs screened by second-generation sequencing could be used as molecular markers for duck breed identification. Fan Huanhuan et al. [23] randomly selected 30 red deer-specific SNP sites for Sanger sequencing based on genotyping-by-sequencing (GBS), and their results indicated that red deer-specific SNPs screened by GBS could be used as molecular markers for identification.

This study is based on whole-genome resequencing. The pathways involved in the regulation of collagen were screened, including the PI3K-Akt, Jak-STAT, MAPK, Wnt, TGF-beta and NF-kappa B signaling pathways. A total of 133 genes in the PI3K-Akt signaling pathway were identified as regulating collagen-related traits in Kele pigs. Whole-genome resequencing and GO analysis were used to investigate the genes associated with collagen deposition in Kele pigs, identifying a total of 307 major candidate genes. These genes were significantly enriched in the PI3K-Akt, Jak-STAT, MAPK, Wnt, TGF-beta and NF-kappa B signaling pathways. The PI3K-Akt signaling pathway regulates a variety of cellular processes in response to extracellular signals, including metabolism, proliferation, cell survival, growth and angiogenesis [24].
The MAPK signaling pathway plays a critical role in cell proliferation, differentiation, apoptosis and metabolism [25]. Hu et al. [26] showed that LPS promoted collagen synthesis in lung fibroblasts by activating the PI3K-Akt-mTOR/PFKFB3 pathway and aerobic glycolysis. Therefore, the COL9A1, COL6A5, COL4A4 and COL4A3 genes, which are related to collagen deposition in Kele pigs, were preliminarily screened from the PI3K-Akt signaling pathway. In recent years, many studies have examined the association of COL9A1 with Kashin-Beck disease [27], congenital clubfoot [28] and tumors. COL6A5 has been associated with lipid metabolism [29] and with the proliferation and angiogenesis of colon cancer cells [30]. Miner et al. [31] showed that mutations in COL4A3 and COL4A4 can lead to glomerular disease in humans and mice.

According to the literature, the protein content of pig skin is as high as 33%, with collagen accounting for 87.7% of this protein [32]. In this study, 14 SNP sites identified by whole-genome resequencing were randomly selected and verified by Sanger sequencing, and the results were in line with expectations. The sites were g.50509272G>C, g.50509314G>T, g.50509226C>T, g.2117893T>C, g.2117929T>G, g.2118081C>T, g.128531420C>G, g.128531424C>T, g.128531440G>A, g.128531456T>G, g.84423643G>A, g.84423793C>T, g.84423810G>A and g.84423811C>A. These mutation sites can be used as a reference for marker-assisted selection. We then examined the expression of COL9A1, COL6A5, COL4A4 and COL4A3 in the heart, liver, spleen, lung, kidney, longissimus dorsi muscle and skin tissues of Kele pigs by qRT-PCR. COL9A1 and COL6A5 were significantly expressed in skin tissue, whereas COL4A4 and COL4A3 were significantly expressed in kidney tissue. These findings lay the groundwork, and provide a scientific basis, for further investigation into how COL9A1, COL6A5, COL4A4 and COL4A3 regulate collagen-related traits in Kele pigs.
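
The notation of the 14 verified sites listed above can be checked mechanically. The following is a minimal sketch, not part of the authors' pipeline, that parses each site into a position, a reference allele and an alternate allele, and then classifies the substitution as a transition or a transversion. The function names `parse_site` and `substitution_class` are hypothetical, and the site strings are taken from the list above with spacing normalized. Chromosome assignments are not given in this section and are therefore omitted.

```python
import re
from collections import Counter

# The 14 Sanger-verified sites as listed in the text (spacing normalized).
# Chromosomes are not stated in this section, so only positions are kept.
SITES = [
    "g.50509272G>C", "g.50509314G>T", "g.50509226C>T",
    "g.2117893T>C", "g.2117929T>G", "g.2118081C>T",
    "g.128531420C>G", "g.128531424C>T", "g.128531440G>A",
    "g.128531456T>G", "g.84423643G>A", "g.84423793C>T",
    "g.84423810G>A", "g.84423811C>A",
]

# Genomic substitution of the form g.<position><ref>><alt>.
PATTERN = re.compile(r"^[gG]\.(\d+)([ACGT])>([ACGT])$")
PURINES = {"A", "G"}


def parse_site(site):
    """Return (position, ref, alt) for a single-nucleotide substitution."""
    compact = "".join(site.split())  # drop stray spaces such as "G. 1285 31440G > A"
    match = PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a simple substitution: {site!r}")
    position, ref, alt = int(match.group(1)), match.group(2), match.group(3)
    if ref == alt:
        raise ValueError(f"reference and alternate alleles are identical: {site!r}")
    return position, ref, alt


def substitution_class(ref, alt):
    """Transition: purine<->purine or pyrimidine<->pyrimidine; otherwise transversion."""
    same_family = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_family else "transversion"


parsed = [parse_site(site) for site in SITES]
counts = Counter(substitution_class(ref, alt) for _, ref, alt in parsed)

for site, (position, ref, alt) in zip(SITES, parsed):
    print(f"{site}\t{position}\t{ref}\t{alt}\t{substitution_class(ref, alt)}")
print(dict(counts))
```

Under these assumptions the 14 sites split into 8 transitions and 6 transversions. The parser rejects anything that is not a single-base substitution, so a malformed entry raises an error instead of being silently skipped.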

Conclusions

Understanding the collagen-related genes of Kele pigs helps to further explore the factors that regulate their collagen-related traits. In this study, a total of 307 candidate genes related to collagen traits were identified, including COL9A1, COL6A5, COL4A4, COL4A3, EP300, SOS2 and EPO, among others. RT-qPCR analysis of four candidate genes (COL9A1, COL6A5, COL4A4 and COL4A3) in different tissues of Kele pigs showed that COL9A1 and COL6A5 were significantly expressed in skin tissue, whereas COL4A4 and COL4A3 were significantly expressed in kidney tissue. Sanger sequencing of 14 sites randomly selected from related genes was consistent with the results of whole-genome resequencing. These results indicate that the specific SNP marker information obtained by whole-genome resequencing can serve as a basis for the analysis of collagen traits in Kele pigs, and that these genes may be potential targets of past and future domestication, reproduction and selection. The results of this study support further research on the regulation of collagen traits and on the development and utilization of Kele pigs.

References

  1. Shi R, Zhang Z, Zhu A, Xiong X, Zhang J, Xu J, et al. Targeting type I collagen for cancer treatment. Int J Cancer. 2022 Sep 1;151(5):665–683. Epub 2022 Mar 8. pmid:35225360.
  2. Shoulders MD, Raines RT. Collagen structure and stability. Annu Rev Biochem. 2009;78:929–58. pmid:19344236; PMCID: PMC2846778.
  3. Xinzhang Chen, Shengwei Di, Xibiao Wang. Relationship between muscle collagen characteristics and meat quality of civilian pigs and their hybrids [J]. Heilongjiang Animal Husbandry and Veterinary Science, 2017(18):79–81.
  4. Hua Li, Wei Chen, Yongqing Zeng, et al. Study on the characteristics of collagen in pig muscle and its relationship with meat quality traits [J]. Pig Breeding, 2012(03):51–54.
  5. Qianan Li, Li Hong. Research progress of collagen metabolism related signaling pathways [J]. China Medical Review, 2017, 14(10):56–59.
  6. Ricard-Blum S. The collagen family. Cold Spring Harb Perspect Biol. 2011 Jan 1;3(1):a004978. pmid:21421911; PMCID: PMC3003457.
  7. Liqiang Yang, Chunjian Jiang, Chunli Long, et al. Effect of Xiaoyao Powder on expression of IL-6 and collagen COL4A2 and COL9A1 in rats with liver-stagnation and spleen-deficiency syndrome [J]. Journal of Hubei University of Traditional Chinese Medicine, 2022, 24(06):13–17.
  8. Ghazanfari S, Khademhosseini A, Smit TH. Mechanisms of lamellar collagen formation in connective tissues. Biomaterials. 2016 Aug;97:74–84. Epub 2016 Apr 27. pmid:27162076.
  9. Duan Y, Liu G, Sun Y, Wu J, Xiong Z, Jin T, et al. Collagen type VI α5 gene variations may predict the risk of lung cancer development in Chinese Han population. Sci Rep. 2020 Mar 19;10(1):5010. pmid:32193401; PMCID: PMC7081318.
  10. Chu ML, Zhang RZ, Pan TC, Stokes D, Conway D, Kuo HJ, et al. Mosaic structure of globular domains in the human type VI collagen alpha 3 chain: similarity to von Willebrand factor, fibronectin, actin, salivary proteins and aprotinin type protease inhibitors. EMBO J. 1990 Feb;9(2):385–93. pmid:1689238; PMCID: PMC551678.
  11. Khashim Alswailmi F, Bokhari K, Aladaileh SH, A Alanezi A, Azam M, Ahmad A. Protective and pathogenic role of collagen subtypes genes COL4A3 and COL4A4 polymorphisms in the onset of keratoconus in South-Asian Pakistani cohort. Saudi J Biol Sci. 2023 Jan;30(1):103503. Epub 2022 Nov 17. pmid:36439958; PMCID: PMC9694102.
  12. Street BR, Gonyou HW. Effects of housing finishing pigs in two group sizes and at two floor space allocations on production, health, behavior, and physiological variables. J Anim Sci. 2008 Apr;86(4):982–91. Epub 2007 Oct 26. pmid:17965323.
  13. Schmolke SA, Li YZ, Gonyou HW. Effect of group size on performance of growing-finishing pigs. J Anim Sci. 2003 Apr;81(4):874–8. pmid:12723074.
  14. Chunping Zhao, Xiong Zhang, Jing Zhang, et al. Determination and correlation analysis of meat quality and flavor indexes of Kele pigs [J]. Guizhou Veterinary Science, 2022, 46(1):12–16.
  15. Fang Chen, Binyue Hu, Guo Fei, et al. Comparative analysis of fatty acid composition of pork between Chinese and foreign pig breeds [J]. Food Science and Technology, 2019, 45(03):166–171.
  16. Sun J, Wang C, Wu Y, Xiang J, Zhang Y. Association Analysis of METTL23 Gene Polymorphisms with Reproductive Traits in Kele Pigs. Genes (Basel). 2024 Aug 12;15(8):1061. pmid:39202421; PMCID: PMC11353829.
  17. Ryu JR, Ahuja S, Arnold CR, Potts KG, Mishra A, Yang Q, et al. Stroke-associated intergenic variants modulate a human FOXF2 transcriptional enhancer. Proc Natl Acad Sci U S A. 2022 Aug 30;119(35):e2121333119. Epub 2022 Aug 22. pmid:35994645; PMCID: PMC9436329.
  18. Liu Z, Zhang Y, Cheng M, Ge N, Shu J, Xu Z, et al. A single nonsynonymous mutation on ZIKV E protein-coding sequences leads to markedly increased neurovirulence in vivo. Virol Sin. 2022 Feb;37(1):115–126. Epub 2022 Jan 21. pmid:35234632; PMCID: PMC8922429.
  19. Matsumoto H, Kohara R, Sugi M, Usui A, Oyama K, Mannen H, et al. The non-synonymous mutation in bovine SPP1 gene influences carcass weight. Heliyon. 2019 Dec 13;5(12):e03006. pmid:31879711; PMCID: PMC6920195.
  20. Niu B, Chen Z, Yao D, Kou M, Gao X, Sun Y, et al. A 12-bp indel in the 3’UTR of porcine CISH gene associated with Landrace piglet diarrhea score. Res Vet Sci. 2022 Sep;146:53–59. Epub 2022 Mar 18. pmid:35325756.
  21. Mi F, Wu X, Wang Z, Wang R, Lan X. Relationships between the Mini-InDel Variants within the Goat CFAP43 Gene and Body Traits. Animals (Basel). 2022 Dec 7;12(24):3447. pmid:36552367; PMCID: PMC9774114.
  22. Tao Zhu, Hongchang Gu, Zebin Zhang, et al. Duck variety screening and identification of specific molecular markers SNPs [J]. Chinese Poultry, 2018, 40(15):7–10.
  23. Huanhuan Fan, Tianjiao Wang, Yimeng Dong, et al. Red deer specific validation of molecular markers SNPs [J]. Journal of China Animal Husbandry and Veterinary, 2021(04):1313–1322.
  24. Zhao H, Sun G, Mu X, Li X, Wang J, Zhao M, et al. Genome-wide selective signatures mining the candidate genes for egg laying in goose. BMC Genomics. 2023 Dec 6;24(1):750. pmid:38057756; PMCID: PMC10702089.
  25. Guo YJ, Pan WW, Liu SB, Shen ZF, Xu Y, Hu LL. ERK/MAPK signalling pathway and tumorigenesis. Exp Ther Med. 2020 Mar;19(3):1997–2007. Epub 2020 Jan 15. pmid:32104259; PMCID: PMC7027163.
  26. Hu X, Xu Q, Wan H, Hu Y, Xing S, Yang H, et al. PI3K-Akt-mTOR/PFKFB3 pathway mediated lung fibroblast aerobic glycolysis and collagen synthesis in lipopolysaccharide-induced pulmonary fibrosis. Lab Invest. 2020 Jun;100(6):801–811. Epub 2020 Feb 12. pmid:32051533.
  27. Zhengming Sun, Xianghui Dong, Yanhai Chang, et al. Relationship between COL9A1 gene polymorphism and COL9A1 expression in KBD knee cartilage [J]. Journal of Xi'an Jiaotong University (Medical Edition), 2018, 39(6):845–847.
  28. Zhao J, Cai F, Liu P, Wei J, Chen Q. Gene Environment Interactions Between the COL9A1 Gene and Maternal Drinking of Alcohol Contribute to the Risk of Congenital Talipes Equinovarus. Genet Test Mol Biomarkers. 2021 Jan;25(1):48–54. Epub 2020 Dec 28. pmid:33372835.
  29. Lifeng Sun. The role and mechanism of lipid metabolism-related molecules Col6a5 and Gpr1 in hyperandrogenized mice [D]. University of Chinese Academy of Sciences (Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences), 2019.
  30. Ningning Lin. Study on the effect of Beclin1 on the proliferation and angiogenesis of colon cancer cells by regulating COL6A5 [D]. Tianjin University of Science and Technology, 2021.
  31. Miner JH. Glomerular basement membrane composition and the filtration barrier. Pediatr Nephrol. 2011 Sep;26(9):1413–7. Epub 2011 Feb 15. pmid:21327778; PMCID: PMC3108006.
  32. Kaixiong Li, Zhiyuan Zhao, Xia Liu. Extraction and application of collagen from pig skin [J]. Meat Research, 1996(4):43–48.