Exploring the genomic resources of seven domestic Bactrian camel populations in China through restriction site-associated DNA sequencing

The domestic Bactrian camel is a valuable livestock resource in arid desert areas. Therefore, it is essential to understand the roles of important genes responsible for its characteristics. We used restriction site-associated DNA sequencing (RAD-seq) to detect single nucleotide polymorphism (SNP) markers in seven domestic Bactrian camel populations. In total, 482,786 SNPs were genotyped. The pool of all remaining others were selected as the reference population, and the Nanjiang, Sunite, Alashan, Dongjiang, Beijiang, Qinghai, and Hexi camels were the target populations for selection signature analysis. We obtained 603, 494, 622, 624, 444, 588, and 762 selected genes, respectively, from members of the seven target populations. Gene Ontology classifications and Kyoto Encyclopedia of Genes and Genomes enrichment analyses were performed, and the functions of these genes were further studied using Genecards to identify genes potentially related to the unique characteristics of the camel population, such as heat resistance and stress resistance. Across all populations, cellular process, single-organism process, and metabolic process were the most abundant biological process subcategories, whereas cell, cell part, and organelle were the most abundant cellular component subcategories. Binding and catalytic activity represented the main molecular functions. The selected genes in Alashan camels were mainly enriched in ubiquitin mediated proteolysis pathways, the selected genes in Beijiang camels were mainly enriched in MAPK signaling pathways, the selected genes in Dongjiang camels were mainly enriched in RNA transport pathways, the selected genes in Hexi camels were mainly enriched in endocytosis pathways, the selected genes in Nanjiang camels were mainly enriched in insulin signaling pathways, while the selected genes in Qinghai camels were mainly enriched in focal adhesion pathways; these selected genes in Sunite camels were mainly enriched in ribosome pathways. We also found that Nanjiang (HSPA4L and INTU), and Alashan camels (INO80E) harbored genes related to the environment and characteristics. These findings provide useful insights into the genes related to the unique characteristics of domestic Bactrian camels in China, and a basis for genomic resource development in this species.


Introduction
The domestic Bactrian camel is an essential livestock resource in China, serving as an important means of transportation for cultural exchanges between the East and the West [1]. These camels are also the source of various livestock products. For example, camel hair is a good textile material containing natural protein fiber [2]. Camel milk is rich in nutrition and has hypoglycemic and anticancer activities [3,4]. Camel meat has high moisture and protein levels, as well as low fat and cholesterol levels, and has been shown to have medicinal properties, suggesting that it may have applications in the treatments of some diseases [5][6][7].
Seven domestic Bactrian camel populations come from different regions of China. Nanjiang camels have stronger heat resistance than the pool of all remaining others, Alashan camels form stronger anti-ultraviolet physiological functions than the pool of all remaining others [8,9]. Thus, the pool of all remaining others can be used as a reference group for exploration of the genomic resources of their species. At present, some studies have been conducted on domestic Bactrian camels. Ming et al. [10] sequenced the whole genome of 128 camels across Asia and revealed origin and migration of domestic Bactrian camels. Additionally, Liu et al. [11] used single nucleotide polymorphisms (SNPs) to study the genetic diversity and genetic structure of seven domestic Bactrian camel populations; our study clarified the phylogenetic relationships of these populations, laying a foundation for the protection of their biodiversity.
RAD-seq is a simplified genome sequencing technology based on whole-genome restriction sites developed on the basis of next-generation sequencing. With the establishment of highthroughput sequencing technology and bioinformatics technology, RAD-seq analysis has become increasingly refined, and has been applied to many organisms [12]. For example, Li et al. [13] used RAD-seq for genome SNP typing of 618 sows to accurately predict of genome breeding value and evaluate the accuracy of breeding value prediction. Moreover, Wang et al. [14] successfully employed RAD-seq to explore genome-wide SNPs among six breeds of Sichuan cattle.
In this study, we explored the genome-wide SNP markers of seven domestic Bactrian camel populations by RAD-seq. Our aim was to identify genes that might be related to the unique characteristics of these camels, such as heat and stress resistance. Our results established interesting targets for subsequent genomics research of the domestic Bactrian camel, and may facilitate genetic association analysis of economically important traits of this species.

Animals and sampling sites
Blood sampled from domestic Bactrian agricultural camels were obtained from seven sites in Inner Mongolia, Gansu, Qinghai, and Xinjiang (Table 1, Fig 1). A total of 47 venous blood samples were collected (5 ml each). Miannaining, a compound preparation of xylazine and dihydroetorphine hydrochloride (with a specification of 2 ml per tube) purchased from the Institute of Quartermaster University of the Chinese People's Liberation Army, was used to anaesthetize the animals during the sampling process [15]. The application amount of camel was 2.5-3.5ml/100kg, and each sample was derived from a different family; there was no kinship among individuals [16]. All experimental protocols from this study were approved by the Institutional Animal Care and Use Committee of the College of Animal Science and Technology, Northwest A & F University (Shaanxi, China; approval no. DNX20170604).

DNA extraction, library construction, and Illumina sequencing
DNA samples were extracted following a standard phenol-chloroform extraction procedure before being diluted to 20 ng/μL [17]. Then, the quality and quantity of DNA were evaluated using an Epoch spectrophotometer (BioTek, Winooski, VT, USA), a Qubit 3.0 fluorometer (Thermo Fisher Scientific, Waltham, MA, USA), and agarose gel electrophoresis [18].
RAD-seq libraries were constructed in accordance with a modified protocol [19]. Genomic DNA (0.1-1 μg; from a single sample or pooled samples) was digested with EcoRI (New England Biolabs), and P1 adapters were connected at the cutting site. The samples were then pooled, randomly sheared, and size-selected. After P2 adapters were added, DNA fragments ranging from 300 to 700 bp were used to construct the sequencing libraries. Finally, the samples were sequenced on the Illumina HiSeq 3000 platform (Illumina, San Diego, CA, USA) using 100-bp paired-end reads.

Quality control, read mapping, and SNP calling
The fastp tool (v0.19.5) was used to filter the generated reads [20]. Raw reads were processed into high-quality clean reads using three strict filtering standards, as follows: (1) removing reads with greater than or equal to 10% unidentified nucleotides (N), (2) removing reads with greater than 50% of bases having Phred quality scores of less than or equal to 20, and (3) removing reads aligned to the barcode adapter. Burrows-Wheeler Aligner software was used to align the clean reads of each sample with the reference genome (https://www.ncbi.nlm.nih.gov/genome/10741), with settings of 'mem 4 -k 32 -M', where -k is the minimum seed length and -M is an option used to mark shorter split alignment hits as secondary alignments [21]. GATK's Unified Genotyper was used to conduct variant calling on all samples [22], and GATK Variant Filtration, with proper standards, was used to filter SNPs (-Window 4, -filter "QD < 2.0 || FS > 60.0 || MQ <40.0", -G_filter "GQ < 20") [23]. Finally, the obtained SNPs were filtered with VCFtools (https://github.com/ vcftools/vcftools) for further analysis with Minor Allele Frequency (MAF) > 0.05 and proportion of missing genotyping data < 25% as parameters. This data file was then used in subsequent analyses [24].

Analysis of the selection signatures
Selection signature detection was performed using 50-kb windows and 25-kb steps for sliding the π value (nucleotide diversity) and for F ST (population differentiation index) distribution calculation. The -log10 transform of Nei's π was used to select the lower end of the diversity windows, with these parameters then quantified using in-house PERL scripts (S1 File). The top 5% of the F ST and π values were selected as candidate regions [25,26]. All related graphs were drawn using R scripts [27].

Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis
To further systematically elucidate their complex biological functions, the selected genes from Nanjiang, Sunite, Alashan, Dongjiang, Beijiang, Qinghai, and Hexi camels were mapped against both GO and KEGG databases. WEGO software was used for GO enrichment analysis, and the gene number of each term was calculated [28]. KOBAS software and the KEGG database (http://www.genome.jp/kegg/) were used to test the statistical enrichment of the selected genes [29]. The calculated p values were subjected to false discovery rate (FDR) correction, applying an FDR threshold less than or equal to 0.05. Pathways meeting this condition were defined as significantly enriched pathways.

RAD-seq sequencing results and data filtering
A total of 131 GB of raw data and 893,672,274 reads from seven populations were obtained by Illumina sequencing. After quality filtering, 129 GB of clean data and 882,097,022 clean reads remained. The average clean data for each sample was 2.75 GB, and an average of 19.01 million reads was obtained. The average effective rate was 98.57%. Q20 was higher than 94%, Q30 was higher than 87%, and the GC content was stable between 40.71% and 45.13% (S1 Table). Overall, the sequencing data showed high Phred quality.

Analysis of the selection signatures
F ST and π were used to select the top 5% regions. Using the pool of all remaining others as the reference population and Nanjiang, Sunite, Alashan, Dongjiang, Beijiang, Qinghai, and Hexi camels as the target populations, 603, 494, 622, 624, 444, 588, and 762 selected genes, respectively, were obtained (Figs 2-8, S2-S8 Tables). The selected genes in the scanning regions were extracted, and their functions were predicted using Genecards. The analysis indicated that the Nanjiang camels (HSPA4L, INTU), and Alashan camels (INO80E) may harbor genes related to the environment and their unique characteristics.

Sequence annotation and enrichment analysis
GO classification was carried out on the selected genes. Across all populations, cellular process (GO:0009987), single-organism process (GO:0044699), and metabolic process (GO:0008152) KEGG enrichment analysis was carried out on the selected genes from seven domestic Bactrian camel populations. In Alashan camels, the selected genes were mainly enriched in ubiquitin mediated proteolysis pathways. In Beijiang camels, the selected genes were mainly enriched in MAPK signaling pathways. In Dongjiang camels, the selected genes were mainly enriched in RNA transport pathways. In Hexi camels, the selected genes were mainly enriched in endocytosis pathways. In Nanjiang camels, the selected genes were mainly enriched in insulin

PLOS ONE
Exploring the genomic resources of domestic Bactrian camels in China signaling pathways. In Qinghai camels, the selected genes were mainly enriched in focal adhesion pathways. In Sunite camels, the selected genes were mainly enriched in ribosome pathways (Figs 2-8, S16-S22 Tables).

Discussion
In this study, we used RAD-seq to detect 482,786 SNP markers from seven domestic Bactrian camel populations. The pool of all remaining others were used as the reference population for selection signature analysis. GO classification and KEGG enrichment analysis were performed on the identified selected genes, and Genecards was used to further study their functions. It showed that Nanjiang, and Alashan camels may harbor genes related to the environment and the unique characteristics of these camels. Overall, our findings established interesting targets for genetic association analysis of important traits in domestic Bactrian camel populations in China, and provided a basis for additional genome studies in this species [30][31][32].
Wensu County, the home of Nanjiang camels, is close to the Taklimakan Desert. Nanjiang camels have long lived in an environment with abundant sunlight, little rain, strong winds, and copious amounts of sand [33]. Thus, these camels tend to have strong heat resistance [8]. HSPA4L (encoding heat shock protein family A member 4 like) is a candidate gene related to heat resistance [30] and cellular response to heat stress. Thus, HSPA4L may be an interesting target for studying heat resistance in Nanjiang camels. Additionally, these camels have 3-5-cm-long eyelashes, with eyelash length determined by the duration of their hair follicle growth period [34]. The Inturned planar cell polarity protein (encoded by INTU) is involved in hair follicle morphogenesis [35]. Therefore, studies of INTU may provide insights into the distinct eyelash length of Nanjiang camels. Alashan camels originate from the Alashan Plateau, in western Inner Mongolia. The Alashan Plateau is subjected to strong ultraviolet rays, and Alashan camels have evolved anti-ultraviolet physiological functions in response to this prolonged exposure [9]. Because INO80E (INO80 complex subunit E) is involved in UV-damage excision repair [36], this gene may have applications in future studies of anti-ultraviolet radiation functions in these camel populations.
In summary, the SNP loci of seven domestic Bactrian camel populations in China were identified at the whole-genome level using RAD-seq. Through selection signature analysis, we found that Nanjiang, and Alashan camels may have genes related to the environment and the unique characteristics of these camels. These findings have established a basis for further mining the genomic resources of domestic Bactrian camels in China, which will help us better protect and develop the genetic resources of the species. In future studies, we will actively seek cooperation with other institutions to conduct a more in-depth exploration of genes related to the characteristics of this species, furthering our understanding of their genomic resources. Supporting information S1