Genome-wide analysis of the diversity and ancestry of Korean dogs

There are various hypotheses on dog domestication based on archeological and genetic studies. Although many studies have been conducted on the origin of dogs, the existing literature about the ancestry, diversity, and population structure of Korean dogs is sparse. Therefore, this study focuses on the origin, diversity, and population structure of Korean dogs. The study sample comprised four major categories: non-dogs (coyotes and wolves) and ancient, modern, and Korean dogs. Selected samples were genotyped using an Illumina CanineHD array containing 173,662 single nucleotide polymorphisms. The genome-wide data were filtered using quality control parameters in PLINK 1.9. Only autosomal chromosomes were used for further analysis. The negative off-diagonal variance in the genetic relationship matrix analysis depicted the variability of samples in each population. FIS (fixation index within a population) values indicated a low level of inbreeding within populations, and the patterns were in concordance with the results of Nei's genetic distance analysis. The lowest FST (fixation index between populations) values, observed between Korean and Chinese breeds, together with a phylogenetic tree, multi-dimensional scaling, and a TreeMix likelihood tree, showed that Korean breeds are closely related to Chinese breeds. The Korean breeds possessed a unique and large diversity of admixtures compared with other breeds. The highest and lowest effective population sizes were observed in Korean Jindo Black (485) and Korean Donggyeong White (109), respectively. The historical effective population size of all Korean dogs showed a declining trend from the past to the present. It is important to take immediate action to protect the Korean dog population while conserving its diversity. Furthermore, this study suggests that Korean dogs have unique diversity and are one of the basal lineages of East Asian dogs, originating from China.


Introduction
Dogs belong to the family Canidae and show high diversity between and among different species. They have diverse feeding habits and advanced social organization. Archaeological discoveries around the world suggest that the dog was the first domesticated animal [1]. Moreover, it is considered the most distinctive domesticated animal with regard to [...] with worldwide dog populations (ancient and modern breeds) using genome-wide analysis of single nucleotide polymorphisms (SNPs).

Materials and methods
Animals and genotype quality control
In total, 2258 animals were used as the sample for this study. To achieve the major objectives of the study, we selected coyote, wolf, and several dog breeds analyzed in a previous study [10], after reviewing the literature. The Akita (AKT), Chow Chow (CHO), Chinese Shar-Pei (CHS), Lhasa Apso (LHA), Basenji (BSJ), Afghan Hound (AFH), Alaskan Malamute (ALM), Saluki (SAL), Pekingese (PEK), Shiba Inu (SHI), Shih Tzu (SHT), Siberian Husky (SIH), and Tibetan Terrier (TIT) dog breeds have been categorized as ancient breeds in many publications because of their high divergence from other dogs. It is believed that they originated more than 500 years ago [17,18] and are closely associated with the original domestication of dogs [8,19,20]. Furthermore, these breeds can be considered a basal lineage of domestic dogs and living prototypes of ancestral dogs. Therefore, data on these dog breeds were extracted to investigate the relationship between ancient and Korean breeds.
Breeds such as the Border Collie (BDC) and Boxer were selected as modern breeds, representing all parts of the world. The sample comprised 1870 dogs of modern breeds. These breeds emerged during the Victorian era (circa 1830-1900) through controlled breeding practices. Their breeding regime was implemented by humans, and therefore they no longer have a close relationship with wolves [20]. The dog breeds and their sample sizes are indicated in S1 Table.
For the Korean dogs, veterinarians collected blood samples for the research purposes of this study, based on a memorandum of understanding (MOU) between the research team and the research and breeding center. All blood samples were obtained in an ethical manner, following guidelines for animal health and welfare. Advance approval was acquired from the Institutional Animal Care and Use Committee of the National Institute of Animal Science, Rural Development Administration, South Korea.
Genomic DNA from the Korean dogs was isolated from blood samples using standard methods [21]. Samples were genotyped for 173,662 single nucleotide polymorphisms (SNPs) using the Illumina CanineHD array. The quality of the genome-wide data was maintained by SNP filtering in PLINK 1.9 [22] based on the following quality control parameters: SNPs with low call rates (<90%) or a high proportion of missing genotypes (>10%) were removed, and, to reduce bias in the data, a minor allele frequency threshold of 1% was applied. Dog genotypes obtained from other sources [5] were merged into our dataset. Only genotypes from autosomal chromosomes were used for further analysis.
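The SNP-level filters described above can be illustrated with a minimal Python sketch. This is not the PLINK 1.9 procedure itself; it only mirrors the stated thresholds (call rate of at least 90% and minor allele frequency of at least 1%) on a toy genotype table. The function names, the 0/1/2 genotype coding, and the SNP identifiers are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# One genotype call: 0, 1 or 2 copies of the counted allele; None marks a missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at one SNP."""
    if not calls:
        return 0.0
    return sum(g is not None for g in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype]) -> float:
    """Minor allele frequency from non-missing calls (two alleles per dog)."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    freq = sum(observed) / (2 * len(observed))
    return min(freq, 1.0 - freq)


def filter_snps(
    genotypes: Dict[str, List[Genotype]],
    min_call_rate: float = 0.90,  # threshold stated in the text
    min_maf: float = 0.01,        # threshold stated in the text
) -> List[str]:
    """Return the SNP identifiers that pass both filters."""
    return [
        snp
        for snp, calls in genotypes.items()
        if call_rate(calls) >= min_call_rate
        and minor_allele_frequency(calls) >= min_maf
    ]


# Hypothetical toy data: three SNPs typed in ten dogs.
toy = {
    "snp_ok": [0, 1, 2, 1, 0, 1, 1, 0, 2, 1],
    "snp_low_call": [0, 1, None, None, 1, 0, 1, 1, 0, 2],
    "snp_monomorphic": [0] * 10,
}
kept = filter_snps(toy)
print(kept)
```

In this toy table, the second SNP fails the call-rate filter and the third fails the minor allele frequency filter, so only the first is retained.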
Diversity, population structure, and phylogenetic analysis
Diversity and population structure analyses were performed using the following approaches: 1) pairwise fixation indices within populations (FIS) and between populations (FST) [23]; 2) heterozygosity and Nei's standard genetic distance estimation [24]; 3) genetic relationship matrix (GRM) estimation; 4) multidimensional scaling (MDS) analysis; 5) a neighbor-joining tree; and 6) ancestral admixture prediction. The fixation indices, heterozygosity, and Nei's standard genetic distance were calculated using two R packages, hierfstat [25] and StAMPP [26]. The GRM was estimated in GCTA v1.25.2 [27]. The four-dimensional pairwise genetic distance matrix was obtained from the MDS algorithm in PLINK 1.9 [22] and plotted as coordinates in R [28]. ADMIXTURE v1.23 [29] was used to detect possible mixtures of ancestral populations with cluster models (K) ranging from two to ten. The neighbor-joining tree was constructed using SNPhylo [30].
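To make the relationship between heterozygosity and the within-population fixation index concrete, the following Python sketch computes FIS as 1 - Ho/He from biallelic genotypes. It is a simplified illustration only: the hierfstat package used in the study applies sample-size-corrected estimators, which are omitted here, and the function names and toy genotypes are assumptions. A negative value arises whenever observed heterozygosity exceeds the Hardy-Weinberg expectation.

```python
from typing import List


def observed_heterozygosity(calls: List[int]) -> float:
    """Share of heterozygous dogs (genotype coded 1) at one SNP."""
    return sum(1 for g in calls if g == 1) / len(calls)


def expected_heterozygosity(calls: List[int]) -> float:
    """Hardy-Weinberg expectation 2p(1 - p) from the allele frequency p."""
    p = sum(calls) / (2 * len(calls))
    return 2.0 * p * (1.0 - p)


def fis(population: List[List[int]]) -> float:
    """FIS = 1 - mean(Ho) / mean(He) over SNPs, without sample-size correction."""
    ho = sum(observed_heterozygosity(snp) for snp in population) / len(population)
    he = sum(expected_heterozygosity(snp) for snp in population) / len(population)
    if he == 0.0:
        raise ValueError("FIS is undefined when all SNPs are monomorphic")
    return 1.0 - ho / he


# Hypothetical population: two SNPs typed in four dogs (0/1/2 coding).
toy_population = [
    [1, 1, 1, 1],  # Ho = 1.0, He = 0.5
    [0, 1, 1, 2],  # Ho = 0.5, He = 0.5
]
toy_fis = fis(toy_population)
print(round(toy_fis, 3))
```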

Migration events, linkage disequilibrium (LD) and demographic estimation
An extended analysis of the relationships among dog populations was performed using TreeMix v1.12 [32]. This approach allows estimation of possible historical splits and mixtures between populations, termed migration events. A maximum likelihood tree of populations was first produced. We then generated a tree model to estimate migration events that may have occurred in the domestication of Korean dogs in relation to both ancient and modern Asian breeds. To account for LD in tree reconstruction, markers were grouped together in windows of 1,000 SNPs. Migration edges that best fit the data were evaluated based on the fraction of the variance explained in the matrix of residuals, in which positive values were preferred. To identify possible traces of introgression in dog populations, we computed the f3 statistic, as introduced in [33], using the threepop command-line program. For each combination of three populations (A, B, and C), a significantly negative f3 statistic and Z-score were taken as evidence of possible introgression from populations B and C into population A.
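The logic of the f3 test can be sketched in a few lines of Python. The sketch below averages (a - b)(a - c) over SNPs, where a, b, and c are allele frequencies in populations A, B, and C, and derives a Z-score from a delete-one-block jackknife. It is not the threepop implementation: the sample-size bias correction is omitted, the block handling is simplified, and the function name, block size, and toy frequencies are assumptions for illustration.

```python
from math import sqrt
from typing import Sequence, Tuple


def f3_with_jackknife(
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
    block_size: int,
) -> Tuple[float, float]:
    """Return (f3 estimate, Z-score) for target A and sources B and C."""
    terms = [(pa - pb) * (pa - pc) for pa, pb, pc in zip(a, b, c)]
    total, count = sum(terms), len(terms)
    estimate = total / count

    # Consecutive SNPs are grouped into blocks to respect linkage between markers.
    blocks = [terms[i:i + block_size] for i in range(0, count, block_size)]
    n = len(blocks)
    if n < 2:
        raise ValueError("at least two blocks are needed for the jackknife")

    # Leave-one-block-out estimates and their jackknife variance.
    loo = [(total - sum(blk)) / (count - len(blk)) for blk in blocks]
    mean_loo = sum(loo) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in loo)
    se = sqrt(variance)
    z = estimate / se if se > 0 else float("nan")
    return estimate, z


# Hypothetical frequencies: A sits midway between B and C at every SNP,
# which is the pattern expected for an admixed population.
freq_b = [0.1, 0.2, 0.3, 0.4]
freq_c = [0.9, 0.6, 0.7, 0.8]
freq_a = [0.5, 0.4, 0.5, 0.6]
f3_value, z_score = f3_with_jackknife(freq_a, freq_b, freq_c, block_size=1)
print(f3_value, z_score)
```

With A intermediate between B and C, every per-SNP term is negative, so both the estimate and the Z-score come out below zero, which is the signature the text describes.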
The demographic history of the dog populations was inferred from estimates of effective population size (Ne) from recent to past generations. Ne was estimated from the LD value following Sved's equation [34]. Prior to Ne calculation, LD was measured as r² to quantify the correlation of alleles at two loci [35]. We used the default PLINK 1.9 [22] approach and SNeP v1.1 [36] to obtain the final estimates of LD and Ne. The historical Ne values were plotted using R [28] with the estimated times on the horizontal axis.
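As a worked illustration of the LD-based estimate, Sved's relationship E[r²] = 1 / (1 + 4·Ne·c) can be rearranged to Ne = (1/E[r²] - 1) / (4c), where c is the recombination distance in Morgans, and markers separated by c are commonly taken to reflect Ne about 1/(2c) generations ago. The Python sketch below applies these two formulas. It assumes a uniform recombination rate of 1 cM per Mb and omits the sample-size and mutation corrections that SNeP can apply; the function name and input values are hypothetical.

```python
from typing import Tuple


def ne_from_ld(
    mean_r2: float,
    distance_bp: float,
    cm_per_mb: float = 1.0,  # assumed uniform recombination rate
) -> Tuple[float, float]:
    """Return (Ne, generations ago) for one marker-distance bin."""
    if not 0.0 < mean_r2 < 1.0:
        raise ValueError("mean r2 must lie strictly between 0 and 1")
    # Convert physical distance to recombination distance in Morgans.
    c = distance_bp / 1e6 * cm_per_mb / 100.0
    ne = (1.0 / mean_r2 - 1.0) / (4.0 * c)  # rearranged Sved equation
    generations_ago = 1.0 / (2.0 * c)       # time reflected by this distance
    return ne, generations_ago


# Hypothetical bin: mean r2 of 0.2 between markers 100 kb apart.
ne_estimate, generations = ne_from_ld(mean_r2=0.2, distance_bp=100_000)
print(round(ne_estimate), round(generations))
```

Shorter marker distances correspond to smaller c and therefore to older generations, which is why binning LD by distance yields a historical Ne trajectory.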

Population structure and diversity
The autosomal portion of the CanineHD array in our genotype data included 140,420 SNPs, as many as in the worldwide dog data obtained from Shannon et al. [5]. After the cleaning process, 98.7% and 93.83% of the autosomal SNPs remained for Korean dogs and other breeds (ancient and modern), respectively. The results of the population structure analyses are summarized in Table 1.
Variability of the samples in each population was shown by the negative off-diagonal variances in the GRM analysis. All Korean breeds had relatively high heterozygosity. The observed heterozygosity values of the Akita, Shiba Inu, and Chow Chow were slightly lower, while those of other ancient breeds ranged between 0.4 and 0.44.
The inbreeding coefficients (within-population FIS) of Korean breeds were between -0.22 and -0.23, while those of ancient breeds ranged from -0.23 to -0.3. The FIS of all dogs observed in this study was negative, indicating that the sample used in this study had a low level of inbreeding.
Population differentiation based on the between-population fixation index (FST) (Table 2) was used to examine variation within Korean dog populations, as well as their relationship with wolves (gray, Chinese, Russian, and Korean) and ancient and modern breeds.
The MDS results are depicted in Fig 1. The plot was constructed using coyotes, worldwide wolves, Korean dogs, and dogs from other parts of the world. MDS analysis allows visualization of the genetic distance of each breed within a selected sample. Various colors were used to differentiate breeds. The group containing wolves was placed in the left corner. All Korean breeds were situated near the non-dog group and were tightly clustered with each other. Chinese Shar-Pei, Chow Chow, and Shiba Inu clustered with the Korean breeds. European breeds such as Cavalier King Charles Spaniel, Chihuahua, Golden Retriever, and Miniature Pinscher were located further away from the wolves and Korean breeds. In particular, the Boxer was located furthest from all other breeds.

Table footnotes: 1, inbreeding coefficients; 2, average of the genomic relationship matrix referring to the inbreeding of the animal itself (diagonal) and to the relationship between animals in the population (off-diagonal); 3, linkage disequilibrium estimated by the r² method (0-20 kb marker distance); 4, effective population size (Ne).

Population ancestries and migration events
Neighbor-joining tree (Fig 2), admixture (Fig 3 and Fig 4), and TreeMix (Fig 5 and S1 Fig) analyses were used to determine plausible Korean dog ancestries. The neighbor-joining tree was constructed using the coyote, gray wolf, and ancient and Korean dogs. Coyote was selected as the root of the tree. The tree had two main branches. Siberian Husky and Alaskan Malamute (morphologically wolf-like dogs) formed one subclade next to the root. Afghan Hound, Basenji, Tibetan Terrier, Lhasa Apso, and Shih Tzu formed another branch, similar to a previous study [8]; Shih Tzu and Lhasa Apso, which have similar appearances, were grouped in a single clade. The next branch was situated further away from the previous breeds and consisted of the Shiba Inu, Akita, Chow Chow, Chinese Shar-Pei, and all Korean breeds. Korean Poongsan White, Korean Donggyeong White, Korean Jindo Brindle, Korean Jindo Black, Korean Jindo White, and Korean Jindo Black and Tan were all found in a single clade.
The results of the admixture analysis clearly show the genetic structure of Korean dogs in an ancestry-based model (Fig 3). We conducted admixture analysis with K = 2, K = 3, K = 5, and K = 10 and found that the lowest cross-validation error was obtained with K = 10 (cross-validation error = 0.5153, Fig 4). K = 2, K = 3, K = 5, and K = 10 were selected to improve visualization of the ancestry model while displaying the relationship among Korean, ancient, and modern breeds. The admixture results for K = 10 clearly showed the diversity and admixture of Korean breeds compared with other breeds. Although Korean dogs were admixed with both the ancient and wolf categories, they showed a distinctive admixture compared with all dogs in the sample. Korean Donggyeong White had a distinct genetic makeup from Jindo and Poongsan. Admixture analysis also showed a strong relationship among Chow Chow, Chinese Shar-Pei, and Korean breeds.
Akita, Alaskan Malamute, Basenji, Shih Tzu, Siberian Husky, and Cavalier King Charles Spaniel showed very low levels of admixture. Korean breeds showed admixture events with some Japanese breeds, such as Akita and Shiba Inu. Close relationships among coyote, gray wolf, and Korean wolf were visualized in this analysis.
Several migration events of Korean dogs were revealed using non-dogs, and ancient and modern dogs in the maximum likelihood tree (Fig 5). Migration edges that best fit the data were selected if they had positive values as seen in a plot of residuals (S1 Fig) with basal colors. The coyote was set as the root of the ancestry model. The tree showed that all Korean breeds were clustered in one branch with some ancient Chinese and Japanese dogs. The modern breeds clearly clustered together away from wolves while the Boxer exhibited the highest genetic drift in the sample.
Several migration events could be observed in the TreeMix results. A few important migrations with a high migration weight were observed: from Korean Jindo Black to the Chinese Shar-Pei, from Akita to Tibetan spaniel, and from the wolf clade to Basenji. Observation of the residuals from the fit of the model to the data (S1 Fig) revealed that a number of populations do not adhere to a strict tree model.
The f3 statistics were generated to trace possible ancestry mixtures in Korean dogs using a sample that included ancient breeds and the gray wolf. A concise table of the most significant f3 statistics (Z-score < -2) is shown in Table 3. Introgression from coyote and European wolf into Russian wolf was significant.

Demographic trends
The historical effective population size values were estimated based on the LD value across the genome and were used as a representation of demographic changes in the dog population. The [...] Table 1 and averaged in Table 4 based on genetic distance ranges. Ne over 20,000 generations is shown in Fig 6. All Korean dogs have lower adjacent LD values than ancient breeds [...] (Fig 6). This has caused a decrease in the inbreeding rate from the past to present in Korean dog breeds. The Ne trend for Korean Donggyeong White and Korean Jindo White can be traced back to 239,233 years ago, while other breeds can be traced back to more than 1,000,000 years ago (Table 4).

Discussion
This study used genome-wide SNP data to reveal information on the diversity, population structure, ancestry, migration events, and demographic trends of Korean dogs compared with ancient and modern breeds and their ancestors (wolves and coyotes). Dogs originated from the gray wolf, and various studies have presented diverse hypotheses for dog domestication [37,38]. Although a considerable number of studies have used different methods, they had various drawbacks, and information on the ancestry of Korean dogs is rare. Data based on genome-wide SNPs are appropriate for these types of studies, and some previous studies have used this kind of data. However, most of these studies have lacked samples from Northeast Asia, especially from Korea. Therefore, this study mainly focused on the diversity and ancestry of Korean dogs and revealed interesting information about these dogs.
Ascertainment bias is the systematic deviation of population genetic statistics from theoretical expectations. It occurs due to sampling a non-random set of individuals, small sample sizes, or biased SNP discovery protocols [39]. Moreover, a small sample size tends to bias the allele frequency distribution towards common SNPs [40]. This error always occurs unless sequencing is performed on the whole genome of every individual. High-coverage sequencing data, analysis of a large number of SNPs [41,42], raw data modification, and incorporating ascertainment bias into the theoretical models of population genetics can minimize this error [39]. The ascertainment bias in our analysis was minimized by using a considerable sample size, a large SNP genotype dataset, and sample size correction protocols. Therefore, the present study provides precise results on Korean dog ancestry.
The data used in this study were grouped into four different categories to improve the clarity of the analysis. GRM analysis was performed for all Korean breeds and ancient dogs. The heterozygosity in Korean dogs was high (around 0.4), while the inbreeding coefficient within populations indicates that all Korean breeds in this study had a low level of inbreeding. Previously, it was reported that Korean Donggyeong White, Korean Jindo White, and Korean Poongsan White had heterozygosity values of 0.77, 0.70, and 0.74, respectively [43]. The heterozygosity in our sample is lower than in that study. Lee et al. [44] reported an average within-population inbreeding coefficient of 0.028 for Korean breeds, which is higher than the values in this study. Ancient history and recent factors, such as breeding programs introduced during the past few hundred years, can lead to changes in the genetic diversity of individuals. Nevertheless, the variation may be due to differences between the samples and the methodologies used in the studies [45].

Table 4. Historical effective population size (Ne).

FST values were used to investigate genetic differentiation between populations. Korean breeds showed allele frequencies more similar to those of some Chinese breeds (Chow Chow and Chinese Shar-Pei) than to those of others in the sample. The MDS, TreeMix, and admixture results also indicated close relationships between Korean and Chinese breeds. MDS analysis showed that Korean breeds are closely related to wolves. The modern breeds show a distinct genetic background from their dog ancestors. It was previously found that Southeast Asian dogs were closely related to wolves, especially Chow Chow, Akita, and Chinese Shar-Pei. Further, they are considered a foundation lineage connecting to the gray wolf [6,45,46,47]. Fan et al. [48] found that the Boxer genome does not follow any wolf population, which agrees with our results.
Some publications have clearly established that gray wolves (C. lupus) have been distributed throughout China in both ancient and modern times [49]. According to Wang et al. [45], wolves from the southern part of East Asia have a significant genetic relationship with domestic dogs. All of these studies shed light on East Asian dog domestication. The results of our study are in agreement with these previous studies. Because there is little literature showing the close relationship among Chinese wolves, Korean wolves, and dogs, our observations represent a source of information for future studies.
The phylogenetic tree, MDS, admixture analysis, and TreeMix results provide evidence showing that Korean dogs have a close relationship with Japanese breeds. A previous study also revealed that Korean dogs were brought to Japan many years ago [50].
The admixture analysis revealed that Korean breeds are uniquely diverse compared with all other breeds, although they were admixed with both wolf and ancient dog breeds. Korean Donggyeong White showed a different genetic makeup from other Korean breeds. Nevertheless, most of the migration events could not be identified from the F statistics, because admixture is difficult to detect when a large amount of genetic drift has occurred since the admixture event [51].
Effective population size is a key factor in population genetics and conservation [51] because it is strongly associated with inbreeding, fitness, and loss of genetic variation through random genetic drift [52,53]. Therefore, it is considered an important criterion for determining the endangerment of a population [54,55].
The historical effective population size suggests that all Korean breeds exhibit decreasing effective population sizes over long time scales. The results of this analysis agree with a previous study of effective population size in the Sapsaree breed [56]. The smallest effective population sizes were observed in the Korean Poongsan White and Korean Donggyeong White breeds, while the largest effective population size was observed in Korean Jindo Black. This result signals an increasing inbreeding rate over time.
Artificial breeding or domestication can cause a reduction in effective population size [57,58]. Thus, the observed effect may be due to the number of breeding programs that have been introduced recently, and could be related to the observed heterozygosity reduction. The study conducted by Calboli et al. [59], based on pedigree information, revealed adverse consequences of increasing inbreeding rates (loss of unique genetic variants, high prevalence of recessive genetic disorders) and a dramatic effect of breeding patterns on genetic diversity. These results are in accordance with the findings of our study.
It has been noted previously that populations of breeds or species require a minimum effective population size of about 50 or 100 [60]. Therefore, the declining effective population sizes of Korean dogs, especially the Korean Poongsan White and Korean Donggyeong White, emphasize the need for strong actions and strategies to increase the effective population size while maintaining the genetic diversity of these breeds.

Conclusion
This study presents some interesting findings on the diversity, population structure, ancestral admixture, and demographic history of Korean dog breeds. Since there are few studies on the ancestry and diversity of Korean dog breeds, our study helps to fill gaps in knowledge of this population. Korean dogs have clear genetic divergence from modern breeds. The unique genetic structure of Korean dogs has caused them to have extremely distinctive characteristics. It is clear that the effective population size of Korean dogs has decreased from the past to the present owing to increased inbreeding caused by modern breeding programs.
The present results emphasize that Korean dogs have a close relationship with ancient Chinese and Japanese breeds. Since most analyses in the study showed a strong relationship between Korean and Chinese breeds, migration of dogs between China and Korea is supported by our study. Therefore, this study suggests Chinese ancestry for Korean dogs. The geographical location, previous studies, and the history of these two countries support this hypothesis. Moreover, Korean breeds show a closer relationship with ancient dog breeds than with the wolf ancestor. Therefore, we suggest that Korean dogs are also one of the indigenous dog categories that can be considered as the basis of the East Asian dog domestication process. The variety of admixture events contributing to the diversity of Asian dogs, including Korean dogs, is greater than in any other part of the world. Korean Donggyeong has a different genetic composition from other Korean breeds. More studies using whole-genome sequencing data, larger sample sizes, and more Korean dog varieties are needed to improve accuracy and to investigate the exact time period of Korean dog domestication.
Supporting information S1 Table.