Canine Genomics and Genetics: Running with the Pack

ABSTRACT The domestication of the dog from its wolf ancestors is perhaps the most complex genetic experiment in history, and certainly the most extensive. Beginning with the wolf, man has created dog breeds that are hunters or herders, big or small, lean or squat, and independent or loyal. Most breeds were established in the 1800s by dog fanciers, using a small number of founders that featured traits of particular interest. Popular sire effects, population bottlenecks, and strict breeding programs designed to expand populations with desirable traits led to the development of what are now closed breeding populations, with limited phenotypic and genetic heterogeneity, but which are ideal for genetic dissection of complex traits. In this review, we first discuss the advances in mapping and sequencing that accelerated the field in recent years. We then highlight findings of interest related to disease gene mapping and population structure. Finally, we summarize novel results on the genetics of morphologic variation.


Introduction
In July of 2004, the first high-quality draft (7.5×) sequence of the dog genome was made publicly available [1], providing an invaluable resource for navigating the genomes of more than 400 recognized dog breeds worldwide, as well as a number of closely related wild canids. This work represented the crowning achievement of the dog genomics community that has for years worked hard to demonstrate the utility of a mammalian genetic system outside of the commonly accepted rodent models [2][3][4]. In this review, we first briefly highlight some recent advances in our picture of the canine genome. With this background, we then focus on issues related to the identification of genes associated with complex traits, the dog as a model for advancement of human medical genetics, and the development of mapping tools that take advantage of features of population structure.

Canine Genome Maps and Sequence
The high-quality draft sequence of the dog (http://www.genome.ucsc.edu; http://www.ncbi.nlm.nih.gov; http://www.ensembl.org) has illuminated many qualities of the canine genome only hinted at from previous work [1]. The DAPI-banded karyotype of the dog's 40 chromosomes, combined with reciprocal chromosome paint studies from two independent groups, suggested that the canine genome was highly homologous to the human genome and comprised a limited number of conserved segments, intimating a low level of rearrangement between the two [5][6][7]. The size of the canine genome was initially estimated from maximum likelihood predictions to be 27 morgans in genetic distance [8]. Estimates based on flow sorting of chromosomes suggest a physical size of 2.8 gigabases [9,10]. Predictions based on sequence analysis of euchromatic sequence suggest a size of 2.3-2.4 gigabases [1,11]. Integrated linkage, radiation hybrid (RH), and cytogenetic maps of the dog genome confirmed these general conclusions [12][13][14][15].
Most recently, a comparative RH map of the dog genome containing 10,000 canine gene sequences has proven invaluable for identifying small rearrangements within the canine genome (http://www-recomgen.univ-rennes1.fr/doggy.html) [16], as well as assisting in the ordering of contigs for the draft assembly [1]. The gene sequences used in construction of the 10,000-gene RH map were derived from a 1.5× sequence of the standard poodle. This initial sequence of the dog genome contained at least partial orthologs for 75% (18,473) of annotated human genes [11]. Portions of 9,850 individual genes, or about half of all dog genes, were localized on an RH panel with 200 kilobase resolution [16]. A total of 264 conserved segments less than 500 kilobases in size were identified from a comparison of the dog map and human genome sequence (National Center for Biotechnology Information Build 34), matching well with predictions from the draft assembly [1]. This provided a clear snapshot of the order of canine genes relative to the human genome, and precise information about breakpoint position. Subsequent studies on breakpoint reuse across mammalian systems, utilizing RH maps from a variety of mammalian species, verify these data [17].
The 7.5× draft assembly of the canine genome was derived from a female boxer, selected because of her apparent lack of heterozygosity (H.G. Parker, unpublished data). The assembled sequence spans most of the dog's 2.4 gigabases and is derived from 31.5 million sequence reads (http://www.genome.ucsc.edu) [1]. The quality of the assembly is extremely high compared to initial assemblies of other mammalian genomes [18][19][20][21]. Half of the assembled bases are in contigs of 180 kilobases or longer (the N50 contig size), and the N50 supercontig size is 45.0 megabases, which is considerably longer than that of the mouse genome at a similar point in its assembly. The reasons are twofold. First, technical advances in sequencing result in longer and higher-quality reads. Second, advances in bioinformatics have improved the accuracy with which genomes can be assembled. From a practical standpoint, this means that the majority of canine genes contain no sequence gaps and most canine autosomes are composed of one to three supercontigs. The current gene count is listed as approximately 19,000, with about 75% representing 1:1:1 orthologs among dog, human, and mouse.
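The N50 statistic quoted above can be made concrete with a short sketch. The function below is a minimal illustration, not the assembly pipeline's own code, and the contig lengths are hypothetical toy values rather than dog data. It returns the largest length L such that contigs of length L or longer together hold at least half of all assembled bases.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or supercontig) lengths.

    The N50 is the largest length L such that contigs of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig downward, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, lengths in kilobases (not real dog contigs).
toy_contigs_kb = [400, 250, 180, 90, 50, 30]
print(n50(toy_contigs_kb))
```

Because the statistic is weighted by bases rather than by contig count, a few very long contigs raise the N50 much more than many short ones lower it, which is why it is a common summary of assembly contiguity.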
With the availability of the canine genome sequence, the research community is now ready to tackle goals stated nearly a decade ago, when the first arguments were put forth as to why the dog system offered unique advantages for mapping complex traits [22][23][24]. In general, researchers have focused on the dog as a system for advancing general medical knowledge. In addition, a small but growing number of groups have used the dog for tackling the genetics of morphology and behavior.

The Canine Genome and Molecular Mechanisms of Disease
Among the most well-studied elements of the canine genome sequence are the short interspersed nuclear elements (SINEs) [25][26][27]. These retrotransposons are implicated in genome evolution and include several families of well-recognized repeats, such as the Alu sequences in humans [18,19,28]. In dogs, the major family of SINEs is derived from a tRNA-Lys, and is distributed throughout the genome at about 126 kilobase spacing [26,29,30]. The frequency of bimorphic SINE elements is 10- to 100-fold higher than what is observed in humans, largely because of the expansion of a single subfamily, termed SINEC-Cf, in the canine lineage [11].
As with human Alu repeats, a surprising number of SINEs seem to be located in positions that affect gene expression. A perfect example is the often cited SINEC-Cf element inserted into intron 3 of the gene encoding the hypocretin receptor, resulting in narcolepsy in the Doberman pinscher [31]. These data were the first to link the hypocretin gene family to sleep disorders, and a large body of work on molecular biology of sleep has evolved from these initial studies. Likewise, insertion of a SINE into the canine PTPLA gene leads to multiple splicing defects, causing an autosomal recessive centronuclear myopathy in the Labrador retriever [32].
By studying SINEs in the dog, genome researchers have learned about important disease mechanisms that have not been appreciated from the study of human families. Analysis of other canine diseases demonstrates this as well. For example, analysis of miniature wire-haired dachshunds in the United Kingdom revealed that recessive progressive myoclonic epilepsy is due to expansion of a dodecamer repeat in the Epm2b gene [33]. Normal dogs carry two sequential copies of the repeat and a third slightly variant copy, while affected individuals carry up to 26 repeats, resulting in a dramatic reduction of mRNA, by about 900-fold. While simple mutations in the same gene cause Lafora disease in humans, this is the only report of a dodecamer repeat expansion associated with any mammalian disease.
Over 360 genetic disorders found in humans have been described in the dog [3,34], with 46% occurring largely in either one or a few breeds (http://www.vet.cam.ac.uk/idid). It has been said that the "low-hanging fruit" of canine genetics is rapidly being plucked. That is, the genes that can be mapped using easily obtainable pedigrees or those caused by highly penetrant alleles are rapidly being identified. To some degree this is true. Loci have been mapped, and in some cases mutations found, for a multitude of common canine diseases (reviewed in [3,22,35,36]). In some cases, the biology of the underlying mutations has been helpful in understanding a comparable human disorder. In other cases, such as the identification of the CNGB3 gene for cone degeneration [37] or the folliculin gene for renal cystadenocarcinoma and nodular dermatofibrosis [38], the work has served primarily to highlight the power of canine genetics for dissecting genetic diseases common to humans and dogs.
Particularly challenging will be the identification of genes associated with complex diseases such as hip dysplasia, a common disease in dogs, affecting up to 50% of the large breeds. The disease is recognized radiographically as subluxation of the femoral head from the acetabulum of the hip joint [39,40], and is likely caused by a mixture of genetic [41][42][43][44][45] and environmental factors [44][45][46][47][48]. Two approaches have been used to try to identify causative genes.
Investigators at the University of Utah have looked for a genetic association in a population of well-characterized and densely genotyped Portuguese water dogs (PWD) using the Norberg angle, a highly heritable and quantitative radiographic measure of joint laxity. They report the presence of two unlinked quantitative trait loci (QTLs) on CFA1, located more than 100 megabases apart, which demonstrated statistically significant associations [49]. A third locus on a different chromosome was found to be associated with osteoarthritis [50].
By comparison, Todhunter and colleagues have developed a large outcrossed pedigree of affected Labrador retrievers crossed with unaffected greyhounds [45,51,52]. A variety of measures, including age at detection of femoral capital epiphyseal ossification, distraction index, hip joint dorsolateral subluxation score, and hip joint osteoarthritis, are being used in a genome-wide scan for classical linkage [51]. While no gene has yet been found, pedigree analysis suggests that loci controlling these traits act additively, and that the distraction index may be controlled by a single major locus [45,53].
These studies represent two distinct methods for approaching a complex problem. Both highlight different advantages of using the canine system for genetic analysis. The first makes use of the availability of large controlled populations with limited genetic diversity. The second demonstrates the ability to cross populations showing extremes of phenotype in order to map genes. Each has the potential for success, and comparison of the two methods will improve the design of future studies.

The Canine System and Genetics of Complex Traits
The study of morphology in the PWD represents one of the most interesting stories from the canine genetics community to date. PWD of today are descended from a small number of dogs primarily from two kennels [54,55]. Six ancestors account for about 80% of the current gene pool of 10,000 dogs, and the breed is characterized, in part, by extensive consanguinity, with a range of 0-0.6 [56].
Through a program termed the Georgie Project (http://www.georgieproject.com), investigators at the University of Utah, Gordon Lark, Kevin Chase, and collaborators, have collected extensive phenotypic data (five sets of X rays from over 500 dogs) and genotypic data (DNA samples from over 900 dogs) on registered PWDs. Ninety-one metrics describing aspects of canine morphology have been extracted from the X rays, and DNA samples have been genotyped with nearly 500 microsatellite markers that span the genome at about 5-centimorgan density.
In 2002, a subset of the data was subjected to principal component analysis, which classifies variation of correlated traits into independent linear combinations. Principal component 4, for instance, demonstrates how skull and limb lengths are inversely correlated with the strength of the limb and axial skeletons. Bulldogs have proportionately shorter, wider bones designed to accommodate large body mass, while the greyhound's long, thin limbs are adapted for speed (see Figure 1). Analysis of these data has highlighted 44 putative QTLs on 22 chromosomes that are important for heritable skeletal phenotypes in the PWD [57]. Ongoing collaborations between our own laboratory and Gordon Lark, Kevin Chase, and collaborators are aimed at finding the specific variants responsible for each principal component.
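To illustrate how principal component analysis turns correlated skeletal metrics into independent linear combinations, here is a minimal pure-Python sketch. It is an assumption-laden toy, not the method or data of the Georgie Project: the two traits, the four measurements, and the function name are hypothetical, and the leading component is found by simple power iteration on the covariance matrix.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (loadings, fraction_of_variance) for the leading component.

    rows: list of equal-length numeric lists, one row per dog and one
    column per trait. Uses power iteration on the sample covariance matrix.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    v = [1.0 / math.sqrt(m)] * m  # arbitrary starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("degenerate start vector or zero variance")
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    total_variance = sum(cov[i][i] for i in range(m))
    return v, eigenvalue / total_variance


# Hypothetical measurements: (limb length, limb width) for four dogs.
toy_traits = [[10.0, 5.0], [12.0, 4.0], [14.0, 3.0], [16.0, 2.0]]
loadings, fraction = first_principal_component(toy_traits)
print(loadings, fraction)
```

In this toy dataset the two loadings have opposite signs, which is the pattern described above for a component along which bone length and bone width trade off. A real analysis of 91 metrics would normally use an established numerical library and would examine several components, not only the first.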
Among the most satisfying aspects of this work has been the ability to demonstrate how canine genetics allows us to unravel complex, but nonadditive, interactions between genetic loci, a problem which has proven difficult to approach using classical genetic methods. For example, 21% of the observed variation in skeletal size among PWDs results from differences between females and males. Analysis of the above dataset suggests that more than half of this sexual dimorphism results from an interaction between a QTL linked to canine Chromosome 15 and a locus adjacent to the CHM gene on the X chromosome [58]. In females, the haplotype associated with small size on canine Chromosome 15 is dominant, while in males it is the reverse: the haplotype associated with larger size is dominant. The introduction of the X chromosome locus complicates the story, but explains some curious observations as well. For instance, in any population of PWDs, there are always a small number of females who are as large as the largest males. Analysis of the X chromosome locus shows that females who are homozygous for both the CHM-linked marker and the haplotype associated with large size on canine Chromosome 15 will be, on average, as large as the largest males [58]. Modifier genes such as the one on the X chromosome, which by themselves do not have a detectable effect, would not be identified using traditional QTL methods, highlighting again the value of this approach.

Phenotypic Variation in the Dog
Very recently, efforts have been made to understand the apparent plasticity of the canine genome [59]. Fondon and colleagues hypothesized that length variation in repeats within coding sequences is a major source of phenotypic variation in dogs. As such, this mechanism would serve to generate dogs with novel morphologies faster than would otherwise be predicted. To test their hypothesis, they sequenced 37 repeat-containing regions from 17 genes that were known or predicted to have a role in craniofacial development in 92 breeds of dogs. They found that the repeats in the dog were changing faster in terms of length than comparable repeats in humans. They also analyzed three-dimensional models of dog skulls from 20 breeds and some mixed-breed dogs, and found that variation in the number of repeats in the coding regions of the ALX4 and RUNX2 genes was quantitatively associated with significant differences in limb and skull morphology. The authors argue that the incremental effects of repeat length mutations would be an efficient way to generate the rapid yet morphologically conservative changes that distinguish various breeds of dog. If correct, this hypothesis would be in striking contrast to a commonly held view that variation arises largely from modification of gene regulatory sequences, such as transcriptional control elements. Several avenues of experimentation are suggested by this work, including studies of additional genes and more phenotypic measures. However, the initial data provide a starting point for relating novel features of the canine genome to repeated observations regarding rapid creation of morphologically distinct breeds.

Population Structure and Linkage Disequilibrium in the Domestic Dog
Investigation into the genetic relationships between dog breeds is an area of explosive recent growth that holds great promise. Initial studies of breed relationships were highly focused. In a study of five Finnish breeds, Koskinen et al. reported that the phylogenetic distances between breeds were greater than those typically seen between human populations [60]. In addition, individual dogs from these five breeds could be correctly assigned to their breed of origin by analyzing allele patterns associated with small numbers of microsatellites [61]. Irion et al. analyzed a panel of microsatellites in 28 breeds [62]. Their analysis methods, based largely on neighbor-joining trees, revealed the important fact that little higher order structure could be found to describe the relationships between most breeds.
Figure 2 (caption fragment). The blocks are divided into six stacks indicating the percent of overall registrations acquired by that breed, as listed on the x-axis. Above each column is the percent of total registrations for all breeds in that category. Registration statistics can be found at http://www.akc.org/reg/dogreg_stats.cfm.
A subsequent and larger study from our own group, which entailed genotyping five unrelated dogs from each of 85 breeds with 96 microsatellite markers, was undertaken in 2004 [63]. Assignment tests using the computer program Doh demonstrated that dogs could be correctly assigned to their breed of origin 99% of the time [63]. The majority of variation observed in the dog rests in the differences that separate breeds. In fact, 27% of genetic variation that exists in the dog is found when comparing breeds, whereas 5%-10% of all human variation is found between populations and races [63]. To determine how to best harness the power of canine population structure for mapping, studies of linkage disequilibrium (LD) in the dog have been undertaken [1,64,65,66]. Hyun et al. found that LD extended for 33 centimorgans around the copper toxicosis disease locus in Australian Bedlington terriers. The authors suggest, however, that because the study focused on a single disease locus, already identified by linkage studies, the conclusions probably cannot be readily generalized to the rest of the genome. The study of Lou et al. is based on analysis of a 10-centimorgan microsatellite scan in a single crossbred pedigree, and identified LD spanning five to ten centimorgans. A particular strength of the paper is its clear description of the nuances of analyzing LD using multiallelic markers. The authors, however, suggest that since only a single pedigree was analyzed, much larger studies involving larger numbers of both dogs and breeds need to be undertaken to develop a clear picture of LD in the dog.
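The breed assignment tests mentioned above can be sketched in a few lines. This is a hedged illustration of the general likelihood idea only; it is not the algorithm implemented in Doh, and the breed names, allele sizes, function names, and pseudocount are all hypothetical. Each reference breed gets per-locus allele frequencies, and a query dog is assigned to the breed under which its multilocus genotype is most probable, assuming Hardy-Weinberg proportions and independent loci.

```python
import math
from collections import Counter


def build_reference(breeds):
    """breeds: dict mapping breed name -> list of genotypes.

    A genotype is a list of (allele, allele) tuples, one per locus.
    Returns per-breed allele counts and the set of alleles seen per locus.
    """
    n_loci = len(next(iter(breeds.values()))[0])
    alleles = [set() for _ in range(n_loci)]
    counts = {}
    for breed, dogs in breeds.items():
        per_locus = [Counter() for _ in range(n_loci)]
        for dog in dogs:
            for i, (a, b) in enumerate(dog):
                per_locus[i].update([a, b])
                alleles[i].update([a, b])
        counts[breed] = per_locus
    return counts, alleles


def log_likelihood(dog, per_locus, alleles, pseudocount=1.0):
    """Log probability of a genotype given one breed's allele counts."""
    total = 0.0
    for i, (a, b) in enumerate(dog):
        k = len(alleles[i] | {a, b})  # alleles known at this locus
        n = sum(per_locus[i].values()) + pseudocount * k
        # The pseudocount keeps unseen alleles from giving zero probability.
        pa = (per_locus[i][a] + pseudocount) / n
        pb = (per_locus[i][b] + pseudocount) / n
        total += math.log(pa * pb if a == b else 2 * pa * pb)
    return total


def assign(dog, counts, alleles):
    """Return the reference breed with the highest genotype likelihood."""
    return max(counts, key=lambda br: log_likelihood(dog, counts[br], alleles))


# Hypothetical two-locus microsatellite genotypes (allele sizes in bp).
toy_breeds = {
    "breed_A": [[(100, 100), (200, 202)], [(100, 102), (200, 200)]],
    "breed_B": [[(110, 110), (210, 210)], [(110, 112), (210, 212)]],
}
counts, alleles = build_reference(toy_breeds)
print(assign([(100, 100), (200, 200)], counts, alleles))
```

With strongly differentiated breeds, as the 27% between-breed figure suggests, even a modest number of markers separates the likelihoods clearly. In practice the query dog should be left out of its own breed's reference frequencies to avoid biasing the test.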
There are over 150 breeds recognized in the United States by the American Kennel Club. The top ten most popular breeds account for more than half of all registrations, while more than 100 of the more uncommon breeds account for less than 15% of the total (Figure 2). This range in population sizes is representative of a variety of breed histories. The LD studies of Sutter et al. and Lindblad-Toh, Wade, and collaborators were designed to be more widely applicable to the general population of purebred dogs [1,66]. Sutter and colleagues used 189 single nucleotide polymorphisms (SNPs) to examine 20 unrelated dogs from each of five breeds at five loci. They found a 10-fold difference in extent of LD in breeds that range from rare to popular, and whose population histories feature a range of popular sire and bottleneck effects [66]. These results were corroborated and extended in a much larger study by Lindblad-Toh et al. Using ten breeds and nearly 1,300 SNPs, the investigators were able to dissect the underlying haplotype structure of the dog genome in addition to measuring the extent of LD [1]. Both studies conclude that breed choice will have a profound effect on the number of markers required to complete whole genome association studies, and care should be taken when selecting breeds for the initial mapping stage. In addition, because of breed architecture, considerably fewer SNPs will be needed for mapping traits in dogs than in humans [1,66].
Figure 3 (caption). The dataset includes five unrelated dogs from each of the 85 breeds that have been genotyped using 96 (CA)n repeat-based microsatellites that spanned the dog genome at an average density of 30 megabases. Clusters were obtained using the computer program Structure [69], which implements a Bayesian model-based clustering algorithm that attempts to identify genetically distinct subpopulations based on patterns of allele frequencies. The work is described in detail in [63]. Four distinct clusters described by Parker et al. are depicted as colored circles: cluster one is yellow, cluster two is blue, cluster three is green, and cluster four is red. Breeds associated with each cluster are listed within the appropriate circle, and examples of breeds are shown in the pictures. Some breeds show patterning similar to more than one cluster, and are listed in the overlapping space. Analysis is ongoing to expand the number of breeds in the dataset and to refine the clusters.
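As a concrete view of what measuring LD between two SNPs involves, the sketch below computes the standard pairwise statistics D and r² from phased haplotypes. It is a minimal illustration under stated assumptions: alleles are coded 0/1, phase is known, and the haplotypes are hypothetical toy data rather than results from any study cited here.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) tuples,
    with alleles coded 0 or 1 and phase assumed known.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d, d * d / denom


# Hypothetical haplotypes: complete LD versus no LD.
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_stats(complete_ld))
print(ld_stats(no_ld))
```

Plotting r² against the physical distance between SNP pairs, separately for each breed, is one common way to summarize how far LD extends; breeds with long-range LD need fewer markers for an initial genome scan but give coarser localization.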
One additional way to improve power for fine mapping is to combine data across breeds. To determine the ancestral relationship between breeds, Parker et al. used the same dataset as described previously to perform an unsupervised clustering analysis with the computer program Structure [63]. The 85 breeds were ordered into four clusters, generating a new canine classification system for dog breeds based on similar patterns of alleles, presumably from a shared ancestral pool (Figure 3) [4]. Cluster one comprised dogs of Asian and African origin, as well as gray wolves. Cluster two was made up of mastiff-type dogs, largely sharing a common theme of big, boxy heads and strong, sturdy bodies. The third and fourth clusters split a group of herding dogs and sight hounds away from the general population of modern hunting dogs including terriers, hounds, and gun dogs.
The Parker clusters offered the first look at relationships between breeds and, in doing so, suggested study designs for trait mapping. For example, Modiano and colleagues have sought to determine the origin of B and T cell lymphomas in dogs [67]. They found that while B cell lymphomas are most common overall, rates of T cell lymphoma are significantly higher in breeds from Parker cluster one, the Asian cluster, than in any other group. This suggests an ancestral cause of T cell lymphoma in Asian dogs, while arguing against a single ancestor for B cell lymphoma in any other group. The optimal mapping study for T cell lymphomas would, therefore, focus on dogs from the Asian group. Also of interest is the work of Neff and colleagues, who describe a single haplotype surrounding the multidrug resistance gene MDR1 in nine breeds [68]. The nine breeds represented a range of herding dogs and sight hounds that presumably shared a single common ancestor, which again suggests a strategy for mapping studies involving this set of breeds.
While understanding the relationships between breeds will assist in minimizing the task of mapping multigenic diseases, moving from locus to gene remains a daunting task [66]. Both Sutter et al. and Lindblad-Toh, Wade, and collaborators have undertaken studies to determine how haplotype analysis can facilitate such efforts [1,66]. Using their respective datasets, both studies demonstrate high haplotype sharing between breeds and low haplotype diversity within breeds. Thus, disease alleles will be most easily identified by comparison of haplotypes that are identical by descent in affected dogs from two or more breeds. Data from additional breeds can then be used for fine resolution mapping. The recent availability of 2.1 million SNPs (http://www.broad.mit.edu/mammals/dog/snp/) from the canine genome sequencing project will greatly enhance such studies [1].

Conclusion
When describing a dog, Mark Twain once wrote, "The dog is a gentleman; I hope to go to his heaven, not man's." Twain's simple comment reflects both our admiration for the loyalty, integrity, and devotion we have come to expect from our closest companions, and our desire to keep them ever at our side. In the last few years, as summarized here, the canine genome project has worked tirelessly to develop resources and paradigms that will lead to both the improvement of animal and human health and an understanding of the genetics that regulates variation between breeds. A great deal remains to be learned. We still don't know why Great Danes are big and Pekingese are small, or why herding dogs herd and pointing dogs point. But in another sense, we have succeeded beyond our wildest dreams, as the dog is now a viable system in which to tackle problems relating to the genetics of complex traits.