Beyond the MHC: A canine model of dermatomyositis shows a complex pattern of genetic risk involving novel loci

Juvenile dermatomyositis (JDM) is a chronic inflammatory myopathy and vasculopathy driven by genetic and environmental influences. Here, we investigated the genetic underpinnings of an analogous, spontaneous disease of dogs also termed dermatomyositis (DMS). As in JDM, we observed a significant association with a haplotype of the major histocompatibility complex (MHC) (DLA-DRB1*002:01/-DQA1*009:01/-DQB1*001:01), particularly in homozygosity (P-val = 0.0001). However, the high incidence of the haplotype among healthy dogs indicated that additional genetic risk factors are likely involved in disease progression. We conducted genome-wide association studies in two modern breeds having common ancestry and detected strong associations with novel loci on canine chromosomes 10 (P-val = 2.3X10-12) and 31 (P-val = 3.95X10-8). Through whole genome resequencing, we identified primary candidate polymorphisms in conserved regions of PAN2 (encoding p.Arg492Cys) and MAP3K7CL (c.383_392ACTCCACAAA>GACT) on chromosomes 10 and 31, respectively. Analyses of these polymorphisms and the MHC haplotypes revealed that nine of 27 genotypic combinations confer high or moderate probability of disease and explain 93% of cases studied. The pattern of disease risk across PAN2 and MAP3K7CL genotypes provided clear evidence for a significant epistatic foundation for this disease, a risk further impacted by MHC haplotypes. We also observed a genotype-phenotype correlation wherein an earlier age of onset is correlated with an increased number of risk alleles at PAN2 and MAP3K7CL. High frequencies of multiple genetic risk factors are unique to affected breeds and likely arose coincident with artificial selection for desirable phenotypes. Described herein is the first three-locus association with a complex canine disease and two novel loci that provide targets for exploration in JDM and related immunological dysfunction.

Juvenile dermatomyositis (JDM) is a chronic inflammatory myopathy and vasculopathy driven by genetic and environmental influences. Here, we investigated the genetic underpinnings of an analogous, spontaneous disease of dogs also termed dermatomyositis (DMS). As in JDM, we observed a significant association with a haplotype of the major histocompatibility complex (MHC) (DLA-DRB1*002:01/-DQA1*009:01/-DQB1*001:01), particularly in homozygosity (P-val = 0.0001). However, the high incidence of the haplotype among healthy dogs indicated that additional genetic risk factors are likely involved in disease progression. We conducted genome-wide association studies in two modern breeds having common ancestry and detected strong associations with novel loci on canine chromosomes 10 (P-val = 2.3X10 -12 ) and 31 (P-val = 3.95X10 -8 ). Through whole genome resequencing, we identified primary candidate polymorphisms in conserved regions of PAN2 (encoding p.Arg492Cys) and MAP3K7CL (c.383_392ACTCCACAAA> GACT) on chromosomes 10 and 31, respectively. Analyses of these polymorphisms and the MHC haplotypes revealed that nine of 27 genotypic combinations confer high or moderate probability of disease and explain 93% of cases studied. The pattern of disease risk across PAN2 and MAP3K7CL genotypes provided clear evidence for a significant epistatic foundation for this disease, a risk further impacted by MHC haplotypes. We also observed a genotype-phenotype correlation wherein an earlier age of onset is correlated with an increased number of risk alleles at PAN2 and MAP3K7CL. High frequencies of multiple genetic risk factors are unique to affected breeds and likely arose coincident with artificial selection for desirable phenotypes. Described herein is the first three-locus association with a complex canine disease and two novel loci that provide targets for exploration in JDM and related immunological dysfunction. PLOS

Introduction
Juvenile dermatomyositis (JDM) is an autoimmune vasculopathy that causes a characteristic skin rash (heliotrope rash across the eyelids and Gottron's papules on the bony prominences) and proximal muscle weakness [1]. It is the most frequently diagnosed childhood inflammatory myopathy, comprising 80% of all cases [1] and affecting 3.2 in every million children between the ages of 2 and 17 within the USA [2]. Prognosis is positively correlated with early diagnosis and swift treatment with corticosteroids and/or immunosuppressants [1,3]. While treatment of JDM is much improved overall, many children still suffer from chronic flare-ups [1]. Though the etiology is unknown, JDM is thought to be triggered by exposure to a virus or other environmental agent. Manlhiot et al. [4] reported that 71% of JDM patients had a clinical history consistent with infection prior to symptoms. Investigations into the class II major histocompatibility complex (MHC), TNF, and IL1 identified several susceptibility and protective alleles, but their collective contribution to pathogenesis is poorly understood [5][6][7][8]. Recent genome-wide association studies (GWASs) to identify additional susceptibility loci in JDM confirmed a strong association with the MHC but failed to detect novel major risk factors, likely because of a paucity of biological samples and genetically heterogeneous populations [9,10].
In domestic dogs, an inflammatory vasculopathy of the skin and muscle, also termed dermatomyositis (DMS), is clinically, histologically, and immunologically similar to JDM and provides the only animal model available to study genetic risk factors [11][12][13][14][15][16]. The earliest clinical signs of DMS are crusting and scaling on the face, ears, tail tip, and/or across the bony prominences of the limbs and feet [17][18][19] (S1 Fig). Alopecia and more extensive skin lesions may develop over time, resulting in dermal scarring associated with erythema and mottled pigmentation [17,19]. Lesions persist for weeks to months, and may or may not chronically recur [17]. Muscle wasting manifests as atrophy of the head musculature; difficulty eating, drinking, and swallowing; and an atypical, high-stepping gait [17,19].
Similar to JDM, DMS is an immune-mediated disease [13,18,20] that typically develops following an environmental trigger, such as vaccination or viral infection, and is exacerbated by subsequent stressors like exposure to UV light [13,17,21,22]. Anecdotal reports indicate that rabies vaccination, parvovirus infection, owner surrender, or mistreatment commonly precede disease onset. Consistent with an environmental trigger, age at onset is variable with many cases occurring between seven weeks and six months of age, but others not developing until well into adulthood [17][18][19]23].
DMS is diagnosed almost exclusively in the genetically [24] and phenotypically similar collie and Shetland sheepdog breeds, suggesting the presence of a strong heritable component(s) arising from ancestors common to both breeds. A 1980s study of disease transmission in the collie eliminated simple Mendelian modes of inheritance [14]. In two test matings, an affected male collie sired litters from an affected collie and a healthy Labrador retriever. All six collie puppies were affected with variable degrees of severity, while three of the four mixed breed puppies developed milder forms of DMS. Retrospective pedigree analyses of the collie sire and dam showed a complete absence of affected ancestors [14].
The availability of a naturally-occurring canine model provides a new opportunity for the identification of genetic risk factors of JDM. The conserved genomic backgrounds of genetically isolated dog breeds have enabled detection of risk loci in complex diseases that are often obscured by heterogeneity within human cohorts [25][26][27][28][29][30]. Here, we evaluated class II MHC haplotypes, performed multibreed GWASs, and generated whole genome and transcriptome sequencing data to dissect the genetic basis of DMS. We uncovered common polymorphisms of the MHC and two novel loci that are strongly associated with DMS, as well as patterns of allelic inheritance that explain 93% of cases studied. A genetic test is now available to determine the likelihood of a dog developing DMS and to facilitate breeding decisions that avoid progeny having high-risk genotypes.

Results and discussion
Association of a major histocompatibility complex haplotype Given the involvement of MHC genes in JDM, we first determined alleles of the highly polymorphic canine MHC class II dog leukocyte antigen (DLA) genes: DLA-DRB1, -DQA1, and -DQB1. Two locus (DLA-DRB1 and -DQB1) and three locus (DLA-DRB1, -DQA1, and -DQB1) haplotypes were first generated for 50 collies and 117 Shetland sheepdogs, respectively. Because all observed haplotypes contained a unique DLA-DRB1 allele, the 355 remaining dogs were genotyped for this locus only and the haplotype was inferred (Table 1).
We conducted an independent GWAS for each breed, using a total of 97 cases (31 collies, 66 Shetland sheepdogs), 68 controls (23 collies, 45 Shetland sheepdogs), and 98,520 SNPs after filtering. In collies, a single signal (P-val = 1.47X10 -8 ) composed of 17 SNPs at the centromeric end of chromosome 10 exceeded Bonferroni significance ( Fig 1A). In Shetland sheepdogs, this  association was replicated (P-val = 2.56X10 -7 ), and a second signal (P-val = 1.83X10 -9 ) composed of 11 SNPs surpassing Bonferroni significance was detected on chromosome 31 ( Fig  1B). Both associations persisted in a combined breed analysis (chr10: P-val = 2.3X10 -12 , chr31: P-val = 3.95X10 -8 ) (S1 Table, S2 Fig); although the breeds possessed a common haplotype on chromosome 31, the Shetland sheepdogs appeared to drive this association. No associated SNPs were detected near the MHC loci on chromosome 12, likely a result of high homogeneity in our cohort and poor SNP coverage on the array [25]. On chromosome 10, 97% of all affected dogs were homozygous or heterozygous for the risk alleles of the lead SNPs. On chromosome 31, 88% of affected Shetland sheepdogs, but only 39% of affected collies, shared the risk alleles of the lead SNPs. As neither locus appeared to be independently necessary for disease development, we surveyed the extent of regional linkage disequilibrium (LD) to demarcate candidate intervals of~1.33 Mb on chromosome 10 ( Fig  2A) and~696 kb on chromosome 31 (Fig 2B), harboring~65 and 6 genes, respectively. The large size of the chromosome 10 region is attributed to lower recombination rates near the centromere and a dearth of informative SNPs.

Identification of candidate variants in PAN2 and MAP3K7CL
Whole genome resequencing was performed for four affected dogs (three collies and one Shetland sheepdog) and two unaffected collies, resulting in 17X to 41X coverage. Variants were filtered for those lying within our delineated intervals (chr10:1-1,333,693; chr31:24,026,411-24,722,836) and following the inheritance pattern of the most significantly associated SNPs in the affected dogs (S2 and S3 Tables). Five intergenic and two intronic variants were unique to these breeds (i.e., not present in the Boxer reference genome, dbSNP, or 27 whole genome sequences from 16 other breeds); however, most were in repetitive regions and none were in conserved positions. Thus, the pathogenic variants were likely to be common polymorphisms, so we next prioritized variants within predicted exons and splice sites of genes expressed in skin for further study.
To confirm exon/intron boundaries predicted by Ensembl 79 and establish expression of variants in affected tissue, we generated RNAseq data. We obtained a minimum of 89 million reads per tissue from active skin lesions of two affected dogs (one collie and one Shetland sheepdog) and skin from one unaffected Australian shepherd dog. All genes expressed in skin within the candidate regions were also expressed in affected tissues. Seventeen exonic variants were expressed; seven of these were nonsynonymous and evaluated using in silico programs [31][32][33] (Table 2).
We genotyped a subset of our mapping population for three nonsynonymous SNPs on chromosome 10 (ANKRD52 g.565958G>C, PAN2 g.627760G>A, and STAT6 g.1239562G>A) that were predicted to be deleterious or probably damaging by more than one in silico program. The ANKRD52 and PAN2 variants were more strongly associated with DMS than the lead SNP. These variants were in perfect linkage disequilibrium with each other; however, PAN2 g.627760G>A was assigned damaging scores with higher confidence by in silico programs (Table 2). We therefore focused further studies on PAN2 g.627760G>A, encoding p.Arg492Cys (XP_531635.3), although ANKRD52 cannot be excluded. On chromosome 31, we genotyped Shetland sheepdogs for a SNP (g.24132343A>C) and an indel (c.383_392ACTCCACAAA>GACT, XM_846337.4), both located in a 5 0 non-coding exon of MAP3K7CL. Only the indel was associated with DMS ( Table 2,  In an expanded, combined population (132 affected and 390 unaffected collies and Shetland sheepdogs), both the PAN2 (P-val = 2.08X10 -35 ) and MAP3K7CL (P-val = 1.45X10 -33 ) variants were highly associated with DMS (S4 Table).

Analysis of three-locus genotypes reveal gene-gene interactions
We next considered three-locus genotypes in our expanded, combined population (132 affected and 390 unaffected dogs) where A = the PAN2 variant encoding p.Arg492Cys, B = MAP3K7CL c.383_392ACTCCACAAA>GACT, and C = DLA-DRB1 Ã 002:01, lower-case letters denote wild type alleles (c represents any alternate allele of DLA-DRB1). Only 4% of dogs possessed a threelocus genotype with cc, barring further analysis of those nine genotypes. We considered nine of the remaining genotypes to be low-risk, as less than 6% of dogs with these allelic combinations had DMS (Table 3). Among healthy dogs (Fig 3A), the most frequently observed genotypes Table 3. Distribution of three-locus genotypes in 132 cases and 390 controls with penetrance, 95% confidence intervals, and P-values. were AabbCC (24%) and aabbCC (15%). Based on penetrance, we classified five genotypes as conferring moderate risk (33-50%) and four as high risk (90-100%) for DMS. All cases possessed at least two risk alleles and all but one were homozygous for at least one risk allele. The most common genotypes of affected dogs ( Fig 3B) were AaBBCC (20% of cases), followed by AAbbCC, AABbCC, and AABBCC (17% of cases each). Interestingly, only affected dogs possessed AABBCc or AABBCC (n = 29), indicating that DMS is fully penetrant in dogs having these combinations. Epistasis plots illustrated that genotypes with at least one a or b allele confer a lower probability of disease when a c allele is present, compared to their CC counterparts (Fig 3, compare 3C and 3D). The plots also depicted a greater probability of disease than expected under a strictly additive model, providing evidence for additive-by-additive epistasis between the chromosome 10 and 31 loci [53,54]. We noted at least one ARE in MAP3K7CL, presenting a mechanism for interaction with PAN2. No difference in gene interactions was observed between the sexes (S4 Fig). Information regarding age at onset or diagnosis was available for 91 dogs. We compared dogs having two, three, or four risk alleles across PAN2 and MAP3K7CL and observed an inverse correlation between age of onset and number of risk alleles (S5 Fig). Dogs having four risk alleles developed DMS at a significantly younger median age (5 months) than did dogs with only two risk alleles (18.5 months). The complete penetrance of AABB genotypes, combined with an early age of onset, suggest that these dogs may be hypersensitive to commonplace environmental stimuli (e.g., routine puppy vaccinations).

Collies and Shetland sheepdogs have uniquely high frequencies of associated alleles
All three identified variants associated with DMS are polymorphisms present in several breeds, raising the question: why are other breeds rarely, if ever, affected by DMS? We genotyped five or more unrelated individuals from each of 30 diverse breeds for all three loci (Fig 4). The only other breeds to possess all three risk alleles were Cardigan Welsh corgis and Cairn terriers. Three Jack Russell terriers had moderate-risk genotypes (AAbbCc), as did one Cardigan Welsh corgi (AABbCc); both breeds are occasionally diagnosed with dermatomyositis-like disease [55,56]. None of the 229 individuals possessed a high-risk genotype (S5 Table). Interestingly, Labrador retrievers had both B and C, which could have enabled moderate or high risk genotypes (AaBBCc or AaBBCC) in puppies from the outcross mating described by Haupt et al. [14].
Combined frequencies of risk alleles in other breeds were dramatically lower than those observed among collie and Shetland sheepdog populations, and homozygosity for a risk allele (a characteristic of all moderate-to high-risk genotypes) was rare. Additionally, breeds having a high frequency of one risk allele had few or no risk alleles at the other loci. For example, Cairn terriers had a high frequency of A (75%) but low frequencies of B (19%) and C (3%), and no high-or moderate-risk genotypes were observed among 18 individuals. These findings suggest that independently the polymorphisms are neither deleterious nor selected against.
It is likely that recent artificial selection for phenotypes shared by collies and Shetland sheepdogs led to increased frequencies of A. We propose that persistence of A in these two breeds is attributed to linkage disequilibrium (D 0 = 0.998) between wildtype PAN2 (a) and another chromosome 10 allele, Merle of PMEL. In heterozygosity, Merle causes a popular pigmentation pattern (see collie in Fig 1A), but homozygosity for the allele results in severe hypopigmentation often with auditory and ocular defects [58]. Wildtype PMEL occurred on chromosomes with either A or a, whereas Merle was only found in conjunction with a. Accordingly, the Merle phenotype was underrepresented in affected dogs (P-value = 0.0018), 64% of which were homozygous for A. Consistent selection for heterozygosity (but not homozygosity) for Merle would simultaneously encourage maintenance of both PAN2 alleles. To our knowledge, there are no loci on chromosome 31 under positive selection for maintenance of a characteristic phenotype of collies and/or Shetland sheepdogs.
Across five collie genomes, we observed a 1.  retained a less common second haplotype that permits heterozygosity at the DLA loci, associated with a lower risk for developing DMS. Ironically, this reduced risk may have masked the presence of otherwise high-risk genotypes (i.e., AABb and AaBB) and hindered selection against A and B alleles.
We observed striking allele frequency differences between the two affected breeds at PAN2 and MAP3K7CL: collies had a higher frequency of A, 42% (25% in Shetland sheepdogs), whereas Shetland sheepdogs had a higher frequency of B, 38% (5% in collies) (Fig 4). Consequently, the frequency of observed allelic combinations varied between the breeds. The most common genotypes in healthy dogs were aaBbCC in Shetland sheepdogs, 16% (5% in collies), and AabbCC in collies, 38% (11% in Shetland sheepdogs). Among affected dogs, AaBBCC predominated in Shetland sheepdogs, 25% (8% in collies), whereas AAbbCC was the most frequent genotype in diseased collies, 38% (8% in Shetland sheepdogs). The latter finding is interesting given that AAbbCC is only a moderate-risk genotype. Among collies having this genotype, 67% were unaffected by age 8, whereas only 30% of Shetland sheepdogs with this genotype were disease-free. This discrepancy in disease probabilities between breeds was unique to this genotype. Further studies will be necessary to determine if other loci confer additional risk for or protection from DMS.
The contribution of alleles from multiple loci explains the spontaneous appearance of the disease in lines with no prior history [14] and has hindered elimination of DMS in the absence of a genetic test. For example, a mating between two healthy dogs having low risk genotypes (e. g., AaBbCC x AaBbCC) can produce puppies with low, moderate, or high risk for DMS. This study has led to the first three-locus genetic test for a complex disease of dogs, which will allow breeders to carefully reduce the frequency of A and B among collie and Shetland sheepdog populations while preserving genetic diversity and desirable breed characteristics.
In a canine model of JDM, we have identified a complex pattern of causation involving three independent loci, two of which offer new targets for exploration in JDM. Furthermore, these data provide support for the involvement of genetic risk factors independent of the MHC in human inflammatory myopathies. While further experiments are necessary to determine the exact contribution of the chromosome 10 and 31 loci, our findings suggest that DMS may result from an inability to properly regulate inflammatory response. This work highlights the utility of the canine model for unraveling genetic susceptibility conferred by common polymorphisms and/or gene-gene interactions in complex diseases.

Ethics statement
All samples were obtained with informed consent according to protocols approved by the Clemson University Institutional Review Board (IBC2008-17) and IACUC (2012-039).
Study population. Three populations were assembled: DMS-affected dogs (92 Shetland sheepdogs, 40 collies), control dogs for GWAS (45 Shetland sheepdogs, 23 collies), and unaffected dogs (160 Shetland sheepdogs, 162 collies). Affected dogs were diagnosed either through histopathology (75 Shetland sheepdogs, 30 collies) or by a veterinarian based on clinical signs and elimination of demodectic mange, a differential diagnosis. Pedigrees were collected when available; some samples were obtained from affected dogs surrendered to rescue organizations without paperwork. Twenty percent of affected dogs were collected internationally, and in each case a regionally-matched control sample was obtained.
Control dogs for GWAS were eight years of age or older at time of collection, had no known family history of DMS, had no personal history of skin disease, and were unrelated to other study participants within two (most often three) generations. Pedigrees were obtained from all control dogs. The population of unaffected dogs had no clinical signs of DMS and were eight years of age or older at time of collection. This subset was collected without regard to family history of DMS, presence of other skin disorders, or relationship to other study participants. Archival samples from 229 dogs of 30 other breeds were not closely related to each other to our knowledge.
Whole blood or buccal cells were obtained from each dog, and genomic DNA was isolated according to the Puregene DNA Isolation protocol (Gentra). Skin punch biopsies from active lesions or healthy tissue were also obtained from one Shetland sheepdog, one collie, and one Australian shepherd dog and preserved in RNAlater (Ambion).
DLA class II genotyping. The hypervariable regions of DLA-DRB1, -DQA1 and -DQB1 were sequenced and genotyped according to protocols previously described [59][60][61]. DLA-DQA1 was largely uninformative in collies and was inferred. Association of a haplotype or allele state with DMS was assessed through Fisher's exact tests, using VassarStats (Web Site for Statistical Computation, Vassar College, Poughkeepsie, NY).
Genome-wide association and LD analyses. Genotyping was performed for 166 dogs (85 males and 81 females) using the Illumina CanineHD BeadChip, containing 173,662 SNPs. One sample having call rates <95% was excluded. SNPs having call rates <95%, minor allele frequencies <5%, and/or significant deviation from Hardy-Weinberg equilibrium (P-value <0.0001) in the control dogs were excluded. No evidence of genomic inflation was observed in the combined Shetland sheepdog and collie analysis (λ = 1.04). Fisher's exact P-values were calculated under a dominant model. LD pairwise analysis was performed to calculate r 2 values, which were plotted using Locus-Zoom [62]. We calculated r 2 values between SNPs on chromosome 10 using all controls. For chromosome 31, r 2 values were calculated using only control Shetland sheepdogs because no association was detected in collies alone. Assuming that the pathogenic variant would be in high LD with the lead SNP, candidate regions were characterized by SNPs with pairwise r 2 ! 0.6 and defined by the first flanking associated SNPs in lower LD. Filtering and statistical analyses were conducted with SNP & Variation Suite v8 (SVS, Golden Helix). All chromosome positions throughout the text are reported in CanFam3.1.
Whole genome resequencing. Three affected and two control collies were selected for whole genome resequencing along with one affected Shetland sheepdog. Genomic DNA fragments of approximately 500bp were gel size selected for each sample and sequenced on two lanes of an Illumina HiSeq 2000, generating 2x100 (collies) and 2x125 bp (Shetland sheepdog) paired-end reads. 531 to 997 million total reads were generated for the five collies. Over one billion total reads were generated for the Shetland sheepdog. Paired reads were aligned to the indexed reference (CanFam3.1) with Bowtie2 [63] under sensitive parameters. The alignments were sorted and indexed with SAMtools [64,65] to be visualized in the Interactive Genomics Viewer [66].
RNA isolation and sequencing. Total RNA was extracted from 30-40mg of skin punch biopsy tissue from active lesions (collie and Shetland sheepdog) or healthy skin (Australian shepherd) using the ToTALLY RNA kit (Ambion), according to the manufacturer's protocols. RNA samples were treated with DNase to remove DNA contamination, using the DNA-free kit (Ambion). Samples were quantitated on a NanoDrop 1000 spectrophotometer (Fisher Scientific).
Three RNAseq libraries were constructed per dog using normalized total RNA and the manufacturer's protocol for one of the following: TruSeq RNA library prep kit v2.0 (Illumina) or TruSeq stranded total RNA library prep kit (Illumina). An Agilent Bioanalyzer 2100 was used for size validation. Each sample was sequenced at 2x125 bp paired-end on an Illimina HiSeq 2500 to a depth of at least 22 million reads.
FastQC from the Babraham Institute was used to assess read quality before and after preprocessing by Trimmomatic [67], which removed low quality bases and remaining sequence adapters. Trimmed reads for each sample were aligned to CanFam3.1 using gsnap [68], and SAMtools [64,65] was used to generate sorted and indexed bam files.
Variant filtering and genotyping. SAMtools and BCFtools [64,65] were used to generate variant call files for each sample, which were analyzed in SVS. On chromosome 10 (1-1,333,693 bp) variants that were heterozygous in affected collies 1 & 2 and homozygous in affected collie 3 and the affected Shetland sheepdog (i.e., reference/reference or alternate/alternate), consistent with the allele states at the lead SNPs, were selected. On chromosome 31 (24,026,722,836 bp), all homozygous variants in the Shetland sheepdog were selected. Ensembl 79 was used to identify variants lying within predicted exons and 10 bp flanking sequences to capture splice sites. RNAseq data were manually inspected in IGV to determine whether predicted exonic variants were expressed in the affected dogs. Alternate variants were investigated using dbSNP and 27 genomes of 16 other breeds (either sequenced as part of ongoing studies or shared by other research groups) to determine whether any were unique to collies and Shetland sheepdogs. SIFT [31], PolyPhen [32], and PANTHER [33] were used to predict the impact of the amino acid substitutions. Genotyping of variants in additional dogs was accomplished by restriction digest assays or Sanger sequencing (S6 Table).
Age of onset. Because age is not distributed as a normal random variable, we made use of the Weibull distribution, a density commonly used for time-to-event data [69]. Our goal was to estimate the mean and median age of onset, along with their confidence intervals, to facilitate comparisons across the three genotypic groups.
Taking advantage of a Bayesian strategy, we assume y ij~W eibull (r, e À b i ), where y ij is the observed age at diagnosis for the j-th (j = 1, 2, . . ., n i ) dog in the i-th (i = 1,2, . . .,3) genotypic class and b i is the effect of the i-th genotypic class. With this representation of the scale (r) and shape (e À b i ) parameters of the Weibull density, the median of the i-th genotypic class is e À b i log (2) 1/r and the mean of the i-th genotypic class is e À b i Γ(1 + 1/r).
In addition, we assumed a prior density for the unknown genetic parameters (b i ) to have a null mean and common variance σ (i.e., (b i~N (0, σ)). We consider the weakly informative prior for σ~Cauchy(0,25) [70]. For the scale parameter, r, we assume the weakly informative prior of r~exponential(0.001). The Monte Carlo Markov Chain (MCMC) sampling process was run in 4 chains through the public-domain software Stan [71], with each chain based on 50,000 total samples, and the first 20,000 were removed as part of the warm-up process, then thinned to every 20th sample, resulting in a MCMC sample of 6,000 values [71]. Convergence to the posterior density was evaluated by the Gelman-Rubin test statistic, where values less than 1.05 indicate that the MCMC sampling process was adequate to the data evaluated [72].
Genetic interaction analyses. All affected (n = 132) and healthy (n = 390) dogs were used to investigate possible interactions between the PAN2 variant encoding p.Arg492Cys and MAP3K7CL c.383_392ACTCCACAAA>GACT in dogs homozygous for 002:01/009:01/001:01 vs. dogs heterozygous for the haplotype. Three-locus genotypes observed in fewer than five dogs were excluded. For a given disease state (cases vs. controls) and a given pair of genotypic classes i (i = A/A, A/a, or a/a) and j (j = B/B, B/b, or b/b), n cases ij~B inomial(n cases ij + n controls ij , p ij ) where n cases ij is the number of observed cases in the combination of the genotypic class of locus i and that of locus j, n controls ij is the number of unaffected dogs in the combination of genotypic class of locus i and that of locus j. Finally, p ij is the probability of disease for the combined genotypic classes of i and j.
Estimation of the unknown elements of our model must ensure estimates of p ij within the interval [0, 1], recognizing that several genotypic classes have zero or few cases. We utilized a hierarchical Bayesian framework facilitated through the public-domain software Stan [71] to address this problem [73]. Stan can be accessed through the public-domain language R [74]. Log-odds was used to estimate p ij , i.e., log(p ij /(1-p ij )) = intercept + addA i + domA i + addB j + domB j + add x add ij + add x dom ij + dom x add ij + dom x dom ij , where intercept represents a term common to all genotypic classes, addA i , addB j domA i and domB j represent the additive and dominance terms respectively for loci A and B, and add x add ij + add x dom ij + dom x add ij + dom x dom ij represent the four possible epistatic interaction terms for all possible additive and dominance combinations [53].
We assumed a prior density for the intercept and unknown genetic parameters, with null mean and common variance σ (i.e., N(0, σ)). Subsequently, we consider the weakly informative prior for σ~Cauchy(0,25) [70]. We used the same MCMC parameters here as described above.
Chromosome 12 selective sweeps. We identified all SNPs on chromosome 12 present in five collie genomes (3 cases and 2 controls) and used a creeping window size 1 Mb to identify runs of homozygosity [75]. Windows containing fewer than 50 SNPs were excluded and gaps >10kb between SNPs were ignored. The heterozygosity (Hp) statistic was calculated for all windows and Z-transformed, making the average Hp value equal to zero and the standard deviation equal to 1. The-ZHp distribution was plotted in R to show putatively selected regions greater than 3.4 standard deviations from the mean.
PMEL. D' between PMEL and PAN2 was calculated as a measure of LD using all dogs for which coat pattern phenotypes were available (112 cases, 385 controls). A total of 98 dogs (10 cases and 88 controls) were described as merle-patterned, a semi-dominant trait caused by the Merle (M) allele of PMEL, and assumed to possess the Mm genotype. Computation of D 0 , along with a test of significance from zero, was facilitated through the package genetics [76], a public domain program that is part of the R language [74].

Accession numbers
SNP data are available in dbSNP under BioProject number PRJNA338128. All whole genome and transcriptome data generated for this study were deposited in SRA genomes under accesssion number SRP081080. Accession numbers for eight of the 27 other breeds used in variant filtering are SRX1360633, SRX1360635, SRX1360637, SRX1360639, SRX1022256, SRX10222 62, SRX1022286, and SRP081080.