Merle phenotypes in dogs – SILV SINE insertions from Mc to Mh

It has been recognized that the Merle coat pattern in dogs is not only a visually interesting feature but also has an important biological role, in terms of hearing and vision impairments. In 2006, the Merle (M) locus was mapped to the SILV gene (aka PMEL), which carries a SINE element insertion, and the inserted retroelement was proven to cause the Merle phenotype. Mapping of the M locus was a genetic breakthrough, and many breeders started implementing SILV SINE testing in their breeding programs. Unfortunately, the situation turned out to be complicated. The genotypes of Merle-tested individuals did not always correspond to the expected phenotypes, sometimes with undesired health consequences in the offspring. Two variants of SILV SINE, allelic to the wild-type sequence, have been described so far: Mc and M. Here we report a significantly larger portfolio of existing Merle alleles (Mc, Mc+, Ma, Ma+, M, Mh) in Merle dogs, which are associated with unique coat color features and stratified health impairment risk. This refinement of allele identification was made possible by two things. The first was systematic, detailed observation of Merle phenotypes by many breeders worldwide in a cohort of 181 dogs from known Merle breeds. The second was advanced molecular technology that discriminates individual Merle alleles with significantly higher precision than previously available. We also show that mosaicism of Merle alleles is an unexpectedly frequent phenomenon, identified in 30 of 181 (16.6%) dogs in our study group. Importantly, not only major but also minor Merle alleles can be inherited by the offspring. Thus, mosaic findings cannot be neglected and must be reported to the breeder in their full extent. Most importantly, sperm cells seem to be a significant source of germline Merle allelic variants. These can be passed to the offspring in a Mendelian fashion and explain unusual genotype/phenotype findings in the offspring. 
In light of the negative health consequences that may be attributed to certain Merle breeding strategies, we strongly advocate implementing refined Merle allele testing for all dogs of Merle breeds. This would help breeders select suitable mating partners and produce healthy offspring.

The Merle (M) locus was proposed by Little [1] as being responsible for the Merle pattern. This is a coat color in which eumelanic regions are incompletely and irregularly diluted, leaving typical intensely pigmented patches. In 2006, the gene corresponding to the dominant allele of the M locus was finally identified by Clark et al. [12]. Earlier attempts had focused on factors secreted primarily by keratinocytes (cells surrounding melanocytes) that stimulate the switch between phaeomelanin and eumelanin production [13]. However, the search for candidate genes responsible for Merle phenotypes excluded the genes involved in the melanogenic pathway [14]. These include microphthalmia-associated transcription factor (MITF), tyrosinase (TYR), tyrosinase-related protein 1 (TYRP1), and dopachrome tautomerase (DCT).
The Merle pattern can be seen in various breeds, such as the Australian Shepherd Dog, Australian Koolie, Border Collie, Dachshund, French Bulldog, Louisiana Catahoula, Labradoodle, Miniature American Shepherd, Miniature Australian Shepherd, Pyrenean Shepherd, Rough Collie, Shetland Sheepdog, Welsh Sheepdog, Cardigan Welsh Corgi, Chihuahua, Great Dane, and others. Merle is thought to be inherited in an autosomal, incompletely dominant manner. Dogs heterozygous for the M allele show the typical coat pattern. Dogs homozygous for the M allele, however, may also exhibit auditory and ophthalmologic impairments and abnormalities, together with very pale or completely white coat phenotypes (Strain et al. 2009). These negative health effects associated with the M locus motivated research aimed at identifying the responsible gene. Clark et al. [12] confirmed the SILV (aka PMEL) gene as the cause of the Merle pattern, with a short interspersed element (SINE) inserted into the gene. They mapped the SINE insertion of approximately 253 bp to the intron 10/exon 11 border of SILV. A typical SINE element is composed of a body and a poly-A tail of variable length. It has been assumed that the length of the poly-A tail might be what determines this peculiar biological role, visually recognized as different qualities of the Merle coat pattern. It has been suggested that the poly-A tail, being a monotonous genomic structure, is prone to replication errors. Slippage of the cellular replication machinery in this challenging genomic context can lead to length differences among the resulting replicons. SINE insertions of different lengths have indeed been observed. The shorter SINE was ascribed to the allele Mc (Cryptic Merle), which has no apparent effect on the dog's phenotype. Longer SINE insertions, by contrast, were found to be responsible for the individual Merle phenotypes [12], [15].
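The relationship between insertion size and poly-A length described above can be made concrete with a small Python sketch. It rests on three simplifying assumptions that are not stated in the source:

- the insertion sequence is available 5' to 3', with the poly-A tail at its 3' end;
- the SINE body itself does not end in A;
- the sequences and function names are hypothetical.

Under these assumptions, two alleles sharing the same body differ in total length only by the length of their poly-A tails.

```python
import re


def poly_a_tail_length(insertion_seq: str) -> int:
    """Length of the run of A's at the 3' end of the insertion (0 if none)."""
    match = re.search(r"A+$", insertion_seq.upper())
    return len(match.group(0)) if match else 0


def split_sine(insertion_seq: str) -> tuple[int, int]:
    """Split an insertion into (body length, poly-A tail length).

    Hypothetical simplification: everything before the terminal A-run is
    treated as the SINE body.
    """
    tail = poly_a_tail_length(insertion_seq)
    return len(insertion_seq) - tail, tail


# Toy sequences (not real SILV SINE data): same body, different tails.
body = "GGCTCGCTGC"
short_allele = body + "A" * 12
long_allele = body + "A" * 30
print(split_sine(short_allele))  # (10, 12)
print(split_sine(long_allele))   # (10, 30)
print(len(long_allele) - len(short_allele))  # 18, equal to the tail difference
```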
This discovery enabled commercial testing of the Merle gene by laboratories worldwide. However, the occurrence and combinations of the m, Mc and M alleles could not explain various irregularities in dog phenotypes and unexpected breeding results. In 2011, the Catahoula Club EU in the Czech Republic started testing the Merle gene within the Czech population of the Louisiana Catahoula (LC) breed, in cooperation with Biofocus laboratory (Germany) [16]. LC is a North American breed famous for its Merle (leopard) pattern. In contrast to other Merle-carrying breeds, especially in Europe, no regulations on Merle-to-Merle matings had initially been applied in its breeding strategies. Therefore, many dogs carrying the Merle gene can be expected within the breeding pool. The pilot testing by Biofocus showed that many Catahoulas of unusual, apparently solid color carry Merle alleles, but with a SINE size different from those found by Clark et al. [12]. This middle-sized SINE allele was named "Atypical Merle" (Ma) to distinguish it from the alleles already known [16]. Further commercial testing started in 2015 and continues to date. It has involved a wider LC population and later also other breeds, and has confirmed the existence of Ma and its role in various specific and distinguishable Merle patterns [17].
Originally, only two Merle alleles had been recognized by Clark et al. [12], [15]. However, unusual or unpredictable breeding results, producing offspring with unexplained phenotypes, have confirmed the need for further research in this field. The length of the SINE seems to be the key factor explaining its different effects on the final coat pattern. However, commercial laboratories use different methods and instruments to determine SINE length. This can strongly influence the final results, their accuracy, and the ability to distinguish different alleles. Some laboratories have already acknowledged the existence of the Ma allele and have started reporting it. Others are still waiting for scientific confirmation, because no relevant study on this subject has been published yet. In addition, only a limited and/or biased selection of dogs and breeds has been available for testing in individual studies [18], [19]. Thus, it may be difficult to estimate the frequency of different Merle alleles in a given breed without a thorough population study of that breed. Such a study is usually beyond the capacity of any commercial laboratory, especially without close cooperation with breeders of the particular breeds. Collecting data from Merle testing alone is not sufficient to properly evaluate the relationship between genotypes and phenotypes without knowledge of the individual dogs and their relationships.
Thus, the aim of our study was to evaluate the phenotype/genotype correlations between individual Merle coat patterns and specific lengths of SINE insertions in the SILV gene. In other words, we set out to address genetically the range of Merle alleles we had anticipated phenotypically in Louisiana Catahoula dogs, as well as in other Merle breeds (Australian Koolie, Australian Shepherd, Border Collie, French Bulldog and others).
Herein we confirm that in many Merle breeds there exist significantly more Merle alleles than previously suggested. The individual Merle alleles can be distinguished by their specific SINE lengths, their corresponding unique Merle phenotypes, and breeding results. To better understand the complexity of coat color genetics in Merle breeds, we complemented SILV SINE genotyping with comprehensive Next Generation Sequencing (NGS) genotyping of all known major coat color loci and their modifiers. For this we used a test we developed, called SuperColorLocus (SCL).

Animals and sample collection
Biological material (buccal swab, hair, blood, sperm cells) was collected from 181 dogs (Canis lupus familiaris). The animals in the study were selected for their visually detectable Merle phenotypes and/or for pedigree relationships with Merle-expressing individuals or with individuals anticipated to carry a Merle allele. No dogs were housed for research purposes; all dogs were privately owned pets. Sampling was performed by veterinary practitioners or by owners following written recommendations from the laboratory. All samples were taken as part of routine commercial laboratory genetic testing relevant to the breeding programs of the individual breeds, as recommended by the individual breeding clubs. The owners contributed them voluntarily. All samples were taken non-invasively, causing no harm or pain to the animals. No ethics committee evaluation was applicable for this type of research. All owners signed a written informed consent for the research and for the publication of results and pictures of their dogs.

Isolation of DNA from biological samples
DNA was isolated from buccal swabs and sperm cells using the QIAamp DNA Mini Kit (Qiagen, DE) according to the manufacturer's instructions. For DNA isolation from hair, samples were incubated in lysis buffer for 12 hours before proceeding with the DNA isolation procedure.

SuperColorLocus testing
To build NGS libraries, the SCL multiplex PCR product was end-trimmed, adapter-ligated and barcoded using the NEBNext® Fast DNA Library Prep Set for Ion Torrent™ (NEB, USA) according to the manufacturer's instructions. The libraries were quantified using the Ion Univ Library Quant Kit (ThermoFisher Scientific, USA) and subjected to overnight emulsion PCR (emPCR) on the semi-automatic OT2 platform (Life Technologies). After bead enrichment (OT2 platform), an NGS sequencing chip (typically 314 or 316) was loaded and sequenced on the PGM (Life Technologies). Raw data were analyzed using in-house, ISO 17025-validated bioinformatic software for SNP genotyping, as follows:

1. Raw data were downloaded from the PGM server using a commercial Ion Torrent Suite plug-in (Life Technologies, USA) and automatically assembled according to the barcodes of the individual libraries sequenced.
2. The data were end-trimmed by bioinformatic removal of the P1/A adapter and barcode sequences.
3. The data were quality-filtered, removing polyclonal signals and critically short reads.
4. Processed reads that passed the default quality control were downloaded as FASTQ files and converted to FASTA.
5. The FASTA reads were BLAST-aligned (locally) to the reference sequences of the individual fragments harboring the mutations of interest.
6. Sequencing coverage was calculated for each base called, and fragments with a base coverage below 10 were excluded from the analysis.

Typical coverage, however, ranged from hundreds to thousands of reads per base, giving the SuperColorLocus assay diagnostic robustness and genotyping accuracy. Moreover, NGS data were compared with Sanger sequencing data during the ISO 17025 validation process, following the internal validation procedure for implementing new laboratory tests. The concordance of genotyping results between the two platforms was 100%.
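Three of the read-processing steps above can be sketched in plain Python: FASTQ to FASTA conversion, per-base coverage, and the coverage-10 exclusion rule. This is not the in-house ISO 17025-validated software. It is a minimal illustration resting on these assumptions:

- FASTQ records are well-formed four-line records;
- each BLAST alignment is represented simply as a 0-based, half-open (start, end) interval on the reference fragment;
- the exclusion rule means "any base below 10";
- all function names are hypothetical.

```python
from typing import Iterable, Iterator, Sequence

MIN_COVERAGE = 10  # fragments with a base coverage below 10 are excluded


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert four-line FASTQ records into two-line FASTA records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        record = [next(it, None) for _ in range(3)]  # sequence, '+', quality
        if None in record:
            raise ValueError("truncated FASTQ record")
        yield ">" + header.rstrip("\n")[1:]  # '@id' becomes '>id'
        yield record[0].rstrip("\n")         # quality line is dropped


def per_base_coverage(alignments: Iterable[tuple[int, int]], ref_length: int) -> list[int]:
    """Count how many aligned reads cover each reference position."""
    coverage = [0] * ref_length
    for start, end in alignments:
        for pos in range(max(start, 0), min(end, ref_length)):
            coverage[pos] += 1
    return coverage


def fragment_passes(coverage: Sequence[int], min_cov: int = MIN_COVERAGE) -> bool:
    """Keep a fragment only if every base reaches the minimum coverage."""
    return len(coverage) > 0 and min(coverage) >= min_cov


def concordance(calls_a: Sequence[str], calls_b: Sequence[str]) -> float:
    """Fraction of identical genotype calls between two platforms (e.g. NGS vs Sanger)."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(calls_a, calls_b)) / len(calls_a)
```

A 100% concordance, as reported for the validation, corresponds to `concordance(...)` returning 1.0 on the paired NGS and Sanger calls.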

Parentage testing
STR parentage analysis was performed using the Canine ISAG STR Parentage Kit (ThermoFisher Scientific, USA) according to the manufacturer's instructions. The system has been validated for forensic use in dog paternity testing and evaluates 21 ISAG-recommended polymorphic STR markers.
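The logic behind STR-based parentage verification can be illustrated with a minimal Mendelian-compatibility check. At every marker, the offspring must carry one allele that could come from the sire and the other from the dam. The sketch below is a simplification and not the kit's analysis software. Marker names and allele values are hypothetical. Real exclusion criteria, such as how many mismatching markers are tolerated before a parent is excluded, are set by the testing laboratory.

```python
Genotype = tuple[int, int]  # two STR allele sizes (repeat counts) at one marker


def inconsistent_markers(
    offspring: dict[str, Genotype],
    sire: dict[str, Genotype],
    dam: dict[str, Genotype],
) -> list[str]:
    """Return markers at which the offspring cannot carry one allele from each parent."""
    bad = []
    for marker, (a1, a2) in offspring.items():
        from_sire, from_dam = set(sire[marker]), set(dam[marker])
        ok = (a1 in from_sire and a2 in from_dam) or (a2 in from_sire and a1 in from_dam)
        if not ok:
            bad.append(marker)
    return bad


# Hypothetical two-marker example.
sire = {"MARKER_1": (12, 14), "MARKER_2": (8, 8)}
dam = {"MARKER_1": (13, 15), "MARKER_2": (9, 10)}
pup = {"MARKER_1": (14, 15), "MARKER_2": (8, 11)}
print(inconsistent_markers(pup, sire, dam))  # ['MARKER_2']
```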

Statistical data analysis
Data were obtained from 14 breeds, with varying numbers of individuals per breed. The more frequent breeds, LC (73), ASD (40), AK (23), BC (18), and D (9), were used for a detailed analysis, while data from less frequent breeds, such as RC (3) … Prevalence comparisons across all breeds could not be performed in all cases because of the small numbers of subjects in some breeds. Therefore, we show here the qualitative differences found within the more populated breeds included in the study.

Refined SILV SINE genotyping
SILV SINE fragments were separated in a denaturing polyacrylamide gel on the ABI3500 Genetic Analyzer to obtain the highest resolution possible with fragment analysis technology. The ABI3500 allowed us to discriminate alleles differing by 1 nucleotide in length, even for long PCR products (approx. 500 bp). Using this approach, we were able to confirm the presence of … The whole portfolio of seven Merle alleles was found in three breeds with larger numbers of dogs represented (LC, ASD, BC) (n ≥ 9, Table 1). Nevertheless, the length of particular Merle alleles did not differ significantly among the breeds tested.
For details on the lengths of the Merle alleles found in our study, please refer to S1 Table.
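Because the alleles differ only in SINE length, calling an allele from fragment analysis amounts to assigning each measured fragment size to a size window. The Python sketch below assumes one inclusive window per allele, ordered from Mc to Mh as in the text. The windows themselves must come from validated data such as S1 Table, and the function names are hypothetical. Sizes falling outside every window are returned as unassigned rather than forced into the nearest allele.

```python
from typing import Optional

# Alleles in order of increasing SINE length, as listed in the text.
ALLELE_ORDER = ["Mc", "Mc+", "Ma", "Ma+", "M", "Mh"]


def check_bins(bins: dict[str, tuple[float, float]]) -> None:
    """Raise if windows are inverted, overlapping, or out of the expected size order."""
    ordered = [bins[a] for a in ALLELE_ORDER if a in bins]
    for low, high in ordered:
        if low > high:
            raise ValueError(f"inverted window {low}-{high}")
    for (_, prev_high), (next_low, _) in zip(ordered, ordered[1:]):
        if next_low <= prev_high:
            raise ValueError("windows overlap or are not in Mc..Mh order")


def call_allele(size_bp: float, bins: dict[str, tuple[float, float]]) -> Optional[str]:
    """Return the allele whose inclusive size window contains size_bp, else None."""
    for allele in ALLELE_ORDER:
        if allele in bins:
            low, high = bins[allele]
            if low <= size_bp <= high:
                return allele
    return None
```

The 1-nt resolution reported for the ABI3500 matters for this kind of binning. If adjacent windows are only a few bases apart, coarser sizing would make neighboring alleles indistinguishable.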

Precision of the allelic size measurement
To assess the precision of allelic size measurement by ABI3500 fragment analysis, we evaluated the lengths of paternal and maternal Merle alleles as they passed to the offspring generation. … Table 1 summarizes the major phenotypic differences among the individual Merle alleles.
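The precision check described above reduces to comparing the size of an obligate parental allele with the size measured for the same allele in the offspring, then flagging any differences that exceed the measurement tolerance. A minimal sketch follows. The 2 bp default mirrors the 1-2 bp agreement reported in the Discussion, while the input pairs and the function name are hypothetical. A consistently positive mean shift across many transmissions would be one expected signature of poly-A extension.

```python
from statistics import mean


def size_shifts(pairs: list[tuple[float, float]], tolerance_bp: float = 2.0) -> dict:
    """pairs: (parent allele size, offspring allele size) in bp for obligate alleles.

    Returns the per-pair differences (offspring - parent), their mean, and the
    indices of pairs whose absolute difference exceeds tolerance_bp.
    """
    diffs = [child - parent for parent, child in pairs]
    return {
        "diffs": diffs,
        "mean_shift": mean(diffs),
        "flagged": [i for i, d in enumerate(diffs) if abs(d) > tolerance_bp],
    }


# Hypothetical transmissions; only the 4 bp shift in the third pair is flagged.
print(size_shifts([(100.0, 101.0), (120.0, 119.5), (130.0, 134.0)]))
```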

Abundance of the individual Merle alleles
We calculated the relative frequency of the individual Merle alleles in our cohort of 181 dogs. As shown in Fig 4, across all breeds tested, dogs heterozygous for the wild-type m allele were the most frequent (approximately one third of all individuals tested), followed by M, Mh, and others (Fig 4A). However, marked differences in allele abundance were revealed among the five most populated breeds (Fig 4B). LC carried Ma, M, and Ma+ (50.7, 39.7, and 34% of tested dogs, respectively), while the m allele was present in only 34% of the individuals tested. In contrast, most AK, BC, D, and ASD dogs were heterozygous for the m allele, which was carried by 100, 100, 78, and 60% of the tested dogs, respectively. In ASD, besides m and Mh (60% each), Mc (45%) was the most frequent allele. In BC, Ma+ (44%) was the most frequent allele, followed by m, Mh (33%), and Mc (22%). In our cohort, D were free of the Mc, Mc+, and Ma alleles, while Ma+ (78%) was the most abundant allele, followed by M (44%).
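The percentages above are carrier frequencies, that is, the share of tested dogs in which an allele was detected. Within a breed they can therefore sum to more than 100%, because each dog carries two alleles and mosaic dogs carry more. A minimal Python sketch of this calculation, using hypothetical genotypes rather than study data:

```python
from collections import Counter


def carrier_frequencies(genotypes: list[tuple[str, ...]]) -> dict[str, float]:
    """Percentage of dogs in which each allele was detected at least once."""
    n = len(genotypes)
    counts = Counter(allele for dog in genotypes for allele in set(dog))
    return {allele: 100.0 * c / n for allele, c in counts.items()}


# Hypothetical breed sample of four dogs; the last one is a mosaic.
sample = [("m", "M"), ("m", "Mh"), ("M", "M"), ("Ma", "Ma+", "M")]
freqs = carrier_frequencies(sample)
print(freqs["M"], freqs["m"])      # 75.0 50.0
print(round(sum(freqs.values())))  # 200, i.e. more than 100
```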
The frequencies of all individual alleles in all breeds tested are summarized in Fig 5. The Mc allele was found in eight breeds and the Mc+ allele in six breeds. The Ma allele was detected in only three breeds (Fig 4B), while the Ma+ allele was present in seven breeds (S1 Table).
The Mh allele was found in 8 of the 14 breeds tested (Fig 6). In MAUS, WS, and SSD (n = 2-3), Mh was carried by 50-100% of tested dogs. Apart from these, the highest frequency of Mh was observed in ASD (60%), followed by AK (43.5%) and BC (33%). In LC, on the other hand, the most populated breed in our study, only 11% of the individuals tested carried Mh.

Merle mosaicism
Mosaicism of the Merle gene seems to be quite widespread among Merle breeds. It is characterized by the presence of more than two M locus alleles in the tested sample. We found that 16.6% of all dogs tested show Merle mosaicism, harboring 3 or more different M locus alleles in the tested sample. The semi-quantitative ABI3500 fragment analysis technology allows us to discriminate between a minor and a major fraction of the summary peak signal for a given allele. On this basis, we use the terms "minor" and "major" allele according to the heights of the individual allelic peaks. Square brackets, [], denote the minor (mosaic) allele(s). Some mosaic results may explain an unusual phenotype that does not match the one expected from the two major alleles. Fig 7 shows three examples documenting this phenomenon.
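The major/minor distinction and the bracket notation can be sketched as follows. The thresholds are hypothetical placeholders, since the text does not state the cut-offs used:

- up to two of the tallest peaks reaching at least half the height of the tallest peak are called major;
- remaining peaks above 5% of the tallest are called minor.

Following the definition in the text, a sample is flagged as mosaic when more than two distinct alleles are detected. Function names and peak heights are illustrative only.

```python
def classify_peaks(peaks: dict[str, float], major_fraction: float = 0.5,
                   noise_fraction: float = 0.05) -> tuple[list[str], list[str]]:
    """Split allele peaks (allele -> peak height) into major and minor alleles."""
    if not peaks:
        return [], []
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    tallest = ranked[0][1]
    # Up to two tallest peaks that are high enough count as major alleles.
    major = [a for a, h in ranked[:2] if h >= major_fraction * tallest]
    # Everything else above the (hypothetical) noise floor is a minor allele.
    minor = [a for a, h in ranked
             if a not in major and h >= noise_fraction * tallest]
    return major, minor


def format_genotype(major: list[str], minor: list[str]) -> str:
    """Write e.g. 'm/M [Ma+]'; a single major allele is written as homozygous."""
    pair = major if len(major) == 2 else major * 2
    text = "/".join(pair)
    return f"{text} [{', '.join(minor)}]" if minor else text


def is_mosaic(major: list[str], minor: list[str]) -> bool:
    """Mosaic if more than two distinct M locus alleles are detected."""
    return len(set(major) | set(minor)) > 2


major, minor = classify_peaks({"m": 1000, "M": 900, "Ma+": 150})
print(format_genotype(major, minor), is_mosaic(major, minor))  # m/M [Ma+] True
```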
We identified shortening of Merle alleles (of the SINE poly-A tail in the SILV gene) in 27 of the 181 dogs tested (S2 Table). We suggest that mosaicism results from the shortening of the major, longer Merle alleles.
Interestingly, in three cases (AF038, AE943 and AF273), the longer Merle allele represented the minor allelic population, with the shorter allelic variant being the major allele. Nevertheless, we consider this more likely an artifact than a real finding. It may be caused by suboptimal quality of the isolated DNA, through random PCR bias leading to preferential amplification of the shorter Merle fragment.
To test the mode of inheritance of these multiple allelic variants, we tested STR-proven pedigrees and have shown that mosaic alleles present in the parents can be … Interestingly, the Merle allelic status of mosaic animals can differ between the biological materials tested. As shown in Fig 9, the findings in a buccal swab (a terminally differentiated epidermal derivative) can differ dramatically from those in sperm cells (the germinal cell line). In this specific example, the proband, sire AF174, had been routinely tested using a buccal swab and found to carry … We also observed consistent shortening of major Merle alleles in Merle mosaic animals in 27 of the 181 animals in our cohort (S2 Table). Taken together with the finding of semen as a source of a significant pool of minor Merle alleles, this has led us to propose the following hypothesis. Unexpected Merle genotypes in the offspring may not correspond to the Merle genotypes obtained from the parents' non-germline cells. We propose that such genotypes can be explained by a hidden pool of Merle alleles in the germline, undetectable in buccal swabs or other differentiated tissues, rather than by the original theory of prolongation of shortened Merle alleles.
As discussed above, Merle allelic status might differ dramatically among the biological materials investigated. The … 3, and 11% of tested dogs. Moreover, BC, AK, and ASD (and also WS, MAUS, and SSD) are also the breeds in which the Mh allele occurs more frequently, in contrast to LC and D (Fig 6).
The finding of Merle mosaicism is further supported by significant differences between the occurrence of major and minor alleles found in the animals tested (S7 Fig). The major alleles …

SuperColorLocus testing and Merle-phenotypic relationships
Using the SuperColorLocus analysis, we also tested our cohort for the major coat color loci (E, K, A) and their modifiers (B, D, S, H). The aim was to explain the phenotypic features of the dogs in the study and to evaluate possible relations to the phenotype of dogs carrying Merle alleles. S5 Table (the SuperColorLocus table) shows the coat color genotypes of all dogs in our study; all respective photographs of the animals can be found in Figures A-X in S1 Fig. We also evaluated the influence of the S and D loci on the final phenotype of Merle-carrying dogs and found that the S locus does not seem to be a strong modifier of the resulting Merle phenotype. The S locus genotype also seems to have minimal, if any, impact on the health consequences connected with the Merle phenomenon. Genotyping of the other coat color loci (E, K, A, B, D and H) showed the expected correlation with the actual phenotypes of the animals. For a comparison of coat color genotypes and phenotypes, please refer to S5 Table and Figures A-X in S1 Fig. Interestingly, during the genotyping process we came across some irregularities, especially concerning the A locus in some Border Collies and Shetland Sheepdogs, where the SINE insertion in the ASIP gene was repeatedly missed. A similar phenomenon was observed for B locus testing in French Bulldogs. Further research is warranted to elucidate the genetic background of both recurrent genotyping irregularities.

Auditory and ophthalmic impairment in the Merle dogs
It has been recognized that homozygous Merle dogs might be at significantly higher risk of developing hearing or vision impairment than non-Merle individuals. In our cohort of 181 dogs, we identified 10 dogs with hearing and/or vision disabilities (Table 2). Interestingly, not only M/M genotypes but also heterozygous Mh genotypes seem to predispose an individual to hearing and/or vision impairments. Of note, S locus status does not seem to play any significant role here.

Discussion
The Merle coat pattern has long fascinated many breeders. It has attracted attention not only because of its visual uniqueness, but also because of its negative health consequences, which have been observed rather frequently: vision and hearing impairment of the affected animals. In 2006, Clark et al. [12] mapped the genomic abnormality to the SILV gene and identified a SINE element inserted in the vicinity of exon 11. SINEs (short interspersed elements) are retroelement relics that parasitized the genomes of vertebrates millions of years ago [20], [21], [22]. SINEs are thought to be a major source of genome evolution, but they have also been shown to be involved in many human genetic disorders and cancer. In animals, they are also involved in peculiar phenotypic features, such as the A locus variants (alleles "a" and "at") [6], [7], [8]. A typical SINE element contains a poly-A tail, a stretch of homogeneous A nucleotide repeats. This genomic feature poses a challenge for the cellular replication machinery, leading to replication slippage and the production of uneven replication products. This eventually leads to the senescence of the parasitic retroelement: shortening of the poly-A tail below a critical length leaves the element buried at its original site with no transposition activity left. As shown for SILV SINE, shortening of the primary poly-A tail also attenuates the local effect of the SINE insertion on the surrounding genomic environment, with visible phenotypic consequences.
It was originally assumed, and recently addressed again by Murphy et al. [23], that the poly-A tail of the SINE inserted in the SILV gene might not only shorten over time but also extend. This idea had previously been based only on phenotype, not genotype. We studied the pedigrees of related dogs in our cohort, as reported in the Supplementary data (S3 Table). We measured the lengths of the obligate alleles precisely (to within a 1-2 bp size difference) as they passed from parents to offspring and siblings, and could not confirm this theory. We suggest, for the first time, that parental alleles are conserved in length between generations and that the alleged phenomenon of poly-A extension most probably does not exist. This finding is of special practical importance for breeders. Animals harboring critically shortened Merle alleles would not pose any health danger to their offspring and might safely be bred with full Merle individuals. Further research focused on precise monitoring of the length dynamics of Merle alleles is needed to clarify the poly-A dynamics of SILV SINE.
Another striking result of our study is the identification of mosaicism in a high proportion of the animals tested: in 30 of the 181 dogs analyzed, more than two allelic variants were identified. This finding was made possible by high-resolution fragment analysis technology, which allowed us to discriminate 7 allelic variants of the Merle gene, 4 of them (Mc+, Ma, Ma+ and Mh) previously unknown. By comparison, the recent paper by Murphy et al. [23] describes only four Merle phenotypes (cryptic, dilute, standard, harlequin). Their "cryptic" group corresponds to our Mc, Mc+ and Ma genotypes, their "dilute" group to our Ma+, their "standard" group to our M, and their "harlequin" group to our Mh. It is highly probable that the existence of various allelic variants mirrors the complex genomic situation during early embryonic development, when populations of cells with different SINE allelic versions may arise. Importantly, as we found, the Merle genotype pattern may differ among the individual compartments tested. Sperm cells are the most heterogeneous tissue in terms of Merle allelic variability, and many minor Merle alleles may be identified in them. These can be passed to the next generation in a Mendelian fashion. The buccal swab, generally used for routine genotyping, is a terminally differentiated ectodermal tissue and thus may not mirror the germline status. This phenomenon must be taken into consideration especially when buccal swab-genotyped animals produce phenotypically unexpected offspring. In those instances, it is strongly recommended to test the sire's sperm cells for possible mosaicism that would explain the unexpected breeding results. 
For males often used for breeding, we strongly recommend testing their sperm cells first. This reveals the full Merle genetic background and thus addresses the possibility of passing minor Merle alleles to the offspring. In females, the situation is complicated because germ cells cannot be obtained. Nevertheless, to reveal possible mosaicism, hair from Merle-looking coat areas can be tested in parallel with a buccal swab.
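Several tissues of the same dog may be tested: a buccal swab and sperm in males, or a buccal swab and hair from Merle-looking coat areas in females, as recommended above. In that case, the practically relevant output is the set of alleles detected in one tissue but not in the others. A minimal sketch follows, with hypothetical tissue labels and allele calls. It takes as input, for each tissue, the union of major and minor alleles.

```python
def tissue_specific_alleles(calls: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each tissue, return the alleles not detected in any other tissue tested."""
    result = {}
    for tissue, alleles in calls.items():
        others = set().union(*(a for t, a in calls.items() if t != tissue))
        result[tissue] = alleles - others
    return result


# Hypothetical sire: buccal swab shows m/M, sperm additionally shows minor Ma+ and Mh.
calls = {"buccal swab": {"m", "M"}, "sperm": {"m", "M", "Ma+", "Mh"}}
print(sorted(tissue_specific_alleles(calls)["sperm"]))  # ['Ma+', 'Mh']
```

Any allele returned for the sperm sample is a candidate for transmission to offspring that a buccal-swab test alone would miss.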
As already mentioned, the length of the SINE poly-A tail seems to be crucial for its biological effects and phenotype correlations (pigment cell ontogenesis). Thus, critically shortened Merle alleles that exhibit no visible pigment defects, such as Mc, Mc+ or Ma, may also be expected to lose their biological effects in terms of hearing or vision impairment. Our preliminary data support this hypothesis, though more research is needed to elucidate this most important aspect of Merle biology. In line with this, SINE insertions with a long poly-A tail, such as M or Mh, may pose a greater risk of auditory or ophthalmic irregularities. It is widely accepted that the most pronounced negative health consequences (both auditory and ophthalmic) are connected with the M/M genotype. Interestingly, our preliminary data show that even the heterozygous combination m/Mh may exert the same effect.
In our study, we statistically analyzed the Merle allelic status of 181 dogs from 14 different Merle breeds. We looked for breed-associated distributions of Merle allele lengths and/or mosaicism tendencies. Some Merle breeds might be enriched for certain Merle alleles, but our cohort, split across 14 breeds, is too small to draw any statistically significant conclusion. More subjects have to be tested to clarify this issue.
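The point about sample size can be quantified with a binomial confidence interval. The sketch below uses the Wilson score interval, one standard choice for proportions with small n; the text does not state which method, if any, was used. It is included only to illustrate how wide the uncertainty is for breeds represented by two or three dogs.

```python
import math


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion k/n (z = 1.96)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


print(wilson_interval(2, 3))     # a breed with 2 of 3 dogs carrying an allele
print(wilson_interval(30, 181))  # mosaic dogs in the whole cohort
```

For 2 of 3 dogs, the interval spans roughly 21% to 94%. The cohort-wide 30 of 181 mosaic dogs (16.6%), by contrast, gives roughly 12% to 23%. This illustrates why breed-level differences based on a handful of dogs cannot be interpreted statistically.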
In conclusion, our study has brought to light novel findings in the field of Merle research. It has also challenged some theoretical assumptions that might have undesirable effects on breeding strategies in Merle dogs. We hope that our results will be instrumental not only to researchers working on coat color genetics in dogs, but especially to breeders developing breeding strategies in Merle breeds. The goal is to preserve the health of their animals to the extent the laws of population genetics allow.
Supporting information
S1 Table. The table shows the genotype summary of the currently recognized coat color loci in the dog and their modifiers relevant for the study; all photographs of the genotyped animals can be found in Figures A-X in S1 Fig. (DOCX)
S1 File. Genotype/phenotype correlations: Mh. The Mh allele has a broad range of phenotypes, with 2 very recognizable expressions.