A structural variant in the 5’-flanking region of the TWIST2 gene affects melanocyte development in belted cattle

Belted cattle have a circular belt of unpigmented hair and skin around their midsection. The belt is inherited as a monogenic autosomal dominant trait. We mapped the causative variant to a 37 kb segment on bovine chromosome 3. Whole genome sequence data of 2 belted and 130 control cattle yielded only one private genetic variant in the critical interval in the two belted animals. The belt-associated variant was a copy number variant (CNV) involving the quadruplication of a 6 kb non-coding sequence located approximately 16 kb upstream of the TWIST2 gene. Increased copy numbers at this CNV were strongly associated with the belt phenotype in a cohort of 333 cases and 1322 controls. We hypothesized that the CNV causes aberrant expression of TWIST2 during neural crest development, which might negatively affect melanoblasts. Functional studies showed that ectopic expression of bovine TWIST2 in neural crest in transgenic zebrafish led to a decrease in melanocyte numbers. Our results thus implicate an unsuspected involvement of TWIST2 in regulating pigmentation and reveal a non-coding CNV underlying a captivating Mendelian character.


Introduction
Coat color has been a long-standing model trait for geneticists due to the ease of phenotype recording. Coat color depends on the presence of pigment cells. In mammals, melaninsynthesizing melanocytes, which occupy the skin and hair follicles throughout the entire body surface, are responsible for pigmentation [1]. During embryonic development melanocytes are formed from melanoblasts, which originate in the neural crest and migrate through the developing embryo in order to reach their final position on the body [2]. This developmental program requires a delicate level of regulation to ensure that the correct number of cells reaches their final destination [3][4][5]. An over-proliferation of cells that have left their surrounding tissue leads to naevi, and in rare cases to congenital melanomas which lead to a poor prognosis [6][7][8]. In contrast, if too few melanoblasts are produced or survive, this will lead to partially or completely unpigmented phenotypes, including so-called "white-spotting" phenotypes characterized by patches of unpigmented skin and/or hair [9]. Domestic animals with such hypopigmented phenotypes have been highly valued due to their striking appearance and have often been actively selected in animal breeding. Modern domestic animals thus constitute a unique reservoir of spontaneous pigmentation mutants. Their favorable population structure facilitates the discovery of functional genetic variation including some interesting non-coding structural variants with regulatory effects [10][11][12][13][14][15].
A symmetrical belt of unpigmented skin circling the midsection has been observed in several mammalian species. It is thought that these belted phenotypes are due to downregulated melanoblast formation or early melanoblast losses in neural crest development. Belted mice have sequence variants in the Adamts20 gene encoding a secreted metalloprotease [16], which was shown to be required for melanoblast survival [17]. A complex structural rearrangement involving duplication of the KIT gene was identified in belted pigs, whose belt includes the forelimbs and is localized more cranially than the one in Adamts20 mutant mice [18,19]. 5 Belted cattle exist in at least three different breeds: Brown Swiss, (Belted) Galloway, and Lakenvelder, which are also known as Dutch Belted. The belt in these cattle breeds has an intermediate position; it is more caudal than the belt in pigs and more cranial than the belt in Adamts20 mutant mice ( Figure 1). The belt in cattle is inherited as a monogenic autosomal dominant trait and linked to the telomeric end of chromosome 3, which does not contain any known coat color gene [20,21]. The objective of this study was to identify the causative genetic variant and to provide a hypothesis on the functional mechanism leading to the partial lack of melanocytes in belted cattle.

Positional cloning of the belt sequence variant reveals a CNV upstream of TWIST2
We had previously shown that the belt locus maps to a 336 kb interval at the telomeric end of chromosome 3 and that belted animals from the Brown Swiss, Belted Galloway and Lakenvelder breeds share the same haplotype in the critical interval [21]. In order to further refine this interval we now genotyped 116 belted cattle on the Illumina bovine high density beadchip, phased the genotype data and searched for a shared haplotype. All belted animals had at least one copy of a shared haplotype spanning 22 consecutive markers (Table S1).
Thus, the refined boundaries of the new critical interval are defined by the flanking markers on either side of this shared haplotype block. The refined critical interval consisted of only 54 kb and spanned from position 118,578,893 to 118,633,153 on chromosome 3 ( Figure 2A).
For the identification of potentially causative genetic variants, we sequenced the genomes of two homozygous belted cattle and compared the data to 130 genomes from control cattle. A standard bioinformatic pipeline for the detection of single nucleotide variants or small indels did not reveal any private homozygous variants in the two cases in the critical interval. We 6 therefore visually inspected the short-read sequencing data in the critical interval and identified an amplification of a 6 kb sequence region ( Figure 2B,C). Both belted animals had a ~4-fold increased read coverage with respect to control animals. The 6 kb CNV was located approximately 16 kb upstream of the TWIST2 gene. We did not detect any other structural variant in the critical interval.
We designed a digital droplet PCR (ddPCR) assay to determine the copy number of the CNV in a large cohort of 239 belted cattle and 1303 non-belted cattle. The observed copy numbers varied between 2 and 12 with the most frequently observed values being 2, 5 and 8, which most likely corresponded to genotypes 1/1, 1/4 and 4/4. The belt phenotype was strongly associated with an increased copy number at the CNV (p = 1.3 x 10 -278 , Fisher's exact test; Table 1).

Does the CNV affect TWIST2 expression?
TWIST2 is normally expressed in the skin and craniofacial cartilages, but not in the neural crest [22,23]. We performed an RNA-seq experiment to quantify TWIST2 RNA expression in adult skin from belted and non-belted cattle, but found no significant difference (Table S2).
The strictly dominant mode of inheritance of the belt suggests a gain of function of the bt allele.
Given that the boundaries of the unpigmented area in belted animals are very even and mostly bilaterally symmetrical, this gain of function might supposedly take place during early melanoblast development. We hypothesized that increased copy numbers at the CNV might lead to an ectopic expression of TWIST2 during neural crest development. Accordingly, ectopically expressed TWIST2 might decrease the number of melanoblasts and thus lead to the unpigmented skin area of the belt. Unfortunately, it is currently not possible to test this hypothesis by directly quantifying TWIST2 expression in the developing neural crest of bovine embryos.

Functional analysis of TWIST2 overexpression
As bovine embryos are not readily accessible for experimental research, we investigated the effect of TWIST2 overexpression in transgenic zebrafish embryos. We prepared expression constructs with a zebrafish mitfa promoter fragment that has been shown to drive expression in premigratory neural crest cells and differentiating melanocytes [24,25]. The control construct pmitfa_EGFP contained the coding sequence of enhanced green fluorescent protein (EGFP) as a reporter under the control of the mitfa promoter. Additionally we prepared a second construct pmitfa_btaTWIST2_EGFP, which additionally contained the coding sequence for bovine TWIST2 separated from the EGFP coding region by an internal ribosome entry site (IRES) so that TWIST2 overexpressing cells were expected to be distinguishable in transient transgenic embryos by their EGFP expression.
We injected both constructs into fertilized zebrafish eggs and analyzed the expression of EGFP at 35 hours post fertilization (hpf) in four replicate experiments. About 80% of more than 100 surviving zebrafish embryos injected with the control construct pmitfa_EGFP showed several 8 EGFP-positive cells in a pattern resembling developing melanoblasts ( Figure 3A,C,D). In contrast, we never observed any EGFP-positive cell in the zebrafish embryos injected with pmitfa_btaTWIST2_EGFP ( Figure 3B). For maximum sensitivity we confirmed these results by immunostaining the transgenic zebrafish embryos with an anti-GFP antibody. We again detected EGFP expression only in the control embryos, but not in embryos that were injected with pmitfa_btaTWIST2_EGFP ( Figure 3E-J). We additionally stained for the expression of sox10 to investigate whether there were any gross changes to neural crest development. The comparison of sox10 expression patterns did not indicate any major differences between the control embryos and the embryos overexpressing TWIST2 (data not shown).
We hypothesized that the lack of visible EGFP expression in the zebrafish injected with pmitfa_btaTWIST2_EGFP was due to a negative regulatory effect of bovine TWIST2 on melanoblasts. Our experimental strategy was expected to lead to mosaic animals, where only a fraction of the cells integrated the expression casettes into the genome. In an additional experiment we quantified the number of visible melanocytes in TWIST2 overexpressing versus control embryos. We observed a significant reduction of melanocytes in the head region (p < 0.001). In the trunk we also consistently observed lower mean melanocyte numbers in the TWIST2 transgenic embryos. However, as the variance of melanocyte numbers in the trunk was much bigger than in the head region, this difference was significant only in one out of three replicate epxeriments ( Figure 4).

Discussion
In this study, we identified the amplification of a 6 kb genomic sequence in the 5'-flanking region of the TWIST2 gene as most likely causative variant for the belted phenotype in cattle.
The haplotype analysis greatly refined the critical interval to only 54 kb. The small size of the shared haplotype is consistent with a relatively old origin of the bt allele, having arisen before the separation of modern cattle breeds. A 15 th century painting from Austria already depicts a 9 belted ox suggesting that this allele is at least 600 years or roughly 200 generations old ( Figure S1). The haplotypes in belted Brown Swiss cattle were more diverse than in dysplasias with complex facial malformations [28][29][30]. In one Setleis patient hypo-and hyperpigmentation of the skin was reported [31]. Loss of Twist2 function in mice leads to runting, cachexia (adipose and glycogen deficiency), and thin skin with sparse hair [32].
Additional findings were corneal thinning and a reduced population of stromal keratocytes, which are derived from the neural crest [33]. repress the transcription of genes essential for melanoblast fate determination, such as e.g.

MITF.
As a first test of these hypotheses we asked whether overexpressing bovine TWIST2 in zebrafish embryos had an effect on melanocyte development. The TWIST2 protein is highly conserved among vertebrates. Bovine TWIST2 (160 aa) is 100% identical with human TWIST2 and it has 85% identity with zebrafish twist2 (163 aa). We thus hypothesized that overexpression of bovine TWIST2 in the premigratory neural crest of zebrafish might cause a reduction in the number of neural-crest derived melanocytes. Our data indeed indicated that overexpression of TWIST2 led to a decrease in melanocyte numbers in transgenic zebrafish.
We were unable to detect EGFP-expressing cells in our TWIST2-overexpressing embryos,

Ethics statement
All animal experiments were performed according to the local regulations.

Cattle samples
We isolated genomic DNA from hair roots or EDTA blood samples from 116 belted animals (6 belted Brown Swiss, 72 Belted Galloway, 38 Lakenvelder) and 264 non-belted controls from different breeds for SNP chip genotyping and the fine-mapping experiment. For the subsequent association study we recruited more than 1,000 additional DNA samples from the routine parentage testing performed at the University of Göttingen. The phenotype of the additional animals was solely based upon owners' reports. During the investigation we excluded several animals that were reported as belted by the owners, but had only minimal white spots that did not resemble the typical belt phenotype. We also excluded several animals that were reported as non-belted by the owners, but turned out be White Galloways, on which a belt would not have been visible [34]. The final cohort thus consisted of 239 belted animals and 1,303 non-belted cattle. In this cohort, we had 2 supposedly belted and 2 supposedly nonbelted animals with discordant genotypes (Table 1). We were not able to verify the phenotypes of these four questionable animals as either the owners could not be reached or the animals had been slaughtered several years before the investigation and no photos or other information was available.

SNP chip genotyping and haplotype analyses
Genotyping for the fine-mapping experiment was done on Illumina bovine HD beadchips containing 777,962 SNPs by GeneSeek/Neogen. Genotypes were stored in a BC/Gene database version 3.5 (BC/Platforms) and phased with SHAPEIT [35]. The haplotypes for chromosome 3 were exported into an Excel-file and visually inspected for shared segments among the 116 cases.

Whole genome sequencing
We prepared a fragment library with approximately 300 bp insert size from a homozygous belted Brown Swiss cow (GU21) and a homozygous Belted Galloway bull (BG011) and collected 2 x100 bp paired-end reads on an illumina HiSeq2000 instrument. We obtained 16 [38] together with the UMD 3.1 annotation (ENSEMBL release 79) was used to predict the functional effects of detected variants.

Droplet digital PCR (ddPCR)
To determine the copy number of the CNV on chromosome 3 we designed a ddPCR assay.
The TruSeq stranded mRNA library prep kit (Illumina) was used to prepare libraries, which were sequenced on the Illumina HiSeq 3000 platform using 2 x 150 bp paired-end sequencing cycles. The reads in FASTQ format were processed for quality checks using the FastQC tool (v0.10.1; http://www.bioinformatics.babraham.ac.uk). Good quality reads were then mapped to the cattle reference genome UMD 3.1 using the STAR Aligner [39]. We allowed for a maximum of four mismatches per read pair and a maximum intron length of 100 kb. We used htseq-count from the HTSeq package to count reads against annotated Ensembl genes models (version 79) Differential expression was carried out on raw read counts for genes using edgeR [40].

Recombinant constructs for the expression of bovine TWIST2
An 4.1 kb mitfa promoter fragment, the coding sequences for EGFP and bovine TWIST2, and the IRES casette were amplified separately and engineered into a vector containing Tol2 sites (pTKXIGΔin; Kawakami Laboratory) using the Gibson Assembly® Cloning kit from NEB. We thus generated the plasmids pmitfa_EGFP (control construct without TWIST2 expression cassette) and pmitfa_btaTWIST2_EGFP. A detailed map of these plasmids is given in Figures   S2 and S3. Plasmid DNA was purified using the Qiagen HiSpeed Maxiprep kit and the correct sequence was verified by Sanger sequencing.

Melanocyte imaging and counting
At approximately 24 hpf, the dishes were cleaned and the medium was changed. The embryos were grown at 28°C and screened at about 35 hpf. Both the controls and TWIST2 expressing embryos were carefully dechorionated, put in tricaine treated fish water (final tricaine concentration 160 mg/l) and used for melanocyte counting. For counting purposes, the embryos were imaged using bright field technique under an SMZ25 fluorescent stereoscope (Nikon). The embryos were imaged lying laterally with their heads facing to the right using methyl cellulose to prevent them from drying. The images were coded and melanocytes were counted by an investigator who was blinded to the information whether embryos were injected with the control construct pmitfa_EGFP or with pmitfa_btaTWIST2_EGFP.

Antibody labelling and fluorescent microscopy
After counting melanocytes, the embryos were fixed with 2% PFA diluted in PBS, overnight at 4°C and were processed for antibody labelling as described previously [42]. The anti-GFP antibody was obtained from Aves Labs Inc., the anti-Sox10 antibody was obtained from Gene tex. Anti-GFP and anti-Sox10 were used in a dilution of 1:500. Secondary antibodies were antichicken coupled to Alexa Fluor® 568 (1:500; ThermoScientific) and anti-rabbit coupled to Alexa Fluor® 647 (1:250; Invitrogen). Nuclei were counterstained with 4c,6-diamidino-2phenylindole dihydrochlorid (DAPI; Invitrogen). After antibody labelling, the embryos were imaged under an AxioLSM880 confocal microscope (Zeiss). Whole mount embryos were imaged under 10x, the images were processed using ImageJ and Adobe Illustrator.

Sequence data accession numbers
Whole genome sequencing data and RNA-seq data from this study have been submitted to the European Nucleotide Archive under accessions PRJEB14604 and PRJEB14606.     Table S1. Fine-mapping of the belt locus. Table S2. RNA-seq experiment.