A Genomic Duplication is Associated with Ectopic Eomesodermin Expression in the Embryonic Chicken Comb and Two Duplex-comb Phenotypes

Duplex-comb (D) is one of three major loci affecting comb morphology in the domestic chicken. Here we show that the two Duplex-comb alleles, V-shaped (D*V) and Buttercup (D*C), are both associated with a 20 Kb tandem duplication containing several conserved putative regulatory elements located 200 Kb upstream of the eomesodermin gene (EOMES). EOMES is a T-box transcription factor that is involved in mesoderm specification during gastrulation. In D*V and D*C chicken embryos we find that EOMES is ectopically expressed in the ectoderm of the comb-developing region as compared to wild-type embryos. The confinement of the ectopic expression of EOMES to the ectoderm is in stark contrast to the causal mechanisms underlying the two other major comb loci in the chicken (Rose-comb and Pea-comb) in which the transcription factors MNR2 and SOX5 are ectopically expressed strictly in the mesenchyme. Interestingly, the causal mutations of all three major comb loci in the chicken are now known to be composed of large-scale structural genomic variants that each result in ectopic expression of transcription factors. The Duplex-comb locus also illustrates the evolution of alleles in domestic animals, which means that alleles evolve by the accumulation of two or more consecutive mutations affecting the phenotype. We do not yet know whether the V-shaped or Buttercup allele correspond to the second mutation that occurred on the haplotype of the original duplication event.


Introduction
In the domestic chicken (Gallus domesticus) the comb serves as a sexual ornament and the size of the comb is associated with mate choice in both sexes as well as fecundity in females [1,2]. The vast majority of chicken populations used for commercial meat and egg production around the world are fixed for the wild-type single comb phenotype. However, there are three major comb loci found in non-commercial chicken breeds which are primarily used for exhibition purposes; Rose-comb, Pea-comb and Duplex-comb. The causal mutations for Rose-comb and Pea-comb have recently been identified, both corresponding to structural genomic variants that drive ectopic expression of transcription factors in the developing comb region of the chicken embryo [3][4][5].
The Duplex-comb locus harbors three alleles, Buttercup (D Ã C), V-shaped (D Ã V) and wildtype or normal (D Ã N). Chickens that are wild-type at the Rose-comb, Pea-comb and Duplexcomb loci have the single comb phenotype. D Ã C corresponds to a cup shaped comb arising from a single central blade ringed with individual points along the perimeter of the cup (Fig. 1A). This phenotype is somewhat rare, being found in the Sicilian Buttercup, Caumont and Augsburger chicken breeds. Chickens that appear to have the Buttercup comb phenotype were described by the naturalist Ulisse Aldrovandi in 1600 [6,7] and are thought to originate from North Africa, possibly being the progenitors of the Sicilian Buttercup breed. D Ã V corresponds to a two-pronged horn or V-shaped comb that is restricted to the posterior portion of the comb developing region (Fig. 1A). The V-shaped comb is found in many breeds from around the world such as the Crevecoeur, Houdan, La Fleche, Merlerault, Padova, Polish, Spitzhauben and Sultan. Both comb types can vary slightly in shape and size between breeds and strains due to differences in genetic background, with D Ã C occasionally resembling two distinct single (wild-type) combs split down the midline. Previous experiments have demonstrated that the V-shaped and Buttercup phenotypes are inherited as determined by alleles at a single locus [8]. In these experiments D Ã V was completely dominant over D Ã C. Both mutant alleles were incompletely dominant over D Ã N. D Ã V was completely penetrant in both sexes while D Ã C was incompletely penetrant (68%) in females [8]. From these experiments it was suggested that D Ã C represents a comb doubling effect while D Ã V causes doubling and reduction of comb size.
Here we show that both Duplex-comb phenotypes are associated with a 20 Kb tandem duplication and ectopic expression of EOMES, a transcription factor with a known role in mesoderm specification in the developing embryo [9].

Identification of the genomic region associated with Duplex-comb
We previously mapped D Ã V to the 37.3-39.8 Mb region of Gallus gallus autosome 2 (GGA2) in a backcross population [10]. Subsequent fine mapping with additional markers identified a Phenotypes associated with the Duplex-comb locus V-shaped (D*V), Buttercup (D*C) and wild-type (D*N) single comb alleles. (B) A 381 Kb haplotype is IBD in D*V individuals, SNP names in the region marked in red. Yellow shading represents heterozygous genotypes with blue/orange representing reciprocal homozygous genotypes. Genotype data is from the 60K SNP chip and color coded according to genotype with missing data shown in black. A single SNP GGaluGA142157 at 38,806,246 bp (marked in yellow) is heterozygous in all D*V individuals, suggestive of a duplication. A broader view of haplotypes in more individuals is available in S1 Fig. (C) A TaqMan assay was used to investigate the genomic copy number of the putative duplicated region, showing that a duplication is present in both D*V and D*C individuals. Each bar represents a single individual. Error bars represent the minimum and maximum estimated copy number as calculated from technical replicates of each sample by Copy Caller software (ABI). (D) The Duplex-comb alleles are associated with a tandem duplication with no sequence homology at the junction point. The middle line shows the sequence at the junction point with vertical dashes indicating sequence alignment to the 5' (bottom line) and 3' (top line) boundaries of the duplicated region. A LTR element is shaded in gray.
region of maximum association with D Ã V as between markers rs15086167 and rs14167302, corresponding to GGA2:38,554,221-39,229,442 bp.
Genotyping of a diverse breed panel (D Ã V, n = 7; D Ã N, n = 64) on the 60K Chicken iSelect chip [11] (Illumina) revealed an identical by descent (IBD) haplotype located between markers  rs15086146 and rs15086500, corresponding to GGA2:38,528,939-38,910,305 bp and consistent with other reports [12]. A single SNP within the IBD haplotype was observed to be heterozygous in all D Ã V individuals, (GGaluGA142157 at 38,806,246 bp), suggestive of a duplication fixed for alternative SNP variant alleles (Figs. 1B and S1).
Copy number of the IBD region was explored using SYBR Green qPCR analysis with genomic DNA as template. Iterative rounds of qPCR analysis of copy number analysis ultimately defined the approximate boundaries of a putative 2-fold duplicated region in D Ã V individuals. Genomic copy number analysis using a TaqMan assay was in agreement with the SYBR Green assays and confirmed the presence of a duplication in both D Ã V and D Ã C individuals as compared to D Ã N (Fig. 1C).
Successful amplification and sequencing across the duplication junction point revealed this to be a tandem duplication of a *20 Kb segment spanning from 38,798,537 to 38,818,314 bp. Analysis of the duplication junction sequences revealed only a single base pair of overlap and no other sequence micro-homologies (Fig. 1D). The 38,818,314 bp duplication junction point is within a LTR element as annotated by the UCSC Genome Browser.
A diagnostic PCR test was designed to amplify across the duplication junction point and used to screen a diverse breed panel representing the three known alleles at the Duplex-comb locus. All D Ã N homozygotes (n = 44) were wild-type for the duplication junction point while all V-shaped (n = 48) and Buttercup (n = 35) individuals had at least one copy of the duplication (Table 1). This PCR test cannot distinguish between heterozygotes and homozygotes for the duplication as no wild-type sequences are disrupted in this tandem duplication.
Whole genome sequencing for the characterization of the D*V and D*C haplotypes Three chickens, each representing one of the three alleles at the Duplex-comb locus, were selected for whole genome sequencing to search for mutations other than the 20 Kb duplication that could be responsible for the difference between the V-shaped and Buttercup comb phenotypes. The average depth of sequence coverage for each bird was in the range 24x to 34x, which gives a high power for SNP detection at most sites. The D Ã V (White Crested Black Polish, USA) and D Ã C (Sicilian Buttercup, Italy) individuals were from breeds with standardized V-shaped and Buttercup comb phenotypes and were tested with the TaqMan copy number assay to verify homozygosity before whole genome sequencing. The D Ã N (Single Comb Dark Brown Leghorn, USA) individual was selected due to sharing an identical haplotype as the D Ã V individual based on the 60K SNP chip genotype data except for the D Ã V associated heterozygous SNP and 20 Kb duplication.
The largest region for which D Ã C and D Ã V individuals were IBD was 89 Kb in size (38,738,016-38,827,468 bp) which includes the entire 20 Kb duplicated region ( Fig. 2A, IBD_reseq track). We identified 6 and 17 paired-end reads that spanned the duplication junctions in D Ã V and D Ã C individuals, again confirming the exact duplication breakpoints. We then used the sequencing data to explore if there were any other sequence variants that showed a perfect concordance with D Ã V and D Ã C like the 20 Kb duplication. Stringent SNP calling revealed only one high-quality SNP, at position 38,797,948 bp, within the IBD region that showed this pattern and that were not found in other chicken populations with the single comb phenotype [13]. This SNP did not occur at an evolutionary conserved site.
To identify one or more mutations that distinguish the two mutant alleles we first searched for SNPs within the duplicated region. There were 182 SNPs detected between all three sequenced individuals, with 181 SNPs having identical genotypes in D Ã C and D Ã V individuals (Fig. 2B). The one remaining SNP at 38,808,838 bp was heterozygous G/A in the D Ã V individual and homozygous reference (G) in the D Ã C and D Ã N individuals chosen for sequencing. Further screening showed that this SNP was homozygous reference (G) in 23 of 32 additional D Ã V individuals from four different breeds and was never found to be homozygous for the mutant allele. This indicates that the 38,808,838 bp SNP is not causally associated with D Ã V, but instead has evolved in some D Ã V populations in one of the two copies of the 20 Kb duplicated region. There were no other high-quality SNPs within the 89 Kb IBD region identified from the  sequencing data for which the V-shaped and Buttercup individuals were homozygous for alternative alleles. There were 66 SNPs within the duplicated region that were called as heterozygous in both the D Ã C and D Ã V individuals, indicative of the two copies of the duplicated region composing a single haplotype, each copy carrying different sequence variants at 66 SNP positions. Thus, the nucleotide divergence between the two copies is about 0.3%, i.e. three times higher than the average nucleotide diversity in the human genome and close to the average nucleotide diversity of 0.5% in the chicken genome [14]. This implies a scenario where two different haplotypes contributed to the tandem duplication and the majority of the sequence differences are expected to represent sequence differences between the two ancestral haplotypes. This interpretation is consistent with the observation that most SNPs showing sequence differences between the two copies, such as GGaluGA142157 at 38,806,246 bp, also segregated among wild-type chromosomes (Fig. 1B).
There are several regions within the 20 Kb duplication that exhibit elevated conservation scores according to the UCSC genome browser and Genomic Evolutionary Rate Profiling (GERP) [15] (Fig. 2B), representing putative regulatory elements.

Morphology
The chicken comb originates from a region on the upper beak, posterior to the fronto-nasal facial process and is first visible as a narrow midline ridge, at embryonic day 6-7 (E6-7). The wild-type single comb has one row of papillae that are formed from local mesenchyme condensations along the initial comb-ridge and they present the beginnings of the comb serrations Expression analysis reveals ectopic expression of EOMES in comb tissue from Duplex-combed individuals The expression pattern of candidate genes located in the proximity of the 20 Kb duplication was investigated using quantitative reverse transcription-PCR (qRT-PCR) in samples of comb tissue from developing chicken embryos. The duplication is located within an intron of CMC1 encoding COX assembly mitochondrial protein homolog (S. cerevisiae) as assessed by aligning the predicted CMC1 sequence (XM_418758.4) to the chicken genome via BLAT in the UCSC genome browser. 5-azacytidine induced 2 (AZI2) is the nearest gene on the 3' side of the duplication and eomesodermin (EOMES) is the nearest gene on the 5' side. CMC1 and AZI2 were both expressed in comb tissue but did not show any significant difference in expression between genotypes. In contrast, EOMES showed a dramatic expression difference between genotypes and was more highly expressed at E8, E9 and E12 in D Ã V embryonic comb as compared to D Ã N, while it was not expressed in any genotype at E18 (Fig. 3).
The spatial distribution of EOMES expression in the developing comb region of the chicken embryo was investigated using immunohistochemistry. In D Ã N embryos no EOMES expression was detected from E7-E18 in the ectoderm or mesenchyme of the comb region. Both D Ã V and D Ã C embryos showed clear expression of EOMES in the ectoderm of the comb region already at E7 and continuing through E12/E15 (Fig. 4E-G, I-K). The expression of EOMES was limited to the ectoderm of the developing comb region at all stages analyzed in D Ã V and D Ã C embryos and could not be detected by E18 (Fig. 4H).

Discussion
The Duplex-comb locus was originally described as having two mutant alleles [8] and being linked to the polydactyly locus [16], which was subsequently mapped to GGA2 [10,[17][18][19]. Through successive rounds of linkage mapping and IBD haplotype analysis using different chicken breeds we have identified an 89 Kb region of GGA2 as encompassing the Duplex-comb locus. This region contains a 20 Kb tandem duplication that is only found in chicken breeds that have a Duplex-comb phenotype when screened on a diverse breed panel. Sequence analysis of the duplicated region identified only a single base pair difference within the 20 Kb duplication between the two mutant alleles, however this variant was subsequently excluded as the causal difference between D Ã C and D Ã V alleles after finding many D Ã V individuals that were homozygous reference. The 20 Kb duplication contains several putative conserved regulatory elements (Fig. 2B) that is likely driving the ectopic expression of the downstream transcription factor EOMES in the developing chicken comb region in Duplex comb individuals.
The phenotypic diversity of the chicken comb is primarily governed by a small number of loci with large effects that determine the overall morphology of the comb during embryonic development; the Rose-comb, Pea-comb and Duplex-comb loci. The Rose-comb and Pea-comb loci are notable in being the first example of classical genetic epistasis, giving rise to the Walnut comb phenotype when mutant alleles are present at both loci [20]. The Pea-comb mutant allele has recently been described as corresponding to a copy number expansion in an intron of SOX5, resulting in ectopic expression of this transcription factor in the mesenchyme of the developing comb region of the chicken embryo [5]. The Rose-comb mutant allele was also recently characterized, corresponding to a 7 Mb inversion that leads to ectopic expression of the transcription factor MNR2 in the mesenchyme of the developing comb region of the chicken embryo [4]. This overlapping spatial and temporal domain of ectopic expression of SOX5 and MNR2 is a clear demonstration of how the epistasis between Rose-comb and Pea-comb loci is derived at the cellular level through the combined action of two transcription factors [4]. Here we show that the last major comb locus in the chicken to be characterized at the molecular level also corresponds to a structural variant in the chicken genome that results in ectopic expression of a transcription factor. However, while Rose-comb and Pea-comb phenotypes are driven by ectopic expression of SOX5 and MNR2 in the mesenchyme of the developing comb region, we show that the Duplex-comb phenotype is mediated by ectopic expression of EOMES confined to the ectoderm.
Eomesodermin (EOMES) is a T-box transcription factor that is involved in mesoderm specification during gastrulation as shown in zebrafish, chicken and mouse. EOMES is expressed in the extraembryonic tissues of the chicken and the mouse as well as the primitive streak, forebrain region and genital ridge [9,21]. However, expression of EOMES in primordial germ cells is only found in the chicken [21]. Investigation of four upstream and one downstream putative cis-regulatory element (CRE) of mouse EOMES indicated that different regulatory mechanisms between mouse and chicken were likely responsible for EOMES expression in extraembryonic tissues while a single CRE located *150 Kb upstream drove expression in the brain in both chicken and mouse [21]. The 20 Kb duplication overlaps several regions of elevated sequence conservation and lies approximately 200 Kb upstream of EOMES in the chicken, suggesting that the duplicated region contains CREs and that an altered dosage of these elements causes perturbed regulation of EOMES expression.
Using qRT-PCR we show that EOMES is upregulated in the comb region of D Ã V embryos as early as embryonic day 8 as compared to D Ã N embryos. There was no difference in expression of CMC1 and AZI2 (the two other genes located nearest the 20 Kb duplication) between D Ã V and D Ã N embryos, suggesting that this mutation involves a CRE specific to EOMES, at least in comb tissue. Using IHC we show that EOMES is ectopically expressed in the ectoderm of the developing comb region of D Ã C and D Ã V embryos. There was no detectable EOMES expression in this region of the D Ã N embryo at these stages, suggesting that EOMES does not normally play a role in comb development.
The major comb phenotypes are all caused by mutations that direct expression of transcription factors to the ectoderm or mesenchyme of the comb ridge. The development of the comb as part the chicken naso-facial processes is directly induced and regulated by reciprocal ectomesenchymal interactions [22]. Interactions of ectopically expressed transcription factors either in the ectoderm or mesenchyme then cause the similar but not identical comb phenotypes. The exact regional and temporal expression of the inductive signals or their receptors is instrumental for the morphogenesis [3]. The D Ã C phenotype is characterized as a splitting of the comb mass while the D Ã V phenotype involves both splitting and reduction of comb mass as well as enlargement of the nostrils [6]. We propose that the initial duplication event is the primary driver of ectopic EOMES expression in the ectoderm of the comb region and causes the majority of the comb duplication phenotype. A subsequent and unknown mutation is suspected of further modifying the spatial or temporal expression of EOMES to result in two different Duplex-comb phenotypes, but does so in a manner that escapes the resolution of our IHC experiments.
The mutation that distinguishes the two mutant alleles should be found in a D Ã C or D Ã V IBD region. Our initial genotyping data identified a 381 Kb IBD haplotype in D Ã V individuals (Fig. 1B), however it is uncertain which mutant allele evolved first and we lack similar data for D Ã C individuals. We restricted our search for a causal mutation to the 89 Kb IBD region since this study shows that this region contains regulatory elements that affect EOMES expression during comb development, but we found no high-quality SNP where the sequenced D Ã V/D Ã V and D Ã C/D Ã C birds were homozygous for different alleles. A causal mutation could have been overlooked due to a gap or lack of adequate coverage in the sequence data although we had on average high sequence coverage (in the range 24x to 34x); the current assembly of the 89 Kb region contains one gap annotated as comprising about 750 nucleotides. Although prior experiments [8] found that D Ã C and D Ã V were alleles of the same locus, it remains unknown how close the mutation differentiating these two alleles lies to the 20 Kb duplication. At present we cannot exclude the possibility that the causal difference between D Ã V and D Ã C could be affecting a nearby gene other than EOMES.
A common feature of duplicated sequences is that they show copy number variation because nearly identical tandem copies are prone to unequal crossing-over or slippage during replication [23]. We did not detect any such copy number variation and all D Ã C and D Ã V chromosomes analyzed in this study appeared to contain only two copies of the duplicated sequence. Furthermore, the two copies of the duplicated sequence showed a 0.3% sequence divergence and were identical between V-shaped and Buttercup chromosomes (except at the SNP distinguishing D Ã V and D Ã C sequenced individuals). This implies that the sequence divergence between the two copies is sufficient to suppress unequal crossing-over that may otherwise lead to copy number variation and gene conversion, resulting in homogenization of the tandem copies.
The investigation of genetic mechanisms underlying phenotypic diversity in domestic animals has revealed that structural variation plays a significant role, typically affecting spatiotemporal gene expression patterns through rearrangement of regulatory elements [24]. Examples of such traits are Pea-comb [5], Rose-comb [4], Fibromelanosis [25] and Dark Brown plumage [26] in the chicken; Dominant White in the pig [27]; Greying with age in the horse [28] and Color Sidedness [29] and Polled in cattle [30][31][32]. Here we add the Duplex-comb locus to this list, highlighting how large-scale genomic mutations appear to often result in very noticeable phenotypic effects that are then easily selected by humans during animal domestication and breeding. The Duplex-comb trait also illustrates another striking feature of genetic diversity in domestic animals, the evolution of alleles [24]. The evolutionary history of domestic animals is sufficiently long to allow the accumulation of two or more causative mutations on the same haplotype. This is the case for instance with Dominant white color in pigs [27], Black spotting in pigs [33] and Rose-comb in chickens [4]. The Duplex-comb locus can now be added to this growing list of examples since the Buttercup and V-shaped alleles share an 89 Kb IBD region including the 20 Kb duplication but differ at a yet unknown position. This illustrates why domestic animals present a valuable model to study the genetic mechanisms and processes that likely underlie phenotypic traits in humans and other species.

Fine mapping and IBD analysis
A custom GoldenGate BeadXpress panel (Illumina) containing 28 SNPs on GGA2 was used to fine map the D Ã V mutation in the same backcross population we previously reported [10]. The 60K Chicken iSelect chip [11] (Illumina) was used to genotype a diverse panel of chicken breeds for IBD haplotype analysis. All genome coordinates are relative to the May 2006 WUGSC 2.1/galGal3 assembly.
Genomic copy number analysis SYBR Green assays for genomic copy number were performed using SYBR Green PCR Master Mix (ABI) with 800 nM of each primer and 10 ng of DNA in a total volume of 10 μl. Reactions were performed in quadruplicate and data was analyzed using the 2-ΔΔCt method [34], correcting for amplification efficiency as measured by a standard dilution series. TaqMan assays and data analysis for genomic copy number were performed as previously described [25]. A primer/probe set in an exon of SOX5 was used as a calibrator for both SYBR Green and Taq-Man assays. All primer sequences can be found in S1 Table. Diagnostic PCR test for duplication junction point A three primer PCR diagnostic test was developed that amplified over the duplication junction point as well as amplifying a product over one of the duplicated region wild-type sequences. Whole genome sequencing DNA was prepared from blood samples of single individuals representing the D Ã V (White Crested Black Polish, USA), D Ã C (Sicilian Buttercup, France) and D Ã N (Single Comb Dark Brown Leghorn, USA) alleles. The DNA was used to construct paired-end libraries with average insert size of approximately 220 bp and these libraries were subjected to whole genome sequencing using a HiSeq sequencing instrument (Illumina). Sequencing reads (2 x 100bp) were aligned to the chicken reference genome (galgal3) using the Burrows Wheeler Aligner (BWA) [35], revealing average depths of coverage of 24, 25 and 34 for D Ã V, D Ã N and D Ã C, respectively. The aligned reads were subjected to duplicate flagging using Picard Tools (http://picard. sourceforge.net) and to SNP calling using the Genome Analysis Toolkit (GATK) Unified Genotyper version 2.4.9 [36]. Identified raw SNPs were filtered based on GATK best practice variant detection and genotypes with a PHRED genotype quality ! 20 were used in subsequent steps. SNP-and genotype calls were compared to SNPs detected in DNA pools from wild-and domestic chickens in a previously published study [13].

Gene expression analysis
Total RNA was extracted from comb tissue from E8, E9, E12 and E18 D Ã V (Merlerault, France) and D Ã N (Geline de Touraine, France) chicken embryos using TRIzol (Invitrogen). RNA was treated with DNase (1 μg/μl) and cDNA was made from 1 μg of RNA using High Capacity RNA-to-cDNA Kit (ABI). The qRT-PCR analysis was performed using CFX96 SyBr Green Supermix (Bio-Rad) with primers designed by using Primer Express v2.0 (ABI), checked for PCR efficiency, linear dynamic range and specificity. The mRNA levels were normalized to βactin mRNA levels. The use of β-actin for normalization purposes was validated by testing for the most stable mRNA expression of TATA box binding protein, β-actin, ß-2-microglobulin and glyceraldehyde 3-phosphate dehydrogenase over the developmental stages using geNorm [37]. Expression levels were calculated from cycle threshold (Ct) and the 2-ΔΔCt method [34]. The normalized amplification levels of Duplex-comb and single-comb samples relative to the ß-actin amplification levels are shown, and differences were tested by using one-way analysis of variance (ANOVA) followed by Tukey's range test as indicated in figure legend.

Immunohistochemistry
Chicken embryo heads from D Ã V (Merlerault, France), D Ã C (Caumont, France), and D Ã N (Geline de Touraine, France) breeds were fixed in 4% paraformaldehyde, pH 7.4 in PBS for one hour at 4°C, transferred to 30% sucrose in PBS overnight at 4°C, frozen in OCT freezing medium and sectioned 10 μm with a cryostat. The sections were washed in PBS and used for immunohistochemistry. Sections were blocked (PBS with 1% fetal calf serum, 0.1% Triton-X, 0.02% Thimerosal) before addition of primary antibodies in blocking solution and incubated overnight at 4°C. The slides were washed three times for 5 min in PBS before incubation secondary antibodies in blocking solution in room temperature for two hours. The slides were washed three times 5 min with PBS before mounting. Primary antibody: TBR2/EOMES (Abcam #ab23345), rabbit polyclonal 1:1000. Secondary antibody: Alexa Fluor 568, rabbit IgG (Invitrogen) was made in donkey. Images from immunohistochemistry were captured using a Zeiss Axioplan2 microscope and AxioVision 4.8 software (Carl Zeiss).