Copy Number Variation in Intron 1 of SOX5 Causes the Pea-comb Phenotype in Chickens

Pea-comb is a dominant mutation in chickens that drastically reduces the size of the comb and wattles. It is an adaptive trait in cold climates as it reduces heat loss and makes the chicken less susceptible to frost lesions. Here we report that Pea-comb is caused by a massive amplification of a duplicated sequence located near evolutionarily conserved non-coding sequences in intron 1 of the gene encoding the SOX5 transcription factor. This must be the causative mutation since all other polymorphisms associated with the Pea-comb allele were excluded by genetic analysis. SOX5 controls cell fate and differentiation and is essential for skeletal development, chondrocyte differentiation, and extracellular matrix production. Immunostaining in early embryos demonstrated that Pea-comb is associated with ectopic expression of SOX5 in mesenchymal cells located just beneath the surface ectoderm where the comb and wattles will subsequently develop. The results imply that the duplication expansion interferes with the regulation of SOX5 expression during the differentiation of cells crucial for the development of comb and wattles. The study provides novel insight into the nature of mutations that contribute to phenotypic evolution and is the first description of a spontaneous and fully viable mutation in this developmentally important gene.


Introduction
In 1902 Bateson [1] reported the first examples of Mendelian inheritance in animals based on genetic studies of four traits in chicken, one of these being the Pea-comb phenotype (Figure 1). The Pea-comb allele results in reduced comb and wattle size compared to wild-type individuals. Pea-comb shows incomplete dominance, and as such the small comb shape can differ slightly between homo- and heterozygous birds. Homozygotes present three longitudinal rows of papillae, whilst heterozygotes can have a well-developed central blade (still of reduced size compared to wild-type) [2]. The wild-type has a single central blade of tissue and is therefore often denoted single comb. Bateson and Punnett [3] reported the first example of an epistatic interaction between genes when they showed that walnut comb is caused by the combined effect of Pea-comb and Rose-comb. Subsequent studies revealed that Pea-comb, besides its effect on comb and wattles, was also associated with a ridge of thickened skin that runs the length of the keel over the breast bone [4]. The Pea-comb mutation may have occurred early during domestication as the phenotype is widespread among both European and Asian breeds of chickens. Furthermore, it has been speculated that a reproduction in the tomb of Rekhmara at Thebes, Egypt, dated to ~3,450 years before present, depicts a rooster with the characteristic Pea-comb phenotype [5].
Chickens were domesticated from the red junglefowl with some contributions from the grey junglefowl [6], two species adapted to subtropical or tropical environments. Chickens do not sweat; instead they dissipate up to 15 percent of their body heat through the comb and wattles [7], making the Pea-comb phenotype adaptive to cold environments since it reduces heat loss. This phenotype has also been favoured in chickens bred for cockfighting because, as noted by Darwin [8], the smaller ornaments provided smaller targets for injury.
In the present study we show that the classical Pea-comb phenotype in chickens is caused by a large expansion of a duplicated sequence in intron 1 of the gene for the SOX5 transcription factor.

Identifying the Causative Gene for Pea-comb
Pea-comb has previously been assigned to chromosome 1 [9,10]. We refined the localization by linkage analysis using a dense set of genetic markers and a large segregating family. The interval harbouring Pea-comb was defined as 67,831,796-68,456,921 bp on chromosome 1, based on flanking markers showing recombination with Pea-comb (Table 1). This interval contains a single gene, SOX5, a member of the SRY-related HMG box family of transcription factors. SOX5 is located in a 1 Mb gene desert that is enriched for Evolutionary Conserved Non-coding Sequences (ECNS; Figure 2A). This is a typical feature of developmentally important genes [11,12]. SOX5 was not an obvious candidate gene for Pea-comb, but the comb is composed of extracellular matrix and SOX5 has a well-established role in chondrocyte development and production of extracellular matrix [13]. Mouse SOX5 knockouts die at birth from respiratory distress caused by a cleft secondary palate and narrow thoracic cage [13]. Mouse SOX5/SOX6 double knockouts die in utero with severe skeletal dysplasia, demonstrating that these two genes have critical, redundant roles during development [13,14].

Pea-comb Region
To further refine the localization of Pea-comb we characterized SOX5 haplotype patterns among three breeds of chicken: a French experimental population, the Russian Orlov and the Chinese Hua-Tung. These breeds all carry Pea-comb and, to the best of our knowledge, there has been no exchange of genetic material between them for 100 generations or more. The Orlov and Hua-Tung are not fixed for Pea-comb, allowing recombination to reduce the size of the shared haplotype associated with the mutation. Initial IBD mapping using 12 samples from the three different populations revealed a completely shared haplotype between 67,961,701 bp and 68,061,854 bp (Table 2). SNP genotyping of all Hua-Tung and Orlov individuals available narrowed the shared haplotype further to a 50 kb region between positions 67,985,285 bp and 68,035,337 bp (Figure 2A; Table 2). The upstream break-point (67,985,285 bp) was identified using a single Hua-Tung bird. The break was confirmed in two additional individuals from the same population which were homozygous at the six SNPs diagnostic of the Pea-comb haplotype, but heterozygous at this break-point. Downstream, the haplotype was broken at 68,035,337 bp in three Orlov birds (Table 2).
This critical region is located upstream of the first annotated exon; however, a comparison with SOX5 from mammalian species indicated that exon 1 is missing from the chicken genome assembly and is expected to be found more than 200 kb upstream of exon 2 (Figure 2A). We confirmed the existence of an upstream exon in chicken by 5′ RACE analysis. The obtained nucleotide sequence (GenBank accession number FJ548639) showed 90% identity to human SOX5 exon 1, but did not give a match in the chicken genome, implying a gap in the current chicken assembly.
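The successive narrowing of the candidate region can be checked with simple coordinate arithmetic. The sketch below uses only the chromosome 1 coordinates quoted in the text; the variable and function names are illustrative and not part of the original analysis.

```python
# Intervals on chicken chromosome 1 as reported in the text (bp).
# The labels are illustrative names chosen for this sketch.
intervals = [
    ("linkage mapping", 67_831_796, 68_456_921),
    ("initial IBD mapping", 67_961_701, 68_061_854),
    ("refined IBD mapping", 67_985_285, 68_035_337),
]


def interval_length_bp(start, end):
    """Distance in bp between the two boundary coordinates."""
    return end - start


def is_nested(inner, outer):
    """True if the inner interval lies entirely within the outer one."""
    return outer[1] <= inner[1] and inner[2] <= outer[2]


lengths = {name: interval_length_bp(s, e) for name, s, e in intervals}

for name, start, end in intervals:
    print(f"{name}: {start:,}-{end:,} bp ({lengths[name] / 1000:.1f} kb)")

# Each mapping step should fall inside the interval from the previous step.
nested = all(is_nested(intervals[i + 1], intervals[i])
             for i in range(len(intervals) - 1))
print("each interval nested in the previous one:", nested)
```

Under these coordinates the interval shrinks from roughly 625 kb (linkage) to roughly 100 kb (initial IBD) and finally to the 50 kb region described above.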

SOX5 Mutation Detection Reveals Copy Number Variation
Resequencing the 50 kb region associated with Pea-comb from a set of Pea-comb and wild-type birds revealed a limited number of sequence polymorphisms, with fixed differences between genotypes. These potentially causative SNPs were interrogated using a larger set of wild-type birds from the AvianDiv panel [15]; however, none of the alleles were found to be unique to the Pea-comb haplotype (Table 2). The failure to identify a causative point mutation led to a screen of the Pea-comb region for structural changes using Southern blot analysis. The SOX-85kb_SB probe (Table S1) revealed a dramatic increase in the hybridization signal of a 3.2 kb BamHI fragment in Pea-comb birds (Figure 2C), whilst other probes from the region gave identical restriction fragment patterns for both alleles. The result implied that Pea-comb is associated with a large tandem array of a duplicated sequence containing a BamHI restriction site. PCR and sequence analysis revealed that this DNA fragment is also duplicated on wild-type chromosomes, which have two copies (Figure 2B), whereas the Pea-comb allele has a large number of copies.
Quantification of the copy number of the duplicated fragment using both pulsed field gel electrophoresis (PFGE) and real-time PCR analysis confirmed that a massive amplification of a duplicated sequence is associated with the Pea-comb allele. PFGE analysis using the restriction enzyme PshA1, which cuts outside the duplicated region, gave a 97 kb restriction fragment in Pea-comb birds in contrast to a predicted 10 kb fragment based on the reference genome sequence from a wild-type bird (Figure S1). The result indicates that the Pea-comb allele contains about 30 copies of the duplicated sequence. Real-time PCR analysis of Pea-comb birds from three breeds confirmed this finding and revealed a 20- to 40-fold sequence amplification (Figure 2D). The real-time PCR analysis did not indicate two clear groupings corresponding to Pea-comb heterozygotes and homozygotes, suggesting that the duplication may show further copy number variation among Pea-comb individuals. Interestingly, 100 years ago Bateson and Punnett [16] reported variable expression of the Pea-comb phenotype, which may reflect copy number variation of the duplicated sequence. Although the duplicated sequence is not evolutionarily conserved, it is located close to two highly conserved ECNSs (Figure 2A). The distance between these elements is about 10 kb on wild-type chromosomes in contrast to about 100 kb on Pea-comb chromosomes. The duplication includes a sequence repeated in two copies on wild-type chromosomes and each copy contains two partial LINE fragments (Figure 2B). The expansion of this duplication must be the causative mutation because it was the only polymorphism showing complete association with the phenotype.
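The estimate of about 30 copies can be reproduced with a back-of-the-envelope calculation from the PFGE fragment sizes. The sketch below rests on two assumptions that are not stated explicitly in the text: that the repeat unit is about 3.2 kb long (inferred here from the size of the amplified BamHI fragment, given one BamHI site per unit) and that the entire size difference between the Pea-comb and wild-type PshA1 fragments is due to additional copies of that unit. The function name is hypothetical.

```python
def estimate_copy_number(fragment_kb, wt_fragment_kb=10.0, wt_copies=2,
                         unit_kb=3.2):
    """Rough copy-number estimate from a PFGE fragment size.

    Assumptions (illustrative, see lead-in): the wild-type fragment carries
    wt_copies of the repeat, and every extra kb in the observed fragment
    comes from additional repeat units of length unit_kb.
    """
    extra_kb = fragment_kb - wt_fragment_kb
    return wt_copies + extra_kb / unit_kb


# Fragment sizes reported in the text: 97 kb (Pea-comb), 10 kb (wild-type).
pea_comb_estimate = estimate_copy_number(97.0)
print(f"Estimated copies on a Pea-comb chromosome: {pea_comb_estimate:.1f}")
```

With these assumptions the estimate comes out just above 29 copies, consistent with the figure of about 30 given in the text.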
A closer examination of the duplicated sequence shows that it is particularly GC-rich and contains a small CpG island (Figure 2A and 2B). The wild-type chromosome contains two copies of this CpG island whereas the Pea-comb chromosome contains about 30. This could be relevant for the mechanism of action of this intronic mutation.

Author Summary
The featherless comb and wattles are defining features of the chicken. Whilst the Pea-comb allele was known to show a dominant inheritance and drastically reduce the size of both comb and wattles, the genetics underlying the mutation remained elusive. Chicken comb is primarily composed of collagen and hyaluronan, which are produced by chondrocytes. These cells are formed through the condensation and differentiation of mesenchyme cells during the chondrogenesis pathway, the early stages of which are regulated by SOX transcription factors. Here we pinpoint a massive amplification of a duplicated sequence in the first intron of SOX5 as causing the Pea-comb phenotype. By studying early embryos, we show that SOX5 is ectopically expressed during a restricted stage of development in the cells which underlie the comb and wattles of Pea-comb animals. We hypothesise that the sequence duplication alters the regulation of SOX5 expression when the differentiation of cells essential for comb and wattle development is taking place. Pea-comb adds to the growing list of phenotypic variation which is explained by regulatory mutations and so demonstrates the evolutionary significance of such events.

SOX5 Expression in the Embryonic Nasofacial Region
The Pea-comb phenotype is apparent at hatch and must therefore reflect altered gene expression during development.
Tissue samples from the comb region were collected from both homozygous Pea-comb and homozygous wild-type birds at embryonic (E) days 6, 7, 8, 9, 12 and 19 for expression analysis. Quantitative RT-PCR analysis only revealed significant differences in SOX5 expression at stages E7 and E8 (which were combined due to the low number of E8 samples). The results for E7+8 revealed significantly upregulated SOX5 expression in the comb region in Pea-comb birds (t = 25.0, p = 0.002; Figure S2A). Expression analysis was also conducted using primers specific for each exon of SOX5 (including the previously un-annotated exon 1 described above); however, the results did not indicate any difference between genotypes with regard to differential splicing of SOX5 (Figure S2B).
Immunohistochemical staining with a human SOX5 antibody as well as in situ hybridization with a chicken-specific cRNA probe was carried out to investigate SOX5 expression in both Pea-comb and wild-type embryos during development (Figure 3). Specific immunostaining of nuclei was seen in developing cartilaginous structures including the nasal septum, Meckel's cartilage and optic sclera (Figure 3A and 3D). Scattered and rare SOX5 positive cells were seen in the surface ectoderm (Figure 3B and 3M). All structures with SOX5 staining in wild-type embryos were also [...] Thus, Pea-comb appears to be a spatiotemporal-specific, cis-acting regulatory SOX5 mutation.

Discussion
A major challenge in current genome biology is to reveal the biological significance of the many Evolutionary Conserved Non-coding Sequences (ECNS). The analysis of the functional significance of ECNS is hindered by a paucity of mutations in such regions which show an association with a phenotype. Here we demonstrate the first spontaneous SOX5 mutation associated with a phenotype, despite the rich abundance of ECNS in the SOX5 region (Figure 2A). SOX5 is under complex regulation and, as demonstrated here, mutations affecting its regulation can have very specific effects. It would be surprising if regulatory mutations in this gene did not to some extent contribute to the phenotypic diversity present in humans. For instance, the human face shows a bewildering array of diversity. The nearly identical facial appearances of monozygotic twins imply that this diversity is nearly 100% genetically determined, but knowledge concerning the underlying molecular basis of this diversity is restricted to certain craniofacial abnormalities [17]. It is likely that regulatory mutations in developmentally important genes shape this type of phenotypic diversity, and SOX5 may very well be one of the genes that contributes.
The comb is a sexual ornament that shows strong sexual dimorphism in chickens, and the fact that this sexual dimorphism is maintained in Pea-comb birds shows that the Pea-comb tissue maintains the response to the influence of sex hormones (Figure 1). That the comb is under sexual selection is evidenced by red junglefowl females showing mating preferences for males with large combs and, reciprocally, males tending to favour females with larger combs [18,19]. The size of the comb is proportionally larger in many breeds of domestic chickens compared to their wild ancestors. In our previous study of a large intercross between White Leghorn chicken (with larger combs) and red junglefowl, we identified a number of Quantitative Trait Loci (QTL) affecting the size of the comb [20]. Interestingly, one of the QTL controlling the size of the female comb overlaps the SOX5 locus, which now becomes an obvious candidate gene for this QTL. However, the confidence interval for the QTL is large, as is usually the case in an F2 intercross, and the entire SOX5 region needs to be considered in a search for possible causative mutation(s).
SOX genes are defined by their high-mobility-group (HMG) domains and are divided into eight groups (A to H) based on protein sequence comparison [14]. SOX5 belongs to the D family of SOX genes, along with SOX6 and SOX13. SOX5 has been termed an architectural transcription factor [21], as binding to this protein will cause a sharp bend (80-135 degrees) in the bound DNA and may lead to different regulatory regions of a target gene coming into closer proximity. SOX5 has been reported to have a co-operative role in chondrogenesis; during embryonic cartilage formation SOX5 and SOX6 assist SOX9 to activate specific genes [22], and have a repressive role in oligodendrogenesis during neural development [23]. SOX5 is also expressed in the developing neocortex and cranial neural crest during the early stages of development. SOX5 postmitotically regulates migration, axon projection and postmigratory differentiation of certain neocortical neurons [24] but little is known about SOX5 function in neural crest derivatives [25]. With these different roles, the functional consequence of the transient ectopic SOX5 expression in Pea-comb birds is not clear.
The comb is composed of layers of epidermis, dermis and central connective tissue, of which collagen and hyaluronan are the major components [26]. The ectopic SOX5 expression is first seen in E7 (st28) mesenchyme (Figure 3). Previous studies with grafts of comb-primordia from different ages at various locations imply that cells giving rise to the comb are already determined by E4 (st24) [27,28] and that the determination resides in the mesenchymal components and not in the ectoderm [27]. These experiments also revealed that the morphology of the comb was under control of the mesenchyme [27,29]. Heterotopic grafts of single-comb primordia to the neck region without beak mesenchyme lost the serrated single ridge morphology and expanded laterally during development, resembling complex comb types [29] such as the Pea-comb. Hence, changes in the underlying mesenchyme at the time of the ectopic SOX5 expression will not affect the determination and initial stages of comb development but rather the development of comb shape. Our results indicate that ectopic SOX5 expression changes the modulating properties of the mesenchyme of the nasofacial region beneath the regions of the developing comb and wattles. The serration of a single comb is associated with loosely coherent clusters or points of proliferating mesenchymal cells [30,31]. Such clusters were not observed in the developing Pea-comb mesenchyme and this difference may be due to the ectopic SOX5 expression.
Figure 2 (caption fragment). [...] region corresponding to the probe used for Southern blot analysis is indicated. (C) Southern blot analysis using genomic DNA digested with BamHI from Pea-comb and wild-type chickens; the estimated sizes of restriction fragments are given to the left. (D) Results of real-time PCR analysis of the duplicated region. Individual phenotypes were not available for the Hua-Tung breed and the real-time PCR assay indicated that one bird was homozygous wild-type, which is fully possible since Pea-comb is not fixed in this breed; furthermore, this bird did not carry the Pea-comb haplotype. The results for each individual sample are compiled in Table S3.
Table 2. Identical-by-Descent analysis of Pea-comb haplotypes from three different breeds in comparison with wild-type haplotypes.
Pea-comb is an additional example of a Copy Number Variation (CNV) associated with a phenotype. About 12% of the human genome contains tandem duplications that may show CNV [32] and a number of human diseases have been reported to be associated with CNVs [33,34]. It is important to distinguish between CNVs that are due to duplications of single copy sequences (de novo duplications) and those due to expansions or contractions of already duplicated sequences. We have previously reported three de novo duplications associated with phenotypic traits in domestic animals: Dominant white colour in pigs [35], the Ridge phenotype in Ridgeback dogs [36] and Greying with age in horses [37]. In contrast, Pea-comb and most human diseases associated with CNVs involve expansions or contractions of existing duplications. Pea-comb is, however, an unusual CNV associated with a phenotype because it involves the amplification of a non-coding region located far from any coding sequence. Pea-comb therefore to some extent resembles the massive amplification of a trinucleotide repeat in intron 1 of Frataxin causing Friedreich ataxia [38]. However, the mechanism of action is probably very different since the expansion of the trinucleotide repeat in Frataxin leads to the formation of DNA triplexes and "sticky DNA" causing transcriptional silencing [38]. The duplicated sequence in intron 1 of SOX5 is not evolutionarily conserved between birds and mammals. This does not exclude the possibility that it contains regulatory elements which are important for SOX5 in birds, or in birds that develop combs and wattles. However, even if the duplicated sequence per se is not functionally important, the massive amplification of this sequence may disturb the action of regulatory elements in the region. For instance, tandem repeats may recruit DNA methylation which abolishes protein-DNA interaction at regulatory elements [39]. Our observation that the duplicated region is not only particularly GC-rich, but contains a small CpG island which becomes repeated about 30 times on the Pea-comb chromosome, suggests that DNA methylation may be a plausible mechanism for Pea-comb, as this effect may spread to neighbouring regulatory sites.
Genetic studies of phenotypic diversity in domestic animals provide a strong case for the evolutionary significance of regulatory mutations. Other examples of cis-acting regulatory mutations underlying phenotypic traits in domestic animals include (i) a nucleotide substitution in intron 3 of IGF2 with a prominent effect on muscle growth in the pig [40], (ii) regulatory mutations in the gene for microphthalmia-associated transcription factor (MITF) causing white spotting in dogs [41], (iii) regulatory mutation(s) in BCDO2 causing the yellow skin phenotype in chicken [6], (iv) a 4.6 kb duplication in intron 6 of STX17 causing Greying with age in horses [37], (v) an 11.7 kb intergenic deletion causing intersexuality and lack of horns in goats [42] and (vi) a mutation creating an illegitimate microRNA target site in the sheep myostatin gene promoting muscle growth [43].
Furthermore, the ridge phenotype in dogs [36] and the dominant white colour in pigs [35] are caused by large duplications that most likely lead to dysregulated expression of some fibroblast growth factor genes and the KIT receptor, respectively. Most of these examples concern growth factors, growth factor receptors, or transcription factors that have important roles during development and for which null mutations are lethal or sub-lethal. The significance of regulatory mutations is also supported by the identification of mutations underlying morphological variation in Drosophila [44,45] and stickleback fish [46]. This wealth of data now demonstrates the prominent role of regulatory mutations, at least for morphological evolution, as predicted by King and Wilson more than 30 years ago based on the limited divergence in protein sequences between human and chimpanzee [47].

Animals
DNA samples from a French pedigree consisting of 7 parental, 14 F1 and 244 F2 progeny were used for linkage analysis. The parentals consisted of four heterozygous Pea-comb birds and three homozygous wild-type birds. DNA samples from Pea-comb birds for identical-by-descent mapping came from a French experimental population kept by INRA, from a Chinese Hua-Tung population and from the Russian Orlov breed. DNA samples from various domestic breeds collected by the AvianDiv project [15] were used for real-time PCR analysis and to test whether candidate causal mutations from the Pea-comb region could be excluded since they were present among birds homozygous for the wild-type allele at the Pea-comb locus.

Linkage analysis
Linkage analysis was conducted using the SNPs compiled in Table S1. SNP genotyping was performed with Pyrosequencing (See 'Linkage primers', Table S1 for details). Fine-mapping was carried out on a small number of recombinant individuals that more exactly defined the Pea-comb region. In this case, one kb fragments were amplified and sequenced to detect SNPs (see '1 kb fragment analysis', Table S1 for primers).

Identical-by-Descent (IBD) mapping
IBD mapping was initially performed on a panel of 12 chickens: two Pea-comb and two wild-type birds from the linkage pedigree, four homozygous Pea-comb birds from the French pedigree, two Pea-comb birds from the Chinese Hua-Tung population and two Pea-comb birds from the Russian Orlov population. A collection of one kb regions spanning approximately 67,891,800 bp to 68,181,677 bp on chromosome 1 was sequenced for each animal to identify SNPs between lines (see 'SNPs used for IBD Mapping', Table S1, for exact positions). In a similar way, the heterozygosity of the chromosome 1 fragment from 68,181,600 bp to 68,335,500 bp was determined by sequencing 16 homozygous Pea-comb birds belonging to the linkage pedigree (primers SOX+130, SOX+140, SOX+200, SOX+260 in Table S1). This re-sequencing effort revealed potentially causative SNPs that were differentially segregating between the Pea-comb and non-Pea-comb populations. These polymorphisms were subsequently tested in the non-Pea-comb individuals from the AvianDiv panel and used to define the Pea-comb region by six loci, at positions 68,038,060 bp, 68,035,337 bp, 68,019,518 bp, 68,011,661 bp, 67,991,941 bp and 67,985,285 bp, respectively. Pyrosequencing was used to assay these six variations in 34 Hua-Tung Pea-comb birds and 27 Orlov Pea-comb birds (see 'Pyro SNPs used for IBD mapping', Table S1). Lastly, four of these loci were also genotyped for a variety of birds from the AvianDiv panel to check the frequency of the Pea-comb haplotype among wild-type chromosomes.

Real-time PCR analysis
The copy number of the SOX5 duplication was evaluated by comparing eight populations with wild-type phenotype (red junglefowl, n = 5; commercial broiler, n = 5; Czech Golden Pencilled, n = 5; Friesian Fowl, n = 5; Finnish Landrace, n = 5; Red Villafranquina, n = 5; Transylvanian Naked Neck, n = 5; White Leghorn, n = 5) to three breeds segregating for Pea-comb (French Pea-comb, n = 3; Hua-Tung, n = 13; Orlov, n = 13). The real-time PCR assay contained TaqMan Gene Expression Master Mix (Applied Biosystems), 900 nM of each primer combined with 250 nM of fluorometric probe and 30 ng of genomic DNA. The SOX5 assay was normalised using an assay designed for ribosomal protein S24 (rps24). Primer and probe concentrations of those reactions were 750 nM and 300 nM, respectively. Each assay was performed in triplicate, averaged and referenced to a wild-type red junglefowl. Details of primer and probe sequences are in Table S2. Fold change was calculated using the equation $2^{-(\text{Normalized Ct}_{\text{Pea-comb assay}} - \text{Normalized Ct}_{rps24\ \text{assay}})}$ and the range of this value was determined from the combined standard errors of both assays.
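The fold-change calculation described above can be illustrated with a minimal sketch. It assumes that "normalized Ct" means the mean triplicate Ct of a sample minus the mean triplicate Ct of the red junglefowl reference for the same assay, which is how the referencing step is read here. All Ct values and function names below are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean


def normalized_ct(sample_cts, reference_cts):
    """Mean triplicate Ct of a sample minus that of the reference bird
    (assumed reading of 'averaged and referenced to a wild-type RJF')."""
    return mean(sample_cts) - mean(reference_cts)


def fold_change(dup_sample, dup_reference, rps24_sample, rps24_reference):
    """2^-(normalized Ct of duplication assay - normalized Ct of rps24 assay)."""
    d_dup = normalized_ct(dup_sample, dup_reference)
    d_rps24 = normalized_ct(rps24_sample, rps24_reference)
    return 2 ** -(d_dup - d_rps24)


# Hypothetical triplicate Ct values, invented for illustration only.
rjf_dup, rjf_rps24 = [25.0, 25.0, 25.0], [22.0, 22.0, 22.0]
bird_dup, bird_rps24 = [20.0, 20.0, 20.0], [22.0, 22.0, 22.0]

example_fold = fold_change(bird_dup, rjf_dup, bird_rps24, rjf_rps24)
print(f"Fold change relative to the reference bird: {example_fold:.1f}")
```

In this invented example the duplication assay reaches threshold five cycles earlier than in the reference while rps24 is unchanged, which corresponds to a 32-fold amplification; the reference bird compared with itself gives a fold change of 1.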

Resequencing
Seventy kb on chromosome 1 from 67,969,741 bp to 68,041,242 bp were re-sequenced using a panel of ten birds: two wild-type parental birds from the linkage pedigree, two red junglefowl (RJF) birds, two homozygous Pea-comb birds from the French pedigree, two Pea-comb Hua-Tung birds and two Pea-comb Russian Orlov birds. Primer pairs were used to generate overlapping PCR amplicons ranging from approximately 1200 bp to 1400 bp in size. Internal primers were used with each primer pair set. Primers were designed using Primer3 [48]. DNA sequences were analysed and edited in CodonCode Aligner (CodonCode, Dedham, MA). The RJF genomic sequence used to generate the chicken genome sequence was used as a reference for alignment.
The chicken genome reference sequence contained three gaps. Gap 1 spanned 67,981,199 bp-67,983,790 bp; gap 2, 68,002,231 bp-68,003,557 bp; and gap 3, 68,006,200 bp-68,006,994 bp. Gaps 1 and 3 were closed using a PCR-based 2-step strategy [49] (primers Dynal-75_gap and Dynal-105_gap in Table S1), whilst gap 2 was covered using long range PCR (primers LR_gap1, Table S1). Gap 2 was found to be a tandem duplication, part of the duplication linked to the Pea-comb mutation. Therefore the amplicon was cleaved with XhoI and both halves were sequenced independently.

Southern blot analysis
Southern blot analysis was performed using a set of six different probes (SOX-55kb_SB to SOX-105kb_SB, Table S1) on a panel consisting of three homozygous Pea-comb birds from the linkage pedigree, three red junglefowls, two commercial broiler samples and two White Leghorn birds. The DNA was digested with BamHI and separated by 0.7% agarose gel electrophoresis.

Pulsed Field Gel Electrophoresis (PFGE)
DNA plugs were prepared from nine chickens, three each of wild-type, Pea-comb heterozygous and Pea-comb homozygous birds. The plug preparation and restriction digest protocol follows that of Giuffra et al. [35], with the following modifications. Whole blood stored in 0.5 M EDTA was used as starting material and resuspended to a concentration of 25 × 10^8 cells/ml in PBS after washing. Plugs were solidified at room temperature prior to digestion for 24 hours at 50°C in 0.5 mg/ml proteinase K, 1× NDS (0.5 M EDTA, 0.01 M Tris, 0.34 M N-Laurylsarcosine, pH 8.0) with constant shaking. Enzyme digestions were performed as described [35]. PshA1 (New England BioLabs) was selected for this experiment as this restriction enzyme was predicted to cut at positions 67,998,520 bp and 68,005,614 bp, i.e. outside the duplicated region.
PFGE of the PshA1 digested plugs was performed in a 1.0% agarose gel, 0.5% TBE at 14°C, 6 V/cm, with switch times ramped from 1 to 25 seconds for 17 hours, and fragment sizes were estimated using the MidRange I PFG Marker (New England BioLabs). Southern blot analysis was performed as before, using the 986 bp product from the SOX-85kb_SB amplicon (Table S1) as probe.

Duplication re-sequencing and analysis
The duplicated region was amplified with long-range PCR primers (SOX-Duplication_LR1_F and R, Table S1). In addition, internal primers were used to check the length of the potential duplication through nested PCR of the initial amplicon (Primers SOX-Duplication_F, R11, 12 and 13, Table S1).

Immunostaining
Heads from staged embryos were fixed in 4% paraformaldehyde in phosphate buffered saline (PBS) for one hour at 4°C. Fixed heads were incubated overnight in 30% sucrose in PBS at 4°C, embedded in OCT freezing medium (Tissue-Tek, Sakura), frozen and sectioned in a cryostat. Cross sections and sagittal sections, 10 µm thick, were collected on glass slides (Super Frost Plus, Menzel-Gläser). The sections were rehydrated in PBS for 15 min and then blocked in PBS containing 1% fetal calf serum, 0.1% Triton-X and 0.02% Thimerosal. The SOX5 antibody (Abcam, a_6226041) was diluted 1:500 in blocking solution and incubated on the slides overnight at 4°C. The secondary antibodies (Jackson Immunoresearch Laboratories) were incubated at room temperature for two hours at a 1:200 dilution in blocking solution. Samples were analysed using a Zeiss Axioplan2 microscope equipped with Axiovision software. Images were formatted, resized, enhanced and arranged for publication using Axiovision and Adobe Photoshop.

In situ hybridization
A cRNA probe was made using a DIG RNA labeling kit (Roche). The SOX5 probe was made from the chEST752i6 cDNA clone acquired from the BBSRC ChickEST Database [50]. The probe was hybridized to untreated sections overnight at 66°C under conditions containing 50% formamide and 5× SSC in a humidified chamber. The DIG labeled nucleotides were detected using an alkaline-phosphatase coupled anti-DIG antibody (Roche) followed by incubation with BCIP/NBT developing solution (Roche) for 1-5 hours at 37°C. Images were captured using a Zeiss Axioplan2 microscope equipped with Axiovision software (3.0.6.1, Carl Zeiss Vision GmbH).

qPCR analysis of tissue samples
Tissue was collected from homozygous Pea-comb birds and homozygous wild-type birds. The ages of the birds sampled were embryonic (E) days 6, 7, 8, 9, 12 and 19 (with hatching occurring at approximately day 21). Two Pea-comb and two wild-type individuals were collected from each stage, with the exceptions of E7, where nine samples (four Pea-comb and five wild-type) were used, and E8, where two samples (one of each type) were used. The most central part of the presumptive beak and comb were dissected out. Tissues were initially stored in RNALater (Ambion), and total RNA was extracted from embryonic tissues using the Trizol reagent (Invitrogen). cDNA was made from 1 µg of RNA using GeneAmp (Applied Biosystems). Samples were run in triplicate using IQ SyBr Green Supermix (Biorad) and normalized to β-actin and TATA-box binding protein (TBP); primers are given in Table S1. SOX5 was amplified using primers SOX5_cDNA_1 crossing intron/exon boundaries. Control cDNA reactions containing primers but no RNA were performed in parallel. Samples were run on two separate machines: the ABI 7900HT and the Corbett Rotor-Gene 6000. In addition to these primers, primers for each individual exon (2 to 15) were also used to analyse potential alternate SOX5 splicing in tissue from the comb. These were used on cDNA from two E7 samples (Pea-comb and wild-type) and two E9 samples (Pea-comb and wild-type). Statistical analysis was performed by first correcting Ct values for batch effects caused by using two different machines, then conducting a two-sample t-test on the average of each set of triplicates.
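The statistical procedure described above can be sketched as follows. The text does not specify how the batch correction was done or which variant of the two-sample t-test was used, so this example makes two explicit assumptions: batch effects are removed by subtracting each machine's mean Ct, and the test statistic is Student's t with pooled variance. All Ct values are invented, and all names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical records: (genotype, machine, triplicate Ct values).
samples = [
    ("Pea-comb", "A", [24.0, 24.0, 24.0]),
    ("Pea-comb", "A", [24.1, 24.2, 24.3]),
    ("Pea-comb", "B", [25.0, 25.0, 25.0]),
    ("Pea-comb", "B", [25.1, 25.2, 25.3]),
    ("wild-type", "A", [26.0, 26.0, 26.0]),
    ("wild-type", "A", [26.1, 26.2, 26.3]),
    ("wild-type", "B", [27.0, 27.0, 27.0]),
    ("wild-type", "B", [27.1, 27.2, 27.3]),
]


def machine_means(records):
    """Mean Ct of all measurements taken on each machine."""
    by_machine = {}
    for _, machine, cts in records:
        by_machine.setdefault(machine, []).extend(cts)
    return {m: mean(values) for m, values in by_machine.items()}


def corrected_sample_means(records):
    """Subtract the machine mean (assumed batch correction), then average
    each set of triplicates; returns one value per sample, by genotype."""
    offsets = machine_means(records)
    groups = {}
    for genotype, machine, cts in records:
        corrected = [ct - offsets[machine] for ct in cts]
        groups.setdefault(genotype, []).append(mean(corrected))
    return groups


def two_sample_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its df."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


groups = corrected_sample_means(samples)
t_stat, df = two_sample_t(groups["Pea-comb"], groups["wild-type"])
print(f"t = {t_stat:.2f}, df = {df}")
```

Because a lower Ct corresponds to higher expression, a negative t statistic in this sketch indicates higher expression in the first group. Mean-centring by machine is only appropriate when genotypes are reasonably balanced across machines, as they are in this invented data set.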

Web reference
Information on the chicken genome sequence is available at http://www.genome.ucsc.edu.

Accession numbers
The sequence data presented in this paper have been submitted to GenBank with the following accession numbers: FJ548629-FJ548639.