Molecular cytogenetic characterization of repetitive sequences comprising centromeric heterochromatin in three Anseriformes species

The highly repetitive DNA sequence of centromeric heterochromatin is an effective molecular cytogenetic marker for investigating genomic compartmentalization between macrochromosomes and microchromosomes in birds. We isolated four repetitive sequence families of centromeric heterochromatin from three Anseriformes species, viz., domestic duck (Anas platyrhynchos, APL), bean goose (Anser fabalis, AFA), and whooper swan (Cygnus cygnus, CCY), and characterized the sequences using a molecular cytogenetic approach. The 190-bp APL-HaeIII and 101-bp AFA-HinfI-S sequences were localized in almost all chromosomes of A. platyrhynchos and A. fabalis, respectively. In contrast, the 192-bp AFA-HinfI-L and 290-bp CCY-ApaI sequences were distributed in almost all microchromosomes of A. fabalis and in approximately 10 microchromosomes of C. cygnus, respectively. APL-HaeIII, AFA-HinfI-L, and CCY-ApaI showed partial sequence homology with the chicken nuclear-membrane-associated (CNM) repeat families, which are localized primarily to the centromeric regions of microchromosomes in Galliformes, suggesting that ancestral sequences of the CNM repeat families were present in the common ancestor of Anseriformes and Galliformes. Collectively, these results raise the possibility that homogenization of centromeric heterochromatin occurred between microchromosomes in Anseriformes and Galliformes; however, homogenization between macrochromosomes and microchromosomes also occurred in some centromeric repetitive sequences.


Introduction
Highly repetitive DNA sequences are major components of chromosomes and are generally divided into two categories, viz., interspersed repetitive sequences and site-specific repetitive sequences, based on their genomic organization and chromosomal distribution [1]. Two subtypes of non-long terminal repeat (LTR) retrotransposons, viz., long interspersed elements (LINEs) and short interspersed elements (SINEs), are well known as major components of interspersed-type repetitive sequences, which are distributed throughout the genome. Site-specific highly repetitive sequences constituting the heterochromatin, viz., centromeric repetitive DNA sequences, non-centromeric chromosome site-specific repetitive sequences, microsatellite repeat motifs, etc., are tandemly duplicated and are usually present as more than 10,000 copies in the genome; they have important roles in chromosome organization, sex chromosome differentiation, and chromatin architecture in interphase nuclei [2][3][4]. Centromeric heterochromatin-associated highly repetitive DNA sequences have been isolated and characterized from many vertebrates. These sequences are generally susceptible to rapid nucleotide substitution; therefore, they often evolve rapidly in a concerted manner, resulting in low intraspecific sequence variation but a higher degree of interspecific sequence variation [5][6][7]. This indicates that the centromeric repetitive sequence is a good taxonomic and phylogenetic marker for reconstructing the evolutionary relationships between closely related species that share the same origin of repetitive sequence families.
The order Anseriformes consists of three extant families, viz., Anatidae, Anhimidae, and Anseranatidae, containing over 180 species; they are highly adapted for swimming on the surface of water. Anseriformes is considered to have appeared approximately 77 million years ago when the ancestral Galloanserae split into the two main lineages of Anseriformes and Galliformes [25]. Anseriformes and Galliformes are the most basal lineages of neognathous birds that appeared after Palaeognathae in the phylogeny of birds. The karyotypes of Anseriformes are characterized by high diploid chromosome numbers (2n = 78-98), and their typical karyotypes are composed of 78-80 chromosomes that consist of a small number of macrochromosomes and numerous microchromosomes, which are similar to typical avian karyotypes [8][9][10][26]. The tandem repetitive sequence, RBMII, which was isolated from the Red-breasted Merganser (Mergus serrator) [27], was found in all 22 Anseriformes species, including the domestic duck (A. platyrhynchos) [28]. However, in Anseriformes, molecular characterization of repetitive sequences is limited to RBMII sequences, whose chromosomal distribution is not known yet.
In this study, to improve our understanding of chromosome size-correlated genomic compartmentalization in Aves, we isolated repetitive sequences constituting centromeric heterochromatin from three Anseriformes species, viz., the domestic duck (A. platyrhynchos), bean goose (A. fabalis), and whooper swan (C. cygnus), and characterized the sequences using molecular cytogenetic methods. We examined their chromosomal locations, genomic organization, and sequence conservation among avian species using fluorescent in situ hybridization (FISH) and filter hybridization. Finally, based on the obtained data, we discuss the molecular evolution of the centromeric repetitive sequences in Anseriformes and genomic compartmentalization between macrochromosomes and microchromosomes in Aves.

Ethics statement
Animal care and all experimental procedures were conducted according to the guidelines for the care and use of experimental animals of Nagoya University. The animal protocols were approved by the Animal Experiment Committee, Graduate School of Bioagricultural Sciences, Nagoya University (approval no. 2009090901).

Cell culture, chromosome preparation, and chromosome banding
For cell culture, small pieces of skin tissue were collected from a 1-month-old female domestic duck (A. platyrhynchos, Anatidae), purchased from a breeding farm in Japan, and from a female bean goose (A. fabalis, Anatidae) and two female whooper swans (C. cygnus, Anatidae) from the Asahiyama Zoo (Asahikawa) and Kushiro Zoo (Kushiro), respectively, in Hokkaido, Japan. Fibroblasts were cultured in Medium 199 (Thermo Fisher Scientific-GIBCO, Carlsbad, CA, USA) supplemented with 15% fetal bovine serum (Thermo Fisher Scientific-GIBCO), 100 μg/ml kanamycin, and 1% antibiotic-antimycotic solution (PSA) (Thermo Fisher Scientific-GIBCO) at 39˚C in 5% CO2. For Giemsa-stained and C-banded karyotype analyses, fibroblasts were collected 30 min after colcemid treatment, suspended in 0.075 M KCl for 20 min, and then fixed with 3:1 methanol/acetic acid. Chromosome preparations were made following the standard air-drying method. Replication-banded chromosome slides were prepared for in situ hybridization as described previously [29]. The cultured cells were treated with 5-bromodeoxyuridine (BrdU) (25 μg/ml) (Sigma-Aldrich, St Louis, MO, USA) at the late replication stage for 4.5 h, including the 30-min colcemid treatment. After staining the slides with Hoechst 33258 (1 μg/ml) for 5 min, replication bands were obtained by heating at 65˚C for 3 min and exposing to UV light at 65˚C for an additional 6 min. The slides were maintained at −80˚C until use.
To examine the chromosomal distribution of centromeric heterochromatin, C-banding was performed by the barium hydroxide/saline/Giemsa method [30] with slight modification; chromosome slides were treated with 0.2 M HCl for 5 min at room temperature and then with 5% Ba(OH)2 at 50˚C for 5-7 min.

Giemsa-stained and C-banded karyotypes
Giemsa-stained and C-banded karyotypes of A. platyrhynchos (2n = 80) were described in previous studies [36][37][38][39][40][41]. The chromosome number of A. fabalis was 2n = 80, consisting of two pairs of large submetacentric chromosomes; one pair each of large subtelocentric chromosomes, medium-sized metacentric chromosomes, and medium-sized subtelocentric chromosomes; four pairs of small acrocentric chromosomes; 30 pairs of indistinguishable microchromosomes; and the submetacentric Z and subtelocentric W sex chromosomes (Fig 1A). The karyotype of C. cygnus (2n = 80) consisted of one pair each of large submetacentric chromosomes, large metacentric chromosomes, and large acrocentric chromosomes; two pairs of medium-sized acrocentric chromosomes; four pairs of small acrocentric chromosomes; 30 pairs of indistinguishable microchromosomes; and the acrocentric Z and small acrocentric W sex chromosomes, which were similar to those of A. platyrhynchos (Fig 1B). Chromosomes 4, Z, and W were acrocentric in A. platyrhynchos and C. cygnus, whereas these three chromosomes were metacentric, submetacentric, and subtelocentric, respectively, in A. fabalis (Fig 1). Large C-positive heterochromatin blocks were observed in the centromeric regions of most autosomes and the Z chromosome and in whole regions of the W chromosome in these two species as well as in A. platyrhynchos (Fig 2) [38,39,41]. In A. platyrhynchos and A. fabalis, C-positive heterochromatin on the Z chromosome was observed only in the centromeric region (Fig 2A), whereas ladder-like C-positive heterochromatin blocks were observed throughout the Z chromosome in C. cygnus (Fig 2B).
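The diploid numbers stated above can be cross-checked against the listed chromosome classes. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function name `diploid_number` are illustrative assumptions, and the pair counts are transcribed from the karyotype descriptions in the text.

```python
# Autosomal pair counts transcribed from the karyotype descriptions in the text.
# The data layout and names here are illustrative, not part of the original study.
KARYOTYPES = {
    "A. fabalis": {
        "large submetacentric": 2,
        "large subtelocentric": 1,
        "medium-sized metacentric": 1,
        "medium-sized subtelocentric": 1,
        "small acrocentric": 4,
        "microchromosomes": 30,
    },
    "C. cygnus": {
        "large submetacentric": 1,
        "large metacentric": 1,
        "large acrocentric": 1,
        "medium-sized acrocentric": 2,
        "small acrocentric": 4,
        "microchromosomes": 30,
    },
}


def diploid_number(autosome_pairs, sex_chromosomes=2):
    """Return 2n: two chromosomes per autosomal pair plus the Z and W."""
    return 2 * sum(autosome_pairs.values()) + sex_chromosomes


for species, pairs in KARYOTYPES.items():
    print(species, "2n =", diploid_number(pairs))
```

Both species have 39 autosomal pairs plus the ZW pair, which gives 2n = 80, in agreement with the counts reported above.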

Chromosome homologies with chicken chromosomes
Each of the macrochromosome probes (GGA1-3, 5-9, and Z), except for GGA4, painted a single pair of chromosomes of A. fabalis and C. cygnus (Fig 3). The GGA4 probe hybridized to chromosome 4 and additionally to one pair of microchromosomes (Fig 3B and 3E). These results for A. fabalis and C. cygnus were consistent with those for A. platyrhynchos in our previous study [40]. The microchromosome-specific paint pool (GGAmicro) hybridized to approximately half of the microchromosomes, and no hybridization signals were detected on macrochromosomes (Fig 3F).

Repetitive sequence families and their nucleotide sequences
Prominent DNA bands of repetitive sequences were revealed by agarose gel electrophoresis of A. platyrhynchos (APL) genomic DNA digested with HaeIII. A DNA band of approximately 190 bp was isolated from the gel, and 16 clones inserted into plasmid vectors were obtained. For A. fabalis (AFA), 16 and 8 clones were obtained from two bands of approximately 100 and 200 bp by HinfI digestion, respectively. For C. cygnus, 50 clones were obtained. Consequently, 26 DNA fragments categorized into four families of repetitive sequences were isolated from these three species and deposited in DDBJ (http://www.ddbj.nig.ac.jp/), viz., the APL-HaeIII family from A. platyrhynchos, the AFA-HinfI-S and AFA-HinfI-L families from A. fabalis, and the CCY-ApaI family from C. cygnus (Table 1). The sizes, GC content, and nucleotide sequence identities between the fragments within the same sequence family are summarized in Table 1. The length of the consensus sequence of five APL-HaeIII fragments was 190 bp (S1 Fig). A deletion of 29 nucleotides was found at the end of two 161-bp fragments (APL-HaeIII-08 and APL-HaeIII-11). Nucleotide sequence identities, which were calculated by eliminating insertions and deletions but including one nucleotide gap, ranged from 78.2% to 89.4% (84.7% on average), and the GC content was relatively high (51.6% on average). All four AFA-HinfI-S fragments were of the same length of 101 bp, and the two AFA-HinfI-L fragments were 192 bp long (S2 Fig). The identities of nucleotide sequences between the fragments ranged from 91.0% to 98.1% (94.1% on average) for the AFA-HinfI-S sequence family and were 98.4% for the AFA-HinfI-L sequence family. AFA-HinfI-S showed a much higher GC content (60.2%) than AFA-HinfI-L (51.6%). The length of the 15 CCY-ApaI fragments ranged from 286 to 294 bp, with 97.6% sequence identity and 54.0% GC content on average (S3 Fig).
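The GC content and pairwise identity values reported above are simple ratios. A minimal sketch of how such values could be computed from aligned sequences follows; it is not the authors' pipeline. It assumes that every alignment column containing a gap is skipped, so it does not reproduce the special handling of single-nucleotide gaps mentioned in the text, and the two toy sequences are hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C bases in an ungapped sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical bases over alignment columns without gaps.

    Assumption: all gap-containing columns are ignored. The study treats
    one-nucleotide gaps differently, which is not modelled here.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        matches += a == b
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from the deposited sequences.
toy_a = "ACGTGGCA-T"
toy_b = "ACGAGGCATT"
print(round(pairwise_identity(toy_a, toy_b), 1))
print(round(gc_content(toy_a.replace("-", "")), 1))
```

Averaging `pairwise_identity` over all fragment pairs within a family would give a family-level mean comparable in kind to the values summarized in Table 1.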

Chromosomal distribution
In A. platyrhynchos, the APL-HaeIII sequence family showed intense hybridization signals in the centromeric regions of all macrochromosomes and microchromosomes except chromosomes 1 and Z, on which no signals were observed (Fig 5A). In C. cygnus, CCY-ApaI was localized to the centromeric regions of approximately 10 microchromosomes (Fig 5B). In A. fabalis, AFA-HinfI-S and AFA-HinfI-L were localized to the centromeric/pericentromeric regions of all chromosomes and almost all microchromosomes, respectively (Fig 5C-5E). The fluorescent signals of AFA-HinfI-L and AFA-HinfI-S hardly overlapped, suggesting that the two sequence families were located separately in centromeric heterochromatin on the same microchromosomes (Fig 5E).
[14,16,18], and squares on the sequences indicate the A3-5 or T3-5 internal repeats in this motif. Dot matrix analysis of the 101-bp AFA-HinfI-S consensus sequence (B). Dot matrix analysis was performed under the following conditions: scoring matrix, 200PAM/K = 2; threshold score = 22 (E = 0.00805). Alignment of the APL-HaeIII consensus sequence and the partial sequence at nucleotide positions 60-192 of the AFA-HinfI-L consensus sequence with the RBMII sequences of A. platyrhynchos (APL) (X61424) and Aix sponsa (ASP) (X61410) (C). Alignment of the partial sequences at nucleotide positions 76-95 of the AFA-HinfI-L consensus sequence and at positions 24-43 of the APL-HaeIII consensus sequence (D) and the partial sequence at positions 179-223 of the CCY-ApaI consensus sequence (E) with the four CNM sequence homologs in Galliformes, viz., CNM repeat in chicken [14], TM repeat in turkey (M. gallopavo) [16], CCH-S in Blue-breasted Quail (C. chinensis) [17], and ACH-Sau3AI in chukar partridge (A. chukar) [18]. Squares indicate the A3-5 or T3-5 internal repeats in the 12-17-bp T-rich and A-rich motifs conserved in the CNM repeat sequence family of Galliformes [14,16,18].

Organization in the genome
Southern blot hybridization was performed to examine the genomic organization of the four families of repetitive sequences. APL-HaeIII showed polymeric ladder signals of tandem repeats of the 190-bp monomer unit in HaeIII, MspI, and TaqI digests (Fig 6A); the monomer unit was present in the highest abundance, with decreasing copy numbers of each higher-order multimer. By contrast, the BamHI digest produced hybridization bands whose intensity increased with the size of the multimers. This result indicated that the BamHI cleavage site was not highly conserved in the tandem array of the 190-bp monomer unit. The restriction site for both HpaII and MspI is CCGG; HpaII does not cleave when the second cytosine is methylated, whereas MspI does. In contrast to the MspI digest, the intensity of ladder bands increased from low to high molecular weight in the HpaII digest, indicating that the APL-HaeIII sequence was highly methylated.
Hybridization of the AFA-HinfI-S sequence showed only a 101-bp monomeric band in the HinfI digest, indicating that the HinfI site was highly conserved in the tandem array of the sequence (Fig 6B). However, no restriction sites for BamHI, HaeIII, RsaI, or MspI were found in the AFA-HinfI-S monomer unit (S2A Fig), resulting in smear-like bands in the digests with these enzymes. This sequence was weakly methylated because the intensities of ladder bands at lower molecular weight were slightly higher in the HpaII digest than those in the MspI digest.
In hybridization with AFA-HinfI-L, ladder bands at lower molecular weight were observed in the HinfI, AluI, RsaI, and MspI digests, whose restriction sites were all contained in the 192-bp monomer unit (Fig 6C and S2 Fig). Of these sites, the HinfI and AluI sites were particularly highly conserved. In contrast to the MspI digest, there were no hybridization signals at lower molecular weight in the HpaII digest, indicating that this sequence was hypermethylated.
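The methylation inference from the HpaII and MspI digests can be illustrated with a small in silico digest. This is a minimal sketch under stated assumptions: it models only the rule given in the text (HpaII is blocked when the second cytosine of CCGG is methylated, MspI is not), both enzymes are taken to cut after the first cytosine, and the 16-bp toy monomer, the function name `digest_ccgg`, and the methylation positions are hypothetical.

```python
def digest_ccgg(seq, enzyme, methylated_internal_c=frozenset()):
    """Fragment lengths after cutting a linear sequence at CCGG sites.

    methylated_internal_c holds 0-based indices of methylated second
    cytosines (C-mC-GG). Only HpaII is blocked by that modification,
    following the rule stated in the text.
    """
    if enzyme not in ("MspI", "HpaII"):
        raise ValueError("enzyme must be 'MspI' or 'HpaII'")
    seq = seq.upper()
    cuts = []
    site = seq.find("CCGG")
    while site != -1:
        internal_c = site + 1
        blocked = enzyme == "HpaII" and internal_c in methylated_internal_c
        if not blocked:
            cuts.append(site + 1)  # both enzymes cut C^CGG
        site = seq.find("CCGG", site + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# Hypothetical 16-bp monomer carrying one CCGG site, repeated three times.
array = ("ATATAT" + "CCGG" + "ATATAT") * 3
fully_methylated = {7, 23, 39}  # internal C of every site

print(digest_ccgg(array, "MspI", fully_methylated))
print(digest_ccgg(array, "HpaII", fully_methylated))
print(digest_ccgg(array, "HpaII", {23}))
```

With every site methylated, MspI still releases monomer-sized fragments while HpaII leaves the array uncut, which corresponds to the loss of low-molecular-weight signals in the HpaII digest described above. Partial methylation shifts HpaII fragments toward larger multimers.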

Hybridization of the 290-bp CCY-ApaI fragment revealed that the ApaI, HinfI, and MspI sites were conserved in the tandem array of this repeat sequence (Fig 6D and S3 Fig). The hybridization bands positioned at approximately 580 and 870 bp corresponded to the dimeric and trimeric bands in the ApaI digest, respectively. The intermediate band between the 290- and 580-bp bands may have been derived from an internal restriction site; however, no such site was found in this fragment. The <290-bp hybridization band in the HaeIII digest was derived from multiple internal HaeIII sites contained in the sequence (S3 Fig). Only one 290-bp band was observed in the MspI digest; however, no bands were found at lower molecular weight in the HpaII digest, indicating that this repetitive sequence was hypermethylated.
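The band assignments above follow from the expected multimer sizes of a tandem array. The sketch below shows that arithmetic for the 290-bp CCY-ApaI monomer. The function names, the 10% size tolerance, and the 435-bp example band are assumptions introduced for illustration and are not measurements from the study.

```python
def ladder_sizes(monomer_bp, max_units):
    """Expected band sizes for monomer, dimer, trimer, and so on."""
    return [monomer_bp * n for n in range(1, max_units + 1)]


def multimer_order(band_bp, monomer_bp, tolerance=0.1):
    """Return n if the band matches an n-mer within the tolerance, else None.

    tolerance is a fraction of the monomer length; 0.1 is an arbitrary
    illustrative choice.
    """
    n = round(band_bp / monomer_bp)
    if n >= 1 and abs(band_bp - n * monomer_bp) <= tolerance * monomer_bp:
        return n
    return None


print(ladder_sizes(290, 3))
print(multimer_order(580, 290), multimer_order(870, 290))
# A hypothetical band midway between monomer and dimer fits no multimer.
print(multimer_order(435, 290))
```

A band that returns `None` here corresponds to the kind of intermediate signal noted above, which cannot be explained by cleavage at the conserved site alone.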

Nucleotide sequence conservation
Slot blot hybridization probed with four families of repetitive sequences was performed using genomic DNA from 17 avian species of 10 orders (Fig 7). Hybridization signals of APL-HaeIII and AFA-HinfI-L were detected for two species of Anseriformes, viz., A. platyrhynchos and A. fabalis (Fig 7A and 7C). The hybridization signals of AFA-HinfI-S and CCY-ApaI were detected only in A. fabalis and C. cygnus, respectively (Fig 7B and 7D). In order to examine the chromosomal distribution of AFA-HinfI-L in A. platyrhynchos and APL-HaeIII in A. fabalis, we performed cross-species FISH mapping of AFA-HinfI-L to A. platyrhynchos chromosomes and of APL-HaeIII to A. fabalis chromosomes. However, no hybridization signals were found.

Discussion
Giemsa-stained karyotype analysis revealed that the chromosome number was 2n = 80 for both A. fabalis and C. cygnus, as in A. platyrhynchos [36][37][38][39][40][41]. The size and morphology of most macrochromosomes were similar among the three species; however, chromosomes 4, Z, and W of A. platyrhynchos and C. cygnus were morphologically different from those of A. fabalis. C-positive heterochromatin was observed in the centromeric regions of almost all autosomes and in whole regions of the W chromosomes in A. fabalis and C. cygnus, which was consistent with the pattern in A. platyrhynchos reported previously [39,41]. However, amplification of the interstitial heterochromatin blocks in the Z chromosome, as shown by the C-positive ladder signals, occurred specifically in the lineage of C. cygnus.
In this study, we isolated four families of centromere-specific repetitive sequences, APL-HaeIII, AFA-HinfI-S, AFA-HinfI-L, and CCY-ApaI, from three Anseriformes species. Among these four repetitive sequences, AFA-HinfI-S and CCY-ApaI were species-specific, suggesting that these repetitive sequences arose independently in each species. APL-HaeIII and AFA-HinfI-L were conserved in A. platyrhynchos and A. fabalis, and both showed sequence similarities to the RBMII sequence that is conserved in at least 22 Anseriformes species [27,28]. However, these two sequence families showed different chromosomal distributions, and no hybridization signals were detected on chromosomes by cross-species FISH mapping. These results indicate that APL-HaeIII and AFA-HinfI-L were derived from the same origin as the RBMII sequence in Anseriformes; however, they have diverged to the extent that interspecific hybridization was detectable by slot blot hybridization but not by FISH.
APL-HaeIII and AFA-HinfI-S were distributed in almost all chromosomes. In contrast, AFA-HinfI-L and CCY-ApaI were microchromosome-specific centromeric repeats, identified here for the first time in Anseriformes. AFA-HinfI-L was localized to almost all microchromosomes and CCY-ApaI to some microchromosomes. All four repetitive sequences were Anseriformes-specific; however, the partial sequences of APL-HaeIII, AFA-HinfI-L, and CCY-ApaI showed homology with the CNM family sequences, including chicken CNM, turkey TM, Blue-breasted quail CCH-S, and chukar partridge ACH-Sau3AI sequences, which are localized primarily to microchromosomes in Galliformes [14,16,17,18]. The T-rich and A-rich motifs conserved in the CNM family sequences were observed in AFA-HinfI-L but not in APL-HaeIII and CCY-ApaI. Consequently, the CNM family sequences of Galliformes and APL-HaeIII, AFA-HinfI-L, and CCY-ApaI were concluded to be partially derived from the same ancestral sequence and to have diverged independently in each lineage. Microchromosome-specific repetitive sequences have been isolated from Falconiformes, Galliformes, Piciformes, and Struthioniformes in Aves and from the Chinese soft-shell turtle [14][15][16][17][18][19][20][21][22], suggesting that chromosome size-correlated genome compartmentalization between macrochromosomes and microchromosomes is common in birds and turtles. AFA-HinfI-L and CCY-ApaI of Anseriformes were also homogenized in a chromosome size-correlated manner. However, homogenization between macrochromosomes and microchromosomes also occurred in APL-HaeIII and AFA-HinfI-S, as observed in ACH-Sau3AI, CVI-HaeIII, CCA-BamHI, and CSQ-BamHI in Galliformes [18].
Not much is known about how the chromosome size-dependent distribution of centromeric repetitive sequences evolved in avian genomes. One possible explanation is that such biased distribution of the centromeric repetitive sequences is caused by chromosome positioning in the nuclei. In the interphase nuclei of chicken, turkey, and Japanese quail, microchromosomes are located predominantly in the nuclear interior, and macrochromosomes are primarily located in peripheral parts of the nuclei [32,47,48,49,50]. Owing to this spatially different disposition, physical interaction of chromatin between macrochromosomes and microchromosomes may be restricted, which in turn restricts homogenization of the centromeric repetitive sequences between chromosomes of different sizes. However, the molecular basis responsible for the spatial arrangement of the centromeres of macrochromosomes and microchromosomes in interphase nuclei is not fully understood.
Microchromosomes are a common feature of Aves and Reptilia (sauropsids), except for Crocodilia [51], and are also present in some amphibian species [52]. Comparative genome and chromosome analyses for amphibians, reptiles, birds, and mammals suggest that ancestral tetrapods and amniotes may have retained many microchromosomes whose linkages are highly conserved in chicken microchromosomes [53][54][55][56][57][58]. Comparison of the GC content in exonic third codon positions (GC3) of genes between macrochromosomes and microchromosomes in several reptilian species, viz., Chinese soft-shell turtle, Japanese four-striped rat snake (Elaphe quadrivirgata), central bearded dragon (Pogona vitticeps), and green anole (Anolis carolinensis), demonstrated that the genes on microchromosomes tend to have higher GC3 than those on macrochromosomes, as shown in chicken, suggesting that chromosome size-dependent GC heterogeneity was acquired in the common ancestor of sauropsids [59][60][61][62]. Further identification of microchromosome-specific centromeric repetitive sequences from avian and reptilian species may help clarify the relationship between the genomic organization of microchromosomes and chromosome size-correlated compartmentalization between macrochromosomes and microchromosomes in tetrapods.
Table. Nucleotide sequences exhibiting homologies with three repetitive sequence families, APL-HaeIII, AFA-HinfI-L, and AFA-HinfI-S, except for the CNM family sequence. (XLSX)
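The GC3 measure referred to in the Discussion is the GC percentage at third codon positions of a coding sequence. A minimal sketch of the calculation is given below; the function name `gc3` and the toy coding sequence are hypothetical, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
def gc3(cds):
    """Percentage of G or C at third codon positions of an in-frame CDS."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("CDS length must be a non-zero multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at index 2
    gc_count = sum(base in "GC" for base in third_positions)
    return 100.0 * gc_count / len(third_positions)


# Hypothetical toy CDS: codons ATG GCC AAA TTC, third bases G, C, A, C.
print(gc3("ATGGCCAAATTC"))
```

Comparing the distribution of such per-gene values between genes on macrochromosomes and genes on microchromosomes is the kind of analysis the cited studies describe.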