Chicken rRNA Gene Cluster Structure

Ribosomal RNA (rRNA) genes, whose activity results in nucleolus formation, constitute an extremely important part of the genome. Despite extensive exploration of avian genomes, no complete description of the primary structure of an avian rRNA gene cluster has been offered so far. Here we publish a complete chicken rRNA gene cluster sequence, including the 5'ETS (1836 bp), 18S rRNA gene (1823 bp), ITS1 (2530 bp), 5.8S rRNA gene (157 bp), ITS2 (733 bp), 28S rRNA gene (4441 bp) and 3'ETS (343 bp). The rRNA gene cluster sequence of 11863 bp was assembled from raw reads and deposited in GenBank under accession number KT445934. The assembly was validated by fluorescence in situ hybridization on chicken metaphase chromosomes using specific probes that were designed in silico and then synthesized, as well as by reference assembly against a de novo assembled rRNA gene cluster sequence obtained from sequenced fragments of a BAC clone containing the chicken NOR (nucleolus organizer region). The results confirmed the validity of the chicken rRNA gene cluster assembly.
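The element lengths listed above can be tallied to check the stated total and to derive approximate coordinates for each element. The following is a minimal sketch, assuming the seven elements are contiguous and appear in 5'-to-3' order with no gaps; the coordinates it prints are derived from that assumption and are not taken from the GenBank annotation.

```python
# Element lengths (bp) as stated in the abstract, in 5'->3' order.
ELEMENTS = [
    ("5'ETS", 1836),
    ("18S rRNA", 1823),
    ("ITS1", 2530),
    ("5.8S rRNA", 157),
    ("ITS2", 733),
    ("28S rRNA", 4441),
    ("3'ETS", 343),
]


def layout(elements):
    """Return (name, start, end) tuples with 1-based inclusive coordinates.

    Assumption: elements are contiguous, with no gaps or overlaps.
    """
    coords = []
    start = 1
    for name, length in elements:
        end = start + length - 1
        coords.append((name, start, end))
        start = end + 1
    return coords


def total_length(elements):
    """Sum of all element lengths in bp."""
    return sum(length for _, length in elements)


if __name__ == "__main__":
    for name, start, end in layout(ELEMENTS):
        print(f"{name:10s} {start:6d}..{end:6d}")
    print("total (bp):", total_length(ELEMENTS))
```

The seven lengths sum to 11863 bp, which agrees with the cluster size given above.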

organizing regions (NOR) in chromosomes, and the functional status of the NOR serves as an indicator of the physiological status of cells, tissues and the entire organism at various ontogenetic stages [2][3][4][5][6]. Knowledge of the precise structure of the ribosomal cluster enables more advanced analysis of NOR processes.
All animal ribosomal clusters are known to have a fundamentally similar structure (Fig. 1). The clusters consist of the sequences of conserved rRNA-encoding genes (18S rRNA, 5.8S rRNA and 28S rRNA), which are separated by internal transcribed spacers (ITS1 and ITS2) and flanked by external transcribed spacers (5'ETS and 3'ETS). All of these elements are transcribed into a single RNA precursor, the pre-rRNA. Both internal and external transcribed spacers show high structural variability, which accounts for the wide variation in ribosomal cluster length, from 8 to 14 thousand base pairs (bp). The clusters are separated from each other by intergenic spacers (IGS) containing promoter and terminator regions for RNA Pol I, which transcribes the pre-rRNA [7]. Despite the extensive studies of vertebrate genomes conducted recently [8], ribosomal cluster exploration remains a complicated task, primarily because of the high repetitiveness and considerable length of the clusters as well as the faster evolution of the spacers [1]. So far, GenBank [9] has contained annotated complete rRNA gene cluster sequences for only a limited number of vertebrate species, including Homo sapiens (GenBank accession number: HSU13369), Mus musculus (GenBank accession number: BK000964) and Nothobranchius furzeri (GenBank accession number: EU780557). No description of the complete ribosomal cluster sequence has yet been offered for such a major taxon as Aves.
Among representatives of the class Aves, the chicken ribosomal cluster deserves special research focus. The chicken is an important agricultural organism and, in addition, is widely used in a broad range of biomedical research, including developmental biology, genetics, cell biology, histology and virology [10,11]. The chicken genome was the first avian genome to be sequenced and one of the first among vertebrate genomes [12]. The Gallus gallus 4.0 assembly has been used as a reference genome for assembling the sequenced genomes of other birds [13]. The chicken NOR was localized on chromosome 16 (GGA16) [14][15][16][17], and this was recently confirmed by FISH using the WAG137G04 BAC clone as a probe [18]. However, the published Gallus gallus 4.0 genome assembly [12] covers only 5% of the GGA16 chromosome sequence and does not include the NOR region [18].
So far, a complete description of the primary structure of the chicken ribosomal cluster has not been reported. GenBank contains annotated sequences for individual fragments of the chicken 18S rRNA and 28S rRNA genes. In addition, two groups of authors have contributed to GenBank annotated sequences of the ITS1 and ITS2 spacers and the 5.8S rRNA gene (accession numbers: DQ018752-DQ018755; FJ008990). However, these sequences differ from each other.
During the course of the "ChIP-sequencing with CENP-A from chicken cells containing neocentromeres on Z chromosome" project (NCBI BioProject accession number: PRJDB2279) [19], we obtained multiple raw chicken sequences, most of which are annotated in the Sequence Read Archive (SRA accession numbers: DRX001860-DRX001863). On the basis of these sequences, together with sequences of unannotated contigs from the Gallus gallus 4.0 genome assembly (NCBI WGS accession number: AADN00000000.3), the complete structure of the chicken ribosomal cluster can be determined.
In this study we assembled and described the complete structure of the chicken ribosomal cluster using an integrated assembly of raw reads available in SRA, WGS contigs and Nucleotide database sequences (GenBank). Finally, we validated our analysis by fluorescence in situ hybridization (FISH) on mitotic chromosomes. The results of this study may contribute notably to the wider use of birds as model organisms, in particular for exploring NOR regulation in ontogenesis. The data obtained should be useful for genetics and evolutionary biology.

Cluster assembling based on published data
To assemble the complete chicken ribosomal cluster, we used a previously generated library of raw reads [19] (SRA accession number:  Table 1).
The search for raw sequences and WGS contigs homologous to the ribosomal cluster elements was performed using BLAST [22]. UGENE 1.16.1 [23] and MEGA 6.06 [24] were used for raw read tiling, alignment of contigs and annotated sequences, and determination of the nucleotide sequence. Repeat search and classification were performed in RepeatMasker 4.0.5 [25]. The secondary structure of nucleotide sequences was predicted with the Mfold software [26].

Experimental confirmation of assembly accuracy
To confirm the accuracy of our assembly and the location of the assembled sequence in the NOR on chromosome GGA16, serial fluorescence in situ hybridization (serial FISH) was applied to chicken mitotic chromosomes, using fragments of the assembled sequence and the WAG137G04 BAC clone, which is known to include a chicken GGA16 fragment comprising the NOR [18].
The mitotic chromosomes were obtained from fibroblasts of a four-day chicken embryo by a standard procedure.
The probes to the assembled sequence were PCR-amplified boundary regions of 5'ETS-18S rRNA and ITS1-5.8S rRNA. PCR primers were designed based on the assembled rRNA cluster (Table 2).
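Primer pairs designed against an assembled sequence can be checked in silico before synthesis by predicting the amplicon they would produce. The sketch below is illustrative only: the template and primer sequences are hypothetical (they are not the primers of Table 2), and exact matching is assumed, whereas real primer design tools also consider mismatches and melting temperature.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template, forward, reverse):
    """Predict the amplicon for a primer pair on the given template strand.

    The forward primer must match the template exactly; the reverse primer
    must match the opposite strand, i.e. its reverse complement must occur
    downstream of the forward primer. Returns None if either site is absent.
    """
    template = template.upper()
    forward = forward.upper()
    fwd_start = template.find(forward)
    if fwd_start == -1:
        return None
    rev_site = reverse_complement(reverse)
    rev_start = template.find(rev_site, fwd_start + len(forward))
    if rev_start == -1:
        return None
    return template[fwd_start:rev_start + len(rev_site)]


if __name__ == "__main__":
    # Hypothetical toy sequences, for illustration only.
    template = "TTTACGTACGGATCCAAATTGGGCCATT"
    amplicon = in_silico_amplicon(template, "ACGTACG", "GCCCAA")
    print(amplicon, len(amplicon))
```

The same check could be run with the actual primers against the assembled cluster to compare predicted amplicon lengths with the observed probe sizes.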

Sequencing of PCR products was carried out at the Molecular and Cell Technology Development Resource Center of Saint-Petersburg State University. Fluorescent probes were generated by labeling PCR products with the modified nucleotide biotin-16-dUTP (Sileks) during amplification.
To establish whether the fluorescent probes had hybridized to the NOR on GGA16, re-FISH with the WAG137G04 BAC clone probe was carried out on the same preparations.
FISH results were examined using a DM4000B (Leica) epifluorescence microscope at the Chromas Saint-Petersburg Resource Center. The images were processed and superimposed using Adobe Photoshop CS5.1 software.

Results and discussion
The assembled and annotated cluster is 11444 bp in size and contains complete

18S rRNA
We used the raw sequences deposited in SRA under accession number DRX001863 to assemble the complete sequence of the 18S rRNA gene. To confirm the accuracy of our assembly, six WGS contigs and three annotated sequences of fragments of this gene were used in addition to the raw sequences (Table 1).

28S rRNA
The chicken 28S rRNA sequence was assembled from raw SRA sequences together with six WGS contigs, five annotated GenBank sequences and a 28S rRNA gene sequence (Table 1). The WGS contigs and annotated sequences covered 76.7% and 69.7% of the assembly, respectively (Additional file 1). Within the designated boundaries (Fig. 4)

ITS2
The chicken ITS2 assembly from raw sequences was validated using two overlapping WGS contigs (Table 1), which covered 100% of the assembly (Additional file 1). We excluded the sequences deposited in GenBank under accession numbers DQ018753, DQ018755 and FJ008990, which are annotated as containing chicken ITS2, because their gene-flanking sequences were found to be nonhomologous to the 3'-end of the 5.8S rRNA sequence and to the 5'-end of the chicken 28S rRNA.
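Coverage figures of this kind can be computed by merging the aligned intervals of the supporting sequences and dividing the covered length by the assembly length. The following is a minimal sketch, assuming 1-based inclusive alignment coordinates that lie within the assembly; the interval values in the example are hypothetical and are not taken from Additional file 1.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous interval if this one overlaps or abuts it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def coverage_percent(intervals, assembly_length):
    """Percentage of the assembly covered by at least one interval."""
    covered = sum(end - start + 1 for start, end in merge_intervals(intervals))
    return 100.0 * covered / assembly_length


if __name__ == "__main__":
    # Hypothetical alignment coordinates of two overlapping contigs
    # on a 733 bp ITS2 assembly.
    contigs = [(1, 400), (350, 733)]
    print(merge_intervals(contigs))
    print(coverage_percent(contigs, 733))
```

Merging before summing avoids counting positions twice where supporting sequences overlap.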
Within the designated boundaries (Fig. 6)
The chicken ITS1 sequence proved to be longer than that of most animals, with the exception of marsupials [29]. The chicken ITS2 also proved to be far longer than had been reported previously (GenBank accession numbers: DQ018753, DQ018755, FJ008990). Our attempts to amplify and obtain complete chicken ITS1 and ITS2 sequences using traditional approaches were unsuccessful. Our findings suggest that the key problem in amplifying and sequencing avian ITS1 and ITS2 may be related to their high GC content and the formation of secondary structures. These factors impair polymerase activity during PCR and increase the probability of non-specific amplification of AT-enriched regions. This is quite likely to be the main reason why annotated extended avian ribosomal cluster sequences are currently unavailable in GenBank. The representation of animal ITS1 and ITS2 sequences in GenBank is shown in Table 3.
Notably, among the eight avian sequences deposited in GenBank, only three actually belong to the ribosomal cluster, namely to the ITS1 sequence. At the same time, complete deciphering of the ribosomal cluster sequence in different groups of organisms is of great value for various fields of biology, primarily systematics, phylogeny and ecology. Our complete chicken rRNA gene cluster sequence might be useful for comparative research in these fields. In the majority of animals, including birds, the order in which coding and spacer sequences alternate within rRNA gene clusters is conserved. Yet the spacer sequences, particularly ITS1 and ITS2, are characterized by high variability. For this reason they are used extensively as DNA barcodes as well as nuclear markers of micro- and macroevolutionary events [30][31][32][33]. However, the features of avian ITS1 and ITS2 mean that their sequences cannot be treated as easily accessible and effective DNA barcodes or phylogenetic markers for this class.
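The GC content and CpG dinucleotide counts discussed for the spacers are simple sequence statistics. The sketch below shows one way to compute them for any DNA string; the example sequence is hypothetical and is not a fragment of the chicken ITS1 or ITS2.

```python
def gc_content(seq):
    """Percentage of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def cpg_count(seq):
    """Number of CpG dinucleotides (C immediately followed by G)."""
    # "CG" cannot overlap with itself, so str.count gives the exact number.
    return seq.upper().count("CG")


if __name__ == "__main__":
    example = "GCGCGGCCATCG"  # hypothetical sequence, for illustration only
    print(f"GC content: {gc_content(example):.1f}%")
    print("CpG count:", cpg_count(example))
```

Applied to each element of the cluster in turn, these functions would allow the spacers and coding regions to be compared on the same scale.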

External transcribed spacers (ETS)

5'ETS
The validity of our chicken rRNA 5'ETS assembly was confirmed using seven WGS contigs (Table 1). Additionally, we used a chicken sequence (GenBank accession number: DQ112354) containing the entire RNA Pol I promoter and the pre-rRNA initiation site [11]. The WGS contigs and annotated sequences covered 63.9% and 14.8% of the assembly, respectively (Additional file 1). Within the designated boundaries (Fig. 7)

3'ETS
The chicken 3'ETS sequence was assembled from raw reads. The size of the initial assembly was 481 bp. However, we were unable to define precisely the position of the 3'-end of the 3'ETS because of the lack of data on the sequence of the Sal1-box, the variable element responsible for termination of transcription by RNA Pol I [34].

Fluorescent in situ hybridization
To verify that the assembled sequence belongs to the chicken ribosomal cluster and is located in the NOR on GGA16, we applied serial re-FISH to chicken mitotic chromosomes. Probes for the boundary regions of 5'ETS-18S rRNA and ITS1-5.8S rRNA within the assembled sequence were amplified from genomic DNA by PCR. Their sizes were 196 bp and 170 bp, respectively. The primers were designed based on the assembled rRNA cluster. The identity of the probes to the corresponding regions of the assembled rDNA cluster was validated by standard sequencing. The NOR detection probe was produced by DOP-PCR amplification of the BAC clone WAG137G04, which is known to include a chicken GGA16 fragment comprising the NOR [18].
On chicken mitotic plates, both the spacer and NOR probes hybridized to the same sites on two microchromosomes (Fig. 8). These findings support the reliability of our assembly of the chicken ribosomal cluster.

Conclusion
In this work, we have determined, verified and characterized the complete sequence of the chicken ribosomal cluster (Fig. 9). The coding sequences of the 18S, 5.8S and 28S rRNA genes have structures typical of higher vertebrates. Both ITS1 and ITS2 were found to be longer and to have a higher GC content. As a result, they form a complicated secondary structure that hinders their PCR analysis and, consequently, their use as phylogenetic markers. It also complicates the analysis of chicken ribosomal genes as a whole. It seems more promising to use the relatively less GC-enriched 5'ETS sequence for this purpose. Meanwhile, the ITS1, ITS2 and 3'ETS sequences revealed similarities in GC, CpG and repeated sequence content.
This suggests the existence of a general evolutionary mechanism that maintains constant nucleotide proportions in the spacers of avian rDNA.

Availability of supporting data
The data sets supporting the results of this article are included within the article and its additional files.