The Embryonic Transcriptome of the Red-Eared Slider Turtle (Trachemys scripta)

The bony shell of the turtle is an evolutionary novelty not found in any other group of animals, however, research into its formation has suggested that it has evolved through modification of conserved developmental mechanisms. Although these mechanisms have been extensively characterized in model organisms, the tools for characterizing them in non-model organisms such as turtles have been limited by a lack of genomic resources. We have used a next generation sequencing approach to generate and assemble a transcriptome from stage 14 and 17 Trachemys scripta embryos, stages during which important events in shell development are known to take place. The transcriptome consists of 231,876 sequences with an N50 of 1,166 bp. GO terms and EC codes were assigned to the 61,643 unique predicted proteins identified in the transcriptome sequences. All major GO categories and metabolic pathways are represented in the transcriptome. Transcriptome sequences were used to amplify several cDNA fragments designed for use as RNA in situ probes. One of these, BMP5, was hybridized to a T. scripta embryo and exhibits both conserved and novel expression patterns. The transcriptome sequences should be of broad use for understanding the evolution and development of the turtle shell and for annotating any future T. scripta genome sequences.


Introduction
Over the past thirty years, the mechanisms that underlie the fundamental processes of animal development have been identified and characterized at a molecular level in a select group of model organisms. Although the field of embryology traditionally investigated a diverse range of organisms the full power of developmental genetics has been brought to bear on developmental questions in only a few animal model systems [1][2][3]. Elucidation of the mechanisms underlying the development of morphological structures which are not found in model systems has, until recently, been limited by a lack of genetic and genomic resources in non-model systems.
The turtle shell is an evolutionary novelty restricted to the order Chelonia that first appears in the fossil record 210MYA [4,5]. The bony shell consists of the dorsal carapace and the ventral plastron. Each consists of a set of fused bones, some of which exist in other organisms and some of which are unique to turtles [6]. Understanding the evolution of the turtle shell involves answering fundamental questions about how new morphological structures develop. Did the evolution of the turtle shell require the innovation of new developmental programs or were existing programs modified in the Chelonians? If existing developmental programs were modified, which programs were recruited and how were they altered?
Work on shell formation in the red-eared slider turtle (Trachemys scripta) over the past decade suggests that the evolution of the turtle shell involved the co-option of highly conserved vertebrate developmental programs. The formation of the carapace represents a unique variation on vertebrate rib growth, coupled with existing programs of dermal ossification. The plastron originates in a different manner, as it appears to be derived from a late migrating population of neural crest cells, suggesting a similar origin for the plastron and facial bones [6].
The carapace is initiated by a bulge of mesodermal and ectodermal cells in the skin known as the carapacial ridge (CR). This turtle-specific structure is first seen on the flanks of the stage 15 embryo between the limbs [7,8]. Instead of curling ventrally around the thorax as is the case in other vertebrates, turtle rib precursor cells grow straight into the CR resulting in the lateral extension of the shell. Several genes with described functions in mesenchyme/epithelial interactions are expressed in the CR. This observation suggests that the CR forms similarly to limb buds [6,9]. Included in this set of genes are those encoding paracrine factors of the fibroblast growth factor (FGF), bone morphogenetic protein (BMP), and Wnt families. These are relatively small secreted proteins with demonstrated roles in developmental signaling in a wide range of organisms [6,[9][10][11].
Several lines of evidence suggest that signals from the CR are involved in the guidance of ribs into the CR of hard-shelled turtles. Local removal of the CR causes the ribs to enter adjacent regions of the CR [12], and the placement of tantalum foil between the developing ribs and the CR causes the ribs to migrate ventrally, as they do in most vertebrates [13]. The signal directing rib migration appears to be a FGF. Application of FGF inhibitors results in CR degradation and ventral rib migration suggesting an inductive role for FGFs in the CR. The application of FGF10 beads to developing chicken embryos resulted in altered rib guidance demonstrating that this process can be influenced by FGF signaling. Finally, the unusual expression of FGF8 at the tips of T. scripta ribs suggests a positive feedback loop between rib expressed FGF8 and CR expressed FGF10, an interaction involved in limb bud outgrowth in other species [14]. These results suggest that rib guidance in turtles relies on modifications of highly conserved FGF signaling pathways.
Similarly, ossification of the dermis between the flattened ribs forms the costal bones of the carapace and likewise appears to be mediated by well described genetic networks acting outside of their canonical vertebrate developmental compartments. The bone morphogenetic proteins (BMPs) are small secreted paracrine factors with demonstrated functions in ossification in model systems. BMPs are known to be secreted from the ribs during endochondral ossification [15]. The phosphorylation of Smad1 is a downstream event in BMP signaling. Smad1 phosphorylation in the dermis surrounding the ribs showed that BMP signaling is likely involved in turtle costal bone ossification and suggests that the ribs may be the source of these ossifying BMPs [14]. Confirmation of this hypothesis will require the development of in situ probes that distinguish between the various T. scripta BMPs.
The bones of the plastron are connected by sutures reminiscent of those that connect the facial bones of vertebrates. They appear to have their origin in a group of late migrating neural crest cells which can traced back to the neural tube at stages 16-17 [6,16]. The cells that produce the bones of the plastron express several molecular markers characteristic of neural crest identity including HNK-1, PDGFR-a, p75, and FoxD3 [17,18]. Given the similar morphology of the bones and the common developmental derivation of the cells that produce these bones, homology between the plastron bones and vertebrate facial bones has been suggested [6]. The identification of the source of the cells that make up the plastron, while clarifying some questions, raises many more questions that are dependent on the development of T. scripta molecular markers. Gilbert et al. (2007) suggest that the skeletogenic activity of these cells may depend on the down-regulation of Hox genes. As is true for the BMP genes, the ability to determine Hox gene expression patterns in T. scripta is limited by the lack of T. scripta gene sequences needed to make specific RNA probes and the potential for cross-reactivity when using antibodies generated in other species.
In addition, there are several other developmental alterations in the turtle-the origin of the new musculature in the neck and around the lungs, the repositioning of the appendicular skeleton within the ribs, and the lack of a general senescence syndromethat have not yet been investigated on a molecular level. There are limited genetic resources available for the study of turtles. Three turtle genomes (Chrysemys picta, Pelodiscus sinensis, and Chelonia mydas) have recently been published, although to date there is no published T. scripta genome [19][20][21]. A recent T. scripta brain transcriptome was used to support a phylogenetic grouping of turtles with the Archosaurs and significantly expanded the number of transcript sequences available for this species [22]. However, since the transcriptome was made from the brain of an adult turtle it is unlikely to contain many of the genes involved in embryonic development, many of which are expressed transiently. Genetic studies in Chelonians are difficult because turtles lay few eggs (which are available only during the breeding season) and take several years to become sexually mature. Developmental genetic studies done to date have used either antibodies from other organisms or relied on degenerate probes designed by comparing sequences from other organisms in the gene databases. In order to address the limited number of molecular markers available for working on T. scripta development we generated a turtle embryonic transcriptome using Illumina next generation sequencing. We used stage 14 and stage 17 embryos, an active period of induction and organogenesis, in order to ensure that genes involved in rib guidance, ossification of the carapace dermis, and early events in plastron formation would be captured in our data set. In this paper we describe the assembly and analysis of this transcriptome and identify several genes that should be useful markers for deepening our understanding of how the turtle makes its shell.

RNA Isolation, RNAseq Library Generation, and Next Generation Sequencing
Total RNA was isolated from stage 14 and stage 17 T. scripta embryos (Kleibert Alligator and Turtle Farm, Hammond LA) using TRI reagent (Sigma) according the manufacturer's recommended protocol. RNA was quantified using a Nanodrop-2000 (Thermo Scientific) and equal amounts of RNA from each stage were combined to generate a pooled RNA sample. Two mg of the pooled total RNA sample was used to construct an Illumina sequencing library using an Illumina's TruSeq RNA sample preparation kit (#RS-930-2001). Briefly, poly-A containing mRNA was purified from total RNA, the poly-A RNA was fragmented, double-stranded cDNA was generated from the fragmented RNA, and Illumina sequencing adapters were ligated to the ends of the fragments. The quality of the final purified library was evaluated using a BioAnalyzer 2100 automated electrophoresis system and quantified with a Qubit flourometer (Invitrogen). The library was sequenced in one 100 bp single end lane on a HiSeq 2000 (Illumina).
For all contigs longer than 250 bp the open reading frames most likely to encode proteins were identified using the transcript-s_to_best_scoring_ORFs.pl script distributed with the 2011-10-29 release of Trinity. The 20 best BLASTP matches for each predicted protein in the NCBI nr database (downloaded 2011-10-04) were identified using a local installation of Blast2 [24]. The Blast2 output was used as the input for Blast2GO [25] to assign gene ontology and IEC enzyme codes to proteins, to map enzyme code assignments onto KEGG maps, and to identify the organismal distribution of the best Blast2 hits.

Accession Numbers
The RNA-seq sequences have been deposited in the NCBI Sequence Read Archive as accession SRX121294 and the assembled transcripts are accessible in Genbank with accession numbers JW269948-JW501823.

Identification of Likely Homologs
Gallus gallus genes were identified in the NCBI protein database and used as BLAST queries to identify putative homologs in the T. scripta transcriptome. Homologs from zebrafish, humans, frogs, and the anole lizard were also identified when possible. These protein sequences were aligned using the Muscle algorithm [26] implemented in MEGA5 [27]. Excessively gapped positions were removed using trimAI and were used to build maximum likelihood phylogenetic trees using MetaPIGA version 3.1 [28]. Probability consensus pruning was performed using MetaPIGA default settings with the exception of using the General Time-Reversible (GTR) model for amino acid substitutions.

RT-PCR
RT-PCR was performed using a cDNA pool generated from RNA isolated from a stage 17 T. scripta embryo. Genes were amplified from the cDNA pool using Taq polymerase (NEB) for 35 cycles with a 60uC annealing temperature and a 1 minute extension time. Primers for each gene (Table 1) were designed to generate a 500-650 bp PCR product and have 65uC annealing temperatures using Primer3 [29].

Results
Total RNA from stage 14 and stage 17 [7] Trachemys scripta embryos was prepared separately, pooled and used to generate 188,674,651 single 100 bp sequences using an Illumina HiSeq 2000. These sequences were assembled without a reference genome using the Trinity package [23] which is capable of assembling and reporting allelic variation and alternatively spliced transcripts. Trinity produced 465,923 contigs with lengths over 150 bp. In these sequences 50% of the total sequence length was contained in the 61,333 sequences longer than 757 bp. Over half of the contigs were shorter than 250 bp and most of these short sequences did not code for proteins. We decided to remove all contigs smaller than 250 bp to simplify our analysis. This left 231,876 sequences with 50% of the total sequence length contained in 37,485 sequences longer than 1166 bp. A comparison of our assembly with ten T. scripta developmental genes that had already been deposited in Genbank showed that the embryonic transcriptome assembly covered 98% of the existing sequences and was 99% identical to them ( Table 2). Eight out of ten sequences had fewer than three differences between the existing and new sequences and four were identical. The length of the sequences in our assembly was longer than the existing sequences in every case. Assuming that the existing sequences are of high quality, these results suggest that not only is our assembly of high quality but that it also contains more complete contigs than existing Genbank sequences.
The existing T. scripta brain transcriptome is enriched for genes involved in nervous system function [22]. To investigate if the embryonic transcriptome is relatively enriched for genes involved in embryonic development we compared the same ten genes to the brain transcriptome sequences. Only two of these developmental genes (En1 with a 235/717 bp match and Sox9 with a 290/340 bp match) are represented in the brain transcriptome. Both are shorter sequences than the corresponding embryonic transcriptome sequences. The other eight sequences are not present. Comparing the two transcriptomes, 88% of all the sequences in the brain transcriptome are found in the embryonic transcriptome (with an average of 99% sequence identity and 93% coverage). Conversely, only 22% of the embryonic transcriptome sequences are found in the brain transcriptome (with an average of 99% sequence identity and 28% coverage). The larger embryonic transcriptome thus substantially increases the number of reported T. scripta transcript sequences and complements the existing brain transcriptome.
67,692 likely protein sequences were identified in the embryonic transcripts with an N 50 length of 394aa. We screened these protein sequences for duplicates and identified 6,049 duplicated protein sequences resulting in 61,643 unique protein sequences. Because we sequenced RNA from multiple embryos several alleles of each gene could potentially be present in the transcriptome. Since each protein was identified from a unique assembled transcript sequence these duplicates most likely represent synonymous allelic differences or sequence variation in non-coding regions. We used Blast2GO [25] to assign gene ontology (GO) terms and Enzyme Commission (EC) numbers to each predicted protein sequence. Blast2Go analysis was based on the results of a BLASTP search of each sequence against the Genbank non-redundant (nr) protein database. Recent phylogenetic analyses have placed turtles either close to Archosaurians (crocodilians+birds) or Lepidosaurians (lizards) in the tree of life [22,[31][32][33]. One prediction about our assembly is that the protein sequences should be most similar to one of these groups of organisms. The three species with the largest absolute number of top BLASTP hits are the Chicken (Gallus gallus), followed by the Carolina Anole Lizard (Anolis carolensis) and the Zebra Finch (Taeniopygio guttata). Since none of these species are model systems and thus are not especially well represented in the nr database, we normalized the number of hits to the number of proteins for each species in the NCBI protein database. Using this metric, T. scripta protein sequences are most similar to Wild Turkey (Meleagris gallopavo silvestris) sequences, closely followed by the Carolina Anole Lizard. If all three bird species are combined, however, T. scripta proteins are most similar to the Anole lizard, followed by the birds (Table 3).
Determining the completeness of a transcriptome in a new species is difficult because of a lack of reference genomic sequences. One prediction about a relatively complete transcriptome is that all of the major GO categories should be well represented. We assigned cellular component (CC), molecular function (MF), and biological process (BP) GO terms to each protein in the transcriptome. CC terms describe the predicted cellular location of a protein, MF terms describe the predicted function of each protein, and BP terms describe the biological pathways that proteins are predicted to participate in. All major cellular compartments, molecular functions, and biological processes are well represented in our transcriptome. Biological process annotations include 7,564 and 7,200 proteins annotated with cell communication and multicellular organism development functions, respectively (Table S1).
Another prediction about a complete transcriptome is that the enzymes that make up core metabolic pathways such as the TCA cycle should be well represented as the genes encoding these enzymes are expressed in all cells throughout development. We used Blast2Go to map each predicted protein onto the KEGG pathway database [34] which includes the TCA cycle as well as other core metabolic pathways. All of the enzymes required for the TCA cycle are represented in our transcriptome including, for example, both ADP and GDP forming Succinate CoA ligases (Table 4).
In order for the sequences in our transcriptome to serve as a useful resource for turtle developmental biologists they must enable the identification of homologues in other organisms and the generation of in situ probes. To demonstrate that our transcrip- tome can be used to identify homologs of developmentally important genes we queried the transcriptome with developmental protein sequences from several species (chicken, zebrafish, humans, frogs, and the anole lizard when possible). Several of the genes we were interested in identifying (e.g., BMPs and FGFs) are members of gene families. For genes in these families, we identified multiple transcripts for each query. To determine the placement of each transcript within the gene family we constructed phylogenetic trees based on protein sequence similarity of all of the gene family members we identified. In most cases, it was possible to determine which family member each turtle transcript was most similar to, and in most cases the T. scripta transcriptome contains complete or nearly complete coverage of all members of each gene family. As an example, one of the gene families we investigated was the BMP family which has been implicated in ossification of the carapace. We used BMP2-7 sequences from a range of vertebrates to query the transcriptome. In each case we identified a single T. scripta gene which clusters with family members from other species (Fig. 1).
To investigate if the transcriptome sequences could be used to amplify probes for use in in situ experiments we selected nine developmental genes, Gremlin, HoxA7, BMP4, BMP5, SOX2, RUNX1, FGFR1, SMAD3, and FGF2 (accession numbers JW357402,JW364078, JW321551, JW444478, JW460170, JW373558, JW459374, JW388739, and JW429145) and designed PCR primers to amplify each from a stage 17 cDNA pool. Using standard PCR conditions all of the genes apart from RUNX1 amplified and each produced a single dominant product except for  FGF2 which produced two bands (Figure 2). It is possible that the RUNX1 primers did not amplify a fragment because it is not expressed at stage 17. The amplification of a single dominant product in seven out of nine targets on the first try (a 77% success rate) is much more efficient than degenerate PCR approaches for probe production which often require extensive optimization.
Finally, a BMP5 probe was designed based on the predicted T. scripta sequence and used as an in situ probe on a stage 15 embryo. BMP5 expression is associated with the developing vertebrae in chicks and mice, and it is important in determining the curvature of the rib [35][36][37][38]. In addition to this conserved expression pattern in the vertebrae, turtle BMP5 is also expressed in the apical ectodermal ridges of the embryonic limb buds and in the margin mesoderm surrounding them (Fig. 3). This limb bud expression has not been reported in chicks or mice [39][40][41], suggesting an additional developmental role for this conserved gene in turtles.

Discussion
Understanding T. scripta development including the development of the plastron and carapace has been limited by a lack of genomic resources. Few sequences important for the study of embryonically expressed developmental genes were available before this study. We have used a next generation sequencing approach to assemble a high quality T. scripta transcriptome without a reference genome. These sequences were assigned putative functional annotations based on the predicted translation products. GO categories include all core cellular and molecular processes suggesting that the transcriptome is relatively complete for these functions. Classes of genes which are not expressed during the developmental stages we sampled would not be represented in this transcriptome.
We demonstrated that the sequences generated in this study can be used to design PCR primers with which we can amplify important developmental genes. This resource enables the design of in situ probes without resorting to degenerate PCR. We have used these sequences to design a BMP5 probe. The probe detects BMP5 expression both in expected locations in T. scripta embryos (vertebrae), but also in an unexpected location (the anterior limb buds). Further study of these expression patterns may shed light not only on shell development but also on other unique and previously undescribed mechanisms of turtle development.
The placement of turtles in the tree of life is controversial. Different data sets and methodologies, even from the same authors, result in different placements. Turtles have been grouped both with the lizards (Lepidosaurs) and with birds and crocodiles (Archosaurs), generally depending on whether morphological or molecular characters, respectively, were analyzed [22,[31][32][33]. A simple analysis of our transcriptome sequences shows that they are very similar to both lizard and bird sequences, consistent with either grouping. Given the limitations of both our transcriptome (it samples a limited set of developmental stages) and bird and lizard sequences, neither of which are 'complete', it seems unlikely that a more sophisticated analysis performed using our data will resolve this ongoing controversy.
We hope that this transcriptome provides a valuable resource for the T. scripta community both for developmental studies as well as for genome annotation in the future and is of use to other biologists interested in comparative genomics.

Supporting Information
Table S1 Cellular component (CC), molecular function (MF), and biological process (BP) GO categories assigned to proteins identified in the T. scripta embryonic transcriptome. (XLSX)