A Single Transcriptome of a Green Toad (Bufo viridis) Yields Candidate Genes for Sex Determination and -Differentiation and Non-Anonymous Population Genetic Markers

Large genome size, including immense repetitive and non-coding fractions, still present challenges for capacity, bioinformatics and thus affordability of whole genome sequencing in most amphibians. Here, we test the performance of a single transcriptome to understand whether it can provide a cost-efficient resource for species with large unknown genomes. Using RNA from six different tissues from a single Palearctic green toad (Bufo viridis) specimen and Hiseq2000, we obtained 22,5 Mio reads and publish >100,000 unigene sequences. To evaluate efficacy and quality, we first use this data to identify green toad specific candidate genes, known from other vertebrates for their role in sex determination and differentiation. Of a list of 37 genes, the transcriptome yielded 32 (87%), many of which providing the first such data for this non-model anuran species. However, for many of these genes, only fragments could be retrieved. In order to allow also applications to population genetics, we further used the transcriptome for the targeted development of 21 non-anonymous microsatellites and tested them in genetic families and backcrosses. Eleven markers were specifically developed to be located on the B. viridis sex chromosomes; for eight markers we can indeed demonstrate sex-specific transmission in genetic families. Depending on phylogenetic distance, several markers, which are sex-linked in green toads, show high cross-amplification success across the anuran phylogeny, involving nine systematic anuran families. Our data support the view that single transcriptome sequencing (based on multiple tissues) provides a reliable genomic resource and cost-efficient method for non-model amphibian species with large genome size and, despite limitations, should be considered as long as genome sequencing remains unaffordable for most species.


Rare availability of amphibian genomes
The availability of whole genome sequencing has greatly facilitated evolutionary studies in non-model organisms [1]. Besides comparative research involving entire genomes (e.g. [2]), applications include candidate gene approaches and the development of specific marker sets for population genetics and speciation research. However, for some groups of non-model organisms, like most amphibians with very high DNA content [3], whole genome sequencing still is bioinformatically challenging and costly, and very few complete genomes are available. Due to their large proportion of repetitive sequences, assembly and proper annotation of such genomes remain difficult. Currently only two assembled and annotated amphibian genomes have become available: that of the model anuran, the tropical clawed frog, Xenopus tropicalis (Pipidae) [4], and recently that of the Tibetan frog, Nanorana parkeri (Dicroglossidae) [5].
Transcriptome sequencing as genomic resource for large-genome nonmodel species Genetic and genomic applications often involve candidate gene approaches, in which the role of genes, known from related or model species, is examined in association studies for nonmodel species. In taxa with huge, unknown genomes, where whole genome sequencing still remains unaffordable, a useful resource may be transcriptomics (e.g. [6]). As transcriptomes only include exons and UTRs, sequencing efforts can be restricted to a small portion of the genome, and thus to the exclusion of hard-to-assemble non-coding intronic and intergenic repetitive regions. Amphibian transcriptomic information is expected to be well-annotatable. Given that many restrictions for whole genome approaches include most amphibians, in the present paper, we aimed at testing the usefulness of a single anuran transcriptome for obtaining a list of candidate genes on sexual development and sex determination.
As another application, we aimed at the targeted development of non-anonymous transcriptomic microsatellites [7,8], namely sex-specific and sex-linked markers. They appear less susceptible to phylogenetic noise than SNPs [9] and are applicable to population genetics and to study inheritance patterns [10,11].
As our target species, we have chosen a diploid bufonid toad, the European green toad Bufo viridis. Importantly, no genome or transcriptome of the species or of related taxa from this systematic anuran family have so far been characterized. Palearctic green toads (B. viridis subgroup) form a complex of at least 12 mitochondrial lineages that inhabit large parts of Europe, North Africa and Asia with several diploid clades as well as tri-and tetraploid lineages [12]. Multiple natural hybrid zones of lineages with different divergence time can be found in this group [13,14], making it a highly attractive amphibian radiation to study speciation and sex chromosome evolution under diploid [15] and polyploid hybridization [16]. In this paper, we present 11 markers located on the sex chromosome (with sex-linkage proven for 8 markers) and 10 autosomal markers for the B. viridis subgroup, most of which are transcriptome based and polymorphic. Sequencing of a single transcriptome RNA was extracted from six different tissues (liver, eye, brain, testes, heart, muscle) of a single male Bufo viridis, caught south of Astros, Greece (37.38299 N, 22.7455 E) in March 2011. RNA was pooled and adjusted to equal concentrations. Complementary DNA (cDNA) was synthesized and sequenced by BGI (BGI-Hongkong Co., Ltd.), using the Hiseq2000 sequencing system (Illumina, San Diego, USA). Reads were assembled using SOAPdenovo [17]: contigs (longest assembled sequences without Ns) were linked into scaffolds by mapping reads back to contigs and combining paired-end information. Scaffold length was estimated according to the average segment length of each pair of reads, and unknown bases were filled with Ns. Gaps in scaffolds were filled using paired-end reads, resulting in so-called unigenes. Raw reads were transmitted to the NCBI Sequence Read Archive (SRA, http://www.ncbi.nlm.nih.gov/Traces/ sra/) under the accession Number SRS1034407, assembled sequences were transmitted to NCBI Transcriptome Shotgun Assembly Sequence Database (TSA, http://www.ncbi.nlm.nih. gov/genbank/tsa), BioSample accession for the project is SAMN03993917.

Functional annotation and classification
To predict protein functional information from the annotation of blast hits, unigene sequences were searched against four protein databases (Nr, Swissprot, KEGG, COG) using Blastx [18] with an E-value threshold of E-05. Blast2GO [19] as applied for GO annotation on the basis of the results of the Nr annotation. GO functional classification of all unigenes was performed using WEGO [20].

Genes involved in sex determination or sex differentiation
To evaluate the usefulness of a single, though multiple tissue-based, amphibian transcriptome for candidate gene approaches, we aimed at retrieving genes involved in male or female sex determination and sexual differentiation (list from M. Schartl, pers. comm., modified for anurans). Template sequences from the anuran model species Xenopus tropicalis or X. laevis were retrieved from Xenbase [21]. Sequences were blasted against a local database of all B. viridis unigenes using tblastn [18]. Using NCBI blastn, homology of unigenes from the resulting blast hits was confirmed. Only best blast hits with an E-value lower than E-05 were considered homologous. Blast hits were not considered homologous when there were other blast hits for different genes with E-values less then 10000 times larger.

Development of transcriptome-based microsatellite markers
Despite 150 million years (My) of divergence between bufonid, hylid and ranid anurans, chromosome number and interphase banding patterns suggest a high degree of structural conservation of their genomes [22]. Even after 200 My of separation, homology between large parts of X. tropicalis chromosome 1 (in which it is not the sex chromosome) and the sex chromosome of all examined diploid species of the B. viridis subgroup has been shown [15,23]. Lacking more closely related relatives, the genome of X. tropicalis (and X. laevis) served as reference for B. viridis. Transcriptome data were first searched for microsatellite-like repetitive sequences (SSRs) using Misa (MIcroSAtellite identification tool, URL: http://pgrc.ipk-gatersleben.de/ misa/). Repeats were filtered to exclude short repeats and duplicate entries. Sequences of repeat-containing unigenes were searched against the transcriptome of X. tropicalis [4] using the local alignment software Blast+ 2.2.25 [18] with the options "-evalue = 0.01 andtask = blastn" to identify homologous transcripts in X. tropicalis. The resulting best blast hits (gene name abbreviations) were searched against a mapping file (a list of gene names, synonyms and abbreviations, X. tropicalis start and end position and length in bp) available from Xenbase [21], which allowed assignment to a specific gene and its genomic location in the X. tropicalis-genome. Different markers were then selected over the whole length of X. tropicalis linkage group 1. For comparative studies of sex chromosomes and autosomes, additional markers on several autosomal linkage groups were developed.
Because of the relatively low degree of polymorphisms in transcriptomic repeat sequences [7], different strategies were applied to increase the amount of polymorphic markers among candidates. Using Blast+ [18] with the megablast algorithm, candidate unigene sequences were searched against a database of the raw B. viridis Illumina reads. The sequences of the resulting Blast hits were mapped onto the unigene sequence using Geneious 7.06 (Biomatters). Ambiguities represented by a ratio of about 50%: 50% of raw Illumina reads in the resulting alignment were interpreted as heterozygous alleles in the individual diploid genome, from which the transcriptome was generated. It was expected, that markers, which were polymorphic within the single individual were likely to differ also between lineages. Furthermore, Serv [24] was applied, an algorithm, which uses a nonlinear model to calculate a numeric VARscore from intrinsic sequence properties. The VARscore predicts repeat variability and sequences, which were not already polymorphic within the single transcriptome, were selected for a high VARscore. Intron-exon boundaries of the repeat-containing unigenes were determined using the X. tropicalis genome [4] for these candidate genes using the Ensembl genome browser [25]. As amplification of genomic DNA markers fails if intron lengths and positions are not taken into account, X. tropicalis exons were aligned to the B. viridis unigene sequences using Geneious. PCR-primers for suitable marker candidates were then developed using Primer3 [26]. Primers were tested in 10 μl gradient PCRreactions containing 1 μl B. viridis (or other green toads') DNA (10 ng/μl), 1 μl 10x TopTaq-PCR-Buffer (with 15 mM MgCl 2 ; Qiagen), 1 μl dNTP-Mix (2.5 mM of each dNTP), 0.5 μl forward primer (10 pmol/μl), 0.5 μl reverse primer (10 pmol/μl), 0.05 μl TopTaq polymerase (Qiagen) and 5.95 μl RNAse free water. The PCR protocol reads: initial denaturation at 95°C for 3 min, 35 cycles (including denaturation at 94°C for 30 s, annealing at 48-68°C for 60 s and elongation at 72°C for 40 s), and final elongation at 72°C for 10 min. PCR-products were screened for optimal annealing temperatures using agarose (2%) gel electrophoresis. Markers were tested for polymorphisms by direct sequencing of PCR-products and at least five individuals of four green toad species (B. viridis, B. balearicus, B. siculus and B. variabilis) were genotyped. For sequencing reactions, 25 μl PCR reactions were conducted, containing 2.5 μl DNA (10 ng/μl), 2.5 μl 10x Top-Taq-PCR-Buffer (with 15 mM MgCl 2 ; Qiagen), 2.5 μl dNTP-Mix (2.5 mM of each dNTP), 1.25 μl forward primer (10 pmol/μl), 1.25 μl reverse primer (10 pmol/μl), 0.125 μl TopTaq polymerase (Qiagen) and 14.875 μl RNAse free water. The PCR-protocol comprised initial denaturation at 95°C for 3 min, 35 cycles (including denaturation at 94°C for 30 s, annealing at optimal annealing temperature for 60 s and elongation at 72°C for 40 s), and a final elongation at 72°C for 10 min.
PCR-products were purified and Sanger sequenced by Eurofins MWG (Ebersberg, Germany) using one or both of the original PCR primers for sequencing reactions. Sequences were aligned to the transcriptomic unigene sequences using Geneious and manually checked for indels, i.e. desired length polymorphisms.

Tests for sex linkage
To test whether markers located on X. tropicalis linkage group 1 are really located on the largely homologous B. viridis sex chromosome, sex-specific transmission was analyzed. In an XY-sex determination system, the father's Y chromosome is always transmitted to male, while the X chromosome is always transmitted to female offspring. This was tested by genotyping previously generated genetic families with parents and offspring of known phenotypic sex [27,28,13]. Genotyping of all PCR-products was performed on an ABI 3500xL capillary sequencer (Applied Biosystems) with 0.5 μl diluted PCR-products in 9.25 μl Hi-Di formamide (Applied Biosystems) and 0.25 μl GeneScan 500 ROX size standard (Life Technologies). Genotypes were scored using Genemapper 4.1 (Applied Biosystems). Forward Primers of genotyping reactions were fluorescently labeled.
Cross-amplification was tested in 10 μl gradient PCR-reactions containing 2 μl DNA (10 ng/μl), 1 μl 10x TopTaq-PCR-Buffer (with 15 mM MgCl 2 , Qiagen), 1 μl dNTP-Mix (2.5 mM of each dNTP), 1 μl of 10x CoralLoad Concentrate (Qiagen), 1 μl of 5x Q-Solution (Qiagen), 0.4 μl of MgCl 2 (25mM, Qiagen), 0.5 μl of each primer (10 pmol/μl), and 0.05 μl TopTaq polymerase (Qiagen) and 2.55 μl RNAse free water. We considered markers to be specific when gel electrophoresis showed a clear band in proximity to the fragment size of B. viridis used as positive control on the same gel with 1 kb DNA HyperLadder (Bioline). Double or multiple bands and much larger or smaller bands suggested unspecific amplification. If a band in the right size could be determined among multiple bands the marker was declared as inoperable. To determine exact marker sequences in B. bufo, we used the conditions described above for green toad sequences. Sequences of the markers which could be successfully cross-amplified in B. bufo were aligned to the B. viridis unigene using sequences homologous to X. tropicalis-LG1-genes. Mapping of the sequences was done in Geneious 7.0.6.

Amplification success versus divergence time
To test inverse correlation of amplification success and evolutionary divergence, we took the divergence times of systematic anuran families from www.timetree.org. Amplification success is measured by the proportion of markers that amplified in each species to the total number of tested markers (S3 Table) developed for B. viridis. Maximum amplification success would be 10 out of 10 markers. Because of the proportional nature of the data, we calculated a generalized linear model (GLM) with binomial errors and a logit link (ANOVA, χ2). Statistical analysis was carried out in R version 3.1.0 [29].

Results Transcriptome
The complete transcriptome data comprise 22,421,688 reads with a total length of 2,242,168,800 bp. The GC-content of the reads was 46.83%, 92.33% of all bases had a quality score of at least Q20. Assembly and scaffolding resulted in 110264 unigenes with a total length of 45,245,144 bp. Unigenes had an average length of 411 bp and a N50 of 504 bp. A total of 86707 unigenes ranged in length from 150 to 7508 bp (S1 Fig)

Sequence Annotation
Sequence annotation revealed significant similarities to a large number of proteins from four protein databases (Table 1). Based on blast hits of the Nr database, 10541 unigenes (9.56%) were assigned to gene ontology terms.
Regarding the molecular function ontology, the most common classifications were cellular processes (8156), metabolic processes (6713) and biological regulation. The most common categories for the cellular component ontology were cellular (10437)

Genes involved in sex determination or sex differentiation
Of a list 37 genes, ca. 87% (32) had significant blast hits in the B. viridis transcriptome, many of which providing the first sequence data for these genes in this non-model anuran species.
Compared to the length of Xenopus mRNAs used as BLAST templates, the majority of the B. viridis unigenes is shorter (between 10 and 92%) and likely represents fragments. In contrast, several unigenes are also longer, which may be indicative of insertions, for example by the extension of repetitive regions ( Table 2).

Non-anonymous microsatellite markers and sex linkage
For comparative studies of sex chromosomes and autosomes, markers on several autosomal linkage groups, with emphasis on linkage group 2, were developed. In total, twenty-one microsatellite markers were generated from sequences with repetitive patterns found in the transcriptomic dataset (Fig 1, Table 3). Additionally, marker BvIno80b was generated to amplify an indel present in the transcriptome data. All of these microsatellites were polymorphic among B. viridis, B. balearicus, B. siculus and B. variabilis. Markers BvDMRT1, BvVLDLR and BvCHD1 were intron based indel-targeting markers, equally developed from transcriptome data by sequencing introns with primers matching flanking exons. Markers BvCamk4, BvDMRT1, BvCherp, BvHNRNPD and BvVLDLR show a signature of sex-specific transmission of alleles in the offspring of Si337 (mother, B. siculus) and Cr13 (father, B. siculus x B. balearicus). Sex-specific transmission is not perfect since single male specific alleles can be found in females and also single males without specific alleles are present. The marker BvIno80b showed no sex linkage in B. viridis and was therefore considered to be autosomal (S1 Table). For the sex-linked marker C223, allele 163 was perfectly linked to allele 225 of marker BvEll2, and allele 171 linked to allele 162 of the marker BvGar1 (S2 Table).

Cross-amplification
Cross-amplification trials in representatives from nine systematic anuran families showed variable success for different markers. The number of amplifying markers decreases with increasing evolutionary distance (Fig 2). In B. bufo (23.3 My divergence time to B. viridis) 9/11 markers amplified. In the second bufonid, Rhinella marina (37.3 My), 8/11 markers amplified. The amplification proportion for Dendrobatidae (58.6 My) was at a mean of 4.5/11 markers. For Hylidae and Ranidae (77.9 My) mean amplification success was 5.3/11 markers. In the most distant group consisting of Bombinatoridae and Pipidae (223.2 My) the mean amplification proportion was 3.5/11 and in Pelobates fuscus (Pelobatidae, 196.1 My) 2/11 markers amplified successfully.

Discussion
The present study aimed at testing whether a single transcriptome of a non-model anuran species may present a reasonable resource for amphibians, where whole genome sequencing remains unaffordable due to large genome size. We have first tested this by retrieving a list of candidate genes, known for their role in sex determination and sex differentiation, and second by the targeted development of non-anonymous microsatellite markers. In terms of completeness, i.e. the presence of at least a fragment of a certain gene, our approach has provided reasonably sufficient data, as we could retrieve 87% of the desired genes. However, and somewhat expected, we face two problems regarding this list: i) deep divergences (ca. 200 My) to the model species may generally lower the blast hit success due to strong sequence divergence, ii) some expected sequences are either completely missing from the transcriptome or have become only available in small portions. For the development of non-anonymous microsatellite markers, the single transcriptome has been useful. Since these markers are exonic (or UTR-linked), their genomic position can be mapped to the few available and only remotely related reference genomes. In this way, even in non-model species, genomic positions can be inferred based on structural conservation (synteny). Classical short polymorphic tandem-repeated DNA sequences (microsatellites) stem mostly from the non-coding, i.e. >98% of vertebrate genomes [30]. While these multiallelic For markers written in bold, sex linkage was confirmed in green toads (main text for details). Marker names include the gene name abbreviation according to Xenbase. Linkage group size is relative to the proportion of the respective scaffold represented in the X. tropicalis genome v. 7.1 [4]. In green toads, "(Ino80b)" was found to be unlinked to other scaffold 1-homologous markers. markers can be easily genotyped, their genomic location usually remains unknown ("anonymous"). Thus, cross-amplification and consequently applicability to diverged lineages and species are often limited. Since exonic regions show a larger degree of sequence conservation, transcriptome-based microsatellites can ensure cross-amplification better than classical noncoding markers [31,8]. Targeted development of green toad markers homologs on X. tropicalis LG 1, largely homologous to the B. viridis sex chromosome, has been successfully accomplished [23,15]. For a subset of these markers, sex linkage could be demonstrated in genetic families: 7 out of 8 markers show expected sex-specific transmission in B. viridis (S1 and S2 Tables). Sexspecific transmission is not perfect in the genetic family Si337xCr13 (S1 Table), likely due to meiotic recombination along the B. viridis sex chromosomes, which has been shown in previous studies [28,23]. Sex linkage of the 7 B. viridis microsatellites corroborates evidence for conserved synteny across the anuran LG 1 for Palearctic green toads. Only marker BvIno80b, located at the distal tip of X. tropicalis LG1 (Fig 1), shows no sex linkage, presumably a result of genome rearrangements.
The new transcriptome-based microsatellite markers also show a high cross-amplification success in several representatives across the anuran phylogeny. In other bufonids, Bufo bufo and R. marina, most markers could be successfully genotyped. Even in Xenopus tropicalis, a species with more than 200 My of divergence to B. viridis, three markers have still amplified successfully (S3 Table). Cross-amplification success inversely correlates with evolutionary distance [8]. Similar results have been shown for anonymous microsatellites in other vertebrate groups such as fish [32], crocodiles [33], birds [34,35] as well as in other amphibians [36,37]. Cross-amplification success among tree frogs declines by 50% after about 40 My [8]. The 'halflife' of our marker set tested for cross-amplification is about 120 My (predicted from regression). Despite the relatively small number of data points to test this relationship, our results suggest that cross amplification trials for transcriptome-based PCR products may be a convenient alternative for developing nuclear markers in anurans.

Conclusions
Our data support the view that single, namely multiple-tissue based transcriptome sequencing is a reliable genomic resource and cost-efficient method for non-model amphibian species with large genome size and despite limitations, should be considered as long as genome sequencing remains unaffordable for most species.  Table. Markers developed for B. viridis that cross-amplified in the tested anuran species. In all ten examined species at least some of the markers showed bands in the agarose gel. Gradient PCRs were done with all markers to determine the optimal annealing temperature (T A ) and the T A range, in which amplification is possible. The approximate fragment length is estimated from the gel pictures. (DOCX)