The authors have declared that no competing interests exist.
Conceived and designed the experiments: HH SR WRJ DC. Performed the experiments: HH SR JP CDS CN. Analyzed the data: HH SR. Contributed reagents/materials/analysis tools: WRJ. Wrote the paper: HH SR WRJ DC.
Neither genomic nor transcriptomic data are currently available for
In the context of a paucity of sequence information, understanding the evolutionary history of
Here we have sequenced cDNA libraries from several different developmental stages of
Animals were treated according to French and European regulations for the handling of animals in research. SR's authorization for the use of animals in research is number 91–116. Laboratory studies used exclusively embryos and early larvae from aquatic, non-mammalian vertebrates and therefore did not require special authorization. Field sampling was conducted under Mexican Permit Number 040396-213-03 granted to W. R. Jeffery. Fish were caught using nets. A small (4 mm²) tissue sample was excised from the caudal fin and stored in 100% ethanol before the fish were released at the point of capture. All efforts were made to minimize suffering.
Fish embryos and larvae were anaesthetized with MS222 (Sigma), immediately immersed in Trizol (Invitrogen), and frozen at −80°C. Fifty to 200 embryos/larvae originating from several independent spawns were pooled for each developmental stage of the two morphs. RNA extraction was performed using Trizol following the manufacturer’s instructions.
Eight libraries were constructed in a pCMV-SPORT6 derivative (polylinker region modified to include SfiI sites for directional cloning). This vector includes a CMV promoter for expression and a T7 promoter for antisense probe production. RNA was reverse-transcribed with Mint reverse transcriptase (MMLV-based, Evrogen), and cDNA was ligated into the pCMV-SPORT6 vector by LGC Genomics (Berlin). The Mint Universal cDNA Synthesis Kit and the Trimmer Normalization Kit (both from Evrogen) were used. The 8 ligation products corresponding to the 8 libraries (2 normalized, 6 non-normalized) were transformed into phage T1-resistant E. coli DH10B bacteria at the Genoscope (Evry, France). Clones were arrayed onto 384-well plates and sequenced using Sanger technology.
A total of 198,380 Sanger ESTs (expressed sequence tags) were obtained from the sequencing of the 8 libraries. The mean read length was 1,364 bp. 44 additional
Sanger sequences were cleaned with Seqclean using (i) the pCMV_sport6 vector sequence and (ii) contaminant sequences from yeast, E. coli 536, and the GenBank phage division. Low-quality sequence at the read extremities and very short sequences were then removed with Prinseq
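The trimming and length filtering described above can be illustrated with a minimal Python sketch. It assumes per-base Phred quality scores are available for each read; the cut-offs `MIN_PHRED` and `MIN_LENGTH` are hypothetical values chosen for illustration, not the Prinseq parameters used in this study, and the functions are not part of Seqclean or Prinseq.

```python
from typing import Iterable, Iterator

# Hypothetical cut-offs for illustration only; the actual Prinseq settings are not given here.
MIN_PHRED = 20
MIN_LENGTH = 100


def trim_low_quality_ends(seq: str, quals: list[int], min_q: int = MIN_PHRED) -> tuple[str, list[int]]:
    """Remove bases with Phred quality below `min_q` from both read extremities."""
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def clean_reads(
    reads: Iterable[tuple[str, str, list[int]]],
    min_q: int = MIN_PHRED,
    min_len: int = MIN_LENGTH,
) -> Iterator[tuple[str, str, list[int]]]:
    """Yield (name, trimmed_seq, trimmed_quals), dropping reads shorter than `min_len` after trimming."""
    for name, seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_low_quality_ends(seq, quals, min_q)
        if len(trimmed_seq) >= min_len:
            yield name, trimmed_seq, trimmed_quals
```

Trimming from both ends before applying the length filter means that a read whose good-quality core is too short is discarded, which matches the order of operations described in the text.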
Assembly of Sanger sequences was carried out using TGICL software
These contigs were annotated with the Biotoul platform pipeline, first by performing BLAST searches against the following databases: (i) reference databases: UniProtKB, RefSeq Protein and RNA, Pfam; (ii) TIGR fish databases; (iii) UniGene fish species; (iv) Ensembl fish transcripts (a detailed list of databases and versions is given in
The repeat sequences were detected by RepeatMasker
Non-singlet contigs were blasted against the zebrafish proteome (Zv9 assembly, downloaded from EnsEMBL
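A minimal sketch of how BLAST results could be reduced to one zebrafish protein per contig, assuming the searches were run with BLAST+ tabular output (`-outfmt 6`, default 12 columns). Keeping the single top-scoring hit under an e-value cut-off is an assumption made for illustration; `MAX_EVALUE` and `best_hits` are hypothetical names, and the selection criteria actually used in the study are not stated here.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

MAX_EVALUE = 1e-10  # hypothetical cut-off, not taken from the study


def best_hits(tabular_text: str, max_evalue: float = MAX_EVALUE) -> dict[str, tuple[str, float]]:
    """Return {contig_id: (protein_id, bitscore)}, keeping the top-scoring hit per contig."""
    best: dict[str, tuple[str, float]] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        evalue, bitscore = float(hit["evalue"]), float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak to support an orthology assignment
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore)
    return best
```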
The first threshold was set on the assumption that the error rate depended mainly on the Mint reverse transcriptase, which is reported to make one error every 30,000 nucleotides
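To give a sense of scale for this error rate, the sketch below computes the binomial probability that at least k of the reads covering a position carry a reverse-transcription error, assuming errors occur independently across reads at the stated rate of 1/30,000 per nucleotide. This is an illustration, not the study's threshold calculation; because it ignores which base the error produces, it overestimates the chance that errors mimic a consistent second allele.

```python
from math import comb

# Per-nucleotide error rate reported for the Mint reverse transcriptase (from the text).
RT_ERROR_RATE = 1 / 30_000


def prob_at_least_k_errors(depth: int, k: int, error_rate: float = RT_ERROR_RATE) -> float:
    """Binomial P(X >= k) for X errors among `depth` reads at one position.

    Assumes errors are independent across reads (a simplifying assumption).
    """
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )
```

At a depth of 10, for instance, two independent errors at the same position are already several orders of magnitude less likely than a single one.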
Regarding the second threshold, we estimated that at least 10 individuals of each morph had been involved in the breeding that gave rise to the sampled embryos. Under the most stringent assumption (only 10 diploid individuals), at most 20 alleles would be present in the sampled embryos, so a genuine allele could not have a frequency lower than 1/20 = 0.05.
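The frequency argument above amounts to a simple calculation, 1/(2N) for N diploid individuals. The sketch below applies it as a filter on the bases observed at one position, under the assumption (made here for illustration) that alleles observed below this frequency are treated as sequencing or reverse-transcription errors; `retained_alleles` is a hypothetical helper.

```python
from collections import Counter


def min_allele_frequency(n_individuals: int, ploidy: int = 2) -> float:
    """Smallest non-zero allele frequency possible when n individuals contribute."""
    return 1 / (n_individuals * ploidy)


def retained_alleles(bases: str, n_individuals: int = 10) -> set[str]:
    """Keep alleles whose observed frequency reaches 1/(2N); rarer ones are treated as errors."""
    counts = Counter(bases)
    total = sum(counts.values())
    threshold = min_allele_frequency(n_individuals)
    return {allele for allele, c in counts.items() if c / total >= threshold}
```

With 10 individuals the threshold is 0.05, so a base seen once among 41 reads (about 0.024) would be discarded.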
The polymorphisms were then sorted into different classes: (1) "shared polymorphism" for positions at which both cavefish and surface fish sequences were polymorphic, with the same alleles; (2) "divergent polymorphism" for positions where both cavefish and surface fish sequences were polymorphic, but with different alleles; (3) "polymorphism in one morph only" when either cavefish or surface fish was polymorphic and the depth was equal to or higher than 4 in the other morph; (4) "polymorphism in one morph, unknown status for the other morph" at positions where the apparently non-polymorphic morph had insufficient depth.
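The four classes can be expressed as a small decision function. This is a sketch under stated assumptions: allele sets per morph are taken as already filtered by the two thresholds above, and any difference between the two allele sets (including partial overlap) is counted as a "divergent polymorphism", which the text does not specify. `classify_position` is a hypothetical name.

```python
from typing import Optional

MIN_DEPTH = 4  # depth required to call the other morph non-polymorphic (from the text)


def classify_position(cave_alleles: set[str], surface_alleles: set[str],
                      cave_depth: int, surface_depth: int,
                      min_depth: int = MIN_DEPTH) -> Optional[str]:
    """Assign a position to one of the four polymorphism classes, or None if monomorphic."""
    cave_poly = len(cave_alleles) > 1
    surface_poly = len(surface_alleles) > 1
    if cave_poly and surface_poly:
        # Assumption: any difference between the allele sets counts as "divergent".
        if cave_alleles == surface_alleles:
            return "shared polymorphism"
        return "divergent polymorphism"
    if cave_poly or surface_poly:
        # Depth of the morph that looks monomorphic decides whether that call is trusted.
        other_depth = surface_depth if cave_poly else cave_depth
        if other_depth >= min_depth:
            return "polymorphism in one morph only"
        return "polymorphism in one morph, unknown status for the other morph"
    return None  # monomorphic in both morphs (a candidate fixed difference instead)
```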
Fixed differences between the two morphs were also analyzed at positions where the depth in each morph was at least 4, all cavefish sequences shared the same allele, and all surface fish sequences shared another allele. Cavefish and surface fish transcripts were then translated into proteins and aligned with the corresponding zebrafish protein, which allowed the non-coding regions of the contigs to be eliminated. The coding regions of the translated surface fish and cavefish contigs were then compared to identify non-synonymous substitutions. The amino acid substitutions between the two morphs were oriented using
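The fixed-difference criterion and the synonymous/non-synonymous distinction can be sketched as follows. The study compared translated contigs aligned to zebrafish proteins; this simplified version instead compares aligned codons directly using the standard genetic code, and flags changes in which one codon is a stop codon separately (as for the premature stop codon reported in the results). Function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... by first, second, third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

MIN_DEPTH = 4  # minimum depth in each morph (from the text)


def is_fixed_difference(cave_alleles: set[str], surface_alleles: set[str],
                        cave_depth: int, surface_depth: int,
                        min_depth: int = MIN_DEPTH) -> bool:
    """True if each morph is monomorphic with sufficient depth and the two alleles differ."""
    return (cave_depth >= min_depth and surface_depth >= min_depth
            and len(cave_alleles) == 1 and len(surface_alleles) == 1
            and cave_alleles != surface_alleles)


def classify_codon_change(surface_codon: str, cave_codon: str) -> str:
    """Classify a codon difference as synonymous, non-synonymous, or involving a stop codon."""
    surface_aa = CODON_TABLE[surface_codon.upper().replace("U", "T")]
    cave_aa = CODON_TABLE[cave_codon.upper().replace("U", "T")]
    if surface_aa == cave_aa:
        return "synonymous"
    if "*" in (surface_aa, cave_aa):
        return "nonsense"  # one morph carries a stop codon, e.g. a premature stop
    return "non-synonymous"
```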
The same approach was applied to detect population-specific indels, but no such indels were found in the contig coding sequences aligned to zebrafish proteins.
Orthology relationships were verified for all the cited potential candidate genes using Neighbor Joining phylogenetic analysis with Mega5
Non-singlet contigs used in the polymorphism analysis were annotated with gene ontology terms (GO terms) using EnsEMBL BioMart, and contigs with substitutions modifying the protein sequence were analyzed for GO term enrichment using the conditional hypergeometric test of the GOstats R package
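The core of a GO term enrichment test is the upper tail of the hypergeometric distribution: given N annotated genes of which K carry a term, how likely is it to find at least k carriers among the n genes with substitutions? The sketch below computes this plain, unconditional p-value with the Python standard library only. It is not the GOstats implementation; the conditional test used in the study additionally accounts for the structure of the GO graph, which this sketch does not do.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K term carriers, n genes drawn)."""
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)
```

For example, `hypergeom_enrichment_p(k=5, n=5, K=5, N=10)` gives 1/252 (about 0.004): drawing all five carriers in a sample of five is unlikely by chance.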
Srd5a PCR was performed using a first set of primers allowing the amplification of exons 4 and 5 (Fw 5′
Our aim was twofold: (1) to generate cDNA libraries from biologically relevant developmental stages of surface fish and cavefish as a clone resource, and (2) to generate transcriptome data for analyzing the genetic basis of cavefish evolution. We therefore extracted total RNA from 4 different stages of surface fish and Pachón cavefish embryos and larvae, chosen according to the
Fifty to 200 embryos/larvae originating from several independent spawns were pooled for each developmental stage and each morph, to ensure that the libraries were representative of the genetic diversity in the two
Eight cDNA libraries were generated, 6 non-normalized and 2 normalized (
A: Composition of the 8 Astyanax developmental cDNA libraries. Biological process (B) and molecular function (C) gene ontology pie charts of the 17,152 contigs annotated for GO term.
Approximately 19,000 clones of each non-normalized library, as well as 43,000 clones of each normalized library were sequenced by the Sanger method (
After removal of vector, poly(A) sequences, and poorly sequenced regions, the resulting ESTs had a mean length of 624 bp. A total of 189,933 ESTs from all libraries were used to build 44,145 contigs. The mean contig length is 985 bp, and the mean depth is 6.8. The contigs were annotated by BLAST analysis against several databases (see
In addition, 17,152 contigs were annotated with GO terms, and these annotations span a wide range of biological processes and molecular functions, suggesting that the contigs are representative of the
The
To further exploit the
A: Number of polymorphic positions in the nucleotide sequences of the two Astyanax morphs. B: Number of fixed nucleotide differences, shared polymorphisms and divergent polymorphisms.
As expected, because non-normalized libraries retain the natural abundance of transcripts and thus provide the read depth needed to call polymorphisms, polymorphic contigs were built with a relatively high proportion of ESTs from non-normalized libraries: 65.8% of the ESTs belonging to polymorphic contigs were derived from non-normalized libraries, whereas non-normalized libraries provided only 54.6% of the total number of sequenced ESTs.
Polymorphism was found to be approximately twice as high in surface fish compared to cavefish (
Among the 940 fixed differences, 716 are synonymous. Because the closest species that can be used as an outgroup (zebrafish) diverged at least 100 Ma ago, we assumed that parallel nucleotide substitutions and reversions may have occurred in each lineage. We therefore did not try to infer the direction of these nucleotide changes and did not investigate the synonymous differences further.
However, some of the 224 non-synonymous changes might be responsible for phenotypic differences observed between the two morphs. In addition, a premature stop codon in a cavefish sequence was detected (
A: Alignment of surface fish and Pachón cavefish nucleotide sequences of the si:ch211–210c8.6 transcript. B: Alignment of surface, Pachón and zebrafish translated protein sequences.
We also looked for indels specific for one of the two morphs but this analysis did not reveal any indels in coding sequences.
For the 224 amino acid substitutions found, the protein sequences were aligned to zebrafish to infer the direction of the substitutions, based on the principle of parsimony. Of these, 87 substitutions occurred in the cavefish lineage and 65 in the surface fish lineage; the remaining 72 could not be oriented because the zebrafish amino acid at the mismatch position differed from both the surface fish and cavefish amino acids.
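The parsimony rule used to orient substitutions can be written in a few lines: if zebrafish shares the surface fish residue, the change is assigned to the cavefish lineage, and vice versa; otherwise it stays unoriented. A minimal sketch, with a hypothetical function name, assuming the three residues come from the same aligned position:

```python
def orient_substitution(surface_aa: str, cave_aa: str, zebrafish_aa: str) -> str:
    """Assign an amino acid difference to a lineage by parsimony, using zebrafish as outgroup."""
    if surface_aa == cave_aa:
        raise ValueError("no difference between the two morphs at this position")
    if zebrafish_aa == surface_aa:
        return "cavefish lineage"  # cavefish carries the derived state
    if zebrafish_aa == cave_aa:
        return "surface fish lineage"  # surface fish carries the derived state
    return "not oriented"  # outgroup residue matches neither morph
```

Applied to all 224 positions, this rule yields the three categories reported above (87, 65, and 72 substitutions).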
To detect bias in the proteome evolution of surface fish and cavefish, we performed a GO term enrichment analysis on the pool of 184 genes in which the 224 substitutions were found. Surprisingly, ATP synthases seem to be over-represented among proteins with surface fish/cavefish substitutions (
GO terms are ordered by p-value. GO terms represented only once are not shown here.
Among the 224 amino acid substitutions, we found 83 radical substitutions, i.e., substitutions between amino acids with distinct physicochemical properties (see
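Whether a substitution counts as radical depends on how amino acids are grouped. The classification used in this study is given elsewhere; the sketch below uses a hypothetical five-class grouping (hydrophobic, polar, positively charged, negatively charged, and glycine/proline) purely to show the logic, so its calls may differ from those reported here.

```python
# Hypothetical grouping of the 20 amino acids into physicochemical classes,
# chosen for illustration; the study's own classification may differ.
AA_CLASSES = {
    "hydrophobic": "AVILMFWC",
    "polar": "STNQY",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}
CLASS_OF = {aa: cls for cls, members in AA_CLASSES.items() for aa in members}


def is_radical(aa1: str, aa2: str) -> bool:
    """Call a substitution radical if the two residues fall in different classes."""
    return CLASS_OF[aa1.upper()] != CLASS_OF[aa2.upper()]
```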
Of the 83 radical amino acid changes, 31 occurred in the cavefish lineage and 22 in the surface fish lineage. We found two genes in which the cavefish radical mutations are located at a highly conserved position (
Other cavefish mutations are located in conserved domains, but not at highly conserved positions: sec13, involved in protein trafficking, is mutated in a WD40 domain; capsla, the calcyphosine-like a, is mutated in a calcium-binding domain; the gametocyte specific factor 1 Gtsf1 is mutated in a zinc finger domain; and the c-Myc binding protein Mycbp is mutated in a coiled-coil domain.
We next performed a GO term enrichment analysis on proteins with radical cavefish mutations, which suggests that proteins involved in carbohydrate metabolism are over-represented (
Finally, the expression patterns of the 79 transcripts with radical substitutions between surface fish and cavefish were investigated in Zfin, the zebrafish reference database
| | Number of genes | Genes with no Zfin expression annotation | Genes with Zfin expression annotation | Genes expressed in the eye | % of annotated genes expressed in the eye | % of all genes expressed in the eye |
| Mutations in cavefish lineage | 31 | 10 | 21 | 11 | 52.4% | 35.5% |
| Mutations in surface fish lineage | 22 | 8 | 14 | 1 | 7.1% | 4.5% |
| Mutations not oriented | 28 | 9 | 19 | 3 | 15.8% | 10.7% |
Two genes each contain two mutations, one in the cavefish lineage and the other in the surface fish lineage; these two genes are therefore counted twice in this table.
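The two percentage columns are derived from the counts in the same row: genes expressed in the eye divided by genes with a Zfin expression annotation, and by all genes, respectively. The short sketch below recomputes them from the table (the function name is hypothetical).

```python
# Counts copied from the table: (genes, genes with Zfin annotation, genes expressed in the eye).
ROWS = {
    "cavefish lineage": (31, 21, 11),
    "surface fish lineage": (22, 14, 1),
    "not oriented": (28, 19, 3),
}


def eye_percentages(total: int, annotated: int, eye: int) -> tuple[str, str]:
    """Return (% of annotated genes, % of all genes) expressed in the eye, one decimal place."""
    return f"{100 * eye / annotated:.1f}%", f"{100 * eye / total:.1f}%"


for lineage, counts in ROWS.items():
    print(lineage, *eye_percentages(*counts))
```

The three rows sum to 81 genes, which matches the 79 transcripts with radical substitutions once the two genes counted twice are taken into account.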
We present here
We also describe genetic variation within and between the two morphs. Polymorphism appears to be much lower in cavefish than in surface fish, and we describe 940 fixed differences between surface fish and cavefish coding sequences, some of which may be involved in adaptation to cave life.
Among the proteins showing radical substitutions in cavefish, a third are potentially expressed in the eye, based on their expression patterns in the zebrafish
The authors would like to thank Stéphane Père and Magalie Bouvet for animal care and management of the fish facility, Lydia Steiner for her wise advice on bioinformatics, and Berthold Fartman for fruitful technical suggestions on library construction.