Transcriptomics and Comparative Analysis of Three Antarctic Notothenioid Fishes

For the past 10 to 13 million years, Antarctic notothenioid fish have undergone extraordinary periods of evolution and have adapted to a cold and highly oxygenated Antarctic marine environment. While these species are considered an attractive model with which to study physiology and evolutionary adaptation, they are poorly characterized at the molecular level, and sequence information is lacking. The transcriptomes of the Antarctic fishes Notothenia coriiceps, Chaenocephalus aceratus, and Pleuragramma antarcticum were obtained by 454 FLX Titanium sequencing of normalized cDNA libraries. More than 1,900,000 reads were assembled into a total of 71,539 contigs. Overall, 40% of the contigs were annotated based on similarity to known protein or nucleotide sequences, and more than 50% of the predicted transcripts were validated as full-length or putative full-length cDNAs. These three Antarctic fishes shared 663 genes expressed in the brain and 1,557 genes expressed in the liver. In addition, these cold-adapted fish expressed more ubiquitin (Ub)-conjugated proteins than temperate fish; Ub-conjugated proteins may be involved in maintaining proteins in their native state in the cold and thermally stable Antarctic environment. Our transcriptome analysis of Antarctic notothenioid fish provides a resource for future studies of the molecular mechanisms underlying fundamental genetic questions and can be used in evolutionary studies comparing these species with other fish.


Introduction
Antarctic fish have undergone extraordinary evolutionary episodes since the onset of widespread glaciation in Antarctica approximately 34 million years ago, when the Southern Ocean cooled to the freezing point of seawater (−1.9°C) [1]. The Antarctic fish fauna are dominated by the perciform suborder Notothenioidei, which represents 77% of the species diversity and 91% of the biomass. There are currently 322 recognized species of Antarctic fishes, and a total of 132 notothenioid species are known [2]. Notothenioids have survived in the subzero waters of the continental shelf and may have experienced a unique type of adaptive radiation known as species flock [3,4]. Notothenioid fishes possess a wide range of unique adaptations to the extreme Antarctic environment, such as antifreeze glycoproteins, loss of heat shock response [5], and lack of hemoglobin [6,7]. The Antarctic notothenioid antifreeze glycopeptides are derived from a related pancreatic trypsinogen-like protease [8,9], and they represent key evolutionary adaptations to life in subzero ice-laden water. Previous research on Antarctic notothenioid fishes has shown that these cold-adapted species lack a common cellular defense mechanism called the heat shock response, which involves the highly conserved and coordinated induction of a family of heat shock proteins in response to elevated temperatures [5,10,11]. Other important phenotypic features are the loss of erythrocytes and hemoglobin and the variable patterns of myoglobin expression in muscle tissues of white-blooded channichthyids [12,13]. The notothenioids have undergone resistant and compensatory adaptations to the extreme Antarctic marine environment as well as regressive evolutionary changes. Thus, they are considered an attractive model species for evolutionary and physiological studies [4,14].
Chen et al. [11] reported expressed sequence tag (EST) sequencing from Antarctic notothenioid Dissostichus mawsoni tissues and compared them to tissues of temperate/tropical teleost fish. They identified 177 notothenioid protein families that were overexpressed, which suggests that these protein families are upregulated by low temperatures. Further analysis of these upregulated genes indicated substantial expansion by gene duplication of 118 gene families involved in metabolic processes such as protein biosynthesis, folding and degradation, and lipid metabolism. This suggests that gene duplication may function as an adaptive strategy for organisms under freezing conditions. Detrich et al. [15] determined the genome sizes of 11 notothenioid species including perches, notothens, dragonfish, and icefish, which have variable genome sizes ranging from 0.90 to 1.83 pg, and found that the evolution of phylogenetically derived notothenioid families was accompanied by genome expansion. The icefish (channichthyids), which are considered the most phylogenetically derived group within the notothenioids, have the largest genomes. Evolution in chronically cold and thermally stable conditions has resulted in these species lacking any erythrocytes or hemoglobin genes [6,16,17,18] and in variable patterns of myoglobin expression in muscle tissues in cold, well-oxygenated seawater [12,13].
Although ESTs represent only a subset of the entire eukaryotic genome, their sequencing is helpful for investigating the transcriptome rather than the genome of an organism. It also allows one to focus on the genome sections with high levels of functional information, avoiding introns and intergenic regions that can complicate data analysis [19]. Next-generation sequencing technologies, such as 454 pyrosequencing, offer novel and rapid approaches for genome-wide characterization and profiling of mRNAs, small RNAs, transcription factor regions, chromatin structure, DNA methylation patterns, and metagenomics [20]. Pyrosequencing of ESTs provides an efficient way to generate sequence data for non-model organisms in the form of transcriptome sequencing, and can be used to characterize gene expression and identify novel genes [21,22,23]. The availability of complete genome sequences and large sets of ESTs from several fish species has stimulated the development of efficient and informative techniques for large-scale and genome-wide analysis of gene expression and comparative genomics.
We herein describe the transcriptomes of three Antarctic notothenioid fishes, Notothenia coriiceps, Chaenocephalus aceratus, and Pleuragramma antarcticum, obtained by 454 FLX Titanium sequencing. These three notothenioid fishes are characterized by distinct biological and ecological traits, although, like other members of this suborder, all of them lack a gas bladder. N. coriiceps (family Nototheniidae) retained an ancestral notothenioid benthic habitus. In contrast, P. antarcticum (family Nototheniidae) and C. aceratus (family Channichthyidae) were subjected to trophic evolution towards a pelagic lifestyle, involving an important suite of adaptations [24]. Of these two, C. aceratus is a benthopelagic species, while P. antarcticum is a true pelagic species that spends all stages of its development within the water column [25]. We herein report the generation of 71,539 contigs from these three Antarctic fish species. Forty percent of the contigs (28,724 BLAST hits of the 71,539 contigs) could be annotated based on similarity to known protein or nucleotide sequences. Our work is part of an ongoing genome project on N. coriiceps (NCBI Genome Project ID #66471), and this initial study identified EST sequences expressed in two tissues (liver and brain) of N. coriiceps, C. aceratus, and P. antarcticum for comparative analyses. This represents the first report of publicly available pyrosequencing data for Antarctic fish and provides an important comparative resource for studies of physiology and evolutionary adaptation in fish biology.

Ethics Statement
This study, including sample collection and experimental research on these animals, was conducted according to the Law on Activities and Environmental Protection in Antarctica, as approved by the Minister of Foreign Affairs and Trade of the Republic of Korea.

Sample collection
N. coriiceps (length 35 cm), C. aceratus (length 32 cm), and P. antarcticum (length 13 cm) were collected in the Antarctic Peninsula (62°14'S, 58°47'W) from December 2009 to January 2010. Benthic nearshore specimens of N. coriiceps and C. aceratus were obtained using the hook-and-line method from depths of 20 to 30 m. Cryopelagic specimens of P. antarcticum were caught in traps. After capture, these fish were maintained in flow-through aquaria at ambient seawater temperatures (−1.5°C) for 48 h before sacrifice. Brain and liver tissues of each specimen were dissected, immediately frozen in liquid nitrogen, and stored at −80°C until use.

cDNA preparation and sequencing
Total RNA was isolated by homogenization of each sample in a TRIzol (Invitrogen, Carlsbad, CA)/chloroform mixture, followed by processing using an RNeasy mini kit (Qiagen, Chatsworth, CA) for DNase treatment and cleaning. RNA quality and quantity were analyzed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA) and Nanodrop ND1000 (NanoDrop Technologies, Wilmington, DE), respectively. First- and second-strand cDNA were synthesized from 200 ng mRNA using a SuperScript Double-Stranded cDNA Synthesis Kit (Invitrogen) with 100 µM random hexamer primers (Macrogen, Seoul, Korea). Double-stranded cDNA was purified with a QIAquick MinElute PCR purification column (Qiagen). The cDNA library was normalized according to the protocol described in the Trimmer Direct Kit (Evrogen, Moscow, Russia). Briefly, 300 ng cDNA was denatured at 95°C for 5 min and allowed to renature at 68°C for 5 h in the hybridization buffer included with the kit (50 mM HEPES, pH 7.5, and 0.5 M NaCl). After incubation, the reaction mixture was treated with 1 µl of 4-fold-diluted duplex-specific nuclease. The normalized cDNA was then amplified using PCR Advantage II polymerase (Clontech, Palo Alto, CA). After library construction, the samples were quantified using a Qubit fluorometer (Invitrogen), and average fragment sizes were determined by analyzing 1 µl samples on a bioanalyzer (Agilent) using a DNA 7500 chip. Approximately 10 µg cDNA from each of the six samples was used for sequencing on a GS-FLX Titanium platform (454 Life Sciences, Branford, CT) at the DNA Link Inc. facility (Seoul, Korea) according to the manufacturer's protocol.

Bioinformatic analysis
The raw 454 sequence files in sff format were processed and assembled using Newbler. The resulting contigs were subjected to a BLASTx search against the non-redundant protein database (nr) with an e-value threshold of 1E-3 and an HSP length cutoff of 33. Gene ontology (GO) terms were assigned to each unique gene based on the GO terms annotated to the corresponding homologs in the UniProt database. GO mapping and annotation were performed with an annotation cutoff of 1E-10. Enrichment analysis was performed using Fisher's exact test. All analyses were performed using the BLAST2GO program [26]. Identification of metabolic genes was accomplished by MetaFishNet computation [27]. All cDNA sequences of D. mawsoni were retrieved from the National Center for Biotechnology Information (NCBI). Putative full-length cDNAs were identified with the software Full-Lengther [28], which compares the predicted ORFs with full-length genes and start signals in the UniProt and nr databases, using a cutoff e-value of 1E-5. Once the start codon (ATG) and poly(A) tail had been identified, the sequence was considered a full-length cDNA. The unique sequences of each teleost fish tissue were used to search for microsatellite markers using msatcommander (http://code.google.com/p/msatcommander/) with a repeat threshold of eight dinucleotide repeats or five tri-, tetra-, penta-, and hexanucleotide repeats. The unique genes and homologous genes of the three Antarctic notothenioids were identified using BLASTx against the NCBI RefSeq protein and Ensembl databases (Tetraodon nigroviridis, zebrafish Danio rerio, and Atlantic salmon Salmo salar) with an e-value threshold of 1E-10.
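The hit-filtering step can be illustrated with a minimal sketch. It assumes the BLASTx results were exported in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); this is not the BLAST2GO implementation, and the function name, the best-hit-by-bit-score rule, and the example rows are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-3, min_hsp_len=33):
    """Keep the best-scoring hit per query from 12-column tabular BLAST rows.

    A hit is kept only if its e-value is at or below max_evalue and its
    alignment length is at least min_hsp_len (thresholds taken from the text).
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        aln_len = int(fields[3])
        evalue = float(fields[10])
        bits = float(fields[11])
        if evalue > max_evalue or aln_len < min_hsp_len:
            continue
        # Retain only the highest bit score seen so far for this query.
        if query not in best or bits > best[query][2]:
            best[query] = (subject, evalue, bits)
    return best


# Hypothetical rows for illustration only.
rows = [
    "contig1\tprotA\t85.0\t120\t18\t0\t1\t360\t1\t120\t1e-40\t150.0",
    "contig1\tprotB\t60.0\t90\t36\t0\t1\t270\t1\t90\t1e-12\t70.0",
    "contig2\tprotC\t55.0\t25\t11\t0\t1\t75\t1\t25\t1e-5\t40.0",  # HSP too short
    "contig3\tprotD\t40.0\t60\t36\t0\t1\t180\t1\t60\t0.5\t25.0",  # e-value too high
]
hits = filter_blast_hits(rows)
print(hits)
```

Tightening `max_evalue` to 1e-10 would mimic the stricter cutoff used for annotation and for the comparison against RefSeq and Ensembl.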

Sequence assembly
Each liver and brain cDNA library of C. aceratus and P. antarcticum and the brain cDNA library of N. coriiceps were subjected to a one-quarter plate run, and the liver cDNA library of N. coriiceps was subjected to a one-plate run with the 454 GS-FLX Titanium platform. After removing low-quality regions, adaptors, and all possible contaminants, we obtained a total of 1,918,483 high-quality reads containing 584,174,779 bases from all six libraries, with an average read length of 318 bases; the sequencing depth of each library was 10 to 15X (Table 1). A de novo assembly was performed for each of the six samples independently. The cleaned read data were entered into Newbler for assembly; the size-selected reads were assembled into 7,815 and 9,414 contigs from the liver and brain of C. aceratus, 10,271 and 9,671 contigs from the liver and brain of P. antarcticum, and 24,836 and 9,532 contigs from the liver and brain of N. coriiceps, respectively. The contigs ranged in size from 152 to 4,012 bp with an average size of 604 bp in the liver, and from 239 to 2,951 bp with an average size of 502 bp in the brain of C. aceratus. In P. antarcticum, they ranged from 102 to 7,900 bp with an average size of 701 bp in the liver, and from 100 to 2,520 bp with an average size of 488 bp in the brain. In N. coriiceps, they ranged from 371 to 6,171 bp with an average size of 966 bp in the liver, and from 238 to 2,457 bp with an average size of 582 bp in the brain. Although the singletons potentially contained useful sequences with low levels of expression, they included short reads, and a small number of redundant sequences and singleton reads were excluded from further analysis. The genome sizes of two of these notothenioids (N. coriiceps, C = 1.13 pg and C. aceratus, C = 1.73 pg) were recently described [15], but the percentages of the transcribed genomes remain unknown. Thus, it is difficult to predict the depth of coverage of the Antarctic fish transcriptome by our de novo assembled sequences.
A total of 28,724 contigs had a significant BLASTx hit at a cutoff value of <1E-10 in the nr protein database: 6,689 of 17,229 contigs (38.8%) from C. aceratus, 8,982 of 19,942 contigs (45.0%) from P. antarcticum, and 13,053 of 34,368 contigs (38.0%) from N. coriiceps. As an overall view of the transcriptome, the commonly expressed sequences with an associated database match in each tissue of the three fish represented a varied mix of functional groups (Table S1). To assess sequence completeness, we estimated the fraction of full-length sequences in the transcriptome. A sequence was considered full-length when it included the complete 5′ and 3′ sequences of the mRNA. We used the software Full-Lengther [28], and 36% to 52% of predicted transcripts were validated as full-length or putative full-length in each tissue of the three fish species (Table 2). Among these, 3,170 (41%) from the liver and 2,484 (26%) from the brain of C. aceratus, 10,188 (41%) from the liver and 2,209 (23%) from the brain of N. coriiceps, and 4,757 (46%) from the liver and 3,012 (31%) from the brain of P. antarcticum had significant BLAST matches. As expected, the majority of the sequences (81.2%) showed matches with teleost fish, with eukaryotes accounting for 89.0% of positive hits. Among the fish, the pufferfish Tetraodon nigroviridis showed the highest percentage of hits (24.6%), and the zebrafish Danio rerio represented approximately 13.6% of all hits (Fig. 2).
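The per-species annotation rates follow directly from the contig counts reported for each library. As a minimal arithmetic check (the variable and function names are illustrative, not part of the original analysis), the totals and percentages can be recomputed as follows:

```python
# Contig counts per species as (liver, brain), taken from the assembly results.
contigs = {
    "C. aceratus": (7815, 9414),
    "P. antarcticum": (10271, 9671),
    "N. coriiceps": (24836, 9532),
}
# Contigs with a significant BLASTx hit, per species.
annotated = {
    "C. aceratus": 6689,
    "P. antarcticum": 8982,
    "N. coriiceps": 13053,
}


def annotation_rates(contigs, annotated):
    """Return {species: (total contigs, percent annotated rounded to 0.1)}."""
    rates = {}
    for species, (liver, brain) in contigs.items():
        total = liver + brain
        rates[species] = (total, round(100 * annotated[species] / total, 1))
    return rates


rates = annotation_rates(contigs, annotated)
total_contigs = sum(total for total, _ in rates.values())
total_annotated = sum(annotated.values())
overall = round(100 * total_annotated / total_contigs, 1)

for species, (total, pct) in rates.items():
    print(f"{species}: {annotated[species]} of {total} contigs ({pct}%)")
print(f"Overall: {total_annotated} of {total_contigs} contigs ({overall}%)")
```

The recomputed values agree with the figures in the text: 71,539 contigs in total, 28,724 of them annotated, or roughly 40% overall.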
A total of 3,207 microsatellites were identified from 71,539 unique sequences from six libraries, including di-, tri-, tetra-, penta-, and hexanucleotide repeats (Table 3). Previous studies have shown that 454 pyrosequencing of transcriptomes is an excellent method for large-scale prediction of molecular markers for future genetic linkage studies in non-model organisms [29,30]. Because these microsatellites were predicted from transcriptomic sequences, they are likely linked to protein-coding genes and might have substantial physiological implications.
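The repeat thresholds used for marker discovery (at least eight dinucleotide repeats, or at least five tri- to hexanucleotide repeats) can be illustrated with a simple regular-expression scan. This is a simplified sketch and not the msatcommander algorithm; the function names and the test sequence are hypothetical, and the scan handles perfect repeats only.

```python
import re

# Minimum number of tandem copies per motif length, as stated in the Methods.
MIN_REPEATS = {2: 8, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_microsatellites(seq):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    found = []
    for unit, min_rep in MIN_REPEATS.items():
        # One captured motif followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                found.append((match.start(), motif, len(match.group(0)) // unit))
    return sorted(found)


# Hypothetical sequence: (AC)9 and (CAG)5 pass; the trailing (AT)4 is too short.
example = "GGT" + "AC" * 9 + "TTA" + "CAG" * 5 + "T" + "AT" * 4
print(find_microsatellites(example))
```

The primitive-motif check prevents a long dinucleotide run from being reported a second time as a tetra- or hexanucleotide repeat.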
GO enrichment analysis was performed on the three notothenioid fish, and these were compared to the transcriptome database of zebrafish and medaka (Oryzias latipes) (NCBI library IC, 14,410 for liver and 1,522 for brain of zebrafish; 17,414 for liver and 8,625 for brain of medaka) because tissue-specific transcriptomes of these two fishes are well represented in public databases (Fig. 3). Five GO terms were significantly overrepresented in the liver of Antarctic fish relative to the temperate/tropical fish: ribonucleotide binding, protein modification by small protein conjugation or removal, purine nucleotide binding in the biological process category, and ligase activity in the molecular function category. Eight terms were underrepresented in the liver, including the small ribosomal subunit and cytosolic parts in the cellular component category; macromolecular complex subunit organization, viral reproductive process, viral infectious cycle, pancreas development, and reproduction in the biological process category; and reproductive process in the molecular function category. Of the overrepresented molecular function terms, ligase activity terms were primarily composed of ubiquitin (Ub)-conjugated protein (75 of 339 genes in N. coriiceps, 27 of 120 genes in C. aceratus, and 45 of 203 genes in P. antarcticum). The Ub-proteasome pathway is a cytosolic pathway for the degradation of misfolded or damaged proteins that takes place in two distinct and successive steps. The first step involves tagging of the misfolded or damaged protein with multiple Ub molecules, and the second is degradation of the tagged protein by the 26S proteasome complex [31,32,33]. Transcriptomic analysis of another Antarctic notothenioid fish, D. mawsoni, also revealed high levels of Ub-conjugated proteins compared to temperate/tropical teleosts [11]. Antarctic fish may have unusually high levels of misfolded or damaged proteins because low temperatures may affect the rate of protein folding [34]. Previous studies have shown that Antarctic notothenioid fish lack the heat shock response, a common cellular defense mechanism [5,10,35]. These cold-adapted fish therefore require an alternative cellular protein homeostasis mechanism to ensure proper cell functioning. These findings suggest that increased levels of Ub-conjugated proteins in Antarctic fish may be involved in maintaining proteins in their native state in the cold and thermally stable Antarctic environment.
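To make the enrichment test concrete, the following minimal sketch computes a one-sided Fisher's exact test for over-representation of a GO term from the hypergeometric distribution. It is an illustration only: BLAST2GO's own implementation and its multiple-testing correction are not reproduced here, the function name is hypothetical, and the counts in the example are invented.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: genes with the GO term in the test set
    n: size of the test set
    K: genes with the GO term in the test and reference sets combined
    N: total number of genes in the test and reference sets combined
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    denominator = comb(N, n)
    upper = min(n, K)
    numerator = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)
    )
    return numerator / denominator


# Invented counts: 3 of 5 test genes carry the term, 4 of 10 genes overall.
p_value = fisher_enrichment_p(k=3, n=5, K=4, N=10)
print(f"one-sided p = {p_value:.4f}")
```

With real data, k and n would come from one notothenioid tissue library, and K and N would include the zebrafish or medaka reference set; a term would be called overrepresented only after correcting the resulting p-values for the number of terms tested.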

Comparative analysis among four notothenioid species
A total of 28,724 genes were identified from the three notothenioid species based on a BLAST search. Previously, Chen et al. characterized ESTs of D. mawsoni [11]. Therefore, we compared the expressed transcriptomes of the liver and brain among these four notothenioid species by cross-matching using tBLASTn. A total of 331 genes expressed in the liver and a total of 191 genes expressed in the brain were shared among the four species (e-value, 1E-10) (Fig. 4). In the three fishes focused on in this research, a total of 663 genes expressed in the brain and a total of 1,557 genes expressed in the liver were shared (e-value, 1E-10). A summary of the shared and identified genes among all species is shown in Table S2 and Table S3.
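Once each contig has been assigned to a reference protein, counting the genes shared among species reduces to a set intersection. The sketch below assumes that each species' tissue library has already been reduced to a set of matched protein identifiers; the identifiers and the function name are hypothetical, and the tBLASTn cross-matching used in the study is not reproduced.

```python
def shared_genes(annotations):
    """Return the identifiers present in every species' annotation set."""
    sets = [set(ids) for ids in annotations.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical liver annotations keyed by species.
liver = {
    "N. coriiceps": {"P1", "P2", "P3", "P4"},
    "C. aceratus": {"P2", "P3", "P5"},
    "P. antarcticum": {"P2", "P3", "P4", "P6"},
}
common = shared_genes(liver)
print(sorted(common))

# Adding a fourth species can only shrink (or preserve) the shared set.
liver_four = dict(liver, **{"D. mawsoni": {"P3", "P7"}})
common_four = shared_genes(liver_four)
print(sorted(common_four))
```

This is consistent with the pattern in the reported counts, where fewer genes are shared among four species than among three.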
Li et al. [27] reported the construction of a genome-wide fish metabolic network model to identify and compare metabolic pathways. They categorized 115 metabolic pathways from 5 fish genomes (D. rerio, O. latipes, Takifugu rubripes, T. nigroviridis, and Gasterosteus aculeatus) to create a list of all fish metabolic genes via gene ontology. From this metabolic gene list, they identified the corresponding enzymes using either orthologous relationships to human genes or similarity to consensus enzyme sequences. We analyzed all cDNA sequences from the four notothenioid fishes, and 88 metabolic pathways were assigned; that is, no enzymes were found for 27 of the 115 metabolic pathways. Compared with the metabolic genes of other temperate/tropical teleosts, the missing pathways were mainly lipid-related, such as glycosphingolipid biosynthesis, mono-unsaturated fatty acid beta-oxidation, omega-3 fatty acid metabolism, and sphingolipid metabolism (Table S4). In contrast, more enzymes of the electron transport chain were mapped than in temperate/tropical fishes, which suggests greater demands for these functions in the cold Antarctic environment.
To assess the evolutionary conservation of the genes, the numbers of genes with homologs in Tetraodon, zebrafish, and Atlantic salmon (Salmo salar), which were the primary BLAST results, were compared (Fig. 5). A total of 3,605 genes (402 in N. coriiceps, 2,296 in P. antarcticum, and 907 in C. aceratus; 12.6% of the total number of unique notothenioid fish genes) were found. Among these genes, 1,134 (8.7%) from N. coriiceps, 563 (6.3%) from P. antarcticum, and 567 (8.5%) from C. aceratus were commonly found in all three species (Tetraodon, zebrafish, and Atlantic salmon).
The 15 known species of icefish and white-blooded fish, including C. aceratus, all lack the hemoglobin gene [1,36]. This phenotype preceded the evolutionary radiation of the icefish. Di Prisco et al. [6] showed that the C. aceratus genome has transcriptionally inactive truncated variants of α1-globin-related DNA and lacks β-globin genes. We found that the C. aceratus transcriptome contained only cytoglobin for oxygen transport and/or oxygen-binding machinery (Table S5). Cytoglobin is one of four types of globin (hemoglobin, myoglobin, neuroglobin, and cytoglobin), which differ in structure, tissue distribution, and likely function, but mainly serve to transport oxygen in the circulatory system [37]. To determine the molecular phylogenetic position of C. aceratus cytoglobin, a phylogenetic tree was constructed using the neighbor-joining method from a distance matrix calculated with MEGA4 [38]. The C. aceratus cytoglobin grouped with the fish cytoglobin cluster (Figure S1). There have been no previous reports of cytoglobin sequences from other icefish species. The cytoglobin of C. aceratus showed the highest level of identity (72%) to that of O. latipes based on amino acid similarity (Figure S2). The mechanism of the compensatory physiological and circulatory adaptations that resulted in replacement of the lost hemoglobin and myoglobin functions remains unknown. Recently, Cheng et al. [39] hypothesized that neuroglobin may play a role in oxygen transport because this gene is widely found in icefish despite the fact that these fish have generally lost hemoglobin and myoglobin. The observation that at least one icefish has retained the cytoglobin gene is intriguing, and the function of the cytoglobin gene should be further explored to address the evolutionary development and alternative physiology associated with the loss of globin genes.
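The identity value quoted for the cytoglobin comparison is a pairwise percent identity over aligned residues. The sketch below shows one common way to compute it from two pre-aligned sequences, ignoring columns that contain a gap; other conventions for the denominator exist, MEGA4's own distance calculation is not reproduced, and the function name and the short example alignment are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptide fragments ('-' marks an alignment gap).
identity = percent_identity("MKV-LA", "MRVGLA")
p_distance = 1 - identity / 100  # proportion of differing sites
print(identity, p_distance)
```

Pairwise p-distances of this kind, computed for every pair of sequences, form the sort of distance matrix that a neighbor-joining tree is built from.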

Conclusions
We generated and assembled the transcriptomes of three Antarctic notothenioid fish species. We generated 71,539 contigs, identified 28,724 unique genes expressed in the brain and liver of the three Antarctic fish, and identified more than 3,200 gene-associated microsatellites. The Antarctic fish transcriptomes, analyzed by high-throughput 454 sequencing, allow functional characterization of a wide range of molecules encoded in the transcriptomes of notothenioids. Comparative sequencing of the three notothenioid fish transcriptomes also provided information on the variation in evolution and speciation of species that live at permanently cold temperatures. We are currently performing whole-genome sequencing of N. coriiceps. Comparison between genome and transcriptome sequences will allow for a better understanding of gene structure and organization, help clarify the molecular mechanisms underlying fundamental genetic questions, and provide a comprehensive view of evolutionary responses to environmental challenges during climate change.

Figure 4. Comparison of shared and unique genes identified in four notothenioid fishes. Numbers in parentheses represent the total number of enzymes in metabolic pathway analysis. doi:10.1371/journal.pone.0043762.g004

Figure 5. Conservation of three notothenioid fish genes with other species. Number of notothenioid fish homologous genes annotated in GO analysis. doi:10.1371/journal.pone.0043762.g005

Table 1. Statistics for pyrosequencing of the three notothenioid species.

Table 3. Summary of microsatellite marker identification in the three notothenioid species.
In total, 523 contigs were greater than 3 kb in length, and 2,179 contigs were composed of more than 300 reads; the largest contig was 12,625 bp, was composed of 1,108 sequences, and was annotated as titin. The size distribution of the reads is shown in Figure 1. All high-quality reads have been deposited in the NCBI Short Read Archive (SRA) under the accession number SRP007644. Table 1 presents a summary of the sequencing and assembly results, and all transcriptome information of the three fishes is accessible at http://antagen.polar.re.kr.

Table 4. Functional annotation of proteins encoded in the transcriptomes of the three notothenioid fish based on gene ontology (GO).