Molecular Analysis of the Cold Tolerant Antarctic Nematode, Panagrolaimus davidi

Isolated and established in culture from the Antarctic in 1988, the nematode Panagrolaimus davidi has proven to be an ideal model for the study of adaptation to the cold. Not only is it the best-documented example of an organism surviving intracellular freezing but it is also able to undergo cryoprotective dehydration. As part of an ongoing effort to develop a molecular understanding of this remarkable organism, we have assembled both a transcriptome and a set of genomic scaffolds. We provide an overview of the transcriptome and a survey of genes involved in temperature stress. We also explore, in silico, the possibility that P. davidi will be susceptible to an environmental RNAi response, important for further functional studies.


Introduction
Panagrolaimus davidi Timm 1971 [1] was first isolated from the McMurdo Sound region of the Antarctic in 1988 and cultured at the University of Otago [2].Much like its better known temperate cousin, Caenorhabditis elegans, it has, over the years since its early adoption as a model of study, provided surprises that in hindsight made its initial selection, and culturing, very fortunate.Panagrolaimus davidi was found to survive intracellular freezing [3].Although the cuticle acts as a barrier to ice, at low sub-zero temperatures inoculative freezing can occur through the excretory pore and other orifices, with the ice seeding the body fluid and freezing both intra-and extracellular compartments [3], [4].Survival of intracellular freezing was first described by Salt in the fat body cells of the goldenrod gall fly [5], [6], [7] but P. davidi still remains the only organism for which it has been described in all compartments of its body.In addition to freezing tolerance, P. davidi has developed other strategies for dealing with low temperatures.At high sub-zero temperatures, and importantly, when the rate of freezing is slow, ice is unable to enter body openings and the nematode supercools.This leads to a vapour pressure difference between the supercooled body fluids and the surrounding ice leading to the transfer of water from the nematode to the surrounding ice; a process termed cryoprotective dehydration [8], [9].Once P. davidi is cryoprotectively dehydrated, there is not enough water to freeze in either the pseudocoel or other compartments allowing it to sustain low sub-zero temperatures by freeze avoidance.While an increasingly detailed molecular picture is emerging of cryoprotective dehydration using large scale gene expression and proteomic approaches [10], [11], [12], no comparable molecular work has been undertaken on the survival of intracellular freezing.
With the sea change in current sequencing technology, what was previously termed a non-model organism can now become, with a concerted effort and a fraction of the resources that were required even five years ago, a molecular model for a specific physiological trait, which is a powerful change in the way biological systems can now be studied.With this in mind, we have undertaken to sequence not just the transcriptome, which will provide a backbone for expression studies, but also the genome.With the dramatic decrease in cost associated with sequencing genomes, the scaffolds and contigs resulting from a preliminary assembly, even when broken up, are a natural complement to the transcriptome, providing valuable information on gene structure such as intron-exon junctions.To date our only knowledge of the genome comes from a paper by Goldstein and Wharton [13] describing seven synaptonemal complexes, therefore seven chromosome pairs (2n = 14).
The molecular information provided on the scale such as the present study, opens up the exploratory possibilities of gene expression studies.But in order to gain more direct evidence for any such exploratory results, it would be valuable to know whether P. davidi is potentially susceptible to functional genetic methodologies.Since Fire et al. [14] determined the role of RNA interference in C. elegans, this method has provided a clear mechanism by which the role of a pathway or specific genes may be understood, for example, in an organism's response to environmental stresses [15].With the two sources of information, the transcripts and their gene structure determined by the genomic sequence, clean RNAi probes for specific targets can be easily developed.Recently, Panagrolaimus superbus was shown to respond to feeding RNAi [16], but a lesson has already been learnt from C. briggsae when, unlike C. elegans for whom it has been a comparative model, it was shown that RNAi was not possible owing to a divergent form of sid-2 [17].
A final word on a mystery that has unfolded in the last few years.In 2009, Lewis et al. published a paper [18] on the phylogenetics of the Panagrolaimus genus.P. davidi (designated CB1, from its presumed origin at Cape Bird, Ross Island) was included in this study, but unexpectedly, two Californian species proved phylogenetically closer to P. davidi CB1 than any other species or strain.Genetic analysis of fresh field material of P. davidi collected during 2005-2007 from Cape Hallett and Gondwana Station on the Victoria Land coast and from Cape Bird showed that the field strain of P. davidi is a different species to P. davidi CB1 [19].One argument put forward to explain this difference is the possible dominance in culture of a less common strain owing to its parthenogenetic reproductive mode.Questions of invasiveness have also been considered.But the adaptations of P. davidi CB1 to low temperature make this highly unlikely; or highly surprising.Further molecular work should help to resolve this intriguing situation, but in terms of its physiology, the origin of P. davidi CB1 is not of relevance, since its cold tolerance adaptations singles it out as an important organism of study.

Culturing, extraction, and library construction for Expressed Sequence Tags
Nematodes were cultured on Escherichia coli strain OP50 on NGM agar plates [20].RNA was extracted from 580 mg (for a culture grown at 20uC, called PDT) and 730 mg (for a culture grown at 20uC and subsequently brought down to 4uC, called PDF) of P. davidi CB1 (wet weights), respectively.Total RNA (930 mg for PDT and 750 mg for PDF) was prepared with RNagents total RNA Isolation Kit (Promega).Poly(A)+ RNA (2.9 mg for PDT and 5.5 mg for PDF) were isolated with Illustra mRNA Purification Kit (GE healthcare).2 mg of poly(A)+ RNA was then used to generate a cDNA library with Cloneminer cDNA library Construction Kit (GE healthcare).

Sanger sequencing and quality assurance
Seqclean [21] and Crossmatch [22] was applied to the two (PDT and PDF) P. davidi CB1 Expressed Sequence Tag (EST) libraries, removing the vector and stripping any poly-A tails and poor quality sequence.This left 25,182 reads from the PDT set and 69,958 reads from the PDF set.All of these sequences were added to the post-assembly of the Illumina reads.The ESTs are held in dbEST with accession numbers: JZ585947-JZ681086.

Culturing, extraction, and library construction for Illumina sequencing
The nematodes were cultured in S medium at 20uC for 3 weeks and fed every 3-4 days with E. coli following [23].Nematodes were extracted using a modified Baermann technique [24].The worms for the DNA and RNA were then snap frozen with liquid nitrogen in 1.5 ml microcentrifuge tubes and preserved at 280uC until extraction.Seven sets of these worms for the RNA were then subjected to other physiological states to enrich for stress related transcripts.These seven treatments consisted of 1) exposure to cold acclimation at +5uC for 3 days; 2) the previous sample set was then immersed in the bath of a refrigerated circulator where it was cooled from +1uC to 21uC at 0.5uC min 21 , and frozen by adding a small ice crystal and maintained at 21uC for 24 h; 4) the previous sample was warmed to +1uC at 0.5uC min 21 , and allowed to recover at 20uC for 24 hours; 5) samples from stage 1 were cooled from +1uC to 210uC at 0.5uC min 21 ; 6) the previous stage was ice nucleated once held at 210uC; 7) the previous sample was warmed to +1uC at 0.5uC min 21 .500 ng (wet weight) was used for the RNA extraction (from each of the 8 different stages, including the culture grown at 20uC) and 500 ng (wet weight) for DNA extraction.The RNA was extracted from whole worms using TRI-sure (Bioline) according to manufacturer's instructions and purified on Qiagen RNeasy columns.The DNA was extracted from whole worms using the Qiagen DNAeasy Blood and Tissue kit according to manufacturer's instructions.Both DNA and RNA were checked for purity on standard agarose gels and quantified using a NanoDrop ND-1000 spectrophotometer (Labtech).8 mg of DNA and 10-15 mg of RNA for each of the stages were used for sequencing.

Illumina sequencing and quality assurance
The RNA was sequenced on an Illumina HiSeq 2000 resulting in 143,223,606 paired-end reads of 100 bp, after quality control.Quality control consisted of removing adaptors from the sequence, removing reads where the number of unresolved nucleotides exceeded 5%, and reads where the number of nucleotides with phred quality less than or equal to 10 was over 20%.The reads can be downloaded from the SRA repository under the accession number SRP041973.
Genomic DNA was randomly fragmented, with insert sizes of between 500-800 selected through gel electrophoresis, with the fragments gel purified with adapters ligated.After sequencing, quality control consisted of adaptor removal and removing reads in which more than 50% of the bases had phred scores of less than 5. Two runs of 95 and 138 million paired end reads of 100 bp respectively remained after quality control.The sequence data can be downloaded from the SRA repository under the accession number SRP041572.

Transcriptome assembly and annotation
The transcriptome was assembled from 143,223,606 paired-end Illumina reads of 100 bp, and 95,140 sanger sequences.The Illumina reads were assembled first with Soapdenovo [25] using a number of different kmer sizes.Illumina datasets assemble differently depending on the kmer size chosen, with any given dataset having an optimal kmer for numbers and lengths of contigs.In the case of P. davidi CB1, by selecting a number of odd values from 19 to 93, the optimal kmer lay around 69 (see File S1).In order to enrich the assembly as much as possible, all contigs greater than 200 bp from all the different kmer size assemblies were selected (a total of 1,107,215 contigs) and, jointly with the 95,140 sanger sequences, they were assembled together using Newbler [26] and CAP3 [27] consecutively, to eliminate redundancy.The resulting assembly was compared using Blast [28] against the nr database [29] as well as Caenorhabditis elegans [30], Plectus murrayi [31] and Panagrolaimus superbus [32] nematode databases.In addition, a separate blast against the entire WS242 wormbase [30] was carried out.The final assembly set was reduced to those transcripts that were either 500 bp long, or annotated at an e-value less than 1e-10.
Functional groupings of the transcriptome was carried out through Clusters of Orthologous Genes [33], Kegg Orthology [34], and SEED subsystems [35].Separate annotation of the ESTs was also carried out and they were used in examining specific genes, particularly the LEA-like and HSP-70 genes, owing to their longer lengths combined with their more easily resolved reading frame, where the consensus sequences in the contigs proved more complicated.

DAPI staining of nuclear DNA
C. elegans N2 and P. davidi CB1 were washed out from a 5 cm Nematode Growth Media (NGM) plate and fixed in 2 ml Carnoy's solution (60% Ethanol, 30% Chloroform, 10% Acetic acid) overnight at room temperature.The fixed worms were then transferred to a watch glass with 50 ml phosphate buffered saline with 0.2% Triton X-100 and 100 ng/ml DAPI (49,6-diamidino-2phenylindole), and incubated in a humid chamber in the dark for 30 mins.The worms were mounted on an agar pad slide.Quantification of the intensity of nuclei in ventral nerve cord was carried out with AQUACOSMOS software [36] on fluorescent microscopy pictures with an Axioplan 2 microscope (Carl Zeiss).Estimation of P. davidi CB1 DNA amounts by proportion to C. elegans DNA was done for 10 worms from each species.

Assembling genomic sequence
As with the transcriptome assembly, Soapdenovo, with varying kmer sizes, was used to assemble the DNA paired-end reads.This led to an optimal kmer size of 89.Since it has often been noted that too much sequence coverage can lead to erroneous and more broken assemblies, a second, confirmatory round of assemblies were carried out with only 50x coverage, and then at 50x increments, to see whether the assembly at any lower coverage was better.However, this resulted in confirmation that the more data, the better the assembly.Finally, Blast was used to check and remove any contigs or scaffolds that may have been an assembly of bacterial contaminant.

Results and Discussion
The transcriptome, consisting of 25,875 transcripts, had an average length of 1,163 bp with the GC content at 32.5% (the raw Illumina reads were 35.97%).Against the nr database, 15,748 (61%) of the transcripts were annotated at an e-value of 1e-10 or less.When compared to the C. elegans protein database using blastx, 14,372 (54%) P. davidi CB1 (hereafter referred to simply as P. davidi) transcripts matched 14,395 (57%) of the C. elegans proteins.The P. davidi transcripts matched 62% of the Panagrolaimus superbus EST transcripts [32], the closest nematode for which there is any molecular data, and they matched 52% of the Plectus murrayi ESTs [31], the only other Antarctic nematode for which any molecular data has been published.When compared to all nematode databases currently housed at wormbase (WS242), 68.5% of the P. davidi transcripts matched at an e-value of 1e-10 or lower.The remaining transcripts that had no annotation, and in particular, those that matched no nematode are obviously of interest in that they may well contain potential clues to the unique physiological adaptations of P. davidi.File S2 lists all the transcripts that match against nr at an e-value of 1e-10 or lower, along with their corresponding description.
Functional classifications were able to be determined for 55% of the whole transcriptome.Figure 1 shows the functional spread provided by SEED subsystems.Although not shown, both the Clusters of Orthologous Genes and KEGG Orthology analyses, while broader in their categories, provided the same breakdown.The highest proportion of the transcriptome is clearly involved in protein metabolism, with a high proportion of clustering-based subsystems, a designation of genes based on their co-localisation across many genomes.Other highly represented subsystem groups were the carbohydrates, amino acids and derivatives, and RNA metabolism.A separate analysis was done on the two EST libraries with the breakdowns showing similar percentage patterns (File S3) with, like the transcriptome as a whole, protein metabolism having the strongest representation.
That the transcriptome is characterised so strongly by genes involved in protein turnover is a clear indication of activity and change which is hardly surprising given the fact that different physiological states were mixed together, where such activity might be expected.With the similar proportions also apparent in the two EST libraries this seems to imply that such activity and change is occuring even within each stage.This is probably reflective of the need to be highly responsive to changes in the environmental conditions.
A natural complement to the transcriptome, even in preliminary form, is the genome sequence.Prior to sequencing however, it is useful to know the size of the genome, even approximately.DAPI staining and comparison to the C. elegans nuclear DNA was carried out with the result that P. davidi is estimated to be roughly ,90 Mb, slightly smaller than C. elegans at ,97 Mb.We then sequenced two short-insert paired-end libraries (,1,000 bp) of nuclear DNA, which according to the genome size by DAPI staining, would provide a sequence depth of coverage of roughly 517.Assembly resulted in 86% inclusion of the incorporated genomic reads.The scaffold N50 is 6,352 bp, with an average size of 4,150 bp, and a total size of 195 Mb.File S4 shows the size distribution of the scaffolds greater than the N50 with the largest at 73,240 bp (the N50 of the contigs (those joined together to form the scaffolds) is 1,873 bp with the largest being 21,613 bp with the total size being 93 Mb).
The total size of the scaffolds indicates an observation that had already been noted with the EST libraries, namely that while P. davidi is a parthenogenetic species, which should give rise to a homozygous line, evidence suggests that it is in fact heterozygous.Generation of the ESTs was constructed from a single animal with .10xgenerations to produce an isogenic line, yet many of the ESTs showed a heterozygosity.Examining the LEA ESTs for example, provided evidence that is best explained by heterozygous descendent sequences in which there is crossing over.This was followed up by examining the LEA loci on the genome, as well as examples in the cDNA of two other genes, glutamine synthetase and 6-phosphogluconate dehydrogenase, all of which led to the same conclusion (see File S5).
Mapping of the transcriptome onto the scaffolds resulted in 94% of the transcripts finding a match at an e-value of 1e-10 or lower, indicating that while broken up, the genome sequence is reasonably complete.Further work is being carried out to build the assembly into a more contiguous form (see Figure 2).Subsequent work on building up a well-annotated genome should also be made easier with close comparative models, such as other Panagrolaimus species.
The genomic scaffolds that have been assembled have already provided valuable information for the development of PCR probes, allowing one to examine the splice sites for many of the transcripts, as well as for current work in developing probes for RNAi.A small example resulting from the assembly that is redolent of the early work on the C. elegans genome providing information on gene structure is the intron size that was found to have a peak at 47 bp [37], and in C. briggsae at 54 bp [38].For P. davidi, 139 random intronic regions were examined in closer detail across 37 scaffolds showing a clear peak at 49 bp (see File S6).
One of the key reasons for deep sequencing and undertaking a large-scale genomics approach in P davidi, is to aid in isolating key genes involved in cold tolerance.One class of protein that has proven elusive so far, despite a previously reported but inconclusive result [39] are the ice-active proteins (IAP): ice nucleating proteins (INPs), antifreeze proteins (AFPs) and recrystallizationinhibiting proteins (RIPs) [39].Adhikari et al. [31] has reported a type II antifreeze protein in Plectus murrayi, and it was expected that such a find would also be made in the current P. davidi transcript set.However, neither searches of the annotation or homology matches with the Plectus murrayi EST has identified such a gene, except for a very weak match to transcript PdU008960v1.1, a c-type lectin carbohydrate-binding protein.
To date all effort to find even one ice-active protein has failed.In 2009, a newly discovered antifreeze molecule, xylomannan, was isolated from the freeze tolerant Alaskan Beetle, Upis ceramboides [40].Since this is not a protein, but a combination of saccharide and fatty acid, such an avenue may prove useful with P. davidi, but has yet to be undertaken.
Much is already known of the molecular processes involved in cryoprotective dehydration [10], [11], [41] and Table 1 lists a number of the key genes which have been identified in previous expression profiling studies.These include genes in the trehalose synthesis pathway, the aquaporins, chaperones and oxidoreductase genes associated with cell stress, and desaturase genes involved in membrane fluidity.
Trehalose is commonly synthesised from glycogen and has been shown to act as an anhydroprotectant [42] by preserving the functionality of biomolecules and acting as a water replacement in terms of a compatible osmolyte [43], and by glass formation and chemical stability [44].Two enzymes are directly involved in the synthesis of trehalose: trehalose-6-phosphate synthase (tps) and trehalose 6-phosphate phosphatase (gob), with trehalase (tre) involved in the breakdown of this sugar.Previous work on desiccation has identified a duplication of the tps gene in other species: Megaphorura arctica [11], C. elegans [45] and Brachionus plicatilis [46].Transcripts with sequence similarity to trehalose-6phosphate synthase (tps) were identified in the P. davidi dataset, but these appeared to be two non-overlapping portions of the same gene, namely tps-2.However searching the genomic scaffolds indicate that they may come from different regions of the genome, and therefore a potential duplication.TPS-1 was not identified in this dataset.However, in line with the previous studies on duplicated tps genes, potential duplicates of the trehalase gene were also identified in P. davidi.Homologs of the remaining two tre genes, tre-4 and 25 that are present in C. elegans, were not found.
Two membrane function gene families have been identified as being significantly expressed during cryoprotective dehydration, the aquaporins (associated with solute transport across membranes) [47] and the D9-acyl-CoA desaturases (involved in changing membrane fluidity via fatty acid composition) (eg [48]).To date 12 aquaporin (aqp) and 7 desaturase (fat) genes have been identified in the genome of C. elegans.In the P. davidi dataset 6 aqp and 3 fat genes were identified.Of particular note was the potential duplication of both aqp-7 and fat-7, which would be specific to P. davidi.Searching the genomic scaffolds, both fat-7 transcripts aligned onto the same scaffold implying different regions of the same gene.However, the two transcripts of aqp-7 aligned to different scaffolds, which indicate a potential duplication.It is worth noting that aqp-7, as well as aqp-3, are the aquaglyceroporin genes, the glycerol-permeable homologs of the classical aquaporins for water transport.
Much of the recent molecular interest in desiccation survival has involved the study of the late embryogenesis abundant (LEA) protein family.These were initially found during the embryogenesis of cottonseed in 1981 [49], [50] and are hydrophilic, intrinsically disordered proteins.They have received a great deal of attention over the last decade or so, since they were found to play a role not only in the desiccation of plants, but also in animals.The verdict is still out in terms of both the classification system that should define the types -the plant types do not so easily translate to the animal types -but also in terms of all the possible functions the LEAs might play [51].This paper will not attempt to weigh into the debate on the types, as there has been some good work to date focussed on this issue [52], [53], [54], [55].Owing to the inconclusive designations many researchers prefer to refer to certain LEAs found in animals as LEA-like (i.e.[56]) with all LEArelated proteins found in animals to date most similar to type 3 LEA, with the exception of two type 1 LEA sequences from Artemia franciscana [57].More work also needs to be done on differentiating the functional differences of the separate types before too much is made of the syntax, even though it is likely that any syntactic differences will be reflected functionally.As Tunnacliffe and Wise [51] have phrased it, the LEAs remain a conundrum.
Among the many functional properties attributed to the LEA (and LEA-like) proteins are as a molecular shield inhibiting aggregation of denaturing proteins, as an antioxidant, to provide protection to membranes preventing damaging phase transitions during freezing, as hydration buffers, slowing water loss -among others (see [51], [58], [56]).In nematodes trehalose is an important constituent of desiccation survival [59], unlike in tardigrades and bdelloid rotifers where trehalose is not accumulated, or even present during desiccation [60], [61], [62].Yet Gal et al. [63] found that silencing the C. elegans lea-1 gene significantly reduced survival during induction of desiccation as well as of osmotic and heat stress.However, the more remarkable of recent studies is one conducted on human hepatoma cell lines in which two type 3 LEA proteins from Artemia fransiscana were transfected, with 98% of cells retaining membrane integrity after rehydration from low water content, compared to 0% without transfection [64].When the same was attempted without the trehalose and only one of the transfected LEAs, 94% of the cells retained membrane integrity.Within the current P. davidi dataset, both the contigs and the EST reads were searched for potential LEAs with at least 26 individual reads or contigs found.These were checked for the ability to compose an amphiphilic a-motif [65], [51], and whether they were natively unfolded using Foldit [66].These ESTs and contigs were then clustered using Clustal [67] (see Figure 3) as well as checked for homology to the genomic scaffolds (see Table 2).The colouring scheme found in Figure 3 depict those sequences found on the same scaffold (in Table 2).As can be seen, the colouring matches perfectly the independent clustering based on Clustal which provides evidence that there are possibly up to 9 different LEA-type genes.However, translation to amino acid, combined with Cd-hit [68] at a relatively low stringency threshold of 0.8, resulted in 13 separate clusters.File S7 contains the cd-hit clustering and the resulting representative sequences.
For the cold tolerant process of intracellular freezing, very little is as yet known of the mechanisms that allow this to occur, with no molecular work done on any organism to date.However it would be surprising if many of the above mentioned genes were not in some way involved, either mechanistically, or in terms of stress response.
From the beginning of the molecular focus on P. davidi, a vital question has been whether it is susceptible to environmental RNA interference.If so, it would provide a method of functionally investigating survival of intracellular freezing.We have provided an in silico search for RNAi specific genes (following [69]).As with the IAP, an inability to find a key gene does not preclude there being one.However, so far there has been no in silico evidence of sid-2, even though a number of other associated genes are present (see Table 3).As pointed out in the introduction, without sid-2, even if other associated RNAi genes were present, it is considered unlikely that P. davidi would have an environmental response to RNAi [17].The results could potentially mirror the relationship between C. elegans and C. briggsae, since Panagrolaimus superbus has been shown to have an RNAi response [16].Work in determining whether there is an environmental response is being done in Otago, but so far the results have been inconclusive (A.Seybold, per.comm.).If the information hinted at in the in silico search is correct, it would be disappointing, since any lack of *see [66].
`Independent clustering of transcript similarity can be seen in Figure 3. doi:10.1371/journal.pone.0104526.t002 Table 3.An in silico search of the P. davidi transcriptome for the genes associated with RNAi (following [69]).response would imply similar difficulties with the soaking method [70].While microinjection is still a possibility in providing an RNAi response [14], it is unlikely to be of help in understanding an environmental response where large numbers of nematodes are needed to provide a statistical indication of survival.With this, the first large scale molecular work done to date on P. davidi, we now have the information to begin exploring the physiological adaptations of this extraordinary nematode in greater depth.
File S7 Cd-hit clustering of the 26 LEA transcripts and the resulting 13 represented amino acid sequences.(PDF) File S8 Sequences of the HSP-70 and HSP-70-like genes found in P. davidi.(PDF)

Figure 1 .
Figure 1.SEED subsystem analyses of the transcriptome.The y-axis indicates the percentage of the total annotated set represented by the specific category.doi:10.1371/journal.pone.0104526.g001

Figure 2 .
Figure 2. Browser of the scaffolds and their transcript alignments, housed at genomes.bas.ac.uk.The browser will be updated during the continued development of the genome assembly.doi:10.1371/journal.pone.0104526.g002

Table 1 .
Associated P. davidi transcripts matching genes significant for cold tolerance.Figure 3. UPGMA clustering of the LEA transcripts and ESTs from Table2.The colouring represents those transcripts and ESTs that aligned on the same scaffolds, as shown in Table2.The close clustering and independent scaffold alignment provides evidence of distinct LEA genes. doi:10.1371/journal.pone.0104526.g003