Discovery of Nuclear-Encoded Genes for the Neurotoxin Saxitoxin in Dinoflagellates

Saxitoxin is a potent neurotoxin that occurs in aquatic environments worldwide. Ingestion of vector species can lead to paralytic shellfish poisoning, a severe human illness that may lead to paralysis and death. In freshwaters, the toxin is produced by prokaryotic cyanobacteria; in marine waters, it is associated with eukaryotic dinoflagellates. However, several studies suggest that saxitoxin is not produced by dinoflagellates themselves, but by co-cultured bacteria. Here, we show that genes required for saxitoxin synthesis are encoded in the nuclear genomes of dinoflagellates. We sequenced >1.2×10⁶ mRNA transcripts from the two saxitoxin-producing dinoflagellate strains Alexandrium fundyense CCMP1719 and A. minutum CCMP113 using high-throughput sequencing technology. In addition, we used in silico transcriptome analyses, RACE, qPCR and conventional PCR coupled with Sanger sequencing. These approaches successfully identified genes required for saxitoxin synthesis in the two transcriptomes. We focused on sxtA, the unique starting gene of saxitoxin synthesis, and show that the dinoflagellate transcripts of sxtA have the same domain structure as the cyanobacterial sxtA genes. However, in contrast to the bacterial homologs, the dinoflagellate transcripts are monocistronic, have a higher GC content, occur in multiple copies, and contain typical dinoflagellate spliced-leader sequences and eukaryotic polyA-tails. Further, we investigated 28 saxitoxin-producing and non-producing dinoflagellate strains from six different genera for the presence of genomic sxtA homologs. Our results show very good agreement between the presence of sxtA and saxitoxin synthesis, except in three strains of A. tamarense, for which we amplified sxtA but did not detect the toxin. Our work opens up possibilities for developing molecular tools to detect saxitoxin-producing dinoflagellates in the environment.


Introduction
Saxitoxin and its derivatives (STX) are environmental neurotoxins with significant economic, environmental and human health impacts. An estimated 2000 cases of human paralytic shellfish poisoning, with a mortality rate of 15%, occur globally each year [1]. The costs of monitoring and mitigation of STX have led to an annual economic loss from harmful plankton blooms calculated at US $895 million [2].
A striking feature of STX is that these compounds are synthesised by organisms from two kingdoms of life. They are produced by eukaryotic marine dinoflagellates and by prokaryotic freshwater cyanobacteria [3,4]. The toxins appear to be synthesized by similar processes; precursor incorporation patterns and stereochemistry are identical in cyanobacteria and dinoflagellates [5].
In contrast to cyanobacteria, the genetic basis for STX-production in dinoflagellates has remained elusive. Many studies have attempted to identify genes or enzymes involved in this pathway through enzymatic characterisation [13,14], PCR approaches [15,16,17,18], and in silico analyses of expressed sequence tag (EST) libraries [8,19] or of other publicly available nucleotide sequences [20]. Despite these efforts, only one EST from the STX-producing Alexandrium catenella strain ACC07 has been identified as homologous to the N-terminal end of sxtA [8]. SxtA is the unique starting gene of STX-synthesis in cyanobacteria. It has four catalytic domains with predicted activities of a SAM-dependent methyltransferase (sxtA1), GCN5-related N-acetyltransferase (sxtA2), acyl carrier protein (sxtA3) and a class II aminotransferase (sxtA4) [6]. The origin of this unique enzyme may be chimeric: the domains sxtA1-3 are most similar to extant proteobacterial sequences, whereas sxtA4 may have a separate origin, possibly in actinobacteria [8].
Currently, it is unclear whether the synthesis of the same STX compounds, apparently via the same biosynthetic processes in bacteria and eukaryotes, is a result of convergent evolution, horizontal gene transfer, or autonomous STX-production by bacteria associated with the dinoflagellate cell. The latter hypothesis has been investigated by a multitude of studies, but the results are conflicting. Some studies report an autonomous synthesis of STX by bacteria isolated from dinoflagellate cells (reviewed in [21]), whereas others show that axenic cultures of dinoflagellates may also produce STX [22]. In addition, methods used for measurements of bacterial STX lacked specificity, since compounds originally thought to be STX have later been shown to be imposters [23,24,25,26].
To clearly establish whether STX is produced by dinoflagellates, it is necessary to identify the genes responsible for STX-production in STX-producing dinoflagellate cultures. The gene and transcript structures of bacteria and dinoflagellates are strikingly different. In dinoflagellates, genes may occur in multiple identical or non-identical copies (e.g. [27,28]). The copy-number and sequence variation is reflected in their transcriptomes (e.g. [29,30]). Dinoflagellate transcripts of nuclear-encoded genes have polyA-tails and a unique dinoflagellate-specific spliced-leader (SL) sequence [31], traits that have not been reported in bacteria. Spliced-leader sequences are small, non-coding RNAs that are trans-spliced onto the 5′ end of mRNAs. In dinoflagellates, all nuclear-encoded genes appear to be trans-spliced with a conserved 22 base pair (bp) leader sequence, the dinoflagellate-SL [28,31,32]. This process converts polycistronic transcripts into translatable monocistronic mRNAs [33]. In contrast, bacterial transcripts may be polycistronic, such as the sxt gene cluster of C. raciborskii T3, where 24 genes are transcribed into five different mRNAs [34].
To identify sxt genes from two STX-producing Alexandrium cultures, we sequenced a large number of transcripts using high-throughput sequencing technology. In addition, we used in silico transcriptome analyses, rapid amplification of cDNA ends (RACE), qPCR and conventional PCR coupled with Sanger sequencing. These multiple approaches successfully identified genes required for STX-synthesis in dinoflagellates and show that these eukaryotes are able to produce STX autonomously.

Culturing and toxin measurements
Saxitoxin-producing and non-producing dinoflagellate cultures were obtained from various culture collections (Table 1). Cultures were maintained in GSe [35] or L1 media [36] at 16-20 °C, under a 12/12 light cycle, and a photon irradiance of ~100 micromoles of photons m⁻² s⁻¹. Toxicity of strains was determined using HPLC at the Norwegian Veterinary Institute, Oslo, Norway [37] or LCMS at the Cawthron Institute, Nelson, New Zealand. The detection limit of the HPLC method ranged from about 0.07 mg STXeq/100 g for C1 and C3 to 4.1 mg STXeq/100 g for GTX1. The detection limit for the LCMS method ranged from about 0.1 pg/cell for NEO and STX to 0.5 pg/cell for C1 and C2.

RNA and DNA extraction
To isolate total RNA for the 454-library construction (see below), cultures of Alexandrium fundyense Balech CCMP1719 and Alexandrium minutum Halim CCMP113 were harvested in exponential phase through centrifugation (1 min, 1000 × g, 12 °C). Cells were washed with PBS, exposed to bead-beating on dry ice with the Fast Prep bead-beater from Medinor (20 s, speed 4) using 1.4 mm beads (Medinor), and total RNA was extracted with the ChargeSwitch® Total RNA Cell kit (Invitrogen) according to the manufacturer's protocol.
For RACE analyses, polyA-enriched mRNA was isolated using the Dynabeads DIRECT kit (Invitrogen). Cells were harvested by centrifugation (2 min, 4 °C, 16000 × g) and washed twice with PBS, the lysis/binding buffer was added, and the mixture was homogenised using the bead-beater (20 s, step 4). After centrifugation (1 min, 4 °C, 16000 × g), the clear homogenate was transferred to the Dynabeads mix and the mRNA isolated according to protocol. Finally, mRNA was treated with TURBO™ DNase (Ambion) according to the protocol supplied.
Genomic DNA was isolated from all dinoflagellate strains listed in Table 1, either using the Genomic DNA plant ChargeSwitch® kit (Invitrogen) according to the manufacturer's protocol, or by the CTAB method [38]. Quality and quantity of RNA and DNA were determined using a Nanodrop spectrophotometer (ThermoScientific), by amplifying control dinoflagellate genes (cytochrome b, actin) and/or by visualizing them on an ethidium bromide stained agarose gel.

cDNA library construction, 454 sequencing, assembly and analyses
Normalized polyA-enriched complementary DNA (cDNA) libraries with 454 adapters attached at each end were constructed commercially by Vertis Biotechnologie AG (http://www.vertisbiotech.com/). Half a plate each of A. fundyense CCMP 1719 and A. minutum CCMP113 libraries were sequenced using Roche 454 sequencing TITAN technology at the Norwegian High-Throughput Sequencing Centre (http://www.sequencing.uio.no/). Only 454 reads that possessed at least one cDNA adaptor were considered further. Adaptors and, where present, full and partial dinoflagellate spliced-leader (SL) sequences were removed prior to assembly using an in-house PERL script, which is now integrated in the bioinformatic tool CLOTU [39]. Reads were assembled using the software program Mira v3.0.5 [40] with the main switches 'denovo', 'est', 'accurate' and '454'.
To identify putative sxt gene sequences within the two 454 libraries, custom BLAST searches were performed at the freely available online data portal 'Bioportal' (www.bioportal.no). Two strategies were used: the cyanobacterial sxt genes were queried either against the assembled Alexandrium datasets or the unassembled 454 read datasets. All hits with an e-value < 0.1 were extracted and the sequence with the lowest e-value for each gene was blasted against the non-redundant protein database at NCBI.
For sxtA, all retrieved sequences were re-assembled in the software program CLC Bio Main Workbench, using a minimum overlap of 10 bp and low or high alignment stringency. Resulting contig sequences were blasted against the non-redundant and EST databases at NCBI using the algorithms blastn, blastx and tblastx. The structure of sxtA transcripts was determined by aligning their translated sequence to sxtA from cyanobacteria, as well as by conserved domain searches (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Catalytic and substrate-binding residues of sxtA from cyanobacteria have been previously determined [6,41]. The transcripts were searched for the presence of possible signal peptides and corresponding cleavage sites using the neural networks and hidden Markov models implemented in SignalP 3.0 ([42]; http://www.cbs.dtu.dk/services/SignalP/) and the 3-layer approach of Signal-3L ([43]; http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/). Transmembrane helices were explored using TMHMM server 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and hydrophobicity profiles with Kyte-Doolittle plots [44].
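The removal of full and partial spliced-leader sequences from the 5′ end of reads can be illustrated with a minimal Python sketch. This is not the in-house PERL script or the CLOTU implementation; the function name, the `min_partial` threshold and the 22-nt leader string are hypothetical and chosen only for illustration (the leader shown is a toy sequence, not the dinoflagellate SL).

```python
def trim_spliced_leader(read: str, leader: str, min_partial: int = 6) -> str:
    """Remove a full or partial 5' spliced leader from the start of a read.

    A partial leader is treated here as a suffix of `leader` (its own 5' end
    is missing) located at the very beginning of the read. `min_partial` is an
    assumed cut-off that avoids trimming on very short chance matches.
    """
    read_upper = read.upper()
    leader = leader.upper()
    # Try the longest suffix first so a full leader is preferred over a partial one.
    for k in range(len(leader), min_partial - 1, -1):
        if read_upper.startswith(leader[-k:]):
            return read[k:]
    return read  # no leader found: return the read unchanged


# Hypothetical 22-nt leader, for illustration only.
TOY_LEADER = "ACGTTGCAAGGCTTAACCGGTA"

full = trim_spliced_leader(TOY_LEADER + "ATGAAA", TOY_LEADER)
partial = trim_spliced_leader(TOY_LEADER[-10:] + "ATGAAA", TOY_LEADER)
untouched = trim_spliced_leader("ATGAAACCC", TOY_LEADER)
```

Matching exact suffixes keeps the sketch short; a production tool would probably also tolerate sequencing errors within the leader.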
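The hit-selection step (keep hits below the e-value threshold, then take the lowest e-value per query gene) can be sketched as follows. The sketch assumes that hits are available in the standard 12-column tabular BLAST output; the searches in this study were run through Bioportal, so the input format, the function name and the example identifiers are assumptions made for illustration.

```python
import csv
import io

# Column order of standard tabular BLAST output (12 default fields).
BLAST6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_tsv: str, max_evalue: float = 0.1) -> dict:
    """Return {query: (subject, evalue)} for the lowest-e-value hit per query.

    Only hits with an e-value strictly below `max_evalue` are considered.
    """
    best = {}
    reader = csv.DictReader(io.StringIO(blast_tsv),
                            fieldnames=BLAST6_COLUMNS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue >= max_evalue:
            continue  # discard weak hits
        query = row["qseqid"]
        if query not in best or evalue < best[query][1]:
            best[query] = (row["sseqid"], evalue)
    return best


# Hypothetical example rows (identifiers and values are made up).
rows = [
    ["sxtA", "contig_1", "45.0", "300", "150", "5", "1", "300", "10", "310", "5e-61", "213"],
    ["sxtA", "contig_2", "30.0", "100", "70", "2", "1", "100", "5", "105", "0.05", "40"],
    ["sxtG", "contig_9", "28.0", "80", "55", "3", "1", "80", "1", "80", "2.0", "25"],
]
example = "\n".join("\t".join(r) for r in rows)
selected = best_hits(example)
```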

RACE analyses
Primers were designed in conserved regions of the contigs with high similarity to sxtA using Primer3 software (http://frodo.wi.mit.edu/primer3/; Table 2). First-strand cDNA was synthesized with ~95 ng polyA-enriched mRNA using the adaptor primer AP according to the manufacturer's instructions for transcripts with high GC content (3′RACE System, Invitrogen). Following RNase H treatment, the RACE product was diluted 1:10 and used as template for PCR. To amplify the 5′ end of the transcript, three different protocols were used. First, the method of Zhang et al. [31] was used with slight modifications: the 3′RACE library described above was amplified with the primers AUAP (adapter primer supplied with the kit) and dinoSL [31] to enrich for full transcripts (PCR program: 94 °C for 60 s; 30 × (94 °C for 30 s, 68 °C for 5 min); 68 °C for 10 min; 8 °C hold; PCR chemistry see below). The PCR product was diluted 1:10 and used as template in nested PCRs, which were amplified using the dinoSL primer as forward and several different internal reverse primers (Table 2). Further, we used the two kits 5′RACE System (Invitrogen) and the GeneRacer kit (Invitrogen), using the provided 5′ adapter primers and several different internal reverse primers (Table 2). All products were cloned and sequenced as described below.

SxtA1 and sxtA4 genomic amplification
All dinoflagellate strains (Table 1) were tested for the presence of putative sxtA1 and sxtA4 genes. PCRs were run using gDNA according to the protocol described above. The sxtA1 fragment was amplified with primers sxt001 & sxt002 (~550 bp) and the sxtA4 fragment with the primers sxt007 & sxt008 (~750 bp) (Table 2).

Phylogenetic analyses
Dinoflagellate nucleotide sequences were aligned manually using MacClade v4.07 [45], considering the coding sequence in the correct reading frame, before being translated to the corresponding amino-acid sequence. The dinoflagellate amino acid sequences were subsequently aligned, using the MAFFTv6 L-INS-I model [46], to the orthologous sxt sequences for cyanobacteria, in addition to a selection of closely related NCBI nr Blastp hits, constituting the outgroup. Resulting alignments were checked manually and poorly aligned positions excluded using MacClade v4.07 [45].
ProtTest v2.4 [47] determined WAG as the optimal evolutionary model for all inferred alignments. Maximum Likelihood (ML) analyses were performed with RAxML-VI-HPCv7.2.6, PROT-CATWAG model with 25 rate categories [48]. The most likely topology was established from 100 separate searches and bootstrap analyses were performed with 100 pseudo-replicates. Bayesian inferences were performed using Phylobayes v3.2e [49,50] under the same substitution model with a free number of mixing categories and a discrete across-site variation under 4 categories. Trees were inferred when the largest maximum difference between the bipartitions (chains) was <0.1. All model estimation and phylogenetic analyses were done on the freely available 'Bioportal' (http://www.bioportal.uio.no/).

Copy number determination
Triplicate 200 ml batch cultures of Alexandrium catenella strain ACSH02 were grown as previously described, and abundance was counted every three days using a Sedgewick-Rafter chamber and inverted light microscope (Leica Microsystems). Ten ml samples for gDNA extraction were taken in early exponential, late exponential and stationary phase.
Primers suitable for qPCR, amplifying a 161 bp product, were designed with Primer 3 software based on conserved regions in an alignment of A. fundyense and A. minutum 454 reads covering the sxtA4 region. qPCR cycles were carried out on a Rotor Gene 3000 (Corbett Life Science) using SYBR Green PCR Master Mix (Invitrogen). qPCR assays were performed in a final volume of 25 µl consisting of 12.5 µl SYBR Green PCR master mix, 1 µl of template DNA, 1 µl of each primer of the pair, 1 µl of BSA and 8.5 µl of MilliQ water. qPCR assays were performed in triplicate with the following protocol: 95 °C for 10 s, and 35 cycles of 95 °C for 15 s and 60 °C for 30 s. Melting curve analysis was performed at the end of each program to confirm amplification specificity, and select PCR products were sequenced. The standard curve was constructed from a 10-fold dilution series of a known concentration of fresh PCR product, ranging from 2 to 2×10⁻⁵ ng. The number of molecules of PCR product was determined as $(A \times 6.022 \times 10^{23}) \times (660 \times B)^{-1}$, with A: concentration of PCR product, $6.022 \times 10^{23}$: Avogadro's number, 660: average molecular weight per base pair and B: length of PCR product. The number of molecules in the unknown samples was determined and divided by the known number of cells in the qPCR template to obtain the copy number per cell. The detection limit was around 5000 copies of the gene sequence (i.e. ~20-30 cells per assay, each with ~200 copies of the sequence). However, the analyses were run with 10-100-fold this number of cells, and thus not run at or close to the detection limit.
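The conversion from amount of PCR product to number of molecules, and from molecules to copies per cell, can be written as a short Python sketch. It assumes that A is expressed as a mass in nanograms (hence the factor 10⁻⁹ to convert to grams); the function names are hypothetical and the example values are only a consistency check against the numbers quoted in the text.

```python
AVOGADRO = 6.022e23    # molecules per mole, value used in the text
BP_MASS = 660.0        # average molecular weight per base pair (g/mol)


def amplicon_molecules(mass_ng: float, length_bp: int) -> float:
    """Molecules of double-stranded PCR product in `mass_ng` nanograms.

    Implements (A x 6.022e23) / (660 x B), with A converted from ng to g.
    """
    grams = mass_ng * 1e-9
    return grams * AVOGADRO / (BP_MASS * length_bp)


def copies_per_cell(molecules_in_sample: float, cells_in_template: float) -> float:
    """Gene copies per cell, given the molecules measured and the cells assayed."""
    return molecules_in_sample / cells_in_template


# Highest standard in the dilution series: 2 ng of the 161 bp product.
top_standard = amplicon_molecules(2.0, 161)

# Detection-limit example from the text: about 5000 copies from roughly 25 cells.
limit_copies = copies_per_cell(5000, 25)
```

Under these assumptions, the 2 ng standard corresponds to roughly 1.1 × 10¹⁰ molecules, and 5000 copies from about 25 cells corresponds to 200 copies per cell.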

Results
Identification of sxt sequences in the transcriptome of A. minutum and A. fundyense
454 sequencing resulted in 589,410 raw reads for A. minutum and 701,870 raw reads for A. fundyense (SRA028427.1: samples SRS151150.1 and SRS151148.1, respectively). After quality control, the reads were assembled into 44,697 contigs and 539 singletons for A. minutum and 51,861 contigs and 163 singletons for A. fundyense. The contig lengths and GC contents were similar for both libraries: mean sequence lengths (± SD) of 669 bp (±360) and 678 bp (±361) and GC contents of 59% and 58% were calculated for A. minutum and A. fundyense, respectively.
Searching the unassembled 454 cDNA library reads with the cyanobacterial sxtA gene resulted in 94 hits for A. fundyense and 88 hits for A. minutum. The same search on the assembled datasets returned 10 contigs from the A. fundyense and 9 from the A. minutum library. After pooling of all sequences and re-assembly, two contigs showed a high similarity to sxtA from cyanobacteria: one to the domain sxtA1 (contig length = 1450 bp, GC = 60.1%, bit score = 213, e-value = 5e-61) and the other to sxtA4 (contig length = 1059 bp, GC = 65%, bit score = 195, e-value = 1e-47). Both contigs contained sequences from both Alexandrium libraries, but neither contained a full ORF, a dinoflagellate spliced-leader sequence or a polyA-tail. The two contigs were used to design sxtA1 and sxtA4 primers for genomic amplification, RACE analyses and sequencing.
The results of the in silico search for the remaining core sxt genes are summarized in Table 3. Apart from sxtA, contigs with a good alignment score (bit score > 55) and a highly significant e-value (< e-20) were recovered for the amidinotransferase gene sxtG in both libraries. Re-blasting the contigs with the lowest e-values against the NCBI nr protein database showed that the most similar gene was an actinobacterial glycine aminotransferase, while the similarity to sxtG from cyanobacteria was lower but still highly significant (Table 3). For the core biosynthesis genes sxtB, sxtF/M, sxtH/T, sxtI, sxtR and sxtU, contigs with an e-value ≤ 0.1 were recovered from both Alexandrium libraries, while sxtS only had a hit in the A. minutum library (Table 3). No matches were recovered for sxtC, sxtD and sxtE in either of the libraries. SxtC and sxtE are unknown proteins and sxtD is a sterol desaturase-like protein [6]. It is possible that dinoflagellate proteins with no similarity to the cyanobacterial genes carry out their function. Alternatively, these genes were not present in our dataset. While our dataset is comprehensive, it is not complete. For example, some regions of the sxtA transcripts were also not recovered in the 454 dataset, but only obtained through RACE analyses (see above). Re-blasting against the NCBI nr protein database retrieved hits to proteins for sxtB (A. fundyense only), sxtF/M, sxtH/T, sxtI, and sxtU that are similar to those encoded in the corresponding cyanobacterial sxt genes. The actual sequence similarity was less conserved and no significant hits between the Alexandrium contigs and the cyanobacterial sxt genes were observed.

Transcript structure of sxtA in A. fundyense
The RACE experiments resulted in two different sxtA-like transcript families. Both had dinoflagellate spliced-leader sequences at the 5′ end and polyA-tails at the 3′ end, but they differed in sequence, length, and in the number of sxt domains they encode. The shorter transcripts encode the domains sxtA1, sxtA2 and sxtA3, while the longer transcripts encode all four sxtA domains, which are also encoded by the cyanobacterial sxtA gene (Fig. 1).
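Summary statistics of the kind reported for the two assemblies (mean contig length with standard deviation, and GC content) can be computed with a few lines of Python. This is a sketch under stated assumptions: the paper does not say whether the sample or population standard deviation was used or whether GC content was pooled over all bases, so the choices below (sample SD, pooled GC) and the function names are illustrative only.

```python
from statistics import mean, stdev


def gc_percent(seq: str) -> float:
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def summarize_contigs(contigs: list) -> dict:
    """Mean length, sample SD of length, and pooled GC content of the contigs."""
    lengths = [len(c) for c in contigs]
    return {
        "n": len(contigs),
        "mean_len": mean(lengths),
        "sd_len": stdev(lengths),            # sample SD (assumption)
        "gc": gc_percent("".join(contigs)),  # pooled over all bases (assumption)
    }


# Toy contigs, for illustration only.
stats = summarize_contigs(["GGCC", "ATGC", "GCGCAT"])

# Total raw reads across both libraries, from the counts given in the text.
total_reads = 589_410 + 701_870
```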
The consensus sequence of the shorter transcripts was 3136 bp excluding the polyA-tail. Eight clones with SL-leader were sequenced, and three different 5′UTRs were uncovered. The sequences were almost identical; however, one clone had a 15 bp and another had a 19 bp insert exactly following the SL-sequence. The two sequence inserts were, apart from the length, identical.

Table 3 (legend). Given are: the number of contigs with an E-value ≤ 0.1 present in each library; the top blastX hit, its accession number, taxonomy, score and E-value when the top contig is blasted against the non-redundant protein database of NCBI, as well as the closest hit to sxt genes from cyanobacteria from the same analysis.

The GC content of the two Alexandrium sxtA transcripts was consistently higher than that of the cyanobacterial sxtA genes (Fig. 2). The GC contents were 69% (long transcript), 62% (short transcript) and 43% (all cyanobacterial sxtA genes).
All algorithms predicted the presence of signal peptides (SP) and corresponding cleavage sites for both transcripts (Supporting Information S3). However, transmembrane helices that may indicate class I transit peptides in dinoflagellates [51] were not predicted. Neither of the transcripts matched the criteria for class II and class III transit peptides [51].
The Genbank accession numbers are JF343238 for the short and JF343239 for the long sxtA transcripts (majority rule consensus sequences), and JF343357-JF343432 for the remaining cloned RACE sequences of A. fundyense CCMP 1719.

Phylogeny of dinoflagellate sxtA1 and sxtA4 sequences
The sxtA1 and sxtA4 primers designed in this study (Table 2) amplified single bands of ~550 bp (sxtA1) and ~750 bp (sxtA4) length in 18 Alexandrium strains comprising five species and two Gymnodinium catenatum strains, which had a range of toxicities (Table 1). No sxtA1 or sxtA4 PCR products were amplified for five non-STX-producing Alexandrium affine and Alexandrium andersonii strains, nor for non-STX-producing dinoflagellate strains of the genera Gambierdiscus, Ostreopsis, Prorocentrum and Amphidinium (Table 1). These PCR-based results are generally in agreement with the toxin measurements. However, sxtA1 and sxtA4 fragments were amplified from the genomic DNA of four A. tamarense strains (ATCJ33, ATEB01, CCMP1771, ATBB01) in which no STX were detected (Table 1).
The phylogenetic analyses of sxtA1 (Fig. 3; Supporting Information S1) show that all sxtA1 sequences formed one fully supported cluster, divided into two sub-clusters. Some clones of the same strain were identical; however, slightly different clones were observed for most strains (Supporting Information S1). These different clones were distributed throughout the phylogeny, generally without species- or strain-related patterns. Only sequences from G. catenatum formed a tight branch within one of the sub-clusters. The closest relatives to the dinoflagellate cluster were the cyanobacterial sxtA genes and proteobacterial polyketide synthases (Fig. 3; Supporting Information S1).
All sxtA4 sequences formed one well-supported cluster, with clones from the same strain distributed throughout (Fig. 4; Supporting Information S2). The cyanobacterial sxtA genes and actinobacterial aminotransferases formed the closest sister clades.
The Genbank accession numbers for the genomic sxtA1 and sxtA4 fragments are JF343240-JF343356.

Copy number and polymorphisms of sxtA4
Between 100-240 genomic copies of sxtA4 in A. catenella were found in triplicate batch cultures of ACSH02 collected at three time points with different growth rates, based on the qPCR assay (Fig. 4b).
Analysis of a 987 bp contig, which covered the sxtA4 domain and was based on A. fundyense 454 reads, revealed at least 20 single nucleotide polymorphisms (SNPs), 15 of which were silent. SNPs were defined as a base pair change that occurred in at least two of the reads. Homopolymer stretches and indels were ignored.
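The SNP criterion (a base change supported by at least two reads, with indels ignored) can be illustrated with a small Python sketch operating on reads that are already aligned to equal length. The function name and the toy alignment are hypothetical; here a column counts as a SNP when at least two different bases are each supported by `min_reads` reads, which is one possible reading of the definition, and the exclusion of homopolymer stretches is not implemented.

```python
from collections import Counter


def call_snps(aligned_reads: list, min_reads: int = 2) -> list:
    """Return 0-based alignment columns that qualify as SNPs.

    Gap or ambiguity characters are skipped read by read, so indels do not
    contribute to any count.
    """
    snps = []
    n_cols = len(aligned_reads[0])
    for col in range(n_cols):
        counts = Counter(
            read[col].upper() for read in aligned_reads
            if read[col].upper() in ("A", "C", "G", "T")
        )
        supported = [base for base, n in counts.items() if n >= min_reads]
        if len(supported) >= 2:
            snps.append(col)
    return snps


# Toy alignment: column 1 has C (x3) and T (x2); column 5 has a single G,
# which is treated as a possible sequencing error rather than a SNP.
toy_reads = ["ACGTAC", "ACGTAC", "ATGTAC", "ATGTAC", "ACGTAG", "A-GTAC"]
snp_columns = call_snps(toy_reads)
```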

Sxt genes are encoded in dinoflagellate genomes
Until recently, the unusually large (1.5-200 pg DNA cell⁻¹; [52]) and highly divergent genomes of dinoflagellates have hindered attempts to determine the genetic basis of their toxin production. Recent estimates predict that dinoflagellate genomes contain between 38,000 and almost 90,000 protein-encoding genes [53], which corresponds to 1.5-4.5 times the number of genes encoded in the human genome [54]. Advances in sequencing technology have made it possible to efficiently investigate the complex transcriptome of dinoflagellates. The results of sequencing >1.2 million ESTs in this study demonstrate that close homologues of the genes involved in STX biosynthesis in cyanobacteria are also present in STX-producing dinoflagellates (Table 3). To further confirm their dinoflagellate origins, we investigated sxtA, the unique starting gene of the biosynthesis pathway [6]. The transcriptome of A. fundyense CCMP 1719 contained two different transcript families that had the same domain architecture as sxtA in cyanobacteria. The two transcript families varied in length, sequence, and the number of catalytic domains they encode. The longer transcripts contained all four domains present in the known cyanobacterial sxtA genes; however, the shorter transcripts lacked the terminal aminotransferase domain (Fig. 1). In contrast to bacterial transcripts, both transcript families possessed eukaryotic polyA-tails at the 3′ end and dinoflagellate spliced-leader sequences at the 5′ end. Thus, our results clearly show that at least sxtA, and possibly other sxt genes, are encoded in the nuclear genome of dinoflagellates and that STX-synthesis in dinoflagellates does not originate from co-cultured bacteria. As has been proposed, these bacteria may still, however, play an important role in modulating STX biosynthesis in dinoflagellates [22,55].
The signal peptides identified in both transcripts indicate a specific targeting of both Sxt products. Many genes in the nuclear genomes of dinoflagellates are plastid-derived and their products targeted to the plastid (e.g. [51]). These proteins are translated in the cytosol and then transported to the plastid through the plastid membranes. In peridinin-containing dinoflagellates like Alexandrium, this process requires the presence of signal and transfer peptide motifs [51]. Both sxtA transcripts are predicted to contain signal peptides, but transfer-peptide structures were not identified. Thus, it seems that both sxtA proteins are targeted out of the cytosol, but the target region needs to be experimentally investigated.
The dinoflagellate sxtA transcripts differed from their cyanobacterial counterparts not only by the presence of signal peptides, SL sequences and polyA-tails, but also in their GC content. The A. fundyense ESTs had a considerably higher GC content (Fig. 2). Transcribed genes from Alexandrium species have been reported to have an average GC content >56% [56,57,58,59], while filamentous cyanobacteria, such as the STX-producing genera Cylindrospermopsis, Anabaena, Aphanizomenon and Lyngbya, have a genomic GC content around 40% [9,60,61]. This indicates that the GC content of sxtA has diverged significantly from that of the sxtA-possessing ancestor, in line with the rest of the genome in these microorganisms.
Recent analyses of the codon usage patterns of the STX-producing A. tamarense strain CCMP 1598 suggest that mutational bias, translational selection, hydropathy and aromaticity influence the selection of codon use in this species; however, codon usage also differs between high and low level expressed genes [59]. The involvement of the two different sxtA transcripts and their role in STX-synthesis is presently unclear, but the differences in GC content (Fig. 2) indicate that they are under different selection pressures.

The non-identical copies of sxtA: variation at the genome and transcriptome level
One typical feature of dinoflagellate genomes is that genes may occur in multiple copies, which may or may not be identical [27,28]. This is possibly related to highly unusual genetic mechanisms such as the recycling of processed cDNAs [62]. It appears that sxtA also occurs in multiple copies within dinoflagellate genomes. We estimated that 100-240 copies of the sxtA4 domain were present in the genomic DNA of A. catenella ACSH02 (temperate Asian ribotype). The copy number differences detected throughout the cell cycle are likely related to the growth rate of the batch culture and the proportion of cells in various cell cycle phases. All genomic sxtA4 sequences from 15 different Alexandrium and one G. catenatum strains formed one well-supported phylogenetic cluster, with several slightly different clone sequences of the same strain distributed throughout the tree. SxtA1 was also found to occur in multiple, non-identical copies in all strains analysed (Supporting Information S1). Further, the separation of the dinoflagellate sxtA1 cluster into two sub-clades indicates that sxtA1 may be encoded by two separate gene classes, at least in some strains.
The genomic variation of sxtA is also present in the Alexandrium transcriptomes. Adding the transcriptome data to the sxtA1 tree showed that the upper clade corresponds to the longer sxtA transcripts, whereas the lower clade corresponds to the shorter transcripts (Fig. 3, Supporting Information S1). Analyses at the nucleotide level of the sxtA4 region in the transcriptome of A. fundyense revealed many SNP sites, two-thirds of which were silent. This is in line with results of other EST studies of dinoflagellate species, showing that gene families can comprise members with similar but non-identical sequences [28,56]. Results from previous studies also indicate that much of the variation observed at the nucleotide level does not translate into variation in peptide structure [63].

Correlation between sxtA1, sxtA4 and saxitoxin production
The putative sxtA1 and sxtA4 genomic sequences identified during this study were present in all STX-producing dinoflagellate strains analysed, including two G. catenatum and 14 Alexandrium strains of the species A. catenella, A. minutum, A. fundyense and A. tamarense. Neither of the two sxt fragments was amplified from two A. andersonii and three A. affine strains. Homologs were also not detected in Gambierdiscus australes, Amphidinium massartii, Prorocentrum lima, Ostreopsis siamensis and Ostreopsis ovata, none of which are known to produce STX (Table 1).
Despite the very good correlation between the presence of sxtA1 and sxtA4 and STX content for most of the strains analysed, these fragments are not unambiguous markers for toxicity (Table 1). Both fragments were also amplified from A. tamarense strains for which no STX-production was detected (Table 1). Furthermore, RACE analyses of A. tamarense strain CCMP1771 revealed that sxtA1 and sxtA4 were transcribed in this supposedly non-STX-producing strain (data not shown).
Several scenarios may explain the discrepancy between the presence of sxtA1, sxtA4 and toxin production: 1) other genes of the STX pathway are missing in these strains, 2) post-transcriptional mechanisms differ between STX-producing and non-producing strains, or 3) the amount of STX these strains produce is lower than the detection limit of the HPLC/MS toxin determination methods used. Scenarios 1) and 2) can only be investigated when all core genes of the STX pathway have been fully characterized in STX-producing species. Scenario 3) might be a possible explanation in some cases, since a very sensitive saxiphilin assay used to investigate A. tamarense strain ATBB01 found it to be toxic, whereas the HPLC methods used in the same study [64], as well as toxin assays in the present study, did not detect STX in the same strain (Table 1).
Transcript abundance has been suggested to be positively related to the number of gene copies present in a dinoflagellate genome [28]. Hence, it is possible that strains with low levels of STX have fewer copies of the sxt genes compared to those with greater STX-production. If this holds true, then the presence of sxtA1 and sxtA4 would indicate toxicity and molecular methods could be developed to detect STX-producing cells in the environment.

Evolution of STX-synthesis in eukaryotes and its role in the diversification of Alexandrium
The cyanobacterial sxt genes are highly conserved between cyanobacteria species and the gene cluster is thought to have arisen at least 2100 million years ago [12].Our results show that dinoflagellate sxtA transcripts that are phylogenetically closely related to a clade of the cyanobacteria sxtA sequences and other bacterial putative toxin-related genes (Fig. 3 & Fig. 4) also have the same domain structure as cyanobacterial sxtA genes (Fig. 1).We propose that this striking similarity is most likely due to a horizontal gene transfer (HGT) event between ancestral STXproducing bacteria and dinoflagellates.Within dinoflagellates, STX are produced by species of the genera Alexandrium and Pyrodinium, which belong to the family Gonyaulacaceae within the order Gonyaulacales, as well as by one species of the genus Gymnodinium, which belongs to the family Gymnodiniaceae in the order Gymnodiniales.Thus, these toxins are produced by two genera within one family and by a single species from a distant dinoflagellate order.This distribution of STX-synthesis within the dinoflagellates as well as the close relationship between Alexandrium and Gymnodinium catenatum sxtA sequences (Fig. 3, 4; Supporting Information S1, S2), suggests that the bacteria-to-dinoflagellate HGT likely took place prior to the origin of the genera Alexandrium and Pyrodinium, and was followed by a dinoflagellate-to-dinoflagellate transfer into G. catenatum.The extent of eukaryote-toeukaryote HGTs is often underestimated due to difficulties in detecting such events, however, recent work highlights the importance and prevalence of such gene transfers [65,66].
We were not able to resolve the relationship among the dinoflagellate sxtA sequences in this study, as most of the internal nodes were not statistically supported (Supporting Information S1, S2). Therefore, it was not possible to determine with certainty whether the evolution of the sxtA genes mirrors that of the genus Alexandrium, or to determine the origins of a putative HGT from Alexandrium into G. catenatum. However, the sxtA1 and sxtA4 gene copies from multiple strains of G. catenatum, A. minutum, and A. catenella tended to cluster by species, indicating that their history reflects the evolution of these species. The non-amplification of sxtA1 and sxtA4 from the non-STX-producing species A. affine and A. andersoni may indicate that the sxtA genes have either been lost from these lineages or have diverged so much that the primers developed here were not able to amplify them.
Our two Alexandrium EST datasets contained transcripts that encoded homologs of the majority of the core sxt genes identified from cyanobacteria (Table 3). Even though the similarity to the cyanobacterial sxt genes was often significant, it was much lower than that observed for sxtA. The closest hits were to other bacterial or eukaryotic genes present in the database. This indicates that different genes in the sxt pathway may have separate origins in dinoflagellates. Further work is required to elucidate the complex origins of this gene cluster and will lead to further advances regarding the genomes and molecular biology of these ancient and important microorganisms.
doi:10.1371/journal.pone.0020096.t003

[...] were sequenced were almost identical and the polyA-tail started at the same position in each clone. The domain structure of this shorter sxtA transcript was as follows: Amino acid residues 1-27 encode a signal peptide. Residues 28-531 correspond to sxtA1, which contains three conserved motifs (I: VDTGCGDGSL, II: VDASRTLHVR, III: LEVSFGLCVL). Residues 535-729 correspond to sxtA2 with the catalytic domains 557-W, 648-T, 663-H, 711-R; while sxtA3, the final domain of the short transcript, corresponds to residues 750-822 with the phosphopantetheinyl attachment site 783-DSL-785.

The consensus sequence of the longer sxtA transcript was 4613 bp (majority rule, longest 3′UTR, without polyA-tail, Fig. 1). Five clones with SL-sequences were characterized. One of those had a slightly divergent SL-sequence with an A at position 15 instead of the usual G. All 5′UTRs were 97 bp long (excluding SL sequence) and almost identical in sequence. Each of the four 3′ clones sequenced had a different length (342, 407, 446 and 492 bp). The domain structure of the longer sxtA transcript was as follows: Amino acid residues 1-25 encode a signal peptide. Residues 26-530 correspond to domain sxtA1 with the three conserved motifs: I: VVDTGCGDG, II: VDPSRSLHV and III: LQGSFGLCML; residues 535-724 correspond to domain sxtA2, with the catalytic residues 556-W, 661-T, 693-H, 708-R; sxtA3 corresponds to the residues 763-539 where 799-DSL-801 is the phosphopantetheinyl attachment site; finally, domain sxtA4 corresponds to residues 894-1272.

Figure 1. The structure of sxtA in dinoflagellates and cyanobacteria. a) Transcript structure of sxtA transcripts in A. fundyense CCMP1719. b) Genomic sxtA structure of C. raciborskii T3. c) Structure of STX with bonds and molecules introduced by sxtA marked in bold. doi:10.1371/journal.pone.0020096.g001

Figure 2. GC content of A. fundyense sxtA transcripts and of cyanobacterial sxtA genes. GC content was calculated every 10 bp with a window size of 1000 bp. doi:10.1371/journal.pone.0020096.g002
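
The windowed calculation described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes a plain nucleotide string as input, a window of 1000 bp and a step of 10 bp as stated in the caption, and that ambiguous bases count toward the window length. The function name `sliding_gc` and its parameters are hypothetical; this is not the script used in the study.

```python
def sliding_gc(seq, window=1000, step=10):
    """Return a list of (start, gc_fraction) tuples for windows along seq.

    Hypothetical helper: window and step default to the values given in
    the Figure 2 caption (1000 bp window, evaluated every 10 bp).
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    results = []
    # Only full-length windows are evaluated; a sequence shorter than
    # the window yields an empty list.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        # Assumption: ambiguous bases (e.g. N) stay in the denominator.
        results.append((start, gc / window))
    return results
```

Plotting the returned fractions against the window start positions for a dinoflagellate transcript and a cyanobacterial gene would give curves comparable in form to those in Figure 2.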

Figure 3. SxtA1 phylogenetic tree. Schematic representation, drawn to scale (for full tree see Supporting Information S1). Maximum likelihood topology is shown. Numbers on nodes represent bootstrap values of maximum likelihood and Bayesian analyses, respectively. doi:10.1371/journal.pone.0020096.g003

Figure 4. SxtA4 phylogenetic tree and genomic copy number. a) Schematic representation of phylogenetic tree, drawn to scale (for full tree see Supporting Information S2). Maximum likelihood topology is shown. Numbers on nodes represent bootstrap values of maximum likelihood and Bayesian analyses, respectively. b) Genomic copy number of sxtA4 in A. catenella ACSH02 at three different time-points during the growth cycle. doi:10.1371/journal.pone.0020096.g004

Table 1. List of dinoflagellate strains used in this study, their production of STX and whether sxtA1 and sxtA4 fragments were amplified from their genomic DNA.

Table 2. Primers used in PCR and sequencing.

Table 3. Blast analyses of the core sxt genes from C. raciborskii T3 against the assembled A. fundyense and A. minutum 454 libraries.