Condition-specific RNA editing in the coral symbiont Symbiodinium microadriaticum

RNA editing is a rare post-transcriptional event that provides cells with an additional level of gene expression regulation. It has been implicated in various processes including adaptation, viral defence and RNA interference; however, its potential role as a mechanism in acclimatization has just recently been recognised. Here, we show that RNA editing occurs in 1.6% of all nuclear-encoded genes of Symbiodinium microadriaticum, a dinoflagellate symbiont of reef-building corals. All base-substitution edit types were present, and statistically significant motifs were associated with three edit types. Strikingly, a subset of genes exhibited condition-specific editing patterns in response to different stressors that resulted in significant increases of non-synonymous changes. We posit that this previously unrecognised mechanism extends this organism’s capability to respond to stress beyond what is encoded by the genome. This in turn may provide further acclimatization capacity to these organisms, and by extension, their coral hosts.


Author summary
RNA editing is a mechanism that alters the nucleotide sequence of transcripts; edited transcripts are thus different from what is encoded in the genome. If the edits occur in coding regions, non-synonymous changes will generate additional protein diversity. Environmental stress is one of the many triggers for RNA editing, and evidence for its role in stress response is emerging. Here, we identified nuclear-encoded genes undergoing RNA editing in S. microadriaticum, a dinoflagellate symbiont of reef-building corals. When subjected to several common environmental stressors, the editing frequencies of some genes shifted in a consistent pattern that was specific to the applied stress. Within these genes, the predicted edits were predominantly non-synonymous in nature, suggesting that RNA editing is introducing genotypic variability beyond the organism's haploid genome, extending its capability to respond to stress. Our data suggest that differential RNA editing is a process that contributes to the capacity of these algal symbionts to acclimate to environmental stress. Introduction RNA editing is a collection of co-or post-transcriptional processes that produce RNA sequences that differ from their DNA templates (excluding mRNA splicing, capping and polyadenylation). These processes provide organisms with an additional layer of post-transcriptional control, and typically manifest as the production of tissue-specific protein isoforms [1], play a role in caste determination [2], or to adapt to a new environment [3]. In contrast to these well-characterised editing events, a potential role for RNA editing in response to shortterm environmental change is just emerging. For instance, studies in human monocytes show specific induction of C-to-U edits in the human SDHB gene in response to hypoxic conditions [4]; in Drosophila, A-to-I edit frequencies have been shown to respond to heat stress [5]. However, for the latter, the changes were tightly linked to transcriptional silencing of the sole RNA editing enzyme encoded in the genome. Dinoflagellates are a highly diverse group of protists that thrive in both freshwater and marine environments. As one of the major primary producers in the world's oceans, they are an integral part of the marine food web [6]. Dinoflagellates of the genus Symbiodinium are best known for being a key symbiont in many marine invertebrates, including reef-building corals [7]. As such, they also contribute to the acclimatization potential of their host [8,9]. In dinoflagellates, RNA editing has been described in organelles since the turn of this century. The combined work of [10][11][12][13][14][15] established the presence of base-substitution edits in mitochondrial cox1, cob and cox3 genes; similarly, other studies [16-20, 75, 76] showed that numerous plastid-encoded genes are also subject to RNA editing. Evidence for editing on nuclear genes is, however, lacking.

Widespread nuclear RNA editing in the S. microadriaticum transcriptome
We generated 16 RNA-seq libraries from algal cultures subjected to 4 different conditions: 3 bleaching-relevant stressors (cold, heat and dark) and a control condition. A total of 313 million paired-end reads were mapped to the S. microadriaticum nuclear genome in order to detect RNA editing in the transcriptome (S1 Table). As S. microadriaticum is a haploid organism [21], we focused on identifying genomic positions with unexpected base compositions in the RNA-seq data that significantly departed from the expected reference base. We applied five stringent filters to reduce false positives. Briefly, we removed edits that were deemed more likely to be sequencing errors, and further required the edit frequency to be between 5% and 95% (edit frequency of 0% = not edited; 100% = completely edited). We then removed edits located in sequence contexts that were not unique, in the vicinity of splice junctions, or on ambiguous genomic loci that had had DNA-seq reads with non-reference bases mapped to them (for details, see Materials and Methods). This analysis identified 3,304 significantly edited sites (median coverage of 69×) in the S. microadriaticum nuclear genome (S2 Table). Out of the 3,304 sites, 2,547 (77.1%) occurred in 774 gene models (1.6% of all 49,109 gene models) enriched for genes that regulate metabolism, development and transport (S3 Table). The distribution of edits in genes was uneven: the top 15 genes accounted for 979 edits (Fig 1A), while the mode and median number of edits per gene were both 1. Taken together, these edited nuclear-encoded genes had 1 edited base per 1,000 bases, roughly 1-2 orders lower than that in dinoflagellate organelles (i.e. 1 edited base per 10-100 bases) [10,11,16,17,19]. We also identified 757 edits (22.9%) that resided in intergenic regions, potentially located within long non-coding RNAs or repetitive elements as shown recently [22,23], or unannotated genic regions.

All RNA editing types are present
We observed all combinations of N-to-N editing in the nuclear genome of S. microadriaticum (where N is any nucleotide base), unlike the specific A-to-I [24] and C-to-U [25] RNA edits prevalent in metazoans, or the expanded repertoire of changes observed in dinoflagellate organelles [10-14, 16-20, 75, 76]. Among the 12 possible RNA editing types, we observed more transitions than transversions (Fig 1B). This observation is similar to previous reports from dinoflagellate organelles (S4 Table), plant organelles [26], as well as non-A-to-G editing in humans [27]. This observation might be due to the conservation of RNA editing machineries across Eukarya. The distributions of edit frequencies within each edit type were fairly even, and the proportion of highly edited sites (edit frequency > 80%) in every edit type were similar (18.9% ± 3.6%, 1 SE) ( Fig 1B).

Distribution of RNA edits within genes is non-random
To further understand the spatial distribution of RNA edits, we focused on the 2,547 genic RNA edits to find out whether they exhibited positional tendencies. After normalising for the length of the edited genes, we found significantly more edits located at the 5' end (9.7% of all edits are in the initial 5%, p < 10 −29 , binomial test) (Fig 2A, red bars and line). This tendency was a product of preferential editing close to the 5' end of genes, since individual exons and introns displayed an even spread of editing events (Fig 2A, red vs. blue and green respectively). We also discovered that the edited sites clustered significantly (p < 10 −300 , χ 2 test), i.e. they tended to be closer to each other than expected by chance ( Fig 2B)-a similar trend is present in dinoflagellate organelles, and also in humans, as the bulk of ADAR-driven A-to-I edits are known to cluster tightly (termed "hyper-editing") [28].

Sequence motifs predict editing types
By utilising MEME [29], a motif identifier, we discovered several sequence motifs strongly associated with specific nucleotide conversions (S5 Table and S1 Dataset). In particular, we found that C-to-T, G-to-C and G-to-A editing sites harboured specific sequence motifs within a window of ±100 bp around the edited sites. Of note were motifs associated with C-to-T edits, which represented the best-scoring motifs derived from genic and exonic contexts with highly statistical significance (odds ratio > 15, p-value of < 10 −144 ). These motifs were present in 41% and 73% of all C-to-T edits in genes and exons respectively (S1 Fig). These sequences were, however, different from sequence contexts around C-to-T edits in plant organelles [30].

Condition-specific differential RNA editing
To test the intriguing possibility that S. microadriaticum uses condition-specific differential RNA editing as a mechanism of environmental acclimatization, RNA editomes from cultures subjected to three acute stressors (cold, heat and dark; illustrated as "16C", "36C" and "DS" in S2 Fig) were compared against cultures grown under regular conditions ("control"). To increase statistical power, we selected genes with at least 2 edits present over all 4 replicates in each condition for the assessment via Generalised Linear Models (GLMs) (see Materials and Methods). Each bar was further subdivided into equalwidth quintiles based on editing frequency. Lighter hues correspond to less frequently edited sites; darker hues to more frequently edited sites. For (B), sites with two or more editing events are lumped together in the "multiple" category. https://doi.org/10.1371/journal.pgen.1006619.g001 Out of 229 genes that passed our stringent filters, we identified 114 genes (50%) that are differentially edited in at least 1 stress condition relative to the control treatment (p < 0.05, S6 Table). This indicates that a large fraction of edited genes respond to different stressors by changing their editing patterns in a condition-specific manner. Enrichment analysis of these 114 genes indicates that proliferation-and stress-related proteins are prominent members among the differentially edited genes, supporting a function in stress response (S3 Table). Interestingly, 47 of the differentially edited genes responded to multiple stressors. Most genes (73 genes) were differentially edited under heat stress, and fewer genes were differentially edited under cold stress and dark stress (49 genes and 53 genes respectively, S6 Table).
At this juncture, we assessed the extent of two confounding factors of the observed changes in editing frequencies. Firstly, the editing machinery itself might be responding to the stressors, as exemplified by a study of five drosophilid species [5]. We contrasted the changes in editing frequencies under each stressor against the "control" condition for all sites (lighter bars, S3  Table). This observation, we argue, is unlikely to arise from a general shutdown (or activation) of the editing machinery. To obtain further support, we analysed Symbiodinium transcriptomes for candidate homologues of known proteins involved in RNA editing in metazoans and plants. We identified two PPR-like (pentatricopeptide repeat) [31] and three ADAR-like (adenosine deaminases acting on RNAs) [32] candidates in the S. microadriaticum transcriptome [33]. Similarly, these candidates were also present in the S. minutum transcriptome [34], suggesting that these candidates are conserved on a genus-wide basis (Fig 3, S5 Fig and S6 Fig). By passing the same RNA-seq reads through a differential expression pipeline, we were unable to detect differential expression of any of the five candidate RNA editing genes under any stressor (S8 Table). Condition-specific RNA editing in a dinoflagellate The second confounding factor arises from the differential stability of edited transcripts (relative to their unedited versions), exemplified by a study that reported the increased stability of ADAR-edited transcripts in humans [35]. Increases in editing frequency might thus be a side-effect of edited transcripts having a longer half-life in the cytosol. The current dearth of molecular tools in S. microadriaticum prevents us from replicating the same verification process used in that study. However, our differential expression analysis shows that the overall proportion of differentially expressed genes amongst all genes (2.4%) is neither significantly different from that of all edited genes (1.7%, Fisher's exact p = 0.23) nor differentially edited genes (1.8%, Fisher's exact p = 1) (S8 Table). This suggests that the differential editing of transcripts is largely independent of the differential stabilities of the edited transcripts, as the expression levels of edited genes are largely unchanged under stress.
In line with the idea that RNA editing plays a role in stress acclimatization, we found that differentially edited genes also displayed significantly more edits resulting in non-synonymous changes (p < 10 −11 ), and significantly less edits in intronic regions (p < 10 −12 ) when compared to all edited genes ( Table 1). Within coding regions of differentially edited genes, non-synonymous changes occurred at higher frequencies than synonymous changes (Fig 4) providing further indication that differentially edited genes have a propensity for edits that lead to changes in amino acid sequence.
For non-synonymous edits, the most prevalent changes tended to be from a non-polar side chain to another one (e.g. there were 62 arginine-to-valine and 48 proline-to-leucine). However, we also found many other changes that are likely to alter the structure of the protein. For example, there were 45 instances of the non-polar proline substituted by the polar uncharged serine; and 10 instances of the basic lysine altered to the acidic glutamate ( Fig 5).
To unequivocally confirm that transcripts are edited in a condition-specific manner, we designed PCR primers targeting five genes that were identified as differentially edited under stress. The PCR was performed on cDNA samples and genomic DNA to exclude the presence of un-sequenced genomic regions with similar sequences (n = 154 electropherograms, S3 Dataset). Sanger sequencing results of these amplicons confirmed the condition-specific changes of RNA editing in those genes: 239 of 375 (63.7%, binomial p < 10 −6 ) edited sites had PCR results tallying with the directional shift in editing under stress as observed in the RNAseq results ( Table). Almost all (97.6%) of the primary peaks detected in the PCR traces corresponded to either the genomic base or the edited base-the former Table 1. Distribution of RNA edits in condition-specific and all differentially edited genes. Intergenic edits are not considered; sites with multiple editing events are considered as separate effects. Category labels are as reported by SnpEff, and p-values were calculated using Fisher's exact test adjusted for multiple testing (Benjamini-Hochberg).

Condition-specific differentially edited genes
All genic edits Condition-specific RNA editing in a dinoflagellate occurring when edit frequencies were lower; the latter occurring when the edit frequencies were higher.

Discussion
In this work, we are the first to report RNA editing in genome-encoded transcripts of a dinoflagellate, which supplements similar observations made for dinoflagellate organellar genomes. Unlike metazoans and plants, where the vast majority of edits are A-to-I and C-to-U respectively, there is a greater variation of edit types in dinoflagellates. Previous work in dinoflagellate organelles reported all edit types are possible; similarly, in our data, we could observe all N-to-N edit types. These edit types were not distributed uniformly: the most frequently edited type (C-to-T) was approximately 20-fold more common than the rarest edit type (T-to-G). This is somewhat similar to previous reports from dinoflagellate organelles, where C-to-T, A-to-G and T-to-C edits were among the most frequent; however, exceptions such as A-to-T (19% in S. microadriaticum genome, 0-3% in organelles) were evident as well. While the presence of all edit types and overall spread of edit type frequencies suggest that the editing machinery might be shared between the host and its organelles, the exceptions would indicate that the latter utilises a subset of the RNA editing machinery available, or perhaps the more restricted sequence contexts in these organelles strongly favour certain edits and disfavour others. A confounding factor to the observed condition-specific RNA editing was the dysregulation of the editing machinery, as observed in Drosophila [5]. To assess that possibility, we performed in silico identification of candidate S. microadriaticum RNA editing proteins (PPR-like and ADAR-like) via sequence similarity. While these candidates appear contentious, especially since ADARs are thought to be restricted to metazoans [32], further in vivo verification of these candidates is stymied by the dearth of reliable molecular techniques, e.g. transformation and gene knockdowns in dinoflagellates. Regardless, two observations indirectly suggest that some of these deaminases retained similar functionality in S. microadriaticum. Firstly, edited sites in nuclear-encoded genes of S. microadriaticum appear more clustered than expected, similar to edits in dinoflagellate organelles [36] and A-to-I edits in humans [28]. Secondly, sequence motifs have been reliably discerned in the vicinity of C-to-T edits. These motifs are, however, unlike any that have been identified in metazoans [37][38][39] or plants [30,40], perhaps due to the evolutionary distance of dinoflagellates from metazoans and plants [41]. Condition-specific RNA editing in a dinoflagellate Based on our data, we argue that Symbiodinium is able to shift its RNA editome in response to environmental stress. The 114 genes that we identified in S. microadriaticum were differentially edited in response to at least one of three bleaching-relevant stressors: cold, heat or dark. This subset of genes had significantly more non-synonymous amino acid changes, and our observations are in line with the non-synonymous change in a K + channel of the octopus Pareledone sp. that allows it to adapt to colder waters [3]. The prevalence of non-synonymous edits has similarly been demonstrated in other organisms such as Drosophila [22] and the squid Doryteuthis pealeii [42]. For non-synonymous mutations, larger structural or functional changes are possible through the introduction of amino acids with very different properties. This translational flexibility may represent a previously unrecognised mechanism of acclimatization that provides additional phenotypic plasticity to the organism, and thereby, increases its ability to respond to environmental change in comparison to the genome-encoded gene product [43].
As more research is done on dinoflagellate biology, it becomes more and more apparent that dinoflagellates do not seem to play by the same rules as metazoans or plants. Few genes are commonly observed to respond to stress with a change in expression [44][45][46], possibly linked to the relative paucity of proteins harbouring transcription factor domains [47]. Rather, expression profiles seem to be fixed between different clades, species, or even strains within species, irrespective of physiological conditions [45,48]. Our data concurred with these observations: roughly 2% of all genes are differentially expressed under any of the three stressors, and this proportion is independent of editing state. Condition-specific RNA editing in a dinoflagellate How, then, does the organism respond to stress? One possible answer to this conundrum is the use of post-transcriptional control mechanisms, previously illustrated through the identification of an RNAi machinery and a diverse set of miRNAs in two Symbiodinium species [44,49]. We suggest that RNA editing is an additional genome-wide, post-transcriptional mechanism that modulates gene expression through shifts in the RNA editome.
What, then, would be the biological purpose of RNA editing in Symbiodinium? We postulate that RNA editing serves a dual function-firstly, it serves to regulate gene expression on a genome-wide level, as evidenced by a substantial number of S. microadriaticum genes (774 genes, 1.6% of all genes) being edited. Within genes, edited sites have a propensity to be at the 5' end of genes. Furthermore, we find that differentially edited genes that respond to environmentally relevant stressors show higher frequencies of non-synonymous changes. These observations might be tightly linked to the biological functions of these edits. Synonymous changes might affect gene expression via codon bias [50]; non-synonymous changes at the N-terminus signal peptide region will disrupt protein localisation, while changes in structural regions could affect the stability of the peptide. It is also likely that these edits exert larger effects when they are closer to the 5' end of the gene, and our data substantiates this postulate: the initial 5% of the gene contains 126 of 859 (14.2%) non-synonymous changes; comparatively, 26 of 263 (11.2%) synonymous edits are in the same region. Secondly, RNA editing putatively provides a low evolutionary cost system that introduces additional genetic variation above the coding capacity of the genome on which selection can act without the risk of generating lethal variations. This is especially important as Symbiodinium is haploid in the vegetative state [21], thus deleterious mutations are more likely to be costly.
In conclusion, it is a challenging prospect to identify true edits in an organism with a draft genome and unclear mechanistic origins of these edits. However, our conservative comparative transcriptomics of cultures under bleaching-relevant stressors have revealed bona fide RNA edits in the nuclear genome of S. microadriaticum. Our work highlights a novel role for RNA editing in providing a mechanism for dinoflagellates-and, by extension, their coral hosts-to acclimatize, and potentially adapt to, changing environments.

Methods
Growth conditions and experimental treatments of S. microadriaticum 12 independent S. microadriaticum cultures were grown in 200 ml of f/2 media (Sigma-Aldrich, St. Louis, MA) for 2 weeks in non-treated Nunclon Δ TripleFlasks (Thermo Scientific, Waltham, MA) at 26˚C. The cultures were illuminated at a constant irradiance of 250 μmol photons m -2 Ás -1 on a 12h:12h day:night light-cycle (daytime of 6 am-6 pm). Prior to the day where cells were collected, three bottles were wrapped in foil ("DS": dark stress for 6 hours); three bottles were placed in an incubator programmed to ramp up to 36˚C at 8 am ("36C": extreme heat stress for 4 hours); and another three were placed in an incubator programmed to cool down to 16˚C at 8 am ("16C": extreme cold stress for 4 hours). The remaining three bottles, labelled as "control", were not subjected to additional stresses (S2 Fig). Cells were pelleted in 50 ml Falcon tubes at 500g, decanted, and washed once by resuspending it with autoclaved, filtered seawater, and re-pelleted at 1,000g. The drip-dried pellet was snap-frozen in liquid nitrogen in preparation for RNA extraction.

mRNA sequencing and transcriptome
Total RNA from S. microadriaticum was extracted by grinding the cell pellets with 200-300 μl of 0.5 mm glass beads (BioSpec Products, Bartlesville, OK) in liquid nitrogen-chilled mortar and pestle prior to using the RNeasy Plant Mini Kit (Qiagen, Hilden, Germany). The quality of extracted total RNA was assessed using a Bioanalyzer 2100 (Agilent, Santa Clara, CA). Strandspecific RNA-seq libraries were generated from oligo-dT selected total RNA using the NEB-Next Ultra Directional RNA Prep Kit (New England Biolabs, Ipswich, MA) according to manufacturer's instructions. A total of 397 million paired-end read pairs (read length: 101 bp, insert size: 180 bp) were retrieved from 4 lanes on the HiSeq 2000 platform (Illumina, San Diego, CA).

Identification of significantly edited sites
The pipeline used to identify edited sites is summarised in S8 Fig. Prior to mapping, sequence adapters and low quality ends were trimmed using trimmomatic [51], resulting in 256 million reads. The first and last 6 bp of each read were trimmed to remove potential primer bias during library generation. Mapping was carried out with Bowtie2 [52] ("-fr -mp 1,1 -score-min L,0, -0.08 -end-to-end -very-sensitive"). We refrained from using a gapped-read mapper that maps across introns, as it led to many reads preferring to have artificially long "introns" to better match the reference genome downstream, potentially removing true edit locations if the read matched contiguously. The mapping efficiency is lower with a contiguous read mapper, but we are more confident that mapped reads have had the correct provenance assigned. Postmapping, reads were de-duplicated on a per-replicate basis, to avoid PCR amplification biases influencing editing frequencies. Significantly edited sites were identified across all stress conditions by combining the post-processed FASTQ sequences from this experiment (three biological replicates from four treatments) and another previously published dataset [44] (one biological replicate, identical set of four treatments) into one file. The merged file was analysed using REDItools [53] with several non-default parameters (-T 9-9 -e -W) in order to avoid potential biases in edit calling [54][55][56][57][58].
Previously, sequencing errors have been shown to be concentrated on the ends of the sequenced reads [56]. To reduce potential biases from sequencing errors, the first and last 9 bases were trimmed of each read from further processing using the "-T 9-9" parameter in REDItools (see S9 Fig for error rates from both ends). Non-uniquely mapped reads ("-e") were excluded to prevent reads mapping to paralogs; substitutions in homopolymeric regions ("-W") were also removed as it is hard to distinguish between an insertion (resulting from technical artefacts) and a substitution (possible bona fide editing) in these regions.
REDItools reports the number of reads that are matches and mismatches to the genomic sequence for each transcribed genomic position, and implements an algorithm to calculate the p-value of each site being significantly edited. However, we noticed that the resulting p-values clustered around certain values (0.25, 0.5, 0.75, 1) instead of exhibiting a continuous distribution. We therefore decided to calculate the p-values based on the statistical principles outlined in [2]. Briefly, we modelled the distribution of editing errors as a binomial distribution B(n, p), where n = total coverage of that site in question and p = 0.01 (corresponding to a Phred score of 20). We opted to use the maximum possible sequencing error rate instead of the average error rate, as we preferred to be more conservative in our cutoffs, trading sensitivity for accuracy. Thus, the p-value for a site with k edits in a total coverage of n reads was calculated as the probability of observing at least k edited reads in that site. The resulting p-values were corrected for multiple testing [59,60], and sites with post-corrected p < 0.01 were considered to be significant.
Post-multiple testing correction, we introduced five additional filtering criteria to address other potential biases. Firstly, sites that were edited below 5% and above 95% were removed. The former removes lowly edited sites that would be of less interest, while the latter guards against mis-sequenced bases in the reference genome (and also acts as a 5% filter even if the genomic position was not sequenced correctly). A minority of sites (5%) were reported to be edited to multiple bases. In order to separate bona fide multiple edited sites from those that might arise from sequencing errors, we required sites with multiple edits to have ! 20% overall editing frequency and ! 2 reads supporting each of the particular editing event.
Secondly, we observed that spliced reads do map contiguously to another locus with less sequence similarity, as that locus lacks the intron spanned by the exons present in the spliced read (the same read is unable to map across that intron in the actual locus), giving rise to edited sites that are false positive. To remove these, a window of 201 bp (100 bp upstream and downstream) centred on each edited site was subjected to a BLASTN search against the genome. If the window matched another locus (e-value of 10 −5 ), the site was removed from further consideration. We believe that this step is crucial for dinoflagellate genomes, as these genomes tend to contain many tandem repeats [61][62][63].
Thirdly, edited sites that were overrepresented on reads in a particular direction (> 85%) were removed-these sites mostly arose from spliced reads mapping into intronic regions in a unidirectional manner [54].
Furthermore, edited sites that fell into the first and last 9 bp of annotated introns were removed. This filter stems from the observation that the first few bases of introns tend to match the 5' end of the next downstream exon, while the last few bases of introns closely match the 3' end of the immediately upstream exon [61], producing false edited sites when spliced reads map into introns and the corresponding ends stop being similar.
Lastly, to further discount the possibility of edits resulting from SNPs or sequencing mistakes in the genome, eight paired-end DNA-seq libraries from S. microadriaticum (~823 million read pairs) were mapped against the genome with mapping parameters that mirrored that of the RNA-seq reads. We then opted to be extremely stringent by only permitting unambiguous genomic loci past the filter.
We noticed that REDItools reported RNA editing events with the implicit assumption that the gene models were on the Watson strand of the genome. Based on the gene models for the S. microadriaticum genome [33], edits that occurred in genes on the Crick strand were reversecomplemented via a script; edits that occur in genes on the Watson strand or intergenic regions were left as-is.

Clustering of RNA editing positions
We observed that the edited sites tended to be located in clusters instead of being randomly distributed within the gene. To support to this observation, for every gene that contained at least two significantly edited positions, we recorded the average distance of each edit to its immediate neighbours (upstream and downstream) in the same gene. If the edit was the most upstream or most downstream one in the gene, the distance to the nearest edit was recorded instead.
We contrasted this observed distribution to a null distribution of distances, which was produced by shuffling the positions of the edits randomly within the same gene without changing the number of edits in the gene. The Monte Carlo shuffling was repeated 10,000 times and subsequently averaged out to obtain an "expected distribution" of neighbouring distances.
A χ 2 test was performed on the observed and expected distributions to ascertain whether the distributional differences were statistically significant.

Identification of RNA editing motifs
In order to discover motifs that are strongly associated with RNA editing in S. microadriaticum, we generated training sets for the motif based analysis tool MEME [29]. While the RNA editing machinery has been well-described in plants and metazoans [31,64], these mechanisms do not explain many of the other edit types that we observe in S. microadriaticum. As the editing machinery is unknown to us at present, we ran motif searches on genic sequences (to mimic co-transcriptional editing) or exonic sequences (to mimic post-transcriptional editing).
Briefly, for each possible edit type (e.g. A-to-G, A-to-C, etc.), we extracted sequence contexts of 100 bp upstream and downstream of positions with the specific edit type (i.e. each window is 201 bp). Half of these sequences were randomly chosen as the training set while the other half served as the test set. In order to achieve a cleaner signal from the training set, sequence contexts containing more than one edit were moved to the test set. Potential motifs governing the specificity of the respective editing types were identified with MEME ("-dna -mod zoops -minw 6 -maxw 50 -nmotifs 6 -minsites 50") with the default e-value cutoff of 0.05. Using MAST, true positives were calculated from checking for the presence of these motifs in the test set (p < 10 −4 ); false negatives were obtained from testing against 10,000 randomly retrieved sequence windows (201 bp) from genes with no editing events (p < 10 −4 ).

Identification of differentially edited genes
Differentially edited genes are defined as genes that exhibit significant shifts in edit frequencies in some or all edited positions when subjected to stress relative to the "control" treatment. These genes were identified by fitting generalised linear models (GLMs) in R on a per-gene basis with the following formula: glmðedited; non edited $ growth condition Ã position; family ¼ 0 binomial 0 Þ where "edited, non_edited" refer to a two-column response variable containing the number of reads with and without RNA editing respectively at a particular position; "growth_conditions" and "positions" refer to predictor variables that stored the growth condition and the location of the edited base as factors. Data from each replicate were entered as individual rows instead of being pooled together, as we preferred to assign equal weightages to replicate observations (if the data was pooled, the replicate with the most mapped reads will have a bigger influence on the overall editing frequency). To minimise false positives, we filtered for genes containing ! 2 edit positions that have had reads mapping to these positions in all replicates.
This analysis was repeated for each stress condition ("16C", "36C" and "DS") relative to the "control" condition, with every gene having a p-value representing the probability that the difference in editing patterns across the four replicates of the stress condition relative to the four replicates of the control condition was due to chance (i.e. p approaches 0 if the editing frequency under stress were consistently different from control; p approaches 1 if editing frequency under stress mimics that of control). These p-values were corrected for multiple testing [59,60], and genes with corrected p < 0.05 were considered differentially edited in response to stress.

Identification of candidate RNA editing proteins
To identify proteins that are putatively involved in the RNA editing of Symbiodinium, known reference proteins from both metazoans and plants were retrieved from the UniProt SwissProt database [65] (S2 Dataset). Reference transcriptomes of S. microadriaticum [33] and S. minutum [34] were translated into protein sequences using TransDecoder (https://transdecoder. github.io) [66]. These open reading frames were then queried against reference RNA editing proteins using BLASTP (e-value 10 −5 ). For matches below the cutoff, putative Pfam protein domains were annotated via HMMER (http://hmmer.org); transcripts were only retained if they contained crucial protein domains with RNA editing activity. Protein alignments of known RNA editing domains were generated with Jalview [67], and coloured with the Clustal X Colour Scheme (http://www.jalview.org/help/html/colourSchemes/clustal.html), which assigns the same colour to amino acids of similar side-chain properties. Amino acids are only coloured if the per-site conservation is above a certain threshold.

Identification of differentially expressed genes
Similar to the identification of edited sites, trimmed RNA-seq reads were mapped against all S. microadriaticum gene models and eight candidate RNA editing homologues with kallisto v0.42.4 [68] using default parameters. Differentially expressed genes were subsequently identified using sleuth [69].

Functional analysis of genes with edited sites
To discern potential functional differences between the set of condition-specific edits and the overall set of edits, we used SnpEff [70] to annotate potential changes in our gene models that result from the respective sets of edits. For each editing location (and for each type of edit in sites with multiple editing types), SnpEff identifies the region where the edit lies in (intergenic, intronic, exonic) and the effect of the edit (e.g. synonymous substitutions, non-synonymous substitution, gain of stop codons etc.).

GO term enrichment
Functional enrichment analyses were performed using topGO (version 2.16.0) [71]. We used the default topGO "weight01" settings, and considered terms that had a p-value of < 0.05 as significant. The resulting p-values were not corrected for multiple testing as non-independent tests are carried out on each GO term by topGO [71].

Verification of edited genes
As the genome of S. microadriaticum is not complete, there is still a possibility that similar transcripts produced from unsequenced genomic regions will map to existing regions, and mismatched bases will be considered as false RNA edits. In order to provide additional verification that genes are edited, and also to qualitatively confirm the editing patterns in differentially edited genes, PCR primers were designed to target seven amplicons (each~300 bp containing~20 edits) in five genes (one amplicon for SmicGene12072, SmicGene12421 and SmicGene26211; two amplicons for SmicGene31324 and SmicGene36637). Regions containing edits that exhibited large differences in edit frequencies between stressed samples and controls were preferentially selected. We took exceptional care to avoid designing primers on top of other known edited sites, which would result in PCR amplification biases if edits tended to occur in tandem. To optimise amplification, a script was written to incorporate other considerations e.g. similar melting temperatures, similar GC%, and avoiding long stretches (! 5) of the same nucleotide, allowing us to brute-force primer combinations with Primer3 (v2.3.6) [72] to find primers satisfying all desired criteria.
Leftover RNA from library construction (3 replicates from "16C", 2 from "36C", 2 from "DS" and 3 from "control", i.e. 10 in total) were reverse-transcribed with SuperScript III First-Strand Synthesis SuperMix (Invitrogen, Carlsbad, CA) according to manufacturer's instructions. The cDNA mix was subjected to 40 cycles of PCR (with a melting temperature of 60˚C), loaded into a 1% agarose gel, and bands of the correct size were gel purified using QIAquick MinElute kit (Qiagen, Hilden, Germany) as per manufacturer's instructions.
To provide further support that RNA editing has occurred, the same amplicons were used to amplify genomic DNA extracted from a separate S. microadriaticum culture with the same genotype. PCR amplification with the same settings were carried out, and gel purified in the same manner. Sanger sequencing on all samples were performed in-house (KAUST Bioscience Core Lab). Per-position peak height ratios from the AB1 traces produced from Sanger sequencing was obtained using a manufacturer-provided online tool [73].

Accession numbers
Short read data generated for this study can be retrieved from the NCBI Short Read Archive (SRA), with the following BioSample accession numbers: SRS429458, SRS429460, SRS429463, SRS429465 and SRR3337493-3337505, or BioProject accessions of PRJNA205322 and PRJNA315758. Further detail is provided in S1 Table. The extent of RNA editing can be visualised on a genome browser, available at http://smic.reefgenomics.org [74]. The trends between all edited sites (pale green/red) is contrasted against the set of 114 differentially edited genes (dark green/red). In contrast to cold stress ("16C") and dark stress ("DS"), there is a noticeable preference for increased editing in dealing with heat stress ("36C").