Comparative De Novo transcriptome analysis of the Australian black-lip and Sydney rock oysters reveals expansion of repetitive elements in Saccostrea genomes

Ostreid oysters (the ‘true oysters’) represent a large and commercially important family of bivalve molluscs. Several species, such as the Pacific oyster (Magallana gigas), the American oyster (Crassostrea virginica), the European oyster (Ostrea edulis) and the Sydney rock oyster (Saccostrea glomerata), are currently farmed at a large scale. However a number of other species may also be suitable for commercial-scale aquaculture. One such species is the ‘black-lip oyster’, a large Saccostrea species of uncertain taxonomic affinity found in northern Australia. Here, phylogenetic analysis of the COI gene places this oyster within a clade identified in a previous study of Japanese Saccostrea species, ‘Saccostrea lineage J’. To facilitate comparisons between this oyster and the better-studied S. glomerata, de novo transcriptomes were generated from larval stages and adult tissues of both species. Patterns of orthology indicated an expansion of repetitive elements within Saccostrea genomes when compared to M. gigas and C. virginica, which may be reflected in increased evolutionary rates and/or genome sizes. The generation of high-quality transcriptomes for these two commercially relevant oysters provides a valuable resource for gene identification and comparison of molecular processes in these and other mollusc species.


Introduction
Aquaculture of ostreid oysters is a significant industry worldwide, with an estimated value of $US6.6 billion per annum (2016 data, [1]). Within Australia, the majority of production is focused on two species, the native Sydney rock oyster (Saccostrea glomerata), and the introduced Pacific oyster (Magallana gigas, formerly Crassostrea gigas) [2]. Production of each of these species is hampered by mass mortality events due to disease outbreaks, and significant effort is being expended towards selective breeding of disease resistant lines for both species [3][4][5][6]. An additional suggested course of action is to investigate other native species for aquaculture potential [7], which, depending on the species, may facilitate the establishment of the industry in new coastal regions (for example, the tropics). A prime candidate for tropical oyster aquaculture in Australia is the 'black-lip' oyster, a large Saccostrea species currently farmed on a small scale in Bowen, Queensland (John Collison, personal communication) and Darwin, Northern Territory [8]. The taxonomic status of this oyster is poorly defined, and it is variably reported in the literature as 'Saccostrea echinata' [8,9] or 'Striostrea (Parastriostrea) mytiloides' [2,10]. Given that unambiguous identification of Saccostrea oysters is challenging based on morphology alone [11,12], and that no molecular data exists for the species, this designation must be treated as tentative at this stage. It is assumed that the species currently farmed is the same as that reported from Magnetic Island, Queensland [9,13], New Caledonia [14], and Palau [15].
Aquaculture of the black-lip oyster has largely relied on natural catch of juvenile oysters (spat), and the lack of established hatchery protocols for this species is a major impediment to expansion of production. Recent reports indicate that the poor larval survival previously experienced in hatchery trials [9,14] has largely been overcome, however settlement rates remain low [8]. Current hatchery production protocols have been adapted from those developed for the Sydney rock oyster (Saccostrea glomerata) [16], and may not be optimal for black-lip oysters. Differences in settlement processes could be expected given that black-lip oysters are typically found as isolated individuals on mangrove roots and rocks [13,17], whereas S. glomerata is found in large aggregations [18]. Settlement of marine pelagobenthic invertebrates is a tightly regulated molecular process, whereby larvae at a particular genetically-determined developmental state (competence) are receptive to particular internal or external cues that initiate metamorphosis [19]. Improved understanding of this process in the black-lip oyster will be critical for the successful hatchery production of this species.
Molecular approaches can likewise be applied to other aspects of oyster husbandry, and have already demonstrated promise for the improvement of S. glomerata production. A selective breeding program for faster growth and disease resistance was initiated in New South Wales in 1990 [4,20,21]. Genes encoding anti-oxidant enzymes were shown to be differentially expressed between disease-resistant and wild-type oysters, highlighting the potential for marker-assisted selection [22]. Molecular techniques have also yielded tools for the manipulation of broodstock condition, a development of critical importance for the hatchery production of selected animals [23]. Given that the black-lip and S. glomerata belong to the same genus it is possible that some of the molecular techniques developed may be directly transferrable between the species.
The development of these molecular approaches relies on the availability of sequence data for the species of interest. A number of transcriptome studies have been performed for S. glomerata, investigating gene expression in different tissues and under different environmental conditions and stressors [23][24][25][26][27][28]. None of these transcriptomic studies have incorporated larval samples, therefore genes expressed exclusively in larval stages will not be identified in these transcriptomes. Aside from S. glomerata, transcriptome data exists only for one other Saccostrea species, S. palmula [29], limiting capacity for comparative analyses between taxa.
This study presents comprehensive transcriptomes derived from larval stages and adult tissues of both the black-lip and Sydney rock oysters. The transcriptomes were deemed to be of high quality, based on assembly completeness and comparison of universal benchmarking statistics against whole genome assemblies of M. gigas and Crassostrea virginica. Patterns of orthology were compared between the Saccostrea transcriptomes generated here and predicted proteins from the Crassostreinae (M. gigas and C. virginica) genomes. This revealed that the Saccostrea lineages possess a larger repertoire of repetitive elements, particularly within the LINE, SINE and Penelope classes. The transcriptomes presented here will provide a valuable resource not only for improvement of oyster production, but also for investigation into the evolution of life-history traits in oysters.

Animal sources, husbandry, and sample collection
All samples were taken during commercial hatchery runs conducted at Aquafarms Queensland Pty Ltd, Hervey Bay, Australia. Adult black-lip and S. glomerata were supplied from commercial oyster farms in Bowen, Queensland, and Port Stephens, New South Wales, respectively. Oyster spawning and larval culture followed the methods outlined in [16]. Briefly, oysters were induced to spawn via temperature and salinity treatments. Individuals that had begun spawning were removed and kept separate until spawning was complete, and eggs and sperm from all individuals pooled separately prior to fertilisation. Larvae were stocked in 5,000 L tanks containing aerated filtered sea water (FSW) at an initial concentration of 19 (black-lip) or 26 (S. glomerata) larvae mL -1 , and were maintained at 27˚C ± 2˚C (black-lip) or 25˚C ± 2˚C (S. glomerata). Larvae were drained on to mesh screens and transferred to a clean 5,000L tank daily, and were periodically screened to remove the slowest growers. Feeding was initiated at approximately 16 hours post fertilisation (hpf), and consisted of Pavlova lutheri, Tisochrysis lutea, Chaetoceros calcitrans and Chaetoceros muelleri, at varying concentrations depending on larval age. Settlement induction was performed using epinephrine bitartrate once larvae had reached the pediveliger stage and were able to be retained on a 200 μm mesh screen (day 24 for the black-lip, day 21 for S. glomerata).
Embryos and larvae (~50 per stage) were sampled at least once daily, with more frequent sampling within the first 24 hours and at settlement. Small (<0.25 mg) samples were also taken from various tissues from two S. glomerata and three black-lip adults. All samples were taken in to microcentrifuge tubes containing 1ml RNAlater (Sigma), stored at 4˚C for 24 hours, and then -20˚C until RNA extraction. Details of all samples are provided in S1 Table. DNA extraction and COI sequencing of the black-lip A small piece of adductor muscle (~0.25 mg) was dissected from three adult black-lip individuals for DNA extraction. Extractions were performed using a standard NETS/phenol chloroform protocol. PCR amplification of a partial COI sequence was performed using the primers LCO1490 and HCO2198 [30], 10ng of DNA template, and Taq polymerase (NEB) with the following cycling conditions: an initial denaturation at 94˚C for two minutes, 30 cycles of 94˚C for 30 seconds, 48˚C for 30 seconds, and 68˚C for 45 seconds, followed by a final extension of 68˚C for 10 minutes. Resulting products were gel purified and submitted to the Australian Genome Research Facility, Brisbane, for Sanger sequencing in both directions using the PCR primers.

Phylogenetic analysis
Resultant COI sequences were trimmed to remove the primer and aligned with other partial ostreid COI sequences (downloaded from NCBI) using the program Aliview [31]. The alignment was manually trimmed to the minimal length of the majority of the records (538 bp) and shorter sequences were removed (S1 Alignment). Phylogenetic analysis was performed using RAxML 8.

Transcriptome sequencing, assembly, and quality assessment
For each species, RNA extractions from embros/larvae and adult tissues were performed separately and pooled prior to preparation of sequencing libraries. A minimum of 20 individual embryos or larvae from each sampled stage were pooled in to three separate extractions to maximise coverage of the entire larval timecourse. Samples were homogenised using glass pestles in 1 mL of TRI Reagent (Sigma) and extractions performed as per the manufacturer's instructions, using 1-bromo-3-chloropropane for phase separation. Precipitation of RNA was performed using 0.25 mL of isopropanol and 0.25 mL of high salt precipitation solution (0.8 M sodium citrate and 1.2 M sodium chloride). RNA was shipped to Macrogen (Seoul, Korea) and assessed for quality on an Agilent 4200 Tapestation High Sensitivity ScreenTape before library preparation using a TruSeq Stranded mRNA LT Sample Prep Kit. Libraries were sequenced on a HiSeq 2500 to generate~80,000 100 bp paired-end reads. Quality of the resulting data was assessed using FastQC 0.11.3 [34].
Transcriptome assembly was performed using Trinity 2.4.0 [35], with quality trimming via Trimmomatic and without normalisation of reads. Transcriptome quality assessment was performed by 1) determining the level of representation of reads within the assembly by mapping using Bowtie2 2.0.2 [36], and 2) determining the proportion of full-length transcripts via BLAST+ 2.3.0 [37] alignment of sequences in the SwissProt non-redundant database [38] against the assemblies, both as outlined in the Trinity documentation. Analysis of assembly completeness was performed using BUSCO v3 [39] and the metazoa_odb9 dataset (created 13/ 02/2016), analysing open reading frames identified within transcripts by TransDecoder 5.3.0 [35]. Results for the transcriptomes generated here were compared against whole genome data from the oysters M. gigas [40] (PRJNA276446) and C. virginica [41](PRJNA379157).

Orthology and enrichment analysis
To reduce sequence redundancy present within the transcriptomes, TransDecoder was first used to identify the longest potential open reading frame per transcript using the -single_bes-t_orf command. The dataset was further filtered to retain only the single longest isoform per Trinity 'gene'. These predicted protein datasets, generated for each species, were then analysed along with predicted proteins from the M. gigas and C. virginica genomes using Orthofinder 2.2.3 [47] to identify groups of potentially orthologous proteins and their corresponding genes. From this analysis, genes within the Saccostrea-specific orthogroups were analysed for overrepresentation of 'biological process' GO categories using a hypergeometric test within the BiNGO plugin [48] of Cytoscape [49]. The analysis was performed in both directions, i.e., using the S. glomerata transcripts from the Saccostrea-specific orthogroups against the complete annotation of the S. glomerata predicted protein dataset as a reference, and again using the S. lin. J transcripts against the complete annotation of the S. lin. J dataset. Enrichments with an adjusted p-value of less than 0.01 (Benjamini and Hochberg FDR correction) in both species were deemed to be significant.

Analysis of repetitive elements
Analysis of repetitive element content was performed using RepeatMasker 4.0.7 [50] by aligning sequences against RepBase (RepBase-20170127) [51]. The analysis was performed for S. glomerata and the black-lip using the reduced 'single longest isoform per gene' dataset outlined above, and all predicted genes from the M. gigas and C. virginica genomes. The parameters for the analysis were: -q -species 'fungi/metazoa group'.

Description and identification of the 'black-lip' oyster
The black-lip oysters used in this study were farmed adults sourced from wild-caught spat in Bowen, Queensland, Australia. The oysters were large (70-84 mm in diameter), and characterised by a dark outer surface of the right (upper) valve, a distinct black margin around the inner surface of the right valve, thick shell, prominent chomata, and a dark mantle margin ("blacklip") (Fig 1). A 649 bp fragment of the COI gene was sequenced from three adult black-lip oysters. All three sequences were identical (NCBI accession MH822839). Phylogenetic analysis recovered major clades largely congruent with that of previous studies [11,12,52] (Fig 2, S1 Fig), placing the black-lip oyster within 'Lineage J' as nominated by Sekino and Yamashita [12]. The blacklip is therefore designated as 'Saccostrea lin. J' henceforth. Other oysters within lineage J were collected from Japan (Okinawa), Malaysia (Sabah) and Taiwan, suggesting a broad tropical Indo-Pacific distribution for this lineage. In both this analysis and that of Sekino and Yamashita [12] lineage J falls as a sister clade to all other sequenced Saccostrea lineages, albeit with low support.

Transcriptome sequencing and assembly
To obtain representative transcriptomes for both S. glomerata and S. lin. J, RNA from larval stages and adult tissues were pooled prior to library generation, high-throughput sequencing, and transcriptome assembly. Raw sequence files and assembled transcripts have been deposited in NCBI under BioProject PRJNA487836. Although similar numbers of raw reads (~80 million) were obtained for both libraries, significantly more assembled transcripts were obtained for S. glomerata than S. lin. J (Table 1). Despite this, the median contig length and contig N50 were higher for the S. lin. J assembly, indicating that the higher transcript count may reflect a more fragmented assembly for S. glomerata. Overall, these quality statistics are similar to those of other recent bivalve transcriptomes sequenced using similar techniques [26,53].

Transcriptome quality assessment and annotation
Alignment of raw reads back to the Trinity assemblies revealed that 91.76 and 85.05 percent of paired reads aligned concordantly to the de novo transcriptomes for S. lin. J and S. glomerata, respectively, indicating a high read representation within each assembly. The degree to which assembled transcripts were likely to be full length was assessed by aligning each sequence in the Swissprot non-redundant database against the transcriptome via BLAST. For the 12358 Swissprot sequences with BLAST hits in the S. lin. J assembly, 4150 (33.6%) had at least one transcript that aligned along the entire sequence, whereas 3465 (23.9%) of the Swissprot sequences appeared to be represented by a full-length transcript in the S. glomerata assembly.
Transcriptome completeness was assessed using the Benchmarking Universal Single-Copy Ortholog (BUSCO) assessment tool [39]. De novo transcriptomes were compared against whole genome protein data from M. gigas and C. virginica. The S. glomerata transcriptome lacks 8 BUSCOs, equal to the number missing from the M. gigas genome and less than that missing from C. virginica (Table 2). Even fewer BUSCOs (4) were lacking from the S. lin. J transcriptome. One BUSCO, EOG091G0GA7, was missing from all four oyster datasets, indicating that this gene (encoding a putative N-6 adenine-specific DNA methyltransferase) may have been lost from the ostreid lineage. Overall, these results indicate that the transcriptomes reported here provide good coverage of the total gene content of these species.

Analysis of orthologous sequences
The high degree of transcript redundancy within the transcriptome datasets indicated via BUSCO analysis necessitated filtering of sequence data with the aim of retaining only one transcript per gene for downstream orthology and enrichment analysis. To achieve this, the longest open reading frame was selected per Trinity 'isoform', followed by selection of the longest isoform per Trinity 'gene'. It should be noted that filtering by this method does remove some  [11] and Sekino and Yamashita [12]. The Bowen black-lip COI sequence (circled in inset) falls within 'Lineage J' with strong support. valid sequences, as reflected by a larger number of missing BUSCOs for these datasets (7 and 33 missing BUSCOs for S. lin. J and S. glomerata, respectively).
Orthology analysis performed on this filtered dataset revealed that 15,117 orthogroups (inferred set of genes descended from a single ancestral gene) were shared between all four species (Fig 3). An additional 5,719 orthogroups were shared exclusively by the two Saccostrea species, whereas only 461 were shared exclusively by M. gigas and C. virginica. This large differential was unexpected and possibly reflected the differences in sequencing methodology (i. e., transcriptome vs whole genome) for these species, rather than a true biological difference. To investigate this further, GO-term enrichment was performed to gain insight into the putative functions of orthogroups shared exclusively between the two Saccostrea species.

Enrichment analysis
Potential functional enrichment within the Saccostrea-specific orthogroups identified above was investigated by assessing the representation of GO categories (based upon Swissprot BLAST results) within this subset of transcripts against the whole transcriptome annotation. 43 GO categories were enriched within the Saccostrea-specific orthogroups using an adjusted p-value of less than 0.01 (Table 3). Of these, 11 represented terminal (non-parent) GO terms.
A number of these enriched terms are associated with repetitive elements (e.g., DNA integration, DNA-mediated transposition, RNA-mediated transposition), indicating the potential expansion of these elements in Saccostrea species in relation to M. gigas and C. virginica genomes. To investigate this further, RepeatMasker was used to identify and classify transposable elements in each of the four datasets. This analysis detected higher proportions of retroelements in Saccostrea lineages, particularly within SINE, Penelope, and LINE classes (Fig 4). S. glomerata had the highest overall proportion of repetitive elements (3.20%), and appears to have undergone an additional expansion of the Gypsy/DIRS1 class of LTR elements. The total proportion of repetitive element content within Saccostrea genomes is likely to be even higher than that reported here, as transcriptomics can only reveal elements that are transcriptionally active. As repetitive element content is positively correlated with genome size [54], these results may also indicate larger genomes for Saccostrea species when compared to that of C. virginica (estimated at 675 Mb, [55]) and M. gigas (estimated between 545 and 637 Mb, [40]). Alternatively, the higher proportion of repetitive elements found in Saccostrea genomes may reflect an underestimation of repetitive element content in C. virginica and M. gigas genomes, as masking of repetitive regions prior to gene prediction in whole-genome annotation (e.g., page 18 Supplementary Information in reference [40]) may preclude accurate identification.
Transposable elements are major sources of genetic variation within genomes (reviewed in [56]). The composition of transposable elements can vary even between closely related species, likely the result of differential timing and rates of proliferation events, accumulation of mutations, horizontal transfer, and differential rates of transposable element removal. In some cases, differential transposable element proliferation has been implicated as a major driver of the speciation process itself (reviewed in [57]). Penelope elements, in particular, are associated with chromosomal rearrangements and mutations in hybrids of different genetic populations (hybrid dysgenesis), and in the induction of other transposable elements [58]. Penelope expansions have been reported in other bivalves, for example, in Mytilus galloprovincialis [59] and Ostrea edulis [60], and may therefore have major implications for the evolution of bivalve genomes.

Conclusion
This study presents high-quality transcriptomes derived from embryos, larvae, and adult tissues of two Saccostrea species, S. glomerata and S. lineage J. Analysis of gene orthology patterns between these transcriptomes and whole genome data from the oysters Magallana gigas and Crassostrea virginica demonstrated an expansion of repetitive elements within the Saccostrea lineage, possibly revealing an important mechanism for the generation of genetic diversity Supporting information S1 Alignment. Trimmed sequence alignment used for phylogenetic analysis, in FASTA format. (FA)