Comparative genomics of bdelloid rotifers: evaluating the effects of asexuality and desiccation tolerance on genome evolution

Bdelloid rotifers are microscopic invertebrates that have existed for millions of years apparently without sex or meiosis. They inhabit a variety of temporary and permanent freshwater habitats globally, and many species are remarkably tolerant of desiccation. Bdelloids offer an opportunity to better understand the evolution of sex and recombination, but previous work has emphasized desiccation as the cause of several unusual genomic features in this group. Here, we evaluate the relative effects of asexuality and desiccation tolerance on genome evolution by comparing whole genome sequences for three bdelloid species: Adineta ricciae (desiccation tolerant), Rotaria macrura and Rotaria magnacalcarata (both desiccation intolerant) to the only published bdelloid genome to date, that of Adineta vaga (also desiccation tolerant). We find that tetraploidy is conserved among all four bdelloid species, but homologous divergence in obligately aquatic Rotaria genomes is low, well within the range observed between alleles in obligately sexual, diploid animals. In addition, we find that homologous regions in A. ricciae are largely collinear and do not form palindromic repeats as observed in the published A. vaga assembly. These findings are contrary to current understanding of the role of desiccation in shaping the bdelloid genome, and indicate that various features interpreted as genomic evidence for long-term ameiotic evolution are not general to all bdelloid species, even within the same genus. Finally, we substantiate previous findings of high levels of horizontally transferred non-metazoan genes encoded in both desiccating and non-desiccating bdelloid species, and show that this is a unique feature of bdelloids among related animal phyla. Comparisons within bdelloids and to other desiccation-tolerant animals, however, again call into question the purported role of desiccation in horizontal transfer.


45
The bdelloid rotifers are a class of microscopic invertebrates found in freshwater habitats 46 worldwide. Two life-history characteristics make these soft-bodied filter-feeders extraordinary 47 among animals. First, bdelloids famously lack males [1] or cytological evidence of meiosis [2,3], and 48 are only known to reproduce via mitotic parthenogenesis. They are therefore one of the best-49 substantiated examples of a eukaryotic taxon that seems to have evolved without sex or meiosis for 50 tens of millions of years [1,2,4,5]. Famously labelled as "an evolutionary scandal" [6], bdelloids have 51 diversified into over 500 species [7,8] in apparent defiance of the usual fate of asexual lineages [9-52 11]. Their persistence has implications for theories of the evolution of sex and recombination, a 53 fundamental puzzle in biology [12][13][14][15]. A second key feature is that most bdelloid species are 54 remarkably resilient to desiccation, and can survive the loss of almost all cellular water at any stage 55 in their life cycle, including as adults [16,17]. On desiccation, animals contract their body into a flat, 56 ellipsoid "tun" shape and enter a dormant state called anhydrobiosis, during which all metabolic 57 activities associated with life are suspended [5,16,18]. Individuals can remain in this condition for 58 long periods, usually days or weeks but occasionally several years [19,20]. The return of water 59 restores metabolism and reproduction, with little evidence of negative fitness consequences for 60 survivors [21]. Species that live in limnoterrestrial habitats such as puddles, leaf litter and moss are 61 subject to cycles of drying, and the ability to survive desiccation has been proposed to play a key 62 role in bdelloid evolution [5,22]. 63 Initial marker-based analyses of bdelloid genomes recovered highly divergent gene copies that were 64 interpreted as non-recombining descendants of ancient former alleles [4]. Along with the low copy-65 number of vertically inherited transposable elements (TEs) [23], this result was considered positive 66 genomic evidence of long-term asexual evolution. However, subsequent investigations of larger 67 genomic regions revealed evidence of tetraploidy, probably arising from an ancient hybridization or 68 genome duplication event affecting diploid ancestors prior to the diversification of the bdelloid 69 families [5,24,25]. Thus, genes generally comprise up to four copies, arranged as two pairs with 70 greater divergence between pairs ("ohnologs", also known as homeologs in other polyploid systems) 71 than within ("homologs") [5,24,25]. Another extraordinary feature of bdelloid genomes was 72 discovered concurrently: a remarkably high proportion of bdelloid genes show a high degree of 73 similarity to non-metazoan orthologs, mostly from bacteria, but also fungi and plants, suggesting a 74 rate of horizontal gene transfer (HGT) into the bdelloid genome at least an order of magnitude 75 greater than that observed in other eukaryotes [26]. Later work confirmed that many genes 76 originating by horizontal gene transfer from non-metazoans are expressed and functional [27]. 77 [26,28,30,31]. These ideas remain controversial, however, and recent claims of evidence for DNA 112 transfer between individuals of A. vaga [36] have subsequently been identified as artefacts of 113 experimental cross-contamination [37]. A separate recent study reported a striking pattern of allele 114 sharing among three individuals of another bdelloid species, Macrotrachela quadricornifera, which was 115 interpreted as evidence of sexual reproduction via an unusual form of meiosis (similar to that of 116 plants in the genus Oenothera) [38]. However, this result was not replicated in a larger study of the 117 genus Adineta [39]. In the absence of clear evidence for either occasional sex or between-individual 118 recombination in bdelloids (but without discounting either possibility), the exact nature of bdelloid 119 recombination remains an open question. 120 Showing both extensive anhydrobiotic capabilities and putatively ancient asexuality, bdelloid rotifers 121 sit at a unique junction in animal evolution. To better understand the relative contributions of these 122 features to bdelloid genome evolution, we have taken advantage of natural variation in the capacity 123 of species to survive desiccation, by sampling and comparing whole genomes from multiple taxa. In 124 particular, many species in the genus Rotaria live in permanent water bodies and do not survive 125 desiccation in the laboratory [17,40]. Here, we present high-coverage, high-quality draft genomes for 126 three species from two genera: the desiccation-tolerant species Adineta ricciae and the obligately 127 aquatic, non-desiccating species Rotaria macrura and Rotaria magnacalcarata. These are compared 128 with the published draft genome of A. vaga. Using a range of assembly approaches, we first confirm 129 the conservation of degenerate tetraploidy in all species, and then test predictions regarding the 130 effects of desiccation tolerance on intragenomic homologous divergence. We then investigate 131 genome architecture within species, to ask whether the unusual genomic structures observed in A. 132 vaga are a general feature across bdelloids. Finally, we contrast a range of genome characteristics, 133 including homologous divergence, HGT content and repeat abundance across a wider range of 134 animal taxa, allowing us to place some of the unique features of bdelloid genomes in a wider 135 metazoan context. 136

137
Reference genome assembly and annotation 138 Reference genome sequences for Adineta ricciae, Rotaria macrura and Rotaria magnacalcarata were 139 assembled using a combination of long-and short-read sequencing technologies (S1 Table). Kmer 140 spectra of raw sequence reads indicated high (>100X) but variable coverage across sites in each 141 genome (S1 Fig). In addition, a large proportion of low-coverage kmers indicated substantial 142 polymorphism in the R. magnacalcarata raw data, most likely corresponding to population variation in 143 the multi-individual DNA sample collected for this species (see Methods). Contaminating reads from 144 non-target organisms were excluded by scrutinizing initial draft assemblies, and removing 3.1% of 145 reads from the R. macrura dataset, 2.1% from R. magnacalcarata, and 6.3% from A. ricciae ( Fig, S4 Fig). Second, "maximum haplotype" assemblies were generated with a focus on 152 maximum assembly of heterozygous regions. This was intended to minimise the confounding effects 153 of assembly collapse, a phenomenon whereby homologous regions with no or low divergence are 154 assembled as a single contig with twofold coverage relative to separately assembled regions. Further 155 assembly metrics are provided in S1 Data, and all assemblies have been submitted to 156 DDBJ/ENA/GenBank under the Project accession ID PRJEB23547 1 . 157 Genome metrics for A. ricciae, R. macrura and R. magnacalcarata reference assemblies are shown in 158 A. ricciae, R. macrura and R. magnacalcarata, respectively ( Table 1). Genes with BLAST matches to 207 transposable elements (E-value ≤ 1e-5) were removed, and so were short genes with no matches to 208 the UniProt90 protein database (i.e., likely spurious gene models), which resulted in "high-quality" 209 sets of 49,857, 24,594 and 29,359 genes for downstream analyses ( Table 1). Re-annotation of the 210 A. vaga 2013 assembly resulted in 78,800 predicted genes using BRAKER or 67,364 genes using 211 MAKER + Augustus, reducing to 57,431 after quality control. Thus, the reference genomes of R. 212 macrura and R. magnacalcarata appear to encode approximately half the number of genes observed in 213 A. ricciae and A. vaga. Correspondingly, the mean intergenic distance was higher in R. macrura (mean 214 3.9 kb) and R. magnacalcarata  We checked for mis-annotations by comparing the protein sequences to the assembly contigs from 216 which they had been predicted, using TBLASTN (E-value ≤ 1e-20). This did not reveal any highly 217 similar matches (discounting hits that overlapped with any existing predicted gene model; i.e., hits to 218 self), indicating that "missing" Rotaria genes were not the result of poor gene prediction or other 219 mis-annotation. However, the assemblies of all four species showed a large number of matches at 220 lower similarity (30-35% median identity at the amino-acid level) (Fig 1C). These protein hits to 221 non-gene annotated regions may be derived from putative pseudogenes, resulting either from 222 degradation of coding regions following the suggested ancestral genome doubling, or more recent 223 gene duplications that have subsequently decayed and no longer encode functional proteins. 224 The structure of predicted genes also varied among species. The average intron length was 104 and 225 108 bp for A. ricciae and A. vaga respectively, but up to three times longer in R. macrura and R. 226 magnacalcarata (362 and 208 bp respectively) (Fig 1D, Table 1; S6 Fig). Distributions of intron 227 lengths showed two distinct classes, with the majority of introns falling in the range 30-100 bp but a 228 substantial minority showing a higher variance around a much larger mean (inset of Fig 1D). The 229 mean number of introns per gene was ~4 for genes in A. vaga and R. magnacalcarata, and ~5 for A. 230 ricciae and R. macrura ( Table 1). The proportion of single-exon genes (SEG) was 21-25% for A. 231 ricciae, A. vaga and R. magnacalcarata, but was substantially lower for R. macrura with only 4% SEG 232 ( Table 1). 233 The repeat content of bdelloid assemblies was measured using two approaches: (1) comparisons to 234 known metazoan repeats, sampled from Repbase, (2) comparisons to Repbase plus an additional 235 library modelled ab initio from each assembly, using RepeatModeler (see Methods). For (1), the 236 relative abundances of TEs were low for all species, with the total proportion of interspersed 237 repeats accounting for 1.2% of the assembly span for both A. vaga and R. magnacalcarata, 0.9% for R. 238 macrura and 0.8% for A. ricciae (S2 Data). Including simple and low-complexity repeats resulted 239 only in modest increases, to 2.0%, 3.4%, 2.2% and 3.0% for A. ricciae, A. vaga, R. macrura and R. 240 magnacalcarata, respectively. For (2) however, the inclusion of ab initio repeats (predicted directly 241 from the assembled nucleotides, see Methods) resulted in considerably increased repeat content for 242 all species, but to a greater extent in Rotaria (16.8% for A. ricciae, 18.4% for A. vaga, 22.0% for R. 243 macrura and 27.6% for R. magnacalcarata). A large proportion of ab initio repeats were marked as 244 "unclassified" (7.8%, 10.0%, 17.2% and 19.8% of total ab initio repeats, respectively), and their nature 245 is yet to be determined (Fig 1B; S2 Data). The composition of bdelloid genomes with respect to 246 genome size evolution is considered further below. 247 Adineta species relative to Rotaria species, suggesting substantial differences in either ploidy or 251 divergence patterns between bdelloid genera. To investigate the evolutionary relationships among 252 genes within each species, we estimated nucleotide divergence and collinearity among gene copies 253 using MCScanX [49]. This analysis identifies collinear blocks of genes, defined as pairs of genomic 254 regions that show conserved gene order (see Methods). We plotted the average synonymous 255 divergence (KS) between genes within each collinear block against a "collinearity index", defined as 256 number of collinear genes divided by the total number of genes within a given block [28]. Both the A. 257 ricciae reference and the (reannotated) A. vaga 2013 assemblies showed a clear delineation of genes 258 into homologs (low KS and high collinearity) and ohnologs (high KS and low collinearity), as has been 259 observed previously [28] (Fig 2A). Strikingly however, only ohnologous relationships are inferred in 260

Marked differences in intragenomic divergence among bdelloid
Rotaria genomes: collinear blocks comprised of homologous genes are not observed (Fig 2A). 261 Correspondingly, the extent of collinearity was higher in Adineta genomes: over 75% of genes in A. 262 ricciae (and 68% of genes in A. vaga) were categorised as members of a collinear block, compared to 263 only 12% and 10% of genes in R. macrura and R. magnacalcarata (S2 Table). In addition, collinear 264 blocks within Adineta species were both more numerous (1,378 and 1,828 for A. ricciae and A. vaga, 265 respectively, versus 175 and 187 for R. macrura and R. magnacalcarata) and contained more genes 266 (average number of genes per block was 15 and 13 for A. ricciae and A. vaga, versus 8 and 7 for R. 267 macrura and R. magnacalcarata). 268 Assuming that the ancestor of extant Rotaria lineages was also tetraploid [25], the apparent "loss" of 289 homologous copies in Rotaria species may be caused by either (1) the genuine loss of homologous 290 gene copies from Rotaria genomes, resulting in a shift from tetraploidy to highly diverged diploidy, or 291 (2) extremely low levels of divergence between Rotaria homologs, such that the majority of 292 homologous sites are identical and cannot be separately assembled (i.e., are collapsed). To 293 differentiate between these hypotheses, we characterised patterns of nucleotide polymorphism and 294 read coverage across each genome, as has been used to investigate other polyploid genomes [50-295 53]. Widespread collapse of Rotaria assemblies is predicted to manifest as single nucleotide 296 polymorphisms (SNPs) with a frequency around 50% and a total read-depth (reference + alternative 297 base) that is approximately equal to the genome-wide average, analogous to collapsed heterozygous 298 sites in a segregating diploid genome (e.g., see Fig [28] (Fig 3B). For both R. macrura and R. magnacalcarata, however, read-depth at SNP sites 313 (reference + alternative alleles) varied in concert with the genome-wide coverage (Fig 3B). These 314 patterns indicate that the majority of SNPs occur in collapsed regions, supporting the hypothesis of 315 widespread assembly collapse in Rotaria species. 316 A different pattern is observed in A. ricciae, however. Here, a small proportion of sites (11%) are 328 distributed around a peak at 75x coverage, presumably representing separately assembled regions, 329 but the majority of sites (81%) are covered at 150x, presumably representing collapsed regions with 330 twice the expected coverage (Fig 3B). Furthermore, SNP depth is unimodal and is centred around  Table). No such elevation was observed in the rate of nonsynonymous substitution in 347 A. ricciae, compared to the other species. However, the A. ricciae genome also shows the highest 348 %GC of the four species (approximately 5% higher than A. vaga, and 3-4% higher than either Rotaria 349 species). Thus, one explanation for the increase in KS may be selection for increased G+C content in 350 A. ricciae, with continued purifying selection at nonsynonymous sites [58]. 351 Homologous divergence in non-desiccating Rotaria is lower 352 than allelic divergence in most sexual species 353 To investigate the possibility that non-desiccating Rotaria species are characterised by wholesale 354 assembly collapse, we constructed a number of uncollapsed, "maximum haplotype" assemblies (see 355 Methods) (S1 Data). We also attempted to reassemble the genome of A. vaga for comparison, 356 generating both collapsed and uncollapsed assembly versions. With the exception of A. vaga, 357 maximum haplotype assemblies did not show the expected twofold increase in either assembly span 358 or gene count, as would be expected if this alternative assembly approach were able to resolve 359 homologous regions (S2 Text, S1 Data). Our results suggest that the choice of assembly 360 parameters shows different overall effects in a species-specific manner, presumably depending on the 361 extent and profile of divergence between homologous reads in each dataset (S2 Text, S1 Data). 362 Overall, these results indicate that the majority of collapsed homologs in Rotaria species, inferred 363 from patterns of SNP and read coverage above, cannot be separately assembled using alternative 364 parameters. Notwithstanding confounding issues from other sources of polymorphism (e.g., the 365 multi-individual sample for R. magnacalcarata), the fact that maximum haplotype assemblies for 366 Rotaria species failed to recover homologous gene copies points to substantially lowered 367 homologous divergence relative to Adineta. 368 In the A. ricciae reference and A. vaga 2013 assemblies, the majority of homologs were separately 369 assembled, and we were able to identify homologous gene copies and estimate their sequence 370 divergence using a BLAST-based approach [50,53]. The median divergence between most-similar 371 gene copies was 4.55% (mode = 3.75%) in A. ricciae and 1.42% (mode = 1.25%) between A. vaga gene 372 copies (in close agreement with [28]) (Fig 4A). These estimates may be inflated however, as they 373 fail to consider homologous regions with very low divergence that are collapsed. Alternative 374 estimates of homologous divergence based on SNPs detected in the collapsed A. vaga assembly were 375 correspondingly lower (0.955% and 0.788%, based on alignment of libraries ERR321927 and 376 SRR801084 respectively) (S3 Data). 377 of homologous collapse, an upper limit for the divergence between Rotaria homologs is thus 393 estimated at 0.033% and 0.075% for R. macrura and R. magnacalcarata respectively (Fig 4B). 394 These results indicate that homologous divergence in non-desiccating Rotaria species is at least an 395 order of magnitude lower than that observed in anhydrobiotic Adineta species. This contradicts 396 hypotheses that emphasise the role of desiccation in shaping bdelloid genomes. If the rate of DSB-397 repair is positively correlated with the rate of gene conversion, a lower level of homologous 398 divergence is expected in species with higher rates of desiccation. In fact, we observe the opposite: 399 divergence between homologs in non-desiccating Rotaria species is considerably lower than in A. were detected (an example is shown for scaffold AVAG00001 in Fig 5A). In addition, 25 444 homologous blocks were encoded on the same genomic scaffold, two as tandem arrays and 23 as 445 palindromes (Fig 5B). Thus, our detection methods recover the same signals of ameiotic evolution 446 reported by Flot et al. (2013) [28], for the same A. vaga assembly. The assembly method employed 447 by Flot et al. (2013) was to construct contigs from Roche 454 Titanium and GS-FLX data using the 448 MIRA assembler [68], followed by correction and scaffolding using high coverage Illumina paired-end 449 and mate-pair data (section C1 of [28] supplement). We attempted to independently reassemble the 450 A. vaga paired-end Illumina data (incorporating both mate-pair and 454 data for scaffolding) using a 451 variety of established short-read assembly programs. However, this always resulted in highly 452 fragmented assemblies (1-2 kb N50), except when allowing for the collapse of homologous regions 453 (as discussed above) (S1 Data). The lack of contiguity in A. vaga maximum haplotype assemblies 454 precluded us from using alternative assembly approaches to investigate the features detected in the 455 2013 assembly. A lack of separately assembled homologous gene copies in either Rotaria species 456 similarly precluded structural analysis. 457 introduced during the scaffolding process, despite requisite care taken during assembly scaffolding 479 (Fig 5A,C). For example, an unscaffolded A. ricciae assembly showed only a single break in 480 collinearity, although the increased fragmentation of this assembly (N50 = 18.7 kb) may limit our 481 ability to detect such breaks in this case. In addition, we did not detect any cases of homologous 482 genes arranged as palindromes in A. ricciae. Only two cases of linked homologous blocks of genes 483 were detected for A. ricciae, and both were arranged as tandem repeats rather than palindromes, on 484 relatively short scaffolds involving a small number of genes (Fig 5D). Mb, encoding ~68,000 genes. The largely collapsed reference assemblies of R. macrura and R. 541 magnacalcarata showed approximately 25,000 and 35,000 genes respectively, and thus are in broad 542 agreement with observations from Adineta. This also implies that the total genome size for Rotaria is 543 in the region of 400-500 Mb. Further evidence of genome size variation among bdelloids is supplied 544 by comparisons to the collapsed A. vaga assembly, which is of equivalent ploidy but reduced by 60 545 Mb compared to R. magnacalcarata and 113 Mb compared to R. macrura. 546 What explains the observed differences in genome size? Based on comparisons to known metazoan 547 TEs from Repbase, the abundance of TEs and low-complexity repeats was low in all genomes (2.0%, 548 3.5%, 2.3%, and 3.1% for A. ricciae, A. vaga 2013, R. macrura and R. magnacalcarata, respectively; Fig  549   1B, S2 Data), suggesting that expansions of known TEs or simple repeats in the Rotaria lineage is 550 unlikely to be a major driver. However, the inclusion of ab initio repeats, modelled from the 551 assembly sequences directly using RepeatModeler (see Methods), resulted in a marked increase in 552 total repetitive sequences for all species (16.8%, 18.4%, 22.0%, and 27.6%, respectively; Fig 1B, S2  553 Data). The majority of ab initio repeats were annotated as "unclassified" and thus are not similar to 554 known repetitive elements (in Repbase) (S2 Data). The relative increase is greatest in the two reference assemblies (~17% and 20% respectively) are covered by repeats whose exact nature 557 remains to be elucidated. In addition, average intron sizes in Rotaria genes are longer (by at least 558 100%), driven primarily by an increase in the number of long introns (i.e., ≥ 100 bp). High HGT content is a unique feature of bdelloid genomes 565 To gain a better understanding of how bdelloid genomes compare in a wider metazoan context, we 566 characterised an additional 13 species from across the Protostomia (S4 Table) based on genome 567 size, gene density, patterns of orthologous gene clustering, HGT content and repetitive sequence 568 content (Fig 6, S10 Fig, S5   We assessed the extent of horizontal transfer into protostome genomes using both sequence 589 comparison and phylogenetic approaches. The extent to which HGT contributes to the genomes of 590 multicellular eukaryotes is controversial. For example, a recent claim of 17% non-metazoan genes 591 encoded in the genome of the tardigrade H. dujardini was later shown to be derived mostly from 592 contaminating non-target organisms [89][90][91][92]. Nonetheless, a high proportion of genes from a variety 593 of non-metazoan sources has been consistently reported in bdelloid genomes from a range of 594 independent data, including fosmid sequences [26], transcriptomes [27,40], and whole genome data 595 [28]. To measure the level of horizontal transfer into bdelloid genomes, we developed a HGT 596 assessment pipeline that uses both sequence comparison and phylogenetic approaches to build a 597 body of evidence for the foreignness of each predicted gene. Our goal was not to unequivocally 598 assert the evolutionary history of individual genes, but rather to apply these tests consistently across 599 the set of protostome genomes as a fair comparison for estimating HGT. 600 Our initial screen identified 6,221 (12.5%), 8,312 (14.5%), 3,104 (12.6%) and 3,443 (11.7%) genes 601 from A. ricciae, A. vaga, R. macrura and R. magnacalcarata respectively as HGT candidates (denoted 602 "HGTC") (Fig 6D, S5 Data). These values are substantially higher than the proportion of HGTC 603 observed in any other protostome species included in this analysis, using the same pipeline and 604 thresholds (the highest proportion of HGTC for a non-bdelloid was 3.6%, for the annelid worm 605 Capitella teleta). This is also noticeably higher than estimates based solely on the Alien Index (S5 606 proportion of HGTC to 9.1%, 7.6%, 6.2% and 6.5% for the four bdelloid species, compared with <1% 612 for all other species (S5 Data). The final test was not applicable for the majority of HGTC as 613 orthologs from metazoans were often not detected; thus, the number of genes that additionally 614 showed evidence for monophyly with non-metazoan orthologs was 189, 190, 111 and 82 for A. between anhydrobiosis and elevated rates of HGT that has been previously suggested for bdelloids 628 [26,28,30,31,93]. One explanation may be that differences in the mechanism of anhydrobiosis, in 629 combination with particular ecological properties of each species, may explain the observed 630 differences in HGT content. Further comparative work is thus required to elucidate any relationship 631 between anhydrobiosis and horizontal transfer. 632 Alternatively, it is possible that HGT content in bdelloids is impacted as much by a longstanding lack 633 of meiotic sex as by anhydrobiosis. Based on transcriptome data from Rotaria species, Eyres et al. 634 (2015) have estimated the rate of foreign import to be low in absolute terms, on the order of 13 635 gains per lineage per million years [40]. This rate may reflect the background rate of initial 636 integration of foreign genes for an organism with similar ecological and physiological properties to 637 bdelloid rotifers. If so, the high proportion of non-metazoan genes encoded in bdelloid genomes may 638 reflect a deviation in the rate of retention, rather than the rate of import. Foreign genes that land in 639 an ameiotic background may persist for longer timescales, even if initially deleterious, given the lack 640 of sexual mechanisms such as segregation that would otherwise rapidly remove them. Thus, foreign 641 genes incorporated by asexuals may tend to accumulate over the extended timescales necessary for 642 domestication. 643 We also quantified the abundance of transposable elements (TEs) and low-complexity repeats in 644 each genome using a similar approach to that used for the bdelloid genomes (see previous sections 645 and Methods). We chose to focus on the quantification of known repeats and thus did not perform 646 ab initio repeat modelling (e.g., RepeatModeler) for non-bdelloid species. There was considerable 647 variation in TE abundance among species (Fig 6E), with the total proportion of genome covered by 648 interspersed repeats varying from 0.3% in the tardigrade R. varieornatus to 27.5% in the oyster 649 Crassostrea gigas (S2 Data). The relative abundance of different classes of repeats, including SINES, 650 LINES, LTR elements and DNA elements also differed greatly among species, as did the amount of 651 simple and low-complexity repeats. The proportion of total repeats (TEs + low-complexity) ranged 652 from 0.6% in R. varieornatus to 42% in the annelid worm Helobdella robusta. All four bdelloid species proportion of assembly span), followed by the four bdelloids (2-3%). These estimates of TE 658 abundances in I. linei and R. varieornatus are substantially lower than the total repeat content of these 659 genomes (28% and 20% respectively), which includes a high proportion of ab initio repeats (inferred 660 directly from the assembled nucleotides) marked as "unclassified" (accounting for ~18% and ~19% 661 total repeats, respectively [78,84]), matching our finding of higher ab initio repeat content in 662 bdelloids. Additional work is required to elucidate the nature of these unclassified repeats in 663 bdelloids and other taxa. Our comparisons did not detect any substantial variation in the abundance of known TEs between 673 anhydrobiotic (1.2% and 0.8% for A. ricciae and A. vaga respectively) versus non-desiccating bdelloids 674 (0.9% and 1.2% for R. macrura and R. magnacalcarata respectively), despite a considerable increase in 675 the inferred genome size of Rotaria species. Moreover, the proposed mechanism involving 676 desiccation relies on DSB repair during rehydration, a process which is presumably limited in the 677 aquatic species R. macrura and R. magnacalcarata and may also not apply in the case of R. varieornatus 678 [34]. However, the majority of bdelloid rotifers are resistant to desiccation, suggesting that 679 anhydrobiosis was probably the ancestral state [17]. Thus, it may be that TEs and other repeats 680 were already largely eradicated in the most recent common ancestor to non-desiccating Rotaria 681 species, prior to their adaptation to a fully aquatic lifestyle and loss of anhydrobiosis. 682

683
The bdelloid rotifers have drawn attention because two features of their life history are remarkable 684 among metazoans: their apparent ancient asexuality and their ability to withstand desiccation at any 685 life stage. We assessed the relative contributions of asexuality and anhydrobiosis to genome 686 evolution in bdelloids, using de novo whole genome sequence data for desiccating and non-687 desiccating species and comparisons to both the existing genome of A. vaga and to other animal taxa. 688 We find that both desiccating and non-desiccating species are tetraploid, in agreement with previous 689 work [24,25], but that homologous divergence in non-desiccating Rotaria species is substantially 690 lower than that observed between anhydrobiotic Adineta species, and is low compared to estimates 691 of allelic heterozygosity from sexual eukaryotes. This finding runs counter to predictions based on 692 current hypotheses regarding the genomic effects of desiccation, and thus requires a reevaluation of 693 the causes and consequences of intragenomic interactions between bdelloid homologs. 694 We re-confirm previous reports that bdelloids encode a high proportion of non-metazoan genes. 695 Here too, a role for desiccation-tolerance had been hypothesised. We find that high HGT content is 696 a potentially unique feature of bdelloid genomes among animals, but comparisons to other 697 desiccation-tolerant taxa bring into question the role of anhydrobiosis. Our extensive assembly 698 results also allow for a refinement of the global parameters of bdelloid genomes, and suggest 699 substantial genome size differences between genera that may be linked to desiccation tolerance. The 700 phylogenetic non-independence of these results precludes more direct tests of cause and effect, 701 however, and further genomic evidence is required, ideally from anhydrobiotic species within Rotaria. 702 Comparisons of genome architecture within Adineta revealed that a number of unusual genome 703 features reported in A. vaga, posited as evidence of long-term ameiotic evolution in this species [28], 704 were largely absent from the closely related species A. ricciae, for which a comparable assembly is 705 now available. In addition, we find that bdelloids encode the majority of genes that are required for 706 meiosis and syngamy in sexual taxa, but emphasise that the precise function of these genes in 707 bdelloids is currently unknown. Overall, our results do not exclude the hypothesis that many 708 features of the bdelloid genomes analysed here are consistent with those found in sexual taxa, 709 except for the remarkably high prevalence of HGT. (2.5% w/v each). Approximately 50,000 rotifers were starved overnight before collection and 716 harvested by centrifuging at 10,000 g for 5 mins and discarding the supernatant before treatment 717 according to a DNA or RNA extraction protocol. A starter culture for R. macrura was generated 718 from ~100 wild-caught animals isolated from a small pond near Lake Orta, Italy. Populations were 719 grown in sterile distilled water and fed with autoclaved and filter-sterilised organic lettuce extract. 720 Prior to DNA extraction, animals were washed twice in sterile distilled water and starved overnight 721 kb inserts were also prepared at CGR using Nextera reagents, and all libraries were sequenced (150 745 bases paired end) over three lanes of an Illumina HiSeq4000 at CGR. Short-insert data for R. 746 magnacalcarata has been described previously [40]. All raw data have been submitted to the 747 Sequence Read Archive (SRA), an International Nucleotide Sequence Database Collaboration 748 (INSDC), under the accession IDs ERR2135445-55 2 (S1 Table). 749 BBMap "kmercountexact" v36.02 [105], and library insert sizes, along with initial genome size 754 estimates, were calculated using SGA "preqc" [106]. Error correction of reads was performed using 755

Genome assembly
BBMap "tadpole" (k = 31), discarding any pairs of reads containing unique kmers. 756 Contaminant reads derived from non-target organisms were filtered using BlobTools v0.9.19 [107]. 757 Briefly, trimmed and error-corrected paired-end data were digitally normalised to ~100X using 758 BBMap "bbnorm" [105] and a preliminary, draft assembly was generated using Velvet v1.2.10 [108], 759 setting a kmer length of 75. Taxonomic annotations for all contigs were determined by comparing 760 contigs against the NCBI nucleotide database (nt) and a custom database containing recently 761 published whole genome sequences of metazoans within the Spiralia (Lophotrochzoa) group (S4 762 The reference assembly pipeline above was designed to maximise assembly contiguity, but may lead 789 to assembly collapse the extent of which is undetermined a priori. Thus, maximum haplotype 790 assemblies were also generated for each species for comparison, defined as assemblies with minimal 791 reduction due to assembly collapse. Maximum haplotype assemblies were generated using either 792 dipSPAdes (default settings) or Platanus with the "bubble crush" reduction parameter set to zero (-u 793 0). Details of assembly parameters trialled are given in S1 Data. Collapsed and maximum haplotype 794 (re)assemblies for A. vaga were also generated following the same procedures, using Illumina short-795 insert libraries (accession IDs SRR801084 and ERR321929) for contig building, and mate-pair 796 (accession ID ERR321928) and 454 data for scaffolding (see [ To test if coding sequences had been inadvertently missed during gene prediction, we compared 817 proteins to the source nucleotide sequences from which they had been predicted using TBLASTN 818 (E-value ≤ 1e-20). Matches to existing gene models were discounted by removing alignments that 819 showed any overlap with gene regions (BEDtools "intersect" [129] with the "-v" option), leaving 820 only hits to regions of the genome that had not already been annotated as a gene. 821

Collinearity analyses
822 Syntenic regions within and between genomes were identified using MCScanX [49], calling collinear 823 "blocks" as regions with at least five homologous genes and fewer than 10 "gaps" (i.e., missing 824 genes SNPs were called based on mate-pair libraries mapped as single-end, as paired-end data for these 843 samples were comprised of multiple non-clonal lineages. 844 Horizontal gene transfer 845 We assessed the extent of horizontal transfer into bdelloid genomes using a combination of 846 sequence comparison and phylogenetics-based approaches, and applied the same tests to a set of 13 847 publicly available proteomes from species across the Protostomia (S4 Table). Protein sequences 848 were first compared to the UniRef90 database [134] (downloaded November 29, 2016) using 849 Diamond "blastp" (E-value ≤ 1e-5, maximum target seqs = 500). For each query, two HGT metrics 850 were then calculated: (a) HGT Index (hU [27]), defined as: BOUT -BIN, where BIN is the best (highest) 851 Diamond bitscore from comparisons to "ingroup" taxa and BOUT is the corresponding score for hits 852 to "outgroup" taxa; (b) Consensus Hit Support (CHS), defined as the proportion of all hits that 853 support a given query's ingroup/outgroup classification, itself inferred from the highest sum of 854 bitscores to ingroup or outgroup across all hits [92]. The CHS score therefore takes into account 855 the taxonomic distribution of all hits for each query, and militates against misclassifications based on 856 hU scores alone. We defined the ingroup as "Metazoa" and the outgroup as "non-Metazoa", and 857 marked all proteins with an hU ≥ 30 and CHSOUT ≥ 90% as putative HGT candidates (HGTC). We 858 then looked at the distribution of all HGTC across the genome, and discarded any candidate found 859 on a scaffold encoding ≥ 95% genes of putative foreign origin (i.e., "HGT-heavy" scaffolds likely 860 derived from contaminant sequences that were not removed during assembly). For each HGTC, 861 physical linkage (i.e., presence on the same scaffold) to a gene with good evidence for metazoan 862 origin (hU ≤ 0, CHSIN ≥ 90%) and the number of predicted introns was also recorded. Finally, 863 phylogenetic support for HGT was then assessed: for each HGTC, the sequences of 15 metazoan 864 and 15 non-metazoan UniRef90 hits (when present) were extracted and aligned using MAFFT v7.309 865 [135] with default parameters, and a maximum likelihood phylogeny was constructed using IQ-TREE 866 v1.5.3 [136], specifying automatic model selection and 1,000 ultrafast bootstrap replicates. The 867 functionality of GNU Parallel [137] was used to compute multiple trees simultaneously, and clusters 868 with fewer than four taxa were not analysed. Branching patterns of resultant trees were then 869 assessed using a custom script written in R v3.3.1 [138], utilising functions from the "ape" v4.1 870 package [139]. All HGT analysis scripts are available at https://github.com/reubwn/hgt. 871

872
The abundance of known transposable elements (TEs) was assessed for the same set of protostomes 873 using RepeatMasker, except using a Repbase (v22.02) repeat library specific to the Metazoa (i.e., 874 "queryRepeatDatabase.pl -species 'metazoa'"). Species-specific custom repeat libraries (e.g., using 875 RepeatModeler) were not generated for this analysis; only known repeats from Repbase were 876 compared. The total span of LINEs/SINEs, LTR elements, DNA elements and simple repeats relative 877 to the assembly span for each species was then computed from the RepeatMasker output tables. We 878 also calculated a genome density metric, defined as the number of protein-coding genes per Mb of 879 haploid genome, i.e., accounting for variation in ploidy among species. Presence was recorded if any query within each OG showed a TBLASTN alignment with ≥ 50% 887 identity over ≥ 50% query length, and/or if HMMER reported an alignment above the significance 888 threshold. The genomes and proteomes of D. melanogaster and C. elegans were also searched for 889 comparison (S6 Data). Proteins were also assessed for presence of the Zona pellucida-like domain 890 (Pfam accession PF00100) using HMMER3 as above. 891