Evolution of Ty1 copy number control in yeast by horizontal transfer of a gag gene

Insertion of mobile DNA sequences typically has deleterious effects on host fitness, and thus diverse mechanisms have evolved to control mobile element proliferation across the tree of life. Mobility of the Ty1 retrotransposon in Saccharomyces yeasts is regulated by a novel form of copy number control (CNC) mediated by a self-encoded restriction factor derived from the Ty1 gag capsid gene that inhibits virus-like particle function. Here, we survey a panel of wild and human-associated strains of S. cerevisiae and S. paradoxus to investigate how genomic Ty1 content influences variation in Ty1 mobility. We observe high levels of mobility for a canonical Ty1 tester element in permissive strains that either lack full-length Ty1 elements or only contain full-length copies of the Ty1’ subfamily that have a divergent gag sequence. In contrast, low levels of canonical Ty1 mobility are observed in restrictive strains carrying full-length Ty1 elements containing canonical gag. Phylogenomic analysis of full-length Ty1 elements revealed that Ty1’ is the ancestral subfamily present in wild strains of S. cerevisiae, and that canonical Ty1 in S. cerevisiae is a derived subfamily that acquired gag from S. paradoxus by horizontal transfer and recombination. Our results provide evidence that variation in the ability of S. cerevisiae and S. paradoxus strains to repress canonical Ty1 transposition via CNC is encoded by the genomic content of different Ty1 subfamilies, and that self-encoded forms of transposon control can spread across species boundaries by horizontal transfer.

[...] sequence composition with mobility frequency, we infer that restrictive strains in both S. cerevisiae and S. paradoxus contain full-length Ty1 elements with a canonical form of gag. In contrast, permissive strains either lack full-length Ty1 elements or only contain full-length elements from the Ty1' subfamily that have a divergent gag sequence. Surprisingly, the reconstructed evolutionary history of full-length Ty1 elements in S. cerevisiae and S. paradoxus shows that the Ty1' subfamily is the ancestral subfamily in S. cerevisiae found in wild lineages, while the canonical Ty1 family used in most functional studies is a highly-derived element found in human-associated strains. Furthermore, we discovered that the gag region of the canonical S. cerevisiae Ty1 element was acquired by horizontal transfer from an Old-World lineage of S. paradoxus followed by recombination onto a pre-existing ancestral Ty1'-like element. Our results demonstrate that intraspecific variation in the ability to repress transposition of the canonical Ty1 subfamily in S. cerevisiae is a consequence of horizontal transfer of a CNC-competent gag gene from a closely-related yeast species.

Because Ty1 CNC is mediated by a self-encoded factor (p22) and dependent on Ty1 genomic copy number, we hypothesized that variation in endogenous Ty1 genomic content may influence the strength of Ty1 CNC [...] derived from the gag region of the "canonical" Ty1-H3 element (Sup Fig 1A), a full-length Ty1 element isolated in S. cerevisiae as a His+ reversion mutant that has been used in many pioneering studies on Ty1 structure and function (Boeke et al. 1985; Boeke et al. 1988). This analysis revealed substantial diversity across both S. cerevisiae and S. paradoxus strains in the number of Ty1 elements that share strong sequence similarity to canonical Ty1-H3 gag (Sup Fig 1B and Sup Fig 1C), with most strains having fewer Ty1 elements than the S. cerevisiae reference strain S288c. These results also revealed several strains that contained no elements with similarity to canonical Ty1 gag, consistent with the existence of multiple "Ty1-less" strains that lack full-length elements in S. cerevisiae and S. paradoxus (Wilke et al. 1992 [...]

Next, we selected a diverse panel of ten S. cerevisiae and S. paradoxus SGRP strains with distinct Ty1 hybridization patterns (Figure 1A) to test for variation in the frequency of mobility using a canonical Ty1-H3 tester element marked with a his3-AI indicator gene (Curcio and Garfinkel 1991). We performed Ty1 [...] colonies in these S. paradoxus strains is a readout for retromobility that can be monitored by qualitative or quantitative assays similar to the Ty1his3-AI reporter system (Curcio and Garfinkel 1991; Atwood et al. 1998; Curcio et al. 2007). These experiments revealed >50-fold differences in the mobility of canonical Ty1 across strains within both S. cerevisiae and S. paradoxus (Table 1). In both species, we observed "restrictive" strains with very low levels of canonical Ty1 mobility (S. cerevisiae: S288c, Y12, and DBVPG6044; S. paradoxus: CBS432, N-44). Likewise, we observed "permissive" strains in both species with canonical Ty1 mobility frequencies that were more than an order of magnitude higher than those of restrictive strains (S. cerevisiae: UWOPS83-787.3, YPS606, UWOPS05-227.2, and L1374; S. paradoxus: YPS138).

[...] (Table 2), with canonical Ty1 mobility in populated strains being on the same order as other native strains with restrictive phenotypes. We note that mobility data for native strains in Table 2 were from an independent set of experiments done in parallel with populated strains and thus differ slightly from the data in Table 1 for the same native strains.
The ability of permissive strains to become restrictive with the addition of full-length copies of canonical Ty1 indicates that genetic background effects alone cannot explain variation in Ty1 mobility and is consistent with Ty1 CNC playing a major role in shaping variation in Ty1 mobility among yeast strains.
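
The relationship between genomic Ty1 content and CNC phenotype inferred in this section can be written as a simple decision rule. The sketch below is illustrative only: the class name, field names, strain labels, and copy numbers are hypothetical placeholders and are not taken from Tables 1 to 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ty1Content:
    """Full-length Ty1 content of one strain (all counts here are illustrative)."""
    strain: str
    canonical_gag: int  # full-length elements carrying a canonical-type gag
    ty1_prime_gag: int  # full-length elements carrying a Ty1'-type gag


def predict_phenotype(content: Ty1Content) -> str:
    """Rule inferred in the text: any full-length canonical gag implies restriction."""
    return "restrictive" if content.canonical_gag > 0 else "permissive"


# Hypothetical strains covering the three situations described in the text.
examples = [
    Ty1Content("strain_with_canonical_gag", canonical_gag=3, ty1_prime_gag=1),
    Ty1Content("strain_with_only_Ty1_prime", canonical_gag=0, ty1_prime_gag=2),
    Ty1Content("strain_without_full_length_Ty1", canonical_gag=0, ty1_prime_gag=0),
]

for content in examples:
    print(content.strain, predict_phenotype(content))
```

Under this rule, adding even one full-length canonical copy to a permissive background switches the prediction to restrictive, which mirrors the qualitative behaviour of the populated strains described above. The rule says nothing about the quantitative dependence of CNC on copy number.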

The presence of full-length Ty1 elements is not sufficient to restrict Ty1-H3 mobility

To determine if variation in Ty1 mobility is influenced by the copy number or sequence of endogenous Ty1 elements, we generated ~100x whole-genome shotgun PacBio datasets and assembled genome sequences for the seven S. cerevisiae strains we assayed for Ty1 mobility. We integrated data from our S. cerevisiae [...] which was broken at the highly repeated rDNA locus) and thus provide an essentially-complete catalogue of Ty content in yeast genomes. We identified Ty elements in these ten PacBio assemblies using a RepeatMasker-based strategy that classifies Ty elements as full-length, truncated, or solo LTR sequences based on the completeness of internal sequences in each predicted element (see Materials and Methods for details). Although our focus is on Ty1, we annotated all Ty families in these genomes to avoid potential misidentification, and because the similarity of solo LTRs from Ty1 and Ty2 does not allow their unambiguous assignment to either family (see also Yue et al. (2017)). Predicted numbers of full-length, truncated, or solo LTR sequences for Ty1 can be found in Table 3 and for all Ty families in Sup File 1. We focused on full-length elements in our analysis since they are most likely to have the complete set of functional sequences required for Ty1 gene expression and transposition.
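
The three-way classification described above can be illustrated with a minimal sketch. The actual criteria are given in the Materials and Methods and are not reproduced here, so the function name, the use of a single "fraction of internal sequence present" value, and the 0.95 cutoff are all assumptions made for illustration.

```python
from collections import Counter


def classify_ty_element(internal_fraction: float, full_length_cutoff: float = 0.95) -> str:
    """Classify a predicted Ty element by how much of its internal region is present.

    internal_fraction: fraction (0 to 1) of the internal, non-LTR sequence recovered.
    full_length_cutoff: hypothetical threshold, not the value used in the study.
    """
    if not 0.0 <= internal_fraction <= 1.0:
        raise ValueError("internal_fraction must be between 0 and 1")
    if internal_fraction == 0.0:
        return "solo LTR"  # LTR only, no internal sequence
    if internal_fraction >= full_length_cutoff:
        return "full-length"
    return "truncated"


def tally(fractions):
    """Count elements per class for one genome."""
    return Counter(classify_ty_element(f) for f in fractions)


# Illustrative input: five predicted elements from one hypothetical assembly.
print(tally([1.0, 0.98, 0.40, 0.0, 0.0]))
```

Only elements in the "full-length" class would be carried forward to the sequence-based analyses, in line with the focus on elements likely to retain the complete set of functional sequences.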

The total number of full-length Ty1 elements varies substantially across the ten yeast strains with mobility data in our sample ([...]

Recombination occurs among canonical Ty1 and Ty1' subfamilies in S. cerevisiae

The two exceptional S. cerevisiae permissive strains that had full-length Ty1 elements detected in their PacBio assemblies (UWOPS83-787.3 and YPS606) displayed multiple bands with weak hybridization to the Ty1 gag probe by Southern blot analysis (Figure 1A). Some, but not all, of these weak Ty1 bands could be explained by cross-hybridization with Ty2 (Figure 1B). This observation suggested the possibility of divergent Ty1 sequences in these genomes such as the Ty1' subfamily that is known to differ from the canonical Ty1 element in its gag region (Kim et al. 1998). To determine if the presence of a variant Ty1 subfamily could potentially explain the observation of permissive strains with full-length Ty1 elements, we extracted and aligned all full-length Ty1 elements from the PacBio assemblies of the ten S. cerevisiae and S. paradoxus strains for which we had mobility data, then clustered full-length Ty1 elements based on sequence similarity. We included the canonical Ty1-H3 tester element used in our mobility assays and used a distance-based clustering approach (Neighbor Joining) in this analysis, since our goal was to identify potential Ty1 subfamilies that could explain variation in canonical Ty1 mobility across strains, not to infer the detailed evolutionary history of Ty1 in these species.
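
Distance-based methods such as Neighbor Joining start from a matrix of pairwise distances between aligned sequences. The sketch below computes the simplest such measure, the proportion of differing sites (p-distance), for a toy alignment. It is an illustration under stated assumptions: the distance measure actually used in the study is not specified in this section, and the sequence names and sequences are invented.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') or ambiguous base ('N') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment: dict) -> dict:
    """Symmetric nested-dict matrix of p-distances for all sequence pairs."""
    matrix = {name: {name: 0.0} for name in alignment}
    for a, b in combinations(alignment, 2):
        d = p_distance(alignment[a], alignment[b])
        matrix[a][b] = d
        matrix[b][a] = d
    return matrix


# Toy alignment with hypothetical element names.
toy_alignment = {
    "element_1": "ACGTACGT",
    "element_2": "ACGTACGA",
    "element_3": "TCGAAC-T",
}
matrix = distance_matrix(toy_alignment)
```

A matrix like this would then be passed to a Neighbor Joining implementation to obtain the clustering. Elements from the same subfamily are expected to show small pairwise distances relative to elements from different subfamilies.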

Clustering of complete Ty1 sequences revealed a well-supported long branch separating S. cerevisiae elements from those in S. paradoxus (Figure 2A). In S. cerevisiae, two major clusters of Ty1 elements are observed. One cluster corresponds to the "canonical" Ty1 subfamily as defined by the presence of the Ty1-H3 tester element in this cluster (green background, Figure 2A). Two strains have full-length elements in [...] recombinant Y12_f109 element noted above clusters with the canonical Ty1 group (single asterisk, Figure 2B) since the majority of its gag gene is canonical Ty1 but is found on a very long unique branch due to the presence of Ty1' sequences in the 5' part of its gag (Sup Fig 2A). Aside from these two elements with evidence of recombination in gag, all full-length Ty1 elements are found in two main groups separated by substantial sequence divergence in their gag region: (i) elements with a canonical Ty1 type gag found in S. cerevisiae and S. paradoxus, and (ii) elements with a Ty1' type gag found only in S. cerevisiae.

Clustering of Ty1 pol (Figure 2C) revealed a topology similar to that of complete Ty1 sequences (Figure 2A) with two notable exceptions. First, the two elements that are recombinant in gag (Y12_f109 and S288c_f486) are both found within the Ty1' cluster in their pol regions, consistent with these elements being predominantly Ty1' except for parts of their gag genes (Sup Fig 2A, B). Second, the seven closely-related Y12 elements found on the long internal branch in the complete tree (arrowhead, Figure 2A) cluster in the Ty1' group in the pol tree (arrowhead, Figure 2C). This observation, in addition to the fact that these seven Y12 elements have a canonical Ty1 gag (arrowhead, Figure 2B), implies that they are recombinants [...]

We next attempted to interpret variation in canonical Ty1 mobility at the strain level with genomic Ty1 [...] close clustering with S. cerevisiae canonical Ty1 sequences (Figure 2B). This analysis revealed that restrictive strains from both S. cerevisiae and S. paradoxus contain one or more full-length elements that encode a canonical Ty1 type gag (S. cerevisiae: S288c, DBVPG6044 and Y12; S. paradoxus: CBS432 and N-44) (Table 3). Conversely, permissive strains only have full-length elements that encode a Ty1' type gag (S. cerevisiae: UWOPS83-787.3 and YPS606) or lack full-length Ty1 elements altogether (S. [...]

A confounding factor to the interpretation that a canonical Ty1 gag confers the restrictive phenotype is that there is substantial sequence divergence between canonical Ty1 and Ty1' not only in gag but also in the LTRs and pol (Figure 3A). However, the impact of divergence in gag can be distinguished from that of these other changes between canonical Ty1 and Ty1' using data from the restrictive strain Y12. Full-length elements in this strain are either pure Ty1' elements or "mosaic" Ty1 elements that have an essentially-complete [...] that the regions of high divergence between canonical Ty1 and Ty1' in gag (red, Figure 3A) and the middle part of pol (purple, Figure 3A) correspond exactly to regions of high sequence similarity between canonical Ty1 and S. paradoxus Ty1 (red and purple, Figure 3C). No such regions of high sequence similarity are observed between Ty1' and S. paradoxus Ty1 (Figure 3D). These results suggest that the extreme divergence between canonical Ty1 and Ty1' in gag that underlies variation in CNC phenotypes may have [...] Table 1. Several strains in the expanded dataset in addition to those noted above were found to be Ty1- [...]

Analysis of maximum likelihood phylogenetic trees from this expanded set of strains revealed strikingly discordant histories for Ty1 gag (Figure 4A) and pol (Figure 4B). Importantly, the phylogenetic history of the gag gene is not compatible with the accepted species tree for these taxa (Kellis et al. 2003). In the gag tree, S. cerevisiae Ty1 sequences are found in two well-supported monophyletic groups (brown background, Figure 4A). One S. cerevisiae gag clade is the sister group to the ancestor of all S. paradoxus Ty1 gag sequences and contains only elements with Ty1' type gag. The other S. cerevisiae gag clade, which includes the canonical Ty1-H3 tester element, is discordantly placed as being derived from the Old World clade of S. paradoxus Ty1 elements (black arrow, Figure 4A), with the closest affinity to elements from the European lineage of S. paradoxus represented by CBS432 (Sup Fig 5A). Importantly, all S. [...]
Divergence between canonical Ty1 and Ty1' Gag occurs outside functionally-characterized residues

Comparison of Ty1 genomic content with mobility phenotypes above revealed that strains encoding only Ty1' gag cannot strongly repress mobility of a canonical Ty1-H3 tester element (Table 1). These mobility assays imply that sequence divergence between canonical Ty1 and Ty1' in Gag may affect the ability of [...] Fisher's Exact Test). These results suggest that the Ty1' subfamily has the capacity to code for a p22-like molecule and that potential functional divergence between canonical Ty1 and Ty1' Gag occurs outside residues currently known to affect Ty1 protein function, maturation or resistance to p22. [...]

Our results also extend our understanding of the evolution of Ty1 in S. cerevisiae and S. paradoxus. Combined with previous work by Jordan and McDonald (1998; 1999b), our results suggest that the canonical Ty1 element used in most studies on Ty1 expression or function is a highly-derived element that acquired sequences from both S. paradoxus Ty1 and S. cerevisiae Ty2 in a human-associated environment.

How and when these events happened remain to be determined, although the importance of homologous recombination in both events is clear. Decoding the history of the canonical Ty1 element will need to explain the somewhat paradoxical observation that this subfamily confers strong repression against itself but also apparently has high fitness, as reflected by its high copy number in strains that carry this subfamily.

Understanding the history of these events may be challenging since the lack of overlap in the sequences acquired from S. paradoxus and Ty2 prevents their relative ordering, and the ongoing effects of