Diversity, taxonomy, and evolution of archaeal viruses of the class Caudoviricetes

The archaeal tailed viruses (arTV), evolutionarily related to tailed double-stranded DNA (dsDNA) bacteriophages of the class Caudoviricetes, represent the most common isolates infecting halophilic archaea. Only a handful of these viruses have been genomically characterized, limiting our appreciation of their ecological impacts and evolution. Here, we present 37 new genomes of haloarchaeal tailed virus isolates, more than doubling the current number of sequenced arTVs. Analysis of all 63 available complete genomes of arTVs, which we propose to classify into 14 new families and 3 orders, suggests ancient divergence of archaeal and bacterial tailed viruses and points to an extensive sharing of genes involved in DNA metabolism and counterdefense mechanisms, illuminating common strategies of virus–host interactions with tailed bacteriophages. Coupling of the comparative genomics with the host range analysis on a broad panel of haloarchaeal species uncovered 4 distinct groups of viral tail fiber adhesins controlling the host range expansion. The survey of metagenomes using viral hallmark genes suggests that the global architecture of the arTV community is shaped through recurrent transfers between different biomes, including hypersaline, marine, and anoxic environments.

In the case of Hafunaviridae, the host range largely depends on the variant of the virus-encoded tail adhesin. The adhesin-encoding genes are fast evolving, and frequent recombination in this region was observed among viruses from the same species as well as between viruses from different genera. As a result, there is considerable variation in the host ranges even for viruses belonging to the same species (this is discussed in the section "Mutations in tail fiber genes determine the broad host range of hafunaviruses" in the main text and associated Figure 5). It is more difficult to draw conclusions about the host range of other arTVs because of the limited number of members in these families, with some families including only one species (e.g., HATV-2 from Soleiviridae and HGTV-1 from Halomagnusviridae).
Unlike for hyperthermophilic archaeal viruses, for most arTVs there is no relationship between genetic closeness (i.e., membership in the same taxon) and the site of virus isolation. For six out of the seven viral families with more than one isolate (Hafunaviridae, Druskaviridae, Haloferuviridae, Graaviviridae, Vertoviridae and Leisingerviridae), members were isolated from two to five geographically remote locations (see column "virus origin" in Supplementary Table S1). In the families Hafunaviridae and Druskaviridae, which contain members belonging to the same species, nearly identical viruses were isolated from distant locations: for example, hafunavirus HRTV-10 was isolated from Israel, whereas HRTV-18 was isolated from Thailand; druskavirus HCTV-1 was isolated from Italy, whereas HCTV-16 was isolated from Thailand. Likewise, the susceptible hosts for arTVs also originate from geographically remote sites. The only case where viruses belonging to the same family were isolated from the same sampling site is that of HCTV-2 and HHTV-2 from Saparoviridae (both isolated from Samut Sakhon, Thailand).
Some of the isolated viruses encode integrases, e.g. HRTV-8, HRTV-26, HRTV-27. Do they integrate? Integrases are not discussed when discussing the genomic content of viruses. Do only viruses of a particular family encode integrases?
RESPONSE: To answer this question, we added a new subsection "Integrases" (lines 302-319) under the section "Gene content of archaeal tailed viruses" in the main text. All members of four viral families, namely Hafunaviridae (HF1-like), Graaviviridae (BJ1-like), Vertoviridae (phiCh1-like) and Leisingerviridae (psiM2-like), encode integrases. To assess their integration potential, we searched the available archaeal genomes in the NCBI database for proviruses that would be considered members of the four families based on the established demarcation criteria. We found proviruses from all four families (new Supplementary Table S9 and Fig. S6). Consistently, viruses from Hafunaviridae have been observed to form either clear (e.g. HRTV-27, HRTV-13, HSTV-4, etc.) or turbid (e.g. HRTV-26, HRTV-20, HCTV-7, etc.) plaques on the cell lawns of their natural hosts (Table 2 from Atanasova et al., 2015), suggesting that viruses of this family can undergo either lysogenic or lytic life cycles, although the exact regulation remains unclear. In addition, phiCh1 has been shown to be able to integrate into the host chromosome (Witte et al., 1997). Taken together, viruses from these four arTV families have the potential to integrate into the host chromosomes, presumably using the encoded integrases. Notably, in the Druskaviridae, only one member, HCTV-5, encodes a tyrosine recombinase. However, this protein is more closely related to the invertase of phiCh1-like viruses, which has been shown to be responsible for the inversion of the tail-fiber module (Klein et al., 2012). Consistently, no HCTV-5-like proviruses could be identified in the available archaeal genomes.
Interestingly, we identified proviruses (encoding integrases) related to Haloferuviridae and Anaerodiviridae (Supplementary Table S9, Fig. S6 in this study), although none of the currently isolated members of these families encodes an integrase, indicating that the integration module, and hence the integration ability, can occasionally be gained by arTVs. No proviruses related to viruses from the other eight families were identified, suggesting a strictly lytic life cycle for viruses from these families.
Other comment: Lines 290-297: The fact that HGTV-1 encodes a great number of tRNAs is already discussed in earlier publications. Are there any new conclusions, for example tRNAs enabling a broader host range?
RESPONSE: To answer this question, we evaluated the relationship between the number of tRNAs per genome and the determined host ranges of 13 myoviruses (including HGTV-1) from three families. The efficiency of plating of these viruses on 29 haloarchaeal strains belonging to five genera was tested previously (Table S3 from Atanasova et al., 2012). There is no obvious correlation between the number of virus-encoded tRNAs and the number of sensitive host strains (correlation r = -0.47) or between the number of viral tRNAs and the number of host genera (correlation r = 0.2) (see Figure 1 below). HGTV-1, which encodes the largest number of tRNAs among arTVs, did not have a broader host range than arTVs with fewer or even no tRNA genes. The analysis of codon usage of HGTV-1 versus that of its host Halogranum sp. SS5-1 would provide insight into the function of viral tRNAs. Unfortunately, the whole genome sequence of Halogranum sp. SS5-1 is currently not available.

RESPONSE: The colored empty fields have been tested but the infection of the corresponding strains was not observed. The description within Table S9 (now Table S10) has been modified: "Colored fields indicate that the infectivity of a virus has been tested on a particular host strain. In the case of successful infection, the efficiency of plating is indicated, whereas in the absence of infection, the field is left open".
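The correlation check between viral tRNA counts and host-range breadth described above can be illustrated with a minimal sketch. It assumes that "correlation r" refers to the Pearson coefficient, which the response does not state explicitly. The function name `pearson_r` and the input values are hypothetical placeholders, not the data from Atanasova et al. (2012) or from this study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder values (NOT the study's data):
# one entry per virus, tRNA genes per genome and number of sensitive strains.
trna_counts = [0, 1, 3, 5, 10]
sensitive_strains = [5, 3, 6, 2, 1]

r = pearson_r(trna_counts, sensitive_strains)
print(f"r = {r:.2f}")
```

With only 13 viruses in the actual comparison, a coefficient of moderate magnitude is weak evidence either way, which is consistent with the cautious wording "no obvious correlation".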
Reviewer #2: The paper by Lie et al. provides new and valuable insights into the diversity, function, and phylogenetic relationships among the Caudoviricetes class of viruses. The work reported here more than doubles our knowledge of these arTVs. It is remarkable that 63 arTV genomes can lead to the proposed formation of 14 new viral families. The phylogenetic analysis is well done, compelling, and supports the formation of these 14 new families. The presented (bioinformatic) annotation of genes in these new viruses is interesting and thoughtful without being overly speculative. The data presented on the virus isolates' host ranges and their correlation with tail fiber adhesin proteins are well done, although in retrospect, it is not all that surprising a finding. More interesting were the (likely) different origins of some of the arTV families and their distinct separation from tailed bacteriophages. Overall, this is an excellent manuscript.

RESPONSE: Thank you for the positive assessment of our work and for constructive suggestions.
I have only minor suggestions for improving this manuscript. They include the following.
1. I would tone down a bit how robust the taxonomic framework is (i.e. lines 99-101) given that some families are represented by only a single member. As the authors themselves state, there is likely much more diversity out there in this class of viruses (lines 424-425).
RESPONSE: We agree with the reviewer. The statement about the robustness of the taxonomic framework has been toned down (lines 99-101): "Collectively, our results provide the first global overview of arTV diversity and evolution and establish a taxonomic framework for their classification."

2. It would be useful if the authors provide more details and discussion of the arTV MCP and portal proteins. How distinct or not are the secondary structures of the 9 MCP clades? Likewise for the portal proteins. Are the two trees coherent with each other or not?

RESPONSE: To address this question, we performed phylogenetic congruence analysis and compared the MCP and portal trees (new Fig. S12). Generally, arTVs formed the same clades in the two trees, suggesting that portal and MCP proteins coevolved in arTVs and are rarely separated by recombination. The possible exceptions are HGTV-1 and ChaoS9, with the latter being notoriously chimeric (PMID: 30832293; see also Fig. S1G). With regard to the comparison of the secondary structures, we performed structural modeling using AlphaFold2 and RoseTTAFold for representatives of all 14 arTV families as well as selected uncultured arTVs. The structural models were then compared to each other in an all-against-all analysis, and a cladogram was derived from pairwise structural similarity (Z) scores (new Fig. 7). This analysis confirmed that all identified MCPs have the HK97 structural fold, shared with tailed bacteriophages and eukaryotic herpesviruses, and revealed the same 9 MCP clades obtained using sequence-based phylogenetic analysis. Besides subtle differences throughout the protein in different members, the most pronounced variation was present in the N-termini of the MCPs. In particular, some arTVs (e.g., HATV3, HCTV2, HFTV1, HVTV1, etc.) contained N-terminal extensions of 100-120 aa, equivalent to the scaffolding delta domain of the HK97 MCP, which is essential for capsid assembly and is cleaved from the mature MCP.
These results are also described in the Supplementary text.
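As an illustration of how a cladogram can be derived from an all-against-all matrix of structural similarity Z-scores, the following is a minimal sketch. The response does not say which clustering method or score transformation was used, so average-linkage (UPGMA) clustering and the Z-to-distance conversion below are assumptions made here. The function names, protein labels and scores are hypothetical and are not taken from Fig. 7.

```python
from itertools import combinations


def z_to_distance(z):
    """Turn a square matrix of pairwise similarity Z-scores into distances.

    Assumption: scores are symmetrised by averaging Z[i][j] and Z[j][i] and
    scaled by the geometric mean of the two self-comparison scores, so that
    identical structures end up at distance 0.
    """
    n = len(z)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sym = (z[i][j] + z[j][i]) / 2
        scale = (z[i][i] * z[j][j]) ** 0.5
        dist[i][j] = dist[j][i] = max(0.0, 1.0 - sym / scale)
    return dist


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples."""
    # Active clusters: id -> (subtree, number of leaves in the subtree)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of active clusters
        a, b = sorted(pair)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        del d[pair]
        # Distance from the merged cluster to every other cluster is the
        # size-weighted mean of the two old distances.
        for c in clusters:
            d_ac = d.pop(frozenset((a, c)))
            d_bc = d.pop(frozenset((b, c)))
            d[frozenset((next_id, c))] = (
                (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
            )
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical labels and Z-scores (NOT values from the study)
names = ["MCP_A", "MCP_B", "MCP_C", "MCP_D"]
z_scores = [
    [40, 30, 10, 8],
    [30, 40, 9, 7],
    [10, 9, 40, 25],
    [8, 7, 25, 40],
]

tree = upgma(names, z_to_distance(z_scores))
print(tree)
```

In this toy input, MCP_A groups with MCP_B and MCP_C with MCP_D because those pairs have the highest mutual scores. Comparing such structure-based groupings with the sequence-based clades is the kind of check the response describes.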