Revisiting evolutionary trajectories and the organization of the Pleolipoviridae family

Archaeal pleomorphic viruses belonging to the Pleolipoviridae family represent an enigmatic group as they exhibit unique genomic features and are thought to have evolved through recombination with different archaeal plasmids. However, most of our understanding of the diversity and evolutionary trajectories of this clade comes from a handful of isolated representatives. Here we present 164 new genomes of pleolipoviruses obtained from metagenomic data of Australian hypersaline lakes and publicly available metagenomic data. We perform a comprehensive analysis of the diversity and evolutionary relationships of the newly discovered viruses and previously described pleolipoviruses. We propose to classify the viruses into five genera within the Pleolipoviridae family, with one new genus represented only by virus genomes retrieved in this study. Our data support the current hypothesis that pleolipoviruses reshaped their genomes through recombining with multiple different groups of plasmids, which is reflected in the diversity of their predicted replication strategies. We show that the proposed genus Epsilonpleolipovirus has evolutionary ties to pRN1-like plasmids from Sulfolobus, suggesting that this group could be infecting other archaeal phyla. Interestingly, we observed that the genome size of pleolipoviruses is correlated with the presence or absence of an integrase. Analyses of the host range revealed that all but one virus exhibit an extremely narrow range, and we show that the predicted tertiary structure of the spike protein is strongly associated with the host family, suggesting a specific adaptation to the host S-layer glycoprotein organization.

For the results to be actually useful to others, the authors have to either deposit the identified genomes in GenBank (as third-party annotations in the case of IMG/VR contigs and as new entries for those assembled from the new data) or, at the very least, provide all the genomes as GenBank- and FASTA-formatted files in the supplementary information. Currently, the authors have only deposited the raw reads, which is good, but not sufficient. If I or someone else were to reassemble the reads, we would obtain a different set of contigs and would have no way of comparing them with those reported in this manuscript. Also note that even if the contigs can be downloaded from IMG/VR, they are not annotated, and gene calling could produce different results in other studies. Thus, please provide all the data in one place so that others can build upon it rather than regenerating it from scratch.

The manuscript is too long and could be shortened by removing some generic speculations or moving some sections to the supplement (intended for the biggest pleolipovirus enthusiasts). For instance, "Frequent recombination and gene loss…" is one such section, as it presents few new insights and repeats what was said earlier in the manuscript.

Response: Thank you for this suggestion. We moved the suggested section to the Supplement and shortened the remaining manuscript where possible.
It is not clear why the authors chose not to trim the alignments (line 168), contrary to the widely accepted standards in phylogenetics. This decision might have affected the results and is particularly pertinent to the authors' claim that VP4 is a bad phylogenetic marker. Trimming is expected to affect short and long proteins differently, which could explain the lack of congruence between the phylogenies produced for different proteins. The phylogenetic analyses have to be repeated with properly trimmed alignments to exclude this possibility.
Response: Phylogenetic analyses were performed both with and without alignment trimming. We tested two trimming tools independently, trimAl (https://doi.org/10.1093/bioinformatics/btp348) and ClipKIT (https://doi.org/10.1371/journal.pbio.3001007), with almost identical results. The lack of congruence previously reported for ORFs 4 (spike protein) and 6 (unknown function) was consistent regardless of the approach used. We presented the results from untrimmed alignments in the previous version of the manuscript because it is still debated in the literature whether some trimming strategies can introduce a bias (https://doi.org/10.1093/sysbio/syv033). Nevertheless, the main manuscript and figures now include the trees built from trimmed alignments (Figures 3 and 4), and the trimming strategy is specified in the Methods section.
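For readers less familiar with alignment trimming, the idea can be illustrated with a minimal Python sketch. It is not the algorithm implemented in trimAl or ClipKIT, and it is not the procedure used in the manuscript; it simply drops alignment columns whose gap fraction exceeds a cutoff. The function name `trim_gappy_columns`, the 0.5 cutoff and the toy sequences are assumptions made for illustration only.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    `alignment` maps sequence names to aligned sequences of equal length.
    Hypothetical helper for illustration; not trimAl or ClipKIT.
    """
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # Keep a column only if the share of gap characters is at or below the cutoff.
    keep = [
        col
        for col in range(length)
        if sum(s[col] == "-" for s in seqs) / n <= max_gap_fraction
    ]
    return {name: "".join(seq[c] for c in keep) for name, seq in alignment.items()}


# Toy alignment (made-up sequences, not data from the study).
toy = {
    "virus_A": "MK-LV--A",
    "virus_B": "MKTLV--A",
    "virus_C": "MK-LVG-A",
    "virus_D": "M--LV--A",
}
trimmed = trim_gappy_columns(toy, max_gap_fraction=0.5)
for name, seq in trimmed.items():
    print(name, seq)
```

In this toy case three of the eight columns are removed. Because a short protein has fewer columns to begin with, losing a similar number of columns removes a larger share of its signal, which is the concern raised in the comment above.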

L206-230: The presented data does not quite convince me that "Novel pleolipovirus-like elements from Australian salt lakes reveal the preference for a productive life cycle". Most of the contigs are very short (cutoff of 2 kb), which might not be sufficient to assess whether an element is integrated or extracellular.
Response: We have removed this section. We agree that more evidence is required to sustain such a claim.

224-230: This part is overly speculative and obscure given that the impact of pleolipoviruses on host expression has been shown only for one virus, with all other isolates being apparently rather benign.
Response: We have removed this paragraph. We agree that more evidence from other pleolipoviruses and their impact on metabolic remodeling is required to sustain these claims.
L279: If I am not mistaken, vContact genus-level clusters are identified when 3 genes are shared. The authors settled for 2 shared genes. If so, the genus-level calibration might be off.
Response: You are correct. We intentionally modified the threshold for the network analysis in vContact. The pleolipovirus genomes show very low levels of similarity at the amino acid sequence level; even for homologous proteins, the sequence identity is often below the thresholds used for BLAST analysis. Additionally, pleolipoviruses have on average much smaller genomes than the bacteriophages for which the software was initially designed. These factors tend to artificially weaken the signal and thus produce a large number of small genus-level clusters. With the 3-gene threshold, many of the connections present in the 2-gene analysis are lost, in particular between the more divergent pleolipovirus genomes, e.g. the Gammapleolipovirus (see new Supplementary Figure 3). Nonetheless, the overall patterns of the network remain unchanged.
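The effect of the shared-gene threshold on a small-genome network can be illustrated with a toy Python sketch. It counts raw shared protein clusters between genome pairs and is not how vContact scores or clusters genomes; the genome names, protein-cluster labels and both helper functions are hypothetical and serve only to show why raising the threshold from 2 to 3 fragments a network of genomes that carry few genes.

```python
from itertools import combinations


def shared_cluster_edges(genomes, min_shared):
    """Return genome pairs sharing at least `min_shared` protein clusters."""
    edges = set()
    for a, b in combinations(sorted(genomes), 2):
        if len(genomes[a] & genomes[b]) >= min_shared:
            edges.add((a, b))
    return edges


def connected_components(nodes, edges):
    """Group nodes into connected components with a depth-first search."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy genomes described by the protein clusters (PCs) they encode (made up).
toy_genomes = {
    "g1": {"PC1", "PC2", "PC3", "PC4"},
    "g2": {"PC1", "PC2", "PC3", "PC5"},
    "g3": {"PC3", "PC4", "PC6"},
    "g4": {"PC5", "PC6", "PC7"},
}

for threshold in (2, 3):
    edges = shared_cluster_edges(toy_genomes, threshold)
    groups = connected_components(toy_genomes, edges)
    print(threshold, sorted(edges), len(groups))
```

In this toy example the 2-gene threshold links g1, g2 and g3 into one group, whereas the 3-gene threshold keeps only the g1-g2 link and leaves g3 isolated.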
For the proposed taxonomy, we do not use the vContact analysis as the sole criterion, but rather the overall results from the different approaches. Thus, we would not recommend splitting the family into a myriad of genera without stronger biological evidence to support it. Naturally, we cannot exclude that the taxonomy will need further revision as additional representatives of the family are discovered, but we believe the current evidence strongly supports our proposal. We have now included the analysis using the 3-gene threshold in the supplement and discuss the results accordingly.
L290-291: I am really not fond of this sentence: "Altogether, this indicates that the diversity within the Pleolipoviridae family could be greater than previously thought and further challenges the current classification." It is obvious that diversity within any given virus family is greater than currently sampled and that classification has to be constantly revised and expanded once new viruses are discovered. Same comment for the statement on L346, "greater than previously thought": thought by whom? I personally think that even after this study we are still very far from appreciating the true genomic diversity of these viruses.
Response: We have considered the reviewer's suggestion, and the statements have been removed from the text.

Response: The data analyzed in this study are available in an online repository (Digital Object Identifier: 10.5281/zenodo.8248829). The data include the individual assembled virus genomes and contigs used in this study (PV_fasta.tar.gz) and the protein predictions (PV_fna.tar.gz and PV_faa.tar.gz for the nucleotide and amino acid sequences of the predicted genes, respectively)