Plant virus movement proteins originated from jelly-roll capsid proteins

Numerous, diverse plant viruses encode movement proteins (MPs) that aid virus movement through plasmodesmata, the plant intercellular channels. MPs are essential for virus spread and propagation in distal tissues, and several unrelated MPs have been identified. The 30K superfamily of MPs (named after the molecular mass of tobacco mosaic virus MP, the classical model of plant virology) is the largest and most diverse MP variety, represented in 16 virus families, but its evolutionary origin has remained obscure. Here, we show that the core structural domain of the 30K MPs is homologous to the jelly-roll domain of the capsid proteins (CPs) of small RNA and DNA viruses, in particular, those infecting plants. The closest similarity was observed between the 30K MPs and the CPs of the viruses in the families Bromoviridae and Geminiviridae. We hypothesize that the MPs evolved via duplication or horizontal acquisition of the CP gene in a virus that infected an ancestor of vascular plants, followed by neofunctionalization of one of the paralogous CPs, potentially through the acquisition of unique N- and C-terminal regions. During the subsequent coevolution of viruses with diversifying vascular plants, the 30K MP genes underwent explosive horizontal spread among emergent RNA and DNA viruses, likely permitting viruses of insects and fungi that coinfected plants to expand their host ranges, molding the contemporary plant virome.

RESPONSE: We greatly appreciate the constructive comments of both reviewers and have revised our manuscript taking all of them into consideration. In particular, we now provide the per-residue local distance difference test (lDDT) plots for both AlphaFold and RoseTTAFold models in new Supplementary Figure 1. In addition, we provide all models generated in our study in Supplementary Data 1. The statements on the origin of MPs in geminiviruses have been toned down both in the text and in Figure 6. We have also combined Results and Discussion, as suggested by Reviewer 2. Finally, as suggested by Reviewer 2, we performed structural alignments of capsid and movement proteins and calculated the corresponding tree using MUSTANG. These new results fully support our original conclusions. We also note that while the manuscript was under review, genomes of several viruses associated with lower vascular plants (ferns) and non-vascular plants (mosses and liverworts) were reported (Viruses 2023, 15, 840; https://doi.org/10.3390/v15040840). Analyses of their 30K MPs have also been included in the revised manuscript. Our point-by-point answers to the reviewers' comments are appended below.

Reviewer #1:
The 30K family is the most prevalent among the movement proteins (MPs) of extremely diverse plant viruses that affect a broad variety of wild and crop plants alike. Despite decades of effort, however, the molecular mechanisms whereby 30K MPs empower plant virus infection remain poorly understood. In this respect, the work by Butkovic et al. represents a sea change by revealing 3D structures of a broad variety of 30K MPs. Methodologically, the authors used the most advanced bioinformatics tools available for the prediction and comparison of protein structures (AlphaFold2, RoseTTAFold, DALI), as well as a variety of more traditional, complementary approaches for the sequence and phylogenetic analyses. Furthermore, they generated a comprehensive database of 30K MPs that provides an important resource for the entire plant virology community.
This work compellingly demonstrates the single jelly-roll (SJR) fold of the structural core of all 30K MP subfamilies. The authors also analyzed additional structural elements, such as N- and C-terminal core extensions in the 30K MPs of distinct subfamilies. Taken together, the findings of this study are certain to dramatically facilitate experimental analysis of the structure-to-function relationships within 30K MPs.
Finally, this work reports a largely unexpected evolutionary discovery: the origin of the 30K MPs from the SJR capsid proteins of the icosahedral RNA and DNA viruses, representing an example of the functional repurposing of the virus proteins during expansion to a novel ecological niche. RESPONSE: We thank the reviewer for the kind words and useful suggestions.
What this reviewer is a bit less enthusiastic about is the suggestion that the geminiviruses could be the original virus lineage in which SJR CP gene duplication and re-functionalization occurred. From studies of the global distribution of virus families, it appears that the host range and geography of geminiviruses were historically limited mostly to tropical areas. This distribution expanded dramatically into temperate regions in the second half of the 20th century, likely owing to the expansion of the geographical range of whiteflies (the most common geminivirus vectors) along with global warming. Accordingly, it seems unlikely that the ssDNA geminiviruses, which represent a relatively minor part of the plant virome, were the original source of 30K MP emergence. Because the large majority of plant virus families possessing diverse lineages of 30K MPs are RNA viruses, it seems more plausible that these MPs emerged among the RNA viruses.
RESPONSE: We agree with the reviewer that the evidence for the origin of 30K MPs in geminiviruses is not very strong. In the revised version of the manuscript, this claim has been toned down (in the text and in the figure depicting the evolutionary scenario), and the alternative possibility, namely, the more parsimonious emergence of the MP in RNA viruses, has been presented.

Reviewer #2:
Butkovic et al. have studied the structural similarity of the jelly-roll motif and the evolutionary relationship between movement proteins (MPs) encoded by diverse RNA/DNA viruses (especially plant viruses) and capsid proteins (CPs) of ssRNA and ssDNA viruses. The results largely depend on structure predictions, made with the recently developed AI-based tools AlphaFold2 and RoseTTAFold, of MPs whose structures have not been experimentally resolved. Based on the authors' structure-based alignments and similarity analysis, the previously recognized conserved secondary structure and D-motif in MPs and CPs are deeply and comprehensively analyzed, which could explain their common evolutionary origin. Butkovic et al. also propose a likely evolutionary scenario of MP gene acquisition from a CP gene.
The addressed question is very interesting for understanding the function and origin of the 30K MP superfamily in diverse viruses, and the hypothesized evolutionary scenario is likely. Since the jelly-roll fold and D-motif of the MPs have already been mentioned in previous papers, the main contribution of the manuscript is the thorough analysis of the predicted MP and CP structures in diverse viruses. However, the main bottleneck is that no experimental MP structure is available for confirming the hypothesis and the results. Although I agree that AI-based structural prediction is a strong and useful tool for performing structure-based classifications, the results should be validated and interpreted cautiously so as not to be misled by the predicted models. Therefore, there are major and minor comments to be addressed. RESPONSE: We thank the reviewer for the constructive comments and suggestions.
Major and technical issues 1. Many parts of the discussion and speculation are found in the Results section, which is lengthy and therefore hard to read. I highly recommend reorganizing "Results" and "Discussion" or using the "Results and Discussion" option. The same or similar discussions are found in both the Results and the Discussion sections; they should be integrated into one. RESPONSE: We have combined the Results and Discussion, as suggested, and hope that the revised text is now more balanced.
2. Fig. 2A and lines 196-205. The most difficult part for me to review is that there are no data available for assessing the accuracy of the MP structure predictions. AlphaFold2 automatically generates a per-residue pLDDT score (e.g., Tunyasuvunakool et al., 2021, DOI: 10.1038/s41586-021-03828-1) to validate the predicted structures. The jelly-roll core of the MPs seems to be well predicted; however, the loops and the N-/C-terminal parts most likely are not. Such validation data should be included in the manuscript and described in the main text. RESPONSE: Thank you for raising this valid point. We now provide pLDDT plots for all analyzed models in new Supplementary Figure 1. As we originally pointed out in the text, the N- and C-terminal regions are largely unstructured, and the quality of the models in these regions is poor. Hence, the structural details of these regions were considered in neither the original nor the revised version of the manuscript. The model quality assessment is now described in the main text, as requested by the reviewer.
Also, it is mentioned that RoseTTAFold was used when AlphaFold2 showed poor lDDT results, but how, then, can one verify that the RoseTTAFold models were accurate? RESPONSE: The end-to-end version of RoseTTAFold provides structural models with the estimated residue-wise CA-lDDT in the B-factor column. The lDDT plots for RoseTTAFold and AlphaFold models are now provided in the supplementary information (new Supplementary Figure 1). In addition, we provide all models generated in our study in Supplementary Data 1.
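The confidence check discussed in this exchange can be repeated on any of the supplied models. The following is a minimal Python sketch, not the pipeline used in the study. It assumes PDB-format files in which each residue's confidence score (AlphaFold2 pLDDT on a 0-100 scale, or RoseTTAFold CA-lDDT) is stored in the B-factor column of its CA atom. The function names are hypothetical, and the default cut-off of 70 (the lower bound of the "confident" band used by the AlphaFold Protein Structure Database) is an illustrative choice. If a model stores lDDT on a 0-1 scale, passing `scale=100` puts it on the pLDDT scale.

```python
from __future__ import annotations


def per_residue_confidence(pdb_text: str, scale: float = 1.0) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) to the B-factor of that residue's CA atom.

    Assumes the model stores a per-residue confidence score (pLDDT or lDDT)
    in the B-factor column; `scale` rescales it (e.g. 100 for a 0-1 lDDT).
    """
    scores = {}
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns (1-based): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66; the slices below are 0-based.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66]) * scale
    return scores


def low_confidence_segments(scores: dict[tuple[str, int], float],
                            threshold: float = 70.0) -> list[tuple[str, int, int]]:
    """Group residues scoring below `threshold` into contiguous (chain, start, end) runs."""
    segments: list[tuple[str, int, int]] = []
    for chain, resnum in sorted(scores):
        if scores[(chain, resnum)] >= threshold:
            continue
        if segments and segments[-1][0] == chain and segments[-1][2] == resnum - 1:
            # Extend the current low-confidence run by one residue.
            segments[-1] = (chain, segments[-1][1], resnum)
        else:
            # Start a new run.
            segments.append((chain, resnum, resnum))
    return segments
```

For example, `low_confidence_segments(per_residue_confidence(open("model.pdb").read()))` (with `model.pdb` as a placeholder file name) lists the runs of residues below the cut-off, which can then be compared with the boundaries of the jelly-roll core.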
3. Fig. 3 and lines 206-238. It is not clear which parts of the predicted structures were used for the DALI calculation. If the entire predicted structures were used, the obtained DALI scores would be largely affected by badly predicted loops or N- and C-terminal regions. This is crucial for discussing the structural similarity between the MPs and the CPs. RESPONSE: We apologize for not making this clearer in the original version of the manuscript. For the dendrogram calculation with DALI, only the jelly-roll domain was considered; the N-terminal regions, which have no counterparts in the capsid proteins, were removed prior to comparison. Thus, all reported Z scores correspond to pairwise comparisons of the jelly-roll domains.
Could you generate a structure-based phylogeny using only the accurately predicted jelly-roll domains of the MPs and the CPs? Also, DALI is a good starting point for identifying similar structures but is not suitable for generating accurate multiple structural alignments. There are better alignment tools (e.g., HSF, SHP, MUSTANG) for generating an RMSD-based phylogeny, as seen in previously published papers (e.g., Riffel et al., 2002, DOI: 10.1016/S0969-2126(02)00896-1; Wang et al., 2014, DOI: 10.1038/nature13806). One should be very careful in concluding from the predicted structures that the geminivirus CP is the origin of the MPs.
RESPONSE: Thank you for this suggestion. In the revised manuscript, we used MUSTANG to generate an RMSD-based tree of SJR CPs and MPs. The tree is largely congruent with the dendrogram based on the DALI Z-scores, with minor differences. Essentially, as before, the MPs emerge from within the CP diversity and form a sister group to the CPs of plant RNA and DNA viruses (geminiviruses, nanoviruses, bromoviruses, satellite viruses). The only difference is that the CP of SPMV is nested among the MPs. This is likely due to the poor representation of this CP group (only one structure) and its simple organization, which indeed closely resembles that of the MPs. We discuss these findings in the revised text, and the MUSTANG tree is now provided as new Supplementary Figure 3.
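An RMSD-based tree of the kind described here is built from pairwise distances between structures. This section does not state which clustering method was applied to the MUSTANG output, so the sketch below simply assumes that a symmetric matrix of pairwise RMSD values (in Å) between jelly-roll domains is already available and clusters it with UPGMA (average linkage), one common choice for distance-based dendrograms; neighbor joining would be an equally reasonable alternative. The function name and the toy distances are illustrative only.

```python
from __future__ import annotations

from itertools import combinations


def upgma_newick(labels: list[str], dist: list[list[float]]) -> str:
    """Cluster a symmetric distance matrix (e.g. pairwise RMSD) with UPGMA.

    Returns a Newick string. Each merge places the new node at half the merge
    distance, so the resulting tree is ultrametric.
    """
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (label, 1, 0.0) for i, label in enumerate(labels)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): float(dist[i][j]) for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, n_a, h_a = clusters.pop(a)
        tree_b, n_b, h_b = clusters.pop(b)
        height = d_ab / 2.0
        merged = f"({tree_a}:{height - h_a:.3f},{tree_b}:{height - h_b:.3f})"
        del d[(a, b)]
        # Average linkage: distance to the new cluster is the size-weighted mean.
        for k in clusters:
            d_ak = d.pop((min(a, k), max(a, k)))
            d_bk = d.pop((min(b, k), max(b, k)))
            d[(k, next_id)] = (d_ak * n_a + d_bk * n_b) / (n_a + n_b)
        clusters[next_id] = (merged, n_a + n_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"
```

For instance, three structures with RMSD(A,B) = 2 Å and RMSD(A,C) = RMSD(B,C) = 6 Å give `(C:3.000,(A:1.000,B:1.000):2.000);`, with A and B joined first. Because UPGMA assumes an ultrametric tree, its topology can differ from that of a neighbor-joining tree computed on the same matrix.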