Peer Review History

Original Submission - October 16, 2025
Decision Letter - Yong Wang, Editor

Dear Dr. Levy,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Dec 26 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

We look forward to receiving your revised manuscript.

Kind regards,

Yong Wang

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following financial disclosure:

“NIH Grant R35 GM132090”

Please state what role the funders took in the study.  If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript."

If this statement is not correct you must amend it as needed.

Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work was supported in part by NIH Grant R35 GM132090.”

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“NIH Grant R35 GM132090”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

5. Please provide a complete Data Availability Statement in the submission form, ensuring you include all necessary access information or a reason for why you are unable to make your data freely accessible. If your research concerns only data provided within your submission, please write "All data are in the manuscript and/or supporting information files" as your Data Availability Statement.

6. If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.


Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: The paper is about the benchmark of mutation ddG on two software platforms (FEP+ and GROMACS), on two protein systems.

1. Correlation should not be the main metric of comparison. RMSE, Kendall tau, or AUE (already used in the Results and Discussion) can also be used to evaluate the quality of free energy calculations. Moreover, the correlation of ddG is not well defined, for the following reason: there is not really a direction of A to B versus B to A, but plotting some ddG edges as A to B or as B to A will give different correlation analysis results.

2. In the Methods section "FEP calculations using GROMACS", please explain more clearly:
(1) Which soft-core potential is used, Beutler or Gapsys?
(2) Is the soft-core potential applied to both vdW and Coulomb interactions, or vdW only?
(3) "from λ = 0 to λ = 26 ..." is confusing and inconsistent with equation 4, in which λ ∈ [0,1].
(4) Are those 27 windows simulated separately or together with Hamiltonian replica exchange? Hamiltonian replica exchange is an enhanced sampling method available in the native GROMACS mdrun; it does not affect simulation throughput while bringing in sampling enhancement. If it was not used, please justify the reason for not using it.
(5) "AMBER99sb*ILDN" was stated in the Methods, but "Amber99SB-ILDN" was stated in the abstract.
(6) How was the charge neutralized when a charge mutation was applied?
(7) Which ion parameters were used?
(8) Please clarify the exact lambda schedule, for example: 0.0, 0.05, ... 1.00 for electrostatics ...
(9) As a benchmark, all input files (structures, GROMACS mdp files for every step), analysis files, and results files (i.e., ddG values in a CSV file) should be provided, either in the SI or in an openly available repository. Not only should the free energy estimates be reported, but also the uncertainty of each estimate.

3. Please make the x-y axis aspect equal when comparing experimental and computed delta G, as 1 kcal/mol in the experiment means the same as 1 kcal/mol in the computation.

4. The two platforms (FEP+ and GROMACS) use different force fields, enhanced sampling algorithms, and sampling times. From the benchmark results, can we get some insight into the advantages and disadvantages of OPLS4 vs. AMBER99*ILDN, and of REST2 with 16/24 lambda windows vs. 27 windows without replica exchange?

5. Does the magnitude of the error correlate with the type of mutation (size of the amino acid side chain, neutral or charged mutation)? Does the error correlate with the location of the mutation (on the surface of the protein or in the core)? With this benchmark, can we get some insight into what kinds of mutations are more difficult to calculate with alchemical free energy calculations?

Reviewer #2: This manuscript presents a systematic benchmarking study comparing FEP calculations using Schrödinger FEP+ and GROMACS platforms for predicting mutation-induced stability changes in S. nuclease and T4 lysozyme. The work is methodologically sound and addresses an important gap in the field by directly comparing two widely-used computational platforms. However, several areas require clarification and improvement before publication.

(1) The manuscript reports good correlations between calculated and experimental ΔΔG values. However, I notice that no statistical uncertainties (e.g., standard deviations or confidence intervals) are provided for the computed free energy changes. Given the stochastic nature of molecular dynamics simulations, it would be valuable to assess the statistical reliability of these predictions. Could the authors clarify whether multiple independent simulations were performed for each mutation? If not, it would be feasible to estimate statistical uncertainties using block averaging or bootstrap methods.

(2) The study employs different numbers of λ-windows between the two platforms: 16-24 for Schrödinger and 27 for GROMACS. I am curious about the specific λ-scheduling strategies used in each platform. Could the authors provide details on: (1) the exact distribution of λ values (uniform vs. non-uniform spacing), (2) whether additional λ-windows were placed in critical regions where electrostatic-to-van der Waals transitions occur, and (3) how the different λ-schedules might affect the precision of free energy integration, particularly in regions of poor phase space overlap? Including supplementary data showing dH/dλ distributions or overlap matrices would help readers assess the adequacy of sampling at each λ-window.

(3) I note that Schrödinger (using MBAR) and GROMACS (using BAR) utilized different free energy estimators. While MBAR is theoretically advantageous, its performance often relies on sufficient λ-window sampling. Did the authors evaluate the specific impact of this methodological difference on the comparative results? For instance, would post-processing the GROMACS data with MBAR yield results significantly different from those obtained with BAR?

(4) I notice that the simulation setup differs between platforms regarding the solvent box size: a 5 Å buffer for Schrödinger versus a 10 Å buffer for GROMACS. The authors need to clarify the rationale behind this difference.

(5) The conclusions of the manuscript heavily rely on the convergence of the simulations. Beyond simulation length, the most compelling evidence would be the time-evolution of the free energy for key calculations. The authors should provide convergence plots of ΔΔG vs. simulation time for some representative mutants (e.g., the best and worst predicted cases) to visually demonstrate the convergence and stability of the calculations.

(6) The dataset includes charge-changing mutations, which introduce net charge artifacts under periodic boundary conditions with PME electrostatics—a well-documented source of systematic error in alchemical free energy calculations. I note that the manuscript briefly mentions the use of a "co-alchemical water" approach in Schrödinger to mitigate this issue, but provides no details on its implementation or validation. More critically, there is no discussion of how GROMACS handles net charge corrections for the same mutations. Could the authors clarify: (1) the exact protocol for the co-alchemical water method and whether it fully compensates for finite-size artifacts, (2) whether GROMACS employs any net charge correction schemes (e.g., the analytical correction by Rocklin et al., J. Chem. Theory Comput. 2013, 9, 3072-3083) or simply relies on a neutralizing background plasma, and (3) how these potentially different treatments between platforms might systematically bias the comparative results? This concern is particularly important given the different box sizes noted in point (4), as finite-size effects scale with system size. I would recommend either: (a) a sensitivity analysis quantifying the magnitude of net charge artifacts for a subset of charge-changing mutations, or (b) a thorough literature-based justification demonstrating that the chosen approaches yield negligible systematic errors under the simulation conditions employed.

(7) A fundamental methodological concern is that this study simultaneously compares two different software platforms, two different force fields (OPLS4 in Schrödinger vs. AMBER99SB*ILDN in GROMACS), two different water models (SPC vs. TIP3P), and different enhanced sampling strategies (REST2 in Schrödinger vs. standard MD in GROMACS). This multifactorial design makes it challenging to disentangle whether the observed differences in performance arise from: (1) software implementation details, (2) force field parameterization, (3) water model effects, or (4) sampling methodology. While I recognize that systematically testing all possible combinations may be beyond the scope of this benchmarking study, this confounding factor represents a significant limitation that should be explicitly acknowledged in both the Discussion and Limitations sections.

**********

Do you want your identity to be public for this peer review? If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Chenggong Hui

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

To ensure your figures meet our technical requirements, please review our figure guidelines: https://journals.plos.org/plosone/s/figures

You may also use PLOS’s free figure tool, NAAS, to help you prepare publication quality figures: https://journals.plos.org/plosone/s/figures#loc-tools-for-figure-preparation.

NAAS will assess whether your figures meet our technical requirements by comparing each figure against our figure specifications.

Attachments
Attachment
Submitted filename: review.docx
Revision 1

Authors: We thank both reviewers for the time and effort spent on reviewing our manuscript. Their comments/questions have been very helpful for improving the paper.

Reviewer #1: The paper is about the benchmark of mutation ddG on two software platforms (FEP+ and GROMACS), on two protein systems.

1. Correlation should not be the main metric of comparison. RMSE, Kendall tau, or AUE (already used in the Results and Discussion) can also be used to evaluate the quality of free energy calculations. Moreover, the correlation of ddG is not well defined, for the following reason: there is not really a direction of A to B versus B to A, but plotting some ddG edges as A to B or as B to A will give different correlation analysis results.

Authors: We agree that correlation alone does not fully capture the quality of ΔΔG predictions. In response, we have added RMSE, Kendall's τ, and AUE as additional metrics to assess the accuracy and consistency of the free energy calculations across platforms, providing a more comprehensive and direction-independent evaluation of performance. The RMSE and Kendall tau values for the comparisons between experiment and simulation are listed in Tables S1-S4 and have also been added on page 9 (lines 314-320), page 10 (lines 341-347), page 10 (lines 361-366), and page 11 (lines 384-391). All these lines are highlighted in yellow in the main manuscript. The correlation of ΔΔG (wild type → mutant) quantifies the agreement between computational predictions and experimental measurements of the Gibbs free energy change upon mutation.
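
To make the direction-independent metrics concrete, here is a minimal sketch of how RMSE, AUE, Kendall tau, and Pearson r can be computed side by side; the `ddg_metrics` helper and the ΔΔG values are hypothetical illustrations, not data from the manuscript.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr

def ddg_metrics(exp, calc):
    """Direction-independent error metrics for ddG predictions (kcal/mol)."""
    exp, calc = np.asarray(exp, float), np.asarray(calc, float)
    diff = calc - exp
    return {
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "AUE": float(np.mean(np.abs(diff))),             # average unsigned error
        "kendall_tau": float(kendalltau(exp, calc)[0]),  # rank agreement
        "pearson_r": float(pearsonr(exp, calc)[0]),      # linear correlation
    }

# Hypothetical experimental vs. computed ddG values for four mutations
metrics = ddg_metrics([0.5, 1.2, 2.8, 3.4], [0.7, 1.0, 3.1, 3.0])
```

Unlike Pearson correlation, RMSE and AUE are unchanged if any edge is reported in the opposite direction on both axes, which is why they complement correlation here.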

2. In the Methods section "FEP calculations using GROMACS", please explain more clearly:
(1) Which soft-core potential is used, Beutler or Gapsys?
(2) Is the soft-core potential applied to both vdW and Coulomb interactions, or vdW only?
(3) "from λ = 0 to λ = 26 ..." is confusing and inconsistent with equation 4, in which λ ∈ [0,1].
(4) Are those 27 windows simulated separately or together with Hamiltonian replica exchange? Hamiltonian replica exchange is an enhanced sampling method available in the native GROMACS mdrun; it does not affect simulation throughput while bringing in sampling enhancement. If it was not used, please justify the reason for not using it.
(5) "AMBER99sb*ILDN" was stated in the Methods, but "Amber99SB-ILDN" was stated in the abstract.
(6) How was the charge neutralized when a charge mutation was applied?
(7) Which ion parameters were used?
(8) Please clarify the exact lambda schedule, for example: 0.0, 0.05, ... 1.00 for electrostatics ...
(9) As a benchmark, all input files (structures, GROMACS mdp files for every step), analysis files, and results files (i.e., ddG values in a CSV file) should be provided, either in the SI or in an openly available repository. Not only should the free energy estimates be reported, but also the uncertainty of each estimate.

Authors: The Methods section has been revised to clarify all points as follows:

Soft-core potential: The Beutler soft-core potential (Beutler et al., Chem. Phys. Lett. 1994) is used in GROMACS.

Application: The soft-core potential was applied to both van der Waals and Coulomb interactions to ensure a smooth decoupling process.

λ notation: We have clarified the text to read “from λ index 0 to λ index 26”, corresponding to 27 windows over λ ∈ [0,1].

Replica exchange: Hamiltonian replica exchange was not used; the 27 windows were simulated separately.

Force field naming: The inconsistency has been corrected to “AMBER99sb*ILDN” throughout the manuscript.

Charge neutralization: For charge-changing mutations, a counter-ion (Na⁺ or Cl⁻) was mutated to water to maintain charge neutrality.

Ion parameters: Na⁺ and Cl⁻ ion parameters compatible with the TIP3P water model were used, consistent with the AMBER99sb*ILDN force field.

λ schedule: The λ schedule has been added explicitly in the revised Methods section, for Schrödinger and GROMACS on pages 6 and 7, at lines 218-219 and 244-246, respectively.

Data availability: All input files (system structures, GROMACS input and output files, analysis scripts, and ΔΔG results with uncertainties) are stored on our laboratory server. We will make them available upon request or upload them to an open-access repository in the final submission to ensure full reproducibility.
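
As a side illustration of the λ-window bookkeeping described above, a uniform 27-window schedule matching the "λ index 0 to 26" convention could be generated as follows; this is a sketch only, and the actual spacing used in the revised Methods may be non-uniform.

```python
import numpy as np

# 27 lambda windows (indices 0..26) spanning lambda in [0, 1]
n_windows = 27
lambdas = np.linspace(0.0, 1.0, n_windows)

# Formatted as a GROMACS mdp 'fep-lambdas' line (uniform spacing assumed)
fep_lambdas_line = "fep-lambdas = " + " ".join(f"{lam:.4f}" for lam in lambdas)
```

Each window then corresponds to one `init-lambda-state` value between 0 and 26 in the per-window mdp files.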

3. Please make the x-y axis aspect equal when comparing experimental and computed delta G, as 1 kcal/mol in the experiment means the same as 1 kcal/mol in the computation.

Authors: The figures comparing experimental and computed ΔG values have been updated to use equal x–y axis scaling, ensuring a one-to-one correspondence between the two datasets.
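
For readers reproducing such plots, a minimal matplotlib sketch of the equal-aspect convention (with hypothetical ΔΔG values, not the manuscript's data):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical experimental vs. computed ddG values (kcal/mol)
exp = np.array([0.5, 1.2, 2.8, 3.4])
calc = np.array([0.7, 1.0, 3.1, 3.0])

fig, ax = plt.subplots(figsize=(4, 4))
ax.scatter(exp, calc)
lims = (min(exp.min(), calc.min()) - 0.5, max(exp.max(), calc.max()) + 0.5)
ax.plot(lims, lims, "k--", lw=1)          # y = x reference line
ax.set_xlim(lims)
ax.set_ylim(lims)
ax.set_aspect("equal", adjustable="box")  # 1 kcal/mol spans the same length on both axes
ax.set_xlabel("Experimental ddG (kcal/mol)")
ax.set_ylabel("Computed ddG (kcal/mol)")
```

Setting identical limits plus `set_aspect("equal")` guarantees the one-to-one visual correspondence the reviewer requested.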

4. The two platforms (FEP+ and GROMACS) use different force fields, enhanced sampling algorithms, and sampling times. From the benchmark results, can we get some insight into the advantages and disadvantages of OPLS4 vs. AMBER99*ILDN, and of REST2 with 16/24 lambda windows vs. 27 windows without replica exchange?

Authors: As noted on page 9, we have discussed the complementary strengths and limitations of the two platforms:

“In this work, we employed two widely used MD platforms – Schrödinger FEP+ and GROMACS – for FEP simulations, each offering distinct advantages and limitations. GROMACS is an open-source, freely available MD engine known for its flexibility, extensive customizability, and support for a broad range of force fields, including AMBER, CHARMM, OPLS, and GROMOS. However, its implementation for FEP workflows requires manual setup, command-line proficiency, and greater time investment, making it less accessible to non-expert users.

In contrast, Schrödinger FEP+ is a commercial, closed-source platform that employs a modified OPLS4 force field. While it provides less flexibility, its user-friendly interface, automated FEP+ workflow, and enhanced sampling via REST2 (Replica Exchange with Solute Tempering) streamline the process and enhance convergence, particularly for systems with significant conformational heterogeneity.”

5. Does the magnitude of the error correlate with the type of mutation (size of the amino acid side chain, neutral or charged mutation)? Does the error correlate with the location of the mutation (on the surface of the protein or in the core)? With this benchmark, can we get some insight into what kinds of mutations are more difficult to calculate with alchemical free energy calculations?

Authors: To investigate these possible correlations, we analyzed the magnitude of the prediction error as a function of both mutation type and location (Tables S5 and S6), tabulating the error distribution by mutation type (neutral or charged) and by mutation location (surface vs. buried residues). There is a direct correlation between the magnitude of the size change and the error: mutations with no change in size (NC) are the most accurate, and the error rises significantly for size-increasing mutations. Charged mutations have larger errors than neutral ones, particularly for the Schrödinger dataset. Overall, surface mutations tend to have higher errors than buried mutations. The most difficult cases identified are charge-changing mutations at the protein-solvent interface and bulky side-chain growth.
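
The grouping behind such a table can be sketched in a few lines; the mutation records below are invented placeholders, chosen only to show the mechanics of averaging |error| by type and by location.

```python
import numpy as np

# Hypothetical per-mutation records: (mutation, type, location, |error| in kcal/mol)
records = [
    ("T41V", "neutral", "buried", 0.4),
    ("D47A", "charged", "surface", 1.6),
    ("G77A", "neutral", "surface", 0.8),
    ("E75K", "charged", "surface", 2.1),
]

def mean_error_by(records, key_index):
    """Mean absolute error grouped by the field at key_index (1=type, 2=location)."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[key_index], []).append(rec[3])
    return {k: float(np.mean(v)) for k, v in groups.items()}

by_type = mean_error_by(records, 1)      # e.g. charged vs. neutral
by_location = mean_error_by(records, 2)  # e.g. surface vs. buried
```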

Reviewer #2: This manuscript presents a systematic benchmarking study comparing FEP calculations using Schrödinger FEP+ and GROMACS platforms for predicting mutation-induced stability changes in S. nuclease and T4 lysozyme. The work is methodologically sound and addresses an important gap in the field by directly comparing two widely-used computational platforms. However, several areas require clarification and improvement before publication.

(1) The manuscript reports good correlations between calculated and experimental ΔΔG values. However, I notice that no statistical uncertainties (e.g., standard deviations or confidence intervals) are provided for the computed free energy changes. Given the stochastic nature of molecular dynamics simulations, it would be valuable to assess the statistical reliability of these predictions. Could the authors clarify whether multiple independent simulations were performed for each mutation? If not, it would be feasible to estimate statistical uncertainties using block averaging or bootstrap methods.

Authors: We have now calculated and reported statistical uncertainties for all computed ΔΔG values. These uncertainties were obtained from multiple independent simulations performed for each mutation. The resulting error bars are now included in the revised manuscript/Supplementary Information and figures to reflect the statistical reliability of our free energy estimates.
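
For completeness, the bootstrap alternative the reviewer mentions can be sketched as a percentile bootstrap over a small set of independent ΔΔG repeats; the repeat values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(samples, n_boot=5000, alpha=0.05):
    """Percentile bootstrap confidence interval for the mean of
    independent ddG estimates (kcal/mol)."""
    samples = np.asarray(samples, float)
    means = np.array([rng.choice(samples, samples.size, replace=True).mean()
                      for _ in range(n_boot)])
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return samples.mean(), float(lo), float(hi)

# Hypothetical: three independent ddG repeats for one mutation
mean, lo, hi = bootstrap_ci([1.8, 2.1, 1.9])
```

With very few repeats the interval is wide and approximate; block averaging within each trajectory is the usual complement when only a single run exists.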

(2) The study employs different numbers of λ-windows between the two platforms: 16-24 for Schrödinger and 27 for GROMACS. I am curious about the specific λ-scheduling strategies used in each platform. Could the authors provide details on: (1) the exact distribution of λ values (uniform vs. non-uniform spacing), (2) whether additional λ-windows were placed in critical regions where electrostatic-to-van der Waals transitions occur, and (3) how the different λ-schedules might affect the precision of free energy integration, particularly in regions of poor phase space overlap? Including supplementary data showing dH/dλ distributions or overlap matrices would help readers assess the adequacy of sampling at each λ-window.

Authors: We thank the reviewer for this suggestion.

The exact λ-schedules used in GROMACS and Schrödinger FEP+ are now provided in the revised manuscript.

In both platforms, denser λ-spacing was applied near the end states to improve phase-space overlap and sampling stability.

To further evaluate sampling adequacy, we have included dH/dλ distributions and overlap matrix plots for representative mutations in the Supplementary Information. These analyses confirm sufficient overlap between neighboring λ-windows for reliable free energy integration. Figures S5 and S6 provide overlap matrix plots for the S. nuclease (mutations T41C and T41V) and T4 lysozyme (mutations T59G and G77A) systems.

(3) I note that Schrödinger (using MBAR) and GROMACS (using BAR) utilized different free energy estimators. While MBAR is theoretically advantageous, its performance often relies on sufficient λ-window sampling. Did the authors evaluate the specific impact of this methodological difference on the comparative results? For instance, would post-processing the GROMACS data with MBAR yield results significantly different from those obtained with BAR?

Authors: We have re-analyzed the GROMACS data using MBAR and found that the resulting ΔΔG values are very similar to those obtained with BAR, indicating that the choice of estimator does not significantly affect the comparative conclusions. The corresponding results are provided in Table S1 and Table S4 of the Supplementary Information.
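
To make the estimator question self-contained, here is a toy implementation of Bennett's self-consistency condition on synthetic Gaussian work distributions with a known free energy difference. This is only a sketch of the BAR principle, not the GROMACS/alchemlyb analysis used for the manuscript.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import expit  # expit(-x) = 1 / (1 + exp(x)), overflow-safe

def bar_estimate(w_f, w_r):
    """Bennett acceptance ratio free energy (reduced units, equal sample sizes).
    w_f: forward works (A -> B); w_r: reverse works (B -> A)."""
    fermi = lambda x: expit(-x)
    # Self-consistency: sum f(w_f - dF) = sum f(w_r + dF); g is monotonic in dF
    g = lambda df: fermi(w_f - df).sum() - fermi(w_r + df).sum()
    return brentq(g, -50.0, 50.0)

# Synthetic Gaussian work distributions satisfying Crooks' theorem with
# known free energy dF = mu - sigma^2 / 2 = 1.5 (reduced units)
rng = np.random.default_rng(1)
mu, sigma2, n = 2.0, 1.0, 20000
w_f = rng.normal(mu, np.sqrt(sigma2), n)
w_r = rng.normal(-(mu - sigma2), np.sqrt(sigma2), n)
dF = bar_estimate(w_f, w_r)  # statistically close to 1.5
```

On toy data like this, BAR between adjacent states and MBAR over all states agree closely when phase-space overlap is good, consistent with the comparison reported in Tables S1 and S4.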

(4) I notice that the simulation setup differs between platforms regarding the solvent box size: a 5 Å buffer for Schrödinger versus a 10 Å buffer for GROMACS. The authors need to clarify the rationale behind this difference.

Authors: The solvent box sizes used in each platform (5 Å buffer for Schrödinger and 10 Å buffer for GROMACS) reflect the default, commonly used settings of the respective workflows. These defaults were chosen to balance computational efficiency and simulation stability.

(5) The conclusions of the manuscript heavily rely on the convergence of the simulations. Beyond simulation length, the most compelling evidence would be the time-evolution of the free energy for key calculations. The authors should provide convergence plots of ΔΔG vs. simulation time for some representative mutants (e.g., the best and worst predicted cases) to visually demonstrate the convergence and stability of the calculations.

Authors: We thank the reviewer for this important suggestion. Convergence plots of ΔΔG versus simulation time for representative mutants (L25I and T33V in S. nuclease; S44A and D47A in T4 lysozyme) have been added to the Supplementary Information (Figures S7 and S8). These plots demonstrate that the free energy estimates for both systems are stable and sufficiently converged over the simulation trajectory.
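
The underlying analysis for such a plot can be sketched as a free-energy estimate recomputed from increasing fractions of each window's time series; here we use simple thermodynamic integration over dH/dλ as a stand-in for the estimator actually used, with synthetic stationary data.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integration of y over x."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * (x[1:] - x[:-1])))

def time_convergence(dudl_series, fractions=None):
    """TI estimate of dG from increasing fractions of each window's
    dH/dlambda time series; a flat curve indicates convergence.
    dudl_series: dict {lambda_value: 1-D array of dH/dlambda samples}."""
    if fractions is None:
        fractions = np.linspace(0.1, 1.0, 10)
    lams = sorted(dudl_series)
    curve = [trapezoid([dudl_series[lam][: max(1, int(f * len(dudl_series[lam])))].mean()
                        for lam in lams], lams)
             for f in fractions]
    return np.asarray(fractions), np.asarray(curve)

# Hypothetical synthetic data: three windows with noisy but stationary dH/dlambda
rng = np.random.default_rng(2)
series = {lam: rng.normal(lam * 3.0, 0.5, 2000) for lam in (0.0, 0.5, 1.0)}
fracs, curve = time_convergence(series)  # curve should plateau near 1.5
```

Plotting `curve` against `fracs` (optionally alongside a reverse-time analysis) gives the ΔΔG-vs-time plots requested by the reviewer.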

(6) The dataset includes charge-changing mutations, which introduce net charge artifacts under periodic boundary conditions with PME electrostatics—a well-documented source of systematic error in alchemical free energy calculations. I note that the manuscript briefly mentions the use of a "co-alchemical water" approach in Schrödinger to mitigate this issue but provides no details on its implementation or validation. More critically, there is no discussion of how GROMACS handles net charge corrections for the same mutations. Could the authors clarify: (1) the exact protocol for the co-alchemical water method and whether it fully compensates for finite-size artifacts, (2) whether GROMACS employs any net charge correction schemes (e.g., the analytical correction by Rocklin et al., J. Chem. Theory Comput. 2013, 9, 3072-3083) or simply relies on a neutralizing background plasma, and (3) how these potentially different treatments between platforms might systematically bias the comparative results? This concern is particularly important given the different box sizes noted in point (4), as finite-size effects scale with system size. I would recommend either: (a) a sensitivity analysis quantifying the magnitude of net charge artifacts for a subset of charge-changing mutations, or (b) a thorough literature-based justification demonstrating that the chosen approaches yield negligible systematic errors under the simulation conditions employed.

Authors: We clarify the treatment of charge-changing mutations in both platforms as follows:

Schrödinger FEP+ (co-alchemical water): For charge-changing mutations, a co-alchemical ion (a Na⁺ if the charged residue is negative, or a Cl⁻ if it is positively charged) is mutated to water using the co-alchemical water approach, which maintains overall charge neutrality during the alchemical transformation and mitigates finite-size artifacts. This method has been previously validated in the literature (https://pmc.ncbi.nlm.nih.gov/articles/PMC6453258/).

GROMACS: We employed a similar strategy, introducing counter-ions in the alchemical transformation to neutralize charge changes, effectively implementing a “co-alchemical ion” scheme analogous to that in FEP+.
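To make the "co-alchemical ion" idea concrete, a dual-topology counter-ion in GROMACS can be sketched as below. This is an illustrative fragment only, with assumed atom and type names; the actual topology files used in the benchmark may differ. The B-state columns (typeB, chargeB, massB) switch the ion's charge off along the λ path, in step with the charge change on the mutated residue:

```
[ atoms ]
;  nr  type  resnr  residue  atom  cgnr  charge    mass   typeB  chargeB   massB
; A-state: a normal Na+ counter-ion; B-state: the same atom with zero charge,
; discharged together with the residue's alchemical charge change so the box
; stays neutral at every lambda.
    1   Na     1     NA       NA    1     1.000   22.990   Na     0.000   22.990
```

The λ schedule itself is then set in the .mdp file through the standard free-energy options (free-energy = yes, init-lambda-state, and the fep-lambdas / coul-lambdas vectors).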

Comparative impact: Our benchmark results for charge-changing mutations indicate that the ΔΔG predictions remain consistent between platforms, suggesting that the chosen neutralization schemes do not introduce significant systematic bias.
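The cross-platform consistency claim can be quantified with a simple mean unsigned error (MUE) over the charge-changing subset. The sketch below uses hypothetical placeholder ΔΔG values, not numbers from the benchmark, purely to illustrate the metric:

```python
# Hypothetical ΔΔG predictions (kcal/mol) for a few charge-changing
# mutations; placeholder values for illustration only.
fep_plus = [1.2, -0.5, 2.1, 0.3]
gromacs = [1.0, -0.8, 2.4, 0.1]

def mean_unsigned_error(a, b):
    """Mean absolute deviation between paired ΔΔG predictions."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

print(f"MUE = {mean_unsigned_error(fep_plus, gromacs):.2f} kcal/mol")
# prints: MUE = 0.25 kcal/mol
```

An MUE on the charge-changing subset comparable to that of the neutral mutations would support the claim that the two neutralization schemes introduce no significant systematic bias.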

Given our observed agreement between platforms and the small magnitude of potential artifacts, we consider the applied neutralization schemes adequate for this benchmark.

(7) A fundamental methodological concern is that this study simultaneously compares two different software platforms, two different force fields (OPLS4 in Schrödinger vs. AMBER99SB*ILDN in GROMACS), two different water models (SPC vs. TIP3P), and different enhanced sampling strategies (REST2 in Schrödinger vs. standard MD in GROMACS). This multifactorial design makes it challenging to disentangle whether the observed differences in performance arise from: (1) software implementation details, (2) force field parameterization, (3) water model effects, or (4) sampling methodology. While I recognize that systematically testing all possible combinations may be beyond the scope of this benchmarking study, this confounding factor represents a significant limitation that should be explicitly acknowledged in both the Discussion and Limitations sections.

Authors: We have updated the Discussion and Limitations sections to explicitly acknowledge that the multifactorial differences between platforms - including software implementations, force fields, water models, and sampling protocols - make it difficult to attribute the observed performance differences to any single factor.

Attachments
Attachment
Submitted filename: Response-to-Reviews_020926.docx
Decision Letter - Yong Wang, Editor

Benchmarking Free Energy Calculations: Analysis of Single and Double Mutations Across Two Simulation Software Platforms for Two Protein Systems

PONE-D-25-56221R1

Dear Dr. Levy,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Yong Wang

Academic Editor

PLOS One

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: Yes

**********

Reviewer #1: All the questions have been addressed.

Reviewer #2: (No Response)

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Chenggong Hui

Reviewer #2: No

**********

Formally Accepted
Acceptance Letter - Yong Wang, Editor

PONE-D-25-56221R1

PLOS One

Dear Dr. Levy,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS One. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Yong Wang

Academic Editor

PLOS One

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.