Network analysis of toxin production in Clostridioides difficile identifies key metabolic dependencies

Deborah A. Powers; Matthew L. Jenior; Glynis L. Kolling; Jason A. Papin

doi:10.1371/journal.pcbi.1011076

Peer Review History

Original Submission January 3, 2023
30 Jan 2023
Decision Letter - Kiran Raosaheb Patil, Editor
Dear Professor Papin,
Thank you very much for submitting your manuscript "Network analysis of toxin production in Clostridioides difficile identifies key metabolic dependencies" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly-revised version that takes into account the reviewers' comments.
We especially recommend you to comprehensively address the following issues: a) ML overfitting, b) empty github repository, and c) integration of phenotypic data.
We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.
When you are ready to resubmit, please upload the following:
[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).
Important additional instructions are given below your reviewer comments.
Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.
Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts.
Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Sincerely,
Kiran Patil
Section Editor
PLOS Computational Biology
*********************
Reviewer's Responses to Questions
Comments to the Authors:
Please note here if the review is uploaded as an attachment.
Reviewer #1: Summary of Research: The authors present a computational analysis of metabolic regulation of toxin (TcdA and TcdB) production in C. difficile. They integrate public transcriptomics data to develop 16 context specific metabolic models, using the RIPTiDe algorithm and C. difficile metabolic models. These base models and algorithm were previously developed by several of the same authors. They analyze the models using a combination of flux sampling, machine learning (random forest), and shadow prices to identify patterns associated with toxin production. They also implement the metabolic transformation algorithm to identify reaction knockouts that drive metabolic flux towards a low-toxin state. The discussion of the results focuses on arginine and ornithine transporters and isoleucine. Overall, the scope and motivation of the work are interesting. The conceptual model is set up well. The analysis and interpretation of the results should be explained more clearly. Specific major and minor comments are provided below.
Major Comments:
1. I am concerned with overfitting in the machine learning results. The total number of samples is quite small (16).
I understand that there are additional samples created through the flux sampling process, but these will likely be highly correlated. The model can likely learn, from these correlated samples, which condition a flux sample comes from and use that information to infer the toxin level. Therefore, the degree to which the results correspond to microbial physiology in toxin production vs condition specific physiology is not clear. This should be addressed in the discussion.
2. To further address overfitting to condition in the results, I would also suggest that the authors implement a cross-validation scheme where flux samples from the same condition are not mixed across the train and validate sets. To do this you can randomly select a set of conditions (of the 16) to assign to the training set (repeating random selections for statistics). The random selection could be stratified to ensure there are always some high and some low toxin samples. Then use all the flux samples from the randomly selected conditions in the training set and the remaining samples in the validate sets. Cross-validating in this way will give insight into whether the ML model can generalize across conditions and should be presented alongside the current results.
3. Another suggestion for the machine learning analysis is to reduce feature correlations. ML algorithms, and in particular feature importance calculations, are typically more stable when the input features are not correlated. Before implementing the ML the authors can clean up the input flux features in two ways. First, reduce the number of features by getting rid of any flux that has variability across all samples below some specified threshold (these features contain little to no information). Second, cluster features with covariance across all samples above some threshold into one feature (one of these fluxes can be used as a representative feature for the machine learning).
There are often linear pathways of reactions with highly correlated flux across samples that can be reduced in this way. Reducing the feature space could make the results more stable and easier to interpret. If any feature engineering of this sort has already been implemented, the authors should include the details in the methods section.
4. The authors could further discuss the relationship between the ML analysis and the MTA analysis. What are the differences in the types of results you expect to find with these two approaches? How do the results from the two approaches complement or support each other?
Minor Comments:
1. Line 11. Consider changing “its toxins” to “two toxic proteins”. Defining TcdA,B more specifically as proteins (as opposed to metabolites) will help non-expert readers quickly understand the context of the modeling work.
2. Line 51-54. This is not a comment for this paper, but I was curious if the glucosylating activity of TcdA,B is a metabolic process that could be modeled. Is that included in the genome-scale metabolic models? Just a thought on another possibly interesting line of inquiry.
3. Line 81-83. It would be nice to include in the introduction a bit more description of the transcriptomic data that was used. Is this all from one study, multiple studies? Is there any motivation behind the different conditions that were included? Maybe just a reference to Supplemental Table 1 would suffice.
4. Line 98-99. Is there any reason for choosing tcdA over tcdB? Are tcdA and tcdB correlated across conditions?
5. Line 107-108. Please provide additional description of the flux sampling here or point to this description in the methods. Was this sampling of the entire feasible steady-state space, was biomass production optimized and used as a constraint?
6. Paragraph 111-121. The machine learning approach should be described in more detail. How many sampled flux distributions were used for each condition?
What was the train/validate/test split procedure? What was the accuracy of the ML model on the test (or validation) set relative to a null distribution? How was feature importance extracted from the random forest model? This information is important for interpreting the results so it should be presented to some degree in the results section.
7. Line 122. Figure 2. It may be nice to highlight the arginine and ornithine transport related reactions. The figure in part B does a nice job of explaining the context of the reactions from the ML importance results in part A. Maybe highlight the relevant reactions in A with bold or a different colored font and link them to the matching reaction in part B with a superscript.
8. Paragraph 130-141. It would be nice to have more discussion of why certain reactions have many sensitive metabolites while others do not. How could this arise and what are the implications? In general, what are the implications of a metabolite having a strong shadow price for an important reaction? I was surprised to see that arginine and ornithine do not come up in the shadow price analysis, as I would naively think that they would limit the transport reactions. The authors could expand on this in the results section here or in the discussion.
9. Line 145. The blue column seems to be a fraction of models, not the number of models.
10. Line 143. Figure 3b. It would be good to include the ID of the metabolites or some other more specific name.
11. Paragraph 155-177. The MTA section seems to be only weakly connected to the previous results from the ML section. Any efforts to link the two results in the discussion would be appreciated. (See major comment 4)
12. Line 183. Include the value of epsilon in the caption.
13. Paragraph 240-250. Good discussion of implications of the RIPTiDe algorithm. Additional discussion should be added regarding the limitations of GENRES and the other analyses utilized here.
14. Line 455. Sup Fig S2.
An additional PCA plot with high and low toxin as the colors would be good to include here.
15. Lines 488-489. A numbering system is mentioned in the figure caption, but I do not see any use of that numbering system in the table. Maybe it would be clearer to include the RIPTiDe model reference in the table.
16. Line 348. The github repository that is linked seems to be empty at this time.
Reviewer #2: This is a review of “Network analysis of toxin production in Clostridioides difficile identifies key metabolic dependencies” by Powers and colleagues. This paper focuses on Clostridium difficile, a notorious opportunistic pathogen which has a diverse metabolism that enables it to establish a niche within the complex gut environment. Its pathogenesis is primarily mediated by toxins TcdA and TcdB. This paper successfully demonstrates how toxin production is regulated by the organism's metabolism, a long-standing question in the field. The study employs a systems biology-based workflow, utilizing genome-scale metabolic models, to reveal how different extracellular environments can affect the regulation of toxin production and how this relates to changes in intracellular metabolism. This is a strong paper especially because of its innovative use of publicly available transcriptomic data from various studies to provide an extracellular context for the genome-scale metabolic models. The choice of low and high toxin states through transcriptomic data gives these states the context while performing metabolic modelling using genome scale metabolic models. The contextualization of these models has shown how these states are influenced by both extracellular and intracellular environments. The use of the RIPTiDe algorithm and machine learning methods highlights the key role that arginine and ornithine, which are available from the environment, play in regulating toxin production.
The paper goes on to show that toxin production is further regulated by intracellular pools of fatty acids and large polymer metabolite pools, as shown by the flux balance analysis and shadow pricing analysis. Additionally, the application of the mMTA algorithm identifies important reactions involved in transitioning from high to low toxin production states, providing ideas for potential therapeutic targets. I am convinced that this paper will be a valuable contribution to the field. However, there are important issues that should be addressed before publication:
Major issues:
1. The paper states that the analysis code is shared but the associated link is empty. This needs to be fixed before publication.
2. The methods section could be more detailed in describing the usage and caveats of the RIPTiDe algorithm and shadow pricing analysis with respect to flux sampling to improve the reproducibility of the work.
Minor issues:
3. The paper is a valuable contribution to the field's understanding of toxin regulation by metabolism in C. difficile. However, it would have been beneficial to include a discussion of isoleucine fermentation as a source of energy metabolism in the results section.
4. The abstract and introduction effectively summarize the work and highlight its potential therapeutic applications, but the results and discussion sections could benefit from more specific recommendations on this subject.
5. Additionally, the figure legends in Figure 3A could be clearer, and the use of the term "context dependent" for metabolites with shadow price > 2 may be confusing.
6. The legend in Figure 4 should also be more descriptive to make it accessible to non-experts in computational methods, and the use of the term epsilon without further explanation may be confusing to readers.
Reviewer #3: This paper reports the integration of publicly available transcriptomic data using the RIPTiDe algorithm to contextualize how nutritional changes regulate toxin levels in C. difficile.
This work is built on previously published genome scale models of strains 630 and R20291. While it is interesting to see transcriptomic data integration into the previously published model, it is incremental work. The following aspects could make this paper much stronger.
1. Integrate actual toxin level data following growth in Biolog Phenotype array previously reported by Lei and Bochner 2013. That paper is cited as ref#26 in the discussion. However, that data is not integrated with the model. There is some agreement between the results in the current work and phenotypic toxin data published in ref#26. For example, Arg-dipeptides showed higher toxin production in Biolog plates. Calibrating the FBA model with validated phenotypic toxin data will make this work much better. Since the authors have done that previously in E. coli (data guided FBA model), doing that here should not be that difficult.
2. The other question is how genome variation in C. difficile affects toxin production. CD 630 and CD R20291 belong to different toxinotypes. There have been conflicting reports including and excluding the importance of toxinotypes in C. difficile virulence. Although there is high genome variation in C. difficile, genes coding for the central metabolic pathways are somewhat conserved, albeit with some sequence variation. Is data integration with genome scale modeling sensitive enough to understand how the pathogenicity locus (where the cluster of toxin genes is located) interacts with master regulators of central metabolic pathways (codY, CcpA, and others)? Integrating Biolog data could be interesting because adenine and related compounds were the strongest toxin inducers in the previous study.
3. A public data set is available where both transcriptome and metabolome were analyzed following C. difficile infection in a mouse model (PMID: 29600278). Unfortunately, toxin levels were not reported in that paper.
However, toxin gene expression levels could be pulled from the transcriptome data; do the predictions here correlate with what is reported in that work? I agree that the dataset may not be amenable to modelling, but it could be useful in expanding the discussion in the context of toxinotype variation.
******
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: No: The github repository is currently empty. The authors will need to upload their models and code here before the paper is published.
Reviewer #2: No: The links for the code and the data are made available in the paper but those links are empty and nothing is uploaded on them so far.
Reviewer #3: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: Yes: Vishwas Mishra
Reviewer #3: No
Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols
https://doi.org/10.1371/journal.pcbi.1011076.r001
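The cross-validation scheme described in Reviewer #1's second major comment (whole conditions assigned to either the training or the validation set, stratified by toxin state) can be sketched as follows. This is a minimal illustration only, not the authors' code: the function names, the condition identifiers, the 8 high / 8 low labeling, and the five samples per condition are all hypothetical placeholders.

```python
import random
from collections import defaultdict


def grouped_stratified_split(condition_labels, n_train_per_class, seed=0):
    """Assign whole conditions to train or validation, stratified by label.

    condition_labels: dict mapping condition id -> toxin state ("high"/"low").
    n_train_per_class: number of conditions of each label placed in training.
    Returns (train_conditions, validation_conditions) as sets.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for condition, label in sorted(condition_labels.items()):
        by_label[label].append(condition)

    train = set()
    for label, conditions in by_label.items():
        # Require at least one held-out condition per toxin state.
        if len(conditions) <= n_train_per_class:
            raise ValueError(f"not enough '{label}' conditions to hold any out")
        train.update(rng.sample(conditions, n_train_per_class))
    validation = set(condition_labels) - train
    return train, validation


def split_samples(samples, train_conditions):
    """Route each flux sample by its condition so no condition is in both sets.

    samples: list of (condition id, flux vector) pairs.
    """
    train = [s for s in samples if s[0] in train_conditions]
    validation = [s for s in samples if s[0] not in train_conditions]
    return train, validation


# Hypothetical toy data: 16 conditions (8 "high", 8 "low"), 5 flux samples each.
labels = {f"cond{i}": ("high" if i < 8 else "low") for i in range(16)}
samples = [(c, [random.random() for _ in range(3)]) for c in labels for _ in range(5)]

train_conds, val_conds = grouped_stratified_split(labels, n_train_per_class=6, seed=1)
train_set, val_set = split_samples(samples, train_conds)
```

Repeating the split with different seeds would give the repeated random selections the reviewer mentions; a classifier trained on `train_set` is then scored only on conditions it has never seen.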
Revision 1
13 Mar 2023
Author Response
Attachments
Attachment Submitted filename: PLOSCompReviews.docx
https://doi.org/10.1371/journal.pcbi.1011076.r002
4 Apr 2023
Decision Letter - Kiran Raosaheb Patil, Editor
Dear Professor Papin,
We are pleased to inform you that your manuscript 'Network analysis of toxin production in Clostridioides difficile identifies key metabolic dependencies' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.
Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Kiran Patil
Section Editor
PLOS Computational Biology
*********************************************************
Reviewer's Responses to Questions
Comments to the Authors:
Please note here if the review is uploaded as an attachment.
Reviewer #1: The authors have done a good job of updating their manuscript following the first round of reviews. In particular, I am happy with the updates to the machine learning methods and discussion. The github is also now available with the code.
I encourage the authors to include some additional documentation in a github readme file, but that should not hold up publication.
Reviewer #2: The authors have very well taken into account the reviewer's comments and have made necessary changes to the previous manuscript submission. This paper successfully shows how toxin production is regulated by the organism's metabolism. This is a great paper to be added to the field because of its novel way to make use of publicly available transcriptomic data to provide an extracellular context for the genome-scale metabolic models. I am convinced that this paper will be a valuable contribution to the field.
Reviewer #3: Authors have incorporated most of the suggestions made earlier. I don't have any further comments.
******
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
https://doi.org/10.1371/journal.pcbi.1011076.r003
Formally Accepted
20 Apr 2023
Acceptance Letter - Kiran Raosaheb Patil, Editor
PCOMPBIOL-D-22-01878R1
Network analysis of toxin production in Clostridioides difficile identifies key metabolic dependencies
Dear Dr Papin,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Zsofi Zombor
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
https://doi.org/10.1371/journal.pcbi.1011076.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.