Peer Review History

Original Submission
July 31, 2025
Decision Letter - Julio Banga, Editor

PCOMPBIOL-D-25-01534

Mechanistically Informed Machine Learning Links Non-Canonical TCA Cycle Activity to Warburg Metabolism and Hallmarks of Malignancy

PLOS Computational Biology

Dear Dr. Damiani,

Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript within 60 days, by Nov 23, 2025, 11:59 PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.

* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Julio R. Banga, Ph.D.

Academic Editor

PLOS Computational Biology

Pedro Mendes

Section Editor

PLOS Computational Biology

Journal Requirements:

1) We ask that a manuscript source file is provided at Revision. Please upload your manuscript file as a .doc, .docx, .rtf or .tex. If you are providing a .tex file, please upload it under the item type 'LaTeX Source File' and leave your .pdf version as the item type 'Manuscript'.

2) Please provide an Author Summary. This should appear in your manuscript between the Abstract (if applicable) and the Introduction, and should be 150-200 words long. The aim should be to make your findings accessible to a wide audience that includes both scientists and non-scientists. Sample summaries can be found on our website under Submission Guidelines:

https://journals.plos.org/ploscompbiol/s/submission-guidelines#loc-parts-of-a-submission

3) Please upload all main figures as separate Figure files in .tif or .eps format. For more information about how to convert and format your figure files please see our guidelines: 

https://journals.plos.org/ploscompbiol/s/figures

4) We have noticed that you have uploaded Supporting Information files, but you have not included a list of legends. Please add a full list of legends for your Supporting Information files after the references list.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: Summary

The submitted manuscript describes a study of the newly discovered non-canonical TCA cycle (Arnold Cycle) by a combination of constraint-based metabolic modeling and machine learning to identify transcriptomic signatures and marker genes of activity. Since the Arnold Cycle was described only very recently, the authors' contribution is novel and interesting for a broader interdisciplinary audience and aims at generating testable hypotheses via in silico approaches in a classical systems biology cycle.

The manuscript is well written, results are well depicted with illustrations, and reproducibility is ensured via an open-source repository including code for the in silico approaches. Besides these strong points, I see one critical point and additional major as well as minor points (outlined below). The critical point of the study, in my view, is the fragility of the two-step approach without external or experimental validation of the first step, where a newly developed metabolic model is leveraged to predict pathway activity. As the authors themselves acknowledge, this step is crucial for the second step of predicting other marker genes, and the overall results look fragile with respect to modelling choices and decisions in the first step. Hence, I suggest that the authors outline or even apply a strategy to externally validate the first step, to gain confidence in the reliability of the overall findings.

Critical

The study builds on two steps of predictions without external/experimental validation of the modeling and method. The authors acknowledge in the last paragraph that the metabolic modeling in the first step is crucial for the analysis, but do not outline or apply a strategy to validate their approach/results before applying ML in the second step. Further, they state that denoising of the transcriptome was necessary, which additionally shows that there is a degree of fragility in the two-step approach.

Main

- The benefit of the Arnold Cycle is unclear: Is it speed? Cost-efficiency? Oxygen limitation/Warburg … -> Please address this more in the Introduction and Discussion; in the sense of “nothing makes sense except in the light of evolution”, there must be hypotheses about the necessity of this metabolic route.

- “we trained regression models to predict both Cycle Propensity and Cycle Flux Intensity” -> In parts of the manuscript it is ambiguous whether in fact fully independent ML models were trained for each target variable. Please make sure to clearly distinguish this from multi-task learning/transfer learning approaches, where a simultaneous optimization is pursued. Perhaps discuss why such approaches have not been used.

- Highly correlated features (genes) can severely hamper feature importance analysis, with different sensitivity to this issue across the methods used. Hence, I would suggest aggregating clusters of very highly correlated genes before training the models and performing the importance analysis, to increase robustness.

- I do not understand the choice of XGBoost in addition to the previously used Random Forests. Please motivate more why XGBoost + SHAP are used for the final models and what benefit they have over Random Forest + feature importance.

- Details on train-validation-test splits. The authors perform a cross-validation strategy, and I appreciate that robustness is checked via inner and outer folds. However, information on how the data is split into training, validation, and a final test set is scarce. Further, performance metrics should also be reported on the training set to show the degree of over/underfitting. Lastly, I suggest considering a splitting strategy at the cell line level, e.g. reserving 10% of the >500 cell lines as a test set to check the generalizability of the model.
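[Editor's note: the cell-line-level holdout suggested in the last point could be implemented with scikit-learn's group-aware splitters; the sketch below is illustrative only (function names, the 10% fraction, and the grouping variable are assumptions, not the authors' pipeline).]

```python
# Sketch of a cell-line-level holdout: split at the group (cell line)
# level so no line contributes samples to both train and test sets.
# The 10% fraction mirrors the reviewer's suggestion; names are illustrative.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def cell_line_holdout(X, y, cell_line_ids, test_fraction=0.1, seed=0):
    """Reserve ~test_fraction of the cell lines as a final test set."""
    gss = GroupShuffleSplit(n_splits=1, test_size=test_fraction,
                            random_state=seed)
    train_idx, test_idx = next(gss.split(X, y, groups=cell_line_ids))
    return train_idx, test_idx
```

When each cell line contributes exactly one expression profile, this reduces to an ordinary random holdout; the grouping matters as soon as replicates or multiple conditions per line enter the data.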

Minor

- Please, provide line-numbers in the manuscript PDF. This makes giving feedback easier.

- Give some crucial method details in the Results section so that the approach can be understood without going to the Methods section, or change the overall order (Methods section before Results).

-> Examples: (1) What are the main and necessary updates of ENGRO2 -> 2.2? Clearly state that this model is new and has not been tested in other studies. (2) Constraint-based modeling approach: sampling approach, (non-)objective function. The metabolic modeling approach and objective remain unclear without fully reading the Methods section.

- “This inverse relationship suggests that a higher likelihood of Arnold Cycle engagement does not necessarily correspond to increased metabolic throughput.” -> I suggest rephrasing: “not necessarily” implies a neutral relationship rather than a strong inverse correlation.

- You may discuss further the interrelation of the Arnold Cycle, angiogenesis, and the Warburg effect. I guess they are connected via low oxygen requirements.

- I understand the priority to relate DepMap with Cycle Propensity. But relating DepMap with Flux Intensity may also provide some insights and could be included in a supplement.

- The authors relate the citrate-to-alpha-KG ratio to epigenetic regulation and cell fate decisions. To what degree is this feasible based on the metabolic model used? Is this aspect even captured by the constraint-based modeling approach?

- “In particular, it yielded higher silhouette scores when clustering cell lines based on simulated flux distributions, and also led to significantly better alignment with transcriptome-based clusters defined by metabolic gene expression.” -> Do cell lines really need to form good clusters in the flux distribution space similarly to the transcriptome space? I suggest justifying this assumption further and including the results of this prior analysis in the supplemental material in order to understand it.

- “In line with this, preliminary flux sampling experiments using the Recon3D [46] model revealed that both canonical and non-canonical TCA cycles were poorly represented. These results support the use of a core model for our analysis.” -> Similarly to the above, the authors refer to a prior analysis which is not included in the manuscript or an electronic supplement. I suggest including all of these to foster transparency.

- Why were the weights for the objective function sampled with both negative and positive values, switching between maximization and minimization? Why not maximization only?

- The overlap between the ElasticNet- and RandomForest-discovered genes (6) is surprisingly small. Filtering and aggregating highly correlated genes should improve this. I suggest having a closer look at those 6 genes and their role.

- When applying SHAP to estimate feature importance, the definition of a reference/background data set can be crucial. Please give information on, and justification of, what is used here: a self-defined set or a randomized set?
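[Editor's note: the aggregation of highly correlated genes suggested above (in both the Main and Minor comments) could be sketched, for example, via hierarchical clustering on correlation distance. The sketch below uses SciPy; the 0.9 threshold, the averaging step, and all names are illustrative assumptions, not the authors' code.]

```python
# Sketch: aggregate clusters of highly correlated genes into single
# cluster-averaged features before model training and importance analysis.
# The |correlation| > 0.9 grouping threshold is an illustrative choice.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def aggregate_correlated_genes(X, corr_threshold=0.9):
    """X: (samples x genes) expression matrix. Returns the cluster-averaged
    feature matrix and the gene-to-cluster assignment."""
    corr = np.corrcoef(X, rowvar=False)   # gene-gene correlation matrix
    dist = 1.0 - np.abs(corr)             # sign-agnostic correlation distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="average")
    # Genes whose linkage distance stays below 1 - threshold
    # (i.e. |correlation| above the threshold) share one cluster.
    labels = fcluster(Z, t=1.0 - corr_threshold, criterion="distance")
    X_agg = np.column_stack(
        [X[:, labels == k].mean(axis=1) for k in np.unique(labels)]
    )
    return X_agg, labels
```

Training on `X_agg` instead of `X` would let importance scores accrue to one cluster-level feature rather than being split across near-duplicate genes.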

Reviewer #2: This is an interesting paper in which the authors use a combination of mechanistic modelling and machine learning to investigate the extent to which the non-canonical TCA cycle is active in approximately 500 cancer cell lines, and show how its activity correlates with enrichment in genes associated with the hallmarks of cancer (e.g. angiogenesis, invasion, stemness). Excitingly, these findings provide new insights into how cancer cells can adapt their metabolism as they become more pathological, and could serve as a basis for new targets for treating cancer.

The paper is well-written and logically structured, with the results clearly stated and developed in a logical manner. The methods used are outside my expertise, so I am unable to comment on them (and hope that the other reviewers are able to comment on the methodology). From what I understand, the methods used seem to be fairly straightforward, but they are appropriate for the questions of interest and build on the authors' earlier work.

Reviewer #3: Review: Lihao et al. investigated how cancer cells leverage metabolic circuits to support phenotypic plasticity and progression, with a particular focus on non-canonical citrate metabolism. Recognizing the difficulty of inferring compartment-spanning fluxes from existing datasets, they develop a mechanistically informed machine learning framework that integrates constraint-based metabolic modeling with supervised learning. Using non-canonical citrate metabolic activity metrics derived from over 500 cancer cell lines, they identify transcriptional programs predictive of cycle activity. Their approach not only establishes robust transcriptional correlates of citrate metabolic flux but also uncovers broader links between this cycle and metabolic reprogramming, providing new insights into how metabolic flexibility underpins aggressive cancer phenotypes.

1- The manuscript designates the metabolic circuit as the Arnold Cycle. However, I note that the original foundational study on this pathway listed two co–first authors, both of whom should receive equivalent recognition. As such, if the authors wish to propose a new phenotype or formalize a naming convention, the terminology should reflect joint credit (e.g., Arnold–Jackson or Arnold–Jackson–Finley). Alternatively, a more neutral and descriptive designation, such as non-canonical citrate metabolism, may be preferable to avoid ambiguity or inequity in attribution.

2- Figure 1b comes before Figure 1c in the text. The order should be fixed either in the figure or in the text.

3- It is claimed that: “From these samples, we derived two complementary metrics: Cycle Propensity, defined as the fraction of sampled states in which the three core reactions (SLC25A1, ACLY, MDH1) of the Arnold Cycle are simultaneously active; and Cycle Flux Intensity, measuring the average flux through the cycle’s bottleneck reaction in active configurations (Fig. 1b).” However, Figure 2 does not show this, and the two concepts need much more clarification.
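[Editor's note: the two quoted definitions can be stated operationally, which may help clarify the distinction the reviewer asks about. The NumPy sketch below is only a paraphrase of the quoted text; the activity tolerance, function name, and index arguments are assumptions, not the authors' implementation.]

```python
# Sketch of the two metrics as quoted: Cycle Propensity is the fraction of
# sampled flux states with all three core reactions simultaneously active;
# Cycle Flux Intensity is the mean bottleneck flux over those active states.
import numpy as np

def cycle_metrics(samples, core_idx, bottleneck_idx, tol=1e-6):
    """samples: (n_states x n_reactions) matrix of sampled flux states.
    core_idx: columns of the three core reactions (SLC25A1, ACLY, MDH1).
    bottleneck_idx: column of the cycle's bottleneck reaction."""
    # A state is "active" when all core reactions carry non-negligible flux
    active = np.all(np.abs(samples[:, core_idx]) > tol, axis=1)
    propensity = float(active.mean())
    intensity = (float(np.abs(samples[active, bottleneck_idx]).mean())
                 if active.any() else 0.0)
    return propensity, intensity
```

Under these definitions a cell line can score high on one metric and low on the other, which is why the two concepts are complementary rather than redundant.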

4- Cancer is separated into stages and subtypes, and many of the cell lines are assigned to types or subtypes of cancer, such as MDA-MB-231 as a triple-negative metastatic breast cancer line with strong Warburg phenotype features, or MCF7 as a non-metastatic hormone-receptor-positive line. These should be tested in the model, comparing for example MDA-MB-231 cells as a high-Warburg phenotype against MCF7 as a low-Warburg phenotype and a normal line such as MCF10A. These could also be tested in section 2.2 for Warburg phenotype and oxygen consumption.

5- DepMap dependency correlations

6- For the DepMap dependency correlation analysis, they correlate Propensity with CERES scores genome-wide and highlight WNT/KRAS/PI3K/MAPK patterns. It is critical to control for multiple testing and for confounders (lineage, proliferation rate, TP53 status).

7- The manuscript remains purely in silico. Even a lightweight orthogonal validation would greatly strengthen the claims:

a) Select 2–3 “high-propensity/low-intensity” lines and 2–3 “low-propensity/high-intensity” lines; measure OCR/ECAR and lactate secretion (Seahorse/biochemical assays) to verify predicted trends;

b) Test sensitivity to ACLY (SB-204990, Bempedoic-acid CoA analog) and SLC25A1 (CTPi), plus IDH1 inhibition (AG-120 in IDH1-WT as control). Show differential viability/flux adaptation consistent with your dependency analyses;

c) Perform limited ¹³C-glucose/¹³C-citrate tracing to confirm cytosolic citrate routing via ACO1→IDH1 vs ACLY.

Minor Comments

Throughout: fix small typos (“bas been”, “characterizate”, “cine–specific”, “theWarburg”, “ACLY, promotes”), and a repeated formatting error, “p = 5.50 < 10−91 / 6.24 < 10−110”, which should be “p < 5.5×10⁻⁹¹”, etc.

Figure 1A/B: add units for fluxes (arbitrary units are fine but state “normalized steady-state flux”).

Figure 2A: add sample sizes per lineage; consider rug plots/marginal densities for each axis.

Figure 3B: report odds ratios and top term counts; consider reducing redundancy with parent-child trimming.

SHAP plots: add fold-wise variation bars or stability heatmaps; ensure no metabolic genes slipped into the non-metabolic feature set (list in Supp. Table S1).

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: No

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jan Ewald

Reviewer #2: No

Reviewer #3: Yes: Mehdi Damaghi

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

Figure resubmission:

While revising your submission, we strongly recommend that you use PLOS’s NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis) to test your figure files. NAAS can convert your figure files to the TIFF file type and meet basic requirements (such as print size, resolution), or provide you with a report on issues that do not meet our requirements and that NAAS cannot fix.

After uploading your figures to PLOS's NAAS tool (https://ngplosjournals.pagemajik.ai/artanalysis), NAAS will process the files provided and display the results in the "Uploaded Files" section of the page once processing is complete. If the uploaded figures meet our requirements (or NAAS is able to fix the files to meet our requirements), the figure will be marked as "fixed". If NAAS is unable to fix the files, a red "failed" label will appear. When NAAS has confirmed that the figure files meet our requirements, please download the files via the download option and include these NAAS-processed figure files when submitting your revised manuscript.

Reproducibility:

To enhance the reproducibility of your results, we recommend that authors of applicable studies deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Revision 1

Attachments
Attachment
Submitted filename: Review letter.pdf
Decision Letter - Julio Banga, Editor

Dear Prof. Damiani,

We are pleased to inform you that your manuscript 'Mechanistically Informed Machine Learning Links Non-Canonical TCA Cycle Activity to Warburg Metabolism and Hallmarks of Malignancy' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. 

Best regards,

Julio R. Banga, Ph.D.

Academic Editor

PLOS Computational Biology

Pedro Mendes

Section Editor

PLOS Computational Biology

***********************************************************

Please check the suggestions of reviewer #1.

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors of the manuscript have done a tremendous job in improving their study and precisely addressed the points raised by all reviewers. I really appreciate their efforts which led to a vastly improved manuscript and robust findings.

Hence, I do not have any new major points and overall suggest an acceptance of this valuable research. I have only two minor things after going through the revised manuscript:

- Section 2.5 and partly other sections are difficult to read with all the information in brackets about FDR/FE values. In my opinion it is enough to state that they are all significant at FDR < 0.05, rather than providing all the values, which can be part of an electronic supplement.

- Check any length restrictions by the journal. The manuscript is now pretty long after the revision (naturally). You may consider shortening it if deemed necessary by the editorial team.

Reviewer #3: The authors addressed all of my comments and questions.

**********

Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code —e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Jan Ewald

Reviewer #3: Yes: Mehdi Damaghi

Formally Accepted
Acceptance Letter - Julio Banga, Editor

PCOMPBIOL-D-25-01534R1

Mechanistically Informed Machine Learning Links Non-Canonical TCA Cycle Activity to Warburg Metabolism and Hallmarks of Malignancy

Dear Dr Damiani,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Anita Estes

PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.