Introducing gold-standard essential gene datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses

Cléophée Van Maele; Ségolène Caboche; Nathan Nicolau-Guillaumet; Anaëlle Muggeo; Thomas Guillard

doi:10.1371/journal.pcbi.1013945

Peer Review History

Original Submission: June 23, 2025
28 Aug 2025 Decision Letter - Eric Dykeman, Editor
Introducing gold-standard essential gene datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses
PLOS Computational Biology
Dear Dr. Guillard,
Thank you for submitting your manuscript to PLOS Computational Biology. After careful consideration, we feel that it has merit but does not fully meet PLOS Computational Biology's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript within 60 days, by Oct 28 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at ploscompbiol@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pcompbiol/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
* A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. This file does not need to include responses to formatting updates and technical items listed in the 'Journal Requirements' section below.
* A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
* An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, competing interests statement, or data availability statement, please make these updates within the submission form at the time of resubmission.
Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
We look forward to receiving your revised manuscript.
Kind regards,
Eric C. Dykeman, Ph.D.
Academic Editor
PLOS Computational Biology
Dominik Wodarz
Section Editor
PLOS Computational Biology
Additional Editor Comments (if provided):
Dear Authors, Please pay close attention to the comments of reviewers 2 and 3 when drafting your revision and response. Kind regards.
Journal Requirements:
If the reviewer comments include a recommendation to cite specific previously published works, please review and evaluate these publications to determine whether they are relevant and should be cited. There is no requirement to cite these works unless the editor has indicated otherwise.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: Defining the essential genes during TnSeq experiments is a mandatory step. Once these genes are defined, the selective pressure that the researchers choose to apply to the saturated bank of mutants will define the genes important or essential for survival in the different conditions chosen by the scientists. As of today, no state-of-the-art methodology has been published for finding and defining these essential genes.
Major Comments:
Can groups mapping and analyzing their TnSeq data with CLC Genomics use the authors' approach and, if yes, how?
A final schematic to summarize their approach.
In the discussion, additional emphasis on how this extensive work will be used by other researchers could be useful.
Minor comments:
1) More explanation for the use of the Delta-oprD strain could be helpful for the reader.
2) Line 169: remove the (…), be more specific or add a reference.
Reviewer #2: The authors performed a systematic study of commonly used analysis tools for Tn-Seq, TRANSIT2 and FiTnEss, from which they offer suggestions for parameter settings. They also provide two lists of essential genes in P. aeruginosa that they term “gold-standard” sets: one specific to strain PA14 grown in LB and the other applicable to any Tn-seq experiment in P. aeruginosa. The authors further generated their own Tn-Seq datasets from wild-type PA14 and an OprD-deficient strain, which they used to test the parameter settings of the various tools. While the suggestions for parameters to use for analysis would be helpful to other researchers, there are crucial issues that need to be addressed. First, the gold-standard set that they delineate is not much different from what has been previously published in the literature (Poulsen et al.), and the choices made in creating those gene sets were not well-defined. It is not clear that they are appropriate gene sets for comparing analytical tools. Moreover, the authors do not provide a comprehensive metric with which to evaluate essential gene lists, as they only consider how many of the gold-standard genes are represented, but do not have any assessment of potential false positives or false negatives. Last, there are other potential biases that need to be addressed more thoroughly and are detailed in the comments below.
Major comments
1. The “gold-standard” set of essential genes is very similar to what was already published in Poulsen et al. (Dataset S6), which intersected gene lists from the same previous studies as those used in this manuscript (Table 2).
There is not enough technical justification as to why the gold standard sets defined in the manuscript should be termed “gold standard.” The GOLD_84 set particularly seems arbitrary, in that many sets are already intersections of one another, not all possible intersections are considered (e.g. Poulsen CORE could be defined using FDR and/or FWER thresholds), and some sets have more confidence than others (e.g. Poulsen FWER and FDR). Thus, the term “gold standard” is not well-defined and is overselling the novelty of the gene sets. The manuscript needs to justify the suitability of the gene sets for comparing analysis methods by explaining why the authors chose to intersect the gene sets that they used and why they chose to take an intersection rather than a union of gene sets.
2. The only metric the authors provide to evaluate the Tn-Seq tools is the percentage of gold-standard genes that are in the essential gene list. However, that metric does not consider how many genes are present in the essential gene list, some of which may be false positives. For example, if a tool outputs all the genes in the genome as its list of essential genes, it would capture 100% of the gold-standard genes, and would be “better” than the tools listed in this study. There is no penalty for the number of false positives given by the tool. The authors need to provide a more robust metric (or metrics) to quantify the performance of the tools with the given parameters, accounting for potential false positives, and possibly even false negatives.
3. The authors mention multiple other tools that are used in Tn-Seq analysis, such as MAGenTA, TSAS, etc., but restrict themselves to testing only TRANSIT2 and FiTnEss. For a more comprehensive comparison of different methods, it would be ideal to test all available methods with different parameter settings. This could also give more insight into the reliability of the ‘gold-standard’ sets as a metric for determining the performance of a tool.
4. The section “Impact of the mapper on identifying essential genes” has potential biases that need to be addressed. Specifically, the mappers used in the previous studies were not stated, and it is possible that bowtie yielded more of the gold standard genes than BWA simply because all the studies from which the gold standard genes were derived also used bowtie rather than BWA. The result may be due to bias in the data rather than to actual differences in the mappers. Could the authors add which aligners were used in each of the studies in Table 2, and adjust their interpretation accordingly?
5. Also in the section “Impact of the mapper on identifying essential genes,” the authors should more comprehensively assess whether normalization techniques could adjust for differences in performance between BWA and bowtie. In Table 3, BWA seems superior to bowtie in most metrics except for skewness, and yet the authors suggest using bowtie. Is it possible that alternate normalization techniques, such as the “betageom” option in TRANSIT2, could account for the skewness and improve the performance using BWA? It seems that the aligner was chosen prior to adjusting any of the other parameters in the TRANSIT2 tool, and that the testing was not comprehensive in this regard.
6. While the authors provide empirical evidence for some parameters yielding better results than others, they do not explain why those results occur. Could the authors provide some technical conjectures as to why the parameters they chose yielded the best results?
Minor comments
1. In Figure 3b, is the delta-oprD for the 115 gene set missing?
2. In the section comparing BWA and bowtie, the lower bound for the density metric to assess performance of the aligner is given as 35%, while the cited reference gives it as 30%.
3. The language in the “Methods to determine conditional gene essentiality” section needs to be clarified.
Particularly when intersections are mentioned, it would be helpful to specifically state what the intersection is between. Also, it would help to state explicitly that the genes that were not in the intersection between the essential genes in wild-type versus delta-oprD were considered to be the conditionally essential genes. Currently it seems to read that the genes in the intersection were the conditionally essential genes.
4. At the end of the section “Methods to determine conditional gene essentiality,” it states “we propose FiTnEss_FDR complemented with information retrieved from ZINB, would be an accurate and reliable method…” Could the authors explain how they came to this conclusion and, in practice, how a researcher would “complement the information” with the other method?
5. It would be helpful to have a table describing all the method abbreviations used in the manuscript. It would also be helpful to be consistent in referring to these abbreviations in both the figures and the text. For example, sometimes it seems that HMM is used interchangeably with HMM_GD, which can be confusing to the reader.
Reviewer #3: This study by Van Maele et al. addresses a critical need in the Pseudomonas aeruginosa research community by establishing gold-standard essential gene (EG) datasets to benchmark bioinformatics pipelines for transposon sequencing (Tn-Seq) analyses. Drawing on both literature and newly generated Tn-seq data from PA14 wild-type and ΔoprD strains grown in LB, the authors compared EG lists produced by several statistical methods implemented in TRANSIT2 and FiTnEss. They constructed a reference set of 84 EGs for P. aeruginosa, as well as a PA14-LB-specific list of 115 EGs, and assessed how effectively these genes could be identified using their datasets. Retrieval rates varied across methods, with the Hidden-Markov Model in TRANSIT2 recovering approximately 90% of gold-standard EGs and FiTnEss achieving up to 100%.
These curated datasets will prove valuable for the Tn-seq community, providing objective benchmarks for evaluating analysis pipelines, thereby improving reproducibility, standardization, and essential gene identification. The study is well organized, with clear writing and effective figures. I have several suggestions to enhance clarity and expand the scope of the work.
Major comments:
1. The motivation for utilizing the ΔoprD mutant in Tn-seq experiments is not clear. Further, the subsequent comparison of statistical tests for identifying conditional essential genes between the wild type and ΔoprD backgrounds is somewhat confusing. Could the authors include a known essential gene that displays conditional essentiality in the ΔoprD background (but not in wild type) as a positive control to validate the statistical tests? Alternatively, have the authors performed experimental validation, for instance by comparing gene deletions in both backgrounds, such as with shaF? Additionally, what would the authors predict if a similar comparison was conducted in another gene deletion background?
2. The authors refer to two essential gene lists from prior studies: GOLD 84 (general P. aeruginosa essential genes) and GOLD 115 (PA14-specific essential genes). While it is noted that the gene ontology profiles of these lists are similar, it would be helpful if the authors could highlight and discuss the key differences between these two sets to provide readers with more context.
3. For these two essential gene lists, it would be useful to include a summary of the mappers and statistical approaches employed in prior studies in a table. It would be important for the authors to clarify how robust the ‘gold standard’ lists are to different analysis methods.
4. The focus on PA14 is understandable given the availability of high-quality Tn-seq datasets, but this may limit the generalizability of the findings.
Could the authors consider extending their analysis to PAO1 using publicly available Tn-seq datasets or discuss the potential for broader application of their evaluation framework to non-Pseudomonas Tn-seq studies?
Minor comments:
1. The resolution of Figure 1A and 1B is low and should be improved for clarity.
********
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
Figure resubmission: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. If there are other versions of figure files still present in your submission file inventory at resubmission, please replace them with the PACE-processed versions.
https://doi.org/10.1371/journal.pcbi.1013945.r001
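Reviewer #2's major comment 2 in the decision letter above asks for a metric that penalizes false positives and cites the case of a tool that calls every gene essential. The short Python sketch below is one possible reading of that request, assuming precision, recall, and F1 as the metrics; the reviewer does not name specific ones. The function name score_gene_calls and the toy gene identifiers are hypothetical and come from neither the manuscript nor the review. Because a gold-standard set built by intersection is deliberately conservative, genes called essential but absent from it are only putative false positives.

```python
from typing import Dict, Iterable


def score_gene_calls(predicted: Iterable[str], gold: Iterable[str]) -> Dict[str, float]:
    """Compare a tool's essential-gene calls with a gold-standard set.

    Hypothetical helper for illustration only; not taken from the manuscript.
    """
    predicted_set, gold_set = set(predicted), set(gold)
    tp = len(predicted_set & gold_set)   # gold-standard genes that were recovered
    fp = len(predicted_set - gold_set)   # called essential but not in the gold set
    fn = len(gold_set - predicted_set)   # gold-standard genes that were missed
    recall = tp / len(gold_set) if gold_set else 0.0
    precision = tp / len(predicted_set) if predicted_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Toy data (assumption): a 100-gene "genome" whose first 10 genes form the gold set.
genome = [f"gene{i:03d}" for i in range(1, 101)]
gold_standard = genome[:10]

# The degenerate tool from the reviewer's example: every gene is called essential.
call_everything = score_gene_calls(genome, gold_standard)

# A more selective tool: 9 gold-standard genes plus 3 genes outside the gold set.
selective = score_gene_calls(genome[:9] + genome[50:53], gold_standard)

for name, scores in (("call_everything", call_everything), ("selective", selective)):
    print(name, scores)
```

With these toy inputs, the call-everything list reaches a recall of 1.0 but a precision of only 0.1, whereas the selective list has a recall of 0.9 and a precision of 0.75. Reporting recall alone, as the percentage of gold-standard genes recovered, would rank the degenerate list first.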
Revision 1
16 Dec 2025 Author Response
Attachment, submitted filename: PCOMPBIOL-D-25-01258_Reply to reviewers.docx
https://doi.org/10.1371/journal.pcbi.1013945.r002
25 Jan 2026 Decision Letter - Eric Dykeman, Editor
Dear Prof Guillard,
We are pleased to inform you that your manuscript 'Introducing gold-standard essential gene datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses' has been provisionally accepted for publication in PLOS Computational Biology.
Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.
IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.
Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.
Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.
Best regards,
Eric C. Dykeman, Ph.D.
Academic Editor
PLOS Computational Biology
Dominik Wodarz
Section Editor
PLOS Computational Biology
*********************************************************
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: No more comments
Reviewer #3: The authors have answered most of my comments. This study will serve as a useful benchmark for the P. aeruginosa community.
******
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code (e.g. participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #3: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #3: No
https://doi.org/10.1371/journal.pcbi.1013945.r003
Formally Accepted
Acceptance Letter - Eric Dykeman, Editor
PCOMPBIOL-D-25-01258R1
Introducing gold-standard essential gene datasets for Pseudomonas aeruginosa to enhance Tn-Seq analyses
Dear Dr Guillard,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
For Research, Software, and Methods articles, you will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Aiswarya Satheesan
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
https://doi.org/10.1371/journal.pcbi.1013945.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.