Peer Review History

Original Submission: November 30, 2022
Decision Letter - Diarmaid Hughes, Editor, Lotte Søgaard-Andersen, Editor

Dear Dr Goodall,

Thank you very much for submitting your Research Article entitled 'A multiomic approach to defining the essential genome of the globally important pathogen Corynebacterium diphtheriae' to PLOS Genetics.

The manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review a much-revised version. We cannot, of course, promise publication at that time.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. Among the many good suggestions from the reviewers to improve your paper, reviewer #3 requested genetic validation of at least some of the inferences drawn from the TnSeq data - if you can do this it would greatly strengthen the significance of your findings. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.

If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist.

To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool.  PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process.

To resubmit, use the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder.

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Diarmaid Hughes

Academic Editor

PLOS Genetics

Lotte Søgaard-Andersen

Section Editor

PLOS Genetics

Reviewer #1: In this article the authors use transposon insertion sequencing, proteomics and data analysis to determine which genes are essential in Corynebacterium diphtheriae. They also explore the distribution of these genes within the Corynebacterium genus and the presence of the essential proteins in the diphtheria vaccine, and they compare the essential genome of C. diphtheriae with those of Mycobacterium tuberculosis and Corynebacterium glutamicum.

The approaches used in the article are appropriate, the results presented are novel, the article is well written, and it is of great interest not only to the community that studies diphtheria, but also to the community that studies different members of Corynebacteriales.

However, I believe that some points should be addressed before publication:

Line 163. How did you choose the threshold of 12? Is it arbitrary? It should be stated.

Line 169. What does it mean, from a biological point of view, to fall into the “unclear” category? Is “unclear” really the word that describes this group? I guess that, for example, some of these proteins are not essential but the mutants are more sensitive to the environment. I think this should be discussed.

Line 188. I think that the approach you used to identify essential “regions” is correct. But why not also identify essential “domains”? You could have used the NCBI conserved domains tool on the total proteins to delimit the domains, and then the same approach you used for the identification of the essential proteins, in this case on each domain. This would be useful especially in proteins with more than 2 domains, or when the domains have very different lengths. This is just a suggestion, but I really think it would enrich the article. I believe that this would be an incredible contribution to the Corynebacteria community, in particular, for determining the importance of domains of unknown function.

Line 197. I’m not sure if I understood the approach. Should it say “calculating the proportion of insertion-free regions” instead of “calculating the proportion of insertion-free CDS”?

Line 261. What are the annotation artifacts that you mention?

Line 290. It would be useful to indicate which are the groups of genes with differential conservation on Supplementary Figure 7.

Line 291. What do you think it means that some genes of the isoprenoid biosynthetic pathway are present but others are absent in some species? Are these genes shared with other pathways? Is it possible that you did not identify the genes because of the approach you used or because the genomes are not complete?

Line 338. In order to check the absences of the essential genes in Mtb you should try more sophisticated approaches than blastp, like Dali searches against the AlphaFold database (http://ekhidna2.biocenter.helsinki.fi/dali/). I tried it for a couple of the atp proteins you did not identify using blastp and I obtained good results.

Line 336. I think that you should also review the localization of the total of the proteins and compare these results with those of the essential ones. Membrane proteins in Corynebacteriales are not well characterized, so it would be very interesting to know what proportion of them are essential proteins.

Line 391. Which essential proteins were not detected in the proteome? You mention that many are associated with the cell envelope, but I could not find this information. For the proteins that are not associated with the cell envelope, what are the reasons why you may not have detected them in the proteomics analysis? You only mention silencing by DNA-binding proteins, but you should mention the limitations of the approach, considering that you did not detect a large fraction of the essential proteins.

Line 407. In Table 3, why did you present the abundance of the essential proteins as a fraction of the total? I am not sure it is the best format. I consider it very important to show that more than half of the essential secreted proteins are uncharacterized, and this information is diluted when you present it in that format.

Line 447. In order to provide some context, I think you should mention what is the total amount of proteins present in the diphtheria vaccines you are referring to.

Line 537. You mention that the plasmid was an artifact. Can you explain how you determined that?

Line 579. .gbk files are not provided anymore. I guess you meant .gff files.

Line 633. I would appreciate it if you could comment on the differences you found in the comparison between strains ISS3319 and NCTC 13129. I guess most of them are associated with the synthesis of the toxin, but I’m curious about it.

Line 652. Did you check if the genes identified as orthologous can actually be concatenated? Did you do any topology test before concatenation?

Line 658. In the data availability statement you indicate that the annotated genome can be accessed at the ENA site, but I could not find this information. It is extremely important that you provide the annotations of the genome so the community can make use of the data you generated in this article. It is also of huge importance that every time that you mention a gene like “diphtheriae_XXXXX” you also include the corresponding DIP gene tag, otherwise it is very hard to know which is the gene you are referring to.

Supplementary Figures 6 and 12. Which are the genes indicated with numbers? These numbers should be specified in the legend or not provided at all.
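
As an illustration of the per-domain analysis suggested under Line 188, the following is a minimal sketch only. It is not the authors' pipeline and does not reproduce their essentiality criteria; the function name, the coordinates, and the domain boundaries are hypothetical, and in practice the boundaries would come from a domain annotation tool such as the NCBI conserved domains search mentioned above. The sketch simply counts unique insertion sites per base pair within each domain interval.

```python
from bisect import bisect_left, bisect_right


def domain_insertion_density(insertion_sites, domains):
    """Return unique insertions per base pair for each named domain.

    insertion_sites: iterable of 1-based genomic coordinates (hypothetical data).
    domains: dict mapping a domain name to an inclusive (start, end) interval
             in the same coordinate system (hypothetical boundaries).
    """
    # Collapse repeated reads at the same position into one unique site.
    sites = sorted(set(insertion_sites))
    densities = {}
    for name, (start, end) in domains.items():
        # Binary search counts the sites with start <= site <= end.
        count = bisect_right(sites, end) - bisect_left(sites, start)
        densities[name] = count / (end - start + 1)
    return densities


# Hypothetical two-domain gene: positions and boundaries are made up.
example_sites = [5, 10, 10, 50, 101, 150, 200]
example_domains = {"N_domain": (1, 100), "C_domain": (101, 300)}
example_density = domain_insertion_density(example_sites, example_domains)
```

A domain with a markedly lower density than its neighbours would be a candidate for closer inspection; deciding whether it is essential would still require the statistical treatment used for whole genes.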

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #2: Goodall et al. constructed and analyzed a high-density transposon-directed insertion library in C. diphtheriae to identify the essential genome. The authors find 341 essential genes that largely correspond to essential genes of other species within the Actinobacteriota.

This work contains an impressively dense library and a very thorough analysis. I only have a few questions that are mainly related to the EZ-Tn5 Kan-2 kit that is used in the study. I would appreciate a couple of sentences, and maybe a small supplementary figure, describing the effect of an insertion within a coding sequence (i.e., How large is the insertion? Does it contain outward-facing promoters?).

1. How does the insertion affect operon structures? Is it possible that a non-essential gene is classified as essential because the insertion disrupts transcription of another gene within the operon?

2. Can you detect essential RNAs?

3. Are there duplicate genes in the genome that are potentially essential but would be missed by this approach since either can be deleted individually?

4. Lines 166-169: Have the 115 genes that are classified as unclear been analyzed further? I agree that the insertion density within the resA_2 gene (Fig 1) seems lower than for the others but based on the insertion pattern resA_2 doesn’t look like an essential gene to me. Also, the gene name indicates that there are more copies of this gene on the genome. Could that affect the results?

5. Is there a sequence bias for insertion? Could some of the ‘unclear’ genes be explained due to this bias?

6. Lines 196-201: Does this analysis for essential domains only work for the N-terminus of a gene or also for the C-terminus?

7. Lines 202-253: Would the insertion of a transposon in the middle of the gene split the protein into two independent proteins, a MurJ-like protein and a pseudokinase protein?

8. Fig 1C: The figure looks like transposons are inserted right up to the start codon of nadE. Does the Tn5 transposon contain a transcriptional start and ribosome binding site?

9. How are occasional insertions within essential genes explained (i.e., mrpA in Fig S10)?
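
One way the insertion-bias question in point 5 could be explored is sketched below. This is an illustrative assumption, not the method used in the manuscript: the function name and inputs are hypothetical, positions are 0-based, and the genome is treated as a linear string, so sites too close to either end are skipped. The sketch tallies base frequencies at each offset around the insertion sites, which can then be compared with the background base composition of the genome.

```python
from collections import Counter


def base_frequencies_around_sites(genome, sites, flank=2):
    """Tally base frequencies at offsets -flank..+flank around insertion sites.

    genome: nucleotide sequence as a string (treated as linear; an assumption).
    sites: iterable of 0-based insertion positions (hypothetical data).
    Returns a dict mapping each offset to a dict of base -> frequency.
    """
    counts = {offset: Counter() for offset in range(-flank, flank + 1)}
    for site in sites:
        # Skip sites whose flanking window would run off the sequence.
        if site - flank < 0 or site + flank >= len(genome):
            continue
        for offset in counts:
            counts[offset][genome[site + offset]] += 1
    freqs = {}
    for offset, counter in counts.items():
        total = sum(counter.values())
        freqs[offset] = {b: n / total for b, n in counter.items()} if total else {}
    return freqs


# Toy sequence and positions, made up for illustration.
toy_freqs = base_frequencies_around_sites("ACGTACGTAC", [0, 2, 6], flank=1)
```

If the frequencies at some offset differ strongly from the genome-wide composition, that would suggest a positional preference, and genes poor in the preferred motif could appear under-inserted without being essential.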

Reviewer #3: This paper from Goodall and co-workers describes the use of transposon insertion sequencing (TnSeq/TraDIS) to characterize the essential genome of the pathogen Corynebacterium diphtheriae (Cdip). This bacterium belongs to the Corynebacteriales order that includes other pathogens like Mycobacterium tuberculosis (Mtb) and model organisms like Mycobacterium smegmatis and Corynebacterium glutamicum (Cglu). These bacteria have a unique cell envelope consisting of multiple layers including the peptidoglycan (PG), arabinogalactan (AG), and mycomembrane (MM). Unlike most model bacteria, they also grow via tip extension. Thus, understanding the biology of these bacteria is important for addressing fundamental questions about cell growth as well as for practical reasons of therapeutic development.

The main advance of this paper is the generation of the first high-density transposon library in a pathogenic Corynebacterium and using it to determine its essential genome and compare it to other results from Mtb and Cglu. The paper is very well written and the bioinformatics is well done. The dataset generated will be highly valuable to the field.

Major critique:

Where the paper falls a little short, in my opinion, is the lack of genetic validation of any of the inferences drawn from the TnSeq data. The results would be much stronger if some of the observations were followed up with genetic experiments. For example, the authors claim that pknB is non-essential in Cdip unlike Mtb and that the mrp operon is essential and a unique transport pathway distinct from Mtb. My understanding is that tools are available to engineer Cdip, so adding some genetic validation to strengthen the results should be feasible.

Minor points:

1) The tables require more detail in the legends. It is not clear what “inline barcode match” and “transposon check 1 or 2” refer to in Table 1. Insertion density should be defined in the Table 2 legend.

2) For the predictions of secreted proteins, it would be nice to add Tat transport and sortase signals.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: No: The genome annotation file is not provided.

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Daniela Megrian

Reviewer #2: No

Reviewer #3: No

Revision 1

Attachment
Submitted filename: reviewer_responses.docx
Decision Letter - Diarmaid Hughes, Editor, Lotte Søgaard-Andersen, Editor

Dear Dr Goodall,

We are pleased to inform you that your manuscript entitled "A multiomic approach to defining the essential genome of the globally important pathogen Corynebacterium diphtheriae" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Diarmaid Hughes

Academic Editor

PLOS Genetics

Lotte Søgaard-Andersen

Section Editor

PLOS Genetics

www.plosgenetics.org

Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors:

Please note here if the review is uploaded as an attachment.

Reviewer #1: I believe that the authors addressed most of my concerns and enriched the article accordingly. I have just a few minor comments (line numbers refer to the numbers in the authors' answers):

Line 169. The references are missing in the text that was added.

Line 338. While a conserved structural fold is not necessarily evidence for a functional analogue, relying on blastp hits is definitely much worse. There were not many absences of essential genes to verify, so I believe you should have used more reliable approaches.

Line 579. You did use GenBank files, but those were not ".gbk" files as stated, because that extension was deprecated by the NCBI several years ago (https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/).

Line 652. The answer does not reflect what was done in the article. In the materials and methods section "Identification of homologs in Actinobacteria species", you explain that you obtained 35 orthologous genes that you aligned and concatenated to reconstruct a reference phylogeny of Actinobacteria. You should have verified that those genes were suitable for concatenation (i.e. congruence analysis), or at least justified why you didn't verify. In this case, I would say that the topology you obtained resembles the Actinobacteria phylogenies previously published, and that would be enough considering you are just using it to present a phyletic pattern.

Reviewer #2: The authors have addressed all of my comments.

Reviewer #3: The authors have nicely addressed my concerns and those of the other reviewers.

**********

Have all data underlying the figures and results presented in the manuscript been provided?

Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes

Reviewer #2: Yes

Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Daniela Megrian

Reviewer #2: No

Reviewer #3: No

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: 

http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-22-01369R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

Formally Accepted
Acceptance Letter - Diarmaid Hughes, Editor, Lotte Søgaard-Andersen, Editor

PGENETICS-D-22-01369R1

A multiomic approach to defining the essential genome of the globally important pathogen Corynebacterium diphtheriae

Dear Dr Goodall,

We are pleased to inform you that your manuscript entitled "A multiomic approach to defining the essential genome of the globally important pathogen Corynebacterium diphtheriae" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zsofi Zombor

PLOS Genetics

On behalf of:

The PLOS Genetics Team

Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom

plosgenetics@plos.org | +44 (0) 1223-442823

plosgenetics.org | Twitter: @PLOSGenetics

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.