Predicting yield of individual field-grown rapeseed plants from rosette-stage leaf gene expression

Sam De Meyer; Daniel Felipe Cruz; Tom De Swaef; Peter Lootens; Jolien De Block; Kevin Bird; Heike Sprenger; Michael Van de Voorde; Stijn Hawinkel; Tom Van Hautegem; Dirk Inzé; Hilde Nelissen; Isabel Roldán-Ruiz; Steven Maere

doi:10.1371/journal.pcbi.1011161

Peer Review History

Original Submission January 4, 2023
9 Feb 2023 Decision Letter - Kiran Raosaheb Patil, Editor
Dear Prof. Maere,
Thank you very much for submitting your manuscript "Predicting yield of individual field-grown rapeseed plants from rosette-stage leaf gene expression" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, providing that you modify the manuscript according to the review recommendations.
Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.
When you are ready to resubmit, please upload the following:
[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.
[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).
Important additional instructions are given below your reviewer comments.
Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Sincerely,
Kiran Raosaheb Patil, Ph.D.
Section Editor
PLOS Computational Biology
*********************
A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:
Reviewer's Responses to Questions
Comments to the Authors: Please note here if the review is uploaded as an attachment.
Reviewer #1: In this interesting paper, the authors use single plant-omics to predict yield of individual field-grown rapeseed plants from rosette-stage 2 leaf gene expression. This is interesting as it suggests a way of predicting crop yield from the expression state of younger plants. They use machine learning to successfully predict the phenotypes of individual B. napus plants from rosette-stage leaf gene expression data. The authors have generated a huge amount of data on individual plants and have carefully compared early stage gene expression and phenotype with final yield. It is nice that the top predictive genes from their models are often linked to the juvenile-adult phase change in Arabidopsis. I think the topic and material will be of interest to PLOS Computational Biology readers, and the data will be useful to researchers who want to dig into the topic. The authors should be commended for making so much of their data and code available online. I have the following points:
Major point.
My one major problem with the paper is that I found it difficult to understand from the main text how the machine learning approach was applied, how the techniques were chosen, and to get an intuitive understanding of the results and how they can be validated. This wasn’t completely fixed by going to the methods, which explained the approach in detail, but not how the approach was chosen in the first place. This can be simply fixed with a few more explanations.
For example, the authors write: ‘We built random forest (RF) and elastic net (enet) models to predict the phenotypes of individual plants from their autumnal leaf 8 transcriptome, using either all genes or only transcription factors (TFs) as potential features and using three different feature selection techniques (see Methods). For each combination of phenotype, model type (RF or enet), potential feature set (all genes or TFs) and feature selection technique, 9 repeat models were learned, each time using 10-fold cross-validation with different splits (see Methods), resulting per combination in a total of 90 test sets and 9 test set predictions per plant.’ The main text should explain the differences between the single-gene and multi-gene models, and why both random forest and elastic net models are used. It would also be helpful to explain how the multi-gene models are implemented and what assumptions were used.
Minor points
The authors write in lines 193-206, ‘In a previous study on a similar number of field-grown maize plants [28], 14.17% of transcripts were found to be significantly spatially autocorrelated at q ≤ 0.01, which is considerably more than the 0.22% recovered here at q ≤ 0.05. This may be due to differences in the way Moran’s I values and their significance were calculated in Cruz, De Meyer (28) versus the present study (see Methods).’ Reference 28 appears to be from the same group as this paper. Wouldn’t it be possible to get the data from 28 and do the same test? The methods don’t seem to explain the differences in how the Moran's I values were calculated between the two papers. I don’t fully understand what was done after this in the lines up to 206, and what the evidence is that there is spatial patterning in the new data. I don’t think it matters for this paper’s results whether there is spatial patterning, but I found this part of the paper confusing.
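Editorial illustration of the cross-validation scheme quoted in Reviewer #1's major point: the arithmetic behind "90 test sets and 9 test set predictions per plant" can be checked with a small sketch. This is not the authors' code. It only reproduces the split bookkeeping of 9 repeats of 10-fold cross-validation in plain Python; the function name `repeated_kfold`, the seed, and the plant count of 62 are hypothetical choices made for illustration, and no model is fitted.

```python
import random
from collections import Counter

def repeated_kfold(n_samples, n_folds=10, n_repeats=9, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a different random split for every repeat
        for fold in range(n_folds):
            test_idx = order[fold::n_folds]  # every n_folds-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield repeat, fold, train_idx, test_idx

n_plants = 62  # hypothetical number of plants, for illustration only
splits = list(repeated_kfold(n_plants))

# Count how often each plant ends up in a test set across all repeats.
times_tested = Counter(i for _, _, _, test in splits for i in test)

print(len(splits))                 # 90 test sets (9 repeats x 10 folds)
print(set(times_tested.values()))  # {9}: one test prediction per repeat
```

Within one repeat every plant falls into exactly one test fold, so each repeat contributes one out-of-sample prediction per plant, which is where the figure of 9 predictions per plant comes from.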
Lines 935/936: I found it interesting that seedling emergence was a poor predictor of yield: ‘the seedling emergence date was not recorded in the present field trial, but the closest proxy that was measured, namely rosette area at 14 DAS, was found to be a bad predictor for yield, indicating that variation in seed germination and seedling emergence across the field did not by themselves have a major impact on yield.’ I note in the methods that ‘Early- and late-emerging seedlings were pruned preferentially (based on visual assessment) to make the remaining seedling population as homogeneous as possible.’ Should the authors clarify that in this work they can’t assess the effects of seedling emergence on yield, as they prune early- and late-emerging seedlings? Also, is the data showing that rosette area at 14 DAS was a bad predictor for yield in the paper? (Sorry if I missed it, but if so, maybe reference where this data is shown here as well?)
Reviewer #2: The study by De Meyer et al. investigates the phenotypes and gene expression in rapeseed plants, with the aim to predict spring phenotypes given autumn data. The authors found that single-plant omics can be used to identify genes and processes influencing crop yield in the field. Overall, the work is interesting, well-executed, and the topic exciting. We would be happy to see this paper published. However, there are some places that need clarification:
Figure 1: ‘Principal component analysis (PCA) suggests that there are no subpopulations of plants with distinct expression or phenotype profiles’. It would be good to have statistics to support this. For example, is there a correlation between the distance of plants in the field and the points in the PCA plot?
Line 195: The authors mention that Moran’s I is calculated differently in this study. Why? Please elaborate.
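Editorial illustration of the statistic both reviewers ask about: the letters do not say how Moran's I or its significance was computed in either study, so the sketch below shows only the textbook global Moran's I with a permutation test. Everything specific in it is an assumption made for illustration: the 6 x 6 field grid, the binary rook-adjacency weights, the one-sided test, and the function names. Different weight definitions or significance procedures can give different fractions of "significantly autocorrelated" transcripts, which is the kind of difference the reviewers ask the authors to spell out.

```python
import random

def rook_weights(n_rows, n_cols):
    """Binary weights: plants sharing a grid edge are neighbours (assumption)."""
    weights = {}
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    weights[(i, rr * n_cols + cc)] = 1.0
    return weights

def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    w_total = sum(weights.values())
    cross = sum(w * z[i] * z[j] for (i, j), w in weights.items())
    return (n / w_total) * cross / sum(d * d for d in z)

def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p-value for positive autocorrelation, by shuffling positions."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

w = rook_weights(6, 6)
# Hypothetical expression values: a smooth field gradient and a checkerboard.
gradient = [float(r + c) for r in range(6) for c in range(6)]
checker = [float((r + c) % 2) for r in range(6) for c in range(6)]

print(morans_i(gradient, w))   # positive: neighbours have similar values
print(morans_i(checker, w))    # close to -1: neighbours always differ
print(permutation_p_value(gradient, w))
```

In a transcriptome-wide analysis this test would be repeated per gene and the p-values corrected for multiple testing, which is where q-value thresholds such as those quoted by Reviewer #1 come in.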
Line 452 (and other occurrences): The authors discuss the differences in performance of single- and multi-gene models, but it is unclear what we should look for. Please cite the table/figure, and also the R2 value ranges.
The chapter on the top predictors for leaf and seed phenotypes is somewhat long and tedious to read due to the descriptions of many genes found in the table. Perhaps it would be easier to summarize the findings and phenotypes as another table/figure?
Line 429: Is a comparison of the phenotype prediction performance between multi-gene models and transcription factor models relevant?
Line 455-456: According to the data in Table 2, it seems that most of the shoot dry weight traits are not better predicted by multi-gene models than single-gene models.
It would be useful to indicate TFs that are found in RF and enet models in Figures 3 and 4.
Line 495: “Random forest” and “elastic net” abbreviations have been defined in line 330, so there is no need to define them again in line 495. Please also check for similar problems throughout the manuscript.
Line 695: Please mark the numbers of TFs in Figure 3 and Figure S8 as well. Is it possible to further analyze whether top predictors and phenotype are positively or negatively correlated?
******
Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data and code underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data and code should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data or code, e.g.
participant privacy or use of data from a third party, those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
******
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: Yes: Marek Mutwil
Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.
Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.
Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols.
Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols References: Review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. https://doi.org/10.1371/journal.pcbi.1011161.r001
Revision 1
11 Apr 2023 Author Response Attachments Attachment Submitted filename: reponse_to_reviewers.docx https://doi.org/10.1371/journal.pcbi.1011161.r002
5 May 2023 Decision Letter - Kiran Raosaheb Patil, Editor Dear Prof. Maere, We are pleased to inform you that your manuscript 'Predicting yield of individual field-grown rapeseed plants from rosette-stage leaf gene expression' has been provisionally accepted for publication in PLOS Computational Biology. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated. IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS. Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology. Best regards, Kiran Raosaheb Patil, Ph.D. Section Editor PLOS Computational Biology ********************************************************* Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: I am happy with the revisions and support publication Reviewer #2: The authors did a great job addressing our questions. ****** Have the authors made all data and (if applicable) computational code underlying the findings in their manuscript fully available? 
Reviewer #1: None
Reviewer #2: None
******
Do you want your identity to be public for this peer review?
Reviewer #1: No
Reviewer #2: Yes: Marek Mutwil
https://doi.org/10.1371/journal.pcbi.1011161.r003
Formally Accepted
23 May 2023 Acceptance Letter - Kiran Raosaheb Patil, Editor
PCOMPBIOL-D-23-00014R1
Predicting yield of individual field-grown rapeseed plants from rosette-stage leaf gene expression
Dear Dr Maere,
I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.
The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.
Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.
Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!
With kind regards,
Timea Kemeri-Szekernyes
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
https://doi.org/10.1371/journal.pcbi.1011161.r004
