Peer Review History
| Original Submission November 8, 2024 |
|---|
|
PONE-D-24-50599
Machine learning for clustering and yield prediction in Ethiopian and Senegal sorghum collections
PLOS ONE

Dear Dr. Ahn,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 12 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,
Nguyen-Thanh Son, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.
3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.
4. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository. Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: N/A
Reviewer #3: Yes
Reviewer #4: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No
Reviewer #4: Yes

**********

5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Ensure proper use of acronyms in the text, e.g., if Boosted Tree (XGBoost) is introduced, then XGBoost can be used consistently in the text.

Methodology can be improved:
1. The significance of R-squared and RMSE for assessing clustering, i.e., classification results, is not sufficiently explained.
2. Citations for JMP 17 Pro, if available, would be helpful for readers.

Results section can be improved:
1. Table 1: Respective units for each trait can be given for better understanding.
2. Large difference in accuracy of grain yield prediction models is not sufficiently explained.

Reviewer #2: First of all, I congratulate all the authors on their excellent work. Some corrections are required for a better presentation of the work. The recommendations are given below:
1. Modify the abstract and include some key findings in this section.
2. Update the introduction with recent work and references.
3. The discussion section requires current-year references.
4. Separate the conclusion section and modify it with specific recommendations for sustainable agricultural practices in the study area.
5. Figure 3 is not visible; update it with a higher resolution for better understanding.
6. In Figure 4, the x and y axes need to be clearer (explain what "Predicted" and "Actual" mean, including their units or variables).
7. Clearly mention all data sources specifically in the data source section.
8. Mention the level of significance of the p-value below Table 1.
9. Describe the selection procedure for the 179 observations.

Reviewer #3: The manuscript explores the application of various machine learning models, including Random Forest, Boosted Tree, and DBSCAN, to cluster 179 sorghum accessions based on phenotypic traits and predict grain yield. The study identifies distinct clustering patterns and highlights seed weight and germination rate as key predictors of yield, with the Boosted Tree model demonstrating superior performance. While the research showcases the potential of ML in sorghum germplasm characterization, issues related to methodological justification, data handling, model validation, and biological interpretation limit its current impact. The manuscript requires major revision to address these critical concerns.

First of all, I have a few questions.
• How do the identified clusters translate into meaningful biological or agronomic categories? Are they linked to specific landraces, environmental conditions, or genetic traits?
• Why were random undersampling and basic imputation methods chosen over more advanced techniques that preserve data integrity and variability?
• How do the authors explain the exceptionally high R² value (>0.90) in yield prediction? Could this indicate overfitting, and how was this risk mitigated?
• Were external validation datasets considered to confirm the generalizability of the clustering and prediction models? If not, why?
• How do the findings align or contrast with existing literature on sorghum yield prediction and clustering? Are there unexpected results that warrant further investigation?

Comments that need to be addressed:
• Add an overall methodological flowchart for better understanding by readers.
• While the introduction outlines the importance of sorghum and the application of machine learning (ML), it does not critically engage with the why behind the use of ML in this specific context. What inherent limitations of traditional phenotypic analysis methods motivated this study? For instance, are traditional clustering techniques insufficient due to non-linearity or high-dimensional trait interactions? This needs to be explicitly articulated to establish a stronger foundation for the study’s relevance.
• The introduction lacks clearly defined research questions or hypotheses. What specific biological or agronomic insights were the authors expecting to uncover through clustering and yield prediction? A clearly articulated hypothesis would help guide the reader through the manuscript's logical flow.
• There’s little discussion on how ML has been applied in other crop-related studies, particularly in African germplasm contexts. Are there comparable studies in maize, millet, or other cereals? Adding such comparative insights could strengthen the rationale for selecting sorghum and the specific ML techniques.
• The manuscript applies a wide range of ML models (Random Forest, DBSCAN, Boosted Trees, etc.) without adequately explaining the selection criteria. Why were these models chosen over others, such as Gradient Boosting Machines, LightGBM, or even deep learning approaches like CNNs, which have shown promise in similar agricultural datasets?
• The handling of missing data via Multivariate Normal Imputation is mentioned briefly. However, why was this specific imputation method chosen? Were other imputation strategies (e.g., k-NN imputation, MICE) considered, and how might the choice of imputation affect downstream clustering and prediction outcomes?
• The study uses random undersampling to address class imbalance, which is problematic as it can discard potentially informative data. Did the authors consider alternative methods like SMOTE, ADASYN, or cost-sensitive algorithms that preserve data integrity? If not, why? Moreover, how does the undersampling affect model generalizability, especially given the small sample size?
• While GridSearchCV is referenced, critical details are missing: What was the cross-validation strategy (e.g., k-fold, stratified k-fold)? Were metrics like precision, recall, F1-score considered in addition to RMSE and R²? Were there any issues with overfitting, especially given the high performance of the Boosted Tree model?
• The manuscript heavily focuses on RMSE and R² for model evaluation. However, these metrics alone do not provide a complete picture. For clustering, did the authors consider internal validation metrics like the Silhouette Score, Davies-Bouldin Index, or Dunn Index to assess cluster quality?
• The clustering results identify distinct groups, but what do these clusters mean biologically or agronomically? Are the clusters linked to known ecotypes, landrace origins, or environmental adaptations? Without such context, the clustering analysis feels descriptive rather than explanatory.
• The Boosted Tree model achieved an exceptionally high R² (0.9026) for yield prediction. This raises concerns about potential overfitting, especially since real-world agricultural yield data is notoriously noisy. Was a learning curve analysis conducted to ensure model robustness? How does the model perform on truly unseen data (e.g., through external validation or time-based splits)?
• There’s no discussion of prediction errors. Were there specific accessions where the model consistently under- or over-predicted yield? Error analysis could reveal important insights into model biases or data limitations.
• The discussion is largely descriptive, summarizing results without linking them to broader agronomic or biological theories. For instance, why might seed weight and germination rate be the most important predictors of yield? Are these findings consistent with physiological models of plant growth or prior breeding studies?
• The manuscript claims that ML models “highlight the potential for efficient germplasm characterization,” yet the models were not externally validated. Without independent validation, this conclusion is speculative. The discussion should temper such claims and acknowledge limitations.
• The manuscript does not critically reflect on its own methodological limitations. Issues such as potential overfitting, the limitations of undersampling, or biases in phenotypic data collection are not discussed. A more self-reflective section acknowledging these limitations would enhance the manuscript’s credibility.
• The manuscript lacks sufficient detail for reproducibility. For example: Were random seeds set during model training to ensure reproducibility? What software versions and libraries were used (e.g., specific versions of Scikit-learn, JMP Pro)? Can the code pipeline be shared to allow other researchers to replicate the analysis?
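
As an illustration of the internal cluster-validity check that Reviewer #3 asks about, the mean silhouette coefficient can be computed directly from pairwise distances. The sketch below is a minimal pure-Python version under stated assumptions: the function names `euclidean` and `silhouette_score`, the toy points, and the labels are hypothetical and are not taken from the manuscript, its data, or the authors' pipeline.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster contributes 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups that lie far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = silhouette_score(toy_points, [0, 0, 1, 1])
mixed_up = silhouette_score(toy_points, [0, 1, 0, 1])
```

In this toy case the labelling that follows the two groups scores close to 1, while the labelling that mixes them scores below 0. Values near 0 indicate overlapping clusters.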
Reviewer #4: Review with Suggestions

TITLE: Machine Learning for Clustering and Yield Prediction in Ethiopian and Senegal Sorghum Collections

This study presents an in-depth application of machine learning (ML) for clustering and yield prediction in Ethiopian and Senegalese sorghum accessions. The research is well-structured, employing diverse ML models and statistical techniques to analyze phenotypic traits. However, there are areas where clarifications, additional justifications, and refinements would improve the study’s rigor and applicability.

TITLE, ABSTRACT & AUTHOR INFORMATION: The Title is informative and accurately conveys the research focus, but it could be more specific regarding the ML techniques applied. A refined title such as “Machine Learning-Based Clustering and Yield Prediction in Ethiopian and Senegalese Sorghum Accessions Using Tree-Based Models” would improve clarity. The abstract effectively summarizes the study but could be more concise by focusing on the most impactful ML models rather than listing all tested algorithms. Additionally, performance metrics should be included, particularly R² and RMSE values, to quantify the effectiveness of the best-performing models. The author list and affiliations are well-structured, ensuring credibility. A more refined title would make the research focus clearer, and reducing the number of ML models mentioned in the abstract would improve readability while keeping the key findings prominent. Including numerical performance metrics in the abstract would make the study’s contributions more tangible. Lastly, ending the abstract with a statement on real-world applications for sorghum breeding would reinforce its practical significance.

INTRODUCTION: The introduction effectively contextualizes the study by discussing sorghum’s importance, its susceptibility to biotic and abiotic stresses, and the need for better classification and prediction techniques. The transition from traditional phenotypic methods to machine learning approaches is well-articulated. However, the section spends too much space discussing fungal diseases, which, while relevant, should not overshadow the main objective: the application of ML in sorghum classification and yield prediction. Additionally, basic ML definitions (supervised, unsupervised, and reinforcement learning) are unnecessary for an audience likely familiar with these concepts. The research gap is not explicitly stated, and it is unclear how this study differs from prior ML applications in plant breeding. Reducing the discussion on fungal diseases would improve the focus on ML applications, and if necessary, these details could be moved to a separate background section. Removing basic ML definitions and instead elaborating on why clustering and predictive modeling are particularly useful for sorghum breeding would strengthen the introduction. Explicitly defining the research gap and explaining how this study advances previous ML applications in agriculture would make the introduction more compelling.

MATERIALS AND METHODS: The materials and methods section is well-structured and provides details on data sources, statistical analysis, and machine learning models. The data description is clear, specifying the number of accessions and phenotypic traits analyzed. However, there is no mention of how categorical or non-numeric variables were handled, nor whether environmental variables such as soil quality or climate data were considered. The statistical analysis is robust, utilizing t-tests, PCA, and hierarchical clustering, but no justification is provided for selecting Ward’s linkage method. The ML analysis is comprehensive, covering feature scaling, train-test splitting, and hyperparameter tuning. However, the selection of ML models is not justified: why were Random Forest, XGBoost, and KNN chosen, and were other approaches like deep learning models (e.g., CNNs, LSTMs) considered? Additionally, clustering validity metrics (e.g., silhouette scores, Dunn index, Davies–Bouldin index) are not reported, making it difficult to assess the effectiveness of the clusters. Clarifying how categorical or non-numeric variables were handled would make the methodology more transparent. Justifying the selection of Ward’s linkage for clustering would strengthen the statistical rigor of the study. Explaining why certain ML models were chosen over others and whether alternative ensemble or deep learning models were considered would enhance the credibility of the ML approach. Including clustering validity metrics would provide stronger validation for the identified clusters.

RESULTS: The results section thoroughly presents findings from PCA, hierarchical clustering, and machine learning classification. PCA results successfully illustrate trait relationships and geographical clustering of sorghum accessions, reinforcing the role of genetic and environmental adaptation. However, the variance explained by the first two components (47.6%) is relatively low, suggesting that additional components may be needed for a more comprehensive interpretation. The hierarchical clustering analysis identifies four clusters, but the rationale for selecting four clusters is unclear: were silhouette scores or the elbow method used? The machine learning classification results demonstrate that tree-based models significantly outperformed linear regression and SVMs, achieving accuracies above 70%. However, Table 2 lacks standard deviations or confidence intervals, making it difficult to assess result variability. The yield prediction analysis shows that the Boosted Tree model performed best (R² = 0.9026, RMSE = 1.9165), with seed weight and germination rate as the most influential traits. This aligns with prior studies but lacks a comparative discussion with other ML-based plant breeding studies. Additionally, potential overfitting in the Boosted Tree model is not addressed. Justifying the selection of four clusters using silhouette scores or the elbow method would add statistical robustness to the clustering results. Reporting standard deviations/confidence intervals in Table 2 would improve the reliability of ML performance results. Discussing overfitting concerns in the Boosted Tree model and whether cross-validation was used to mitigate this issue would enhance the credibility of the model’s accuracy. Comparing findings with previous ML-based studies on crop yield prediction would place the study in a broader scientific context.

DISCUSSION: The discussion effectively synthesizes key findings and links them to previous research. The geographical clustering of accessions is well-explained, and the link between rust resistance and East African climate conditions is insightful. The ML classification results are discussed thoroughly, emphasizing the advantages of tree-based models. However, there is no discussion on feature importance variability across models: were the same traits ranked important in different models? Additionally, the study does not explore whether the Boosted Tree model is generalizable to other sorghum populations. The breeding implications are well-stated, particularly in emphasizing the importance of targeted trait selection. However, no mention is made of practical challenges in implementing ML in breeding programs. Furthermore, the role of environmental variables in yield prediction is ignored, despite their known impact on crop performance. The future research directions are promising, particularly the integration of genotypic data, but the potential for deep learning techniques (e.g., CNNs, transformers) is not discussed. Discussing whether feature importance rankings were consistent across models would provide a more detailed understanding of trait significance. Addressing the generalizability of the Boosted Tree model to diverse sorghum populations would clarify its broader applicability. Mentioning practical challenges in applying ML to real-world breeding programs would make the discussion more applicable to agricultural research. Exploring the potential for integrating deep learning techniques into future research would align the study with cutting-edge advancements in ML.

CONCLUSION: The conclusion effectively summarizes the study’s contributions and highlights the value of tree-based models in phenotypic trait analysis. It links findings to broader food security challenges, ensuring relevance. However, it does not mention whether these findings are generalizable to other crops. Additionally, a brief discussion on the broader applicability of ML approaches in plant breeding would enhance the paper’s impact. Highlighting whether these ML methods can be applied to other cereal crops like maize and wheat would improve the generalizability of the findings. Discussing the potential integration of ML techniques in breeding pipelines would reinforce the study’s real-world impact.

This study is well-structured and scientifically rigorous, presenting valuable insights into ML applications for sorghum breeding. However, improvements are needed in justifying ML model selection, reporting additional statistical metrics, addressing overfitting concerns, and discussing broader applicability. Addressing these points would enhance the study’s scientific impact and practical utility.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
Reviewer #4: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
|
| Revision 1 |
|
PONE-D-24-50599R1
Seed quality drives grain yield in Ethiopian and Senegalese sorghum: Insights from machine learning
PLOS ONE

Dear Dr. Ahn,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 06 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,
Nguyen-Thanh Son, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)
Reviewer #2: All comments have been addressed
Reviewer #3: All comments have been addressed
Reviewer #4: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: I Don't Know
Reviewer #3: Yes
Reviewer #4: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes
Reviewer #4: Yes

**********

6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The manuscript presents an insightful study on how seed quality influences grain yield in Ethiopian and Senegalese sorghum using machine learning approaches. The study is well-structured and provides valuable contributions to agricultural research and food security. I am satisfied with the responses to all comments. The paper can now be considered for further processing.

Reviewer #3: No more comments. The manuscript has been strengthened. Double-check the references and affiliations during proofreading.

Reviewer #4: PONE-D-24-50599R1

TOPIC: Seed Quality Drives Grain Yield in Ethiopian and Senegalese Sorghum: Insights from Machine Learning

The manuscript presents a timely and thoughtful application of machine learning (ML) to predict sorghum grain yield using phenotypic traits. The two-step approach, combining hierarchical clustering and regression-based prediction, demonstrates a solid grasp of trait interactions and adds robustness to the analytical framework. The integration of clustering with multiple regression models is a notable strength.
However, clarification is needed for the model labeled “NTanH(3)NBooste(8),” particularly regarding its architecture, neuron layers, and boosting method, to aid readers unfamiliar with neural networks. Including a short description of the network structure (e.g., number of hidden layers, activation functions, and boosting strategy) would greatly improve clarity. Additionally, while ADASYN is appropriately used to handle class imbalance, a brief explanation of why it was preferred over alternatives like SMOTE would enhance methodological transparency. A sentence comparing ADASYN’s advantages-such as its ability to generate synthetic samples based on the density of minority class samples-over SMOTE would strengthen this section. Clustering results yielded relatively low silhouette scores (0.32 and 0.31), indicating weak cluster separation-an expected challenge in high-dimensional biological data. Acknowledging this and supplementing with dimensionality-reduction visualizations (e.g., PCA or t-SNE) would improve interpretability. Including a figure showing PCA or t-SNE plots of clusters could visually support the clustering outcome. Although the best model’s R² value of 0.36 is modest, it remains valuable given the absence of environmental or genotypic data. Briefly contextualizing this limitation would benefit the reader. A sentence discussing the challenges of phenotype-only yield prediction and potential model improvement with multi-modal data would be insightful. The Materials and Methods section is rigorous, combining phenotypic data from 179 accessions with appropriate ML tools. However, restructuring it under subheadings like “Data Preprocessing,” “Feature Engineering,” and “Modeling Approaches” would improve clarity. Such structure would guide readers through the modeling pipeline more effectively. A short rationale for selecting specific models (e.g., MLP, KNN, SVM) and justifying the choice of three clusters in K-means would strengthen the narrative. 
Mentioning whether clustering validation methods beyond silhouette score (e.g., Davies-Bouldin index or elbow method) were applied would add depth. More detail on JMP Pro 17’s model selection process would also support reproducibility. Specifying which models were screened and on what basis they were selected (e.g., cross-validation metrics) would be helpful. The Results section is well-organized, effectively integrating PCA, clustering, and supervised ML to explore phenotypic variation. The clear trait-wise differentiation between Ethiopian and Senegalese accessions enhances the study’s biological relevance. The shift from direct classification to a cluster-informed method shows thoughtful refinement. Inclusion of ADASYN improved classification metrics, and further discussion on the biological interpretation of trait clusters would be beneficial. Elaborating how trait clusters relate to known agronomic or adaptive differences among accessions would enhance the biological insight. The Discussion contextualizes findings within the broader goals of sorghum improvement in stress-prone areas. The comparison of model performance and insights into classification enhancements through ADASYN are well-articulated. Explaining key metrics, such as R² and RASE, would aid accessibility. Adding brief definitions or intuitive explanations of these metrics would help general readers. Noting sample size or environmental heterogeneity as potential limitations would round out the discussion. This would clarify the extent to which findings may generalize to broader populations or conditions. The Conclusion effectively highlights the study’s contributions and practical implications. Suggestions for incorporating genomic or environmental variables in future work reflect a forward-looking approach. Emphasizing that integrating multi-modal data could significantly enhance model accuracy and biological relevance would strengthen this section. 
Minor formatting and typographical corrections are recommended to enhance clarity and polish. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No Reviewer #4: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.
|
| Revision 2 |
|
PONE-D-24-50599R2 Seed quality drives grain yield in Ethiopian and Senegalese sorghum: Insights from machine learning PLOS ONE Dear Dr. Ahn, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Aug 02 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols . We look forward to receiving your revised manuscript. Kind regards, Somashekhar Mallikarjun Punnuri, PhD Academic Editor PLOS ONE [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #2: All comments have been addressed Reviewer #3: All comments have been addressed Reviewer #4: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. 
The conclusions must be drawn appropriately based on the data presented. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #2: I Don't Know Reviewer #3: Yes Reviewer #4: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #2: Yes Reviewer #3: Yes Reviewer #4: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #2: The author has addressed all the queries and explained them clearly. The paper is now more precise and written in a scientific manner. 
The arguments are better structured, and the use of terminology is appropriate. The revised version significantly improves the clarity and academic quality of the manuscript. Reviewer #3: No comments. I recommend Accept. Congratulations to the authors on a well-executed and meaningful contribution to the field. Reviewer #4: For Title and Author Details: • Capitalized each word in the title per academic style. • Corrected spacing in names ("Adama R Tukuli" → "Adama R. Tukuli"). • Added a missing comma before the last author. • Ensured consistent superscripts (¹, ², etc.) and symbols (†, *) for affiliations and contributions. Introduction: (with highlighted suggestions) Sorghum (Sorghum bicolor (L.) Moench) is a vital food and fodder source in Africa and Asia. Globally, it's more common as animal feed but is gaining attention as a biofuel crop [1]. With over 60 million tonnes produced annually and Africa contributing ~20 million tonnes, it ranks second only to maize on the continent [2]. Sorghum is vulnerable to fungal diseases that reduce yield and quality [3]. Key diseases include anthracnose, grain mold, and rust. Colletotrichum sublineola, causing anthracnose, can reduce yields by up to 70% in hot, humid conditions [4]. (Clarify sentence structure for better flow.) It spreads easily due to its resilience and dispersal by wind/water [4]. Fusarium spp., a common grain mold pathogen, produces mycotoxins like fumonisins, a food safety risk in regions highly dependent on sorghum [5]. Losses range from 30% to 100% depending on multiple factors [6,7]. (Replace “staggering” with neutral term.) Rust (Puccinia purpurea) appears as rust-like spots and causes yield loss up to 65%, depending on plant maturity and conditions [6,7]. (Clarify “unfavorably mature” wording.) Germplasm-based resistance is seen as the most effective control strategy [7]. Resistant genotypes have been identified through screening and breeding efforts [6]. 
Our previous study evaluated 179 accessions from Ethiopia, Gambia, and Senegal. Traits included yield, seed weight, flowering time, germination, panicle traits, and disease resistance [8,9]. Due to limitations in basic statistical analysis, we applied machine learning to explore complex relationships and identify the best algorithm for trait evaluation. The Materials & Methods Section: It provides a comprehensive description of the data and analytical approach. The data were collected from field trials in Isabela, Puerto Rico, involving 179 sorghum accessions from Ethiopia, Gambia, and Senegal, with various phenotypic traits measured. However, the mention of “including controls 201 accessions” is unclear and should be rephrased or separated for clarity to specify the total sample size and the role of controls. The description of missing data handling using Multivariate Normal Imputation is appropriate, but it would be beneficial to briefly justify the choice of this method and mention any criteria used for missing data filtering. Disease resistance scoring on a 1-5 scale is noted, but it would strengthen the methodology to indicate whether this scale is a standard or validated measure in sorghum phenotyping. The statistical analysis section mentions t-tests comparing accessions from Ethiopia and Senegal, but the rationale for excluding Gambia accessions from this comparison should be explained. The use of PCA and hierarchical clustering is suitable for exploring trait relationships, though the method for selecting the number of clusters (four clusters) should be clarified, for example, by referencing dendrogram inspection or cluster validity indices. The machine learning workflow is well-outlined and supported by Figure 1, although the figure caption is embedded in the text and should be moved to the figure legend area with formal formatting. 
In cluster classification, a pre-balanced dataset of 248 points was used with undersampling, but it is not clear how this number was derived, so an explanation of the balancing process is recommended. The use of stratified splits and GridSearchCV for hyperparameter tuning is good practice, yet including the rationale for using macro F1-score as the evaluation metric would enhance transparency. The study applies several machine learning models spanning linear, kernel-based, ensemble, and instance-based methods, which is commendable for model comparison. However, it would be helpful to state whether default hyperparameters were used initially and to consider summarizing model characteristics in a table for reader clarity. The Results section provides a thorough and well-organized presentation of the phenotypic diversity analyses using PCA, hierarchical clustering, and machine learning classification. The interpretation of the PCA results is clear, particularly in describing the variance explained by the first two principal components and the clustering of traits. However, it would be beneficial to briefly discuss the implications of the remaining unexplained variance and whether further principal components were considered or excluded, to give a fuller picture of the data structure. Additionally, while the PCA biplot explanation outlines trait clusters, clarifying the direction and nature of trait correlations with the principal components would help readers interpret the loading plots more effectively. The comparative analysis between Ethiopian and Senegalese accessions is well-supported by statistical evidence, but the presentation of Table 1 could be improved for better readability. Restructuring the table, possibly by separating phenotypic and genotypic traits or enhancing column headers, would make it easier for readers to digest the information. 
Furthermore, although the statistical differences are clear, adding brief commentary on the biological or agronomic significance of these differences would strengthen the relevance of the findings. The hierarchical clustering analysis is appropriately detailed, and the use of the DIANA algorithm is well explained. Regarding the machine learning classification, the section clearly justifies the need for a two-step approach due to initial low accuracy. However, mentioning which algorithms were initially tested and elaborating on how class imbalance and the feature-to-sample ratio affected performance would provide better context and insight into the challenges encountered. Finally, when referring to figures throughout the section, ensuring detailed and accessible figure legends close to the figures will facilitate cross-referencing and enhance reader understanding. The discussion effectively highlights the importance of sorghum for food security in dry tropics, particularly Ethiopia, and appropriately connects previous findings on yield declines due to biotic and abiotic stresses. The explanation of geographic patterns in phenotypic diversity using PCA and clustering is well presented and shows thoughtful interpretation. However, the section would benefit from more clearly separating speculative interpretations from established results to avoid potential overstatements. For example, while cluster interpretations are insightful, explicitly stating that these are hypotheses requiring further validation, ideally with genotypic data, would strengthen the discussion. The references to related machine learning studies demonstrate good awareness of the field, but the comparison between prior studies and the current more granular classification attempt could be expanded to clarify why phenotypic trait overlap limited classification accuracy here. The machine learning approach is described in detail, with a good rationale for model selection. 
Yet, the discussion could improve by including more critical reflection on limitations, such as the relatively small sample size per class and the low number of traits, which constrained classification performance. Additionally, the transition from unsuccessful fine-grained classification to clustering and then classification of clusters is logical but would be clearer if the rationale for choosing four clusters was explicitly justified. The mention of model performances is useful, though the discussion appears to be cut off abruptly; completing the results summary and relating model outcomes back to biological relevance or breeding applications would enhance the narrative. Finally, better signposting within the discussion through subheadings or paragraph breaks would improve readability, given the density of information. The conclusion effectively summarizes the study’s key findings and emphasizes the potential of machine learning to enhance sorghum breeding through improved germplasm characterization and yield prediction. The recognition of the Neural Boosted model’s superior performance and the identification of seed weight and germination rate as critical yield determinants are well highlighted. However, the conclusion could be strengthened by explicitly acknowledging limitations, such as the study’s focus on only Ethiopian and Senegalese accessions and the relatively small phenotypic trait set, which may affect the generalizability of the findings. Additionally, while the broader applicability to other cereal crops is mentioned, providing specific examples or cautionary notes about differences in crop biology would make this claim more balanced. The discussion of disease resistance traits’ limited influence is insightful, but further elaboration on possible reasons—such as environmental conditions or disease prevalence during trials—would clarify these findings. 
The application of ADASYN to address class imbalance is a notable methodological strength; however, the conclusion would benefit from a brief mention of potential challenges or biases introduced by synthetic data generation. The practical recommendations about prioritizing seed quality and establishing local seed testing are valuable and grounded in the results, yet highlighting potential implementation challenges in resource-limited settings could add nuance. Finally, the call for future integration of multi-modal data is well placed but could be enhanced by suggesting concrete next steps or specific types of data integration that would most benefit sorghum breeding. Overall, the conclusion provides a solid wrap-up but would improve with clearer acknowledgment of limitations and a more nuanced discussion of the broader implications and future directions. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No Reviewer #3: No Reviewer #4: No ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. 
Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step. |
| Revision 3 |
|
Seed quality drives grain yield in Ethiopian and Senegalese sorghum: Insights from machine learning PONE-D-24-50599R3 Dear Dr. Ahn, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Somashekhar Mallikarjun Punnuri, PhD Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: |
| Formally Accepted |
|
PONE-D-24-50599R3 PLOS ONE Dear Dr. Ahn, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following: * All references, tables, and figures are properly cited * All relevant supporting information is included in the manuscript submission * There are no issues that prevent the paper from being properly typeset You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Somashekhar Mallikarjun Punnuri Academic Editor PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.