Peer Review History
| Original Submission July 5, 2022 |
|---|
|
Transfer Alert
This paper was transferred from another journal. As a result, its full editorial history (including decision letters, peer reviews and author responses) may not be present.
PONE-D-22-19015
Hi-LASSO: High-performance Python and Apache spark packages for feature selection with high-dimensional data
PLOS ONE
Dear Dr. Kang,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please submit your revised manuscript by Sep 11 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Sathishkumar V E
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and
2. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.
3. Thank you for stating the following in the Acknowledgments Section of your manuscript: "This research was supported by the National Research Foundation of Korea (NRF-2021R1I1A3048029)." We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.
Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: "Y.S. Kim is supported for the work by the National Research Foundation of Korea (NRF-2021R1I1A3048029)." Please include your amended statements within your cover letter; we will change the online submission form on your behalf.
4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.
5. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.
[Note: HTML markup is below. Please do not edit.]
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes
**********
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes
**********
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: No
**********
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: No
Reviewer #2: Yes
Reviewer #3: Yes
**********
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The Research Paper stands Rejected and is NOT RECOMMENDED for Publication because of the following strong reasons:
1. The overall presentation and conceptual methodology of the paper is very weak and lots of advanced papers are already published.
2. No Strong analysis and experimental results are observed in the paper.
3. No Novelty is there.
4. It is the work of simple theoretical description but even the actual research orientation is missing in the paper.
Reviewer #2: Few experiments can be repeated or justified for F1 scores. The literature study can strengthen with more recent papers. The authors can state how the current standards are maintained, materials and methods are not cited with previous works. The authors can consider the below works for better literature:
- Y. Lu, L. Yang, S. X. Yang, Q. Hua, A. K. Sangaiah, T. Guo, and K. Yu, “An Intelligent Deterministic Scheduling Method for Ultra-Low Latency Communication in Edge Enabled Industrial Internet of Things,” IEEE Transactions on Industrial Informatics, 2022, doi: 10.1109/TII.2022.3186891.
- J. Wei, Q. Zhu, Q. Li, L. Nie, Z. Shen, K.-K. R. Choo, K. Yu, “A Redactable Blockchain Framework for Secure Federated Learning in Industrial Internet-of-Things,” IEEE Internet of Things Journal, doi: 10.1109/JIOT.2022.3162499.
- Subbiah, S.S. and Chinnappan, J., 2021. Opportunities and Challenges of Feature Selection Methods for High Dimensional Data: A Review. Ingénierie des Systèmes d'Information, 26(1).
- Bolón-Canedo, V. and Alonso-Betanzos, A., 2019. Ensembles for feature selection: A review and future trends. Information Fusion, 52, pp.1-12.
- Y. He, L. Nie, T. Guo, K. Kaur, M. M. Hassan, and K. Yu, “A NOMA-Enabled Framework for Relay Deployment and Network Optimization in Double-Layer Airborne Access VANETs,” IEEE Transactions on Intelligent Transportation Systems, doi: 10.1109/TITS.2021.3139888.
Reviewer #3: This paper presents a new implementation of a previously published algorithm called Hi-LASSO, with parallel computations that make the algorithm more practical for use with large real-world data sets. It shows experiments on both synthetic and real data that demonstrate the algorithm's utility for feature selection in high-dimensional data sets. Comparisons to a Spark implementation are shown, with performance results indicating the scalability of the method. Finally, the work describes the model's hyperparameters and robustness.
I found the paper to be relatively well written overall, with a reasonable order of its sections. This paper will be a good candidate for PLOS ONE with some or all of the revisions suggested below. I have broken my comments into a few sections focused on the paper, figures, grammar, reproducibility, references, and the code described in the paper.
Paper comments:
- The previous paper on this algorithm used Relative Model Error, Root Mean Square Error, and F1 scores. Why are only F1 scores reported in this work? An explanation of the choice of metric would help strengthen the data.
- The hyperparameters q1, q2, L, alpha should be described in further detail. These are described a little bit in the "tuning" section. However, it would be helpful to know not only the trends in how performance is affected, but also how to choose an initial value for each. It appears there is an "auto" setting in the Python package but that automatic behavior is not described in the paper from what I could tell.
- Were hyperparameters optimized for all LASSO algorithms? How did the authors ensure that all algorithms were fairly assessed? It is surprising to see so many algorithms with F1 scores of zero in the BRCA dataset. Similarly, it is surprising to see the results in Table S5. Is there another dataset that shows a nonzero score for some of the compared algorithms?
- I find it a little hard to believe that Hi-LASSO is this much better than similar algorithms without more information about how each algorithm was run, to ensure fairness in the assessment. Are there cases where Hi-LASSO performs poorly? If so, it would be helpful to include such a case for a baseline. How does Hi-LASSO perform in lower-dimensional cases with more data where other LASSO algorithms have been used in the past? Comparisons like this would help reduce the sense that the datasets are cherry-picked for Hi-LASSO's benefit, and would help to illuminate the contrast between prior art and this algorithm's improvements for specific types of problems.
- Some of the results are a bit surprising, with several comparison methods yielding few or no positive results. This may indicate the selection of overly specific benchmark data sets, or a lack of competitive algorithms for comparison. A bit more explanation of the results in these areas would benefit the reader as well as make the work more defensible. The authors' claim of "extraordinary performance" appears to be somewhat supported by the data that is presented, but it is a little unclear whether this is due to a selective choice of benchmarks. Understanding where the algorithm fails (or performs in an "average" way) is important for readers who wish to make practical use of the package.
- The introduction or conclusions should spend more time contextualizing this algorithm. What fields should consider adopting Hi-LASSO? Genomics may be one such candidate, but other potential applications should be described.
- It would be good to summarize the contributions of each author to the work, perhaps using a standardized framework like CRediT (Contributor Roles Taxonomy).
Figure comments:
- Figure 1 is hard to read and should be higher resolution - ideally a vector graphic format like PDF or EPS. Same for supplementary figures S1, S2.
- Figure 1(B) could be replaced by a scatter plot showing weak scaling performance for the process parallel and Spark implementations from 1 core to the number of cores in the benchmarking machine. This could be for one dataset, or a geometric average of a few datasets. Weak scaling plots are far more useful to understand computational efficiency than a raw speedup chart with no clear baseline. It's not clear if the speedup is linear with the number of cores, which a weak scaling plot would help indicate.
Grammatical / typographical comments:
- Line 24: "impeded to apply Hi-LASSO for practical applications" should say "impeded practical applications of Hi-LASSO"
- Line 111: should say "desired average number of times"
- Line 156: "the Apache version" should say "the Apache Spark version." Apache Spark (or Spark for short) is the proper name of the library -- not just "Apache."
- Line 198: missing a subscript on q1
References / reproducibility comments:
- The TCGA data sets should be cited. https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga/using-tcga/citing-tcga
- In accordance with the PLOS ONE "Exceptions to sharing materials" (https://journals.plos.org/plosone/s/materials-software-and-code-sharing), the "authors should include a statement in their Materials and Methods discussing any restrictions on availability or use." It appears the TCGA data is subject to controlled access. This should be made clear to the reader, with information about how to access these controlled datasets (if possible) in order to make the results reproducible.
- The code used to generate synthetic Datasets I - IV does not appear to be included in the linked GitHub repository (I looked in the benchmark models and sample data directories). That should be included to meet PLOS ONE data sharing policies, along with a script to execute the code in the benchmark models directory for all benchmarks on the synthetic data.
- Check the capitalization of journal names and article titles in the references section. Some have unexpected lowercase letters.
- Please cite all relevant scientific software packages used in the hi_lasso software, such as NumPy and SciPy. See https://numpy.org/citing-numpy/ and https://scipy.org/citing-scipy/ for examples.
Code comments:
- Line 116 of the paper: Rather than describing both "parallel" and "n_jobs", just let "n_jobs" default to 1 (the serial case). Then only one parameter is needed, and "parallel" can be removed. A special value of "n_jobs is None" or "n_jobs == 0" could use the number of CPU cores returned by "multiprocessing.cpu_count()" for automatic parallelization across all available cores.
- The choice of the MIT license is good for future works to build on this one!
- Could the Spark and non-Spark libraries be combined, or make the Spark library use the base Python library as a dependency? The two code paths look fairly unrelated right now.
- The "simulation_data" folder on GitHub could include a README that indicates where the data came from or how it was generated.
It is my hope that the authors will consider adapting this algorithm for inclusion in a popular toolkit such as scikit-learn after publication. It seems like a helpful algorithm.
**********
6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
Reviewer #3: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
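Reviewer #3's code comment on merging "parallel" into a single "n_jobs" parameter can be illustrated with a short sketch. This is only an illustration of the reviewer's suggestion, not code from the hi_lasso package: the helper name `resolve_n_jobs` is hypothetical, and rejecting negative values is an assumption made here for simplicity.

```python
import multiprocessing


def resolve_n_jobs(n_jobs=1):
    """Hypothetical helper: turn an n_jobs argument into a worker count.

    n_jobs=1 (the default) is the serial case, so no separate
    "parallel" flag is needed. None or 0 means "use every CPU core".
    """
    if n_jobs is None or n_jobs == 0:
        # Automatic parallelization across all available cores.
        return multiprocessing.cpu_count()
    if n_jobs < 0:
        # Assumption for this sketch: negative values are not supported.
        raise ValueError("n_jobs must be None, 0, or a positive integer")
    return n_jobs


if __name__ == "__main__":
    print(resolve_n_jobs())      # serial default
    print(resolve_n_jobs(4))     # explicit worker count
    print(resolve_n_jobs(None))  # all cores on this machine
```

With this shape, callers who never pass the parameter keep serial behavior, and opting in to parallel execution takes one argument.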
| Revision 1 |
|
Hi-LASSO: High-performance Python and Apache spark packages for feature selection with high-dimensional data
PONE-D-22-19015R1
Dear Dr. Kang,
We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.
Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.
An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.
If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Sathishkumar V E
Academic Editor
PLOS ONE
Additional Editor Comments (optional):
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: All comments have been addressed
**********
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
**********
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
**********
4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes
**********
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
**********
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: The Revised paper has incorporated all the revisions as mentioned in the last review, and now the paper looks Ok in all aspects. So, the paper stands Accepted with no further revisions.
**********
7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
********** |
| Formally Accepted |
|
PONE-D-22-19015R1
Hi-LASSO: High-performance python and apache spark packages for feature selection with high-dimensional data
Dear Dr. Kang:
I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.
If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.
If we can help with anything else, please email us at plosone@plos.org.
Thank you for submitting your work to PLOS ONE and supporting open access.
Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Sathishkumar V E
Academic Editor
PLOS ONE |
Open letter on the publication of peer review reports
PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.
We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.
Learn more at ASAPbio.