Peer Review History
| Original Submission: April 28, 2022 |
|---|
|
PONE-D-22-12529
Predicting Firm Creation in Rural Texas: A Multi-Model Machine Learning Approach to a Complex Policy Problem
PLOS ONE
Dear Dr. Hand,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
I recommend that it should be revised taking into account the changes requested by the reviewers. Since the requested changes include valuable and constructive reviews, I would like to give you a chance to revise your manuscript. The revised manuscript will undergo the next round of review by two reviewers.
Please submit your revised manuscript by Dec 02 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Baogui Xin, Ph.D.
Academic Editor
PLOS ONE
Journal requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and
2. Thank you for stating the following in the Acknowledgments Section of your manuscript: “We thank UT Austin’s IC2 Institute, and Art Markman and Gregory Pogue in particular, for supporting this research; Melinda Taylor, for her early support and encouragement; Paul von Hippel, Daniel Armanios and Tim Fitzgerald for their early feedback; Megan Morris for her thoughtful support throughout; and Fritz Boettner for generosity with his data.” We note that you have provided funding information that is not currently declared in your Funding Statement.
However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows: “This study was funded by a grant to VR by the IC2 Institute at the University of Texas at Austin (https://ic2.utexas.edu/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.” Please include your amended statements within your cover letter; we will change the online submission form on your behalf. 3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. 4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. 
The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: I Don't Know ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: REVIEW “Predicting Firm Creation in Rural Texas: A Multi-Model Machine Learning Approach to a Complex Policy Problem” PONE-D-22-12529 Plos One This paper examines the factors driving the creation of firms in rural America. 
In doing so, the authors employ a Multi-Model Machine Learning Approach. The authors find that some factors that promote entrepreneurship may not be as predictive as socioeconomic ones. Moreover, the strength of specific industries predicts firm growth, as does the number of local banks. Finally, the authors provide some policy implications of their findings.
COMMENTS: The paper examines a very interesting and policy-relevant research question. In this sense, the paper's findings are interesting, but there are some points that the authors should address before considering publishing the paper at PLOS ONE. Let me summarize my comments and suggestions below:
1) Texas: External validity. The authors should explain better why Texas could be a good laboratory to conduct this analysis. Are there any features that make Texas different from other states (such as Vermont or Indiana)? This point is relevant for the external validity of the analysis. If Texas is largely different from other rural states, it could be the case that the specificities of Texas are the ones driving the results. I am not suggesting that Texas is not a good laboratory, just that the choice needs to be explained and justified.
2) Hyper-parameters: Given that the hyper-parameters (e.g. number of trees, number of features in each tree, etc.) used by the random forest are arbitrarily selected, it is important to discuss why no optimization took place and exactly which criteria were used for making any choices. These hyper-parameters should be reported, as is standard in the literature (Albanesi & Vamossy, 2019; Kou et al., 2021; Obrizan et al., 2019; Petropoulos et al., 2019). Also, it would be nice to perform a sensitivity analysis, reporting how the importance of the variables changes when other hyper-parameters are used. Are the results robust, or do they depend on the parameters and initializations considered? In the current form of the manuscript, the operations performed are not reproducible.
3) Other machine learning methods. The authors should justify the machine learning techniques used. Why did the authors choose the random forest algorithm rather than other classification techniques (e.g. Extreme Gradient Boosting)? This technique (Extreme Gradient Boosting) is also commonly used in these studies, and it also provides a ranking of the variables in terms of importance (Carbo-Valverde et al., 2020; Carmona et al., 2019; Fuster et al., 2018; Obrizan et al., 2019). This point should be discussed in the paper.
4) Ranking the variables: Random Forest. The random forest algorithm allows the authors to rank the variables based on 1) the mean decrease in accuracy (which reflects the mean loss in accuracy when each specific variable is excluded from the regression algorithm) and 2) the mean decrease in Gini (which reflects how each feature contributes to the homogeneity between the decision trees used in the resulting random forest). However, the authors rank the variables based only on the mean decrease in accuracy (“In the random forest model, we rank predictors according to their contribution to reducing the percent mean-squared error.”). I think it could be interesting to consider the mean decrease in Gini as well. Along these lines, the authors could follow the approach employed by Carbo-Valverde et al. (2020). This paper 1) ranks every variable using the mean decrease in accuracy and the mean decrease in Gini, respectively, 2) scores each variable, 3) computes the total score of each variable, and 4) reorders the variables by the total score. This approach would allow the authors to provide a rank combining both measures (Accuracy and Gini).
5) Data description. The paper does not provide any summary statistics of the main variables employed in the analysis. The paper does not report the number of companies created in rural counties of Texas over time. The authors should provide some summary statistics (mean, min, max, median, sd) of the main variables used.
6) Sample period. The authors examine the creation of firms in rural Texas from 2008 to 2018. However, this long period contains different phases. From 2008 to 2012, the whole country was facing the Global Financial Crisis (GFC), one of the greatest financial crises. I would expect a relatively low number of firms created during these years. However, from 2012 to 2018, the economy was recovering, so the number of firms created in those years will be relatively higher compared to those created during the GFC. I think that it would make sense to split the sample period into different phases (at least as a robustness check).
Minor comments:
• Footnote 2 (page 15) is missing.
References
- Albanesi, S., & Vamossy, D. F. (2019). Predicting Consumer Default: A Deep Learning Approach. NBER Working Paper Series, N. 26165.
- Carbo-Valverde, S., Cuadros-Solas, P., & Rodríguez-Fernández, F. (2020). A machine learning approach to the digitalization of bank customers: Evidence from random and causal forests. In PLoS ONE (Vol. 15, Issue 10 October). https://doi.org/10.1371/journal.pone.0240362
- Carmona, P., Climent, F., & Momparler, A. (2019). Predicting failure in the U.S. banking sector: An extreme gradient boosting approach. International Review of Economics and Finance, 61(March 2018), 304–323. https://doi.org/10.1016/j.iref.2018.03.008
- Fuster, A., Goldsmith-Pinkham, P., Ramadorai, T., & Walther, A. (2018). Predictably Unequal? The Effects of Machine Learning on Credit Markets. Review of Financial Studies, Forthcomin(November).
- Kou, G., Xu, Y., Peng, Y., Shen, F., Chen, Y., Chang, K., & Kou, S. (2021). Bankruptcy prediction for SMEs using transactional data and two-stage multiobjective feature selection. Decision Support Systems, 140, 113429. https://doi.org/10.1016/j.dss.2020.113429
- Obrizan, M., Torosyan, K., & Pignatti, N. (2019). Tobacco Spending in Georgia: Machine Learning Approach. In Y. Kondratenko, G. Kondratenko, & I.
Sidenko (Eds.), Recent Developments in Data Science and Intelligent Analysis of Information (Vol. 836, pp. 71–80). Springer International Publishing. https://doi.org/10.1007/978-3-319-97885-7
- Petropoulos, A., Siakoulis, V., Stavroulakis, E., & Klamargias, A. (2019). A robust machine learning approach for credit risk analysis of large loan level datasets using deep learning and extreme gradient boosting. BIS.
Reviewer #2: The article uses a variety of machine learning methods to predict business growth in rural counties in Texas, providing policymakers with a set of tools different from traditional econometric models for initial policy solutions. The research methodology and process are interesting and challenging, but some areas are questionable.
(1) Machine learning approaches are generally not built on interpretability, and there is insufficient evidence in the paper to demonstrate whether the accuracy of predictions is due to good luck or to the validity of the method. If the authors can further strengthen the external validity of the study findings, for example by using other regions’ data for prediction and comparing the accuracy with that for the Texas region, the paper will be more persuasive.
(2) Why did the authors apply linear regression, forward and backward subset selection of linear regression, lasso regression, and random forest, out of hundreds of possible methods? As far as I know, they are not considered novel algorithms or SOTA models nowadays. They may be more suitable for this particular problem, but the reasonableness of selecting these models needs to be further justified by drawing on context or literature in the paper. I’m also confused about "the weighted ranking of all models is combined, their mean values are taken, and the combined measures are ranked again". What is the basis for taking the mean weighted ranking? It is necessary to explain more clearly the selection and combination of sub-models in the multi-model interpretation framework.
(3) The explanation for the relatively low predictive power of patents and/or business density (patenting efforts are likely to be concentrated in modern technology-driven industries, which are overwhelmingly located in large cities, while in rural areas, having large, dominant patent-producing firms can discourage the establishment of other potential firms) is not rigorous enough, because most empirical studies have found a positive effect of knowledge diffusion. Even if the authors' judgment is correct, it would be more reasonable to break down patents into invention, design, and plant patents instead of using the total number of patents to prove it.
(4) The factors that affect the creation of enterprises are, in reality, complex, and machine learning on passively observed data should ensure the accuracy of the data. The article uses many variables, and it is unclear whether measuring variables from different sources will introduce errors and thus affect the reliability of the final prediction results. The multi-model explanatory model may rely to some extent on parameter adjustment and function selection, and the final results may be more sensitive to missing values, so it is questionable how the stability of the model prediction can be ensured.
**********
6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site.
Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 1 |
|
PONE-D-22-12529R1
Predicting Firm Creation in Rural Texas: A Multi-Model Machine Learning Approach to a Complex Policy Problem
PLOS ONE
Dear Dr. Hand,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
We recommend that it should be revised taking into account the changes requested by the reviewers. Since the requested changes constitute a Minor Revision, the revised manuscript will undergo the next round of review by the same reviewers or only by the Academic Editor.
Please submit your revised manuscript by Jun 18 2023 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Baogui Xin, Ph.D. Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: (No Response) ********** 2. 
Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: I Don't Know ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. 
(Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: REVIEW “Predicting Firm Creation in Rural Texas: A Multi-Model Machine Learning Approach to a Complex Policy Problem” PONE-D-22-12529 PLOS ONE
This paper examines the factors driving the creation of firms in rural America. In doing so, the authors employ a Multi-Model Machine Learning Approach. The authors find that some factors that promote entrepreneurship may not be as predictive as socioeconomic ones. Moreover, the strength of specific industries predicts firm growth, as does the number of local banks. Finally, the authors provide some policy implications of their findings.
COMMENTS: The authors have addressed all the comments that I raised before. They have made a great effort to include new machine learning methods. I believe the paper could be published in its current form.
Reviewer #2: The main problem is the introduction, which can of course be written in a variety of ways, but some important things still need to be clarified:
1) Research objectives. The paper asks a general question about "what can predict firm creation in rural America". But the first paragraph clearly does not address the essence of this question (What is the legitimacy of your research objective? From which angle will you approach this issue? What will you find, and why is it important in theory or practice?). Promoting rural entrepreneurship is certainly important, but does it have anything to do with the question you raised in the first sentence? If so, what is the logic? If your logic is that the US government can intervene to promote rural entrepreneurship based on what machine learning predicts, then please give us enough evidence that this logic can be implemented at the state level in the United States. Don't leave anything out that may confuse reviewers and the audience or force them to guess.
2) Research gap.
I do not recommend presenting the existing research in a table here. It is not enough to just throw out some evidence; stating the logic and viewpoint is your job, not homework for the audience. In the introduction, our concern has not yet reached the level of specific determinants and variables; we are more concerned with your commentary on the progress of research at the general level. This will show your familiarity with the field, your understanding of the problem, and the legitimacy of your subsequent research gap summary. Furthermore, your summary of the research gap is not logical enough to connect the research objectives and existing research.
3) Marginal contribution. I did not find it in the introduction. Explain what you have done to fill the research gap and what the incremental academic value of doing so is. Generally speaking, the narrative in the introduction is insufficient and inappropriate, which hinders the audience's judgment of the importance and innovation of this research. Don't leave reviewers and the audience to fill in the blanks (why, what, how, and how well you have done it) for you; they don't like it.
**********
7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
**********
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. |
| Revision 2 |
|
Predicting Firm Creation in Rural Texas: A Multi-Model Machine Learning Approach to a Complex Policy Problem PONE-D-22-12529R2 Dear Dr. Hand, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Baogui Xin, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: |
| Formally Accepted |
|
PONE-D-22-12529R2 Predicting Firm Creation in Rural Texas: A Multi-Model Machine Learning Approach to a Complex Policy Problem Dear Dr. Hand: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Professor Baogui Xin Academic Editor PLOS ONE |