Modeling of injury severity of distracted driving accident using statistical and machine learning models

Neero Gumsar Sorum; Martina Gumsar Sorum

doi:10.1371/journal.pone.0326113

Peer Review History

Original Submission
October 13, 2024
13 Oct 2024 Author Response https://doi.org/10.1371/journal.pone.0326113.r001
Mar 21 2025 Decision Letter - Quan Yuan, Editor
PONE-D-24-45051
Modeling of Injury Severity of Distracted Driving Accident Using Statistical and Machine Learning Models
PLOS ONE
Dear Dr. Sorum,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please try to revise your manuscript and respond to all the reviewers' comments.
Please submit your revised manuscript by Mar 21 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results.
Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions, see https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Quan Yuan, Ph.D.
Academic Editor
PLOS ONE
Journal Requirements:
When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
2. a. For studies reporting research involving human participants, PLOS ONE requires authors to confirm that this specific study was reviewed and approved by an institutional review board (ethics committee) before the study began. Please provide the specific name of the ethics committee/IRB that approved your study, or explain why you did not seek approval in this case. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”). For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.
b. Please provide additional details regarding participant consent.
In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information. If you are reporting a retrospective study of medical records or archived samples, please ensure that you have discussed whether all data were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data from their medical records used in research, please include this information.
3. We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.
Before we proceed with your manuscript, please address the following prompts:
a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.
b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible.
We will update your Data Availability statement on your behalf to reflect the information you provide.
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes
2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.
Reviewer #1: No
Reviewer #2: Yes
4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: This paper examines the individual and interacted effects of the influential factors on the injury severity of the DDAs using the Binary Logistic Regression (BLR) method, and selects the best machine learning (ML) model for predicting the injury severity of the DDA. The study is based on 10-year accident data from India. Overall, the paper is well-organized and easy to follow. However, I have the following concerns.
Major:
1. The dataset consists of approximately 78.6% non-fatal accidents and 21.4% fatal accidents, suggesting that a classification model could easily achieve an accuracy of at least 0.78 by simply predicting non-fatal accidents. However, the model in the paper yields an accuracy ranging from 0.4 to 0.77. Could the authors provide a more detailed explanation of these results, particularly regarding why the model does not perform as expected given the huge class imbalance?
2. The age groups are categorized as follows: below 18, 18-24, 25-40, and above 40. The second threshold is set at 24 (years old), which seems somewhat arbitrary. Could the authors clarify why this specific threshold was chosen?
Furthermore, the third age group (25-40) contains the largest number of samples, which may lead to the dominance of this group in the analysis. This could skew the results. Could the authors consider either adjusting the groupings or discussing this potential bias?
Minor:
1. The study concludes that the "gender variable was not statistically significant to the injury severity of the DDA," which contrasts with findings from other studies. Could the authors provide an explanation for this discrepancy? It would be helpful to understand why gender does not appear to be a significant factor in this case.
2. On Page 19, Line 525, the authors apply three different hyperparameter conditions: 10-fold cross-validation with training sets of 70%, 80%, and 90%. Could the authors clarify why 10-fold cross-validation was chosen here? Based on the earlier results, 15-fold cross-validation seemed to yield the highest accuracy.
Reviewer #2:
1. How did the authors confirm that the accident cases used were caused by distracted driving? If this problem cannot be solved, the core of this research will be seriously challenged.
2. Although 20 categorical variables have been identified, I think there is no deep connection between the involved variables and distracted driving. Is it possible to add factors such as using a mobile phone, operating the car's display screen, feeling sleepy, chatting, etc.?
3. The machine learning model only uses the default settings of the Dataiku platform and does not fully explore the optimal parameter combinations.
4. The performance of the model in dealing with the imbalance between the number of fatal and non-fatal accidents has not been deeply explored.
5. The time trend change of DDA data during 2011-2020 has not been analyzed. Time series analysis can be added to study the pattern of variation in accident severity over time and the dynamic influence of influencing factors.
6. Can the influence of environmental factors be considered?
6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No
[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]
While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool at https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
https://doi.org/10.1371/journal.pone.0326113.r002
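Reviewer #1's first major point, that always predicting “non-fatal” would already reach about 0.78 accuracy, can be checked with a short sketch. It is illustrative only: it assumes a hypothetical set of 1,000 records with the 78.6% / 21.4% split quoted by the reviewer (not the study's actual data), and `binary_metrics` is a hypothetical helper written for this example, not part of Dataiku or any library.

```python
# Hypothetical illustration: 1,000 records split 78.6% non-fatal / 21.4% fatal,
# mirroring the proportions quoted by Reviewer #1 (not the study's real data).


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision, recall and F1 for the positive (fatal) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [0] * 786 + [1] * 214  # 0 = non-fatal, 1 = fatal
y_majority = [0] * len(y_true)  # ignores its inputs, always predicts "non-fatal"

print(binary_metrics(y_true, y_majority))
# {'accuracy': 0.786, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

The trivial predictor reaches the 0.786 baseline while its F1-score for the fatal class is zero, which is why accuracy alone says little on data this skewed. A model that does flag some fatal cases can fall below this accuracy baseline while still having non-zero recall for the fatal class; the sketch does not show which situation applies to the models in the paper.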
Revision 1
11 Feb 2025 Author Response
Reply to Academic Editor and Reviewers’ Comments
Reply to the reviewers’ comments on the manuscript entitled “Modeling of Injury Severity of Distracted Driving Accident Using Statistical and Machine Learning Models” by Neero Gumsar Sorum and Martina Gumsar Sorum.
We thank the Academic Editor of PLOS ONE for allowing us to improve the current version of the manuscript and resubmit it after revision. The authors would also like to sincerely thank all the anonymous reviewers for their encouraging and constructive comments to improve the quality of the manuscript.
Comments from the Academic Editor and Reviewers:
Academic Editor's Comments to Authors
1) Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
Reply: The revised manuscript has been prepared according to PLOS ONE's style requirements, including those for file naming.
2) a. For studies reporting research involving human participants, PLOS ONE requires authors to confirm that this specific study was reviewed and approved by an institutional review board (ethics committee) before the study began. Please provide the specific name of the ethics committee/IRB that approved your study, or explain why you did not seek approval in this case. Once you have amended this/these statement(s) in the Methods section of the manuscript, please add the same text to the “Ethics Statement” field of the submission form (via “Edit Submission”). For additional information about PLOS ONE ethical requirements for human subjects research, please refer to http://journals.plos.org/plosone/s/submission-guidelines#loc-human-subjects-research.
b. Please provide additional details regarding participant consent.
In the ethics statement in the Methods and online submission information, please ensure that you have specified (1) whether consent was informed and (2) what type you obtained (for instance, written or verbal, and if verbal, how it was documented and witnessed). If your study included minors, state whether you obtained consent from parents or guardians. If the need for consent was waived by the ethics committee, please include this information. If you are reporting a retrospective study of medical records or archived samples, please ensure that you have discussed whether all data were fully anonymized before you accessed them and/or whether the IRB or ethics committee waived the requirement for informed consent. If patients provided informed written consent to have data from their medical records used in research, please include this information.
Reply: Not Applicable.
3) We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.
Before we proceed with your manuscript, please address the following prompts:
a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.
b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible. We will update your Data Availability statement on your behalf to reflect the information you provide.
Reply: We did not properly understand this requirement at the time of submission. The dataset used in the present study has now been uploaded as a Supporting Information file.
Reviewers' Comments to Authors:
Reviewer 1
This paper examines the individual and interacted effects of the influential factors on the injury severity of the DDAs using the Binary Logistic Regression (BLR) method, and selects the best machine learning (ML) model for predicting the injury severity of the DDA. The study is based on 10-year accident data from India. Overall, the paper is well-organized and easy to follow. However, I have the following concerns.
Reply: Thank you for the encouraging comments. The concerns are addressed point by point below.
Major
1) The dataset consists of approximately 78.6% non-fatal accidents and 21.4% fatal accidents, suggesting that a classification model could easily achieve an accuracy of at least 0.78 by simply predicting non-fatal accidents. However, the model in the paper yields an accuracy ranging from 0.4 to 0.77. Could the authors provide a more detailed explanation of these results, particularly regarding why the model does not perform as expected given the huge class imbalance?
Reply: In the context of imbalanced datasets, accuracy is often not the best measure of model performance, as it can be misleading.
A model might predict the majority class most of the time and still achieve high accuracy while performing poorly on the minority class (fatal accidents). Therefore, the F1-score and AUC metrics were also used to evaluate model performance in the present study. These metrics give a more detailed picture of how well the model distinguishes between fatal and non-fatal accidents, especially with imbalanced data. The actual performance of the ML models in the present study (accuracy = 0.4 to 0.77) might be due to overfitting, underfitting, or a lack of variables that effectively capture the key differences between the non-fatal and fatal classes.
2) The age groups are categorized as follows: below 18, 18-24, 25-40, and above 40. The second threshold is set at 24 (years old), which seems somewhat arbitrary. Could the authors clarify why this specific threshold was chosen? Furthermore, the third age group (25-40) contains the largest number of samples, which may lead to the dominance of this group in the analysis. This could skew the results. Could the authors consider either adjusting the groupings or discussing this potential bias?
Reply: The original plan was to categorize the age variable into a teen group (below 18), a young group (18-30), a middle-aged group (31-45), and an old group (above 45). However, after analyzing the data, it was difficult to form the middle-aged group because the young and old age groups had much higher counts in the original data received (the dataset used in the present study covers only distracted driving, which is one cause of accidents). Therefore, the age groups were rearranged into below 18, 18-24, 25-40, and above 40 so that each group would be more or less evenly represented.
Minor:
1) The study concludes that the "gender variable was not statistically significant to the injury severity of the DDA," which contrasts with findings from other studies.
Could the authors provide an explanation for this discrepancy? It would be helpful to understand why gender does not appear to be a significant factor in this case.
Reply: The results of the present study indicated that the gender variable was not statistically significant to the injury severity of the DDA (p-value > 0.05). This means that no measurable effect of gender on the injury severity of DDAs was detected in the present data. One possible explanation is that other variables (e.g., age, vehicle type, nature of accident, and time of accident) played a much larger role in injury severity, and their effects might overshadow any potential impact of gender. When multiple factors are accounted for in the model, gender may not show a significant relationship. Another possible reason is an imbalance in the dataset (e.g., records predominantly of one gender), which may make it difficult to detect significant differences.
2) On Page 19, Line 525, the authors apply three different hyperparameter conditions: 10-fold cross-validation with training sets of 70%, 80%, and 90%. Could the authors clarify why 10-fold cross-validation was chosen here? Based on the earlier results, 15-fold cross-validation seemed to yield the highest accuracy.
Reply: The main objective of hyperparameter tuning (changing the k-fold cross-validation and train ratio values) was to study how model performance varies and, accordingly, how the best ML model changes. So, in the first set, the ML algorithms were implemented with different k-fold cross-validation values (5-, 10-, and 15-FCV) at a train ratio (TR) of 0.7, and in the second set, the algorithms were trained at three different TR values (0.7, 0.8, and 0.9) with 10-FCV. Further, the two sets used in the present study were completely independent. In other words, the first set could have been carried out after the second set (in which case the first set would become the second set), and vice versa.
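The authors' first reply to Reviewer 1 relies on the F1-score and AUC rather than accuracy. As a reading aid for AUC values, the sketch below computes AUC from its rank (Mann-Whitney) interpretation: the probability that a randomly chosen fatal case receives a higher predicted score than a randomly chosen non-fatal case, with ties counted as one half. Scores that carry no information give 0.5, the “random” reference point Reviewer #1 invokes in the second round. The function name `roc_auc` and the toy scores are hypothetical; this is a plain-Python illustration, not the implementation used by Dataiku or in the study.

```python
def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney pairwise formulation.

    y_true: 1 for fatal, 0 for non-fatal. scores: any score where larger means
    "more likely fatal" (e.g., predicted P(fatal)). The double loop is quadratic
    in the number of cases, so this is for illustration only.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fatal and one non-fatal case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # fatal case ranked above the non-fatal case
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


# Toy example: two non-fatal and two fatal cases.
labels = [0, 0, 1, 1]
print(roc_auc(labels, [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly -> 0.75
print(roc_auc(labels, [0.5, 0.5, 0.5, 0.5]))   # uninformative scores -> 0.5
```

An AUC close to 0.5 means the model ranks fatal and non-fatal cases almost at chance, whatever its accuracy. In practice a library routine such as scikit-learn's `roc_auc_score` would replace this quadratic loop.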
Reviewer 2
1) How did the authors confirm that the accident cases used were caused by distracted driving? If this problem cannot be solved, the core of this research will be seriously challenged.
Reply: In the police-reported accident data used in the present study, accidents that occurred due to mobile use, talking with passengers, or looking away from the roadway were commonly designated as distracted driving accidents (DDAs). According to the police officials, the officer in charge asked the passengers (if involved in the accident) or the drivers (who survived the accident) about the cause of the accident, either on the spot or during medical treatment. If the cause was mobile use while driving, chatting/talking, or looking away from the roadway, that accident was designated as a DDA. For more clarity, the following sentences have been incorporated in the revised manuscript and highlighted:
“In the present study, the DDAs were defined as those accidents which occurred due to driver engagement in secondary tasks (mobile use, chatting and talking, or looking away from the roadway) other than driving.” (line numbers: 303-305, page number: 14)
“The dataset contained two DDA categories: Non-fatal (not resulting in death within one month of the accident) and fatal (resulting in death within one month of the accident).” (line numbers: 308-310, page number: 14)
2) Although 20 categorical variables have been identified, I think there is no deep connection between the involved variables and distracted driving. Is it possible to add factors such as using a mobile phone, operating the car's display screen, feeling sleepy, chatting, etc.?
Reply: The categorical variables identified in the present study were not the causes of distracted driving but were the factors that might contribute to the injury severity of the DDAs.
Therefore, the main aims of the present study were (i) to examine the individual and interacted effects of these identified factors on the injury severity of the DDA using a logistic regression method, and (ii) to select the best ML model for predicting the injury severity of the DDA among the six trained ML models.
3) The machine learning model only uses the default settings of the Dataiku platform and does not fully explore the optimal parameter combinations.
Reply: By the statement “the default settings of the Dataiku platform,” the authors meant that the ML algorithms available in the Dataiku platform were used and that the authors did not develop any new algorithms within the platform. Regarding optimal parameter combinations, two sets of hyperparameter tuning were employed in the present study (mentioned in line numbers:…. And page number: …….): In Set 1, all six ML algorithms were implemented using 5-, 10-, and 15-fold cross-validation at a train ratio of 0.70. In Set 2, the ML algorithms were trained at train ratios of 0.70, 0.80, and 0.90 with 10-fold cross-validation.
4) The performance of the model in dealing with the imbalance between the number of fatal and non-fatal accidents has not been deeply explored.
Reply: To deal with the imbalance between the number of fatal and non-fatal accidents, the present study employed a stratified k-fold cross-validation technique (instead of under-sampling and oversampling techniques), because this technique ensures that each fold contains approximately the same proportion of each class as the entire dataset. This is particularly important in imbalanced datasets where one class (e.g., non-fatal accidents) may be significantly more prevalent than the other (e.g., fatal accidents). By maintaining the class distribution in each fold, stratified k-fold helps the model learn better representations for both classes and avoids bias toward the majority class.
This is described in Section 3.6 of the revised manuscript (page 17).
5) The time trend change of DDA data during 2011-2020 has not been analyzed. Time series analysis can be added to study the pattern of variation in accident severity over time and the dynamic influence of influencing factors.
Reply: The temporal stability/instability of the factors contributing to the injury severity of DDAs was not examined in the present study; it will be taken up in future research by the authors.
6) Can the influence of environmental factors be considered?
Reply: The police-reported accident data used in the present study contained no information about environmental factors; otherwise, these factors would have been included.
Attachment: Response to Reviewers.docx
https://doi.org/10.1371/journal.pone.0326113.r003
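The authors' reply to Reviewer 2's point 4 relies on stratified k-fold cross-validation. A minimal sketch of the idea, assuming a hypothetical 1,000-record dataset with the 78.6% / 21.4% split quoted by Reviewer #1: the records of each class are shuffled and dealt round-robin into k folds, so every fold stays close to the overall class ratio. The function `stratified_k_fold` is hypothetical and is not how Dataiku or scikit-learn's `StratifiedKFold` is implemented internally.

```python
import random
from collections import Counter


def stratified_k_fold(labels, k, seed=0):
    """Assign record indices to k folds so that each fold roughly preserves the
    overall class proportions. Each class is shuffled, then dealt round-robin,
    continuing from the fold where the previous class stopped so fold sizes stay even."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


labels = [0] * 786 + [1] * 214  # 0 = non-fatal, 1 = fatal (hypothetical counts)
folds = stratified_k_fold(labels, k=10)
for i, fold in enumerate(folds):
    counts = Counter(labels[idx] for idx in fold)
    print(f"fold {i}: size={len(fold)}, fatal share={counts[1] / len(fold):.2f}")
```

With these counts every fold holds 100 records, 21 or 22 of them fatal. Note what the sketch does not do: each training split assembled from nine such folds still contains roughly 79% non-fatal records, so the learner sees the same imbalance as in the full data. Stratification keeps the evaluation folds representative; it does not rebalance training, which is the distinction Reviewer #1 draws in the second round.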
Mar 21 2025 Decision Letter - Quan Yuan, Editor
PONE-D-24-45051R1
Modeling of Injury Severity of Distracted Driving Accident Using Statistical and Machine Learning Models
PLOS ONE
Dear Dr. Sorum,
Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
Please address the reviewer's new comments and revise the manuscript again.
Please submit your revised manuscript by May 21 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.
Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results.
Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions, see https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.
We look forward to receiving your revised manuscript.
Kind regards,
Quan Yuan, Ph.D.
Academic Editor
PLOS ONE
Reviewers' comments:
Reviewer's Responses to Questions
Comments to the Author
1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.
Reviewer #1: (No Response)
Reviewer #2: All comments have been addressed
2. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Yes
Reviewer #2: Yes
3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes
4. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g., participant privacy or use of data from a third party), those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes
5. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes
6. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)
Reviewer #1: While the authors claim to have handled the imbalance using a stratified k-fold cross-validation technique, this reflects a misunderstanding of the method. Stratified k-fold ensures proportional representation of classes in each fold but does not actively address imbalance during model training. Additionally, the machine learning models exhibit poor performance, even when considering F1-score and AUC. The results suggest that the models are performing only slightly better than random, indicating that they fail to effectively distinguish between non-fatal and fatal cases.
Addressing the class imbalance properly and improving model robustness are necessary to enhance predictive performance. Without these improvements, the value of the ML approach remains questionable. Reviewer #2: The authors have addressed all the comments point-by-point in their revised manuscript. The revised article now provides a detailed explanation of the DDA (distracted driving accident) determination criteria and data sources, which enhances the credibility of the study. The methodology is rigorous, with reasonable solutions proposed for addressing class imbalance and hyperparameter tuning. The authors have also acknowledged the limitations of the data (e.g., the absence of environmental factors) and expressed their intention to conduct future research on temporal trends. Overall, I have no further suggestions. It is recommended that the revised manuscript be accepted. ****** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: Yes: wei ji ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user.
Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pone.0326113.r004
Revision 2
10 Apr 2025 Author Response Comments from the Academic Editor and Reviewers: Academic Editor's Comments to Authors: Dear Dr. Sorum, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Reply: The latest revised manuscript has been prepared to address the concerns raised by the reviewers. Reviewers' Comments to Authors: Reviewer 1: While the authors claim to have handled the imbalance using a stratified k-fold cross-validation technique, this reflects a misunderstanding of the method. Stratified k-fold ensures proportional representation of classes in each fold but does not actively address imbalance during model training. Additionally, the machine learning models exhibit poor performance, even when considering F1-score and AUC. The results suggest that the models are performing only slightly better than random, indicating that they fail to effectively distinguish between non-fatal and fatal cases. Addressing the class imbalance properly and improving model robustness are necessary to enhance predictive performance. Without these improvements, the value of the ML approach remains questionable. Reply: Thank you for the insightful feedback. While it is true that stratified k-fold cross-validation ensures proportional representation of classes in each fold, we agree that it does not address class imbalance during model training. The authors' claim may reflect a misunderstanding of the method’s limitations in handling imbalance during model learning. To address this, we could consider employing techniques such as oversampling, undersampling, or class-weight adjustments during training, which more directly mitigate the impact of imbalance on model performance.
Because class imbalance in the dataset was not handled in the present study, this limitation is now acknowledged in the limitations section of the latest revised manuscript (highlighted in Section 5.3). Reviewer 2 (major comments): 1) The authors have addressed all the comments point-by-point in their revised manuscript. The revised article now provides a detailed explanation of the DDA (distracted driving accident) determination criteria and data sources, which enhances the credibility of the study. The methodology is rigorous, with reasonable solutions proposed for addressing class imbalance and hyperparameter tuning. The authors have also acknowledged the limitations of the data (e.g., the absence of environmental factors) and expressed their intention to conduct future research on temporal trends. Overall, I have no further suggestions. It is recommended that the revised manuscript be accepted. Reply: Thank you for your encouraging comments. Attachment: Submitted filename: Response_to_Reviewers_auresp_2.docx https://doi.org/10.1371/journal.pone.0326113.r005
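Reviewer 1's distinction between stratification and imbalance handling can be illustrated with a minimal Python sketch. The helper functions and toy labels below are hypothetical and are not the authors' code or data. The sketch shows that a simple stratified k-fold split (round-robin per class, no shuffling) preserves the original class ratio in every fold rather than correcting it, whereas class weighting, one of the training-time remedies mentioned in the reply, changes how much each class contributes during fitting. The weighting formula, n_samples / (n_classes * class_count), is the one scikit-learn documents for class_weight='balanced'.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds, dealing each class out round-robin
    so that every fold keeps (approximately) the overall class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        for j, idx in enumerate(by_class[label]):
            folds[j % k].append(idx)
    return folds


def balanced_class_weights(labels):
    """Per-class weight = n_samples / (n_classes * class_count), the formula
    scikit-learn documents for class_weight='balanced'."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical toy labels: 90 non-fatal (0) and 10 fatal (1) crashes.
labels = [0] * 90 + [1] * 10

# Stratification: each of the 5 folds has 20 records, 2 of them fatal,
# so the 10% fatal share (the imbalance itself) is preserved, not removed.
for fold in stratified_folds(labels, k=5):
    n_fatal = sum(labels[i] for i in fold)
    print(len(fold), n_fatal)  # prints "20 2" for every fold

# Class weighting: a training-time remedy that upweights the rare class.
print(balanced_class_weights(labels))  # {0: 0.5555555555555556, 1: 5.0}
```

With 90 non-fatal and 10 fatal records, every fold still contains 10% fatal cases, while the balanced weights give each fatal record nine times the weight of a non-fatal one. Oversampling or undersampling would instead change the training set itself; either remedy should be applied only to the training portion of each fold so that evaluation folds keep the real class distribution.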
Decision Letter - Quan Yuan, Editor Modeling of Injury Severity of Distracted Driving Accident Using Statistical and Machine Learning Models PONE-D-24-45051R2 Dear Dr. Sorum, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note that if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the 'Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Quan Yuan, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1.
If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed ******** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes ****** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes ****** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes ****** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes ****** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: The authors have responded to all comments point by point in the revised draft. The revised draft explicitly includes the handling of unbalanced datasets as a limitation, which is fair. ****** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No ******** https://doi.org/10.1371/journal.pone.0326113.r006
Formally Accepted
Acceptance Letter - Quan Yuan, Editor PONE-D-24-45051R2 PLOS ONE Dear Dr. Sorum, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited
* All relevant supporting information is included in the manuscript submission
* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Quan Yuan Academic Editor PLOS ONE https://doi.org/10.1371/journal.pone.0326113.r007

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.