Peer Review History

Original Submission: November 21, 2024
Decision Letter - Sara Mucherino, Editor

Dear Dr. Rasu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Dear Authors, thank you for submitting your manuscript. The manuscript presents a robust analysis using real-world data. With revisions to strengthen clarity and methodological transparency, the manuscript would make a valuable contribution to the literature.

Specifically, I recommend addressing the reviewers' detailed comments and suggestions.

I also recommend including the IRB approval number explicitly, clarifying the informed consent procedures, and refining the figure legends to be more self-explanatory, particularly for the SHAP and interaction plots.

==============================

Please submit your revised manuscript by Jun 09 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org . When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Sara Mucherino

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“The collection of cancer incidence data used in this study was supported by the California Department of Public Health pursuant to California Health and Safety Code Section 103885; Centers for Disease Control and Prevention’s (CDC) National Program of Cancer Registries, under cooperative agreement 1NU58DP007156; the National Cancer Institute’s Surveillance, Epidemiology and End Results Program under contract HHSN261201800032I awarded to the University of California, San Francisco, contract HHSN261201800015I awarded to the University of Southern California, and contract HHSN261201800009I awarded to the Public Health Institute.”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The author(s) received no specific funding for this work.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. We note that you have indicated that there are restrictions to data sharing for this study. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions .  

Before we proceed with your manuscript, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., a Research Ethics Committee or Institutional Review Board, etc.). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of recommended repositories, please see

https://journals.plos.org/plosone/s/recommended-repositories. You also have the option of uploading the data as Supporting Information files, but we would recommend depositing data directly to a data repository if possible.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Additional Editor Comments:

Dear Authors,

Thank you for submitting your manuscript to PLOS ONE. After careful evaluation we recommend major revisions before the manuscript can be considered for publication.

We invite you to revise your manuscript accordingly, addressing all reviewer comments point-by-point. Please also highlight the changes in your revised manuscript.

We look forward to receiving your resubmission.

Best regards,


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: No

**********

3. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #1: Yes

Reviewer #2: No

**********

Reviewer #1: Summary:

In this study, the authors leverage a large, nationally representative dataset – the Surveillance, Epidemiology, and End Results (SEER) cancer registry linked with Medicare claims – and employ machine learning to predict opioid combination therapy in older adult cancer patients. The authors aimed to identify the leading predictors of opioid combination therapy that may be inappropriate for older adults. The results identified baseline pain medication prescriptions, age, female sex, cancer treatments such as surgery and chemotherapy, care fragmentation, and area-level percentages of Hispanic and Native American residents living in poverty as leading predictors of opioid combination therapy. While this study did offer insight into prescribing patterns, I have a few suggestions for additional details or clarifications that should be made to better reflect the study to the audience.

Major Concerns:

Within the methods section the authors mention “Due to imbalance in target variable, … we used up-sampling procedures for prediction.” As a precaution, I would like to ensure that up-sampling was only performed on the training dataset. Otherwise, this technique can allow data leakage of the test set and overstate model performance. Please also detail the method (e.g., random, SMOTE, etc.) and magnitude (i.e., the final class sizes) of up-sampling.

The authors also state within the methods section:

“… we split our data into training (80%) and testing (20%) subsets… We used test (unseen) data to evaluate the performance … 10-fold cross-validation … and tuning of hyper-parameters … we calculated the optimal probability classification threshold to improve fit”

Based on this information, there are a few points I would like to caution about. Hyperparameter tuning and classification thresholding should occur on training data only (as would be the case for a deployed model). By choosing hyperparameters and thresholds based on test performance, the test data is no longer “unseen” by the model (i.e., data leakage). If this is the case, please reevaluate your model. Otherwise, please carefully rephrase the methodology to ensure the results are replicable for the readers.

In the results section, all figures are uninterpretable. I assume this is due to low resolution. Please reupload higher-dpi or vector images. Further, the results state “Confusion plots show the quality of the output” but confusion plots were not provided. Please include confusion matrices for transparency to the reader.

Minor Concerns:

Why were patients with HMO insurance plans excluded? Please provide rationale.

On line 168: “A [majority] were 70-74”

Reviewer #2: • Research guideline(s)/standard(s) appropriate to the study design should be reported in the paper text.

• Which randomization method was used in the distribution of the individuals included in the study to the groups?

• Which blinding (masking) method was used in the study?

• The primary output/endpoint variable(s)/measurement(s) of the study should be defined.

• How was the sample size determined? This information should be explained in the Materials and Methods section.

• Which sampling (probable or non-probable, etc.) method was used in the study?

• Statistical tests for hypothesis testing and their assumptions should be specified in the study's statistical analysis in the Materials and Methods section.

• The details (version, license number, etc.) of the statistical package(s) or program(s) should be given in the sub-section of "Data Analysis or Statistical Analysis".

• It should be explained how the qualitative and quantitative data are summarized under the sub-heading of Statistical Analyses in the Materials and Methods section of the study.

• Data analysis or Statistical analysis sub-section title should be added to the Materials and Methods.

• The exact P values should be added to the table(s) (e.g., p=0.25; p=0.03).

• Which methods are used to model relationships between variables?

• The descriptions and other descriptive values/data should be defined on the tables and shapes.

• Are the data subjected to pre-processing?

• How were extreme/outlier values in the data determined and resolved?

• What approaches were used to test the validity of the models?

• Which metrics were used in the performance evaluation of the estimates of models/algorithms?

• How were the predictive models selected in this study?

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/ . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org . Please note that Supporting Information files do not need this step.

Revision 1

Dear Ms. Sara Mucherino,

We sincerely thank you and the reviewers for the thoughtful and constructive feedback on our manuscript entitled “Leading Predictors and their Associations with Combination Opioid Pain Therapy in Older Adults with Cancer: Application of Machine Learning Approaches”. The comments provided were helpful in strengthening our manuscript significantly.

Please see below for our responses to each comment. Reviewer comments are shown in italics, followed by our detailed responses in blue font. Changes made to the manuscript have been marked using Track Changes as appropriate. However, as there have been numerous changes (such as using tables instead of figures, additional text, and other modifications), we will also attach a clean version of the manuscript.

Reviewer #1

Within the methods section the authors mention “Due to imbalance in target variable, … we used up-sampling procedures for prediction.” As a precaution, I would like to ensure that up-sampling was only performed on the training dataset. Otherwise, this technique can allow data leakage of the test set and overstate model performance. Please also detail the method (e.g., random, SMOTE, etc.) and magnitude (i.e., the final class sizes) of up-sampling.

Yes, up-sampling procedures were performed only on the training dataset. As per the reviewer’s suggestion, we reanalyzed the data with SMOTE rather than plain oversampling, and model performance was evaluated on the test data. The findings remained consistent with the original version. However, care fragmentation was no longer one of the top 11 leading predictors; it was replaced by the presence of chronic pain conditions before cancer diagnosis.
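As a sketch of the split-then-resample order described above, the following uses plain random up-sampling via scikit-learn on synthetic data; in the revised analysis a SMOTE step (e.g., from the imbalanced-learn package) would replace the `resample` call:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for an imbalanced target (90/10 class split)
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1],
                           flip_y=0.0, random_state=0)

# 1) Split FIRST, so the held-out test set is never resampled
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2) Up-sample the minority class within the training split only
minority = y_tr == 1
X_up, y_up = resample(X_tr[minority], y_tr[minority],
                      n_samples=int((~minority).sum()), random_state=0)
X_bal = np.vstack([X_tr[~minority], X_up])
y_bal = np.concatenate([y_tr[~minority], y_up])

# The test split keeps its original class balance for honest evaluation
```

Keeping the test split untouched is what allows the reported metrics to reflect performance on the original class distribution.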

All text and figures were revised based on the revised analyses and output.

The authors also state within the methods section:

“… we split our data into training (80%) and testing (20%) subsets… We used test (unseen) data to evaluate the performance … 10-fold cross-validation … and tuning of hyper-parameters … we calculated the optimal probability classification threshold to improve fit”

Based on this information, there are a few points I would like to caution about. Hyperparameter tuning and classification thresholding should occur on training data only (as would be the case for a deployed model). By choosing hyperparameters and thresholds based on test performance, the test data is no longer “unseen” by the model (i.e., data leakage). If this is the case, please reevaluate your model. Otherwise, please carefully rephrase the methodology to ensure the results are replicable for the readers.

Yes, only the training data were used for hyperparameter tuning and classification thresholding.
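The leakage-free workflow confirmed here, with tuning and threshold selection confined to the training split, can be sketched with scikit-learn on synthetic data (the logistic-regression estimator and its grid are illustrative stand-ins, not the study's XGBoost configuration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import (GridSearchCV, cross_val_predict,
                                     train_test_split)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=600, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# Hyper-parameter tuning via 10-fold CV on the TRAINING split only
grid = GridSearchCV(LogisticRegression(max_iter=1000),
                    {"C": [0.1, 1.0, 10.0]}, cv=10)
grid.fit(X_tr, y_tr)

# Pick the probability threshold from cross-validated TRAINING predictions
proba_cv = cross_val_predict(grid.best_estimator_, X_tr, y_tr,
                             cv=10, method="predict_proba")[:, 1]
thresholds = np.linspace(0.1, 0.9, 81)
best_t = thresholds[np.argmax([f1_score(y_tr, proba_cv >= t)
                               for t in thresholds])]

# The test set is touched exactly once, with both choices already frozen
test_pred = grid.predict_proba(X_te)[:, 1] >= best_t
```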

In the results section, all figures are uninterpretable. I assume this is due to low resolution. Please reupload higher-dpi or vector images. Further, the results state “Confusion plots show the quality of the output” but confusion plots were not provided. Please include confusion matrices for transparency to the reader.

We provided the original high-resolution TIFF files obtained from the software. Unfortunately, when the journal converted the files to PDF, the resolution was likely lost.
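For the confusion matrices the reviewer requests, the underlying counts can be obtained directly with scikit-learn and reported as a plain table, sidestepping any image-conversion issues; the labels below are illustrative only:

```python
from sklearn.metrics import confusion_matrix

# Illustrative labels only: 1 = opioid combination therapy, 0 = none
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)  # [[2 1]
           #  [1 2]]
```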

Why were patients with HMO insurance plans excluded? Please provide rationale.

In claims-based research, excluding participants ever enrolled in HMO/managed-care plans is standard practice because HMO encounter data do not contain the billing information found in fee-for-service (FFS) claims, so service use is systematically under-ascertained. Restricting analyses to continuous FFS enrollees ensures complete, comparable records across all beneficiaries.

On line 168: “A [majority] were 70-74”

Thank you for bringing this discrepancy to our attention. We have made this revision accordingly.

Reviewer #2

Research guideline(s)/standard(s) appropriate to the study design should be reported in the paper text.

Thank you for this guidance. We have provided detailed responses to your questions involving our study design and methods below.

Which randomization method was used in the distribution of the individuals included in the study to groups? Which blinding (masking) method was used in the study? How was the sample size determined? This information should be explained in the Materials and Methods section. Which sampling (probable or non-probable, etc.) method was used in the study?

As stated in the methods, the data were derived from population-based cancer registries from 18 different geographic areas. The SEER program makes available to researchers a 5% sample of those diagnosed with cancer, comprising individuals with cancer who resided in a SEER area and were included in the Medicare 5% sample.

The primary output/endpoint variable(s)/measurement(s) of the study should be defined.

The primary outcome variable, which is also known as target variable in machine learning methods, was concomitant opioid use with other pain medications (NSAIDs, benzodiazepines, gabapentinoids, and/or skeletal muscle relaxants) during the 12 months after an incident cancer diagnosis in the year 2014. These were derived from Medicare Part D files. We have clarified this in the manuscript.

Statistical tests for hypothesis testing and their assumptions should be specified in the study's statistical analysis in the Materials and Methods section.

As our study was based on a predictive‐machine‐learning framework rather than on classical hypothesis testing, we have clarified this in the Materials and Methods. We have added details in a separate section titled “Machine Learning Analyses”.

The details (version, license number, etc.) of the statistical package(s) or program(s) should be given in the sub-section of "Data Analysis or Statistical Analysis".

We used SAS 9.4 (SAS Institute Inc. 2010, Cary, NC, USA) for data management and descriptive analyses. Machine learning analyses were conducted using Python 3.9.7 (Python Software Foundation, 2021). This information has been provided in the methods section.

It should be explained how the qualitative and quantitative data are summarized under the sub-heading of Statistical Analyses in the Materials and Methods section of the study. Data analysis or Statistical analysis sub-section title should be added to the Materials and Methods.

We have included a section “Descriptive analyses” to explain how the data were summarized. The machine learning analyses section has details on how the data are summarized.

The exact P values should be added to the table(s) (e.g., p=0.25; p=0.03).

Table 2 provides descriptive statistics only; no hypothesis tests were performed and therefore no p-values are reported.

Table 3 now provides p-values for differences in pain drug use before and after diagnosis. The differences were tested with the McNemar test, a non-parametric method for paired observations on the same individuals.
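As an illustration, the McNemar test depends only on the two discordant-pair counts; the counts below are hypothetical, not the study's data:

```python
from scipy.stats import chi2

# Hypothetical discordant-pair counts (NOT the study's data):
# b = used the drug before diagnosis only, c = after diagnosis only
b, c = 30, 55

# McNemar chi-square with continuity correction; concordant pairs drop out
stat = (abs(b - c) - 1) ** 2 / (b + c)
p_value = chi2.sf(stat, df=1)
```

With these illustrative counts the statistic exceeds the 1-df critical value of 3.84, so the before/after difference would be significant at the 0.05 level.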

Table 4 displays all p-values to four decimal places; however, any p-value smaller than 0.0001 is reported as <0.0001.

Are the data subjected to pre-processing? What approaches were used to test the validity of the models? Which metrics were used in the performance evaluation of the estimates of models/algorithms? How were the predictive models selected in this study? Which methods are used to model relationships between variables? The descriptions and other descriptive values/data should be defined on the tables and shapes.

The revised section on machine learning analyses explains all the steps that were used, such as data pre-processing, model development, model performance evaluation, and model interpretation. We hope these descriptions will be helpful.

How were extreme/outlier values in the data determined and resolved?

All continuous predictors were standardized to zero mean and unit variance (StandardScaler). This z-score normalization places each feature on a common scale and helps to compress the influence of any residual extreme values. Because we then modeled with XGBoost, an ensemble of decision trees whose tree-based splits are minimally affected by extreme values, no additional outlier exclusion was required.
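A sketch of this scale-inside-the-pipeline pattern on synthetic data, with scikit-learn's GradientBoostingClassifier standing in for XGBoost:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Placing StandardScaler inside the pipeline means its mean/variance are
# learned from the training data alone and merely applied to the test split
model = make_pipeline(StandardScaler(),
                      GradientBoostingClassifier(random_state=0))
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

Fitting the scaler as a pipeline step, rather than on the full dataset, is what keeps the standardization itself free of test-set leakage.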

Thank you again for this opportunity to revise and resubmit our work. We have made additional edits to improve the readability of our manuscript.

We hope that our revisions meet the expectations of the reviewers and editorial team. Please let us know if you have any further comments, suggestions, or modifications.

Regards,

Dr. Rafia Rasu

Attachments
Attachment
Submitted filename: SEER_Medicare_PLoS_response to reviewers.docx
Decision Letter - Sara Mucherino, Editor

Leading Predictors and their Associations with Combination Opioid Pain Therapy in Older Adults with Cancer: Application of Machine Learning Approaches

PONE-D-24-48620R1

Dear Dr. Rasu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information’ link at the top of the page. For questions related to billing, please contact billing support.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sara Mucherino

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

Reviewer #2: (No Response)

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

Reviewer #2: (No Response)

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: (No Response)

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? (See the PLOS Data policy.)

Reviewer #2: (No Response)

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

Reviewer #2: (No Response)

**********

Reviewer #2: Accept

Accept

Accept

Accept

Accept

Accept

**********

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Reviewer #2: No

**********

Formally Accepted
Acceptance Letter - Sara Mucherino, Editor

PONE-D-24-48620R1

PLOS ONE

Dear Dr. Rasu,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission,

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Sara Mucherino

Academic Editor

PLOS ONE

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio .