Harnessing explainable artificial intelligence for patient-to-clinical-trial matching: A proof-of-concept pilot study using phase I oncology trials

Satanu Ghosh; Hassan Mohammed Abushukair; Arjun Ganesan; Chongle Pan; Abdul Rafeh Naqash; Kun Lu

doi:10.1371/journal.pone.0311510

Peer Review History

Original Submission January 24, 2024
15 Jul 2024 Decision Letter - Agnieszka Konys, Editor PONE-D-24-02458 Harnessing explainable Artificial Intelligence for Patient-to-Clinical-Trial matching: A proof-of-concept pilot study using Phase I Oncology Trials PLOS ONE Dear Dr. Lu, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please carefully check the Reviewers’ comments and improve the manuscript. Reviews provide details into areas that require improvement. Please submit your revised manuscript by Aug 29 2024 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript: A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.
If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Agnieszka Konys, Ph.D. Academic Editor PLOS ONE Journal requirements: When submitting your revision, we need you to address these additional requirements. 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf 2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse. 3. Thank you for stating the following financial disclosure: “This work is supported by an internal seed fund at The University of Oklahoma.” Please state what role the funders took in the study. 
If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf. 4. In the online submission form, you indicated that [Patient records cannot be shared publicly due to the HIPPA requirement. The output of the system can be shared through Google drive]. All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Partly Reviewer #3: Partly ******** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No Reviewer #2: No Reviewer #3: No ****** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: No Reviewer #2: No Reviewer #3: No ****** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: No ****** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Strengths: Innovative Approach: The study introduces an innovative use of explainable AI to the domain of clinical trial matching, focusing on phase I oncology trials. This approach is significant as it aims to increase the efficiency of patient recruitment without compromising the quality of patient-trial matching. Use of NLP Techniques: The application of modern NLP techniques to process unstructured patient records and clinical trial protocols represents a substantial advancement over traditional manual matching methods. 
This could potentially streamline the matching process, making it faster and more accurate. Explainability and Transparency: One of the key contributions of this study is the emphasis on explainability in the AI matching process. Providing a summary matching score along with explanations for the evidence contributes to the transparency and trustworthiness of the AI system, which is crucial in a clinical setting. Pilot Study Design: The proof-of-concept nature of the study, demonstrated through a pilot system, showcases the feasibility of the proposed approach. The detailed analysis of the system's performance, including precision, sensitivity/recall, accuracy, and specificity metrics, provides a solid foundation for further development and refinement. Limitations: Sample Size and Data Limitations: The study is based on a relatively small dataset of synthesized dummy patient records and clinical trial protocols, which may not fully capture the complexity and variability encountered in real-world settings. Future work would benefit from larger and more diverse datasets to validate and improve the system's performance. Misclassified Cases Analysis: The manuscript discusses the instances of misclassification, attributing errors to ambiguity in abbreviations, misunderstanding of context, and variations in expression. These insights are valuable, but they also highlight the need for continuous refinement of the NLP models and preprocessing steps to handle such complexities more effectively. Scope for Expanding Eligibility Criteria: The study currently focuses on four main criteria for matching. However, clinical trial eligibility often involves a wider range of criteria. Expanding the system to consider additional criteria could enhance its applicability and accuracy. 
Future Directions: The manuscript outlines several promising areas for future research, including the integration of gene name thesauri to address genetic mutation matching errors and the exploration of structured reporting approaches to reduce ambiguity. Furthermore, extending the prototype system to include more detailed inclusion/exclusion criteria and testing it on a larger scale are essential steps towards realizing a practical AI-assisted patient-clinical trial matching tool. Reviewer #2: The research is emerging with the application of explainable AI. However, more detailed analysis of how explainable AI is used in the proposed work is suggested. 2. The authors should consider some explanation without explainable AI and discuss the improvement achieved. 3. How the proposed method is trustworthy, transparent and secured as mentioned by the authors, to be elaborated with more detail. 4. Some comparison with other approaches to be discussed with their pitfalls, even though the authors state no research is carried out is novel. 5. How the proposed preprocessing and feature extraction process carried out and what were the outputs and which AI technique is used, please present it in a flow chart and if possible, write down the pseudocode for them. Reviewer #3: This work proposes using an AI system, that utilizes modern NLP methods to match patient records with clinical trial protocols based on four criteria: cancer type, performance status, genetic mutation, and measurable disease. It provides a matching score and evidence-based explanations. While the idea of the model is great and may help in finding the proper clinical trials. I have major concern about the depth of the information and number of samples as the following: - More in depth data are required to build the relationships between gene mutations+ clinical data with the clinical trial.
For example, full variant information included the position the reference and the change, and more in depth about coverage. - A lack of number of samples in this study - A lack of proper measurements (evaluations) out of this study. Again, this is a good idea but still immature for publication. ****** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Abedalrhman Alkhateeb ******** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. https://doi.org/10.1371/journal.pone.0311510.r001
Revision 1
6 Sep 2024 Author Response
August 30th, 2024
Dear editor,
We thank you very much for your work on processing this submission. We appreciate the reviewers’ feedback and comments. Below please find our responses to the reviewers. Along with this letter, please also find a marked-up copy of the manuscript that highlights all changes, labelled “Revised Manuscript with Track Changes,” and an unmarked version of the revised paper labelled “Manuscript.” We have also addressed the journal requirements in this revision. Please feel free to reach out if you have any questions.
Sincerely, Kun Lu on behalf of coauthors
Journal requirements: When submitting your revision, we need you to address these additional requirements.
1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf
Response: We have revised the manuscript to meet PLOS ONE style requirements.
2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.
Response: The authors of this publication are seeking a patent based on this work. Sharing the code publicly at this point would constitute a public disclosure of the work and would conflict with the patent application. We have added information on how to request access to the code in the Data Availability Statement in the revised manuscript.
3. Thank you for stating the following financial disclosure: “This work is supported by an internal seed fund at The University of Oklahoma.” Please state what role the funders took in the study. If the funders had no role, please state: "The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct you must amend it as needed. Please include this amended Role of Funder statement in your cover letter; we will change the online submission form on your behalf.
Response: The funders had no role in the study. We have included the amended Role of Funder statement in the cover letter.
4. In the online submission form, you indicated that [Patient records cannot be shared publicly due to the HIPPA requirement. The output of the system can be shared through Google drive]. All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either 1. In a public repository, 2. Within the manuscript itself, or 3. Uploaded as supplementary information. This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.
Response: These data cannot be shared. They contain potential patient identifiers and genomic information that cannot be shared openly in the journal. Select deidentified data can potentially be shared with readers upon specific request to the corresponding authors. This restriction is imposed by the Institutional Review Board based on HIPAA.
Reviewer #1:
Strengths:
Innovative Approach: The study introduces an innovative use of explainable AI to the domain of clinical trial matching, focusing on phase I oncology trials. This approach is significant as it aims to increase the efficiency of patient recruitment without compromising the quality of patient-trial matching.
Use of NLP Techniques: The application of modern NLP techniques to process unstructured patient records and clinical trial protocols represents a substantial advancement over traditional manual matching methods. This could potentially streamline the matching process, making it faster and more accurate.
Explainability and Transparency: One of the key contributions of this study is the emphasis on explainability in the AI matching process. Providing a summary matching score along with explanations for the evidence contributes to the transparency and trustworthiness of the AI system, which is crucial in a clinical setting.
Pilot Study Design: The proof-of-concept nature of the study, demonstrated through a pilot system, showcases the feasibility of the proposed approach. The detailed analysis of the system's performance, including precision, sensitivity/recall, accuracy, and specificity metrics, provides a solid foundation for further development and refinement.
Limitations:
Sample Size and Data Limitations: The study is based on a relatively small dataset of synthesized dummy patient records and clinical trial protocols, which may not fully capture the complexity and variability encountered in real-world settings. Future work would benefit from larger and more diverse datasets to validate and improve the system's performance.
Response: Thanks for the comment. We agree that the small dataset is a limitation of this pilot study. We have acknowledged this limitation in the manuscript and plan to seek resources for a larger follow-up study involving a larger number of real patient records. This pilot study serves only as a proof of concept to show that a larger follow-up study is feasible. However, we think this work is still valuable, as no explainable-AI-based systems are currently deployed for patient-clinical-trial matching for phase 1 oncology trials, and the initial functions of the system can already help with patient-clinical-trial matching. This pilot study also points out future directions for development. We have reorganized the content of the manuscript and added a dedicated subsection, “Limitations and Future Directions,” to the Discussion section to discuss the limitations and future directions of this study.
Misclassified Cases Analysis: The manuscript discusses the instances of misclassification, attributing errors to ambiguity in abbreviations, misunderstanding of context, and variations in expression. These insights are valuable, but they also highlight the need for continuous refinement of the NLP models and preprocessing steps to handle such complexities more effectively.
Response: We agree with the reviewer that continuous refinement of NLP methods is needed to improve the performance of the system. This is a future direction we plan to explore. We have added this to the “Limitations and Future Directions” section to emphasize this need.
Scope for Expanding Eligibility Criteria: The study currently focuses on four main criteria for matching. However, clinical trial eligibility often involves a wider range of criteria. Expanding the system to consider additional criteria could enhance its applicability and accuracy.
Response: Thanks for the comment. We agree that phase 1 oncology trials involve many more matching criteria than we have considered (e.g., the previous treatment status of the patients, as some trials limit eligibility to patients who have received certain therapies or a certain number of treatment lines). The four main criteria in this pilot study were selected, based on the experience of the physician on the team, as the most important ones. We do plan to consider additional matching criteria as we further develop this line of research.
Future Directions: The manuscript outlines several promising areas for future research, including the integration of gene name thesauri to address genetic mutation matching errors and the exploration of structured reporting approaches to reduce ambiguity. Furthermore, extending the prototype system to include more detailed inclusion/exclusion criteria and testing it on a larger scale are essential steps towards realizing a practical AI-assisted patient-clinical trial matching tool.
Response: Thanks! Yes, we agree. We have outlined these in the new section on “Limitations and Future Directions” in the Discussion.
Reviewer #2:
The research is emerging with the application of explainable AI. However, more detailed analysis of how explainable AI is used in the proposed work is suggested. 2. The authors should consider some explanation without explainable AI and discuss the improvement achieved.
Response: Thanks for the comment. We realize that this part was lacking in our paper, and we have therefore added a paragraph (the 3rd paragraph) in the “Introduction” section and the second part of the “Discussion” section, both on the implications of using explainable vs. black-box models in healthcare or similar domains.
3. How the proposed method is trustworthy, transparent and secured as mentioned by the authors, to be elaborated with more detail.
Response: Thanks! The idea of a transparent system is to provide evidence for predictions. Our method is trustworthy and transparent because it covers two important aspects: 1) it provides not a binary prediction but a score indicating the strength of the match, and 2) it also provides end-users with the evidence that contributes to the score. So, ultimately, it is up to the physician to make the final call, based on the score and the evidence together. For example, if a protocol serves a very specific cancer type and the patient does not have that type of cancer, then the patient should not be selected even if all three other criteria match. But we leave it to the oncologists and only provide information to support the final decision. Our approach reduces the cognitive load on the physician by providing important information, but it does not make the final decision.
4. Some comparison with other approaches to be discussed with their pitfalls, even though the authors state no research is carried out is novel.
Response: Thank you for the comment. As we mentioned, no previous method exists for phase 1 oncology clinical trials, so we could not compare our system directly to any other method. However, we have added some relevant literature to the “Introduction” and discuss its pitfalls and why this study is uniquely positioned.
5. How the proposed preprocessing and feature extraction process carried out and what were the outputs and which AI technique is used, please present it in a flow chart and if possible, write down the pseudocode for them.
Response: Thanks! We have added a flow chart (Figure 1) in Section 3. We also want to point to Figure 2, which describes the tools and normalizations used in the pre-processing step; Figure 3, which shows the output of the pre-processing step; and Figure 4, which contains an example output of the system. To explain the nature of the AI technique, we have added the following text: “… an unsupervised prototype NLP system that follows a mixed approach combining tools, regular expressions and expert curated rules …” to the description of our work (in the last paragraph of the “Introduction”).
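The two aspects described in the response to Reviewer #2 (a matching score rather than a binary prediction, plus the evidence behind that score) can be illustrated with a minimal Python sketch. This is not the authors' system, whose code is not publicly available. The record schema, the `ECOG_PATTERN` regular expression, the function names, and the equal weighting of the four criteria are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field

# The four criteria come from the manuscript; these identifier names are hypothetical.
CRITERIA = ("cancer_type", "performance_status", "genetic_mutation", "measurable_disease")

# Assumed pattern for mentions such as "ECOG 1" or "ECOG PS: 2" (illustrative only).
ECOG_PATTERN = re.compile(r"ECOG(?:\s+PS)?\s*[:=]?\s*([0-4])", re.IGNORECASE)


@dataclass
class MatchResult:
    score: float  # fraction of criteria matched (equal weights are an assumption)
    evidence: dict = field(default_factory=dict)  # criterion -> (matched, explanation)


def extract_ecog(text):
    """Return the first ECOG score found in free text, or None if absent."""
    found = ECOG_PATTERN.search(text)
    return int(found.group(1)) if found else None


def match_patient_to_trial(patient, trial):
    """Compare one patient record with one trial protocol on four criteria.

    `patient` and `trial` are plain dicts in a made-up schema.
    """
    evidence = {}

    # Cancer type: case-insensitive comparison against the trial's accepted types.
    accepted = {c.lower() for c in trial["cancer_types"]}
    ok = patient["cancer_type"].lower() in accepted
    evidence["cancer_type"] = (ok, f"patient cancer type: {patient['cancer_type']}")

    # Performance status: ECOG value pulled from the notes by the rule above.
    ecog = extract_ecog(patient["notes"])
    ok = ecog is not None and ecog <= trial["max_ecog"]
    evidence["performance_status"] = (ok, f"ECOG found in notes: {ecog}")

    # Genetic mutation: any shared gene name, or no mutation required by the trial.
    required = {g.upper() for g in trial["required_mutations"]}
    shared = {g.upper() for g in patient["mutations"]} & required
    ok = bool(shared) or not required
    evidence["genetic_mutation"] = (ok, f"shared genes: {sorted(shared)}")

    # Measurable disease: only checked when the trial requires it.
    ok = patient["measurable_disease"] or not trial["requires_measurable_disease"]
    evidence["measurable_disease"] = (ok, f"measurable disease: {patient['measurable_disease']}")

    matched = sum(1 for is_match, _ in evidence.values() if is_match)
    return MatchResult(score=matched / len(CRITERIA), evidence=evidence)


# Made-up example records, not taken from the study data.
patient = {
    "cancer_type": "Non-small cell lung cancer",
    "notes": "ECOG PS: 1. KRAS G12C detected.",
    "mutations": ["KRAS"],
    "measurable_disease": True,
}
trial = {
    "cancer_types": ["non-small cell lung cancer"],
    "max_ecog": 1,
    "required_mutations": ["kras"],
    "requires_measurable_disease": True,
}

result = match_patient_to_trial(patient, trial)
print(result.score)
for criterion, (is_match, explanation) in result.evidence.items():
    print(criterion, is_match, explanation)
```

Returning the per-criterion evidence alongside the score mirrors the point made in the response: a reader can see that, for instance, a cancer-type mismatch should outweigh three other matched criteria, and the final decision stays with the physician.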
Reviewer #3:
This work proposes using an AI system, that utilizes modern NLP methods to match patient records with clinical trial protocols based on four criteria: cancer type, performance status, genetic mutation, and measurable disease. It provides a matching score and evidence-based explanations. While the idea of the model is great and may help in finding the proper clinical trials. I have major concern about the depth of the information and number of samples as the following:
- More in depth data are required to build the relationships between gene mutations+ clinical data with the clinical trial. For example, full variant information included the position the reference and the change, and more in depth about coverage.
Response: We thank the reviewer for the insightful comment. However, while we agree that more in-depth genomic data would be important for therapy response or sensitivity, that is not the focus of the current work and is not relevant to this manuscript at this point. We are focused on NLP and textual information in this study.
- A lack of number of samples in this study
Response: Thanks for the comment. We acknowledge that the small sample size is a limitation of this pilot study. We are using this pilot study as a proof of concept to seek resources for a larger follow-up study involving a larger number of real patient records. We have added a subsection in the Discussion section on “Limitations and Future Directions.” However, we believe this pilot study is still valuable, as no explainable-AI-based systems are currently deployed for patient-clinical-trial matching for phase 1 oncology trials. In addition, some functions of the system can already help with patient-clinical-trial matching.
- A lack of proper measurements (evaluations) out of this study.
Response: Thanks for the comment. Our system is developed to assist in matching patients with phase 1 oncology trials. The matching decisions can be treated as a classification problem whose outcome is either match or non-match. We therefore evaluate the system using the classic confusion matrix and report metrics such as accuracy, precision, recall, and specificity. The results provide insight into where the system made correct decisions (the ones that match those of a physician) and where it produced false positives or false negatives. This is common practice in evaluating classification problems and should provide sufficient information on the performance of the system. We kindly ask the reviewer to clarify whether any other information on the evaluation is needed.
Attachments
Attachment Submitted filename: Response to Reviewers.docx
https://doi.org/10.1371/journal.pone.0311510.r002
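The evaluation described in the response to Reviewer #3 (match vs. non-match decisions scored against a physician's decisions with a confusion matrix) can be sketched in a few lines of Python. The function names and the example labels below are hypothetical and are not the study's data or results; the metric definitions are the standard ones.

```python
def confusion_counts(system, physician):
    """Count TP, FP, TN, FN, treating the physician's decisions as the reference."""
    tp = fp = tn = fn = 0
    for predicted, actual in zip(system, physician):
        if predicted and actual:
            tp += 1          # system and physician both say "match"
        elif predicted and not actual:
            fp += 1          # system says "match", physician says "non-match"
        elif not predicted and actual:
            fn += 1          # system misses a match the physician made
        else:
            tn += 1          # both say "non-match"
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics derived from the four confusion-matrix cells."""
    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # also called sensitivity
        "specificity": ratio(tn, tn + fp),
    }


# Made-up decisions for six patient-trial pairs (True = match).
system_decisions = [True, True, False, False, True, False]
physician_decisions = [True, False, False, False, True, True]

counts = confusion_counts(system_decisions, physician_decisions)
metrics = classification_metrics(*counts)
print(counts)
print(metrics)
```

With a sample as small as a pilot study's, each of these ratios moves substantially when a single decision changes, which is one reason the counts themselves are worth reporting alongside the derived metrics.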
20 Sep 2024 Decision Letter - Agnieszka Konys, Editor Harnessing explainable Artificial Intelligence for Patient-to-Clinical-Trial matching: A proof-of-concept pilot study using Phase I Oncology Trials PONE-D-24-02458R1 Dear Dr. Lu, We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager at Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. Kind regards, Agnieszka Konys, Ph.D. Academic Editor PLOS ONE Additional Editor Comments (optional): Reviewers' comments: https://doi.org/10.1371/journal.pone.0311510.r003
Formally Accepted
11 Oct 2024 Acceptance Letter - Agnieszka Konys, Editor PONE-D-24-02458R1 PLOS ONE Dear Dr. Lu, I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team. At this stage, our production department will prepare your paper for publication. This includes ensuring the following: * All references, tables, and figures are properly cited * All relevant supporting information is included in the manuscript submission, * There are no issues that prevent the paper from being properly typeset If revisions are needed, the production department will contact you directly to resolve them. If no revisions are needed, you will receive an email when the publication date has been set. At this time, we do not offer pre-publication proofs to authors during production of the accepted work. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few weeks to review your paper and let you know the next and final steps. Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. If we can help with anything else, please email us at customercare@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Agnieszka Konys Academic Editor PLOS ONE https://doi.org/10.1371/journal.pone.0311510.r004

Open letter on the publication of peer review reports

PLOS recognizes the benefits of transparency in the peer review process. Therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. Reviewers remain anonymous, unless they choose to reveal their names.

We encourage other journals to join us in this initiative. We hope that our action inspires the community, including researchers, research funders, and research institutions, to recognize the benefits of published peer review reports for all parts of the research system.

Learn more at ASAPbio.