
A prospective single-masked, non-inferiority, parallel-group randomized controlled trial of the efficacy of a ChatGPT-based AI chatbot to improve Boston bowel preparation scores for colonoscopy preparation: A trial protocol

  • Nabil Mohammad Azmi ,

    Roles Conceptualization, Funding acquisition, Methodology, Project administration, Supervision, Writing – original draft, Writing – review & editing

    nabil@ukm.edu.my (NAM); irfan.abduljalal@ukm.edu.my (MIAJ)

    Affiliation Department of Surgery, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Muhammad Irfan Abdul Jalal ,

    Roles Conceptualization, Investigation, Methodology, Project administration, Validation, Writing – original draft, Writing – review & editing

    nabil@ukm.edu.my (NAM); irfan.abduljalal@ukm.edu.my (MIAJ)

    Affiliation UKM Medical Molecular Biology Institute (UMBI), Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Siti Hamizah Mohd Ashar,

    Roles Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Department of Surgery, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Muhammad Irfan Mohd Nazri,

    Roles Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Department of Surgery, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Young Jie,

    Roles Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Department of Surgery, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Nagulan Ganeson,

    Roles Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Department of Surgery, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Joane K. Augustine,

    Roles Investigation, Methodology, Project administration, Writing – original draft, Writing – review & editing

    Affiliation Department of Surgery, Faculty of Medicine, Hospital Canselor Tuanku Muhriz, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

  • Yew Sheng Qian

    Roles Methodology, Writing – original draft, Writing – review & editing

    Affiliation Department of Public Health Medicine, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

Abstract

Artificial intelligence (AI) is transforming healthcare through tools such as large language model (LLM) chatbots. AI chatbots can simulate human conversation, provide personalized information, and interact with patients in real time. Their ease of use and conversational interface make them attractive for healthcare education, especially in resource-limited settings. We propose a prospective, single-masked, randomized controlled trial to evaluate whether an AI-based chatbot (ChatGPT) is non-inferior to standard counseling in supporting patients’ adherence to pre-colonoscopy bowel preparation instructions, as measured by the Boston Bowel Preparation Scale (BBPS). Patients undergoing colonoscopy (n_total = 96) will be randomized at a 1:1 ratio to a ChatGPT 4.0 LLM-aided colonoscopy counseling arm (n_ChatGPT = 48) or a standard counseling arm (n_SC = 48) using a central block randomization scheme with varying block sizes. Participants in the first group will interact with ChatGPT 4.0 for bowel preparation counseling before colonoscopy, whilst the second group will receive standard counseling from trained clinicians. Only the outcome assessors will be masked to the intervention allotment. The primary endpoint is the BBPS score, assessed for non-inferiority. Secondary endpoints are patient anxiety and patient satisfaction, assessed using the DASS-21 and PSQ-18 questionnaires, respectively; these findings will be reported descriptively with two-sided 95% confidence intervals, and any p-values will be considered exploratory without multiplicity adjustment. The primary endpoint will be analyzed under the intention-to-treat (ITT) and non-inferiority frameworks using analysis of covariance (ANCOVA) to control for confounders (age, gender (male as the risk factor), prior colonoscopy experience, colonoscopy indication, and baseline constipation score).
The results will be compared with the findings of the per-protocol (PP) analysis as part of the sensitivity analysis. The protocol adheres to SPIRIT 2025 and the SPIRIT-AI extension guidelines to ensure comprehensive reporting of this AI-based intervention. This trial has received ethics approval, and the trial protocol has been registered with the ClinicalTrials.gov registry (NCT06905782).

Introduction

Colonoscopy is the gold standard for colorectal cancer (CRC) screening and detection of adenomas. Its life-saving potential depends critically on high-quality bowel preparation, as poor preparation can reduce the adenoma detection rate (ADR) and necessitate repeat procedures [1,2]. Inadequate preparation also increases costs and may delay diagnosis of precancerous or malignant lesions, thereby compromising patient outcomes [3]. Previous research demonstrated that up to one-third of colonoscopies have suboptimal preparation, with rates between 6.8% and 33% [3]. Common factors associated with poor bowel preparation include patient non-compliance, misunderstanding of instructions, and procedural anxiety, often influenced by sociodemographic variables such as older age and lower education [4]. Efforts to improve preparation have focused on patient education and support.

Adherence to clear bowel preparation instructions is crucial for colonoscopy success. However, ensuring patient compliance remains challenging due to barriers such as limited health literacy and anxiety about the procedure [4]. Inadequate preparation often leads to failed colonoscopies, delayed diagnoses, and increased healthcare burden. Conventional counseling by healthcare providers has limitations in reaching and engaging all patients effectively.

Artificial intelligence (AI) offers innovative solutions to augment patient education. AI systems have achieved remarkable success in medical image analysis, for example, with deep convolutional neural networks (CNNs) improving bowel cleanliness scoring (e.g., ENDOANGEL) [5]. Yet, most image-based AI tools do not directly enhance patient communication. In contrast, large language models like ChatGPT can engage patients in natural conversation. ChatGPT (based on GPT-4) has demonstrated proficiency in medical text processing and can improve doctor–patient dialogue [6]. Its multimodal abilities also extend to image recognition, suggesting future versatility in patient interaction.

AI chatbots are emerging in healthcare for tasks including triage, education, and counseling [7–16]. By simulating a human conversational partner, chatbots can deliver standardized and personalized guidance on a large scale. They are accessible via widely available digital platforms (smartphones, computers) and can overcome provider shortage by supplementing routine care [7–9,12–16]. Chatbots can also actively engage users who have poor health literacy or limited access by using clear, jargon-free language and continuous availability [12–15]. Two recent systematic reviews found that conversational agents generally improve patient engagement and outcomes, for example, 75% of studies reported positive effects on usability and adherence [10,17]. In healthcare settings, chatbots have shown promise in increasing adherence to pre-procedure protocols through interactive guidance, and they can help alleviate patient anxiety by providing supportive information.

Recent work has examined ChatGPT-4 in a different part of the bowel-preparation pathway, namely, as a stand-alone image grader of bowel cleanliness on colonoscopy frames using the Boston Bowel Preparation Scale (BBPS). In that concordance analysis, ChatGPT-4 labelled fewer frames as adequately prepared than human endoscopists (62.9% vs 89.3%), with slight agreement on adequacy (κ = 0.099, p < 0.001) and moderate concordance for raw BBPS scores (W = 0.554, p = 0.036) [18]. However, our trial addresses a distinct question: whether an LLM-based patient-education adjunct can improve adherence to standard preparation. In our study, BBPS is scored exclusively by masked endoscopists during colonoscopy, and the chatbot does not perform any image interpretation or outcome assessment. To date, most evidence has come from psychiatric [19] or medical education studies [20], with limited data on procedural preparation such as colonoscopy. Studies noted gaps in safety, privacy, and bias assessment in many AI applications [17]. To address this concern, this trial incorporates best practices for AI interventions. The protocol explicitly follows the Standard Protocol Items: Recommendations for Interventional Trials (SPIRIT) 2025 guidelines and the SPIRIT-AI extension for AI-based clinical trials, ensuring transparent reporting of the AI system, input/output handling, human–AI interaction, error monitoring, and performance evaluation [21,22].

We will compare standard pre-colonoscopy bowel preparation counseling delivered by healthcare workers with an AI chatbot–augmented approach. We hypothesize that patients who interact with the ChatGPT-based counselor will have bowel cleanliness (based on BBPS scores) that is non-inferior to that of controls. The study will also exploratorily compare the anxiety and satisfaction of trial participants receiving these two different interventions, as well as any AI-related safety issues. Here, we present the trial design and methods.

Methods

Study design

This is a prospective, single-center, single-masked (outcome assessor), parallel two-arm, non-inferiority randomized controlled trial. The primary goal is to assess whether AI chatbot counseling improves bowel preparation quality, based on the BBPS score, for elective colonoscopy. Eligible patients will be randomized (block randomization, block size = 4) in a 1:1 ratio to either the AI intervention group or the control group receiving standard instructions. Endoscopists, who will score the BBPS during colonoscopy, will be masked to group allocation. The protocol adheres to SPIRIT 2025 guidelines and has received ethical approval from the Universiti Kebangsaan Malaysia (UKM) Research Ethics Committee (Ref: UKM PPI/111/18, Date of approval: 14th Mar 2025). The trial was registered with ClinicalTrials.gov on 01/04/2025 (ID: NCT06905782).

Table 1 shows the SPIRIT 2025 schedule of participant enrollment, administration of interventions and evaluations of trial endpoints. The structured, summarized details of the trial design, based on the World Health Organization (WHO) Trial Registration Dataset, are given in Table 2. The planned participant data collection is summarized in Table 3. Written informed consent will be obtained from each study participant by the trained study recruiters. All participants will be covered by private insurance schemes as a means of compensation in the event of adverse events (including death) occurring related to the procedures. The participants will not be involved in the design and conduct of the trial since the trial will not involve any patient advocacy groups.

Table 1. Schedule of participant enrollment, administration of interventions and assessments of study outcomes for this trial (adapted from SPIRIT 2025 guideline).

https://doi.org/10.1371/journal.pone.0334349.t001

Table 2. Components of the WHO Registration Data Set. Extracted from the ClinicalTrials.gov registry and adapted from the SPIRIT 2025 guideline.

https://doi.org/10.1371/journal.pone.0334349.t002

Table 3. Relevant participant demographics will be collected during the trial.

https://doi.org/10.1371/journal.pone.0334349.t003

Study setting

The trial will be conducted at Hospital Canselor Tuanku Muhriz (HCTM), the teaching hospital of UKM in Cheras, Kuala Lumpur. Patient recruitment will be carried out between 01/04/2025 and 31/07/2025.

Eligibility criteria

Individuals who wish to take part in this study must meet several inclusion criteria. They need to be at least 18 years old and scheduled for a first-time or repeat elective colonoscopy at HCTM. In addition, participants must have basic digital literacy, required to interact with the chatbot interface, and be comfortable using a smartphone, tablet, or computer to interact with a web-based system, with personal internet access (provided by HCTM if needed). For participants who are otherwise eligible but lack a personal device, the trial will provide a loaner tablet and a short onboarding session (10 minutes or less) to enable participation. The provision of loaner devices aims to mitigate but not eliminate selection bias; the trial remains a pragmatic first-phase evaluation among individuals who can use a mobile/tablet interface. They also need to be able to understand and communicate in either English or Bahasa Malaysia, as the chatbot used in the study will rely on these languages. Lastly, they must be prescribed a standard bowel preparation regimen by their attending doctor, without any special adjustments.

At the same time, the following individuals will be excluded from the trial:

  • those with significant cognitive problems, such as dementia or severe neurological conditions, that affect their ability to follow instructions;
  • patients who require a customized bowel preparation, for example due to chronic kidney disease, inflammatory bowel disease, or similar conditions;
  • those with severe anxiety disorders or other psychiatric illnesses that could affect how they experience or report anxiety; and
  • individuals who are unable to participate or do not consent to study participation.

Randomization and masking

After eligibility screening and informed consent, participants will be randomly allocated to the AI chatbot arm or control arm. A computer-generated random sequence (blocks of 4) will be created by the trial statistician (MIAJ) for assigning participants to the allotted interventions in a 1:1 ratio and this will be stored in the REDCap system. All other study personnel will not have access to the randomization sequence to maintain allocation concealment. Recruitment will be mainly performed by the principal investigator (NMA). The REDCap system will only be used by the study personnel to allocate participants to trial interventions after participants have consented to trial participation. The trial statistician (MIAJ) will neither be involved in participant recruitment nor in the administration of the allotted intervention.
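A permuted-block sequence of this kind (blocks of 4, 1:1 allocation) can be sketched as below. This is an illustrative outline only; the arm labels and seed are hypothetical, and the actual sequence is generated and held by the trial statistician within REDCap.

```python
import random

def permuted_block_sequence(n_participants, block_size=4, seed=None):
    """Generate a 1:1 permuted-block allocation sequence.

    Each block contains equal numbers of 'AI' (chatbot arm) and 'SC'
    (standard counseling arm) labels, shuffled independently, so the
    arms stay balanced after every completed block.
    """
    assert block_size % 2 == 0, "1:1 allocation needs an even block size"
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_participants:
        block = ["AI"] * (block_size // 2) + ["SC"] * (block_size // 2)
        rng.shuffle(block)  # permute within the block only
        sequence.extend(block)
    return sequence[:n_participants]

# For n_total = 96 this yields 24 blocks of 4, i.e., 48 per arm.
seq = permuted_block_sequence(96, block_size=4, seed=2025)
```

Because allocation is balanced within each block, interim imbalance between arms can never exceed half a block.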

Participants and other study personnel will not be masked to the assigned interventions, but the primary outcome will be objectively assessed by endoscopists who will be masked to the interventions received by the participants. Complete masking of participants and staff is not feasible since participants are inherently aware of whether they interact with an AI chatbot or a clinician, and staff facilitating the counseling or questionnaires necessarily know the intervention delivered. However, staff influence will be minimized through strict adherence to standardized hospital counseling protocols (control arm) and scripted questionnaire delivery (ChatGPT arm). Importantly, the BBPS is objectively evaluated by endoscopists who are masked to allocation, preserving the integrity of the primary endpoint evaluation. The endoscopists and participants will be instructed to maintain minimal conversation to prevent unmasking. No emergency unmasking is anticipated since we do not expect any significant harm to be associated with the interventions received by the participants.

Interventions

Control Arm (Standard Counseling).

Participants in the control group will receive routine pre-procedure counseling as per hospital protocol. This includes written instructions (as a brochure) and/or face-to-face explanation by trained healthcare staff about dietary restrictions and bowel preparation steps. The counseling covers key points on diet modification and laxative use (Polyethylene glycol 4000 (Fortrans™); DCH-Auriga, Malaysia): one sachet of Fortrans™ will be mixed with one liter of water and taken at 6:00 PM and again at 8:00 PM the night before, with another 1 liter at 6:00 AM on the colonoscopy procedural day. The content and dosing regimen were developed with reference to the European Society of Gastrointestinal Endoscopy (ESGE) Guideline Update 2019 on bowel preparation for colonoscopy [23] and the Malaysian Ministry of Health (MOH) Clinical Practice Guidelines on colorectal cancer screening [24]. Our regimen specifies that the final sachet should be consumed 4–5 hours before the procedure, completed at least 2 hours before colonoscopy. This adaptation reflects local institutional policy and MOH recommendations, while remaining aligned with ESGE’s general principles of split dosing and minimum fasting intervals. This is further corroborated by evidence that modest variations within this range do not compromise bowel cleanliness or adenoma detection [25,26]. Standard practice may include answering patient questions in person but does not involve any AI tools.

Intervention arm (AI-ChatGPT Chatbot Counseling).

Participants in the intervention group will receive a supplementary counseling session via an AI chatbot powered by ChatGPT. The chatbot content is customized for colonoscopy preparation and is based on reputable guidelines from the ESGE and MOH recommendations. Each participant will first be given a personalized script (electronic or printed) that contains their name, age, relevant comorbidities, current medications, and scheduled procedure date. This script also summarizes the standard preparation protocol for clarity. Participants choose their preferred language (English or Malay), and the script is provided in that language.

After reviewing the script, participants will access the ChatGPT-based chatbot on a hospital-provided laptop (or personal device) in a private area. The chatbot is implemented via the OpenAI web interface (using ChatGPT version 4.0). The chatbot’s parameters are fixed (temperature = 1.0 for creative and empathetic responses, max tokens = 200,000) and each user’s query will be submitted as a new session to prevent context carryover. A temperature parameter of 1.0 was chosen since our pilot testing showed that lower values (0.0–0.7) produced rigid, sometimes confusing phrasing that reduced patient comprehension, while 1.0 maintained natural dialogue without compromising the accuracy of bowel preparation instructions. In addition, all dosing and safety-critical instructions (dietary restrictions, purgative timing, fasting intervals, sedation precautions) are delivered using templated, pre-specified messages to ensure consistency across participants. The temperature parameter therefore affects only conversational phrasing and clarifications, and was selected to improve patient comprehension and engagement.

To ensure the participants received consistent guidance throughout the trial, the ChatGPT used in this trial runs on a locked local deployment finalized before the trial starts to eliminate variability due to external model updates. Core educational content, including dietary restrictions, purgative timing, and fasting intervals, is implemented as templated outputs to ensure that all participants receive identical evidence-based instructions. The language model delivers this content conversationally and provides clarifications, but does not alter the dosing regimen. All interactions are stored as transcripts with date and time stamps for monitoring.

The chatbot interaction proceeds as follows: the participant inputs their personalized script, and then the chatbot provides step-by-step guidance on dietary restrictions (e.g., clear fluid allowance, solid food cutoff) and bowel preparation (laxative timing and use). The chatbot also offers motivational encouragement and reinforcement messages. Participants may ask the chatbot additional questions about the procedure and preparation; the chatbot will answer within predefined safety limits (e.g., it can clarify instructions but will not give medical diagnoses). No personal identifiable information (PII) is shared with ChatGPT since all interactions are captured via anonymized user inputs. If any user input is incomplete or unclear, the system’s input validation will prompt for clarification. Clinic staff are available to assist with technology, but will not interject into the chatbot responses. Each chatbot session is expected to last 15–30 minutes.

The instructions for bowel preparation delivered by the ChatGPT chatbot will be developed through an iterative process: clinical guidelines (ESGE, MOH) will be used to construct initial patient questions and corresponding answers. We will subsequently employ ChatGPT to draft dialogue scripts in both English and Malay. Multiple refinements will ensure localization (e.g., replacing “pencahar” with “pelawas” in the Malay language, specifying local timing for laxative doses at 6 PM/8 PM and 6 AM/9 AM) and alignment with institutional practice. The final script will be structured into conversational sections (Introduction, Dietary Preparation, Bowel Preparation, Mental Preparation, Risks & Safety, Day of Procedure) to mirror typical patient concerns. Clinicians will review the content for accuracy, cultural appropriateness, and readability before deployment. Thus, ChatGPT’s knowledge base consists of standardized, evidence-based bowel preparation protocols, with patient-friendly language. The readability of the responses generated by ChatGPT will be assessed using the Flesch-Kincaid Grade Level score and the Gunning-Fog index [27,28].
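Both readability metrics are computed from word, sentence, and syllable counts. The sketch below illustrates the standard formulas; its vowel-group syllable heuristic is a crude stand-in for the dictionary-based counting that published readability tools use, so exact scores will differ from those tools.

```python
import re

def count_syllables(word):
    """Crude heuristic: count groups of consecutive vowels.

    Real readability tools use pronunciation dictionaries; this
    approximation is for illustration only.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text):
    """Return (Flesch-Kincaid Grade Level, Gunning-Fog index)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Gunning-Fog treats words of three or more syllables as "complex".
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    fk_grade = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    fog = 0.4 * ((n_words / sentences) + 100 * (complex_words / n_words))
    return fk_grade, fog
```

Lower scores indicate easier text; chatbot responses can be screened against a target grade level before deployment.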

Those who cannot complete the structured input forms or who submit incomplete, non-English/non-Malay, or ambiguous responses will be identified during enrollment and will be referred to standard counseling instead (data from such cases will not enter AI analysis). The chatbot will not collect PII, and responses longer than 500 words (approximately 800–1,000 tokens) will be truncated to fit processing constraints.
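The 500-word truncation rule can be sketched as a simple pre-processing step; the helper below is hypothetical and shows only the word-count cutoff, not the token-level handling performed by the deployed system.

```python
def truncate_input(text, max_words=500):
    """Trim user input to at most max_words whitespace-delimited words.

    Inputs at or under the limit pass through unchanged; longer inputs
    are cut at the word boundary (roughly 800-1,000 tokens at 500 words).
    """
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])
```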

In this trial, ChatGPT is used only to support standard care by providing text‐based bowel-preparation advice. Clinicians retain full responsibility for all decisions and will decide when in-person counseling is needed. If a patient raises questions that ChatGPT cannot answer, or if the information provided is not adequate for safe decision-making, the clinician in charge will step in and deliver conventional, in-person counseling.

AI System Infrastructure and Privacy.

The chatbot is hosted on a secure cloud platform with strong data protection. Access is restricted to trial participants and authorized study personnel through authentication controls. All data transmissions are encrypted end-to-end. The ChatGPT system logs interaction data (timestamps, user inputs, and chatbot responses) for analysis, but no identifiable patient information is ever stored. Input restrictions (English/Malay text only, structured formats) ensure compatibility with the model. Responses exceeding the token limit are automatically truncated or rephrased by the system.

Clinical sites will have internet-enabled devices (computers, tablets, or smartphones) available in private rooms for supervised interactions if needed. Printed or digital instructions will orient participants on using the chatbot platform. Patients may also access the chatbot remotely via their own devices to review the information later. Technical support will be available to troubleshoot any access issues.

AI error taxonomy, thresholds and escalation.

All chatbot interactions will be logged and reviewed according to a pre-specified error taxonomy. Errors are classified as Critical, Major or Minor (definitions below). A weekly random sample of 10% of sessions will be reviewed by trained clinicians and review sampling will expand to 25% if thresholds are exceeded. Escalation rules are:

  1. Critical error: any instance of advice that could cause direct physical harm (e.g., wrong dosing interval for the laxatives leading to electrolyte imbalance; instruction to discontinue essential medication without context) will immediately suspend the affected module and trigger urgent review by the Safety Officer and Data Monitoring Committee (DMC).
  2. Major error: incorrect or misleading clinical advice that could materially reduce bowel preparation adequacy (e.g., wrong laxative timing exceeding 6 hours). If 1% or more of audited sessions in a rolling 2-week window contain Major errors, root cause analysis will be performed, chatbot prompts/content will be corrected, and audit sampling will increase to 25% until the error rate falls below the threshold.
  3. Minor errors: non-material issues such as ambiguous wording, style, or formatting. If more than 5% of audited sessions contain minor errors in two consecutive reviews, content will be refined and monitored more closely.
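The escalation rules above can be expressed as a simple audit check over one review window. This is an illustrative sketch only; the error labels and action names are hypothetical, and the consecutive-review condition for Minor errors is tracked across windows rather than within one.

```python
def audit_actions(sessions):
    """Apply the pre-specified escalation thresholds to one audit window.

    `sessions` is a list of per-session labels drawn from
    {'critical', 'major', 'minor', None}. Any Critical error suspends
    the module; a Major-error rate of >= 1% triggers root cause
    analysis and 25% sampling; a Minor-error rate of > 5% is flagged
    for refinement if it recurs in consecutive reviews.
    """
    n = len(sessions)
    actions = []
    if any(s == "critical" for s in sessions):
        actions.append("suspend_module_and_notify_DMC")
    if n and sessions.count("major") / n >= 0.01:
        actions.append("root_cause_analysis_and_25pct_sampling")
    if n and sessions.count("minor") / n > 0.05:
        actions.append("flag_minor_error_rate")
    return actions
```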

This structured governance approach aligns with international AI safety recommendations, including WHO and SPIRIT-AI/CONSORT-AI guidance [29,30], healthcare AI safety principles [31–33], the National Institute of Standards and Technology (NIST) AI Risk Management Framework [34], and explainability principles for multidisciplinary evaluation of AI in health [35].

Algorithm Details and Updates.

The AI chatbot uses OpenAI’s ChatGPT-4.0 model accessed using the Application Programming Interface (API). The deployed model version and system parameters (temperature, token limit) will be documented. If OpenAI releases updates during the trial period, we will record any change in model version and assess for consistency. Since ChatGPT is a proprietary model, its internal weights are not modifiable; we rely on configuring prompts and contexts. Our focus is on standardizing inputs and validating outputs to achieve reproducibility.

Human–ChatGPT Interaction and Expertise.

The intervention is designed for direct patient use without real-time expert oversight. Participants need only basic digital literacy. Healthcare staff will introduce the chatbot and assist with any technical difficulties or initial login. They will not censor or alter the chatbot’s responses, but will be available to clarify instructions if the participant requests additional explanation. No specialized training in AI is required for staff or participants.

Error Monitoring and Performance Evaluation.

Given the novel AI-ChatGPT intervention, we will systematically monitor chatbot performance. The study will classify any AI-generated errors (e.g., incorrect guidance, omissions, ambiguous responses) by frequency and type. Automated system logs will detect anomalies or repeated errors in output patterns. Additionally, at least 10% of chatbot sessions (randomly selected) will be reviewed by clinicians to assess the accuracy and appropriateness of the guidance. Participants can also flag any confusing or incorrect responses, which will be recorded. We will calculate error rates (e.g., percentage of responses flagged) and perform qualitative case reviews to identify common issues.

To mitigate safety risks, the chatbot’s knowledge base is restricted to evidence-based bowel prep content. Potential hallucinations (fabricated or unsafe advice) are minimized by configuring the prompt context to clinical guidelines and by implementing real-time checks: if the chatbot’s output falls outside established instructions or is contradictory, it will be logged and corrected offline. In practice, if a response appears clinically inappropriate (e.g., suggesting contraindicated actions), the participant will be directed to human counseling immediately. An escalation protocol ensures that any ambiguous or safety-related queries are referred to staff. These measures are intended to catch and correct AI errors before they impact patient care.

Bias, Explainability, and Reproducibility.

We will actively address common AI concerns. The chatbot’s training and prompts are based on standardized clinical protocols to reduce hallucinations or omissions. In addition, during the prompt engineering phase, we will use the shortest possible prompts, favoring formal language and concrete words denoting tangible objects, to reduce ChatGPT’s hallucination rate [36]. The ChatGPT system is also tested to operate within defined knowledge boundaries (preventing speculative answers). We will conduct periodic audits of AI advice across demographic subgroups to check for any bias (e.g., differences in style or content by patient age or language). If any bias or disparity is found (for example, if the AI inadvertently gives different instructions for Malay versus English outputs), we will adjust the model prompts accordingly.

The interaction framework is standardized: inputs are structured (same script template) and outputs follow fixed templates (sections on diet, instructions, etc.). This consistency enhances reliability and reproducibility. We will document the exact prompt templates, session settings, and output formats to enable replication. In addition, patient-facing language is kept clear and conversational to aid explainability. Participants can request simpler explanations from the chatbot if needed.

Data Protection and Privacy for ChatGPT-based intervention development.

All AI interactions are designed with privacy safeguards. No PII is entered into the chatbot system. Data transmissions are encrypted, and the platform enforces access controls. Participant responses are de-identified before storage; we will use de-identification techniques to ensure that logged transcripts cannot be traced back to individuals. All data analysis will use anonymized IDs.

Reproducibility and Access to Intervention Details.

ChatGPT 4.0 is a proprietary model not available for modification or distribution. Therefore, to support reproducibility, we will fully describe the intervention’s configuration: model version, system parameters, prompt designs, and supervision procedures. Study findings, including the scripts used and performance results, will be published. While the chatbot code cannot be shared, researchers with appropriate licenses could replicate the approach by following our documented methods. Anonymized interaction data (without PII) may be made available to other researchers upon reasonable request and institutional approval.

Assessments of Intervention Adherence.

Adherence will be assessed using a triangulated approach: (a) participant self-report with timestamps (intervention: chatbot confirmation logs; control: electronic diary/phone confirmation); (b) nurse-verified time of last oral intake on admission; and (c) cross-checking where discrepancies exist.

Adherence categories are pre-specified, consistent with ESGE guidelines and published adequacy thresholds [23,37]:

  • Full adherence: all doses consumed and final dose initiated between 4 and 6 hours before procedure start and completed 2 hours or more before procedure.
  • Partial adherence: Single missed dose or final dose outside the 2-to-6-hour window but within 12 hours.
  • Non-adherent: More than one missed dose or final dose initiated more than 12 hours before procedure.
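The three categories above can be expressed as a simple classifier. The sketch below follows the stated definitions under illustrative input conventions (hours measured from initiation and completion of the final dose to procedure start); the function name and signature are hypothetical.

```python
def classify_adherence(missed_doses, hours_before_start, hours_completed_before):
    """Classify bowel-prep adherence per the pre-specified categories.

    missed_doses: number of prescribed doses not consumed.
    hours_before_start: hours from initiating the final dose to procedure start.
    hours_completed_before: hours from completing the final dose to procedure.
    """
    # Full adherence: no missed doses, final dose initiated 4-6 h before
    # the procedure and completed at least 2 h before it.
    if (missed_doses == 0
            and 4 <= hours_before_start <= 6
            and hours_completed_before >= 2):
        return "full"
    # Non-adherent: more than one missed dose, or final dose initiated
    # more than 12 h before the procedure.
    if missed_doses > 1 or hours_before_start > 12:
        return "non-adherent"
    # Everything else (e.g., one missed dose, or final dose outside the
    # window but within 12 h) counts as partial adherence.
    return "partial"
```

Only participants classified as “full” enter the per-protocol population.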

The per-protocol population comprises participants meeting full adherence. Adherence rates will be reported by arm and incorporated in sensitivity analyses, following recommendations for transparent adherence reporting [38].
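As a minimal illustration, the pre-specified categories above reduce to a deterministic rule. The sketch below is hypothetical (the function name and arguments are not part of the trial software) and assumes timing is expressed in hours before procedure start:

```python
def classify_adherence(missed_doses, init_hours_before, completed_hours_before):
    """Classify bowel-preparation adherence per the pre-specified categories.

    missed_doses: number of preparation doses not consumed
    init_hours_before: hours before procedure start that the final dose was initiated
    completed_hours_before: hours before procedure that the final dose was completed
    """
    # Non-adherent: more than one missed dose, or final dose started >12 h out
    if missed_doses > 1 or init_hours_before > 12:
        return "non-adherent"
    # Full adherence: no missed doses, final dose initiated 4-6 h before the
    # procedure and completed at least 2 h before it
    if missed_doses == 0 and 4 <= init_hours_before <= 6 and completed_hours_before >= 2:
        return "full"
    # Otherwise (single missed dose or off-window timing within 12 h): partial
    return "partial"
```

Only participants classified as "full" enter the per-protocol population.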

Trial endpoint assessments.

The primary endpoint of this study is the quality of bowel preparation, assessed using the BBPS during colonoscopy [39,40]. This scoring system evaluates three colonic segments (right, transverse, and left), each scored from 0 to 3, yielding a total score ranging from 0 (poor preparation) to 9 (excellent preparation). Colonoscopy and BBPS scoring will be performed by two consultant-level endoscopists, each with at least five years of independent colonoscopy practice and an annual volume of at least 200 colonoscopies. Prior to trial commencement, all participating endoscopists undertook a structured calibration exercise involving independent scoring of recorded colonoscopy videos using the BBPS, followed by consensus discussion until concordance was reached. During the trial, inter-observer reliability will be evaluated by secondary review of 10% of procedures using recorded images, with agreement quantified using intra-class correlation coefficients (ICC). Periodic calibration meetings will be held to maintain consistency [39,40]. Each patient’s total BBPS score will serve as the primary outcome measure for comparing bowel preparation quality across study arms.

Several secondary endpoints will also be assessed. Patient anxiety will be measured using the anxiety subscale of the Depression Anxiety and Stress Scale-21 (DASS-21), administered on the day of the colonoscopy to evaluate state anxiety following the intervention. Patient satisfaction will be assessed using the Patient Satisfaction Questionnaire-18 (PSQ-18), completed immediately after the procedure to capture perceptions of the preparation process and the clarity of information provided. The DASS-21 and PSQ-18 were selected because both are validated and widely used instruments with strong psychometric properties in our setting. We acknowledge, however, that the DASS-21 uses a one-week recall frame and may not fully capture very acute anxiety fluctuations on the day of colonoscopy, and that PSQ-18 scores could be influenced by residual sedation. We will note these limitations when interpreting the trial results, though the use of identical instruments in both groups should reduce the risk of differential bias.

All questionnaires have been translated into Malay and show good reliability and validity: internal consistency for the DASS-21 anxiety subscale (Cronbach’s α: 0.81 (English) to 0.84 (Malay)) [41,42] and for the PSQ-18 satisfaction scale (Cronbach’s α: 0.745 (Malay) to 0.750 (English)) [43,44]. All secondary outcomes are scored on Likert scales for standardization. Anxiety and satisfaction are assessed at the time of the procedure, when patient concerns are most salient, providing a realistic gauge of the intervention’s impact.

To minimize performance and detection bias, patient-reported outcomes (DASS-21, PSQ-18) will be administered by trained research assistants using a verbatim scripted protocol, without disclosing intervention allocation. Patients will complete questionnaires electronically, ensuring self-entry and eliminating interviewer influence. Research assistants will receive structured training and ongoing monitoring to ensure protocol fidelity. All questionnaires will be administered at prespecified time points in a quiet, standardized environment.

Finally, the proportion of patients achieving adequate bowel preparation, defined as a BBPS score of at least 2 in all three segments, will be calculated to determine overall effectiveness. Additional exploratory outcomes will include standard colonoscopy quality metrics such as the adenoma detection rate (ADR) and cecal intubation rate, providing context for interpreting the impact of preparation quality on procedural performance.
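For clarity, the BBPS total and the adequacy rule used for this endpoint can be sketched as two small helper functions (illustrative only, not trial software):

```python
def bbps_total(right, transverse, left):
    # Each colonic segment is scored 0-3, so the total ranges from 0 to 9
    for score in (right, transverse, left):
        if not 0 <= score <= 3:
            raise ValueError("segment scores must be between 0 and 3")
    return right + transverse + left

def adequate_preparation(right, transverse, left):
    # Adequacy as defined in the protocol: every segment scores at least 2
    # (which necessarily implies a total of at least 6)
    return min(right, transverse, left) >= 2
```

For example, segment scores of (3, 3, 1) give a respectable total of 7 yet remain inadequate, because one segment falls below 2.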

Adverse event assessments and monitoring

Adverse events (AE) are defined as “an abnormal sign, symptom, laboratory test, syndromic combination of such abnormalities, untoward or unplanned occurrence (e.g., accident), or any unexpected deterioration of concurrent illness” [45]. Serious AEs (SAE) are defined as “adverse events that result in the following outcomes: 1) death; 2) life-threatening AEs; 3) inpatient hospitalization or prolongation of existing hospitalization; 4) a persistent or significant incapacity or substantial disruption” [46].

Earlier research indicates that the anticipated adverse events are essentially those associated with standard diagnostic and therapeutic colonoscopy, such as abdominal bloating and flatulence, together with rarer but more serious events, such as colonic perforation, infection, and post-polypectomy syndrome. Moreover, the ChatGPT model used in this study is a proprietary large language model that involves no direct contact with the human body; it is therefore expected to pose minimal harm to study participants.

The participants will be monitored for 24 hours following colonoscopy for AEs and SAEs through clinical observation in the daycare ward and through telephone calls once participants are discharged. All AEs and SAEs will be recorded in the case report form (CRF), including each event’s characteristics, date and time of onset and resolution, and severity. AEs related to colonoscopy will be classified using the American Society for Gastrointestinal Endoscopy (ASGE) endoscopic adverse-event lexicon [47] and reported according to ASGE guidance for colonoscopy complications [48]. Severity of AEs will be classified according to the Common Terminology Criteria for Adverse Events (CTCAE) Version 5 [49], restricted to items relevant to colonoscopy and the peri-procedural period, including gastrointestinal bleeding, perforation, abdominal pain, nausea or vomiting, hypotension, and hypoxia. This framework combines field-standard endoscopy terminology with a widely used graded severity scale to improve consistency, transparency, and comparability across studies. Participants experiencing SAEs will be treated according to standard clinical management practice. All AEs of grade 3 and above will be reported to the UKM Ethics Committee within five business days, whereas SAEs (including Suspected Unexpected Serious Adverse Reactions (SUSARs)) will be reported within 24 hours (expedited reporting). All study participants will be provided sufficient ancillary care according to the standard clinical management protocol for the treatment of each adverse event.

Sample size calculation

The trial is powered only for the primary endpoint, the BBPS. The sample size is based on detecting a difference in BBPS scores and was calculated using PS software version 3.1.2 (Dupont and Plummer, 2014, available from: https://github.com/vubiostat/ps/raw/refs/heads/master/bin/pssetup3.exe). We utilized information from a similar trial by Zhu and colleagues [33]. The standard deviation of the BBPS in the standard instruction group was taken to be 1.81, based on Zhu and co-workers [50]. We also assumed a minimum clinically important difference in BBPS between the ChatGPT-guided and standard instruction groups of 1.1 BBPS points for the non-inferiority margin. This was determined based on validation studies of the BBPS demonstrating that differences of one point or less do not materially impact the adenoma detection rate (ADR) or adequacy classification (total BBPS score of 6 or more with all segments scoring 2 or more) [39,40]. This threshold also aligns with ESGE recommendations that bowel preparation adequacy should be maintained in at least 85% of cases [23]. The selected margin therefore represents a conservative yet clinically meaningful limit.

Using a two-sample t-test (two-sided type I error rate = 0.05, power (1−β) = 80%) and a 1:1 allocation ratio, 43 patients per group (ntotal = 86) are required. Allowing for a 10% attrition rate due to failure to obtain consent from eligible participants, participant withdrawal prior to randomization, and other causes, the sample size is inflated to 48 participants per group (ntotal = 96). Hence, we plan to screen 96 potential participants for eligibility before randomization.
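The calculation above can be reproduced with a normal-approximation sketch (the z-based formula gives essentially the same answer as the t-based PS software in this setting; the function name is illustrative):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sd, margin, alpha_one_sided=0.025, power=0.80):
    # Normal-approximation sample size for comparing two group means,
    # with the non-inferiority margin as the detectable difference.
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha_one_sided)  # ~1.96 (two-sided 0.05)
    z_beta = z.inv_cdf(power)                 # ~0.84 (80% power)
    return ceil(2 * ((z_alpha + z_beta) * sd / margin) ** 2)

n = n_per_group(sd=1.81, margin=1.1)  # 43 per group, as in the protocol
n_inflated = ceil(n / 0.9)            # 10% attrition allowance -> 48 per group
```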

No formal sample size calculation was performed for anxiety or satisfaction scores due to limited prior data. The planned sample size is thus not powered to detect differences in secondary endpoints, specifically anxiety (DASS-21) and satisfaction (PSQ-18) scores. This approach is in line with the methodological standards for non-inferiority trials, in which the sample size is driven by a single clinically decisive primary endpoint, while secondary outcomes are used to provide supportive and contextual evidence [51–53].

Participant recruitment strategy and recruitment monitoring

We will recruit eligible participants from patients scheduled for elective colonoscopy at the Endoscopy Unit of HCTM, identified from the weekly endoscopy list. Each patient’s eligibility will be verified by medically qualified personnel, who must document this in the patient’s medical records upon taking consent for colonoscopy. As an additional strategy to ensure adequate recruitment, medical officers at the HCTM surgical outpatient clinics will be asked to notify the research team if potentially eligible participants are identified during outpatient appointments.

Based on our institutional audit data (around 250 colonoscopies per month, with approximately 40% meeting eligibility and a 50% consent rate), we expect to recruit 96 participants within 3 months (maximum 5 months if needed) at a single center. Recruitment progress will be monitored weekly, and contingency strategies are in place should accrual fall below expectations, based on the following traffic-light criteria:

Green: 24 or more participants recruited and randomized per month, with an attrition rate below 10%.

Yellow: 15–23 participants recruited and randomized per month, with an attrition rate below 10%; strategies can be identified to raise recruitment to 24 or more participants per month (green level).

Red: fewer than 15 participants recruited and randomized per month, with an attrition rate below 10%; no strategies can be identified that would sustainably raise recruitment to the green level.

The trial will be stopped if the recruitment rate falls into the red category. If the recruitment rate is in the yellow category, the trial will be monitored cautiously for two months and stopped prematurely if recruitment remains in the yellow category for two consecutive months. The progress of participant recruitment will be monitored by the Trial Monitoring Group (TMG), which consists of the principal investigator and trial statistician. The TMG’s recommendation on trial continuation will be relayed to and discussed with the funder monthly. The recruitment flow of study participants is summarized in Fig 1.
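The traffic-light thresholds above reduce to a simple monthly rule (an illustrative sketch; the function name is hypothetical and the attrition condition is monitored separately):

```python
def recruitment_status(monthly_randomized):
    # Traffic-light thresholds from the recruitment monitoring plan:
    # green >= 24/month, yellow 15-23/month, red < 15/month
    if monthly_randomized >= 24:
        return "green"
    if monthly_randomized >= 15:
        return "yellow"
    return "red"
```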

Data collection and management

Participants’ demographic data (age, gender, BMI, occupation, digital literacy), medical history (comorbidities, medication use, prior GI history, smoking/alcohol), and preparation adherence will be collected by the study investigators. For intervention patients, chatbot interaction logs (questions asked and responses given) will be collected anonymously. Individual participant data (including outcome data) will be entered into a paper-based case report form (CRF; available upon request to the principal trial investigator (NMA)) and subsequently transferred into a secure SPSS data file with coded personal identifiers. Only authorized study staff will have access to the data. All data manipulation will be performed using SPSS.

BBPS will be recorded by the endoscopist immediately after colonoscopy using the standard BBPS form. Anxiety and satisfaction questionnaires will be administered by the trained study investigators on the procedure day before sedation. The study investigators will be trained by an independent psychometrician via supervised interviews of potential respondents.

To preserve data quality, range and consistency checks will be performed regularly by the principal investigator and trial statistician. The research personnel collecting the data will also be instructed to evaluate their own performance by checking their individual rates of missing data and implausible responses. Rectification of errors in data recording is allowed only before the information in the CRFs is transferred into the SPSS data file. Double data entry will be carried out by two research personnel to prevent data entry errors.

Missing data for primary and secondary outcomes will be minimized through protocol diligence: all endoscopists will be reminded consistently to record the BBPS score and all colonoscopy quality metrics immediately after each colonoscopy. In addition, trained questionnaire administrators will remind participants to complete the entire questionnaire, and all completed questionnaire forms will be checked for completeness before the colonoscopy is carried out.

Patient confidentiality and data security

The personal identifying information for each participant will be removed from the SPSS spreadsheet. All CRFs will be stored in a designated locked cabinet accessible to the principal investigator and trial statistician only. All electronic datasets will be password protected and encrypted, and the encryption key will be kept by the trial statistician. All CRFs and SPSS datasets will be stored for at least 10 years after the end of the trial and routine data inspections will be made to ensure data readability.

Statistical analysis plan

Analysis will primarily follow the intention-to-treat (ITT) principle, where all randomized trial participants will be included in the analysis according to their original allocated intervention group. All anonymized and deidentified individual participant data and statistical codes used for the data cleaning, transformation and analysis will be shared via the Harvard Dataverse Repository (https://dataverse.harvard.edu/). Statistical analysis will be conducted using SPSS Version 29 (IBM Corp, Armonk, New York, USA; 2023) or R Version 4.5.1 (R Core Team, Vienna, Austria; 2025) software.

Baseline characteristics will be summarized by group as means (standard deviation) or medians (interquartile range) for continuous variables and as counts and percentages for categorical variables (e.g., adequate bowel preparation rate). The normality of continuous variables will be assessed using objective measures (Shapiro-Wilk test, Fisher’s coefficient of skewness [54]) and graphical measures (Q-Q plots, box-and-whisker plots).

Multiple imputation will be carried out if more than 5% of data are missing for a variable, under the missing at random (MAR) assumption, using the multiple imputation by chained equations (MICE) method implemented in the R “mice” package [55]. Following recommended practice, five imputed datasets will be generated, which is considered appropriate for this trial size and avoids spurious variance inflation [56–58]. Results will then be pooled using Rubin’s rules, which account for both within-imputation and between-imputation variability to obtain valid statistical inference and reduce Monte Carlo error [57,58].
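As an illustration of Rubin’s rules, pooling m point estimates and their squared standard errors combines within- and between-imputation variability. The sketch below uses made-up numbers; in practice the R mice package performs this pooling:

```python
from math import sqrt
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool estimates and squared standard errors from m imputed datasets
    using Rubin's rules (illustrative sketch)."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation (sample) variance
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, sqrt(t)          # pooled estimate and pooled SE

# Hypothetical mean BBPS differences and their squared SEs from m = 5 imputations
est, se = pool_rubin([0.9, 1.1, 1.0, 1.2, 0.8], [0.04, 0.05, 0.04, 0.06, 0.05])
```

Because the total variance adds a between-imputation term, the pooled standard error is wider than the naive average of the per-dataset standard errors, reflecting the extra uncertainty due to missingness.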

The primary analysis will be performed using analysis of covariance (ANCOVA), adjusting for age, gender (male as the risk factor), prior colonoscopy experience, colonoscopy indication, and baseline constipation score for the primary trial outcome, the BBPS [4,59]. This limited set of covariates was chosen to improve precision without overfitting, in line with evidence that excessive adjustment in small samples undermines model stability [60,61]. The adjusted mean difference and its 95% confidence interval (CI) will be compared with the non-inferiority margin. The non-inferiority margin (Δ) for the BBPS score difference will be set at −1.1 points (i.e., the AI-ChatGPT arm is considered non-inferior if the lower bound of the two-sided 95% CI for the mean BBPS difference (AI-ChatGPT minus control) is greater than −1.1 BBPS points), based on the minimum clinically important difference used for the sample size calculation; this margin is supported by validation studies showing that differences of this magnitude do not compromise the adenoma detection rate or adequacy classification (BBPS ≥6 with all segments ≥2) [39,40] and is consistent with ESGE quality benchmarks requiring at least 85% adequacy in bowel preparation [23]. The primary non-inferiority analysis will use a one-sided t-test at an alpha level of 0.025, testing the null hypothesis that the mean difference in BBPS between the intervention and control groups is less than or equal to −1.1 against the alternative that it is greater than −1.1. We will then evaluate the statistical assumptions of ANCOVA (residual normality and homoscedasticity, linearity of covariates with the outcome, homogeneity of regression slopes) and apply transformations if necessary.
Similarly, rates of adequate preparation (BBPS ≥6 overall with ≥2 in every segment) will be compared using a chi-square test, or Fisher’s exact test if more than 20% of contingency-table cells have expected counts below 5.
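The non-inferiority decision rule for the primary endpoint can be sketched as follows. This is illustrative only: the confirmatory analysis uses the ANCOVA-adjusted difference and a t-based interval, whereas this sketch substitutes a normal critical value:

```python
from statistics import NormalDist

def non_inferior(mean_diff, se, margin=-1.1, alpha_one_sided=0.025):
    """Declare non-inferiority if the lower bound of the two-sided 95% CI
    for the BBPS difference (AI-ChatGPT minus control) exceeds the margin."""
    z = NormalDist().inv_cdf(1 - alpha_one_sided)  # ~1.96
    lower = mean_diff - z * se
    upper = mean_diff + z * se
    return lower > margin, (lower, upper)

# Hypothetical adjusted difference of -0.2 with SE 0.35:
# lower bound ~ -0.89, which exceeds -1.1, so non-inferiority would be declared
ok, (lo, hi) = non_inferior(mean_diff=-0.2, se=0.35)
```

Note that the same observed difference with a larger standard error could push the lower bound below −1.1 and fail to establish non-inferiority, which is why the trial is powered around this margin.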

For the secondary outcomes, change in anxiety (DASS-21) and satisfaction (PSQ-18) scores, ANCOVA will be employed, adjusting for the corresponding baseline DASS-21 and PSQ-18 scores. For all secondary trial endpoints, the findings will be considered supportive and exploratory only: results will be summarized with two-sided 95% confidence intervals, and any p-values will be treated as exploratory with no multiplicity adjustment, in line with guidance on multiplicity and endpoint hierarchy in clinical trials [52,53,62]. For colonoscopy quality metrics (ADR and caecal intubation rate), comparisons will be made at the univariable level only, using the chi-squared or Fisher’s exact test. As a measure of quality control, inter-observer reliability for BBPS scoring will also be reported using intra-class correlation coefficients (ICC) derived from the 10% of procedures undergoing dual review [39]. These results will be presented descriptively as secondary trial outcomes.

For sensitivity analysis, results from the ITT analyses will be compared with findings from the per-protocol (PP) analyses. If any trial variable has more than 5% missingness, results based on the imputed datasets will be compared with findings from complete-case analyses. Worst-case imputation for the missing primary endpoint (assigning the best BBPS scores to the control group and the worst to the AI-ChatGPT group) will also be employed to assess the robustness of the results. In addition, an exploratory subgroup analysis will compare outcomes between first-time and repeat patients, with a treatment-by-prior-colonoscopy interaction tested descriptively to assess the consistency of intervention effects across these groups [63,64]. The conclusion will be based on concordant results (i.e., non-inferiority observed in all sensitivity analyses), as recommended by the International Council for Harmonization (ICH) E9(R1) statistical principles [65].

Non-inferiority results will be reported as point estimates with corresponding 95% CIs and presented visually in forest plots, with the non-inferiority margins (e.g., −1.1 BBPS points and −10% risk difference) clearly indicated as vertical reference lines. A one-sided alpha of 0.025 will be used as the statistical significance threshold for the primary non-inferiority test.

Ethical considerations and safety

The trial will be conducted in accordance with the Declaration of Helsinki and International Council for Harmonization (ICH)’s Principles of Good Clinical Practice (GCP). Participation is voluntary and all patients can withdraw at any time without affecting their care. The chatbot provides non-judgmental guidance, and no medications or invasive procedures are involved in the intervention itself. Should a participant exhibit significant distress or report an urgent medical concern during the chatbot session, staff will intervene according to clinical judgment. Any adverse events (e.g., severe anxiety spikes) related to the study procedures will be recorded and reported to the ethics committee. Standard clinical practice will continue for all patients (for example, rescheduling a colonoscopy if bowel preparation is insufficient).

Protocol amendments

Any modifications to this protocol (e.g., changes in chatbot implementation) will be submitted to the ethics committee for approval. The trial registry will be updated with protocol versions. Study progress (enrollment dates, recruitment numbers, withdrawals, and any serious events) will be communicated to the funder and oversight bodies per institutional requirements.

Dissemination of trial findings

Results will be published in peer-reviewed journals and presented at scientific meetings. The full protocol and statistical analysis plan will be made publicly available (e.g., ClinicalTrials.gov or institutional repository).

Trial oversight and monitoring

The trial steering committee comprises the trial principal investigators (NMA and MIAJ). It is responsible for the design, execution, and overall monitoring of trial progress, and for implementing any trial modifications, including those proposed by the Data Monitoring Committee (DMC).

The TMG is responsible for ensuring smooth conduct of day-to-day trial operations. It comprises NMA (principal investigator) and the other co-investigators (HA, IN, YJ, NG, JA and YSQ).

The DMC is appointed to monitor trial safety data and comprises one independent statistician and a colorectal surgeon. All DMC members have declared no conflicts of interest.

Trial status

At the time of submission, this clinical trial is still ongoing. Participant recruitment began on 01/04/2025 and was originally expected to be completed by 31/07/2025, with data collection finalized by 15/08/2025 after a final check of data integrity and study results available by 30/09/2025. No interim analyses have been conducted so far, and no preliminary results have been generated or disseminated through scientific publications or any form of public media. Participant recruitment was completed on 02/09/2025, and data collection is now projected to be finalized by 17/10/2025. The study results are therefore now expected to be available in full by 01/12/2025 (a delay of two months from the original timeline).

Discussion

Effective bowel preparation is key to a high-quality colonoscopy. By leveraging AI, we aim to enhance patient education in a scalable way. The primary outcome, the BBPS, is a validated measure of bowel cleanliness [39,40], with higher scores reflecting clearer mucosa. Even modest improvements in BBPS can increase adenoma detection and reduce repeat procedures. Secondary outcomes (anxiety and satisfaction) address patient-centered effects: reduced anxiety and higher satisfaction may improve overall preparation adherence and experience. To avoid alpha inflation, the protocol designates the BBPS as the sole primary endpoint for confirmatory non-inferiority testing, whereas anxiety and satisfaction are secondary and interpreted descriptively [52,53,62]. Our findings will thus complement a recent AI-to-image concordance study in which ChatGPT-4 graded colonoscopy frames for BBPS and produced fewer adequacy calls, with only slight agreement relative to expert raters [18]. In contrast, our trial evaluates patient-facing counseling as the intervention, with bowel cleanliness adjudicated by masked endoscopists rather than by the chatbot.

This trial incorporates rigorous methods to ensure reliability and safety. Masking of endoscopists and validated scales (BBPS, DASS-21, PSQ-18) promote unbiased outcome assessment. The sample size is powered for the expected effect on BBPS, and analyses will adjust for potential confounders.

In addition, the trial design and reporting align with internationally recognized clinical trial standards, thereby supporting methodological rigor and reproducibility for AI-based interventions. We have explicitly described the AI model version, input requirements, handling of invalid input, human-AI interaction, and performance monitoring. We will analyze any AI performance errors and implement an escalation pathway if needed. Strategies to prevent hallucinations and bias include restricting content to evidence-based guidelines and auditing outputs. Privacy protections (no PII, encryption, access controls) safeguard patient data.

Study limitations

The trial methodology has several limitations. First, apart from the endoscopists who are masked, both participants and the staff facilitating counseling or questionnaire administration are necessarily unmasked. This design constraint arises because the nature of the intervention (AI chatbot vs. clinician counseling) is inherently perceptible to patients, and staff members must directly support the delivery of each counseling modality. However, the risk of bias from unblinded staff is minimized by strict adherence to the trial protocol: counseling in the control arm follows standard hospital protocols, and questionnaire administration in both arms is performed by trained research assistants using verbatim scripted instructions, with responses entered electronically by patients themselves. Most importantly, the primary outcome (BBPS) is assessed objectively by masked endoscopists, ensuring that the central efficacy measure remains unbiased.

Second, requiring basic digital literacy and fluency in English or Malay may limit generalizability of our study findings. This first-phase trial intentionally restricts inclusion to participants able to engage with the intervention so that safety and fidelity of digital health-based intervention administration can be established without technical confounders [66–68]. Future work will adapt the system for broader populations (multilingual versions, voice interfaces, assisted navigation) and test effectiveness in individuals with limited digital or health literacy.

Third, we excluded participants with diagnosed clinical anxiety disorders to minimize confounding from psychiatric comorbidities and concurrent pharmacotherapy, since both could mask or exaggerate the intervention’s effect on situational procedural anxiety. Given the modest sample size, inclusion of such patients also risked an imbalance between study arms that could distort outcomes. This approach is consistent with methodological practice in peri-procedural anxiety research, including colonoscopy studies, but it limits applicability to patients with baseline psychiatric diagnoses. Future studies should address this subgroup specifically, as they may derive distinct benefits from tailored interventions.

Fourth, we employed ChatGPT 4.0 as the chatbot for delivering bowel preparation instructions for colonoscopy. With the release of ChatGPT 5.0 in mid-August 2025, our trial findings may appear outdated, as they do not reflect the performance of the most recent version. However, migrating to ChatGPT 5.0 mid-trial is not feasible, since it would require major protocol amendments: patients who had already received instructions from version 4.0 would need to be excluded, and a completely new cohort of participants would have to be recruited to ensure consistency with the updated version. We recognize this limitation and recommend that future studies evaluate the effectiveness of ChatGPT 5.0 in delivering bowel preparation instructions against standard counseling provided by trained medical personnel.

Finally, we acknowledge that both first-time and repeat patients will be enrolled under simple randomization without stratification, which may introduce some imbalance related to participants’ prior colonoscopy experience. We will address this analytically by including prior colonoscopy experience as a covariate and reporting subgroup analyses. This approach mitigates confounding risk, though a small degree of residual imbalance cannot be entirely excluded [63,64].

Conclusion

If successful, this study will demonstrate that an AI chatbot can be a viable adjunct to traditional counseling, improving colonoscopy preparation quality. It will also provide a model for the transparent reporting of AI interventions in clinical trials, per SPIRIT-AI. Further research will be needed to generalize findings to other settings, but this trial aims to set a new standard for integrating AI safely into patient education.

Supporting information

S1 Table. The filled-in SPIRIT 2025 Checklist.

https://doi.org/10.1371/journal.pone.0334349.s001

(DOC)

S1 File. Full trial protocol (AI Chatbot).

https://doi.org/10.1371/journal.pone.0334349.s002

(PDF)

Acknowledgments

We thank the HCTM’s staff and patients for their support and encouragement.

References

  1. 1. Lebwohl B, Kastrinos F, Glick M, Rosenbaum AJ, Wang T, Neugut AI. The impact of suboptimal bowel preparation on adenoma miss rates and the factors associated with early repeat colonoscopy. Gastrointest Endosc. 2011;73(6):1207–14. pmid:21481857
  2. 2. Clark BT, Rustagi T, Laine L. What level of bowel prep quality requires early repeat colonoscopy: systematic review and meta-analysis of the impact of preparation quality on adenoma detection rate. Am J Gastroenterol. 2014;109(11):1714–23; quiz 1724. pmid:25135006
  3. 3. Alvarez-Gonzalez MA, Flores-Le Roux JA, Seoane A, Pedro-Botet J, Carot L, Fernandez-Clotet A, et al. Efficacy of a multifactorial strategy for bowel preparation in diabetic patients undergoing colonoscopy: a randomized trial. Endoscopy. 2016;48(11):1003–9. pmid:27490086
  4. 4. Mahmood S, Farooqui SM, Madhoun MF. Predictors of inadequate bowel preparation for colonoscopy: a systematic review and meta-analysis. Eur J Gastroenterol Hepatol. 2018;30(8):819–26. pmid:29847488
  5. 5. Zhou J, Wu L, Wan X, Shen L, Liu J, Zhang J, et al. A novel artificial intelligence system for the assessment of bowel preparation (with video). Gastrointest Endosc. 2020;91(2):428-435.e2. pmid:31783029
  6. 6. Lim DYZ, Tan YB, Koh JTE, Tung JYM, Sng GGR, Tan DMY, et al. ChatGPT on guidelines: Providing contextual knowledge to GPT allows it to provide advice on appropriate colonoscopy intervals. J Gastroenterol Hepatol. 2024;39(1):81–106. pmid:37855067
  7. 7. Bibault J-E, Chaix B, Nectoux P, Pienkowsky A, Guillemasse A, Brouard B. Healthcare ex Machina: Are conversational agents ready for prime time in oncology?. Clin Transl Radiat Oncol. 2019;16:55–9. pmid:31008379
  8. Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, et al. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc. 2018;25(9):1248–58. pmid:30010941
  9. Luxton DD. Ethical implications of conversational agents in global public health. Bull World Health Organ. 2020;98(4):285–7. pmid:32284654
  10. Jiang X-H, Yuan X-H, Zhao H, Peng J-S. The Effectiveness and Feasibility of Conversational Agents in Supporting Care for Patients With Cancer: Systematic Review and Meta-Analysis. J Med Internet Res. 2025;27:e76968. pmid:40736462
  11. Davenport T, Kalakota R. The potential for artificial intelligence in healthcare. Future Healthc J. 2019;6(2):94–8. pmid:31363513
  12. Chang P, Sheng Y-H, Sang Y-Y, Wang D-W. Developing a wireless speech- and touch-based intelligent comprehensive triage support system. Comput Inform Nurs. 2008;26(1):31–8. pmid:18091619
  13. van Heerden A, Ntinga X, Vilakazi K. The potential of conversational agents to provide a rapid HIV counseling and testing services. In: 2017 International Conference on the Frontiers and Advances in Data Science (FADS), 2017.
  14. Bickmore TW, Pfeifer LM, Byron D, Forsythe S, Henault LE, Jack BW, et al. Usability of conversational agents by patients with inadequate health literacy: evidence from two clinical trials. J Health Commun. 2010;15 Suppl 2:197–210. pmid:20845204
  15. Haltaufderheide J, Ranisch R. The ethics of ChatGPT in medicine and healthcare: a systematic review on Large Language Models (LLMs). NPJ Digit Med. 2024;7(1):183. pmid:38977771
  16. Zhang Z, Bickmore T. Medical Shared Decision Making with a Virtual Agent. In: Proceedings of the 18th International Conference on Intelligent Virtual Agents, 2018. 113–8.
  17. Milne-Ives M, de Cock C, Lim E, Shehadeh MH, de Pennington N, Mole G, et al. The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review. J Med Internet Res. 2020;22(10):e20346. pmid:33090118
  18. Pellegrino R, Federico A, Gravina AG. Conversational LLM Chatbot ChatGPT-4 for Colonoscopy Boston Bowel Preparation Scoring: An Artificial Intelligence-to-Head Concordance Analysis. Diagnostics (Basel). 2024;14(22):2537. pmid:39594203
  19. Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment Health. 2017;4(2):e19. pmid:28588005
  20. Jiang Y, Fu X, Wang J, Liu Q, Wang X, Liu P, et al. Enhancing medical education with chatbots: a randomized controlled trial on standardized patients for colorectal cancer. BMC Med Educ. 2024;24(1):1511. pmid:39707245
  21. Hróbjartsson A, Boutron I, Hopewell S, Moher D, Schulz KF, Collins GS, et al. SPIRIT 2025 explanation and elaboration: updated guideline for protocols of randomised trials. BMJ. 2025;389:e081660. pmid:40294956
  22. Rivera SC, Liu X, Chan A-W, Denniston AK, Calvert MJ, SPIRIT-AI and CONSORT-AI Working Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI Extension. BMJ. 2020;370:m3210. pmid:32907797
  23. Hassan C, East J, Radaelli F, Spada C, Benamouzig R, Bisschops R, et al. Bowel preparation for colonoscopy: European Society of Gastrointestinal Endoscopy (ESGE) Guideline - Update 2019. Endoscopy. 2019;51(8):775–94. pmid:31295746
  24. Ministry of Health Malaysia. Clinical Practice Guidelines: Management of Colorectal Carcinoma (including screening and prevention). Putrajaya: MOH. 2017. https://www.moh.gov.my/moh/resources/penerbitan/CPG/CPG%20Management%20of%20Colorectal%20%20Carcinoma.pdf
  25. Kudo T, Kawakami H, Hayashi T, Yasuda I, Mukai T, Inoue H, et al. High and low negative pressure suction techniques in EUS-guided fine-needle tissue acquisition by using 25-gauge needles: a multicenter, prospective, randomized, controlled trial. Gastrointest Endosc. 2014;80(6):1030–7.e1. pmid:24890422
  26. Martel M, Barkun AN, Menard C, Restellini S, Kherad O, Vanasse A. Split-Dose Preparations Are Superior to Day-Before Bowel Cleansing Regimens: A Meta-analysis. Gastroenterology. 2015;149(1):79–88. pmid:25863216
  27. Kincaid JP, Fishburne RP, Rogers RL, Chissom BS. Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel. Millington (TN): Naval Air Station Memphis, Chief of Naval Technical Training. 1975;8–75.
  28. Gunning R. The technique of clear writing. New York: McGraw-Hill. 1952.
  29. World Health Organization. Ethics and governance of artificial intelligence for health: WHO guidance. Geneva: World Health Organization. 2021. https://iris.who.int/bitstream/handle/10665/341996/9789240029200-eng.pdf?sequence=1
  30. Ibrahim H, Liu X, Rivera SC, Moher D, Chan A-W, Sydes MR, et al. Reporting guidelines for clinical trials of artificial intelligence interventions: the SPIRIT-AI and CONSORT-AI guidelines. Trials. 2021;22(1):11. pmid:33407780
  31. Coiera EW, Verspoor K, Hansen DP. We need to chat about artificial intelligence. Med J Aust. 2023;219(3):98–100. pmid:37302124
  32. Topol E. Preparing the healthcare workforce to deliver the digital future: The Topol review. London (UK): Health Education England. 2019. https://topol.hee.nhs.uk/wp-content/uploads/HEE-Topol-Review-2019.pdf
  33. Joshi I, Morley J. Artificial Intelligence: How to get it right. London (UK): NHSX. 2019. https://www.nhsx.nhs.uk/ai-lab/explore-all-resources/understand-ai/artificial-intelligence-how-get-it-right
  34. Tabassi E. Artificial Intelligence Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology (U.S.). 2023.
  35. Amann J, Blasimme A, Vayena E, Frey D, Madai VI, Precise4Q consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. 2020;20(1):310. pmid:33256715
  36. Rawte V, Priya P, Tonmoy SMTI, Zaman SMMM, Sheth A, Das A. Exploring the Relationship between LLM Hallucinations and Prompt Linguistic Nuances: Readability, Formality, and Concreteness. arXiv preprint. 2023.
  37. Clark BT, Protiva P, Nagar A, Imaeda A, Ciarleglio MM, Deng Y, et al. Quantification of Adequate Bowel Preparation for Screening or Surveillance Colonoscopy in Men. Gastroenterology. 2016;150(2):396–405; quiz e14–5. pmid:26439436
  38. Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Int J Surg. 2014;12(12):1500–24. pmid:25046751
  39. Lai EJ, Calderwood AH, Doros G, Fix OK, Jacobson BC. The Boston bowel preparation scale: a valid and reliable instrument for colonoscopy-oriented research. Gastrointest Endosc. 2009;69(3 Pt 2):620–5. pmid:19136102
  40. Calderwood AH, Schroy PC 3rd, Lieberman DA, Logan JR, Zurfluh M, Jacobson BC. Boston Bowel Preparation Scale scores provide a standardized definition of adequate for describing bowel cleanliness. Gastrointest Endosc. 2014;80(2):269–76. pmid:24629422
  41. Lovibond PF, Lovibond SH. The structure of negative emotional states: comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behav Res Ther. 1995;33(3):335–43. pmid:7726811
  42. Musa R, Fadzil MA, Zain Z. Translation, validation and psychometric properties of Bahasa Malaysia version of the Depression Anxiety and Stress Scales (DASS). ASEAN Journal of Psychiatry. 2007;8(2):82–9.
  43. Marshall GN, Hays RD. The Patient Satisfaction Questionnaire Short Form (PSQ-18). Santa Monica, CA: RAND Corporation. 1994.
  44. Chan CMH, Azman WA. Attitudes and role orientations on doctor-patient fit and patient satisfaction in cancer care. Singapore Med J. 2012;53(1):52–6. pmid:22252184
  45. Aronson JK. Medication errors: what they are, how they happen, and how to avoid them. QJM. 2009;102(8):513–21. pmid:19458202
  46. Code of Federal Regulations, Title 21, Section 312.32 (21CFR312.32). 2020. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfcfr/cfrsearch.cfm?fr=312.32. Accessed 27 May 2025.
  47. Cotton PB, Eisen GM, Aabakken L, Baron TH, Hutter MM, Jacobson BC, et al. A lexicon for endoscopic adverse events: report of an ASGE workshop. Gastrointest Endosc. 2010;71(3):446–54. pmid:20189503
  48. ASGE Standards of Practice Committee, Fisher DA, Maple JT, Ben-Menachem T, Cash BD, Decker GA, et al. Complications of colonoscopy. Gastrointest Endosc. 2011;74(4):745–52. pmid:21951473
  49. U.S. Department of Health and Human Services, National Institutes of Health, National Cancer Institute. Common Terminology Criteria for Adverse Events (CTCAE) Version 5.0. Bethesda, MD: National Cancer Institute. 2017. https://ctep.cancer.gov/protocoldevelopment/electronic_applications/docs/CTCAE_v5_Quick_Reference_8.5x11.pdf
  50. Zhu Y, Zhang D-F, Wu H-L, Fu P-Y, Feng L, Zhuang K, et al. Improving bowel preparation for colonoscopy with a smartphone application driven by artificial intelligence. NPJ Digit Med. 2023;6(1):41. pmid:36918730
  51. Piaggio G, Elbourne DR, Pocock SJ, Evans SJW, Altman DG, CONSORT Group. Reporting of noninferiority and equivalence randomized trials: extension of the CONSORT 2010 statement. JAMA. 2012;308(24):2594–604. pmid:23268518
  52. International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use. Statistical principles for clinical trials. Geneva: ICH. 1998. https://database.ich.org/sites/default/files/E9_Guideline.pdf
  53. U.S. Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research. Multiple endpoints in clinical trials: guidance for industry. Silver Spring (MD): US FDA. 2017. https://www.fda.gov/files/drugs/published/Multiple-Endpoints-in-Clinical-Trials-Guidance-for-Industry.pdf
  54. Doane DP, Seward LE. Measuring skewness: a forgotten statistic? J Stat Educ. 2011;19(2):1–18.
  55. van Buuren S, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011;45(3):1–67.
  56. Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials - a practical guide with flowcharts. BMC Med Res Methodol. 2017;17(1):162. pmid:29207961
  57. Little RJ, Rubin DB. Statistical analysis with missing data. 3rd ed. Hoboken, NJ: John Wiley & Sons. 2019.
  58. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med. 2011;30(4):377–99. pmid:21225900
  59. Chan W-K, Saravanan A, Manikam J, Goh K-L, Mahadeva S. Appointment waiting times and education level influence the quality of bowel preparation in adult patients undergoing colonoscopy. BMC Gastroenterol. 2011;11:86. pmid:21798022
  60. Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996;49(12):1373–9. pmid:8970487
  61. Babyak MA. What you see may not be what you get: a brief, nontechnical introduction to overfitting in regression-type models. Psychosom Med. 2004;66(3):411–21. pmid:15184705
  62. Dmitrienko A, D’Agostino RB Sr. Multiplicity Considerations in Clinical Trials. N Engl J Med. 2018;378(22):2115–22. pmid:29847757
  63. Vickers AJ, Altman DG. Statistics notes: Analysing controlled trials with baseline and follow up measurements. BMJ. 2001;323(7321):1123–4. pmid:11701584
  64. Brookes ST, Whitely E, Egger M, Smith GD, Mulheran PA, Peters TJ. Subgroup analyses in randomized trials: risks of subgroup-specific analyses; power and sample size for the interaction test. J Clin Epidemiol. 2004;57(3):229–36. pmid:15066682
  65. International Council for Harmonisation. Addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials E9(R1). Geneva: ICH. 2019. https://database.ich.org/sites/default/files/E9-R1_Step4_Guideline_2019_1203.pdf
  66. Marcolino MS, Oliveira JAQ, D’Agostino M, Ribeiro AL, Alkmim MBM, Novillo-Ortiz D. The Impact of mHealth Interventions: Systematic Review of Systematic Reviews. JMIR Mhealth Uhealth. 2018;6(1):e23. pmid:29343463
  67. Mohr DC, Weingardt KR, Reddy M, Schueller SM. Three Problems With Current Digital Mental Health Research . . . and Three Things We Can Do About Them. Psychiatr Serv. 2017;68(5):427–9. pmid:28412890
  68. Whittaker R, Merry S, Dorey E, Maddison R. A development and evaluation process for mHealth interventions: examples from New Zealand. J Health Commun. 2012;17 Suppl 1:11–21. pmid:22548594