Abstract
Unmet health-related social needs (HRSNs) are major drivers of poor health outcomes in early childhood. Children with unmet HRSNs are at greater risk for developmental delays, caregiver stress, and increased healthcare utilization, yet current screening approaches in pediatric primary care are resource-intensive and inconsistently implemented. AI-powered chatbots (conversational agents or virtual assistants) may offer a private, secure, scalable, and cost-effective alternative for identifying unmet needs and connecting families to services. This protocol describes a pilot randomized controlled trial designed to evaluate the feasibility, acceptability, and usability of DAPHNE, an AI-driven chatbot developed to facilitate the identification of unmet HRSNs and provide personalized community resource referrals. One hundred caregivers of children under two years of age will be recruited from Nationwide Children’s Hospital pediatric primary care clinics and randomized to either the standard care (control) group or the DAPHNE + standard care (intervention) group (n = 50 per arm). Caregivers will complete surveys at baseline, 1 month, 3 months, and 6 months post-intervention (depending on the measure). Participants in the intervention group will receive weekly chatbot prompts and on-demand access throughout the 6-month study period. Primary outcomes include study feasibility (recruitment, retention, and survey completion across both arms), acceptability (caregiver-reported ratings in both arms and intervention-specific ratings), and usability of the DAPHNE chatbot (System Usability Scale among intervention participants).
Secondary outcomes include caregiver-reported outcome measures (caregiver stress, self-efficacy, satisfaction with resource access, quality of life) and electronic health record-derived measures (including documentation of HRSN screening and referrals, adherence to well-child visits, missed appointments, emergency department utilization, and estimated healthcare costs). In addition, ten primary care providers will participate to assess workflow integration and report on current HRSN practices. Mixed-methods analyses will integrate survey data, chatbot engagement metrics, and qualitative interviews to refine both the intervention and the study protocol. The results of this study will inform the design of a future multi-site trial to evaluate the efficacy and implementation of DAPHNE for addressing HRSNs in pediatric primary care. Trial registration: NCT07168382.
Citation: Sezgin E, Clarkson E, Logan F, Jackson DI, Hussain S-A, Stokes J, et al. (2026) An AI-based chatbot to support health-related social needs among pediatric primary care population: Protocol for a pilot randomized controlled trial. PLoS One 21(4): e0337868. https://doi.org/10.1371/journal.pone.0337868
Editor: Farshid Danesh, Islamic World Science & Technology Monitoring and Citation Institute (ISC), IRAN, ISLAMIC REPUBLIC OF
Received: November 20, 2025; Accepted: April 2, 2026; Published: April 20, 2026
Copyright: © 2026 Sezgin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: No datasets were generated or analyzed during the current study. All relevant data from this study will be made available upon study completion.
Funding: This publication was supported, in part, by The Ohio State University Clinical and Translational Science Institute (CTSI) and the National Center for Advancing Translational Sciences of the National Institutes of Health under Grant Number UM1TR004548. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Health-related social needs (HRSNs), including food insecurity, unstable housing, transportation challenges, and financial instability, account for more than 50% of modifiable health outcomes [1,2]. Children with unmet HRSNs have substantially elevated risks for cognitive, behavioral, and developmental difficulties compared with peers from resource-secure households [3–6]. They are also 18% more likely to have emergency department (ED) visits and 36% more likely to experience hospitalizations compared with children without social risks (IRR = 1.18, 95% CI [1.12–1.23]; 1.36, 95% CI [1.26–1.47]) [7]. This burden is especially pronounced among infants younger than 2 years, for whom early-life HRSNs, such as food insecurity or benefit instability, are linked to measurable behavioral and regulatory difficulties by six months of age (aOR = 1.64–1.86 across domains; aOR = 2.16 for two or more identified HRSNs) [8]. This early-life concentration represents a high-yield prevention target and reinforces primary care as the locus for proactive screening and linkage to services [7–9].
Pediatric primary care clinics (PCCs) are therefore critical access points for early identification and intervention, but current screening practices are constrained by staffing shortages, workflow inefficiencies, and high operational costs [10]. Caregivers are also often reluctant to disclose needs due to stigma, language barriers, or confusion about service navigation, which further exacerbates under-identification [11]. Referral completion is constrained by a combination of structural, provider-level, and caregiver-level barriers. Structural barriers include complex eligibility requirements, poor coordination, and limited availability of community resources. Provider-level barriers, such as limited familiarity with available services and uncertainty about referral processes, further hinder follow-through. Caregivers also face barriers related to time, transportation, and competing demands that make service access difficult [12–14].
Despite the availability of standardized screening instruments, fewer than 25% of hospitals and 16% of physician practices routinely screen for HRSNs [15]. When implemented, screening is often limited to episodic encounters such as annual well-child visits, missing acute and urgent social needs at other timepoints [16,17]. Caregivers of infants (ages 0–2) in this cohort are disproportionately affected by this lack of screening [3,8]. For instance, food insecurity in this age group is strongly associated with adverse developmental outcomes [18], higher hospitalization rates [19], and increased risk of chronic conditions including obesity and diabetes [20]. These barriers are particularly pronounced for Medicaid-insured children, who experience greater challenges in securing basic necessities such as infant formula, diapers, and transportation [21,22].
Rationale for AI-driven solutions
AI-based chatbots (also known as conversational agents and virtual assistants) offer an opportunity to overcome some of these structural and provider-related barriers in HRSN identification and referral. Unlike traditional clinician-initiated screening, chatbots can provide continuous, user-initiated engagement across multiple contexts and time points, enabling real-time identification of urgent needs [23–25]. By facilitating private conversations, chatbots can help reduce stigma and support honest disclosure for some groups [11,26–28]. They can also automate referrals using up-to-date community resource databases, and reduce provider burden while offering tailored, context-specific recommendations [25,29,30]. These advantages are further supported by our preliminary engagement sessions with social work teams, caregivers, and patient advocates, who indicated high acceptance of chatbot-based social needs assessment, provided that privacy and usability are prioritized [23,31].
Conceptual framework
The methods for this study were informed by the Obesity-Related Behavioral Intervention Trials (ORBIT) model [32,33], which provides a systematic framework for developing behavioral interventions by progressing through iterative phases (from early conceptualization and definition of intervention components, to feasibility and pilot testing, and ultimately to efficacy and effectiveness trials). Prior applications of conversational AI in healthcare, including tools for mental health support, symptom tracking, and caregiver-facing applications, demonstrate early evidence and implications for digital engagement strategies [34–39]. Building on this foundation, the present pilot randomized controlled trial (pRCT) corresponds to Phase IIb of ORBIT, which emphasizes feasibility testing, establishing acceptability, and refinement of study protocols before progression to full-scale efficacy trials.
Study objectives
The long-term goal of this research is to develop an empirically supported, scalable, AI-driven solution to facilitate HRSN identification and referral in pediatric primary care. This pilot study has three specific objectives: (1) To assess the feasibility, acceptability, and usability of the DAPHNE chatbot with caregivers and providers of pediatric primary care patients compared with standard care; (2) To evaluate the feasibility and acceptability of the study protocol, including recruitment, retention, and measurement strategies, for informing design of a future large-scale trial; and (3) To characterize the current standard of care for addressing unmet HRSNs, thereby informing selection of appropriate comparator conditions and outcomes in future trials.
Preliminary work
We have conducted several studies to inform DAPHNE’s development. A scoping review of patient-facing chatbots and voice agents revealed high interest but a lack of rigorous trials of chatbot-based interventions, indicating the need for systematic evaluation [40]. Building upon this knowledge, we developed a voice-interactive diary for caregivers of children with special health care needs; over 80% reported ease of use and approximately half increased health tracking frequency, demonstrating caregiver receptivity to conversational tools [39]. To enable integration of unstructured caregiver data, we created a natural language processing pipeline that accurately structured caregiver-generated notes and transcripts (F1 > 0.7), validating real-time data capture for clinical use [41,42]. Finally, in a feasibility study of a semi-functional DAPHNE prototype (n = 13), community health and social workers rated usability at 72/100 on the System Usability Scale and emphasized its value for triaging referrals [23]. Together, these studies establish feasibility, provider and caregiver acceptance, and the need for a pilot randomized controlled trial.
Methods
Study design and setting
This study is designed as a pilot randomized controlled trial (pRCT) guided by the ORBIT model (Phase IIb) [32,33]. One hundred caregivers of children (prioritizing ≤2 years of age) will be recruited from Nationwide Children’s Hospital Primary Care Network (NCH PCN). Following baseline assessment, participants will be randomized in a 1:1 ratio to either the standard care group or DAPHNE + standard of care group (n = 50 per arm). In addition, 10 providers will be recruited to complete surveys on workflow integration and HRSN practices.
The trial will be conducted in NCH PCCs, which provided nearly 250,000 visits in 2024 for mostly Medicaid-enrolled patients. The PCN includes 14 clinics across the Columbus metropolitan area. For the purpose of this study, we identified clinics with the capacity to support the research project and a relatively higher number of patients with unmet HRSNs.
Ethics
The study has Institutional Review Board (IRB) approval at Nationwide Children’s Hospital (#00004369). Informed consent will be obtained from all participants. For Electronic Health Record (EHR) data extraction, consent for the child’s participation will be obtained from one parent. If recruited, children aged 9 and above will provide informed assent before research procedures begin.
Study status and timeline
This study is currently in the Initiation and Planning phase, with the protocol finalized and Institutional Review Board (IRB) approval obtained (NCH IRB #00004369). We initiated the process of participant recruitment in Q1, Year 1, and are actively enrolling participants. The time schedule is provided in Fig 1. We project that participant recruitment will be completed in Q2 of Year 2, primary data collection (6-month follow-up) will be completed in Q1 of Year 3, and final results are expected for manuscript submission in Q4 of Year 3.
Participants
Caregivers. Recruitment will occur through the NCH PCN, with research staff approaching caregivers during routine visits. Our target recruitment is n = 50 participants per arm, facilitated using flyers and social worker referrals. Furthermore, we will communicate with potential participants who are interested via virtual methods (phone call, teleconference). Inclusion criteria are: (1) primary caregiver of a child (prioritizing ≤2 years old) receiving care at the selected clinics in the PCN, (2) self-identification with at least one unmet HRSN, (3) ability to provide informed consent and complete study surveys in English, (4) ownership of a mobile device for chatbot use, and (5) willingness to participate in follow-up assessments. Exclusion criteria: none beyond failure to meet the above criteria. Caregivers will be compensated for their participation.
Providers. To evaluate the feasibility of integrating the DAPHNE chatbot into clinical workflows at primary care clinics, we will recruit approximately 10 providers across the NCH PCN, ensuring representation from at least 50% of the 14 PCC sites. Eligible providers include physicians, nurses, social workers, and care coordinators who are directly involved in patient care. Providers on extended leave or those not engaged in patient care will be excluded. Recruitment will be facilitated through the NCH Primary Care Research Network.
DAPHNE Chatbot development and structure
DAPHNE was developed using secure cloud infrastructure (via AWS). At the backend, we integrated state-of-the-art large language models (e.g., Claude, GPT, Mistral, Gemini, LLaMA) adapted for healthcare settings, enabling intent recognition, contextual follow-up, and dynamic retrieval of resources from the resources API. Unlike rule-based decision trees, the AI allows free-text input from caregivers, interpreting varied language patterns (e.g., recognizing “I can’t get to the clinic” as a transportation need) and providing tailored responses. It demonstrated >99% intent accuracy over 1,500 turns of conversation [31].
DAPHNE is provided as a web-based conversational agent accessible on iOS, Android, or web browsers (Fig 2). It is deployed on HIPAA-compliant servers and validated for accuracy and security by the internal Information Services governance committee. The chatbot requires no computational power on the user’s side, so a simple smartphone is sufficient to engage with the app. Our conversational model is also optimized for scaling on the cloud side with minimal cost and computational resource needs.
Created in BioRender. Sezgin, E. (2026) https://BioRender.com/n6anxpk.
To ensure safety, DAPHNE automatically flags out-of-scope or potentially concerning interactions, and all conversations are subject to monitoring and auditing protocols. The research team reviews flagged transcripts to prevent harmful content and safeguard patient well-being. In addition, the research team periodically reviews the chatbot logs. For urgent or high-risk needs, an escalation protocol is used directing caregivers to professional resources, while the research team coordinates with the social work team as appropriate.
Scope of HRSN and DAPHNE chatbot
The chatbot’s scope is intentionally bounded and aligned with nationally recognized HRSN screening frameworks used in primary care, including guidance from SIREN, the PhenX Toolkit social determinants measures, and CMS Accountable Health Communities domains [44–46]. DAPHNE focuses on four predefined domains: (1) food insecurity and essential nutritional needs (e.g., infant formula), (2) housing instability, (3) transportation barriers, and (4) financial strain and material hardship (including utilities).
Conversational inputs are mapped to these predefined categories using structured intent classification, and referrals are limited to domain-specific community resources retrieved through a community resource database [47]. Domains outside this scope (urgent or high-risk cases, e.g., interpersonal violence, mental health crises, eviction, and urgent food access) trigger an escalation protocol directing caregivers to appropriate professional or emergency services. This bounded framework ensures conceptual clarity, safety, and alignment with established HRSN screening practices and services in pediatric primary care.
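DAPHNE’s production pipeline relies on LLM-based intent classification; as an illustration only, the bounded-domain routing and escalation logic described above can be sketched with a much coarser keyword-based classifier (all keyword lists and function names below are hypothetical and not part of the study software):

```python
# Hypothetical sketch of bounded-domain routing with escalation.
# DAPHNE's real pipeline uses LLM-based intent classification; this
# keyword mapping only illustrates the domain boundary and escalation logic.

HRSN_DOMAINS = {
    "food": ["food", "formula", "groceries", "hungry"],
    "housing": ["housing", "rent", "shelter"],
    "transportation": ["ride", "bus", "get to the clinic"],
    "financial": ["bills", "utilities", "money"],
}

# Out-of-scope or high-risk topics trigger escalation, not a referral.
ESCALATION_TERMS = ["violence", "hurt myself", "eviction", "crisis"]

def route_message(text: str) -> dict:
    """Return an escalation flag, matched HRSN domains, or a clarify prompt."""
    lowered = text.lower()
    if any(term in lowered for term in ESCALATION_TERMS):
        return {"action": "escalate", "domains": []}
    matched = [domain for domain, keywords in HRSN_DOMAINS.items()
               if any(kw in lowered for kw in keywords)]
    return {"action": "refer" if matched else "clarify", "domains": matched}
```

For example, under these hypothetical keyword lists, “I can’t get to the clinic” maps to the transportation domain, while a message mentioning eviction bypasses referral and triggers the escalation path.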
Intervention: DAPHNE Chatbot
Caregivers randomized to the intervention arm will receive onboarding consisting of instructions on how to download the app and a brief tutorial on the app’s features. The chatbot engages caregivers in natural conversations to understand HRSNs, provide tailored resource recommendations (e.g., food pantries, housing, transportation), and follow up with automated reminders. Features include periodic personalized notifications (e.g., informational quick tips, updates), information about local services, and guidelines to sustain engagement.
Sequence generation and concealment
The randomization sequence will be generated by a study team member not involved in recruitment using the NCI Clinical Trial Randomization Tool (maximal procedure; maximum tolerated imbalance = 2) [48]. The sequence will be uploaded into the REDCap randomization module with role-based access controls [49]. Research staff conducting eligibility screening and baseline assessments will not have access to the allocation list. Assignment will be released automatically by REDCap only after completion of baseline measures and confirmation of eligibility, preventing prediction of future assignments. Only the intervention coordinator responsible for onboarding will be notified of allocation after assignment release.
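The maximal procedure itself is implemented by the NCI tool; as an illustrative sketch only, the related “big stick” design shows how a maximum tolerated imbalance (MTI) of 2 constrains allocation. This simplified rule is not the exact maximal procedure used in the study:

```python
import random
from itertools import accumulate

def big_stick_sequence(n: int, mti: int = 2, seed: int = 42) -> list:
    """Illustrative 'big stick' allocation: assign at random unless the
    arm imbalance reaches the MTI, in which case the lagging arm is forced.
    A simplification of the maximal procedure, shown only to illustrate
    how the MTI bounds imbalance throughout the sequence."""
    rng = random.Random(seed)
    seq, diff = [], 0  # diff = (#A assigned) - (#B assigned)
    for _ in range(n):
        if diff >= mti:
            arm = "B"          # arm A is ahead by the MTI: force B
        elif diff <= -mti:
            arm = "A"          # arm B is ahead by the MTI: force A
        else:
            arm = rng.choice(["A", "B"])
        seq.append(arm)
        diff += 1 if arm == "A" else -1
    return seq

seq = big_stick_sequence(100)
# Running imbalance across the whole sequence never exceeds the MTI.
max_imbalance = max(abs(d) for d in accumulate(1 if a == "A" else -1 for a in seq))
```

The true maximal procedure additionally randomizes over all sequences satisfying the MTI constraint, which gives better allocation unpredictability than this greedy rule.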
Control: Standard care
Caregivers in the control arm will receive the existing clinic-based HRSN screening and referral procedures as currently implemented, without access to the chatbot. These caregivers will complete the same study assessments as the intervention group except those related to chatbot usability and feasibility of the intervention.
Contamination and standard care
To minimize contamination, intervention access will be restricted to unique participant credentials, and control participants will not be provided DAPHNE links or onboarding materials. At each follow-up assessment (1, 3, and 6 months), participants will be asked whether they have used DAPHNE outside the study or accessed any other digital HRSN screening tools. Any reported cross-exposure will be documented.
Usual care will be characterized and documented systematically. At study initiation, we will document each clinic’s current HRSN screening instruments, workflow roles (e.g., medical assistant, nurse, social worker), timing of screening, referral pathways, and EHR documentation practices (where applicable). In addition, EHR abstraction will include primary care visits, number of missed appointments, number of ER visits, and HRSN screening and referrals (social work encounters) for all participants. Follow-up calls with caregivers in the standard care arm will capture services received outside the health system (e.g., community-based referrals not documented in the EHR). These procedures will allow comparison of intervention and standard care exposure.
Procedures
All caregivers will complete caregiver-reported outcome measures (cPROMs; see “secondary outcomes” section below) at baseline, 3 months, and 6 months post-intervention. Demographic information will be collected from all caregivers at baseline. Intervention group participants will have on-demand access to the chatbot throughout the 6-month study period and will receive weekly notifications from the chatbot app. Ten caregivers from each group will also complete a qualitative interview at 3 months and 6 months to evaluate their experience using the chatbot app versus receiving the standard of care.
Provider component
After receiving a brief demonstration of the DAPHNE chatbot, providers will share their feedback and complete a structured survey on current HRSN screening and referral practices. Providers will complete the Workflow Integration Expectancy (WIE) tool and the Feasibility of Intervention Measure (FIM) at baseline and 6 months; providers will also complete the System Usability Scale (SUS) at baseline. Additionally, providers will complete a survey on the feasibility of implementing the chatbot into their clinical workflows at baseline and 6 months. Semi-structured interviews will be conducted to elicit feedback on integration challenges, potential facilitators, and perceived impact on team workflow. Data from provider surveys will be analyzed descriptively, and interview transcripts will undergo thematic analysis to identify patterns in feasibility, barriers, and opportunities for scaling.
Outcomes and measures
Primary outcome.
The primary outcomes will evaluate study feasibility (recruitment, retention, and survey completion across both arms) and intervention usability/acceptability. Feasibility will be assessed through enrollment rates (target ≥ 70%), retention rates (target ≥ 70%), and implementation feasibility using the Feasibility of Intervention Measure (FIM; target ≥ 80%) [50,51]. Acceptability will be measured with the Website Evaluation Questionnaire (WEQ), with success defined as ≥ 80% endorsement of top rating categories [52]. Usability will be assessed using the System Usability Scale (SUS), with a mean score ≥ 68 considered acceptable [53]. Technical performance will also be evaluated, including chatbot response accuracy (target F1 ≥ 0.7 [54]), response latency (<3 seconds), and differences across platforms (iOS vs. Android). In addition, caregiver comprehension will be assessed using the Patient Comprehension Questionnaire (PCQ), with success defined as a mean score ≥ 4. The PCQ will be administered to caregivers in the intervention arm at baseline and 3 months.
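For reference, the F1 benchmark for chatbot response accuracy is the harmonic mean of precision and recall; a minimal sketch, using hypothetical audit counts, is:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall, computed from
    true positives (tp), false positives (fp), and false negatives (fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical audit of chatbot intent predictions: 70 correct responses,
# 30 spurious, 30 missed gives precision = recall = 0.7, so F1 = 0.7,
# exactly at the prespecified benchmark.
benchmark_example = f1_score(70, 30, 30)
```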
To further assess the DAPHNE chatbot, qualitative analysis will be completed for caregiver and provider interviews. These qualitative data will complement the quantitative measures to provide a stronger understanding of provider and caregiver perspectives on the feasibility, acceptability, and usability of the DAPHNE chatbot. For caregivers in the intervention group, interviews will be coded to identify factors that drive engagement with the chatbot, as well as barriers that may limit caregivers’ ability or willingness to use it. Interviews with caregivers in the control group will characterize the current standard of care for caregivers with social needs, allowing comparison of the resources offered under standard care with those received through the chatbot. Caregiver interviews across both groups will also be examined for perspectives on study burden to determine the acceptability and feasibility of a larger-scale trial of the DAPHNE chatbot. Clinician interviews will provide first-hand insight into how well the chatbot could be integrated into clinic workflow, focusing on its perceived usefulness in a clinic setting, the challenges its use may raise, and the ease with which it could be integrated into, and potentially enhance, the current standard of care.
Secondary outcomes.
Secondary outcomes address caregiver- and provider-level effects. Caregiver-reported outcomes (cPROMs) include stress [55], self-efficacy [56], satisfaction with access to community resources [57], and quality of life for caregiver and child as proxy [58]. These cPROMs will be completed by caregivers in both arms at baseline, 3, and 6 months. In addition to cPROMs, caregivers in both arms will complete the standardized HRSN screener. This survey will be administered at 3 and 6 months to capture changes in reported needs over time. Caregiver engagement with the chatbot will be measured through system logs (e.g., minutes of use, number of logins, session durations, screenings completed, and resources accessed). Caregiver burden associated with study procedures will be measured in both arms using the Participant Burden Assessment (PeRBA; target ≥ 4) at 1, 3, and 6 months [59]. Provider perspectives on workflow integration will be assessed using the Workflow Integration Expectancy tool, with a target rating ≥ 4 [60]. These secondary outcomes will be analyzed to examine how they change over the course of the trial and whether changes are associated with chatbot use.
Exploratory outcomes.
Exploratory analyses will focus on healthcare utilization and economic outcomes to inform future large-scale trials. Electronic health record (EHR) data will be extracted to capture documentation of HRSN screening and referrals, adherence to well-child visits, missed appointments, emergency department utilization, and estimated healthcare costs. EHR data will be abstracted for all participants at the 3- and 6-month follow-up periods to capture pediatric primary care visits, social work referrals, and emergency department utilization. Billing and visit data will be used to estimate healthcare costs, including cost avoidance from reduced no-shows or ED visits. While the study is not powered to detect cost effects, descriptive comparisons between the DAPHNE and standard care arms will be conducted to identify patterns in service utilization and associated costs. Findings will inform the selection of economic outcomes and power calculations for a future NIH R01 trial. Additional exploratory outcomes will include continuous tracking of recruitment and dropout rates throughout the study period. Outcome measures, descriptions, and planned timelines are also available in our registered protocol [61].
The details of study endpoints, including instruments, scoring rules, directionality, and primary timepoints are provided in Supporting Information (S1 Table).
Sample size justification
This pilot trial is designed to estimate feasibility parameters rather than to test clinical efficacy. A prespecified feasibility benchmark is 6-month retention ≥70% (proportion of randomized caregivers completing the 6-month assessment). This target was selected because retention is a conservative, study-wide indicator of feasibility for a 6-month caregiver-facing digital intervention and provides guidance for assessing the viability and planning parameters of a future trial. Consistent with the ORBIT framework, this benchmark will inform, rather than solely determine, decisions regarding refinement of study procedures and progression to a larger efficacy study.
With n = 100 randomized caregivers, if retention is 70%, the corresponding 95% confidence interval (Wilson method) [62] will have an expected half-width of approximately ±9 percentage points, providing sufficient precision to inform decisions for a future trial.
For secondary outcomes, the intervention arm will allow estimation of mean SUS scores with acceptable precision. Assuming a standard deviation of approximately 12–15 points, the 95% confidence interval for the mean SUS will have an expected half-width of approximately ±3–6 points. Confidence intervals for proportions will be calculated using Wilson intervals, and confidence intervals for continuous outcomes will use t-based methods (or bootstrap methods if distributional assumptions are not met).
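These precision figures can be checked directly. The sketch below computes the Wilson half-width for 70% retention with n = 100, and t-based half-widths for the SUS mean with n = 50 per arm; the critical value t(0.975, df = 49) ≈ 2.01 is hardcoded here as an assumption:

```python
import math

def wilson_halfwidth(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the 95% Wilson score interval for a proportion."""
    z2 = z * z
    return (z / (1 + z2 / n)) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))

def t_halfwidth(sd: float, n: int, t_crit: float) -> float:
    """Half-width of a t-based confidence interval for a mean."""
    return t_crit * sd / math.sqrt(n)

# Retention: p = 0.70, n = 100 -> roughly +/- 9 percentage points.
retention_hw = wilson_halfwidth(0.70, 100)

# SUS mean: n = 50, SD 12-15; t(0.975, df = 49) ~ 2.01 (assumed value).
sus_hw_low = t_halfwidth(12, 50, 2.01)
sus_hw_high = t_halfwidth(15, 50, 2.01)
```

Under these inputs the retention half-width is about 8.8 percentage points (consistent with the stated ±9), and the SUS half-widths fall between roughly ±3.4 and ±4.3 points, within the stated ±3–6 range.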
Data analysis
Quantitative data (survey scores, engagement metrics, retention rates) will be summarized using descriptive statistics (e.g., mean, median, SD and IQR). Qualitative data will be transcribed and analyzed using thematic content analysis (hybrid inductive-deductive approach) in NVivo by two independent coders with ≥85% inter-rater reliability. Integration of quantitative and qualitative findings will follow a convergent mixed-methods design [63,64], in which both data streams are analyzed separately and then merged during interpretation. Integration will occur through joint display and narrative weaving, identifying points of convergence, divergence, and complementarity to refine the intervention and inform future trials.
Longitudinal analysis of secondary outcomes
Continuous repeated outcomes (e.g., caregiver stress, self-efficacy, satisfaction with access to resources, and quality of life) will be analyzed using linear mixed-effects models with fixed effects for intervention arm, time (categorical), arm x time interaction, and baseline value of the outcome. A participant-level random intercept will account for within-subject correlation. Clinic site and baseline HRSN burden will be included as prespecified covariates.
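As a sketch, with time treated as categorical and coefficient labels chosen here for illustration, the model for caregiver $i$ at timepoint $t$ can be written as:

```latex
Y_{it} = \beta_0 + \beta_1\,\mathrm{Arm}_i + \beta_{2t}\,\mathrm{Time}_t
       + \beta_{3t}\,(\mathrm{Arm}_i \times \mathrm{Time}_t)
       + \beta_4\,Y_{i0} + \beta_5\,\mathrm{Site}_i + \beta_6\,\mathrm{HRSN}_i
       + b_i + \varepsilon_{it},
\qquad b_i \sim N(0,\sigma_b^2), \quad \varepsilon_{it} \sim N(0,\sigma^2)
```

where $Y_{i0}$ is the baseline value of the outcome, $b_i$ is the participant-level random intercept accounting for within-subject correlation, and $\beta_{2t}$, $\beta_{3t}$ are indexed by timepoint because time is categorical.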
Consistent with the ORBIT Phase IIb framework, these analyses are intended to examine directionality, consistency, and potential clinical relevance of change across study arms rather than to provide definitive estimates of treatment effect. Estimates with 95% confidence intervals will be reported descriptively. Interpretation will focus on patterns of improvement, variability, and the proportion of participants demonstrating meaningful change to inform whether progression to a fully powered efficacy trial is warranted.
Binary repeated outcomes will be analyzed using generalized linear mixed models with a logit link and participant-level random intercepts to account for within-subject correlation. The same prespecified covariates used in the continuous outcome models will be included. Results will be interpreted as exploratory indicators.
Missing data.
For longitudinal secondary and exploratory outcomes (e.g., stress, self-efficacy, satisfaction, quality of life), the mixed model will account for missing data that are missing at random. Covariates associated with missing data occurrence will be included in the models. Multiple imputation by chained equations [65] under a missing-at-random assumption will address missing baseline data and will also be applied as a sensitivity analysis to assess the robustness of results to alternative missing-data assumptions. The imputation model will include intervention arm, clinic site, baseline HRSN burden, baseline outcome values, time indicators, and prespecified predictors of missingness (e.g., demographic characteristics and prior engagement measures). Twenty or more imputed datasets will be generated, and parameter estimates will be pooled using Rubin’s rules. For feasibility-related process measures (e.g., survey completion rates), worst-case assumptions will be examined where appropriate to assess robustness of conclusions.
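Rubin’s rules themselves reduce to simple arithmetic; a minimal sketch, using hypothetical estimates from m = 5 imputed datasets, is:

```python
import math

def rubins_rules(estimates: list, variances: list) -> tuple:
    """Pool point estimates and variances across m imputed datasets using
    Rubin's rules: total variance = within + (1 + 1/m) * between."""
    m = len(estimates)
    pooled = sum(estimates) / m                  # pooled point estimate
    within = sum(variances) / m                  # mean within-imputation variance
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between       # Rubin's total variance
    return pooled, total, math.sqrt(total)       # estimate, variance, SE

# Hypothetical per-imputation estimates and variances (m = 5 datasets):
est, var, se = rubins_rules([1.0, 1.2, 0.9, 1.1, 1.0], [0.04] * 5)
```

The (1 + 1/m) factor inflates the between-imputation component to reflect the finite number of imputations, which is why generating 20 or more datasets, as planned, keeps that penalty small.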
Randomization and analysis population
The primary analytic approach will follow the intention-to-treat (ITT) principle, whereby all randomized participants are analyzed in their originally assigned group regardless of adherence or withdrawal. This approach preserves comparability between arms and reflects real-world implementation.
Because this pilot does not use stratified randomization, baseline characteristics (including clinic site and baseline HRSN burden) will be summarized descriptively by arm. If meaningful imbalances in baseline characteristics are observed by chance, we will maintain the original allocation and refrain from re-randomization or post-hoc modification. Instead, to enhance precision and reduce sensitivity to baseline differences, we will address any imbalance analytically using prespecified covariate adjustment in secondary and exploratory models. Adjusted models will include clinic (fixed effect), baseline HRSN burden (continuous or categorized as 1 vs ≥ 2), baseline value of the outcome when available, child age, and insurance type. We will report both unadjusted and adjusted estimates and interpret results as exploratory, focusing on directionality and consistency of findings (and, where applicable, the proportion demonstrating meaningful change) rather than statistical significance.
Data monitoring and risk management
All study data will be collected through HIPAA-compliant electronic systems and stored on encrypted Nationwide Children’s Hospital servers. Data integrity will be ensured through automated validation checks, routine audits, and adherence to standardized abstraction protocols. Technical support will be available via a dedicated helpdesk during working hours, with all issues logged for iterative improvement.
Potential risks include participant frustration due to usability issues, privacy concerns, or inaccurate chatbot responses. To address these risks, the study incorporates caregiver training during onboarding, visual and video guides, and reassurance that DAPHNE complements rather than replaces clinical care. The chatbot’s performance will be continuously monitored for response accuracy and timeliness, with misinformation or errors flagged and corrected during weekly team reviews. Provider oversight will be integrated through periodic reporting of chatbot-identified needs and referrals to ensure appropriate follow-up. These strategies are designed to maintain caregiver trust, ensure accurate outputs, and safeguard participants throughout the study.
An independent Data Safety and Monitoring Board, consisting of four experts in primary care practice, AI/ML research, and biostatistics, will oversee the trial to ensure participant safety, data integrity, and overall trial conduct. The board, which is independent of the study sponsor and funders, will meet semi-annually to review cumulative safety data, enrollment metrics, and protocol adherence. The board will provide recommendations directly to the Principal Investigator and the managing institution regarding the continuation, modification, or termination of the study.
Discussion
This protocol describes a pilot randomized controlled trial of DAPHNE, an AI-driven conversational agent designed to improve identification of health-related social needs and facilitate caregiver access to essential community resources. The study has the potential to address access barriers in pediatric primary care, where unmet HRSNs remain a major driver of adverse child health outcomes yet routine screening and referral processes are underutilized [15–17].
By focusing on feasibility, acceptability, and usability, this study will generate critical data to guide refinement of both the intervention and the study protocol. Specifically, it will assess caregiver and provider engagement, evaluate technical performance across devices, and characterize barriers to study participation. Complementary analyses of provider surveys and EHR data will contextualize the current standard of care, providing essential information for selecting comparator conditions and outcomes in future clinical trials.
The results will directly inform a larger NIH R01 trial evaluating the efficacy and implementation of AI-driven HRSN screening. Consistent with the ORBIT model [32,33], this study represents an intermediate phase (IIb) in behavioral intervention development, building the foundation for robust testing of clinical and operational outcomes. Longer-term directions include expansion to non-English-speaking populations, integration of voice interactivity, and cost-effectiveness analyses to assess healthcare savings from reduced no-shows and emergency utilization.
If successful, DAPHNE could advance personalized, equitable pediatric care by enabling real-time HRSN screening, reducing stigma in caregiver disclosures, and streamlining referral pathways. Beyond pediatrics, this approach has the potential to provide a generalizable framework for integrating AI-driven conversational tools into routine healthcare delivery and public health systems.
Strengths and limitations
This study has several important strengths. First, DAPHNE introduces an innovative, AI-driven approach to HRSN screening, using natural-language, private, and context-aware interactions that move beyond static questionnaires [23–25,29,30]. Second, the intervention was co-designed with caregivers, clinicians, and advocates, ensuring that usability, engagement, and contextual relevance are embedded in its design [23]. Third, the study emphasizes integration into pediatric primary care workflows, which is essential to reduce burden on care teams while maintaining alignment with provider responsibilities [66,67]. Fourth, the trial is guided by the ORBIT model [32,33], which provides a systematic framework for developing and iteratively evaluating the DAPHNE behavioral intervention. Finally, the modular design of the backend allows connection to diverse community resource databases, making the intervention scalable and adaptable to local or health-domain-specific populations and clinical contexts [68–73].
Nonetheless, there are key limitations. This is a single-site pilot study with a relatively small sample size (n = 100) drawn from a single region and state, which restricts the generalizability of findings and precludes formal efficacy testing. Results will instead focus on feasibility, acceptability, and usability outcomes to inform future large-scale trials. The study is also limited to English-speaking caregivers who own a smartphone and a data plan, which constrains equity in early testing. The 6-month follow-up period may not capture longer-term outcomes such as sustained engagement, resource utilization, or downstream effects on clinical care and health outcomes. Finally, technology-specific challenges, including variable device access, inconsistent internet connectivity, and risks inherent to conversational AI (such as inaccurate responses or hallucinations), may influence feasibility and participant trust [74–77].
Although we hypothesize benefits of the chatbot, such as automated referrals, increased utilization of resources, and reduced provider burden through efficient identification of HRSNs, we invite readers and practitioners to remain cautious. Implicit factors, including potential AI bias, AI-related errors and inaccuracies, and human overreliance on AI for resource support, may limit implementation of the chatbot at scale. The technology in our project is intended to supplement rather than replace human support, and therefore does not diminish the value of personalized assistance and human interaction in practice [78].
Future directions
Consistent with the ORBIT model, this Phase IIb pilot is designed to inform decisions about whether and how to proceed to a Phase III efficacy trial. If predefined feasibility and acceptability benchmarks are met, the next step would be a multi-site Phase III trial evaluating the efficacy of DAPHNE in reducing caregiver-reported unmet HRSNs over 6 months, measured using a standardized screening instrument aligned with the chatbot’s predefined domains. Phase IIb findings will also inform whether a subsequent trial targets overall HRSN burden or focuses on a high-priority, high-prevalence domain (e.g., food insecurity and early childhood material needs) that demonstrates the strongest engagement and referral responsiveness in this cohort.
If feasibility benchmarks are not fully met, findings from this pilot will guide refinement of the intervention, study procedures, or target population prior to additional testing in a subsequent optimization phase, consistent with ORBIT’s iterative development framework. In either case, pilot-derived estimates of prevalence, engagement patterns, and variability will inform endpoint selection and trial design for future studies. Secondary outcomes in a future efficacy trial would likely include referral uptake, caregiver-reported psychosocial measures, and selected healthcare utilization metrics.
Conclusions
This pilot randomized controlled trial will provide essential feasibility, acceptability, and usability data on the DAPHNE chatbot for addressing health-related social needs in pediatric primary care. Findings will inform refinement of both the intervention and the study protocol, directly supporting the design of a future NIH R01 trial. Guided by the ORBIT framework, the study emphasizes early-phase testing as a foundation for scalable, rigorous evaluation. Future research will expand to larger and more diverse populations, incorporate non-English language options and voice interactivity, and integrate cost-effectiveness analyses to assess the impact on healthcare utilization. Ultimately, this work will determine whether AI-driven conversational agents can serve as a sustainable and equitable model for integrating social needs identification and referral into routine pediatric care.
Acknowledgments
We gratefully acknowledge the support from the AWS Health Equity Initiative (via AWS Social Responsibility and Impact credits), as well as the FindHelp.org Fellowship to maintain community resource navigation. Our appreciation is also extended to the NCH IT-RI team for their expertise in platform development and to the NCH Office of Technology Commercialization for their guidance in scaling and dissemination.
References
- 1. Hood CM, Gennuso KP, Swain GR, Catlin BB. County Health Rankings: Relationships Between Determinant Factors and Health Outcomes. Am J Prev Med. 2016;50(2):129–35. pmid:26526164
- 2. Magnan S. Social Determinants of Health 201 for Health Care: Plan, Do, Study, Act. NAM Perspect. 2021;2021. pmid:34532697
- 3. Council on Community Pediatrics. Poverty and child health in the United States. Pediatrics. 2016;137:e20160339.
- 4. Cree RA, Bitsko RH, Robinson LR, Holbrook JR, Danielson ML, Smith C, et al. Health Care, Family, and Community Factors Associated with Mental, Behavioral, and Developmental Disorders and Poverty Among Children Aged 2-8 Years - United States, 2016. MMWR Morb Mortal Wkly Rep. 2018;67(50):1377–83. pmid:30571671
- 5. National Academies of Sciences, Engineering, and Medicine. Integrating social care into the delivery of health care: Moving upstream to improve the nation’s health. Washington, D.C.: National Academies Press. 2020. https://doi.org/10.17226/25467
- 6. Boch S, Keedy H, Chavez L, Dolce M, Chisolm D. An Integrative Review of Social Determinants of Health Screenings used in Primary Care Settings. J Health Care Poor Underserved. 2020;31(2):603–22. pmid:33410796
- 7. Rigdon J, Montez K, Palakshappa D, Brown C, Downs SM, Albertini LW, et al. Social Risk Factors Influence Pediatric Emergency Department Utilization and Hospitalizations. J Pediatr. 2022;249:35-42.e4. pmid:35697140
- 8. Anyigbo C, Liu C, Ehrlich S, Reyner A, Ammerman RT, Kahn RS. Household Health-Related Social Needs in Newborns and Infant Behavioral Functioning at 6 Months. JAMA Pediatr. 2024;178(2):160–7. pmid:38147349
- 9. Chang L, Stewart AM, Monuteaux MC, Fleegler EW. Neighborhood Conditions and Recurrent Emergency Department Utilization by Children in the United States. J Pediatr. 2021;234:115-122.e1. pmid:33395566
- 10. Nerlinger AL, Kopsombut G. Social determinants of health screening in pediatric healthcare settings. Curr Opin Pediatr. 2023;35(1):14–21. pmid:36301073
- 11. Hsu C, Cruz S, Placzek H, Chapdelaine M, Levin S, Gutierrez F, et al. Patient Perspectives on Addressing Social Needs in Primary Care Using a Screening and Resource Referral Intervention. J Gen Intern Med. 2020;35(2):481–9. pmid:31792864
- 12. Savin K, Morales A, Levi R, Alvarez D, Seligman H. “Now I Feel a Little Bit More Secure”: The Impact of SNAP Enrollment on Older Adult SSI Recipients. Nutrients. 2021;13(12):4362. pmid:34959914
- 13. Algert SJ, Reibel M, Renvall MJ. Barriers to participation in the food stamp program among food pantry clients in Los Angeles. Am J Public Health. 2006;96(5):807–9. pmid:16571694
- 14. Kreuter M, Garg R, Thompson T, McQueen A, Javed I, Golla B, et al. Assessing The Capacity Of Local Social Services Agencies To Respond To Referrals From Health Care Providers. Health Aff (Millwood). 2020;39(4):679–88. pmid:32250682
- 15. Fraze TK, Brewster AL, Lewis VA, Beidler LB, Murray GF, Colla CH. Prevalence of Screening for Food Insecurity, Housing Instability, Utility Needs, Transportation Needs, and Interpersonal Violence by US Physician Practices and Hospitals. JAMA Netw Open. 2019;2(9):e1911514. pmid:31532515
- 16. Dolce M, Keedy H, Chavez L, Boch S, Zaim H, Dias B, et al. Implementing an EMR-based Health-related Social Needs Screen in a Pediatric Hospital System. Pediatr Qual Saf. 2022;7(1):e512. pmid:35071955
- 17. Victorino CC, Gauthier AH. The social determinants of child health: variations across health outcomes - a population-based cross-sectional analysis. BMC Pediatr. 2009;9:53. pmid:19686599
- 18. Bitsko RH, Visser SN, Schieve LA, Ross DS, Thurman DJ, Perou R. Unmet health care needs among CSHCN with neurologic conditions. Pediatrics. 2009;124 Suppl 4:S343-51. pmid:19948598
- 19. Pantell MS, Kaiser SV, Torres JM, Gottlieb LM, Adler NE. Associations Between Social Factor Documentation and Hospital Length of Stay and Readmission Among Children. Hosp Pediatr. 2020;10(1):12–9. pmid:31888952
- 20. Ryan JL, Franklin SM, Canterberry M, Long CL, Bowe A, Roy BD, et al. Association of Health-Related Social Needs With Quality and Utilization Outcomes in a Medicare Advantage Population With Diabetes. JAMA Netw Open. 2023;6(4):e239316. pmid:37083665
- 21. Cook JT, Frank DA, Berkowitz C, Black MM, Casey PH, Cutts DB, et al. Food insecurity is associated with adverse health outcomes among human infants and toddlers. J Nutr. 2004;134(6):1432–8. pmid:15173408
- 22. Gray TW, Podewils LJ, Rasulo RM, Weiss RP, Tomcho MM. Examining the Implementation of Health-Related Social Need (HRSN) Screenings at a Pediatric Community Health Center. J Prim Care Community Health. 2023;14:21501319231171519. pmid:37148221
- 23. Sezgin E, Kocaballi AB, Dolce M, Skeens M, Militello L, Huang Y, et al. Chatbot for Social Need Screening and Resource Sharing With Vulnerable Families: Iterative Design and Evaluation Study. JMIR Hum Factors. 2024;11:e57114. pmid:39028995
- 24. Abbasian M, Khatibi E, Azimi I, Oniani D, Shakeri Hossein Abad Z, Thieme A, et al. Foundation metrics for evaluating effectiveness of healthcare conversations powered by generative AI. NPJ Digit Med. 2024;7(1):82. pmid:38553625
- 25. Xue J, Zhang B, Zhao Y, Zhang Q, Zheng C, Jiang J, et al. Evaluation of the Current State of Chatbots for Digital Health: Scoping Review. J Med Internet Res. 2023;25:e47217. pmid:38113097
- 26. Johnson CB, Luther B, Wallace AS, Kulesa MG. Social Determinants of Health: What Are They and How Do We Screen. Orthop Nurs. 2022;41(2):88–100. pmid:35358126
- 27. Wallace AS, Luther BL, Sisler SM, Wong B, Guo J-W. Integrating social determinants of health screening and referral during routine emergency department care: evaluation of reach and implementation challenges. Implement Sci Commun. 2021;2(1):114. pmid:34620248
- 28. Tian W, Lu Y, Yu J, Fan J, Tang P, Zhang L. A Privacy-Preserving Framework for Mental Health Chatbots Based on Confidential Computing. In: 2022 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta). 2022. p. 1119–24. https://doi.org/10.1109/smartworld-uic-atc-scalcom-digitaltwin-pricomp-metaverse56740.2022.00160
- 29. Grassini E, Buzzi M, Leporini B, Vozna A. A systematic review of chatbots in inclusive healthcare: insights from the last 5 years. Univ Access Inf Soc. 2024;24(1):195–203.
- 30. Milne-Ives M, de Cock C, Lim E, Shehadeh MH, de Pennington N, Mole G, et al. The Effectiveness of Artificial Intelligence Conversational Agents in Health Care: Systematic Review. J Med Internet Res. 2020;22(10):e20346. pmid:33090118
- 31. Sezgin E, Jackson DI, Hussain S-A, Kocaballi AB, Richardson C, Skeens M, et al. Evaluating User Experiences with an AI Chatbot for Health-Related Social Needs: A Cross-Sectional Mixed Methods Study. Health Informatics. medRxiv; 2025. Available: https://www.medrxiv.org/content/10.1101/2025.06.16.25329054v1.full.pdf
- 32. Czajkowski SM, Powell LH, Adler N, Naar-King S, Reynolds KD, Hunter CM, et al. From ideas to efficacy: The ORBIT model for developing behavioral treatments for chronic diseases. Health Psychol. 2015;34(10):971–82. pmid:25642841
- 33. Powell LH, Freedland KE, Kaufmann PG. Behavioral clinical trials for chronic diseases. 1st ed. Cham, Switzerland: Springer Nature. 2021. https://doi.org/10.1007/978-3-030-39330-4
- 34. Boucher EM, Harake NR, Ward HE, Stoeckl SE, Vargas J, Minkel J, et al. Artificially intelligent chatbots in digital mental health interventions: a review. Expert Rev Med Devices. 2021;18(sup1):37–49. pmid:34872429
- 35. Fitzpatrick KK, Darcy A, Vierhile M. Delivering Cognitive Behavior Therapy to Young Adults With Symptoms of Depression and Anxiety Using a Fully Automated Conversational Agent (Woebot): A Randomized Controlled Trial. JMIR Ment Health. 2017;4(2):e19. pmid:28588005
- 36. Winn AN, Somai M, Fergestrom N, Crotty BH. Association of Use of Online Symptom Checkers With Patients’ Plans for Seeking Care. JAMA Netw Open. 2019;2(12):e1918561. pmid:31880791
- 37. Abd-Alrazaq AA, Rababeh A, Alajlani M, Bewick BM, Househ M. Effectiveness and Safety of Using Chatbots to Improve Mental Health: Systematic Review and Meta-Analysis. J Med Internet Res. 2020;22(7):e16021. pmid:32673216
- 38. You Y, Gui X. Self-diagnosis through AI-enabled chatbot-based symptom checkers: User experiences and design considerations. arXiv [cs.HC]. 2021. Available: http://arxiv.org/abs/2101.04796
- 39. Sezgin E, Oiler B, Abbott B, Noritz G, Huang Y. “Hey Siri, Help Me Take Care of My Child”: A Feasibility Study With Caregivers of Children With Special Healthcare Needs Using Voice Interaction and Automatic Speech Recognition in Remote Care Management. Front Public Health. 2022;10:849322. pmid:35309210
- 40. Sezgin E, Militello LK, Huang Y, Lin S. A scoping review of patient-facing, behavioral health interventions with voice assistant technology targeting self-management and healthy lifestyle behaviors. Transl Behav Med. 2020;10(3):606–28. pmid:32766865
- 41. Sezgin E, Hussain S-A, Rust S, Huang Y. Extracting Medical Information From Free-Text and Unstructured Patient-Generated Health Data Using Natural Language Processing Methods: Feasibility Study With Real-world Data. JMIR Form Res. 2023;7:e43014. pmid:36881467
- 42. Hussain S-A, Sezgin E, Krivchenia K, Luna J, Rust S, Huang Y. A natural language processing pipeline to synthesize patient-generated notes toward improving remote care and chronic disease management: a cystic fibrosis case study. JAMIA Open. 2021;4(3):ooab084. pmid:34604710
- 43. Chan A-W, Boutron I, Hopewell S, Moher D, Schulz KF, Collins GS, et al. SPIRIT 2025 statement: Updated guideline for protocols of randomised trials. PLoS Med. 2025;22(4):e1004589. pmid:40294521
- 44. Research on Integrating Social & Medical Care. [cited 23 Sep 2025]. Available: https://sirenetwork.ucsf.edu/
- 45. Krzyzanowski MC, Ives CL, Jones NL, Entwisle B, Fernandez A, Cullen TA, et al. The PhenX Toolkit: Measurement Protocols for Assessment of Social Determinants of Health. Am J Prev Med. 2023;65(3):534–42. pmid:36935055
- 46. Accountable Health Communities Model. [cited 20 Feb 2026]. Available: https://www.cms.gov/priorities/innovation/innovation-models/ahcm
- 47. findhelp.org by findhelp - Search and Connect to Social Care. In: findhelp.org [Internet]. [cited 14 May 2024]. Available: http://findhelp.org
- 48. Clinical Trial Randomization Tool - Clinical Trial Randomization Tool. [cited 20 Feb 2026]. Available: https://prevention.cancer.gov/ctrandomization/tool/
- 49. Stevens L, Kennedy N, Taylor RJ, Lewis A, Harrell FE Jr, Shotwell MS, et al. A REDCap advanced randomization module to meet the needs of modern trials. J Biomed Inform. 2025;171:104925. pmid:41052664
- 50. Anastasi JK, Capili B, Norton M, McMahon DJ, Marder K. Recruitment and retention of clinical trial participants: understanding motivations of patients with chronic pain and other populations. Front Pain Res (Lausanne). 2024;4:1330937. pmid:38606348
- 51. Weiner BJ, Lewis CC, Stanick C, Powell BJ, Dorsey CN, Clary AS, et al. Psychometric assessment of three newly developed implementation outcome measures. Implement Sci. 2017;12(1):108. pmid:28851459
- 52. Elling S, Lentz L, de Jong M. Website Evaluation Questionnaire: Development of a Research-Based Tool for Evaluating Informational Websites. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2007. p. 293–304. https://doi.org/10.1007/978-3-540-74444-3_25
- 53. Lewis JR, Sauro J. The Factor Structure of the System Usability Scale. Lecture Notes in Computer Science. Springer Berlin Heidelberg. 2009. p. 94–103. https://doi.org/10.1007/978-3-642-02806-9_12
- 54. Yuwono SK, Wu B, D’Haro LF. Automated scoring of chatbot responses in conversational dialogue. Lecture Notes in Electrical Engineering. Singapore: Springer Singapore. 2019. p. 357–69. https://doi.org/10.1007/978-981-13-9443-0_31
- 55. Barroso NE, Hungerford GM, Garcia D, Graziano PA, Bagner DM. Psychometric properties of the Parenting Stress Index-Short Form (PSI-SF) in a high-risk sample of mothers and their infants. Psychol Assess. 2016;28(10):1331–5. pmid:26595220
- 56. Bandura A. Self-efficacy. New York, NY: W.H. Freeman. 1997.
- 57. Owen R. Net Promoter Score and Its Successful Application. Management for Professionals. Springer Singapore. 2018. p. 17–29. https://doi.org/10.1007/978-981-10-7724-1_2
- 58. Flanagan JC. Measurement of quality of life: current state of the art. Arch Phys Med Rehabil. 1982;63:56–9. Available: https://www.ncbi.nlm.nih.gov/pubmed/6460487
- 59. Lingler JH, Schmidt KL, Gentry AL, Hu L, Terhorst LA. A New Measure of Research Participant Burden: Brief Report. J Empir Res Hum Res Ethics. 2014;9(4):46–9. pmid:26125079
- 60. Flanagan M, Arbuckle N, Saleem JJ, Militello LG, Haggstrom DA, Doebbeling BN. Development of a workflow integration survey (WIS) for implementing computerized clinical decision support. AMIA Annu Symp Proc. 2011;2011:427–34. Available: https://www.ncbi.nlm.nih.gov/pubmed/22195096
- 61. AI-Based Personalized Health and Self-Care. In: clinicaltrials.gov [Internet]. [cited 14 Nov 2025]. Available: https://clinicaltrials.gov/study/NCT07168382
- 62. Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Statist Sci. 2001;16.
- 63. Fetters MD, Curry LA, Creswell JW. Achieving integration in mixed methods designs-principles and practices. Health Serv Res. 2013;48(6 Pt 2):2134–56. pmid:24279835
- 64. Creswell JW, David Creswell J. Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. SAGE Publications; 2017. Available: https://play.google.com/store/books/details?id=KGNADwAAQBAJ
- 65. White IR, Royston P, Wood AM. Multiple imputation using chained equations: Issues and guidance for practice. Stat Med. 2011;30(4):377–99. pmid:21225900
- 66. Xu L, Sanders L, Li K, Chow JCL. Chatbot for Health Care and Oncology Applications Using Artificial Intelligence and Machine Learning: Systematic Review. JMIR Cancer. 2021;7(4):e27850. pmid:34847056
- 67. Sezgin E, Jackson DI, Kaufman K, Skeens MA, Gerhardt CA, Moscato E. Perceptions about the use of virtual assistants for seeking health information among caregivers of young childhood cancer survivors. Digit Health. 2025;11:20552076251326160. pmid:40093694
- 68. Pilato G, Augello A, Gaglio S. A Modular Architecture for Adaptive ChatBots. In: 2011 IEEE Fifth International Conference on Semantic Computing. 2011. p. 177–80. https://doi.org/10.1109/icsc.2011.68
- 69. Nadarzynski T, Knights N, Husbands D, Graham CA, Llewellyn CD, Buchanan T, et al. Achieving health equity through conversational AI: A roadmap for design and implementation of inclusive chatbots in healthcare. PLOS Digit Health. 2024;3(5):e0000492. pmid:38696359
- 70. Thirunavukarasu AJ. Large language models will not replace healthcare professionals: curbing popular fears and hype. J R Soc Med. 2023;116(5):181–2. pmid:37199678
- 71. Solanki RK, Rajawat AS, Gadekar AR, Patil ME. Building a Conversational Chatbot Using Machine Learning. Advances in Medical Education, Research, and Ethics. IGI Global. 2023. p. 285–309. https://doi.org/10.4018/978-1-6684-7164-7.ch013
- 72. Singh B, Olds T, Brinsley J, Dumuid D, Virgara R, Matricciani L, et al. Systematic review and meta-analysis of the effectiveness of chatbots on lifestyle behaviours. NPJ Digit Med. 2023;6(1):118. pmid:37353578
- 73. Denecke K, Abd-Alrazaq A, Househ M, Warren J. Evaluation Metrics for Health Chatbots: A Delphi Study. Methods Inf Med. 2021;60(5–06):171–9. pmid:34719011
- 74. Colasacco CJ, Born HL. A Case of Artificial Intelligence Chatbot Hallucination. JAMA Otolaryngol Head Neck Surg. 2024;150(6):457–8. pmid:38635259
- 75. Woo B, Huynh T, Tang A, Bui N, Nguyen G, Tam W. Transforming nursing with large language models: from concept to practice. Eur J Cardiovasc Nurs. 2024;23(5):549–52. pmid:38178303
- 76. Oca MC, Meller L, Wilson K, Parikh AO, McCoy A, Chang J, et al. Bias and Inaccuracy in AI Chatbot Ophthalmologist Recommendations. Cureus. 2023;15(9):e45911. pmid:37885556
- 77. Heo S, Son S, Park H. HaluCheck: Explainable and verifiable automation for detecting hallucinations in LLM responses. Expert Systems with Applications. 2025;272:126712.
- 78. Sezgin E. Artificial intelligence in healthcare: Complementing, not replacing, doctors and healthcare providers. Digit Health. 2023;9. pmid:37426593