Abstract
Introduction
Mental health problems constitute a significant global health challenge due to their rising prevalence and substantial treatment gap. Digital Mental Health Interventions (DMHIs), including mental health chatbots, have emerged as promising solutions due to their effectiveness and scalability. Recent advances in Generative Artificial Intelligence (GenAI) have improved the conversational abilities of these chatbots, further amplifying their potential. However, despite instances of inadvertent harm stemming from the unpredictable nature of GenAI, little attention has been paid to the user experience and safety of these chatbots.
Objective
This proposed review will explore existing research on GenAI-based mental health chatbots. Specifically, it aims to identify and describe current chatbots, focusing on user experience, safety, and risk mitigation strategies.
Methods
The review will follow the Joanna Briggs Institute (JBI) guidelines for conducting scoping reviews. It will also adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR). A systematic database search of MEDLINE (PubMed), Scopus, PsycINFO, ACM Digital Library, and IEEE Xplore will be conducted. The database search will be complemented by research-based search engines (Google Scholar and Consensus). Studies focusing on the development, evaluation, or implementation of GenAI-based mental health chatbots will be included, without restriction to specific disorders or population groups. Two independent reviewers will perform screening and data extraction. The analysis will include a descriptive summary and thematic analysis, with results presented in tabular, graphical, and narrative formats.
Conclusion
This review will provide a comprehensive overview of GenAI-based mental health chatbots while identifying innovative practices and knowledge gaps relating to user experience and safety. Findings will inform the ethical development, evaluation and implementation of GenAI-based mental health interventions.
Citation: Olisaeloka L, Richardson C, Vigo D (2026) User experience and safety of generative AI-based mental health chatbots: Scoping review protocol. PLoS One 21(1): e0341631. https://doi.org/10.1371/journal.pone.0341631
Editor: Muthmainnah Muthmainnah Muthmainnah, Universitas Al Asyariah Mandar, INDONESIA
Received: January 17, 2025; Accepted: January 9, 2026; Published: January 23, 2026
Copyright: © 2026 Olisaeloka et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: No datasets were generated or analysed during the current study. All relevant data from this study will be made available upon study completion.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Background
Mental health and substance use disorders affect over one billion people globally, contributing substantially to disability, premature mortality, and economic burden worldwide [1–3]. Despite the existence of effective therapeutic approaches, a significant treatment gap persists, with the majority of affected individuals remaining untreated [3,4]. This gap is driven by persistent barriers to care, including treatment costs, shortages of trained personnel, geographical inaccessibility, and stigma [3].
Digital Mental Health Interventions (DMHIs) have emerged as a promising strategy to address some of these challenges and expand access to care. DMHIs encompass a range of technology-based tools, such as online platforms, mobile apps, chatbots, and virtual reality (VR), designed to deliver mental health services and support [5,6]. Asynchronous, self-guided DMHIs such as chatbots are especially promising due to their accessibility and scalability [5,7].
Conversational Agents (CAs), commonly called chatbots, are software applications that mimic human conversation through text or voice interactions. These agents have been used to deliver mental health interventions for a variety of conditions, including depression, anxiety, eating disorders, and substance use [8–13]. However, user engagement with these tools is often limited, largely attributable to their lack of personalization and dynamic interaction [14–16].
Personalization, the tailoring of an intervention to a user's unique context, has been shown to enhance engagement, user experience, and effectiveness [12]. However, traditional mental health chatbots are primarily rule-based, relying on pre-programmed conversational flows that limit their flexibility and ability to personalize responses. While retrieval-based chatbots offer more adaptability, they still depend on pre-scripted responses, which restricts their ability to meet complex user needs [17,18]. In contrast, GenAI mental health chatbots powered by large language models (LLMs) can produce more interactive and contextually relevant responses, allowing natural, tailored, and empathetic conversations that emerging evidence suggests may improve engagement and therapeutic outcomes [19–21].
The emergence of LLMs marked a turning point in the development of sophisticated mental health chatbots capable of human-like support [22,23]. However, the same flexibility and sophistication that enhance personalization and user engagement also introduce novel risks and safety challenges. GenAI chatbots may generate misinformation, produce inappropriate or harmful responses, and exhibit algorithmic bias. Further, their "black box" nature makes them unpredictable and less reliable, especially in crisis situations [24–26]. These concerns have triggered broader debates about the ethical and safe deployment of GenAI in mental healthcare [27,28].
Despite increasing discourse on the ethical application of GenAI for mental health, there remains limited research on how to design and deploy GenAI-based mental health chatbots in effective and safe ways. Existing reviews in this area largely focus on traditional rule- and retrieval-based models [8,10–12]. Nevertheless, a recent meta-analysis highlighted the superior efficacy of AI-based mental health chatbots compared to traditional ones, owing to their ability to simulate empathetic conversations and personalize interactions [21]. Still, there remains a lack of systematic synthesis examining their characteristics, user experience, and safety profiles.
In this review, User Experience (UX) is conceptualized according to the International Organization for Standardization (ISO 9241-210), which defines UX as an individual's "perceptions and responses resulting from the use and/or anticipated use of a product, system or service." The ISO notes that UX "includes all the users' emotions, beliefs, preferences, perceptions, physical and psychological responses, behaviours and accomplishments that occur before, during and after use" [29]. In the context of DMHIs, this encompasses measures of acceptability, usability, perceived impact, and engagement [9,30]. Emerging research highlights both the appeal and the pitfalls of GenAI mental health chatbots: users appreciate the engaging, on-demand, and non-judgmental support, but also express concerns about unreliable or potentially harmful content, as well as the risk of overdependence [31,32].
User safety is another critical consideration for DMHIs and refers to how digital tools minimize harm, uphold data protection and privacy, and promote psychological well-being throughout intervention design and delivery [33]. The World Health Organization, in its guidelines for digital interventions, calls for safety considerations including assessing benefits and harms, ensuring data privacy, and using evidence to guide implementation [34]. The safety of traditional mental health chatbots previously received limited attention, mainly because rule- and retrieval-based systems were perceived as low-risk [35]. In contrast, GenAI's unpredictability makes it difficult to guarantee reliable and safe responses, and failures can have serious consequences [36]. For instance, there have been reports of GenAI chatbots offering harmful advice [37], promoting substance use, and soliciting explicit content from minors [31]. In some cases, persistent interactions with GenAI chatbots have been linked to tragic outcomes, including suicide [38,39]. These incidents underscore the urgent need for robust safety protocols. The American College of Physicians has called for transparency, rigorous testing, and focused research to understand and mitigate AI-related risks in healthcare [40].
This scoping review addresses a critical research gap relating to the user experience and safety of GenAI-based mental health interventions. While existing reviews have examined broader applications of AI and LLMs in mental health, none have systematically mapped GenAI-based chatbots or explored how user experience and safety are conceptualized and operationalized within this emerging domain [12,24,41]. The proposed review therefore fills this gap by focusing specifically on LLM-powered chatbots and by integrating both user-centered and safety-oriented perspectives. A preliminary search of MEDLINE, the Cochrane Database of Systematic Reviews, and JBI Evidence Synthesis revealed no published or registered scoping or systematic reviews on this specific topic as of August 2024, when this review was registered.
Review questions
The proposed review seeks to:
- Identify and describe Generative AI-based chatbots developed specifically to deliver mental health interventions.
- Assess how user experience (e.g., acceptability, usability, engagement) is reported in studies of these chatbot interventions.
- Examine the safety mechanisms and risk mitigation strategies integrated during the development and deployment of these chatbot interventions.
Methods
The proposed scoping review will follow guidelines outlined in the Joanna Briggs Institute (JBI) manual for scoping reviews [42] and adhere to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) [43]. The review has been registered on the Open Science Framework (doi.org/10.17605/OSF.IO/HSNXA).
Eligibility criteria
The PCC (population, concept, context) framework recommended by JBI was used to develop the scope and eligibility criteria of the review to ensure a clear and effective search strategy [42]. Table 1 presents the inclusion and exclusion criteria.
Search strategy
A preliminary search of MEDLINE was conducted to identify articles on the topic. Keywords and index (MeSH) terms identified from relevant articles were used to develop the full search strategy for MEDLINE (Ovid) (S1 Appendix). This strategy will be adapted for the other selected databases: Scopus, PsycINFO, ACM Digital Library, and IEEE Xplore. These databases were chosen to capture a broad range of sources from the disciplines relevant to the review objectives. MEDLINE was selected for its extensive coverage of publications in medicine and the health sciences, while Scopus was chosen to include studies from relevant multidisciplinary areas such as science and technology, medicine, and the social sciences. PsycINFO specializes in psychiatry and psychology, making it essential for this review. The ACM Digital Library and IEEE Xplore were included because they index publications on AI and natural language processing (NLP) applications in mental health. The database search will be complemented by research-based search engines (Google Scholar and Consensus) to capture other relevant grey literature. The reference lists of all included sources of evidence will also be screened for additional studies.
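To illustrate the general structure of the strategy (the full, validated version appears in S1 Appendix), the Ovid MEDLINE search combines three concept blocks: mental health conditions, conversational agents, and generative AI. A simplified sketch, using placeholder terms rather than the actual search lines, might read:

1. exp Mental Disorders/ or (mental health or depress* or anxiety).ti,ab.
2. (chatbot* or conversational agent* or virtual assistant*).ti,ab.
3. (generative artificial intelligence or large language model* or LLM* or GPT* or ChatGPT).ti,ab.
4. 1 and 2 and 3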
Selection of evidence sources
Following the search, all identified citations will be imported into Covidence, with duplicates removed automatically [44]. After a pilot test, titles and abstracts will be screened by two independent reviewers against the eligibility criteria. Potentially relevant sources will be retrieved in full, and the full text of selected citations will be assessed in detail against the eligibility criteria by the same independent reviewers. At this stage, reasons for exclusion of sources of evidence will be recorded and reported in the scoping review. Any disagreements that arise between the reviewers at any stage of the selection process will be resolved through discussion, or with an additional reviewer. The results of the search and the study inclusion process will be reported in full in the final scoping review and presented in a PRISMA-ScR flow diagram [42].
Data extraction
Data will be extracted by two independent reviewers using a data extraction tool developed by the reviewers. The data extracted will include specific details about the participants, concept, context, study methods, and key findings relevant to the review questions. Table 2 lists the key data items that will be extracted from included studies. These will be used to develop a draft extraction form in Covidence, which will be modified and revised as necessary during the data extraction process. As recommended by JBI, any disagreements that arise between the reviewers will be resolved through discussion, or with an additional reviewer, to achieve consensus [42]. Where required, authors of included papers will be contacted to request missing or additional data.
Data analysis and presentation
Data will be analysed and presented using descriptive statistics and narrative synthesis, focusing on the review objectives. Data visualizations, including summary tables, graphs, and figures, will be used to present findings concisely. Data extraction will be conducted within Covidence, which will also facilitate version control and audit trails. Extracted datasets will be exported to R (v4.2.3) for descriptive analysis and NVivo (v14) for qualitative synthesis, ensuring transparent data management and reproducibility [44,45]. Quantitative data will be summarized using descriptive statistics, including means, standard deviations, and frequency distributions, where applicable; consistent with the scoping-review design, no inferential or meta-analytic procedures will be performed. Qualitative data (e.g., user feedback and narrative findings) will undergo inductive thematic analysis following Braun and Clarke's six-phase framework [46]. Coding will be performed independently by at least two reviewers, who will iteratively compare and refine themes through reflexive discussion until consensus is reached. An audit trail of coding decisions will be maintained to enhance transparency and trustworthiness [42].
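As a minimal sketch of the planned descriptive analysis, assuming a CSV export from Covidence with hypothetical column names (study_design, country, target_condition), the R workflow might look as follows:

# Minimal sketch only; the file name and column names are hypothetical
# placeholders for the structure of the Covidence export.
library(dplyr)

extracted <- read.csv("covidence_export.csv", stringsAsFactors = FALSE)

# Frequency distribution of study designs across included studies
extracted %>% count(study_design, sort = TRUE)

# Cross-tabulation of targeted mental health condition by country
with(extracted, table(target_condition, country))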
Study characteristics (e.g., author, country, design, population, and context) will be summarized in structured tables and figures accompanied by a narrative overview. To map the GenAI-based chatbot interventions, a dedicated table will outline the identified chatbots, their key features (e.g., deployment platform, interaction mode), and the mental health problems they target.
User experience measures will be categorized and presented according to common themes such as acceptability, usability, engagement, and personalization, while safety mechanisms will be grouped into pre-deployment and deployment-phase strategies. To enhance conceptual clarity, pre-deployment (development-phase) safety mechanisms will be analyzed separately from deployment-phase risk mitigation strategies. The former include model-training safeguards, bias and accuracy testing, and data-protection measures applied before user interaction; the latter capture real-world implementation safeguards such as human-in-the-loop oversight, user-support features, crisis-response protocols, and the reporting of adverse events. This distinction will guide both data extraction and thematic synthesis. Overall, the narrative synthesis will integrate findings across themes, highlighting innovative practices, safety considerations, and implications for future research, policy, and practice. The findings will inform the ethical design, evaluation, and regulation of GenAI tools for mental health care, offering timely guidance for developers, researchers, and policymakers seeking to ensure human-centered and safe deployment.
Ethical considerations
This review involves analysis of publicly available literature and does not require ethical approval. Nonetheless, the broader ethical dimensions of GenAI mental health tools are recognized. Particular attention will be paid to how included studies address informed consent, data privacy, transparency, and the mitigation of potential psychological harm arising from chatbot use. These aspects will be highlighted in the synthesis to inform ethical best practices for future AI-enabled mental health interventions.
Limitations
We acknowledge a few anticipated limitations of this scoping review. First, the exclusion of non-English-language publications may introduce language bias and limit the global comprehensiveness of the findings. Second, expected variability across included studies in study design, reporting quality, and the operationalization of key constructs such as user experience and safety may constrain the ability to directly compare results or synthesize findings quantitatively. Finally, given the rapidly evolving field of generative AI, relevant interventions may exist outside the academic literature, such as unpublished, proprietary, or inadequately described tools. This may affect the completeness of our review and highlights the need for ongoing updates as new evidence emerges. Despite these limitations, the review's findings are expected to support the development of evidence-informed frameworks for the responsible and equitable integration of generative AI in mental health interventions.
Supporting information
S1 Appendix. Initial Search for OVID MEDLINE (3/07/2024).
https://doi.org/10.1371/journal.pone.0341631.s001
(DOCX)
S2 Appendix. PRISMA-P (Preferred Reporting Items for Systematic review and Meta-Analysis Protocols) 2015 checklist*.
https://doi.org/10.1371/journal.pone.0341631.s002
(PDF)
References
- 1. Dattani S, Rodés-Guirao L, Ritchie H, Roser M. Mental health. Our World in Data. 2023. https://ourworldindata.org/mental-health
- 2. Walker ER, McGee RE, Druss BG. Mortality in mental disorders and global disease burden implications: a systematic review and meta-analysis. JAMA Psychiatry. 2015;72(4):334–41. pmid:25671328
- 3. World Health Organization. World mental health report: transforming mental health for all. Geneva: World Health Organization; 2022. https://www.who.int/publications-detail-redirect/9789240049338
- 4. World Health Organization. Mental health atlas 2017. Geneva: World Health Organization. https://www.who.int/publications-detail-redirect/9789241514019
- 5. Kuhn E, Saleem M, Klein T, Köhler C, Fuhr DC, Lahutina S, et al. Interdisciplinary perspectives on digital technologies for global mental health. PLOS Glob Public Health. 2024;4(2):e0002867. pmid:38315676
- 6. Schueller SM, Torous J. Scaling evidence-based treatments through digital mental health. Am Psychol. 2020;75(8):1093–104. pmid:33252947
- 7. Naslund JA, Aschbrenner KA, Araya R, Marsch LA, Unützer J, Patel V, et al. Digital technology for treating and preventing mental disorders in low-income and middle-income countries: a narrative review of the literature. Lancet Psychiatry. 2017;4(6):486–500. pmid:28433615
- 8. Vaidyam AN, Wisniewski H, Halamka JD, Kashavan MS, Torous JB. Chatbots and Conversational Agents in Mental Health: A Review of the Psychiatric Landscape. Can J Psychiatry. 2019;64(7):456–64. pmid:30897957
- 9. Jabir AI, Martinengo L, Lin X, Torous J, Subramaniam M, Tudor Car L. Evaluating conversational agents for mental health: scoping review of outcomes and outcome measurement instruments. J Med Internet Res. 2023;25:e44548. pmid:37074762
- 10. Lim SM, Shiau CWC, Cheng LJ, Lau Y. Chatbot-delivered psychotherapy for adults with depressive and anxiety symptoms: a systematic review and meta-regression. Behav Ther. 2022;53(2):334–47. pmid:35227408
- 11. Abd-Alrazaq AA, Rababeh A, Alajlani M, Bewick BM, Househ M. Effectiveness and safety of using chatbots to improve mental health: systematic review and meta-analysis. J Med Internet Res. 2020;22(7):e16021. pmid:32673216
- 12. He Y, Yang L, Qian C, Li T, Su Z, Zhang Q, et al. Conversational Agent Interventions for Mental Health Problems: Systematic Review and Meta-analysis of Randomized Controlled Trials. J Med Internet Res. 2023;25:e43862. pmid:37115595
- 13. Bendotti H, Lawler S, Chan GCK, Gartner C, Ireland D, Marshall HM. Conversational artificial intelligence interventions to support smoking cessation: A systematic review and meta-analysis. Digit Health. 2023;9:20552076231211634. pmid:37928336
- 14. Borghouts J, Eikey E, Mark G, De Leon C, Schueller SM, Schneider M, et al. Barriers to and Facilitators of User Engagement With Digital Mental Health Interventions: Systematic Review. J Med Internet Res. 2021;23(3):e24387. pmid:33759801
- 15. Opie JE, Vuong A, Welsh ET, Esler TB, Khan UR, Khalil H. Outcomes of best-practice guided digital mental health interventions for youth and young adults with emerging symptoms: part ii. a systematic review of user experience outcomes. Clin Child Fam Psychol Rev. 2024;27(2):476–508. pmid:38634939
- 16. Liverpool S, Mota CP, Sales CMD, Čuš A, Carletto S, Hancheva C, et al. Engaging children and young people in digital mental health interventions: systematic review of modes of delivery, facilitators, and barriers. J Med Internet Res. 2020;22(6):e16317. pmid:32442160
- 17. Hornstein S, Zantvoort K, Lueken U, Funk B, Hilbert K. Personalization strategies in digital mental health interventions: a systematic review and conceptual framework for depressive symptoms. Front Digit Health. 2023;5:1170002. pmid:37283721
- 18. Abd-Alrazaq AA, Alajlani M, Ali N, Denecke K, Bewick BM, Househ M. Perceptions and opinions of patients about mental health chatbots: scoping review. J Med Internet Res. 2021;23(1):e17828. pmid:33439133
- 19. Darcy A, Daniels J, Salinger D, Wicks P, Robinson A. Evidence of human-level bonds established with a digital conversational agent: cross-sectional, retrospective observational study. JMIR Form Res. 2021;5(5):e27868. pmid:33973854
- 20. Beatty C, Malik T, Meheli S, Sinha C. Evaluating the therapeutic alliance with a free-text CBT conversational agent (Wysa): a mixed-methods study. Front Digit Health. 2022;4:847991. pmid:35480848
- 21. Li H, Zhang R, Lee Y-C, Kraut RE, Mohr DC. Systematic review and meta-analysis of AI-based conversational agents for promoting mental health and well-being. npj Digit Med. 2023;6(1).
- 22. Miner AS, Shah N, Bullock KD, Arnow BA, Bailenson J, Hancock J. Key Considerations for Incorporating Conversational AI in Psychotherapy. Front Psychiatry. 2019;10:746. pmid:31681047
- 23. OpenAI. GPT-3 powers the next generation of apps. https://openai.com/blog/gpt-3-apps. Accessed 2023 October 30.
- 24. Balcombe L. AI chatbots in digital mental health. Informatics. 2023;10(4):82.
- 25. De Choudhury M, Pendse SR, Kumar N. Benefits and harms of large language models in digital mental health. arXiv preprint. 2023. https://doi.org/10.48550/arXiv.2311.14693
- 26. Akinrinmade AO, Adebile TM, Ezuma-Ebong C, Bolaji K, Ajufo A, Adigun AO, et al. Artificial Intelligence in Healthcare: Perception and Reality. Cureus. 2023;15(9):e45594. pmid:37868407
- 27. Denecke K, Gabarron E. The ethical aspects of integrating sentiment and emotion analysis in chatbots for depression intervention. Front Psychiatry. 2024;15:1462083. pmid:39611131
- 28. Mörch C-M, Gupta A, Mishara BL. Canada protocol: An ethical checklist for the use of artificial Intelligence in suicide prevention and mental health. Artif Intell Med. 2020;108:101934. pmid:32972663
- 29. International Organization for Standardization. ISO 9241-210:2010(en), Ergonomics of human-system interaction — Part 210: Human-centred design for interactive systems. https://www.iso.org/obp/ui/#iso:std:iso:9241:-210:ed-1:v1:en
- 30. Obikane E, Sasaki N, Imamura K, Nozawa K, Vedanthan R, Cuijpers P, et al. Usefulness of implementation outcome scales for digital mental health (iOSDMH): experiences from six randomized controlled trials. Int J Environ Res Public Health. 2022;19(23):15792. pmid:36497867
- 31. Ma Z, Mei Y, Su Z. Understanding the benefits and challenges of using large language model-based conversational agents for mental well-being support. AMIA Annu Symp Proc. 2024;2023:1105–14. pmid:38222348
- 32. Siddals S, Torous J, Coxon A. “It happened to be the perfect thing”: experiences of generative AI chatbots for mental health. Npj Ment Health Res. 2024;3(1):48. pmid:39465310
- 33. Taher R, Hsu C-W, Hampshire C, Fialho C, Heaysman C, Stahl D, et al. The Safety of Digital Mental Health Interventions: Systematic Review and Recommendations. JMIR Ment Health. 2023;10:e47433. pmid:37812471
- 34. World Health Organization. Recommendations on digital interventions for health system strengthening. Geneva: World Health Organization; 2019. https://www.who.int/publications/i/item/9789241550505
- 35. Laranjo L, Dunn AG, Tong HL, Kocaballi AB, Chen J, Bashir R, et al. Conversational agents in healthcare: a systematic review. J Am Med Inform Assoc. 2018;25(9):1248–58. pmid:30010941
- 36. De Freitas J, Uğuralp AK, Oğuz-Uğuralp Z, Puntoni S. Chatbots and mental health: insights into the safety of generative AI. J Consum Psychol. 2023. https://doi.org/10.1002/jcpy.1393
- 37. Jargon J. WSJ News Exclusive | How a Chatbot Went Rogue. Wall Street Journal. 2023. https://www.wsj.com/articles/how-a-chatbot-went-rogue-431ff9f9
- 38. Xiang C. ‘He would still be here’: Man dies by suicide after talking with AI chatbot, widow says. Vice. 2023. https://www.vice.com/en/article/pkadgm/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says
- 39. Montgomery B. Mother says AI chatbot led her son to kill himself in lawsuit against its maker. The Guardian. 2024. https://www.theguardian.com/technology/2024/oct/23/character-ai-chatbot-sewell-setzer-death
- 40. Daneshvar N, Pandita D, Erickson S, Snyder Sulmasy L, DeCamp M, ACP Medical Informatics Committee and the Ethics, Professionalism and Human Rights Committee. Artificial intelligence in the provision of health care: an american college of physicians policy position paper. Ann Intern Med. 2024;177(7):964–7. pmid:38830215
- 41. Guo Z, Lai A, Thygesen JH, Farrington J, Keen T, Li K. Large Language Models for Mental Health Applications: Systematic Review. JMIR Ment Health. 2024;11:e57400. pmid:39423368
- 42. Aromataris E, Lockwood C, Porritt K, Pilla B, Jordan Z, editors. JBI Manual for Evidence Synthesis. JBI. https://jbi-global-wiki.refined.site/space/MANUAL. Accessed 2024 June 30.
- 43. Tricco AC, Lillie E, Zarin W, O’Brien KK, Colquhoun H, Levac D, et al. PRISMA extension for scoping reviews (PRISMA-ScR): checklist and explanation. Ann Intern Med. 2018;169(7):467–73. pmid:30178033
- 44. Covidence systematic review software. Melbourne: Veritas Health Innovation. https://www.covidence.org/. Accessed 2024 July 1.
- 45. Lumivero. NVivo (v14): qualitative data analysis software. https://lumivero.com/products/nvivo/. Accessed 2025 April 19.
- 46. Braun V, Clarke V. Using thematic analysis in psychology. Qual Res Psychol. 2006;3(2):77–101.