Abstract
Radiology reports are an integral part of patient medical records; however, these reports often contain complex medical terminology that is difficult for patients to comprehend, potentially leading to anxiety, misunderstanding, and misinterpretation. The development of user-friendly instruments to improve understanding is thus critically important to enhance health literacy and empower patients. In this study, we introduce a novel artificial intelligence (AI) interface, the Rads-Lit Tool, which can simplify radiology reports for patients using natural language processing (NLP) techniques. This manuscript presents the development process, methodology, and results of the Rads-Lit Tool, demonstrating its potential to simplify radiology reports across various examination types and complexity levels. Our findings highlight that patient-facing AI-driven tools can enhance patient health literacy and foster improved patient-provider communication in radiology.
Citation: Doshi RH, Amin K, Chan SM, Kaur M, Bajaj SS, Khosla P, et al. (2025) Development, optimization, and preliminary evaluation of a novel artificial intelligence tool to promote patient health literacy in radiology reports: The Rads-Lit tool. PLoS One 20(9): e0331368. https://doi.org/10.1371/journal.pone.0331368
Editor: Aloysius Gonzaga Mubuuke, Makerere University College of Health Sciences, UGANDA
Received: January 15, 2025; Accepted: August 13, 2025; Published: September 3, 2025
Copyright: © 2025 Doshi et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and the figures (in the main text and supplementary files).
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have no competing interests.
1. Introduction
A cornerstone of modern medicine, imaging has revolutionized diagnosis, treatment planning, and disease monitoring, with radiology reports serving as an indispensable component of patient medical records. Traditionally, these reports have been accessible only to radiologists and the referring providers, who then interpret findings for patients [1]. However, the rise of electronic portals has led to an increasing number of patients directly accessing their medical information, with the 21st Century Cures Act mandating access to all parts of the electronic health record (EHR) [2]. While this shift empowers patients to play a more active role in their care, the complexity of medical jargon and acronyms in radiology reports often leads to significant confusion, anxiety, and potential misinterpretation [3,4]. Simply looking up individual terms often fails to provide a cohesive understanding of the report’s narrative and its implications. This underscores a critical need for tools that can translate entire report findings into plain language, thereby enhancing patient comprehension and facilitating more meaningful patient-provider communication. Moreover, advanced imaging modalities continue to evolve rapidly, reinforcing the need for clear, transparent patient-facing reports.
Correspondingly, there has been growing attention toward health literacy and its associations with patient engagement, treatment adherence, and care disparities [5]. Advanced natural language processing (NLP), in turn, has emerged as one promising tool to help bridge the gap between complex medical information and ease of understanding [6–8]. In radiology in particular, NLP techniques have been used in everything from observation detection for diagnostic surveillance to quality assessment to clinical support services [9,10]. A few studies have sought to use NLP to simplify radiology reports, namely through linking terms to the consumer health vocabulary system [11] and the French lexical network [12].
With the rise of large language models (LLMs), OpenAI’s ChatGPT, Google Bard, and Microsoft Bing have also been explored for simplifying radiology reports [13,14]. The main advantage of these tools is their accessibility and comprehensiveness — freely available to anyone with an internet connection and able to simplify an entire radiology report, rather than simply appending a summary, linking to a glossary, or altering the structure. However, there are limitations. While general-purpose LLMs like ChatGPT are accessible, their effectiveness for specialized tasks like simplifying radiology reports heavily depends on user-crafted prompts, which can lead to variable quality and reliability [15,16]. Patients may not possess the expertise to formulate optimal prompts. Rads-Lit addresses this by embedding a systematically optimized prompt within a user-friendly interface, specifically designed for radiology reports. This aims to provide more consistent, reliable, and appropriately simplified outputs compared to ad-hoc use of general LLMs, thereby offering a more dependable solution for enhancing patient health literacy in this domain.
We have used a variety of prompts to assess these LLMs for accuracy and fidelity in simplifying radiology reports [15,16]; however, these data exist only for specific prompts, and it remains unclear whether the LLMs would be inaccurate or inadequate when given different prompts. Given the near-infinite variety of possible prompts and the correspondingly variable quality of responses, patients may not be able to take full advantage of these chatbots to improve their own health literacy. While general LLMs show promise, their direct application by patients for simplifying complex medical texts like radiology reports is fraught with challenges, including prompt variability and inconsistent output quality [13–16]. This highlights a critical gap: the need for specialized, optimized tools that can reliably simplify radiology reports to an appropriate health literacy level while maintaining clinical accuracy. Therefore, this study aimed to address the following research questions:
1.) Can a systematic prompt engineering process identify an optimal LLM prompt to simplify radiology reports to a target 5th-7th grade reading level?
2.) How does a specialized tool (Rads-Lit), utilizing an optimized prompt, perform in terms of readability improvement across various imaging modalities compared to original reports and a basic simplification prompt?
3.) What is the accuracy, completeness, and perceived safety (by radiologists) of the simplified reports generated by such a tool?
We present the development, optimization, and preliminary evaluation of the Rads-Lit Tool, an AI interface designed to address these questions. This work includes a novel methodology for LLM prompt assessment for health literacy, aiming to empower patients with understandable medical information, thereby fostering improved patient-provider communication and informed decision-making.
2. Methods
2.1. Development of interface
We developed a proof-of-concept user interface through which patients can input their radiology findings and receive a simplified version of those findings (http://radiologyliteracy.org/), utilizing OpenAI’s Davinci application programming interface (API). The interface underwent an optimization process, detailed below, so that patients’ imaging reports are simplified to the reading level recommended by the American Medical Association and the National Institutes of Health while maintaining accuracy [17,18].
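For illustration, a minimal sketch of how such an interface might call the legacy Davinci completions API with an embedded prompt is shown below. The model name, decoding parameters, and helper function are assumptions for illustration only; the manuscript does not specify these implementation details.

```python
# Minimal sketch (assumed implementation details): how the Rads-Lit back end
# might call OpenAI's legacy Davinci completions endpoint with the embedded
# prompt. Model name, decoding parameters, and error handling are illustrative.
import os
import openai  # legacy (<1.0) openai-python client

openai.api_key = os.environ["OPENAI_API_KEY"]

# The prompt ultimately selected in this study (Prompt D).
PROMPT_STEM = (
    "You are a health literacy tool. "
    "Interpret this radiology finding at the 5th grade level:"
)

def simplify_findings(findings_text: str) -> str:
    """Return a plain-language version of the pasted radiology findings."""
    response = openai.Completion.create(
        model="text-davinci-003",   # assumed Davinci model variant
        prompt=f"{PROMPT_STEM}\n\n{findings_text}",
        max_tokens=300,             # ample for the ~55-word median output
        temperature=0.2,            # low temperature for more consistent output
    )
    return response["choices"][0]["text"].strip()
```

Embedding the prompt on the server side in this way shifts the burden of prompt formulation away from the patient, consistent with the tool’s design goal.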
2.2. Dataset selection and modification
We sourced a random selection of 750 radiology reports across diverse examination types (150 each of MRI, CT, US [ultrasound], X-ray, and Mammogram) from the MIMIC-III database, a comprehensive dataset available from Beth Israel Deaconess Medical Center [19,20]. A random sub-selection of 25 reports was initially chosen to test our prompts. Redacted physician names in the reports were changed to “Dr. Smith” and redacted dates were changed to “prior.” Because this study used only de-identified, publicly available data, our institution’s IRB deemed the analysis exempt from further review. Given the de-identified nature of the data, patient consent for this specific retrospective study was waived.
2.3. Readability scores
To assess readability, we used the validated Gunning Fog (GF), Flesch-Kincaid Grade Level (FK), Automated Readability Index (ARI), and Coleman-Liau (CL) indices, as previous studies have done [16,21–23]. Each of these reading indices outputs a value corresponding to a reading grade level (RGL) (i.e., an output of 7 represents a 7th grade reading level). Further, in line with previous studies, we averaged the scores across all four indices to obtain an average RGL (aRGL) [24].
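As a concrete illustration, the aRGL can be computed with an off-the-shelf readability library; the sketch below uses the open-source textstat package, which is an assumption, as the manuscript does not state which implementation was used.

```python
# Sketch of the aRGL calculation using the open-source textstat package
# (one possible implementation; the manuscript does not specify the library used).
import textstat

def average_reading_grade_level(text: str) -> float:
    """Average of the Gunning Fog, Flesch-Kincaid, ARI, and Coleman-Liau grade levels."""
    scores = [
        textstat.gunning_fog(text),
        textstat.flesch_kincaid_grade(text),
        textstat.automated_readability_index(text),
        textstat.coleman_liau_index(text),
    ]
    return sum(scores) / len(scores)

# An aRGL of roughly 7 corresponds to a 7th grade reading level.
```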
2.4. Prompt engineering and optimization
Our prompt engineering involved a multi-stage iterative process (Fig 1). Stage 1 began with identifying five core simplifying stems (‘simplify’, ‘explain’, etc.) via a Delphi technique, tested with two modifiers for 15 initial prompts. Stage 2 focused on prompts that achieved below the median Stage 1 aRGL, adding grade level specifications and contextual phrases (e.g., ‘so I can understand’), resulting in 56 further prompts. Stage 3 took the top four stem/grade combinations and added two persona-based contexts (‘I am a patient,’ ‘you are a health literacy tool’) for 8 additional prompts. The full list of 79 prompts is detailed in S1 Fig. The five best-performing prompts (lowest median aRGLs) from these stages were then extensively tested on 750 reports, alongside a basic ‘simplify’ prompt, to select the final prompt for the Rads-Lit tool, again using a Delphi method for the final choice based on readability, fidelity and accuracy.
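To illustrate the combinatorial structure of this staged search (the exact wording of all 79 prompts is given in S1 Fig), a simplified, purely illustrative sketch follows; the specific stems, modifiers, and counts below are placeholders rather than the study’s full prompt set.

```python
# Purely illustrative sketch of the staged prompt construction; the actual
# stems, modifiers, and all 79 prompt strings are listed in S1 Fig.
from itertools import product

stems = ["Simplify", "Explain", "Interpret", "Summarize", "Translate"]  # placeholder stems
modifiers = ["this radiology finding", "this radiology report"]         # placeholder modifiers

# Stage 1: stems combined with modifiers (the study tested 15 initial prompts).
stage1 = [f"{stem} {modifier}:" for stem, modifier in product(stems, modifiers)]

# Stage 2: add target grade levels and contextual phrases to promising stems.
grades = ["at the 5th grade level", "at the 7th grade level"]           # placeholder targets
contexts = ["", " so I can understand"]
stage2 = [f"{stem} this radiology finding {grade}{context}:"
          for stem, grade, context in product(stems, grades, contexts)]

# Stage 3: prepend persona-based context to the top stem/grade combinations.
personas = ["I am a patient. ", "You are a health literacy tool. "]
stage3 = [f"{persona}{prompt}" for persona, prompt in product(personas, stage2[:4])]

candidate_prompts = stage1 + stage2 + stage3
```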
2.5. Accuracy, completeness, and comprehension of the Rads-Lit tool
After selecting the best-performing prompt, three radiologists (two attendings and one resident) assessed 62 of these reports and the corresponding simplified outputs. Specifically, these radiologists evaluated each output for accuracy, completeness, and extraneous information using single-item Likert scales, ranging from 1 (Strongly Disagree, 0–20% agreement) to 5 (Strongly Agree, 80–100% agreement).
3. Results
The prompt optimization process is summarized in Fig 1 and Table 1.
After determining the five best-performing prompts in the pilot analysis across Stages 1–3 (Table 1), we re-calculated the average readability scores of these prompts across 750 radiology reports (Table 2). These prompts, namely Prompts A–E, demonstrated significantly improved readability scores compared with the original radiology report and the common prompt, “Simplify:”, across all four readability indices tested (p < 0.0001) (Table 1, S2 Fig). The final five prompts had comparable readability scores, with aRGLs of 6.1 to 6.3, and the differences among them were not statistically significant.
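The manuscript reports p-values but does not name the statistical test; as one plausible approach, a paired nonparametric comparison of per-report aRGLs could be run as sketched below (the choice of the Wilcoxon signed-rank test is an assumption for illustration).

```python
# Assumed analysis sketch: paired comparison of per-report aRGLs before and
# after simplification. The manuscript reports p < 0.0001 but does not name
# the test; the Wilcoxon signed-rank test is one reasonable nonparametric choice.
from scipy.stats import wilcoxon

def compare_readability(original_argls, simplified_argls):
    """Paired test of aRGLs for the same reports before and after simplification."""
    statistic, p_value = wilcoxon(original_argls, simplified_argls)
    return statistic, p_value

# Hypothetical usage with per-report aRGL lists of equal length:
# stat, p = compare_readability(original_argls, simplified_argls)
```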
Given that there were no statistical differences between the five prompts, we utilized the Delphi method to select our prompt. Two reviewers blindly assessed five outputs from each of the five prompts for fidelity and accuracy and chose Prompt D, “You are a health literacy tool. Interpret this radiology finding at the 5th grade level:”. Despite not having the lowest readability score, we chose this prompt because it best retained fidelity while clearly defining the role of the LLM and providing a goal grade level (Table 2).
Across the 750 simplified radiology reports, our tool (utilizing Prompt D) showed statistically significant improvements in readability scores of findings compared to the raw radiologist reports on all four readability indices tested (N = 750, p < 0.0001) (Table 1, Fig 2, S3 Fig). The chosen prompt for our interface produced output with a median of 55 [38–72.5] words.
*, **, ***, **** correspond to p < 0.05, p < 0.01, p < 0.001, and p < 0.0001, respectively. aRGLs of the final prompt and report were used.
Across all imaging types, our tool (utilizing Prompt D) simplified radiology reports from an aRGL of 13.7 to an aRGL of 6.2 (Fig 2). CT radiology reports had an aRGL of 13.9, and our tool provided simplified outputs with an aRGL of 6.3. This simplification held across different examination types, including US (13.6 to 6.2), Mammogram (13.2 to 5.9), MRI (13.7 to 6.4), and X-ray (14.1 to 6.3; Table 1).
With the optimized prompt, we assessed our tool for accuracy, completeness of information, inclusion of beneficial supplementary information not found in the original, appropriate urgency, and comfort providing output without supervision, using a 5-level Likert-type evaluation (Fig 3). For accuracy, the radiologists predominantly found the simplified reports to be free of inaccuracies or misleading details, with agreement or strong agreement recorded for 255 of 300 rated outputs (85%); they indicated disagreement or strong disagreement for 24 outputs (8%). Regarding the inclusion of all pertinent or actionable details from the original reports in the simplified versions, the radiologists agreed or strongly agreed for 249 outputs (83%) and disagreed or strongly disagreed for 18 outputs (6%). Beneficial supplementary information not found in the original impression was judged present in 81 outputs (27%), while the radiologists disagreed or strongly disagreed for 189 outputs (63%). On the matter of comfortably sharing the simplified reports directly with patients without additional oversight, the radiologists expressed agreement or strong agreement for 230 outputs (76.7%) and reservations for 39 outputs (13%). For 261 outputs (87%), the radiologists felt that the simplified reports adeptly communicated the required urgency; disagreement or strong disagreement was recorded for only 8 outputs (2.7%).
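For clarity on how the percentages above are derived, the sketch below shows one way the Likert ratings could be aggregated into agreement and disagreement proportions; the inputs and variable names are illustrative, as the underlying ratings are reported only in aggregate.

```python
# Illustrative aggregation of Likert ratings into the agreement percentages
# reported above; the individual ratings are not released, so the inputs here
# are placeholders.
from collections import Counter

def agreement_summary(ratings):
    """ratings: Likert scores (1-5) pooled across radiologists and outputs."""
    counts = Counter(ratings)
    n = len(ratings)
    agree = counts[4] + counts[5]      # Agree or Strongly Agree
    disagree = counts[1] + counts[2]   # Disagree or Strongly Disagree
    return {"agree_pct": 100 * agree / n, "disagree_pct": 100 * disagree / n}

# Example: 255 accuracy ratings of 4 or 5 out of 300 total yields 85% agreement.
```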
4. Discussion
Our proof-of-concept study presents the development and evaluation of the Rads-Lit patient interface. Through an iterative comparative assessment, we evaluated 79 distinct prompts to determine which five best simplified radiologist reports, achieving the lowest RGLs relative to the common prompt (“Simplify”). Ultimately, we selected the prompt “You are a health literacy tool. Interpret this radiology finding at the 5th grade level” for further testing based on our team’s previous study on the importance of context for OpenAI’s LLM [25]. Evaluations by the radiologists confirmed that the output from this prompt was accurate 85% of the time with little to no extraneous content relative to the original radiology report. While our findings suggest there is room for model refinement, they also underscore the importance of balancing the clarity of simplified reports with preserving nuanced clinical details integral to patient care. Importantly, simplified language does not equate to clinical guidance. Without proper framing, patients may misinterpret these outputs as standalone diagnostic conclusions.
Given the critical need to promote radiological literacy, approaches such as summary statements, [26] language glossaries, [27] structured templates with standardized lexicon [28,29], video radiology reports [30,31], and listing the radiologist’s phone number have been proposed [32]. This study highlights the utility of using NLP and LLMs to simplify radiology reports, as well as several areas of further research. For example, radiologists’ opinions diverged regarding the inclusion of supplementary information, with only 27% of the outputs found beneficial and a notable 63% disagreement, suggesting that a one-size-fits-all approach may not be ideal. This variability reinforces the need for clear communication that simplified reports are meant to enhance, rather than replace clinical consultation. Some patients might benefit from additional context, while others might find this supplementary information extraneous or confusing. Additionally, the collective comfort of the radiologists in sharing 76.7% of the simplified reports directly with patients indicates the tool’s potential in promoting patient autonomy and comprehension. Still, given the novelty of this technology and risk of hallucinations [33,34], it is unlikely that radiologists would trust the Rads-Lit patient interface to operate autonomously. Indeed, in a previous single-center survey, 76.9% of radiologists reported that they would not support AI-generated simplifications of reports without a manual check [35]. A safety net, possibly in the form of a reviewing radiologist, will likely still be necessary. Hospitals and health systems must consider embedding controls within such tools, such as requiring clinician review before release, clear disclaimers on patient-facing outputs, and integration with the electronic health record to ensure traceability. These institutional guardrails are critical to prevent misuse, including inappropriate self-management or misinformed clinical decision-making by patients.
Patients are already using LLMs like OpenAI’s ChatGPT, Google Bard, and Microsoft Bing to better understand their medical care; however, given the near-infinite variety of possible prompts, most people are not providing additional context that they are a patient, that the chatbot should act as a health literacy tool, or that simplification should happen at a specific grade level [36–38]. Indeed, our previous work has shown that the prompt “Simplify” on ChatGPT-3.5 consistently produces outputs too complicated for the average American’s 7th grade reading level [13,17,18]. Other research from our group suggests that the level of simplification differs based on racial context: OpenAI’s ChatGPT-3.5 and ChatGPT-4 simplified radiology reports at a higher reading level for those who self-identified as White or Asian, when compared to those who self-identified as Black or American Indian/Alaska Native [39]. These findings raise important concerns about equity and implicit bias in LLM-generated outputs. While the reasons for these disparities remain unclear, they may reflect systemic inequities embedded in the training data or variation in how different demographic identifiers interact with the model [40].
Some small studies have explored the accuracy of LLM-simplified radiology reports, with variable results [41]. For example, using 3 fictitious radiology reports, Jeblick et al. created 15 simplified reports and found that one-third of these reports had incorrect statements, missing key medical information, or potentially misleading passages [42]. Tepe et al. analyzed 30 simplified radiology reports and found that their readability and understandability were significantly improved, although their accuracy in assessing the urgency of medical conditions was inadequate [43]. In contrast, analyses of simplified reports from 20 cardiovascular MRIs and 60 shoulder, knee, and lumbar spine MRIs found that GPT-4 produced highly accurate reports with minimal confusing or inaccurate output [44,45]. Our group has similarly shown that, in an analysis of 150 mammography, X-ray, CT, MRI, and ultrasound scans, radiologists found that 83–86% of radiology reports simplified by GPT-3.5 and GPT-4 had no errors and contained all essential information [16]. However, to our knowledge, no previous work has comprehensively assessed accuracy, completeness of information, appropriate urgency, and comfort providing output without supervision, especially in such a large sample of 750 radiology reports across diverse examination types. Our results suggest the importance of implementing standard prompts and guidelines for LLM-based patient education in order to maximize the utility of these tools and to improve equity in health communication.
Our proof-of-concept tool explores the capabilities of LLMs and suggests that these tools can be safely incorporated into radiology practice. With a median output of 55 words in this study, a radiologist should be able to review the simplified output quickly, but the benefits to patients must be weighed against these disruptions to providers, as RVUs may not account for this additional review time and a separate billing code is unlikely. One example implementation of such a technology is within a speech recognition and reporting platform, where LLMs and NLP could readily generate a simplified summary that the radiologist can proof and, if needed, edit. Alternatively, this technology could be implemented in EHRs, allowing a radiologist to review the simplified output before signing a report. There is evidence that a simplified report or summary may benefit patients and improve patient satisfaction scores, but adoption by providers and the impact on providers must also be assessed. Importantly, we must address concerns about unintended consequences. What if, for instance, these simplified reports lead to heightened anxiety? And do they truly offer time-saving benefits to referring physicians when they discuss findings with their patients? Although our work is foundational, it is essential to be proactive in this discussion and consider the potential implications [46].
This study is not without its limitations. First, we relied primarily on readability metrics to guide our prompt engineering. Although we utilized the Delphi method to review the outputs of the five prompts with the lowest readability scores, identify the best prompt, and ensure that the prompts retained fidelity and accuracy, we did not use accuracy to select our prompt until that point. Second, the readability metrics used in this study are language- and structure-focused, so these measures may not necessarily capture comprehensibility from a medical perspective. Additionally, readability metrics may not adequately capture patient literacy needs. This highlights the need for further research to refine the interface and ensure its applicability across diverse patient populations and radiology subspecialties, as well as to explore the potential for personalized approaches to simplifying radiology reports that consider individual patient characteristics such as age, education, and prior medical knowledge. Moreover, future iterations must prioritize clear guardrails and user education to ensure that patients understand that simplified reports are adjunctive, not directive, and that clinical follow-up remains essential. Finally, we relied on two attendings and one resident rating outputs on a Likert scale, when in reality the incorporation of patient feedback and broader physician input on the simplified reports is crucial to evaluate the tool’s real-world usability and its potential for enhancing patient-provider communication. Future multi-site studies should validate these readability metrics in real-world settings, ensuring their reliability and relevance to patient outcomes.
It is crucial to reiterate that the Rads-Lit Tool is designed as a health literacy aid to simplify existing radiology report text, not as a diagnostic or clinical decision support system. As challenges across the broader AI field make clear, these models, including the one underpinning Rads-Lit, currently lack the ‘gestalt’ understanding required to interpret findings within the full, complex clinical context of an individual patient, especially concerning comorbidities or the relative intensity of multiple pathologies. While the tool improves access to comprehension, it does not obviate the need for follow-up with a referring clinician. Misinterpretation, such as assuming a non-urgent finding requires no action or self-treating based on the simplified language, poses real risks if safeguards are not in place. Our tool does not attempt such interpretation; its purpose is solely to make the finalized report more comprehensible to patients after it has been completed by a radiologist.
We believe that this proof-of-concept tool can help begin a discussion of how to utilize such LLMs for patient-centered care in radiology. In our tool, we also showcase preliminary features (which remain untested) that allow patients to press “explain more” if they do not understand, or want to learn more about, a particular sentence of the generated output, and that allow physicians to edit responses when using the tool to give patients a better understanding of their report findings (Fig 4). As LLM solutions become more common, discussion regarding the implementation of such tools is of utmost importance.
In conclusion, this study outlines the development and preliminary evaluation of Rads-Lit, demonstrating that an AI tool with optimized prompting can significantly improve the readability of radiology reports. Our findings suggest its potential as a specialty-specific health literacy aid, shifting the onus of prompt engineering from the patient to a systematically designed interface (Fig 4). While improved patient-provider communication and patient-centered care are key goals, this study represents an initial step. Extensive further research, including direct patient feedback, validation across diverse populations and clinical settings, and assessment of real-world impact on patient understanding and outcomes, is crucial before such tools can be broadly incorporated into clinical practice [47]. Most importantly, any implementation of such tools must prioritize patient safety by incorporating robust safeguards to prevent misinterpretation and inappropriate self-management, ensuring that simplified reports enhance rather than replace essential clinical relationships. Continued collaboration among patients, clinicians, developers, and policymakers will be essential to responsibly harness this technology to support, rather than replace, human-centered care.
Supporting information
S1 Fig. All prompts tested are depicted.
Best prompts after each stage are denoted.
https://doi.org/10.1371/journal.pone.0331368.s001
(PNG)
S2 Fig. Five best prompts compared to the basic prompt “Simplify.”
*, **, ***, **** correspond to p < 0.05, p < 0.01, p < 0.001, and p < 0.0001, respectively. Dashed line depicts 8th grade level.
https://doi.org/10.1371/journal.pone.0331368.s002
(PNG)
S3 Fig. Rads-Lit vs. radiologist report.
*, **, ***, **** correspond to p < 0.05, p < 0.01, p < 0.001, and p < 0.0001, respectively.
https://doi.org/10.1371/journal.pone.0331368.s003
(PNG)
Acknowledgments
We thank Yale Department of Radiology for their support. We thank Arav Doshi for his help with data visualization and analysis.
References
- 1. Olthof AW, de Groot JC, Zorgdrager AN, Callenbach PMC, van Ooijen PMA. Perception of radiology reporting efficacy by neurologists in general and university hospitals. Clin Radiol. 2018;73(7):675.e1–675.e7. pmid:29622361
- 2. Provider Obligations For Patient Portals Under The 21st Century Cures Act. Forefront Group. Health Affairs (Project Hope). 2022.
- 3. Bruno B, Steele S, Carbone J, Schneider K, Posk L, Rose SL. Informed or anxious: patient preferences for release of test results of increasing sensitivity on electronic patient portals. Health Technol (Berl). 2022;12(1):59–67. pmid:35036280
- 4. Mehan WA Jr, Gee MS, Egan N, Jones PE, Brink JA, Hirsch JA. Immediate Radiology Report Access: A Burden to the Ordering Provider. Curr Probl Diagn Radiol. 2022;51(5):712–6. pmid:35193795
- 5. Berkman ND, Sheridan SL, Donahue KE. Health literacy interventions and outcomes: an updated systematic review. Evid Rep Technol Assess (Full Rep). 2011;199:1–941.
- 6. Doshi RH, Bajaj SS, Krumholz HM. ChatGPT: Temptations of Progress. Am J Bioeth. 2023;23(4):6–8. pmid:36853242
- 7. Sabet CJ, Bajaj SS, Stanford FC, Celi LA. Equity in Scientific Publishing: Can Artificial Intelligence Transform the Peer Review Process? Mayo Clin Proc Digit Health. 2023;1(4):596–600. pmid:40206303
- 8. Patel AV, Jasani S, AlAshqar A, Doshi RH, Amin K, Panakam A, et al. Comparative Evaluation of Artificial Intelligence Models for Contraceptive Counseling. Digital. 2025;5(2):10.
- 9. Amin K, Khosla P, Doshi R, Chheang S, Forman HP. Artificial Intelligence to Improve Patient Understanding of Radiology Reports. Yale J Biol Med. 2023;96(3):407–17. pmid:37780992
- 10. Lakhani P, Langlotz CP. Automated detection of radiology reports that document non-routine communication of critical or significant results. J Digit Imaging. 2010;23(6):647–57. pmid:19826871
- 11. Qenam B, Kim TY, Carroll MJ, Hogarth M. Text Simplification Using Consumer Health Vocabulary to Generate Patient-Centered Radiology Reporting: Translation and Evaluation. J Med Internet Res. 2017;19(12):e417. pmid:29254915
- 12. Ramadier L, Lafourcade M. Radiological Text Simplification Using a General Knowledge Base. In: Gelbukh A, ed. Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science. Springer International Publishing; 2018:617–27.
- 13. Doshi R, Amin K, Khosla P, Bajaj S, Chheang S, Forman HP. Utilizing Large Language Models to Simplify Radiology Reports: a comparative analysis of ChatGPT3.5, ChatGPT4.0, Google Bard, and Microsoft Bing. Cold Spring Harbor Laboratory. 2023.
- 14. Sun Z, Ong H, Kennedy P, Tang L, Chen S, Elias J, et al. Evaluating GPT4 on Impressions Generation in Radiology Reports. Radiology. 2023;307(5):e231259. pmid:37367439
- 15. Amin KS, Mayes L, Khosla P, Doshi R. ChatGPT-3.5, ChatGPT-4, Google Bard, and Microsoft Bing to improve health literacy and communication in pediatric populations and beyond. 2023.
- 16. Amin KS, Davis MA, Doshi R, Haims AH, Khosla P, Forman HP. Accuracy of ChatGPT, Google Bard, and Microsoft Bing for Simplifying Radiology Reports. Radiology. 2023;309(2):e232561. pmid:37987662
- 17. Weiss BD. Health Literacy and Patient Safety: Help Patients Understand: Manual for Clinicians. AMA Foundation. 2007.
- 18. How to Write Easy-to-Read Health Materials. U.S. National Library of Medicine. 2016. https://www.nlm.nih.gov/medlineplus/etr.html
- 19. Johnson A, Pollard T, Mark R. MIMIC-III Clinical Database. 2016. https://doi.org/10.13026/C2XW26
- 20. Johnson AEW, Pollard TJ, Shen L, Lehman L-WH, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. 2016;3:160035. pmid:27219127
- 21. Chen W, Durkin C, Huang Y, Adler B, Rust S, Lin S. Simplified Readability Metric Drives Improvement of Radiology Reports: an Experiment on Ultrasound Reports at a Pediatric Hospital. J Digit Imaging. 2017;30(6):710–7. pmid:28484918
- 22. Amin KS, Mayes LC, Khosla P, Doshi RH. Assessing the Efficacy of Large Language Models in Health Literacy: A Comprehensive Cross-Sectional Study. Yale J Biol Med. 2024;97(1):17–27. pmid:38559461
- 23. Doshi RH, Amin KS, Kapadia S, McKenzie P, Preda-Naumescu A, Tkachenko E, et al. Characteristics of information on inflammatory skin diseases produced by four large language models. Int J Dermatol. 2025;64(4):773–5. pmid:39523532
- 24. Pearson K, Ngo S, Ekpo E, Sarraju A, Baird G, Knowles J, et al. Online Patient Education Materials Related to Lipoprotein(a): Readability Assessment. J Med Internet Res. 2022;24(1):e31284. pmid:35014955
- 25. Doshi R, Amin KS, Khosla P, Bajaj SS, Chheang S, Forman HP. Quantitative Evaluation of Large Language Models to Streamline Radiology Report Impressions: A Multimodal Retrospective Analysis. Radiology. 2024;310(3):e231593. pmid:38530171
- 26. Kadom N, Tamasi S, Vey BL, Safdar N, Applegate KE, Sadigh G, et al. Info-RADS: Adding a Message for Patients in Radiology Reports. J Am Coll Radiol. 2021;18(1 Pt A):128–32. pmid:33068534
- 27. Martin-Carreras T, Kahn CE Jr. Coverage and Readability of Information Resources to Help Patients Understand Radiology Reports. J Am Coll Radiol. 2018;15(12):1681–6. pmid:29310924
- 28. Panicek DM, Hricak H. How Sure Are You, Doctor? A Standardized Lexicon to Describe the Radiologist’s Level of Certainty. AJR Am J Roentgenol. 2016;207(1):2–3. pmid:27065212
- 29. Vincoff NS, Barish MA, Grimaldi G. The patient-friendly radiology report: history, evolution, challenges and opportunities. Clin Imaging. 2022;89:128–35. pmid:35803159
- 30. Cook TS, Oh SC, Kahn CE Jr. Patients’ Use and Evaluation of an Online System to Annotate Radiology Reports with Lay Language Definitions. Acad Radiol. 2017;24(9):1169–74. pmid:28433519
- 31. Recht MP, Westerhoff M, Doshi AM, Young M, Ostrow D, Swahn D-M, et al. Video Radiology Reports: A Valuable Tool to Improve Patient-Centered Radiology. AJR Am J Roentgenol. 2022;219(3):509–19. pmid:35441532
- 32. Cross NM, Wildenberg J, Liao G, Novak S, Bevilacqua T, Chen J, et al. The voice of the radiologist: Enabling patients to speak directly to radiologists. Clin Imaging. 2020;61:84–9. pmid:31986355
- 33. Sallam M, Al-Mahzoum K, Alaraji H, Albayati N, Alenzei S, AlFarhan F, et al. Apprehension Toward Generative Artificial Intelligence in Healthcare: A Multinational Study among Health Sciences Students. MDPI AG. 2024.
- 34. Kim Y, Jeong H, Chen S, Li SS, Lu M, Alhamoud K, et al. Medical Hallucination in Foundation Models and Their Impact on Healthcare. Cold Spring Harbor Laboratory. 2025.
- 35. Amin KS, Davis MA, Naderi A, Forman HP. Release of complex imaging reports to patients, do radiologists trust AI to help? Curr Probl Diagn Radiol. 2025;54(2):147–50. pmid:39676024
- 36. Amin K, Doshi R, Forman HP. Large language models as a source of health information: Are they patient-centered? A longitudinal analysis. Healthc (Amst). 2024;12(1):100731. pmid:38141269
- 37. Tandar CE, Bajaj SS, Stanford FC. Social Media and Artificial Intelligence-Understanding Medical Misinformation Through Snapchat’s New Artificial Intelligence Chatbot. Mayo Clin Proc Digit Health. 2024;2(2):252–4. pmid:38962215
- 38. Stokel-Walker C. How patients are using AI. BMJ. 2024;387:q2393. pmid:39562011
- 39. Amin KS, Forman HP, Davis MA. Even with ChatGPT, race matters. Clin Imaging. 2024;109:110113. pmid:38552383
- 40. Jain B, Doshi R, Nundy S. Leveraging Artificial Intelligence to Advance Health Equity in America’s Safety Net. J Gen Intern Med. 2025. pmid:40375041
- 41. Parillo M, Vaccarino F, Beomonte Zobel B, Mallio CA. ChatGPT and radiology report: potential applications and limitations. Radiol Med. 2024;129(12):1849–63. pmid:39508933
- 42. Jeblick K, Schachtner B, Dexl J. ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports. 2022.
- 43. Tepe M, Emekli E. Decoding medical jargon: The use of AI language models (ChatGPT-4, BARD, microsoft copilot) in radiology reports. Patient Educ Couns. 2024;126:108307. pmid:38743965
- 44. Salam B, Kravchenko D, Nowak S, Sprinkart AM, Weinhold L, Odenthal A, et al. Generative Pre-trained Transformer 4 makes cardiovascular magnetic resonance reports easy to understand. J Cardiovasc Magn Reson. 2024;26(1):101035. pmid:38460841
- 45. Kuckelman IJ, Wetley K, Yi PH, Ross AB. Translating musculoskeletal radiology reports into patient-friendly summaries using ChatGPT-4. Skeletal Radiol. 2024;53(8):1621–4. pmid:38270616
- 46. Khosla P, Amin K, Doshi R. Combating Chronic Disease with Barbershop Health Interventions: A Review of Current Knowledge and Potential for Big Data. Yale J Biol Med. 2024;97(2):239–45. pmid:38947107
- 47. Jain P, Jain B, Doshi R, Jain U, Claypoof H, Aboulatta A, et al. Digital Health: An Opportunity to Advance Health Equity for People with Disabilities. Milbank Quarterly. 2025.