Assessment of ChatGPT-generated medical Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease

Saleh A. Alqahtani; Reem S. AlAhmed; Waleed S. AlOmaim; Saad Alghamdi; Waleed Al-Hamoudi; Khalid Ibrahim Bzeizi; Ali Albenmousa; Alessio Aghemo; Nicola Pugliese; Cesare Hassan; Faisal A. Abaalkhail

doi:10.1371/journal.pone.0317929

Abstract

Background and aim

Artificial intelligence (AI)-powered chatbots, such as Chat Generative Pretrained Transformer (ChatGPT), have shown promising results in healthcare settings. These tools can help patients obtain real-time responses to queries, ensuring immediate access to relevant information. The study aimed to explore the potential use of ChatGPT-generated medical Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease (MASLD).

Methods

An English patient questionnaire on MASLD was translated into Arabic, and the Arabic questions were entered into ChatGPT 3.5 on November 12, 2023. Ten Saudi MASLD experts who were native Arabic speakers rated the responses on Likert scales for 1) accuracy, 2) completeness, and 3) comprehensibility. The questions were grouped into 3 domains: (1) Specialist referral, (2) Lifestyle, and (3) Physical activity.

Results

The mean accuracy score was 4.9 ± 0.94 on a 6-point Likert scale, corresponding to “Nearly all correct.” Kendall’s coefficient of concordance (KCC) ranged from 0.025 to 0.649, with a mean of 0.28, indicating moderate agreement among all 10 experts. The mean completeness score was 2.4 ± 0.53 on a 3-point Likert scale, corresponding to “Comprehensive” (KCC: 0.03–0.553; mean: 0.22). The mean comprehensibility score was 2.74 ± 0.52 on a 3-point Likert scale, indicating that the responses were “Easy to understand” (KCC: 0.00–0.447; mean: 0.25).

Conclusion

MASLD experts found that ChatGPT responses were accurate, complete, and comprehensible. The results support the increasing trend of leveraging the power of AI chatbots to revolutionize the dissemination of information for patients with MASLD. However, many AI-powered chatbots require further enhancement of scientific content to avoid the risks of circulating medical misinformation.

Citation: Alqahtani SA, AlAhmed RS, AlOmaim WS, Alghamdi S, Al-Hamoudi W, Bzeizi KI, et al. (2025) Assessment of ChatGPT-generated medical Arabic responses for patients with metabolic dysfunction–associated steatotic liver disease. PLoS ONE 20(2): e0317929. https://doi.org/10.1371/journal.pone.0317929

Editor: Anna Di Sessa, Universita degli Studi della Campania Luigi Vanvitelli Scuola di Medicina e Chirurgia, ITALY

Received: October 25, 2024; Accepted: January 7, 2025; Published: February 3, 2025

Copyright: © 2025 Alqahtani et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the article and its Supporting information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Metabolic dysfunction–associated steatotic liver disease (MASLD), formerly known as non-alcoholic fatty liver disease (NAFLD), is a global health concern closely linked to the obesity epidemic and sedentary lifestyles [1, 2]. MASLD encompasses a full spectrum of conditions resulting from metabolic imbalances, including metabolic dysfunction-associated steatohepatitis (MASH), previously called non-alcoholic steatohepatitis (NASH) [1, 3]. MASLD and MASH impose enormous financial and health burdens across countries, including those in the Arabic-speaking world [4–6]. Early detection and treatment of MASLD are crucial to prevent progression to more severe stages such as cirrhosis and hepatocellular carcinoma [1, 7]. However, barriers to healthcare access and limited health literacy can make it difficult to manage this condition effectively [8, 9].

In the digital age, artificial intelligence (AI) applications in healthcare offer innovative solutions to such challenges. Chatbots powered by advanced AI models, like the Chat Generative Pretrained Transformer (ChatGPT), can supplement patient education and engagement outside the clinical setting [10]. With their ability to process and produce human-like text, these AI tools can deliver instant, reliable medical information and support, potentially transforming patient self-management practices [11, 12].

A previous study that evaluated ChatGPT’s effectiveness in answering patient inquiries about MASLD and associated lifestyle factors found that its responses were accurate (mean score of 4.84 on a 6-point Likert scale), comprehensive (mean score of 2.08 on a 3-point scale), and easy to understand (mean score of 2.87 on a 3-point scale) [13]. Nonetheless, the variability in ChatGPT’s responses may be attributable to factors such as the training dataset, context, and language [13].

Despite the promise of AI-powered interventions, their effectiveness for Arabic-speaking patients with MASLD remains underexplored. We therefore aimed to explore the potential use of ChatGPT to generate medical responses in Arabic for patients with MASLD, assessing the accuracy, completeness, and comprehensibility of those responses as an informational resource.

Materials and methods

A cross-sectional study assessed the effectiveness of ChatGPT in providing medical responses to Arabic-speaking patients with MASLD. The process followed three main steps: 1) a validated English-language patient questionnaire on MASLD [13] was translated into Arabic by the MASLD experts and an independent researcher to ensure linguistic and contextual accuracy from a patient perspective; 2) the translated questions were then entered separately into ChatGPT 3.5 on November 12, 2023, simulating a realistic scenario in which a patient seeks information about MASLD; and 3) ten MASLD experts from Saudi Arabia, who were native Arabic speakers and fluent in English, independently evaluated the AI-generated responses. Data were collected from 01/31/2024 through 02/10/2024. For the survey and questionnaire, we primarily used Classical Arabic, the standard for formal and business writing, to provide a common linguistic framework across diverse Arabic-speaking populations.

Three domains were assessed using separate Likert scales: 1) Accuracy: responses were rated on a 6-point Likert scale ranging from ‘Completely incorrect’ to ‘Correct’; 2) Completeness: a 3-point Likert scale categorized responses as ‘Incomplete’, ‘Adequate’, or ‘Comprehensive’; and 3) Comprehensibility: the intelligibility of responses was rated on a 3-point Likert scale anchored by ‘Difficult’, ‘Partly difficult’, and ‘Easy to understand’.

An additional open-ended question was integrated into the Arabic questionnaire to gather detailed feedback and expert commentary on the AI-generated response quality. This structured evaluation method aimed to capture the nuanced perspectives of clinical experts regarding the application of ChatGPT in patient education and its potential role in improving MASLD patient care within Arabic-speaking populations.

Statistical analysis

Data were analyzed using the Statistical Package for the Social Sciences (SPSS), version 28 (IBM Corp., N.Y., USA). To assess the potential usability of ChatGPT’s Arabic responses for patients with MASLD, the non-parametric Kendall’s tau correlation test was used to examine the association between experts’ ratings, based on the ordinal Likert-scale data for the three domains, and to determine the direction and strength of the relationships between the variables under study. Mean scores on the 6- and 3-point Likert scales, Kendall’s coefficients of concordance, and their ranges are reported.
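Although the Methods name Kendall’s tau, the Results report Kendall’s coefficient of concordance (W), which summarizes agreement among several raters at once: W = 12S / (m²(n³ − n) − mΣT), where m is the number of raters, n the number of items, S the sum of squared deviations of the item rank sums from their mean, and T the per-rater tie term Σ(t³ − t). The sketch below computes W with that tie correction, which matters because ties are frequent on 3- and 6-point Likert scales. It is a minimal illustration assuming a complete raters × items matrix; the analysis itself was run in SPSS, this is not its implementation, and the example scores are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores 1..n (ascending); tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next item ties with the first item in it.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def tie_term(scores: Sequence[float]) -> int:
    """Sum of (t^3 - t) over each group of t tied scores given by one rater."""
    counts: Dict[float, int] = {}
    for s in scores:
        counts[s] = counts.get(s, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kendalls_w(ratings: Sequence[Sequence[float]]) -> float:
    """Kendall's W for ratings[rater][item], with the usual correction for ties."""
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_rank_sum = m * (n + 1) / 2  # mean of the item rank sums
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    denominator = m ** 2 * (n ** 3 - n) - m * sum(tie_term(row) for row in ratings)
    if denominator == 0:
        raise ValueError("W is undefined: one item, or every rater gave all items the same score")
    return 12 * s / denominator


if __name__ == "__main__":
    # Hypothetical accuracy scores: 3 experts (rows) x 4 questions (columns).
    experts = [[5, 6, 4, 5], [5, 6, 3, 4], [6, 6, 4, 5]]
    print(round(kendalls_w(experts), 3))
```

W runs from 0 (no agreement in how raters rank the items) to 1 (identical rankings). Because it measures agreement on the relative ordering of items, it can be low even when most scores are high, as can happen when many responses cluster near the top of a short scale.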

Ethical statement

Ethical approval for this study was obtained from the Research Ethics Committee (REC) of King Faisal Specialist Hospital & Research Center, Riyadh, Saudi Arabia (RAC #2241013) on 01/29/2024. The REC recommended approval of the study with a waiver of signed, documented consent; participating MASLD experts’ submission of the survey was considered consent.

Results

Accuracy

The mean accuracy score was 4.92 ± 0.94 on a 6-point Likert scale, corresponding to “Nearly all correct”. Kendall’s coefficient of concordance ranged from 0.025 to 0.649, with a mean of 0.28, indicating a moderate level of agreement among all 10 experts. Question 5 had the highest mean score (5.3), corresponding to “Correct”, and question 13 the lowest (4.3), corresponding to “More correct than incorrect”. Among the three domains, Physical Activity had the highest mean accuracy score (5.07 ± 0.83) and Specialist Referral the lowest (4.70 ± 1.02) (Fig 1).
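The domain-level figures (for example, 5.07 ± 0.83 for Physical Activity) summarize many expert ratings at once. A minimal sketch of that arithmetic follows, under two assumptions the text does not state: ratings are pooled across all questions and experts in a domain, and the spread is the sample (n − 1) standard deviation. The function name, ratings, and question-to-domain assignment are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

# Hypothetical layout: question number -> one score per expert (illustrative only).
Ratings = Dict[int, List[int]]


def domain_summary(ratings: Ratings, domains: Dict[str, List[int]]) -> Dict[str, Tuple[float, float]]:
    """Pool all expert ratings for the questions in each domain; return (mean, sample SD)."""
    summary = {}
    for name, questions in domains.items():
        pooled = [score for q in questions for score in ratings[q]]
        # statistics.stdev uses the n - 1 denominator (sample standard deviation).
        summary[name] = (statistics.mean(pooled), statistics.stdev(pooled))
    return summary


if __name__ == "__main__":
    toy_ratings = {1: [5, 6, 4], 2: [5, 5, 6], 3: [6, 6, 5]}
    toy_domains = {"Specialist referral": [1], "Physical activity": [2, 3]}
    for name, (mean, sd) in domain_summary(toy_ratings, toy_domains).items():
        print(f"{name}: {mean:.2f} ± {sd:.2f}")
```

Pooling weights every individual rating equally; averaging per-question means instead would give each question equal weight, which yields the same mean only when every question has the same number of ratings.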


Fig 1. Accuracy score.

Box plot showing the distribution of accuracy scores for each question. Graph shows the interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).

https://doi.org/10.1371/journal.pone.0317929.g001

Completeness

The mean completeness score was 2.37 ± 0.53 on a 3-point Likert scale, corresponding to “Comprehensive”. Kendall’s coefficient ranged from 0.03 to 0.553, with a mean of 0.22, indicating a moderate level of agreement among all 10 experts. Question 8 had the highest mean score (2.6), corresponding to “Comprehensive”, and question 1 the lowest (2.1), corresponding to “Adequate”. Among the three domains, Physical Activity had the highest mean score (2.43 ± 0.57) and Specialist Referral the lowest (2.20 ± 0.48) (Fig 2).


Fig 2. Completeness score.

Box plot showing the distribution of completeness scores for each question. Graph shows the interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).

https://doi.org/10.1371/journal.pone.0317929.g002

Comprehensibility

The mean comprehensibility score was 2.74 ± 0.52 on a 3-point Likert scale, indicating that the ChatGPT-generated responses were “Easy to understand”. Kendall’s coefficient ranged from 0.00 to 0.447, with a mean of 0.25, indicating a moderate level of agreement among all 10 experts. Questions 2, 3, 6, 8, and 10 had the highest mean score (2.9), and questions 7 and 14 the lowest (2.4). Among the three domains, Physical Activity had the highest mean score (2.83 ± 0.38) and Specialist Referral the lowest (2.50 ± 0.63) (Fig 3).


Fig 3. Comprehensibility score.

Box plot showing the distribution of comprehensibility scores for each question. Graph shows the interquartile range (box), median (horizontal line), mean (dot), and outliers (whiskers).

https://doi.org/10.1371/journal.pone.0317929.g003

Expert comments

Responses to questions 8–10 and 12 received no expert comments, whereas questions 1, 5, and 14 received the most, each with more than one expert comment. When comments were grouped by theme, the most frequently repeated concerned: 1) the use of the terms “NAFLD/NASH” instead of “MASLD/MASH” in the generated responses; 2) the Arabic translation of “biopsy” in the generated responses; 3) the Arabic translation of “MRI” in the generated responses; and 4) the Arabic sentences on alcohol consumption in the generated responses.

Discussion

AI is having a significant impact on the medical field, including gastroenterology and hepatology [14, 15]. In recent years, AI has been successfully applied in liver pathology and radiology to improve diagnostic accuracy and reduce inter- and intra-observer variability [14–16]. Recently, considerable attention has been paid to the clinical applications of AI-based chatbots, particularly ChatGPT, in various contexts, including their potential use as an immediate, free, on-demand information dissemination tool for patients with MASLD [14]. Because MASLD management requires a multidisciplinary approach, identifying effective information dissemination tools for these patients is a clinical priority [17]. Patient education and information dissemination are essential to helping patients achieve and maintain lifestyle changes [18, 19]. AI-based chatbots could be a valuable tool for patients by providing simplified explanations and guidance on first-line treatment and disease management, such as weight loss and physical activity recommendations.

Pugliese et al. [13] recently conducted the first study of ChatGPT 3.5 as an information dissemination tool for patients with MASLD, demonstrating that it can provide understandable and complete answers, from the patient’s perspective, to 15 predefined MASLD-related questions in English. The AI-generated answers were evaluated by 10 experts and found to be relatively accurate [13]. In addition, preliminary data from another study by the same authors showed that using a language other than English did not seem to affect the effectiveness of ChatGPT as a resource tool for patients with MASLD [20]. To date, no study has assessed the effectiveness of AI-powered interventions for Arabic-speaking patients with MASLD.

In our study, 10 MASLD experts from Saudi Arabia who were native Arabic speakers evaluated the same set of questions previously analyzed in English. ChatGPT’s ability to advise patients with MASLD did not appear to be affected by language: the Arabic answers were rated complete (mean score of 2.4 on a 3-point scale) and comprehensible (mean score of 2.74 on a 3-point scale). However, consistent with other studies, the accuracy of ChatGPT still requires improvement, with a mean score of 4.9 on a 6-point Likert scale (Table 1). Thus, while using Arabic did not affect the completeness or accuracy of ChatGPT-generated answers, it also did not correct the inaccuracies observed in clinically meaningful answers. As in the previous English-language study [13], the Physical Activity domain had the highest score in the Arabic questionnaire (Table 2).


Table 1. Comparison of mean scores between the Arabic and English responses [13].

https://doi.org/10.1371/journal.pone.0317929.t001


Table 2. Comparison of domain mean scores between the Arabic and English responses [13].

https://doi.org/10.1371/journal.pone.0317929.t002

Limitations

Ten experts in the field of MASLD conducted the ratings using Likert scales. Such scales have limitations, as they allow responses to be rated as partially accurate; partially accurate information is unacceptable in medicine because it can lead to misunderstandings and dangerous consequences for patients. Another limitation is that newer and potentially better versions of ChatGPT (ChatGPT 4) are available, whereas this study used version 3.5. However, ChatGPT 4 is not freely accessible to patients and is therefore unlikely to be widely used by them in the near future. Although a variety of large language models are accessible, including free options, our decision to use ChatGPT was driven primarily by methodological consistency. A reliable comparison between English and Arabic responses required a standardized approach, and using the same AI tool allowed us to isolate the effect of language on the generated content. We acknowledge the rapid advances in AI technology and the potential benefits of exploring other models, which may offer greater accuracy and cultural relevance. Future research should include a comparative analysis of various AI tools to assess their relative strengths and weaknesses in different language contexts.

In addition, it is crucial to consider the impact of sociocultural factors on ChatGPT responses. The patient’s sociocultural background affects the tool’s capacity to offer culturally sensitive guidance, and patient preferences, health literacy levels, and cultural norms may all influence how useful the responses are. Therefore, even if ChatGPT is a useful tool, it should be used with consideration for patients’ cultural diversity [20]. Chatbots also have other known limitations, including the risk of generating content that is not grounded in evidence-based knowledge, a phenomenon known as ‘hallucinations’ [21]. Retrieval-augmented generation (RAG) is a potential method to address this issue: it combines the response-generating ability of AI-based chatbots with retrieval of verified information from external sources, which can yield more accurate and complete answers. Because people increasingly not only acquire information from AI-based apps and services but also make decisions based on it, the professional community should use AI responsibly, following the principles and ethics associated with it.
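The retrieval step that distinguishes RAG from a plain chatbot can be sketched in a few lines: score a library of vetted passages against the patient’s question, keep the best matches, and place them in the prompt so the model answers from them rather than from memory alone. This is a minimal illustration under loudly stated assumptions: the passages are hypothetical placeholders for guideline-approved patient material, similarity is plain bag-of-words cosine rather than the learned embeddings a production system would likely use, no actual chatbot is called, and the function names are illustrative rather than any library’s API. It was not part of this study.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case ASCII word tokens (an Arabic pipeline would need a proper tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k vetted passages most similar to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Ground the chatbot prompt in retrieved passages instead of model memory alone."""
    context = "\n".join(f"- {p}" for _, p in retrieve(query, passages, k))
    return (
        "Answer using only the vetted sources below. "
        "If they do not cover the question, say so and advise seeing a clinician.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Hypothetical placeholders standing in for guideline-approved patient material.
    vetted = [
        "physical activity guidance for MASLD patients (placeholder)",
        "alcohol and MASLD guidance (placeholder)",
        "when to see a liver specialist (placeholder)",
    ]
    print(build_prompt("How much physical activity should I do?", vetted, k=1))
```

The grounding instruction in the prompt matters as much as the retrieval: it asks the model to admit when the vetted sources are silent instead of filling the gap, which is the failure mode that produces hallucinations.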

Conclusions

This study addresses the critical need for AI tools in the Arabic-speaking world, where the prevalence of MASLD is estimated to be higher than in Western countries [22]. Although our study confirms the promising results of previous studies, the universal adoption of ChatGPT as a resource tool for patients with MASLD remains challenging [13, 20]. The identified limitations highlight the need for continued improvement of AI models in healthcare settings, which will require close collaboration between AI experts and healthcare professionals. While the results show that the AI-generated responses were accurate and consistent, patients should be advised not to replace conventional doctor visits with these technologies, which are suited to providing educational material rather than medical diagnosis or consultation.

Supporting information

S1 Table. Accuracy Likert scale reference.

https://doi.org/10.1371/journal.pone.0317929.s001

(DOCX)

S2 Table. Completeness Likert scale reference.

https://doi.org/10.1371/journal.pone.0317929.s002

(DOCX)

S3 Table. Comprehensiveness Likert scale reference.

https://doi.org/10.1371/journal.pone.0317929.s003

(DOCX)

S4 Table. Accuracy coded responses.

https://doi.org/10.1371/journal.pone.0317929.s004

(DOCX)

S5 Table. Completeness coded responses.

https://doi.org/10.1371/journal.pone.0317929.s005

(DOCX)

S6 Table. Comprehensibility coded responses.

https://doi.org/10.1371/journal.pone.0317929.s006

(DOCX)

S7 Table. Accuracy—Kendall’s tau analysis.

https://doi.org/10.1371/journal.pone.0317929.s007

(DOCX)

S8 Table. Completeness Kendall’s tau.

https://doi.org/10.1371/journal.pone.0317929.s008

(DOCX)

S9 Table. Comprehensiveness Kendall’s tau.

https://doi.org/10.1371/journal.pone.0317929.s009

(DOCX)

S10 Table. Arabic questionnaire.

https://doi.org/10.1371/journal.pone.0317929.s010

(DOCX)

S11 Table. ChatGPT generated responses.

https://doi.org/10.1371/journal.pone.0317929.s011

(DOCX)

References

1. Chan WK, Chuah KH, Rajaram RB, Lim LL, Ratnasingam J, Vethakkan SR. Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD): A State-of-the-Art Review. J Obes Metab Syndr. 2023;32(3):197–213. pmid:37700494
2. Zelber-Sagi S, Ratziu V, Oren R. Nutrition and physical activity in NAFLD: An overview of the epidemiological evidence. World J Gastroenterol WJG. 2011 Aug 7;17(29):3377–89. pmid:21876630
3. Staufer K, Stauber RE. Steatotic Liver Disease: Metabolic Dysfunction, Alcohol, or Both? Biomedicines. 2023 Jul 26;11(8):2108. pmid:37626604
4. Alqahtani SA, Broering DC, Alghamdi SA, Bzeizi KI, Alhusseini N, Alabbad SI, et al. Changing trends in liver transplantation indications in Saudi Arabia: from hepatitis C virus infection to nonalcoholic fatty liver disease. BMC Gastroenterol. 2021;21(1):245. pmid:34074270
5. Coker T, Saxton J, Retat L, Alswat K, Alghnam S, Al-Raddadi RM, et al. The future health and economic burden of obesity-attributable type 2 diabetes and liver disease among the working-age population in Saudi Arabia. PLoS One. 2022;17(7):e0271108. pmid:35834577
6. Golabi P, Paik JM, AlQahtani S, Younossi Y, Tuncer G, Younossi ZM. Burden of non-alcoholic fatty liver disease in Asia, the Middle East and North Africa: Data from Global Burden of Disease 2009–2019. J Hepatol. 2021;75(4):795–809. pmid:34081959
7. Yin X, Guo X, Liu Z, Wang J. Advances in the Diagnosis and Treatment of Non-Alcoholic Fatty Liver Disease. Int J Mol Sci. 2023 Feb 2;24(3):2844. pmid:36769165
8. Lazarus JV, Colombo M, Cortez-Pinto H, Huang TTK, Miller V, Ninburg M, et al. NAFLD—sounding the alarm on a silent epidemic. Nat Rev Gastroenterol Hepatol. 2020 Jul;17(7):377–9. pmid:32514153
9. Allen-Meares P, Lowry B, Estrella ML, Mansuri S. Health Literacy Barriers in the Health Care System: Barriers and Opportunities for the Profession. Health Soc Work. 2020 Jan 28;45(1):62–4. pmid:31993624
10. Chakraborty C, Pal S, Bhattacharya M, Dash S, Lee SS. Overview of Chatbots with special emphasis on artificial intelligence-enabled ChatGPT in medical science. Front Artif Intell. 2023 Oct 31;6:1237704. pmid:38028668
11. Javaid M, Haleem A, Singh RP. ChatGPT for healthcare services: An emerging stage for an innovative perspective. BenchCouncil Trans Benchmarks Stand Eval. 2023 Feb 1;3(1):100105.
12. Alowais SA, Alghamdi SS, Alsuhebany N, Alqahtani T, Alshaya AI, Almohareb SN, et al. Revolutionizing healthcare: the role of artificial intelligence in clinical practice. BMC Med Educ. 2023 Sep 22;23(1):689. pmid:37740191
13. Pugliese N, Wai-Sun Wong V, Schattenberg JM, Romero-Gomez M, Sebastiani G, NAFLD Expert Chatbot Working Group, et al. Accuracy, Reliability, and Comprehensibility of ChatGPT-Generated Medical Responses for Patients With Nonalcoholic Fatty Liver Disease. Clin Gastroenterol Hepatol Off Clin Pract J Am Gastroenterol Assoc. 2023 Sep 15;S1542-3565(23)00704–8. pmid:37716618
14. Le Berre C, Sandborn WJ, Aridhi S, et al. Application of Artificial Intelligence to Gastroenterology and Hepatology. Gastroenterology. 2020;158(1):76–94.e2. pmid:31593701
15. Schattenberg JM, Chalasani N, Alkhouri N. Artificial Intelligence Applications in Hepatology. Clin Gastroenterol Hepatol. 2023;21(8):2015–2025. pmid:37088460
16. Nam D, Chapiro J, Paradis V, Seraphin TP, Kather JN. Artificial intelligence in liver diseases: Improving diagnostics, prognostics and response prediction. JHEP Rep. 2022;4(4):100443. Published 2022 Feb 2. pmid:35243281
17. Rinella ME, Neuschwander-Tetri BA, Siddiqui MS, et al. AASLD Practice Guidance on the clinical assessment and management of nonalcoholic fatty liver disease. Hepatology. 2023;77(5):1797–1835. pmid:36727674
18. Pugliese N, Plaz Torres MC, Petta S, Valenti L, Giannini EG, Aghemo A. Is there an ’ideal’ diet for patients with NAFLD?. Eur J Clin Invest. 2022;52(3):e13659. pmid:34309833
19. Balakrishnan M, Liu K, Schmitt S, et al. Behavioral weight-loss interventions for patients with NAFLD: A systematic scoping review. Hepatol Commun. 2023;7(8):e0224. Published 2023 Aug 3. pmid:37534947
20. Pugliese N, Polverini D, Lombardi R, Pennisi G, Ravaioli F, Armandi A, et al.; NAFLD Expert Chatbot Working Group. Evaluation of ChatGPT as a Counselling Tool for Italian-Speaking MASLD Patients: Assessment of Accuracy, Completeness and Comprehensibility. J Pers Med. 2024;14(6):568. pmid:38929789
21. Goddard J. Hallucinations in ChatGPT: A Cautionary Tale for Biomedical Researchers. Am J Med. 2023;136(11):1059–1060. pmid:37369274
22. Younossi ZM, Golabi P, Paik JM, Henry A, Van Dongen C, Henry L. The global epidemiology of nonalcoholic fatty liver disease (NAFLD) and nonalcoholic steatohepatitis (NASH): a systematic review. Hepatology. 2023;77(4):1335–1347. pmid:36626630