Comparing large language models and search engine responses to common orthodontic questions
Fig 2
Overall Quality (A), Overall Empathy (B), Overall Readability (C), and Overall Satisfaction (D) of LLM and search engine responses to the questions. For the group labels, A indicates GPT-4o; B, GPT-4o mini; C, Claude 3.5 Sonnet; D, Kimi AI; E, ERNIE Bot; F, Google; G, Microsoft Bing; and H, Baidu. The midline indicates the median (50th percentile); the box, the 25th and 75th percentiles; and the density plot, the probability density of the response scores. Kruskal-Wallis tests were used for comparisons among LLMs and among search engines, and Mann-Whitney U tests were used for comparisons between LLMs and search engines.
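As an illustration of the two rank-based tests named in the caption, the following is a minimal sketch in plain Python that computes only the test statistics (no P values and no tie correction). The function names and the example scores are hypothetical and are not taken from the study; an established statistics package would normally be used in practice.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for two or more groups (no tie correction)."""
    pooled = list(chain.from_iterable(groups))
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


def mann_whitney_u(x, y):
    """Mann-Whitney U statistic of sample x relative to sample y."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    return rank_sum_x - len(x) * (len(x) + 1) / 2


# Hypothetical quality scores, invented purely for illustration.
llm_a = [4, 5, 5, 4]
llm_b = [3, 4, 4, 5]
engine_f = [2, 3, 3, 2]

h_among_llms = kruskal_wallis_h(llm_a, llm_b)          # comparison among LLMs
u_llm_vs_engine = mann_whitney_u(llm_a + llm_b, engine_f)  # LLMs vs a search engine
```

The Kruskal-Wallis statistic compares rank sums across several groups at once, whereas the Mann-Whitney U statistic handles the two-group case, which matches how the caption assigns the tests to the among-group and between-group comparisons.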