Table 1.
Distribution of the 150 MCQs across chapters.
Fig 1.
Number of questions at each observed accuracy level.
Gemini 1.5 Flash achieved a mean accuracy of 68.3% (SD = 39.7), Gemini 2.0 Flash 76.3% (SD = 36.4), ChatGPT o1-mini 76.7% (SD = 35.3), and ChatGPT 4o 78.9% (SD = 37.0).
Fig 2.
Accuracy of the tested LLMs by chapter.
Each grey dot shows the accuracy for one question. An accuracy of 0 means that 0 of 10 answers to that question were correct. The chapter headings corresponding to the chapter numbers are listed in Table 1. (A) shows the accuracy by chapter for Gemini 1.5 Flash, (B) for Gemini 2.0 Flash, (C) for ChatGPT 4o, and (D) for ChatGPT o1-mini.
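As a minimal sketch of how a per-question accuracy and the summary statistics could be computed, the following Python snippet assumes that each question was asked 10 times (as the 0/10 notation suggests) and that the reported SD is the sample standard deviation across questions. Both points are assumptions, and the function names and example data are hypothetical, not taken from the study.

```python
from statistics import mean, stdev


def question_accuracy(correct_flags):
    """Fraction of correct answers over the repeated runs of one question."""
    return sum(correct_flags) / len(correct_flags)


def summarize(per_question_runs):
    """Return (mean, SD) of per-question accuracy, both in percent.

    Assumption: SD is the sample standard deviation across questions.
    """
    accuracies = [question_accuracy(runs) for runs in per_question_runs]
    return mean(accuracies) * 100, stdev(accuracies) * 100


# Hypothetical data: three questions, each asked 10 times.
runs = [
    [True] * 10,                # 10/10 correct -> accuracy 1.0
    [False] * 10,               # 0/10 correct  -> accuracy 0.0
    [True] * 5 + [False] * 5,   # 5/10 correct  -> accuracy 0.5
]
mean_pct, sd_pct = summarize(runs)
print(f"mean = {mean_pct:.1f}%, SD = {sd_pct:.1f}")
```

Because many questions are answered either always correctly or always incorrectly, per-question accuracies cluster at 0 and 1, which is consistent with the large SD values reported relative to the means.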