Performance of large language models ChatGPT and Gemini in child and adolescent psychiatry knowledge assessment

doi:10.1371/journal.pone.0332917

Table 1.

Distribution of the 150 multiple-choice questions (MCQs) across chapters.

Fig 1.

Number of questions at each observed accuracy level.

Gemini 1.5 Flash achieved a mean accuracy of 68.3% (SD = 39.7), Gemini 2.0 Flash 76.3% (SD = 36.4), ChatGPT o1-mini 76.7% (SD = 35.3), and ChatGPT 4o 78.9% (SD = 37.0).

Fig 2.

Accuracy of the tested LLMs divided into different chapters.

Each grey dot shows the accuracy for a single question. An accuracy of 0 means that 0 of 10 answers to that question were correct. The chapter headings corresponding to the chapter numbers are listed in Table 1. Panels show the accuracy by chapter for (A) Gemini 1.5 Flash, (B) Gemini 2.0 Flash, (C) ChatGPT 4o, and (D) ChatGPT o1-mini.
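
The per-question accuracy described in this caption, and a mean and SD of the kind reported for Fig 1, could be computed roughly as in the sketch below. It assumes that each question was posed 10 times (as the "0/10" wording suggests) and that the SD is a sample standard deviation across questions; neither assumption is confirmed by the captions. The function names and the example counts are hypothetical and are not data from the study.

```python
from statistics import mean, stdev

# Assumption: each question was asked 10 times, as implied by "0/10 correct answers".
REPEATS = 10


def question_accuracy(correct_count: int, repeats: int = REPEATS) -> float:
    """Return the accuracy (in percent) for one question."""
    if not 0 <= correct_count <= repeats:
        raise ValueError("correct_count must lie between 0 and repeats")
    return 100.0 * correct_count / repeats


def summarize(correct_counts: list[int]) -> tuple[float, float]:
    """Return (mean, sample SD) of per-question accuracies in percent."""
    accuracies = [question_accuracy(c) for c in correct_counts]
    # Assumption: sample SD (n - 1 denominator); the captions do not specify.
    return mean(accuracies), stdev(accuracies)


# Hypothetical counts of correct answers for six questions (not study data).
example_counts = [10, 10, 0, 7, 10, 3]
m, sd = summarize(example_counts)
print(f"mean accuracy = {m:.1f}%, SD = {sd:.1f}")
```

Because many questions are answered either always correctly or always incorrectly, per-question accuracies cluster near 0% and 100%, which is consistent with the large standard deviations reported for all four models.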
