Benchmarking large-language-model vision capabilities in oral and maxillofacial anatomy: A cross-sectional study

doi:10.1371/journal.pone.0335775

Table 1.

Chat-based large-language models evaluated in this study.

Fig 1.

Study flowchart.

The anatomy icon was drawn de novo by the corresponding author in Adobe Illustrator 2024 (Adobe Inc., San Jose, CA, USA); no external images, stock assets, or anatomical atlases were used, retrieved, traced, or adapted.

Table 2.

Accuracy (%) of six large-language models in identifying anatomical labels.

Fig 2.

Correct-response rates and 95% confidence intervals of six large-language models when answering multiple-choice items.

Superscript letters above each bar denote pairwise comparisons; values sharing at least one common letter (a-d) do not differ significantly according to χ² tests with Benjamini-Hochberg adjustment.
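
The pairwise procedure named in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' analysis code: it assumes an uncorrected Pearson χ² test on a 2×2 table (1 degree of freedom) for each model pair and the standard Benjamini-Hochberg step-up adjustment. The function names, the model names, and the counts are hypothetical placeholders, not values from the study.

```python
import math
from itertools import combinations


def chi2_2x2_pvalue(correct_a, total_a, correct_b, total_b):
    """Pearson chi-square p-value (1 df, no continuity correction)
    comparing two correct-response proportions."""
    table = [
        [correct_a, total_a - correct_a],
        [correct_b, total_b - correct_b],
    ]
    row = [total_a, total_b]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = total_a + total_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical (correct, total) counts, for illustration only.
counts = {"model_a": (80, 100), "model_b": (60, 100), "model_c": (58, 100)}
pairs = list(combinations(counts, 2))
raw = [chi2_2x2_pvalue(*counts[a], *counts[b]) for a, b in pairs]
for (a, b), p_adj in zip(pairs, benjamini_hochberg(raw)):
    verdict = "differ" if p_adj < 0.05 else "do not differ"
    print(f"{a} vs {b}: adjusted p = {p_adj:.4f} ({verdict})")
```

Pairs whose adjusted p-value stays at or above the chosen threshold would share a letter in a compact letter display such as the one in this figure; the 0.05 threshold used here is likewise an assumption.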

Table 3.

Consistency (%) of six large-language models across two repeated interactions.

Table 4.

Per-label response latency (median seconds, interquartile range) of six large-language models.

Fig 3.

Log-scale violin-and-box plots showing the distribution of per-label response latency (seconds) for six large-language models.

Fig 4.

Accuracy-versus-latency trade-off for six large-language models.

Each cross marker positions a model by its median response latency (seconds, log scale) and overall accuracy. Marker size is proportional to consistency across repeated interactions.

Fig 5.

Example of landmark-name identification on an anterior facial muscle plate.

(A) Facial musculature redrawn for illustrative purposes. (B) Ground-truth terms with the corresponding model outputs.

Table 5.

Pragmatic model selection for dental anatomy teaching.
