Fig 1.

Examples of the four tasks of the GATmath benchmark.

The correct choice for each question is indicated in bold; the prompt used is shown in italics.

Fig 2.

Examples of the five tasks from the GATLc benchmark.

The correct choice for each question is indicated in bold; the prompt used is shown in italics.

Fig 3.

Transcription guide for GAT mathematical questions.

Fig 4.

Two example objects from the created JSON file.

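To make the data format concrete, here is a minimal Python sketch of what two such question objects might look like once loaded. The key names (task, question, choices, answer) are hypothetical illustrations of a typical multiple-choice schema; the actual keys are the ones shown in Fig 4.

    import json

    # Hypothetical schema for two question objects; the real key names are
    # those shown in Fig 4 and may differ from the placeholders used here.
    records = [
        {
            "task": "...",      # one of the four GATmath task names
            "question": "...",  # Arabic question text
            "choices": ["A) ...", "B) ...", "C) ...", "D) ..."],
            "answer": "A",      # label of the correct choice
        },
        {
            "task": "...",      # one of the five GATLc task names
            "question": "...",
            "choices": ["A) ...", "B) ...", "C) ...", "D) ..."],
            "answer": "C",
        },
    ]

    # Keep the Arabic text readable in the serialized file.
    print(json.dumps(records, ensure_ascii=False, indent=2))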

Table 1.

GATmath distribution across four tasks.

Table 2.

GATLc distribution across five tasks.

Fig 5.

An example of the five-shot evaluation setting used.

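As a rough sketch of how a five-shot prompt like the one in Fig 5 is typically assembled, the following Python function concatenates five solved exemplars ahead of the unanswered test question. The function name, field names, and exact formatting are assumptions for illustration; the wording actually used in the evaluation is the one shown in Fig 5.

    def build_five_shot_prompt(exemplars, test_item):
        """Concatenate five solved exemplars, then the unanswered test question.

        Each exemplar is a dict with 'question', 'choices', and 'answer' keys;
        `test_item` has 'question' and 'choices'. Hypothetical format.
        """
        assert len(exemplars) == 5, "five-shot: exactly five in-context examples"
        blocks = []
        for ex in exemplars:
            choices = "\n".join(ex["choices"])
            blocks.append(f"Question: {ex['question']}\n{choices}\nAnswer: {ex['answer']}")
        choices = "\n".join(test_item["choices"])
        blocks.append(f"Question: {test_item['question']}\n{choices}\nAnswer:")
        return "\n\n".join(blocks)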

Table 3.

Performance of Arabic LLMs on various tasks of the GATmath dataset.

Table 4.

Performance of Arabic LLMs on various tasks of the GATLc dataset.

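The scores in Tables 3 and 4 (and in Fig 6) are per-task accuracies. Assuming each model's output is reduced to a single choice label per question, a standard way to compute such a score is sketched below; this is generic multiple-choice accuracy, not necessarily the paper's exact evaluation harness.

    def task_accuracy(predictions, gold_labels):
        """Fraction of questions whose predicted choice label matches the key.

        Both arguments are equal-length sequences of labels such as 'A'..'D'.
        """
        assert len(predictions) == len(gold_labels)
        correct = sum(p == g for p, g in zip(predictions, gold_labels))
        return correct / len(gold_labels)

    # Example: 3 of 4 answers correct -> 0.75
    print(task_accuracy(["A", "C", "B", "D"], ["A", "C", "B", "B"]))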

Fig 6.

Accuracy of the models on the nine tasks of the GATLc and GATmath datasets.

Table 5.

Performance of models on other Arabic benchmarks. CS: Commonsense.

Table 6.

Performance of models on mathematical reasoning benchmarks.

Table 7.

Performance of models on reasoning and language comprehension benchmarks.

Fig 7.

Performance of LLMs on GATmath and GATLc compared with English benchmarks.
