Fig 1.

Summary of the LLM-based data analysis system and its features.

Given a task description, our system generates and executes the code needed to carry out the analysis. It reads and uses the user data mentioned in the task, and it can correct code execution errors through its built-in self-correction mechanism.
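
The self-correction mechanism mentioned in this caption can be pictured as a generate, execute, retry loop. The sketch below is a minimal Python illustration under stated assumptions, not the system's actual implementation: `generate` stands in for an LLM call, `fake_generate` is a hypothetical stub, and the feedback prompt wording and the `max_attempts` default are invented for the example.

```python
from typing import Callable, Optional, Tuple


def run_code(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return None on success, else the error text."""
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return None
    except Exception as exc:  # includes SyntaxError raised by compile()
        return f"{type(exc).__name__}: {exc}"


def self_correct(
    task: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,  # assumption: the retry budget is illustrative
) -> Tuple[str, bool, int]:
    """Ask for code, run it, and resubmit the error message until it runs."""
    prompt = task
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt)
        error = run_code(code)
        if error is None:
            return code, True, attempt
        # Hypothetical feedback prompt: original task plus failing code and error.
        prompt = (
            f"{task}\n\nThe previous code failed:\n{code}\n"
            f"Error: {error}\nPlease return corrected code."
        )
    return code, False, max_attempts


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: fails first, succeeds once it sees an error."""
    if "Error:" in prompt:
        return "total = sum([1, 2, 3])"
    return "total = sum([1, 2, 3]) / 0"


code, ok, attempts = self_correct("Sum the numbers 1 to 3.", fake_generate)
print(ok, attempts)
```

Note that a loop like this only checks that the code runs; it says nothing about whether the result is correct, which is why executability and correctness are reported separately.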

Table 1.

Example prompts, their task-related features, and their assigned complexity values.

Fig 2.

Error rate and fraction of executable tasks depend on task complexity and response length under the simple prompt strategy.

(A) Executability plotted against response length for tasks of varying complexity. Yes indicates that response code was executable, whereas no indicates response code was not executable. Prompt strategy was set to “simple” (N = 20 individual prompts over n = 10 cycles). (B) Fraction of executable tasks plotted for tasks of increasing complexity. Prompt strategy was set to “simple” (N = 20 individual prompts over n = 10 cycles).

Fig 3.

ActAs and CoT prompt strategies do not result in decreased error rate for tasks with increasing complexity.

(A) Executability plotted against response length for tasks of varying complexity. Yes indicates that response code was executable, whereas no indicates response code was not executable. Prompt strategy was set to “simple”, “CoT” or “ActAs” (N = 20 individual prompts over n = 10 cycles). (B) Fraction of executable tasks plotted for tasks of increasing complexity. Prompt strategy was set to “simple”, “CoT” or “ActAs” (N = 20 individual prompts over n = 10 cycles).

Fig 4.

fileCont prompt strategy results in decreased error rate for some tasks with increasing complexity.

(A) Executability plotted against response length for tasks of varying complexity. Yes indicates that response code was executable, whereas no indicates response code was not executable. Prompt strategy was set to “simple” or “fileCont” (N = 20 individual prompts over n = 10 cycles). (B) Fraction of executable tasks plotted for tasks of increasing complexity. Prompt strategy was set to “simple” or “fileCont” (N = 20 individual prompts over n = 10 cycles).

Fig 5.

Self-correction prompt strategy results in decreased error rate for all tasks with increasing complexity.

(A) Executability plotted against response length for tasks of varying complexity. Yes indicates that response code was executable, whereas no indicates response code was not executable. Prompt strategy was set to “simple”, “actAs”, “CoT”, “fileCont” or “selfCorrect” (N = 20 individual prompts over n = 10 cycles). (B) Fraction of executable tasks plotted for tasks of increasing complexity. Prompt strategy was set to “simple”, “actAs”, “CoT”, “fileCont” or “selfCorrect” (N = 20 individual prompts over n = 10 cycles).

Table 2.

Results of chi-square test testing all different prompting strategies over the various complexities.
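
One plausible way to set up such a comparison is a Pearson chi-square test on a contingency table of prompt strategy against executability. The sketch below is a minimal Python illustration, not the authors' analysis: the counts are invented placeholders that do not come from the article, and the function name `chi_square` is hypothetical.

```python
from typing import List, Tuple


def chi_square(table: List[List[float]]) -> Tuple[float, int]:
    """Return the Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts for illustration only (NOT data from the article).
# Rows: two prompt strategies; columns: executable, not executable.
counts = [
    [30, 20],
    [45, 5],
]

stat, dof = chi_square(counts)
print(f"chi-square = {stat:.2f}, degrees of freedom = {dof}")
```

With one degree of freedom, a statistic above 3.841 corresponds to p < 0.05, so a table like this placeholder one would indicate that executability differs between the two strategies.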

Fig 6.

GPT-4 results in decreased error rate for all tasks with increasing complexity when compared to GPT-3.5-turbo.

(A) Executability plotted against response length for tasks of varying complexity. Yes indicates that response code was executable, whereas no indicates response code was not executable. Prompt strategy was set to “selfCorrect” (N = 20 individual prompts over n = 10 cycles). OpenAI GPT-3.5-turbo and GPT-4 LLMs were used. (B) Fraction of executable tasks plotted for tasks of increasing complexity. Prompt strategy was set to “selfCorrect” (N = 20 individual prompts over n = 10 cycles). OpenAI GPT-3.5-turbo and GPT-4 LLMs were used.

Fig 7.

Code correctness for GPT-3.5-turbo using the self-correct mechanism.

Fraction of executable versus correct tasks plotted for tasks of increasing complexity. Prompt strategy was set to “selfCorrect” (N = 20 individual prompts over n = 10 cycles). OpenAI GPT-3.5-turbo LLM was used.

Fig 8.

An illustration of the mergen RStudio addin.

The Shiny-based chatbot (also accessible via RStudio Add-in) allows users to change parameters of mergen functions, switch the API service, import an API key, and turn on the “Self Correct” mode. (A) Example input task and the settings pane. (B) The result of the self-corrected and executed code generated for the task in (A).
