Fig 1.
Overall methodology of the study.
Fig 2.
Zero-shot prompt for generating labels.
Fig 3.
Few-shot prompt for generating labels.
Fig 4.
Zero-shot Chain-of-Thought (CoT) prompt for label generation.
Table 1.
Hyperparameter settings used when tuning the traditional machine learning models.
Fig 5.
The distribution of class labels in the four different label sets.
(a) Human labels, (b) Zero-shot labels, (c) Few-shot labels, and (d) Zero-shot CoT labels.
Fig 6.
The percentages of changes in the three new label sets (Zero-shot, Few-shot, and Zero-shot CoT) compared to human labels.
Table 2.
Testing results of models fine-tuned on four training sets with different labels.
Table 3.
Results of the Wilcoxon signed-rank test comparing the evaluation metrics for each pair of label sets generated by different approaches.
‘W’ refers to the test statistic and ‘p-val’ refers to the p-value.
Fig 7.
The percentage increase in performance relative to human-labeled data, for the classifiers that performed best with human labels.
(a) Zero-shot, (b) Few-shot, (c) Zero-shot CoT.
Table 4.
Top classifiers trained on different GPT-based labeling sets, ranked by F1-score.
Fig 8.
Performance analysis of classifiers trained on GPT-4-labeled datasets that outperformed those trained on ground-truth labels.
Fig 9.
Two examples explaining the advantage of Zero-shot CoT over the basic Zero-shot prompting mechanism.