Fig 1.
We include a statement assigning a persona to each prompt. The figure shows how different personas impact generations from the gemma-7b-inst model in objective tasks (w/ ground truth) and subjective tasks (no ground truth). The robot indicates the no persona baseline, where no persona-assignment statement is included.
Table 1.
Example prompts (with an example persona) for all datasets.
Table 2.
Persona list by category.
Fig 2.
Distribution of personas’ performances.
We show results for each dataset and overall performance (averaged across datasets).
Table 3.
Persona group average ranks (out of 193: 162 personas, 30 control personas, and the no persona baseline; lower is better) for each knowledge domain. The rank of the best persona in each group is shown in parentheses. The top persona group for each domain is shown in bold, and the best domain of each persona group is underlined. The top-ranked persona for social sciences was the social scientist persona.
Table 4.
Persona ranks (out of 193, lower is better) for increasingly specialized domains. For persona groups with multiple personas, we show the average rank and, in parentheses, the rank of the best persona in the category.
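The group-level numbers reported in these rank tables (an average rank plus the best rank within the group) can be illustrated with a minimal sketch. The persona names, accuracy values, group assignments, and tie-breaking rule below are hypothetical and chosen only for illustration; they are not taken from the paper's results.

```python
def rank_personas(scores):
    """Rank personas by score, where rank 1 is the highest score.

    Assumption: ties are broken alphabetically by persona name.
    """
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position + 1 for position, name in enumerate(ordered)}


def group_rank_summary(ranks, groups):
    """Return {group: (average rank, best rank)} for each persona group."""
    summary = {}
    for group, members in groups.items():
        member_ranks = [ranks[member] for member in members]
        summary[group] = (sum(member_ranks) / len(member_ranks), min(member_ranks))
    return summary


# Hypothetical accuracies on one knowledge domain (illustrative only).
scores = {
    "social scientist": 0.71,
    "economist": 0.69,
    "no persona": 0.68,
    "physicist": 0.65,
}
# Hypothetical grouping of personas into categories.
groups = {
    "social sciences": ["social scientist", "economist"],
    "natural sciences": ["physicist"],
}

ranks = rank_personas(scores)
summary = group_rank_summary(ranks, groups)
```

Here `summary["social sciences"]` holds the pair that a table cell would display as an average rank followed by the best rank in parentheses.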
Fig 3.
Distribution of personas’ bias scores and frequency of unknown answers.
Ground truth answers yield a bias of 0 and an unknown frequency of 0.5.
Table 5.
Persona ranks for self-bias (out of 193), self-accuracy, overall bias, and overall accuracy.
Table 6.
Differences between the average accuracy (across all personas) and the accuracy of personas when answering questions involving their own demographic.
Table 7.
Differences between how often each demographic is selected as the answer by the persona of that same demographic and how often it is selected on average (across all personas).
Table 8.
Prompts fed to GPT-4 to generate instruction paraphrases for the attitude questionnaires.
Fig 4.
Distribution of attitude scores for each model.
The yellow line shows the no persona scores.
Table 9.
P-values obtained through Friedman’s test for the significance of the variability of personas’ attitudes for each model. Non-significant results (significance level of .05) are shown in bold.
Fig 5.
Attitude scores (averaged across models) for personas with consistent cross-model rankings.
The blue line shows the no persona scores.
Fig 6.
Pearson correlations between attitudes for human annotators (top left plot) and each model’s personas.
Significant correlations (p<.05) are shown in bold.
Fig 7.
Cosine similarity between human and model correlations (between attitudes on the left and between attitudes and annotations on the right).
The black horizontal line denotes the cosine similarity between human and random baseline correlations.
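As a rough illustration of the comparison in this figure, the sketch below computes the cosine similarity between two sets of correlations. It assumes each correlation matrix is flattened into a vector by taking its strict upper triangle; that flattening step and the example matrices are assumptions made for illustration, not details confirmed by the caption.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


# Hypothetical 3x3 attitude correlation matrices (illustrative values only).
human_corr = [[1.0, 0.5, -0.2], [0.5, 1.0, 0.1], [-0.2, 0.1, 1.0]]
model_corr = [[1.0, 0.4, -0.1], [0.4, 1.0, 0.3], [-0.1, 0.3, 1.0]]

similarity = cosine_similarity(upper_triangle(human_corr), upper_triangle(model_corr))
```

A random baseline like the one drawn as the horizontal line could be compared the same way, by substituting a matrix of correlations computed from random answers for `model_corr`.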
Fig 8.
Distribution of toxicity scores for each model.
Top row: average offensiveness and racism ratings. Bottom row: agreement with human annotations for offensiveness and racism. The ratings are on a Likert scale from 1 (not at all offensive/racist) to 5 (extremely offensive/racist).
Fig 9.
Pearson correlations between attitudes and annotation statistics for human annotators (top left plot) and each model’s personas.
Significant correlations (p<.05) are shown in bold.
Fig 10.
Distribution of personas’ refusal rates (averaged across datasets).
Fig 11.
Distribution of personas’ refusal rates for each dataset.
Fig 12.
Ratios between the standard deviation of the refusal rates of each persona category and the control category.
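The ratio plotted in this figure can be sketched as follows. The sketch assumes the population standard deviation is used and that each category is represented by a list of per-persona refusal rates; the rates below are hypothetical and chosen only for illustration.

```python
import statistics


def std_ratio(category_rates, control_rates):
    """Ratio of a category's refusal-rate spread to the control category's.

    Assumption: population standard deviation (statistics.pstdev).
    """
    control_std = statistics.pstdev(control_rates)
    if control_std == 0:
        raise ValueError("control category has zero variance")
    return statistics.pstdev(category_rates) / control_std


# Hypothetical per-persona refusal rates (illustrative values only).
control_rates = [0.0, 0.1]
category_rates = [0.0, 0.2]

ratio = std_ratio(category_rates, control_rates)
```

A ratio above 1 would indicate that refusal rates vary more across personas in that category than across the control personas.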