The false positive problem of automatic bot detection in social science research

doi:10.1371/journal.pone.0241045

Table 1.

Data sets and how we combined them for our analysis.

Fig 1.

Example accounts for each data set.

Fig 2.

Receiver operating characteristic (ROC) curve for Botometer and the universal score (average over 3 months for each account).

The x-axis represents the false positive rate and the y-axis represents the true positive rate (sensitivity).
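
To make the two axes concrete, the following minimal Python sketch shows one way to turn per-account scores into ROC points and an area under the curve. It is an illustration only, not the authors' analysis code: the function names `roc_points` and `auc`, the toy monthly scores, and the label coding (1 = bot, 0 = human) are assumptions.

```python
from statistics import mean


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per distinct score.

    labels: 1 = bot, 0 = human (assumed coding).
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        # Accounts tied at the same score enter together.
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical accounts: three monthly scores each, plus a ground-truth label.
toy_accounts = [
    ([0.8, 0.9, 1.0], 1),
    ([0.7, 0.8, 0.9], 1),
    ([0.6, 0.7, 0.8], 0),
    ([0.5, 0.6, 0.7], 1),
    ([0.2, 0.3, 0.4], 0),
    ([0.1, 0.2, 0.3], 0),
]
avg_scores = [mean(monthly) for monthly, _ in toy_accounts]  # average over 3 months
labels = [label for _, label in toy_accounts]

roc_auc = auc(roc_points(avg_scores, labels))  # 8/9, about 0.889, for this toy data
```

An area of 1 would mean every bot scores higher than every human; 0.5 corresponds to random ordering.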

Table 2.

ROC-AUC and PR-AUC for the universal score.

Fig 3.

Receiver operating characteristic (ROC) curve for Botometer and the universal complete automation probability (CAP) (average over 3 months for each account).

The x-axis represents the false positive rate and the y-axis represents the true positive rate (sensitivity).

Table 3.

ROC-AUC and PR-AUC for the universal CAP.

Fig 4.

Universal score precision-recall curves for the resampled data sets.

We consider the population baseline on Twitter (15% bots) for the universal Botometer score. Black points indicate the precision and recall at a Botometer score of 0.76. With the German politicians and bots data set, at almost every threshold level the sample identified as bots contains more humans than real bots (precision below 0.5). The x-axis represents the recall (sensitivity) and the y-axis represents the precision.
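
The dependence of precision on the 15% baseline can be illustrated with a short Python sketch. Instead of resampling, it computes the expected precision directly from a true positive rate, a false positive rate, and the bot prevalence. This analytic shortcut is a substitute chosen here for brevity, not the procedure described in the caption; the function name `precision_at_prevalence` and the example rates are hypothetical.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Expected share of real bots among flagged accounts.

    tpr: fraction of bots flagged (recall)
    fpr: fraction of humans flagged
    prevalence: fraction of bots in the population (0.15 in the caption)
    """
    flagged_bots = prevalence * tpr
    flagged_humans = (1 - prevalence) * fpr
    total_flagged = flagged_bots + flagged_humans
    if total_flagged == 0:
        return float("nan")  # nothing flagged, precision undefined
    return flagged_bots / total_flagged


# Hypothetical rates, not taken from the paper:
# a threshold that catches half of all bots and flags 10% of humans.
example = precision_at_prevalence(tpr=0.5, fpr=0.1, prevalence=0.15)
# example is 0.46875: more humans than bots among the flagged accounts
```

Because humans outnumber bots by 85 to 15, even a modest false positive rate can put more humans than bots into the flagged sample.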

Fig 5.

CAP precision-recall curves for the resampled data sets.

We consider the population baseline on Twitter (15% bots) for the universal CAP Botometer score. Black points indicate the precision and recall at a universal CAP of 0.25. With the German politicians and bots data set, at almost every threshold level the sample identified as bots contains more humans than real bots (precision below 0.5). The x-axis represents the recall (sensitivity) and the y-axis represents the precision.

Table 4.

ROC-AUC and PR-AUC for the English score.

Table 5.

ROC-AUC and PR-AUC for the English CAP.

Fig 6.

Density plots of the per-account standard deviations (SDs) for each group.

Left: Botometer universal score; right: Botometer CAP. A bandwidth of 0.015 was used for both the CAP and the universal scores. The x-axis represents the standard deviation and the y-axis represents the density.
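
As a rough sketch of how such a density could be produced, the Python below computes one standard deviation per account and smooths those values with a kernel density estimate at bandwidth 0.015. The caption does not state the kernel or whether the sample or population SD was used; a Gaussian kernel and the sample SD are assumptions here, and the account names and scores are made up.

```python
import math
from statistics import stdev


def per_account_sd(account_scores):
    """Sample standard deviation of each account's repeated scores."""
    return [stdev(scores) for scores in account_scores.values()]


def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical accounts with three scores each.
toy_scores = {
    "account_a": [0.2, 0.4, 0.6],
    "account_b": [0.10, 0.12, 0.11],
    "account_c": [0.8, 0.8, 0.8],
}
sds = per_account_sd(toy_scores)
density = gaussian_kde(sds, bandwidth=0.015)

# Evaluate the curve on a grid of SD values from 0.00 to 0.30.
curve = [(k / 100, density(k / 100)) for k in range(31)]
```

A small bandwidth such as 0.015 keeps narrow peaks near zero visible, which matters when most accounts have very stable scores.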

Fig 7.

Changing score over time.

Percentage of accounts (y-axis) whose score was at least once below and at least once above the threshold during the three months, for all thresholds between 0 and 1 in steps of 0.05. The x-axis represents the chosen threshold for the Botometer score. Left: universal score; right: universal CAP.
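
The quantity on the y-axis can be sketched in a few lines of Python. The function name `crossing_share`, the toy data, and the tie rule (a score equal to the threshold counts as "above") are assumptions made for illustration; the caption does not specify how ties are handled.

```python
def crossing_share(account_scores, threshold):
    """Share of accounts with at least one score below and one at or above the threshold."""
    crossing = sum(
        1
        for scores in account_scores.values()
        if min(scores) < threshold <= max(scores)
    )
    return crossing / len(account_scores)


# Hypothetical accounts, each scored repeatedly over three months.
toy_scores = {
    "account_a": [0.2, 0.6, 0.4],
    "account_b": [0.10, 0.15, 0.20],
    "account_c": [0.7, 0.9, 0.8],
}

# Thresholds from 0 to 1 in steps of 0.05, as in the caption.
thresholds = [round(0.05 * k, 2) for k in range(21)]
shares = {t: crossing_share(toy_scores, t) for t in thresholds}
```

An account counted here would be classified as a bot on some days and as a human on others at that threshold.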

Fig 8.

Density plots for the different data sets.

Left: Density plots for the different combined data sets in our analysis, showing the distribution of Botometer’s universal score. We used the resampled data sets with 15% bots and 85% humans, with a total n = 100,000 for each data set. Right: Density plots for the human accounts data sets. Lines indicate the median; a bandwidth of 0.04 was used for all data sets. The x-axis represents the Botometer score and the y-axis represents the density.
