Table 1.
GDELT features included in the analysis, with short descriptions.
Fig 1.
Overview of data aggregation process.
Overview of process to aggregate GDELT data from article- to outlet-level instances, containing themes and their respective average GDELT features.
Fig 2.
Frequency of political lean classes per dataset, on a logarithmic scale.
Fig 3.
Overview of experiments used to test models.
To test the impact of different bias-related data, models were trained on subsets of the data: traditional bias data (features related to tone, polarity, activity, and self/group reference density); alternative bias data (features such as word and article counts, and image or video presence); and the combination of these, the full bias data. An additional experiment tested model performance on the full dataset when supplemented with categorical features from the MBFC data.
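The subset experiments above amount to selecting different column groups from the same feature matrix before training. A minimal sketch of that selection step, using hypothetical feature names (the actual GDELT feature set differs):

```python
import numpy as np

# Hypothetical feature names; the paper's real GDELT feature set differs.
FEATURES = ["tone", "polarity", "activity", "self_ref_density",
            "group_ref_density", "word_count", "article_count",
            "has_images", "has_videos"]

# Feature subsets mirroring the three experiments described above.
SUBSETS = {
    "traditional": ["tone", "polarity", "activity",
                    "self_ref_density", "group_ref_density"],
    "alternative": ["word_count", "article_count",
                    "has_images", "has_videos"],
}
SUBSETS["full"] = SUBSETS["traditional"] + SUBSETS["alternative"]

def select(X, names):
    """Return the columns of X corresponding to the given feature names."""
    idx = [FEATURES.index(n) for n in names]
    return X[:, idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(6, len(FEATURES)))  # toy outlet-level instances
for name, cols in SUBSETS.items():
    print(name, select(X, cols).shape)
```

Each experiment would then train and evaluate the same model class on one of these column subsets, so that performance differences are attributable to the feature group alone.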
Fig 4.
Confusion matrices of the predictions by the best-performing models per task.
Table 2.
Model results per experiment.
Table 3.
Output examples. Examples of domains with corresponding predictions and ground truths. Predictions were made using the best-performing NN model.
Fig 5.
Decision plot of Breitbart, a right-wing political news source.
The twenty most influential features are plotted in descending order of influence. The range at the top of the graph represents the political bias labels as predicted by the model.
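A SHAP decision plot traces the cumulative sum of feature contributions from the model's base value (its expected output) to its final prediction, with features ordered by importance. A minimal NumPy sketch of that accumulation, using made-up SHAP values rather than the paper's model:

```python
import numpy as np

# Illustrative SHAP values for one outlet; names and values are invented.
features = np.array(["tone", "polarity", "activity", "word_count"])
shap_vals = np.array([0.30, -0.10, 0.15, 0.05])
base_value = 0.40  # model's expected output over the background data

# A decision plot orders features by |SHAP value| and accumulates them
# from the base value up to the final model output.
order = np.argsort(-np.abs(shap_vals))
path = base_value + np.cumsum(shap_vals[order])
for f, p in zip(features[order], path):
    print(f"{f:12s} -> {p:.2f}")
print("model output:", round(path[-1], 2))
```

The plotted line in Figs 5 through 10 is this cumulative path; the final point is the model's prediction for the outlet.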
Fig 6.
Decision plot of Forbes, a right-leaning political news source.
Fig 7.
Decision plot of The Economist, a centre-leaning political news source.
Fig 8.
Decision plot of The Guardian, a left-leaning political news source.
Fig 9.
Decision plot of CNN, a left-wing political news source.
Fig 10.
SHAP decision plot of a misclassified web domain.
Example of a misclassified web domain, theconservativetreehouse.com, a right-wing domain that the model incorrectly classified as left-leaning.
Fig 11.
PABS and MBFC label agreement.
A confusion matrix comparison of MBFC labels with those of PABS.
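The agreement matrix in Fig 11 cross-tabulates the two label sets: each cell counts outlets that MBFC places in one class and PABS in another. A small stdlib sketch with illustrative labels (not the paper's data):

```python
from collections import Counter

# Illustrative labels only, not the study's actual annotations.
classes = ["left", "centre", "right"]
mbfc = ["left", "left", "centre", "right", "right", "centre"]
pabs = ["left", "centre", "centre", "right", "left", "centre"]

# Confusion matrix: rows = MBFC label, columns = PABS label.
counts = Counter(zip(mbfc, pabs))
matrix = [[counts[(m, p)] for p in classes] for m in classes]

# Overall agreement is the fraction of outlets on the matrix diagonal.
agreement = sum(m == p for m, p in zip(mbfc, pabs)) / len(mbfc)
for name, row in zip(classes, matrix):
    print(name, row)
print("agreement:", round(agreement, 2))
```

Off-diagonal cells identify the class pairs where the two labeling schemes disagree most often.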