We present an automated method for measuring media bias. Inferring which newspaper published a given article, based only on the frequencies with which it uses different phrases, leads to a conditional probability distribution whose analysis lets us automatically map newspapers and phrases into a bias space. By analyzing roughly a million articles from roughly a hundred newspapers for bias in dozens of news topics, our method maps newspapers into a two-dimensional bias landscape that agrees well with previous bias classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. This means that although news bias is inherently political, its measurement need not be.
Citation: D’Alonzo S, Tegmark M (2022) Machine-learning media bias. PLoS ONE 17(8): e0271947. https://doi.org/10.1371/journal.pone.0271947
Editor: Ciro Castiello, University of Bari Aldo Moro, ITALY
Received: January 21, 2022; Accepted: July 10, 2022; Published: August 10, 2022
Copyright: © 2022 D’Alonzo, Tegmark. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: We have made our data freely available here: https://www.kaggle.com/datasets/tegmark/mediabias.
Funding: The authors received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Political polarization has increased in recent years, both in the United States and internationally, with pernicious consequences for democracy and its ability to solve pressing problems. It is often argued that such polarization is stoked by the media ecosystem, with machine-learning-fueled filter bubbles increasing the demand for and supply of more biased media. Media bias has been defined as favoring, disfavoring, emphasizing, or ignoring certain political actors, policies, events, or topics in a way that is deceptive toward the reader; it can be accomplished through many different techniques.
In response, there have been significant efforts to protect democracy by studying and flagging media bias. However, there is a widespread perception that fact-checkers and bias-checkers can themselves be biased and lack transparency. It is therefore of great interest to develop objective and transparent measures of bias that are based on data rather than on subjective human judgement calls. Early work in this area is mainly qualitative, manual, or both. While this has produced interesting findings on biased coverage of, e.g., protests and terrorism, the manual nature of these methods limits their scalability and their feasibility for real-time bias monitoring in the digital age.
Advances in machine learning (ML) raise the possibility of bias detection that is transparent and scalable by virtue of being automated, with little or no human intervention. Early efforts in this direction have shown great promise. For example, various ML natural language processing (NLP) techniques have been employed to discover bias-inducing words in articles from four German newspapers and from six 20th-century Dutch newspapers. ML NLP techniques have also been used to detect gender bias in sports interviews, to detect political bias in coverage of climate change, to identify trolling in social media posts, and to analyze bigram/trigram frequencies in the U.S. congressional record. Although these studies have been successful, they have typically involved relatively small datasets or hand-crafted features, making it timely to pursue automated media bias detection with larger datasets and broader scope. This is the goal of the present paper.
Specifically, we will use a dataset containing roughly a million articles from about 100 different newspapers to study phrase bias, i.e., the bias allowing a machine-learning algorithm to predict which newspaper published an article merely from how often it uses certain phrases. As illustrated in Fig 1, for instance, articles about the Black Lives Matter (BLM) topic that refer to “demonstrators” and “rioters” are likely to be from media on the political left and right, respectively. Our goal is to make the bias-detection algorithm as automated, transparent and scalable as possible, so that biases of phrases and newspapers are machine-learned rather than input by human experts. For example, the horizontal positions of phrases and newspapers in Fig 1, which can be interpreted in terms of left-right bias, were computed directly from our data, without using any human input as to how various phrases or media sources may be biased.
The colors and sizes of the dots were predetermined by external assessments and thus in no way influenced by our data. The positions of the dots thus suggest that the horizontal axis can be interpreted as the traditional left-right bias axis, here automatically rediscovered by our algorithm directly from the data.
The rest of this paper is organized as follows. The Methods section describes our algorithm for automatically learning media bias from an article database, including a generalization of principal component analysis tailored for phrase frequency modeling. The Results section shows our findings for the most biased topics, and identifies a two-dimensional bias landscape that emerges from how bias correlates across topics, with left-right stance and establishment stance as its two bias axes. The Conclusions section summarizes and discusses our findings.
In this section, we present our method for automated bias detection. We first describe how we automatically map both phrases, meaning monograms, bigrams, or trigrams, and newspapers into a d-dimensional bias space using phrase statistics alone, then present our method for phrase selection.
Generalized SVD-modeling of phrase statistics
Given a set of articles from n different media sources, we begin by counting occurrences of m phrases (say “fetus”, “unborn baby”, etc.). We arrange these counts into an m × n matrix N of natural numbers $N_{ij} \ge 0$ encoding how many times the ith phrase occurs in the jth media source. We model $N_{ij}$ as a random variable drawn from a Poisson distribution whose mean $\bar N_{ij}$ (the average number of times the phrase occurs) is non-negative and depends on both the phrase i and the media source j:
$$N_{ij} \sim \mathrm{Poisson}(\bar N_{ij}). \tag{1}$$
Our goal is to accurately model this matrix in terms of biases that link phrases and newspapers. Specifically, we wish to approximate either $\bar N$ (or, alternatively, its logarithm) as a low-rank matrix $U\,\mathrm{diag}(w)\,V^t$, as in singular-value decomposition (SVD):
$$\bar N_{ij} \approx \sum_{k=0}^{r-1} w_k U_{ik} V_{jk},$$
where the rank r < min(m, n). Without loss of generality, we can choose U and V to have orthonormal columns ($U^tU = I$, $V^tV = I$) and $w_k > 0$.
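To make the model concrete, the following minimal sketch (assuming NumPy; the helper names `random_low_rank` and `sample_counts` and all numbers are hypothetical) builds a rank-r factorization whose factors have orthonormal columns and draws a synthetic count matrix from the Poisson model of Eq (1), using the logarithmic variant of the low-rank model plus an optional constant offset. It is a toy generator for experimenting with the fit, not the data pipeline behind the results.

```python
import numpy as np

def random_low_rank(m, n, w, rng):
    """Return U (m x r), w (r,), V (n x r) with U^t U = I and V^t V = I.

    The factors come from QR factorizations of random Gaussian matrices;
    w holds the r positive weights supplied by the caller.
    """
    r = len(w)
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return U, np.asarray(w, dtype=float), V

def sample_counts(U, w, V, rng, offset=0.0):
    """Draw N_ij ~ Poisson(Nbar_ij) with the log-link Nbar = exp(offset + U diag(w) V^t)."""
    M = offset + (U * w) @ V.T   # (U * w) scales column k of U by w_k
    Nbar = np.exp(M)
    return rng.poisson(Nbar), Nbar
```

For example, `random_low_rank(50, 8, [3.0, 1.0, 0.5], np.random.default_rng(0))` yields a 50 × 8 matrix of exact rank 3 once multiplied out.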
SVD corresponds to minimizing the mean-squared-error loss function $\sum_{ij}(N_{ij} - \bar N_{ij})^2$. Although SVD is easy to compute and interpret mathematically, it is poorly matched to our media bias modeling problem for two reasons. First, it will in some cases predict negative phrase counts $\bar N_{ij} < 0$, which makes no sense for a language model. Second, it implicitly gives equal weight to fitting every single number $N_{ij}$, even though some are measured much more accurately than others from the data (the Poisson error bar is $\sqrt{\bar N_{ij}}$, and phrase counts can differ from one another by orders of magnitude). To avoid these shortcomings, we do not minimize the SVD loss, but instead maximize the Poisson likelihood
$$\mathcal{L} = \prod_{ij} \frac{\bar N_{ij}^{\,N_{ij}}\, e^{-\bar N_{ij}}}{N_{ij}!}, \tag{2}$$
i.e., the likelihood that our model produces the observed phrase counts N. Numerically, it is more convenient to maximize its logarithm
$$\ln\mathcal{L} = \sum_{ij}\left[N_{ij}\ln\bar N_{ij} - \bar N_{ij} - \ln(N_{ij}!)\right] \approx \sum_{ij}\left[N_{ij}\ln\frac{\bar N_{ij}}{N_{ij}} + N_{ij} - \bar N_{ij}\right]. \tag{3}$$
The approximation in the last step uses Stirling’s approximation ln(k!) ≈ k ln(k/e), and we use it for numerical speedup only for sufficiently large counts $N_{ij}$, where it is accurate. To avoid the aforementioned problems with forbidden negative $\bar N_{ij}$ values, we try two separate fits and select the one that fits the data better (gives a higher Poisson likelihood):
$$\bar N_{ij} = \mathrm{ReLU}\Big(\sum_{k=0}^{r-1} w_k U_{ik} V_{jk}\Big), \tag{4}$$
$$\bar N_{ij} = \exp\Big(\sum_{k=0}^{r-1} w_k U_{ik} V_{jk}\Big), \tag{5}$$
where ReLU(x) = x if x ≥ 0 and vanishes otherwise. In our numerical calculations in the Results section, we find that the second fit performs better most of the time, but not always.
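The two log-likelihood forms in Eq (3) can be compared numerically. The sketch below (Python with NumPy and SciPy; the function names are hypothetical) evaluates the exact form with `scipy.special.gammaln` and the Stirling form, and defines the two mean parametrizations of Eqs (4) and (5). Because the Stirling correction depends only on the data N, it shifts ln L by a constant and does not move the maximum; the ReLU form assigns ln L = -inf whenever a positive count meets a mean clipped to zero.

```python
import numpy as np
from scipy.special import gammaln, xlogy

def loglik_exact(N, Nbar):
    """ln L = sum_ij [N ln Nbar - Nbar - ln(N!)]  (first form of Eq 3)."""
    return float(np.sum(xlogy(N, Nbar) - Nbar - gammaln(N + 1)))

def loglik_stirling(N, Nbar):
    """ln L ~ sum_ij [N ln(Nbar/N) + N - Nbar], using ln(k!) ~ k ln(k/e).

    xlogy(0, y) = 0, so zero counts contribute only -Nbar.
    """
    return float(np.sum(xlogy(N, Nbar) - xlogy(N, N) + N - Nbar))

def mean_relu(M):
    """Eq (4): negative model predictions are clipped to zero."""
    return np.maximum(M, 0.0)

def mean_exp(M):
    """Eq (5): the low-rank model predicts the logarithm of the mean."""
    return np.exp(M)
```

For counts in the thousands, the difference between the two forms is close to -Σ 0.5 ln(2πN_ij) regardless of the model mean, which is why the approximation is reserved for large counts.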
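We determine the best fit by selecting the desired rank r (typically r = 3) and numerically minimizing the loss function $\ell \equiv -\ln\mathcal{L}$ over the fitting parameters $w_k$, $U_{ik}$ and $V_{jk}$. We do this using the gradient-descent method implemented in scipy.optimize, which is greatly accelerated by the following exact formulas for the gradient of $\ell$ that follow from Eqs (3) and (4):
$$\frac{\partial\ell}{\partial w_k} = (U^t A V)_{kk}, \tag{6}$$
$$\frac{\partial\ell}{\partial U_{ik}} = (A V W)_{ik}, \tag{7}$$
$$\frac{\partial\ell}{\partial V_{jk}} = (A^t U W)_{jk}, \tag{8}$$
where
$$A_{ij} \equiv \theta\big((U W V^t)_{ij}\big)\left(1 - \frac{N_{ij}}{\bar N_{ij}}\right) \tag{9}$$
(taken to vanish wherever the step function vanishes), W is the diagonal matrix with $W_{kk} = w_k$, and θ is the Heaviside step function defined by θ(x) = 1 if x > 0, vanishing otherwise. For the exponential parametrization of Eq (5), these formulas are identical except that $A_{ij} = \bar N_{ij} - N_{ij}$. Once the numerical optimization has converged and determined $U W V^t$, we use the aforementioned freedom to ensure that U and V are orthogonal matrices and $w_k \ge 0$.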
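A compact sketch of such a fit, assuming the exponential parametrization of Eq (5), is shown below. It is not the original implementation: it swaps in SciPy's L-BFGS-B quasi-Newton method via `scipy.optimize.minimize` (with `jac=True`, so the objective returns both the loss and the analytic gradient of Eqs (6), (7) and (8)), and it assumes an initialization from a truncated SVD of log(N + 0.5). The function name `fit_poisson_lowrank` is hypothetical. The final SVD of the fitted product implements the rescaling freedom that makes U and V column-orthonormal with non-negative weights.

```python
import numpy as np
from scipy.optimize import minimize

def fit_poisson_lowrank(N, r=3):
    """Fit Nbar = exp(U diag(w) V^t) to counts N by minimizing l = -ln L.

    Uses the exponential parametrization of Eq (5), for which A = Nbar - N.
    """
    N = np.asarray(N, dtype=float)
    m, n = N.shape
    # Assumed initialization: rank-r SVD of log(N + 0.5).
    Us, s, Vts = np.linalg.svd(np.log(N + 0.5), full_matrices=False)
    x0 = np.concatenate([Us[:, :r].ravel(), Vts[:r].T.ravel(), s[:r]])

    def unpack(x):
        U = x[:m * r].reshape(m, r)
        V = x[m * r:(m + n) * r].reshape(n, r)
        return U, V, x[(m + n) * r:]

    def loss_and_grad(x):
        U, V, w = unpack(x)
        M = (U * w) @ V.T              # M = U W V^t
        Nbar = np.exp(M)
        loss = np.sum(Nbar - N * M)    # -ln L, dropping terms that depend only on N
        A = Nbar - N
        g_U = (A @ V) * w              # Eq (7): A V W
        g_V = (A.T @ U) * w            # Eq (8): A^t U W
        g_w = np.einsum("ik,ij,jk->k", U, A, V)  # Eq (6): (U^t A V)_kk
        return loss, np.concatenate([g_U.ravel(), g_V.ravel(), g_w])

    res = minimize(loss_and_grad, x0, jac=True, method="L-BFGS-B")
    U, V, w = unpack(res.x)
    # Re-factor the fitted product so U, V have orthonormal columns and w >= 0.
    U2, s2, Vt2 = np.linalg.svd((U * w) @ V.T, full_matrices=False)
    return U2[:, :r], s2[:r], Vt2[:r].T, res
```

Dropping the N-only terms of Eq (3) leaves the gradient unchanged, which is why the loss in the sketch omits ln(N!).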
Using the open-source Newspaper3k software, we scraped and downloaded a total of 3,078,624 articles published between January 2019 and December 2020 from 100 media sources chosen to include the largest US newspapers as well as a broad diversity of political stances. The 83 newspapers appearing in our generalized SVD bias figures below are listed in Fig 2, and the correlation analysis at the end also includes articles from Defense One and Science.
The downloaded article text was minimally pre-processed before analysis. All text in “direct quotes” was removed from the articles, since we are interested in biased phrases used by journalists, not by their quoted sources. We replaced British spelling of common words (e.g., favourite, flavour) with American spelling (favorite, flavor) to erase spelling-based clues as to which newspaper an article is from. Non-ASCII characters were replaced by their closest ASCII equivalent. Text was stripped of all punctuation marks except periods, which were removed only when they did not indicate end-of-sentence; for example, “M.I.T.” would become “MIT”. End-of-sentence periods were replaced by “PERIOD” to avoid creating false bigrams and trigrams containing words not in the same sentence. Numerals were removed unless they were ordinals (1st, 17th), in which case they were replaced with equivalent text (first, seventeenth). The first letter of each sentence was lower-cased, but all other capitalization was retained. We discarded any articles containing fewer than ten words after the aforementioned preprocessing.
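The preprocessing steps above can be approximated with regular expressions. The Python sketch below is illustrative only: the spelling and ordinal dictionaries are tiny hypothetical subsets, abbreviation detection is a simple heuristic (letters separated by periods, as in “M.I.T.”), and NFKD normalization followed by ASCII encoding is one assumed way to obtain a “closest ASCII equivalent” (it strips accents and drops characters without an ASCII decomposition). The function name `preprocess` is hypothetical.

```python
import re
import unicodedata

UK_TO_US = {"favourite": "favorite", "flavour": "flavor"}  # hypothetical subset
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third", "17th": "seventeenth"}  # hypothetical subset

def preprocess(text, min_words=10):
    """Return a token list following the steps above, or None if too short."""
    # 1. Drop direct quotes (straight or curly) so only the journalist's words remain.
    text = re.sub(r'"[^"]*"|\u201c[^\u201d]*\u201d', " ", text)
    # 2. Map to ASCII: accents are stripped, other non-ASCII characters dropped.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # 3. British -> American spelling.
    for uk, us in UK_TO_US.items():
        text = re.sub(rf"\b{uk}\b", us, text)
    # 4. Abbreviation periods are not sentence ends: "M.I.T." -> "MIT".
    text = re.sub(r"\b(?:[A-Za-z]\.){2,}", lambda m: m.group(0).replace(".", ""), text)
    # 5. Ordinals become words (unknown ones are dropped); other numerals are removed.
    text = re.sub(r"\b\d+(?:st|nd|rd|th)\b", lambda m: ORDINALS.get(m.group(0), " "), text)
    text = re.sub(r"\d+", " ", text)
    tokens = []
    for sentence in text.split("."):
        # 6. Strip all other punctuation, then lower-case the sentence's first letter.
        words = [re.sub(r"[^\w]", "", w) for w in sentence.split()]
        words = [w for w in words if w]
        if not words:
            continue
        words[0] = words[0][0].lower() + words[0][1:]
        tokens.extend(words + ["PERIOD"])  # 7. Mark sentence ends explicitly.
    n_words = sum(t != "PERIOD" for t in tokens)
    return tokens if n_words >= min_words else None
```

Heuristics like step 4 inevitably misfire on some inputs (for instance, an abbreviation that also ends a sentence), which is acceptable for a sketch but worth checking on real data.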
Extraction of discriminative phrases
We auto-classified the articles by topic using the open-source MITNewsClassify package. For each of the topics mentioned below (covered in 779,174 articles), we extracted discriminative phrases by first extracting the Nanalyzed = 100,000 most common phrases, then ranking, purging and merging this phrase list as described below. The cutoff parameter Nanalyzed was introduced only for numerical efficiency and had negligible effect on our results, which are dominated by much lower ranks; for example, the phrases included in our final analysis of abortion bias had a median rank of 1,116, well within our 100,000 cutoff.
To avoid duplication, we deleted subsumed monograms and bigrams from our phrase list: we deleted all monograms that appeared in a particular bigram more than fsubsumed = 70% of the time and all bigrams that appeared in a particular trigram more than a fraction fsubsumed of the time. For the BLM topic, for example, “tear” was deleted because it appeared in “tear gas” 87% of the time. We tested the choices fsubsumed = 60%, 70%, 80%, 90% and found 70% to strike the best balance between keeping too many phrase fragments (fsubsumed = 90% would retain “tear”) and discarding key subphrases (fsubsumed = 60% would discard “fetal heartbeat” in favor of “a fetal heartbeat”, even though about 30% of 1,000 occurrences of “fetal heartbeat” lacked a preceding “a” and would thus have been ignored). These and all other hyperparameters of our method are listed in Table 1.
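A minimal sketch of the subsumption rule, assuming phrase counts are held in a dictionary mapping space-separated phrases to occurrence counts (the name `remove_subsumed` is hypothetical). As a simplification, the number of times a phrase occurs inside a longer one is taken to be the count of the longer phrase, and only sub-phrases one word shorter are checked, matching the monogram/bigram and bigram/trigram pairs described above. The example counts mirror the “tear gas” and “fetal heartbeat” cases but are made up.

```python
def remove_subsumed(counts, f_subsumed=0.7):
    """Drop phrases that occur inside one particular longer phrase
    more than a fraction f_subsumed of the time.

    counts: dict mapping a phrase of 1-3 space-separated words to its count.
    """
    doomed = set()
    for phrase, c in counts.items():
        words = phrase.split()
        if len(words) < 2:
            continue
        # The two sub-phrases one word shorter: drop the first or the last word.
        for sub in (" ".join(words[1:]), " ".join(words[:-1])):
            if sub in counts and c > f_subsumed * counts[sub]:
                doomed.add(sub)
    return {p: c for p, c in counts.items() if p not in doomed}

# Hypothetical usage:
counts = {"tear": 100, "tear gas": 87, "gas": 300,
          "fetal heartbeat": 1000, "a fetal heartbeat": 690}
kept = remove_subsumed(counts, 0.7)  # drops "tear", keeps "fetal heartbeat"
```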
Next, all phrases were sorted in order of decreasing information score
$$I_i \equiv \sum_j P_{ij}\log_2\frac{P_{ij}}{P_{i\cdot}\,P_{\cdot j}}, \tag{10}$$
where $P_{ij} \equiv N_{ij}/N_{\cdot\cdot}$ is the aforementioned N-matrix rescaled as a joint probability distribution over phrases i and newspapers j, and replacing an index by a dot denotes that the index is summed over; for example, $N_{\cdot\cdot}$ is the total number of phrases in all the articles considered. The mutual information between phrases and newspapers is $\sum_i I_i$, which can be interpreted as how many bits of information we learn about which newspaper an article is from by looking at one of its phrases. The information scores $I_i$ can thus be interpreted as how much of this information the ith phrase contributes. Phrases are more informative both if they are more common and if their use frequency varies more between newspapers.
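Eq (10) takes only a few lines of NumPy. In this sketch (the function name `information_scores` is hypothetical), terms with P_ij = 0 contribute zero, following the usual convention 0 log 0 = 0, and the scores of all phrases sum to the mutual information in bits.

```python
import numpy as np

def information_scores(N):
    """I_i = sum_j P_ij log2(P_ij / (P_i. P_.j)) for an m x n count matrix N (Eq 10)."""
    P = np.asarray(N, dtype=float) / np.sum(N)
    P_i = P.sum(axis=1, keepdims=True)  # phrase marginals P_i.
    P_j = P.sum(axis=0, keepdims=True)  # newspaper marginals P_.j
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(P > 0, P / (P_i * P_j), 1.0)  # 0 log 0 := 0
    return np.sum(P * np.log2(ratio), axis=1)
```

Sorting with `np.argsort(-information_scores(N))` then gives the decreasing-score ranking. As a sanity check, two phrases each used exclusively by one of two equally sized newspapers carry 0.5 bit each, i.e., 1 bit in total, while phrases used in proportion to newspaper size score zero.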
We removed all spoiler phrases, i.e., phrases for which more than fspoiler = 90% of all occurrences come from a single newspaper. These “too useful” phrases commonly reference journalist names or other things unique to a newspaper but not indicative of political bias. For example, CNBC’s morning news and talk program is called Squawk Box, making the phrase Squawk Box useful for predicting that an article is from CNBC but not useful for learning about media bias. We chose 90% as a conservative threshold, far above the fractions found for legitimate biased phrases; the most extreme of these was “medical abortion”, with 52% of its occurrences in a single newspaper.
To further mitigate this problem, we created a blacklist of newspaper names, journalist names, other phrases uniquely attributable to a single newspaper, and generic phrases with little stand-alone meaning in our context (such as “article republished”). Phrases on this list were discarded for all topics. Phrases that contained PERIOD were also removed from consideration. Just as we discarded direct quotes above, we also removed all phrases containing “said” or “told”, because they generally indicate an indirect quote.
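The automatic purge described in the last two paragraphs amounts to a few filters on each phrase and its row of the N-matrix. The sketch below assumes that the blacklist is a set of exact phrases and that phrases are space-separated tokens; the function name `purge` and the example data are hypothetical.

```python
import numpy as np

def purge(phrases, N, blacklist, f_spoiler=0.9):
    """Return indices i of phrases (rows of N) that survive the automatic purge."""
    keep = []
    for i, phrase in enumerate(phrases):
        words = set(phrase.split())
        total = N[i].sum()
        if total == 0 or N[i].max() > f_spoiler * total:
            continue  # spoiler: a single newspaper dominates usage
        if phrase in blacklist or "PERIOD" in words:
            continue  # blacklisted, or spans a sentence boundary
        if words & {"said", "told"}:
            continue  # likely part of an indirect quote
        keep.append(i)
    return keep

# Hypothetical usage: rows are phrases, columns are three newspapers.
phrases = ["Squawk Box", "riots", "he said", "article republished", "protests PERIOD", "tear gas"]
N = np.array([[95, 3, 2], [30, 40, 30], [20, 20, 20], [10, 10, 10], [5, 5, 5], [10, 10, 10]])
survivors = purge(phrases, N, {"article republished"})  # keeps "riots" and "tear gas"
```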
Once this automatic purge was complete, the Nscreened = 1,000 remaining candidate phrases with the highest information scores were selected for manual screening as described in the next section. We set this threshold simply to avoid exorbitant manual labor. Most surviving phrases were not ranked near the cutoff; for example, the median rank of used abortion bias phrases was 391.
Manual purge and merge.
To be included in our bias analysis, phrases must meet the following criteria:
- In order to be relevant to a topic, a phrase must not be a very common one that has ambiguous stand-alone meaning. For example, the phrase “social media” could be promoting social media pages, as in “Follow us on social media”, or referencing a social media site. For simplicity, such common phrases with multiple meanings were excluded. Note that longer phrases (bigrams or trigrams) that contained such shorter phrases (monograms or bigrams) could still be included, such as “social media giants” in the tech censorship topic.
- A phrase is allowed to occur in multiple topics (for example, “socialism” is relevant to both the Venezuela and Cuba topics), but a sub-topic is not. For example, phrases related to the sub-topic tech censorship in China were excluded from both the tech censorship and China topics because they were relevant to both.
- Uniqueness: Since there was minimal pre-processing, many phrases appear with different capitalizations or conjugations. In some cases, only one of the phrase variations was included and the others were discarded. In other cases, all variations were included because they represented a meaningful difference. These choices were made on a case by case basis, with a few general rules.
If both a singular and plural version of a word were present, only the more frequent variant was kept. If phrases were differentially capitalized (for example “big tech” and “Big Tech”), we kept both if they landed more than two standard deviations apart in the generalized principal component plot; otherwise we kept only the more frequent variant. If phrases were a continuation of one another, such as “Mayor Bill de” and “Bill de Blasio”, the more general phrase was included. In this case, “Bill de Blasio” would be included because it does not contain an identifier (here, the title “Mayor”). If there was no identifier, the more informative phrase was kept: for example, discarding “the Green New” while keeping “Green New Deal”.
- Specificity: Phrases must be specific enough to stand alone. A phrase was deemed specific if the phrase could be interpreted without context or be overwhelmingly likely to pertain to the relevant topic. This rules out phrases with only filler words (e.g., “would like”, “must have”) and phrases that are too general (e.g., “politics”).
- Organize Subtopics (if needed): Some topics were far larger and broader than others. For example, finance contained many natural subtopics, including private finance and public finance. If natural subtopics appeared during the above process, the parent topic was split into subtopics. If topics were small and specific, such as guns, no such additional manual processing was performed.
- Edge cases: There were about a dozen cases on the edge of exclusion based on the above criteria, for which the include/exclude decision was based on a closer look at both the underlying data and the phrase error bar emerging from the principal component analysis. Most of these phrases were excluded for occurring only in a single newspaper for stylistic reasons. When necessary, we examined the use of the phrase in context by reading a random sample of 10 articles in our database containing the phrase.
In this section, we present the results of applying our method to the aforementioned 779,174-article dataset. We will first explore how the well-known left-right media bias axis can be auto-discovered. We then identify a second bias axis related to establishment stance, and conclude this section by investigating how bias correlates across topics.
Left-right media bias.
We begin by investigating the Black Lives Matter (BLM) topic because of its timeliness. The BLM movement swept across the USA in the summer of 2020, prompting media coverage from newspapers of varied size and political stance. We first compute the aforementioned N-matrix describing phrase statistics; Nij is how many times the ith phrase was mentioned in the jth newspaper. We have made this and all the other N-matrices computed in this paper available online (our N-matrices, phrase lists, etc. are available at https://space.mit.edu/home/tegmark/phrasebias.html). Table 2 shows a sample, rescaled to show the number of occurrences per article, revealing that the frequency of certain phrases varies dramatically between media sources. For example, “riots” is used about 60 times more frequently in PJ Media than in the NY Times, which prefers “protests”.
As described in the previous section, our generalized principal component analysis attempts to model this N-matrix in terms of biases that link phrases and newspapers. The first component (which we refer to as component 0) tends to model the obvious fact that some phrases are more popular in general and some newspapers publish more articles than others, so we plot only the next two components (which we refer to as 1 and 2) below. BLM components 1 and 2 are shown in Fig 1, corresponding to the horizontal and vertical axes: the phrase panel (left) plots Ui1 against Ui2 for each phrase i and the media panel (right) plots Vj1 against Vj2 for each media source j. The bars represent 1 standard deviation error bars computed using the Fisher information matrix method. To avoid clutter, we only show phrases and newspapers occurring in at least Nmin articles; for topics with fewer than 15,000 articles, we drop this threshold to Nmin = 100. This removes only a small fraction of the dots from our bias plots (in the abortion example, it removed 6% of the phrases and 2% of the newspapers). Because these dots are based on few articles, their information content is low, so they have a negligible effect on our results but would visually dominate our plots with their large error bars.
In the media panel, the dots representing newspapers are colored based on external left-right ratings and scaled based on external pro-critical establishment ratings (which correlate crudely with newspaper size). The colors of the media dots reflect an external left-right classification of media into the five classes “left”, “lean left”, “center”, “lean right” and “right”. The sizes of the media dots reflect an external establishment-stance classification, which is based on the Swiss Policy Research Media Navigator classification and Wikipedia’s lists of left, libertarian and right alternative media, and attempts to quantify the extent to which a news source normally accepts or challenges claims by powerful entities such as the government and large corporations. It is important to note that the colors and sizes of the dots were predetermined by external assessments and thus in no way influenced by the N-matrices that form the basis of our analysis in this paper. It is therefore remarkable that Fig 1 reveals a clear horizontal color separation, suggesting that the first BLM component (corresponding to the horizontal axis) can be interpreted as the well-known left-right political spectrum.
Phrase bias and valent synonyms.
As described in the Methods section, the phrases appearing in Fig 1 (left panel) were selected by our algorithm as the ones that best discriminated between different newspapers. We see that they typically carry implicit positive or negative valence. Looking at how these phrases are used in context reveals that some of them form groups of phrases that can be used rather interchangeably, e.g., “protests” and “riots”. For example, a June 8 2020 New York Times article reads “Floyd’s death triggered major protests in Minneapolis and sparked rage across the country” while a June 10 2020 Fox News article mentions “The death of George Floyd in police custody last month and a series of riots that followed in cities across the nation”. The x-axis in Fig 1 is seen to automatically separate this pair, with “protests” on the left and “riots” on the right, with newspapers (say NY Times and PJ Media) similarly being left-right separated in the right panel according to their relative preference for these two phrases. Fig 3 shows many such groups of emotionally loaded near-synonyms for both BLM and other topics. In many cases, we see that such a phrase group can be viewed as falling on a linguistic valence spectrum from positive (euphemism) to neutral (orthophemism) to negative (dysphemism).
The nutpicking challenge.
Fig 1 is seen to reveal a clean, statistically significant split between almost all left-leaning and right-leaning newspapers. The one noticeable exception is Counterpunch, whose horizontal placement suggests that it breaks from its left-leaning peers on BLM coverage. A closer look at the phrase observations reveals that this placement is misleading: it is an artifact of some newspapers using the same phrase in contexts where it has opposite valence. For example, a Counterpunch article treats the phrase “defund the police” as having positive valence by writing “the advocates of defund the police aren’t fools. They understand that the police will be with us but that their role and their functions need to be dramatically rethought”. In contrast, right-leaning PJ Media treats “defund the police” as having negative valence in this example: “If you’re a liberal, whats not to like about the slogan defund the police? It’s meaningless, it’s stupid, it’s dangerous, and it makes you feel good if you mindlessly repeat it”. This tactic is known as nutpicking: picking out and showcasing what your readership perceives as the nuttiest statements of an opposition group as representative of that group.
In other words, whereas most discriminative phrases discovered by our algorithm have a context-independent valence (“infanticide” always being negative, say), some phrases are bi-valent in the sense that their valence depends on how they are used and by whom. We will encounter this challenge in many of the news topics that we analyze; for example, most U.S. newspapers treat “socialism” as having negative valence, and as a result, arguably the most socialist-leaning newspaper in our study, Socialist Alternative, is misclassified as right-leaning because of its frequent use of “socialism” with positive connotations. For example, for the Venezuela topic, Socialist Project uses the term “socialist” as follows: “Notably, Chavismo is a consciously socialist-feminist practice throughout all of Venezuela. Many communities that before were denied their dignity, have collectively altered their country based on principles of social equity and egalitarianism.” In contrast, Red State uses “socialist” in a nutpicking way in this example: “conservative pundits and politicians have painted a devastatingly accurate picture of what happens when a country embraces socialism. Pointing out the dire situation facing the people of Venezuela provided the public with a concrete example of how socialist policies destroy nations.”
Correlated left-right controversies.
Our algorithm auto-discovers bias axes for all the topics we study and, unsurprisingly, many of them reflect a traditional left-right split similar to that revealed by our BLM analysis. For example, Fig 4 shows that the first principal component (the x-axis) for articles on the abortion topic effectively separates newspapers along the left-right axis, exploiting relative preferences for terms such as “fetus”/“unborn babies”, “termination”/“infanticide” and “anti choice”/“pro life”. In addition to valent synonyms, our algorithm detects bias through differential use of certain phrases lacking obvious counterparts, e.g., “reproductive rights” versus “religious liberty”.
Fig 5 shows that the correlation between BLM bias and abortion bias is very high (the correlation coefficient r ≈ 0.90). Since these two topics are arguably rather unrelated from a purely intellectual standpoint, their high correlation reflects the well-known bundling of issues in the political system.
Each dot corresponds to a newspaper (see legend in Fig 2).
A simple way to auto-identify topics with common bias is to rank topic pairs by their correlation coefficients. In this spirit, Table 3 shows the ten topics whose bias is most strongly correlated with BLM bias, together with the corresponding Pearson correlation coefficient r and its standard error $\sigma_r = \sqrt{(1 - r^2)/(n - 2)}$, where n is the number of newspapers included in its calculation. The results for three of the most timely top-ranked issues (tech censorship, guns, and US immigration) are shown in Fig 6, again revealing a left-right spectrum of media bias for these topics.
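A sketch of this kind of ranking follows. It assumes that each topic component is stored as an array of per-newspaper coordinates with NaN for newspapers that do not cover the topic, uses the common approximation σ_r = √((1 − r²)/(n − 2)) for the standard error, and ranks by |r| because the sign of a generalized principal component is arbitrary. All names are hypothetical.

```python
import numpy as np

def corr_with_error(x, y):
    """Pearson r over newspapers present in both topics, with sigma_r = sqrt((1 - r^2)/(n - 2))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mask = ~(np.isnan(x) | np.isnan(y))
    n = int(mask.sum())
    r = float(np.corrcoef(x[mask], y[mask])[0, 1])
    # max() guards against 1 - r^2 dipping slightly below zero from round-off.
    return r, float(np.sqrt(max(0.0, 1.0 - r**2) / (n - 2))), n

def rank_by_correlation(target, components):
    """Sort (name, r, sigma_r, n) rows by |r| with the target component."""
    rows = [(name, *corr_with_error(target, x)) for name, x in components.items()]
    return sorted(rows, key=lambda row: -abs(row[1]))
```

Here `components` might map labels such as "abortion 1" to coordinate arrays; the top rows of the returned list then play the role of a table like Table 3.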
The figures above show that although the left-right media axis explains some of the variation among newspapers, it does not explain everything. Fig 7 shows a striking example of this for the military spending topic. As opposed to the previous bias plots, the dots are no longer clearly separated by color (corresponding to left-right stance). Indeed, left-leaning CNN (18) is seen right next to right-leaning National Review (53) and Fox News (36). Instead, the dots are vertically separated by size, corresponding to establishment stance. In other words, we have auto-identified a second bias dimension, here ranging vertically from establishment-critical (bottom) to pro-establishment (top) bias.
Generalized principal components for military spending.
Just like left-right bias, establishment bias manifests as differential phrase use. For example, as seen in Table 4, the phrase “military industrial complex” is used more frequently in newspapers classified as establishment-critical, such as Canary and American Conservative, but is rarely, if ever, used by mainstream, pro-establishment outlets such as Fox or CNN, which instead prefer phrases such as “defense industry”.
We find that the military spending topic, much like the BLM topic, is highly correlated with other topics included in the study. This is clearly seen in Fig 8, which plots the pro-critical generalized principal components of the military spending topic and the Venezuela topic. A closer look at the Venezuela topic in Fig 9 reveals an establishment bias similar to that seen in Fig 7. We see that while establishment-critical papers frequently use phrases such as “imperialism” and “regime change”, pro-establishment newspapers prefer phrases such as “socialism” and “interim president”. This figure reveals that the Venezuela topic engenders both establishment bias (the vertical axis) and a smaller but non-negligible left-right bias (the horizontal axis).
To identify additional topics with establishment bias, we again compute correlation coefficients between generalized principal components, this time with the vertical component for military spending. Table 5 shows the ten most correlated topics, revealing a list quite different from the left-right-biased topics of Table 3. Nuclear weapons, Yemen, and police, three timely examples from this list, are shown in Fig 10. Here the left panels illustrate how usage of certain phrases reflects establishment bias separation. In articles about nuclear weapons, the terms “nuclear arms race” and “nuclear war” appear preferentially in establishment-critical newspapers, while “nuclear test” and “nuclear deterrent” are preferred by pro-establishment papers. In articles about Yemen, the phrase “war on Yemen”, suggesting a clear cause, signals an establishment-critical stance, while “humanitarian crisis”, not implying a cause, signals a pro-establishment stance. For articles about police, grammatical choices in the coverage of police shootings are highly predictive of establishment stance: establishment-critical papers use the passive voice (e.g., “was shot dead”) less than pro-establishment papers, and when they do, they prefer the verb “killed” over “shot”. Such news bias through use of the passive voice has been explored in detail in prior work. Fig 11 illustrates such use of the passive voice and valent synonyms across establishment topics.
Machine learning the media bias landscape
Throughout this paper, we have aspired to measure media bias in a purely data-driven way, so that the data can speak for itself without human interpretation. In this spirit, we will now eliminate the manual elements from our above bias landscape exploration (our selection of the two rather uncorrelated topics BLM and military spending, and of the topics most correlated with them). Our starting point is the 56 × 56 correlation matrix R for the generalized principal components of all our analyzed topics, shown in Fig 12. Notation such as “BLM 1” and “BLM 2” reflects the fact that we have two generalized principal components corresponding to each topic (the two axes of the right panel of Fig 1, say). Our core idea is to use the standard technique of spectral clustering to identify which topics exhibit similar bias, using their bias correlation from Fig 12 as the measure of similarity. We start by performing an eigendecomposition of the correlation matrix R,
$$R = E\,\Lambda\,E^t, \qquad \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots), \tag{11}$$
where λi are the eigenvalues and the columns of the matrix E are the eigenvectors. Fig 13 illustrates the first two eigencomponents, with the point corresponding to the kth topic plotted at coordinates (E1k, E2k). To reduce clutter, we show the ten components with the largest |E1k| and the ten with the largest |E2k|, retaining only the largest component for each topic. For better intuition, the figure has been rotated by 45°: if two internally correlated clusters are also correlated with each other, the leading eigenvector captures their common variation and the clusters fall along the diagonals, so the rotation tends to line them up with the coordinate axes. If needed, we also flip the sign of any axis whose data is mainly on the negative side and flip the 1/2 numbering to reflect cluster membership as described in S1 Table in S1 Appendix.
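A minimal NumPy sketch of this eigendecomposition step, under stated assumptions: eigenvectors are sorted by decreasing eigenvalue, the 45° rotation is applied counterclockwise, and an axis is flipped when its coordinates sum to a negative number (one simple reading of “mainly on the negative side”). The renumbering of the two axes by cluster membership is not automated here, and the function name `bias_landscape` is hypothetical.

```python
import numpy as np

def bias_landscape(R):
    """Map each row/column of a correlation matrix R to 2D via its top two eigenvectors."""
    lam, E = np.linalg.eigh(R)          # R = E diag(lam) E^t, eigenvalues ascending
    order = np.argsort(lam)[::-1]       # largest eigenvalues first
    lam, E = lam[order], E[:, order]
    xy = E[:, :2].copy()                # row k: coordinates of topic component k
    c = np.sqrt(0.5)                    # cos 45 deg = sin 45 deg
    xy = xy @ np.array([[c, c], [-c, c]])  # rotate each point by +45 degrees
    for axis in range(2):               # flip axes whose data lie mainly on the negative side
        if xy[:, axis].sum() < 0:
            xy[:, axis] *= -1
    return lam, xy
```

For example, for a 4 × 4 matrix made of two blocks with within-block correlation 0.9 and cross-block correlation 0.3, the top two eigenvalues are 2.5 and 1.3, and after the rotation each block lines up with a different coordinate axis.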
Error bars show 1-standard-deviation jackknife uncertainties.
We can think of Fig 13 as mapping all topics into a two-dimensional media bias landscape. The figure reveals a clear separation of the topics into two clusters based on their media bias characteristics. A closer look at the membership of these two clusters suggests interpreting the x-axis as left-right bias and the y-axis as establishment bias. We therefore auto-assign each topic to one of the two clusters based on whether it falls closer to the x-axis or the y-axis (i.e., based on whether |E1k|>|E2k| or not, which in our case corresponds to the side of the dashed diagonal line on which the topic falls). We then sort the topics on a spectrum from most left-right-biased to most establishment-biased: the left-right topics are sorted by decreasing x-coordinate, followed by the establishment topics sorted by increasing y-coordinate. When ordered like this, the two topic clusters become visually evident even in the correlation matrix R upon which our clustering analysis was based: Fig 12 shows two clearly visible blocks of highly correlated topics, the left-right block in the upper left corner and the establishment block in the lower right.
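The cluster assignment and ordering can be sketched as follows. This assumes `xy` holds each component's (x, y) coordinates as plotted in Fig 13 and that `R` is the correlation matrix with rows in the same order; `order_topics` is a hypothetical name, and ties (|x| = |y|) are arbitrarily sent to the establishment cluster.

```python
import numpy as np

def order_topics(labels, xy, R):
    """Assign each component to a cluster, sort along the spectrum,
    and permute R so that the two cluster blocks become visible."""
    xy = np.asarray(xy, dtype=float)
    # Closer to the x-axis -> left-right cluster; otherwise establishment
    is_lr = np.abs(xy[:, 0]) > np.abs(xy[:, 1])
    # Left-right topics by decreasing x, then establishment topics by increasing y
    lr_idx = [i for i in np.argsort(-xy[:, 0]) if is_lr[i]]
    est_idx = [i for i in np.argsort(xy[:, 1]) if not is_lr[i]]
    perm = lr_idx + est_idx
    # Reorder both rows and columns of R with the same permutation
    R_sorted = np.asarray(R)[np.ix_(perm, perm)]
    return [labels[i] for i in perm], R_sorted
```

Plotting `R_sorted` as a heat map should then show the left-right block in the upper left and the establishment block in the lower right, as in Fig 12.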
Above, the newspapers were mapped onto a separate bias plane for each of many different topics. We normalize each such media plot (e.g., the left panel of Fig 1) such that the dots have zero mean and unit variance both horizontally and vertically. We then unify all these plots into the single media bias landscape plot in Fig 14 by taking weighted averages of these many topic plots, weighting both by topic relevance and by inverse variance. Specifically, for each topic bias component, we assign two relevance weights corresponding to the absolute values of its x- and y-coordinates in Fig 13, reflecting its relevance to left-right and establishment bias, respectively. These weights can be found in S1 Table in S1 Appendix. For example, to compute the x-coordinate of a newspaper in Fig 14, we simply take a weighted average of its generalized principal components for all topics, weighted both by the left-right relevance of that topic and by the inverse square of the error bar.
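A sketch of the normalization and weighted average described above, for one landscape axis. It assumes per-topic scores and error bars are arranged as (component × newspaper) arrays, that NaN marks a newspaper not covered by a topic, and that error bars are rescaled by the same factor as the scores during standardization; these are our assumptions rather than details stated in the text. `combine_topics` is a hypothetical name.

```python
import numpy as np

def combine_topics(scores, errors, relevance):
    """Weighted average of standardized per-topic scores for one axis.

    scores, errors: shape (n_components, n_newspapers); NaN = not covered.
    relevance: shape (n_components,), e.g. |x| coordinates from Fig 13
        for the left-right axis (assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    errors = np.asarray(errors, dtype=float)
    # Standardize each component across newspapers: zero mean, unit variance
    mean = np.nanmean(scores, axis=1, keepdims=True)
    std = np.nanstd(scores, axis=1, keepdims=True)
    z = (scores - mean) / std
    z_err = errors / std                   # error bars rescale with the scores
    # Weight = topic relevance times inverse variance of each point
    w = np.asarray(relevance, dtype=float)[:, None] / z_err**2
    w = np.where(np.isnan(z), 0.0, w)      # uncovered newspapers get no weight
    return np.nansum(w * z, axis=0) / np.sum(w, axis=0)
```

Passing the |x| coordinates as `relevance` would give the left-right coordinate of each newspaper; passing the |y| coordinates would give the establishment coordinate.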
Our method locates newspapers in this two-dimensional media bias landscape based only on how frequently they use certain discriminative phrases, with no human input regarding what constitutes bias. The colors and sizes of the dots were predetermined by external assessments and are thus in no way influenced by our data. The positions of the dots relative to these external labels thus suggest that the two dimensions can be interpreted as the traditional left-right bias axis and establishment bias, respectively.
Fig 14 can be viewed as the capstone plot for this paper, unifying information from all our topic-specific bias analyses. It reveals fairly good agreement with the external human-judgement-based bias classifications reflected by the colors and sizes of the dots: it shows a separation between bluish dots on the left and reddish ones on the right, as well as a separation between larger (pro-establishment) dots toward the top and smaller (establishment-critical) ones toward the bottom.
For a more quantitative comparison of our classification scores (which are arbitrary real numbers) with the external ones (which are quantized on a five-point scale from 1 to 5), we consider simple binary classification. As can be seen in Fig 14, the AllSides classification (corresponding to the dot colors) classifies 12/44 ≈ 27% of the newspapers as “right” or “lean right” (some shade of red in the figure). If we correspondingly define the rightmost 27% in our classification as “right or lean right”, then the agreement between our classification and AllSides is 91% (40 out of 44 newspapers). Fig 14 also shows that the external establishment classification (corresponding to the dot sizes) classifies 17/44 ≈ 39% of the newspapers as establishment-critical (the two smallest dot sizes). If we correspondingly define the lowermost 39% in our classification as establishment-critical, then the agreement between our classification and the external one is 95% (42 out of 44 newspapers).
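The binary comparison above can be sketched as follows: count how many newspapers the external rating places in the positive class, label the same number of top-scoring newspapers as positive in our classification, and compute the fraction of matching labels. The function and argument names are hypothetical.

```python
import numpy as np

def binary_agreement(our_scores, external_is_positive):
    """Fraction of newspapers on which a count-matched threshold on our
    scores agrees with an external binary label."""
    ours = np.asarray(our_scores, dtype=float)
    ext = np.asarray(external_is_positive, dtype=bool)
    n_pos = int(ext.sum())                 # e.g. 12 of 44 rated "right"/"lean right"
    top = np.argsort(-ours)[:n_pos]        # indices of the n_pos largest scores
    our_positive = np.zeros(len(ours), dtype=bool)
    our_positive[top] = True
    return np.mean(our_positive == ext)
```

For the establishment axis, one would pass the negated vertical coordinates so that the lowermost newspapers are labeled establishment-critical. Because both classifications label the same number of newspapers as positive, every false positive is matched by a false negative, so mismatches come in pairs: 40/44 corresponds to two swapped pairs and 42/44 to one.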
Closer inspection of Fig 14 also reveals some notable exceptions that deserve further scrutiny. As mentioned, “nutpicking” poses a challenge for our method. An obvious example is Jacobin Magazine, a self-proclaimed socialist newspaper that Fig 14 classifies as right-leaning: it uses the word “socialism” heavily and approvingly, while right-leaning media also use it frequently but mainly pejoratively, and our method counts only how often a phrase is used, not the connotation with which it is used. Nutpicking may also help explain why Fig 14 shows some more extreme newspapers closer to the center than more moderate ones (according to the human-judgement-based classification from AllSides). For example, AllSides rates Breitbart as further right than Fox, yet Breitbart uses the phrase “defund the police” more often than Fox does, presumably to criticize or mock it, and thus gets pulled to the left in Fig 14, toward left-leaning newspapers that use the phrase approvingly. One might expect nutpicking to be more common at the extremes of the political spectrum, in which case our method would push these newspapers toward the center. Fig 14 also shows examples where our method might be outperforming the human-judgement-based classification from AllSides. For example, AllSides labels Anti War as “right” while our method classifies it as left, in better agreement with its online mission statement.
Our analysis also offers more nuance than a single left-right bias score: for example, our preceding plots show that American Conservative is clearly right-leaning on social issues such as abortion and immigration, while clearly left-leaning on issues involving foreign intervention, averaging out to a rather neutral placement in Fig 14.
We have presented an automated method for measuring media bias. It first auto-discovers the phrases whose frequencies contain the most information about which newspapers published them, and then uses the observed frequencies of these phrases to map newspapers into a two-dimensional media bias landscape. We have analyzed roughly a million articles from about a hundred newspapers for bias in dozens of news topics, producing a data-driven bias classification in good agreement with prior classifications based on human judgement. One dimension can be interpreted as traditional left-right bias, the other as establishment bias. Compared with external human-generated bias ratings, our automated method shows 91% and 95% agreement on binary classification along the left-right and establishment bias axes, respectively. We also provide separate bias classifications for each news topic, revealing a more nuanced understanding of newspaper leanings.
Our method leaves much room for improvement, and we will now mention four examples. First, we saw how the popular practice of nutpicking can cause problems for our analysis, because the same phrase can be used with positive or negative connotations depending on context. This could be mitigated by excluding such bi-valent phrases from the analysis, either manually or with better machine learning.
Second, topic bias can challenge our method by separating newspapers according to their topic focus (say, business versus sports) in a way that obscures political bias. As described above, we attempted to minimize this problem by splitting overly broad topics into narrower ones, but this process should be improved and ideally automated.
Third, although our method is almost fully automated, a manual screening step remains whereby auto-selected phrases are discarded if they lack sufficient relevance, uniqueness or specificity. Although this involves only the selection of phrases (machine-learning features), not their interpretation, it is worth exploring whether this screening can be further (or completely) automated, ideally making our method completely free of manual steps and the associated potential for human error.
Fourth, although we have demonstrated that phrase counts contain a great deal of interesting information about media bias, they of course capture just a tiny fraction of the total information content. It will be interesting to explore whether sophisticated natural language processing architectures such as GPT-3 can greatly improve automated media bias detection.
We hope that our method will prove useful not only for researchers in media studies, but also in the journalism industry. For example, it can be used to automatically discover, tag and highlight valent synonyms and other loaded phrases. This can be used both internally, to encourage more neutral phrase usage, and externally, by fact-checking and media-monitoring sites to alert readers when they are being manipulated through phrasing. As another example, our automated method can be used by news aggregators and review-writers to help identify articles on both sides of a wide variety of issues, since the bias classification can be performed article-by-article for each separate topic rather than simply once-and-for-all for each newspaper. As automatic speech-to-text audio transcription gradually gets better, our method should become useful for radio and television news as well.
As datasets and analysis methods continue to improve, the quality of automated news bias classification should get ever better, enabling more level-headed scientific discussion of this important phenomenon, with conclusions based more on data than on human punditry. We therefore hope that automated news bias detection can help make media bias discussions less politicized than the media being discussed.
The authors wish to thank Rahul Bhargava, Meia Chita-Tegmark, Haimoshri Das, Emily Fan, Jamie Fu, Finnian Jacobson-Schulte, Dianbo Liu, Jianna Liu, Mindy Long, Hal Roberts, Khaled Shehada, Arun Wongprommoon, and Ethan Zuckerman, for helpful comments and support during the launch phase of this project.
- 1. Wilson AE, Parker VA, Feinberg M. Polarization in the contemporary political and media landscape. Current Opinion in Behavioral Sciences. 2020;34:223–228.
- 2. McCoy J, Rahman T, Somer M. Polarization and the global crisis of democracy: Common patterns, dynamics, and pernicious consequences for democratic polities. American Behavioral Scientist. 2018;62(1):16–42.
- 3. Pariser E. The filter bubble: How the new personalized web is changing what we read and how we think. Penguin; 2011.
- 4. Shultziner D, Stukalin Y. Distorting the News? The Mechanisms of Partisan Media Bias and Its Effects on News Production. Political Behavior. 2021;43(1):201–222.
- 5. Brandtzaeg PB, Følstad A. Trust and distrust in online fact-checking services. Communications of the ACM. 2017;60(9):65–71.
- 6. Groseclose T, Milyo J. A social-science perspective on media bias. Critical Review. 2005;17(3-4):305–314.
- 7. McCarthy J, Titarenko L, McPhail C, Rafail P, Augustyn B. Assessing Stability in the Patterns of Selection Bias in Newspaper Coverage of Protest During the Transition from Communism in Belarus*. Mobilization: An International Quarterly. 2008;13(2):127–146.
- 8. Papacharissi Z, de Fatima Oliveira M. News Frames Terrorism: A Comparative Analysis of Frames Employed in Terrorism Coverage in U.S. and U.K. Newspapers. The International Journal of Press/Politics. 2008;13(1):52–74.
- 9. Hamborg F. Media Bias, the Social Sciences, and NLP: Automating Frame Analyses to Identify Bias by Word Choice and Labeling. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop. Online: Association for Computational Linguistics; 2020. p. 79–87. Available from: https://www.aclweb.org/anthology/2020.acl-srw.12.
- 10. Spinde T, Hamborg F, Gipp B. Media Bias in German News Articles: A Combined Approach; 2020. p. 581–590.
- 11. Wevers M. Using Word Embeddings to Examine Gender Bias in Dutch Newspapers, 1950-1990. arXiv:1907.08922 [cs, stat]. 2019.
- 12. Fu L, Danescu-Niculescu-Mizil C, Lee L. Tie-breaker: Using language models to quantify gender bias in sports journalism. arXiv:1607.03895 [physics]. 2016.
- 13. Schuldt JP, Konrath SH, Schwarz N. “Global warming” or “climate change”?: Whether the planet is warming depends on question wording. Public Opinion Quarterly. 2011;75(1):115–124.
- 14. Liu A, Srikanth M, Adams-Cohen N, Alvarez RM, Anandkumar A. Finding social media trolls: Dynamic keyword selection methods for rapidly-evolving online debates. arXiv preprint arXiv:1911.05332. 2019.
- 15. Gentzkow M, Shapiro J. What Drives Media Slant? Evidence From U.S. Daily Newspapers. Econometrica. 2010;78(1):35–71.
- 16. Hamborg F, Zhukova A, Gipp B. Automated Identification of Media Bias by Word Choice and Labeling in News Articles. In: 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL); 2019. p. 196–205.
- 17. Eckart C, Young G. The approximation of one matrix by another of lower rank. Psychometrika. 1936;1(3):211–218.
- 18. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods. 2020.
- 19. Ou-Yang L. Newspaper3k: Article scraping & curation — newspaper 0.0.2 documentation;. Available from: https://newspaper.readthedocs.io/en/latest/.
- 20. Wongprommoon A. mit-news-classify: A news classification tool developed for Improve the News, a project by Max Tegmark. Available from: https://www.improvethenews.org/.
- 21. AllSides Media Bias Ratings. Available from: https://www.allsides.com/media-bias/media-bias-ratings.
- 22. Improve the news—Even if media don’t affect what you think, they affect what you think about. What do you want to think about? Available from: https://www.improvethenews.org/.
- 23. The Media Navigator—Swiss Policy Research. Available from: https://swprs.org/media-navigator/.
- 24. How George Floyd Was Killed in Police Custody—The New York Times. Available from: https://www.nytimes.com/2020/05/31/us/george-floyd-investigation.html.
- 25. George Floyd’s brother set to testify as Capitol Hill lawmakers consider police reform proposals | Fox News. Available from: https://www.foxnews.com/politics/george-floyds-brother-set-to-testify-as-capitol-hill-lawmakers-consider-police-reform-proposals.
- 26. Jackson J. Police Reform Was Never Going to be Easy, But Now’s the Time; 2020. Available from: https://www.counterpunch.org/2020/06/11/police-reform-was-never-going-to-be-easy-but-nows-the-time/.
- 27. Moran R. ‘Bloody Sunday’: 18 Murders in Chicago in 24 Hours as Calls to ‘Defund the Police’ Escalate. Available from: https://pjmedia.com/news-and-politics/rick-moran/2020/06/09/bloody-sunday18-murders-in-chicago-in-24-hours-as-calls-to-defund-the-police-escalate-n509665.
- 28. Gaster J. Defiant Resistance: The Venezuelan Crises and the Possibility of Another World. The Bullet; 2019. Available from: https://socialistproject.ca/2019/04/defiant-resistance-the-venezuelan-crises-and-the-possibility-of-another-world/.
- 29. Charles J. Selling Capitalism More Effective Than Demonizing Socialism. Available from: https://redstate.com/jeffc/2019/09/28/selling-capitalism-effective-demonizing-socialism-n116351.
- 30. Herman ES, Chomsky N. Manufacturing consent: The political economy of the mass media. Random House; 2010.
- 31. Ng AY, Jordan MI, Weiss Y. On spectral clustering: Analysis and an algorithm. In: Advances in neural information processing systems; 2002. p. 849–856.
- 32. About Us. Available from: https://jacobinmag.com/about.
- 33. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165. 2020.