The reach of commercially motivated junk news on Facebook

Commercially motivated junk news–i.e. money-driven, highly shareable clickbait with low journalistic production standards–constitutes a vast and largely unexplored news media ecosystem. Using publicly available Facebook data, we compared the reach of junk news on Facebook pages in the Netherlands to the reach of Dutch mainstream news on Facebook. During the period 2013–2017 the total number of user interactions with junk news significantly exceeded that with mainstream news. Over 5 Million of the 10 Million Dutch Facebook users have interacted with a junk news post at least once. Junk news Facebook pages also had a significantly stronger increase in the number of user interactions over time than mainstream news. Since the beginning of 2016 the average number of user interactions per junk news post has consistently exceeded the average number of user interactions per mainstream news post.


Introduction
Social media and Facebook in particular have become a major gateway to news. Large numbers of people access news through social media, as shown by survey data from the Reuters Digital News Report [1]. In the US, 45% of respondents used social media for news consumption on a weekly basis, with Facebook being the leading source. Unfortunately, not all news spread by social media consists of high-quality, well-edited content. An important factor in the quality of Facebook's news feed is the widely discussed presence of 'fake news' and clickbait on the platform. Over the last years there has been an alleged rise in low-quality, and even completely fabricated news on social media [2].
The rise of so-called 'fake news' has gained much attention in academia, government and the media in the last few years. In this paper we refrain from using the term 'fake news' as an analytical concept, since it is imprecise and heavily politicized, encompassing connotations such as deceitful, false, and slanted [3][4][5][6]. Instead, we use the term junk news, focusing on a combination of content characteristics, production values, and types of producers. We study the money-driven, low-quality, highly shareable kind of content that is typically distributed on social media as clickbait. This genre frequently includes, but is not limited to, disinformation, i.e. completely fabricated or severely distorted information presented in news formats. Since the 2016 US elections, interest in the 'information disorder' [5] has boomed. Academic research has focused on the nature of the problem, its impact on the audience, and ways to counter it (e.g. [7][8][9]). Journalists have identified individuals and organisations spreading disinformation for ideological or commercial reasons [10]. Government bodies, think tanks, and social network platforms (Facebook, Twitter) have produced reports about covert foreign influence operations [11,12]. Within this 'junk news universe', research focuses predominantly on political content.
However, quantitative studies about the reach of junk news and disinformation are scarce. An extensive 2018 review of the literature on disinformation and social media [13] highlights the prevalence of various kinds of disinformation as a research gap. In addition, it notes an over-emphasis on Twitter and a lack of studies using Facebook data, and mentions restrictions imposed by the social media platforms.
The present study investigates the reach of junk news on Facebook, taking the Netherlands as a case. The Dutch Facebook network is extensive: there are approximately 10.5 Million Facebook users in the Netherlands [14], out of a total population of 17 Million. We compared the reach and development of commercially motivated Dutch junk news on Facebook to the reach and development of Dutch mainstream news on Facebook. For the purpose of this study we define mainstream news as well-edited content, published by established news media. We collected 117 thousand Facebook posts published by 63 junk news pages and 20 mainstream news pages over a five-year period. With these data, we study the reach of junk news and mainstream news by measuring publication activity and user engagement. Publication activity is defined by the number of posts published by a Facebook page. User engagement is defined by the number of user interactions with the published posts.
Given the alleged rise of junk news and in light of Facebook measures to improve news feed quality, the objective of this study was (1) to assess the total reach of junk news on Facebook, compared to mainstream news, in terms of user engagement; and (2) to investigate how junk news develops over time, in terms of publication activity of the junk news producers' Facebook pages, and of user engagement with the published posts.

Junk news defined
Scholars and journalists use various terms when they discuss news that is in some respects deceitful and/or unreliable. Our study concerns junk news. The bulk of the production of the junk news pages that we include consists of low-quality, sensational content. They frequently publish fabricated (i.e. completely fake) news, but as Venturini [15] argues, diffusion is the purpose, not falsity: '[. . .] spread, rather than fakeness, is the birthmark of these contents that should be called "viral news" or possibly "junk news" for, just as junk food, they are consumed because they are addictive, not because they are appreciated.' (p.3 in [15]). 'Junk news' is also employed as a term by Oxford University's project on Computational Propaganda [16], covering a wide range of news sources. These are rated as 'junk news' if they tick at least three of the following five boxes: lack of professionalism (low to non-existent journalistic standards); sensationalist style (in-your-face visuals and headlines, strong emotional appeal); low credibility (low-quality sources, no fact-checking, false information, conspiracy theories); bias (hyper-partisan reporting); and forgery (outlets imitate both news formats and specific news brands, to pass off their fakes as genuine) [17]. We adopt the term 'junk news' from Narayanan et al. [17], but adapt the definition in order to make it applicable to Dutch commercial junk news on Facebook. Studying social media use during elections, the Computational Propaganda (ComProp) papers focus on politically themed and motivated social media messages. The characteristics of these messages are partly similar to those of the Dutch source material we analysed, but the ComProp material differs in its emphasis on ideology and falsehood. First, the outlets we tracked are not ideology-driven, but purely commercial. In addition, though falsehoods are frequent, they only occur in a minority of items; moreover, they mostly appear to be the consequence of low production standards rather than instances of intentional deception. Finally, none of the web sites in our sample deceptively imitates a respectable news brand, so the category 'forgery' does not apply.
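The ComProp rating rule quoted above (a source counts as 'junk news' if it ticks at least three of five boxes) can be written down as a small check. The following Python sketch is only an illustration of that rule: the class name, field names and the example rating are hypothetical and do not come from the ComProp project or from our own coding procedure.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class SourceRating:
    """Hypothetical record of the five ComProp criteria for one news source."""
    lacks_professionalism: bool
    sensationalist_style: bool
    low_credibility: bool
    bias: bool
    forgery: bool


def is_junk_news(rating: SourceRating, threshold: int = 3) -> bool:
    """Return True if at least `threshold` of the five boxes are ticked."""
    ticked = sum(astuple(rating))  # True counts as 1, False as 0
    return ticked >= threshold


# Invented example: a source that ticks three boxes but shows
# neither partisan bias nor forgery.
example = SourceRating(True, True, True, False, False)
print(is_junk_news(example))
```

Note that this encodes the ComProp threshold only; the working definition used in the present study is a list of characteristics without a numeric cut-off.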
This leads us to the following characteristics that constitute our working definition of commercial junk news (henceforward: junk news):
• low journalistic quality (pre-packaged content, no added research and fact-checking);
• produced by non-mainstream producers;
• business model based on websites with advertising and Facebook pages pushing the sites' posts;
• goal is viral success;
• frequently contains fabricated or heavily distorted messages;
• frequent use of clickbait headlines.
The commercial incentive of the pages implies that pages that are ideologically motivated are not covered by our definition.

Related work
Most studies in this field have been conducted from a political disinformation perspective (e.g. [7,18,19]). Few address the reach of money-driven junk news. Moreover, most of the available data on commercial junk news have been published by investigative journalists. In a seminal exposé, Buzzfeed editor Silverman [20] showed that during the 2016 US elections, the most popular fake stories about politics outperformed the most popular mainstream news stories.
Le Monde's fact-check and data journalism team Les Décodeurs analysed 101 false claims, spread by 1,001 web pages and videos. These claims were not all about politics: notably, health-related stories were among the most popular. Links to these pages and videos generated 4.3 Million Facebook shares and some 16 Million interactions (i.e. shares, comments, and likes). Three quarters of the false stories elicited more than 10,000 interactions each [21].
Using a sample of fact-checked news items, Vosoughi et al. [22] found that news items labelled by fact-checkers as untrue travelled faster than those labelled as true. Their sample covers a minute part of the junk news universe: only those items that have been evaluated by professional fact-checkers. In contrast we study the entire output of one nation's commercial junk news producers.
Studies addressing the reach of political fake news find that this reach seems to be overestimated. Combining survey responses with web tracking data, Guess et al. [23] estimate that in the weeks before and after the 2016 US presidential election 1 in 4 Americans visited a fake news site, but that most fake news was consumed by a small group of conservatives. Studying the fake news audience in the US, Nelson and Taneja [24] similarly conclude that this is a small subset of the heaviest Internet users. Political fake news is, essentially, niche content. In contrast, the category of money-driven junk news we study aims for the largest possible audience.
The most similar to our work is the study by Fletcher et al. [25]. Assessing the reach of both ideologically and commercially motivated fake news in France and Italy, they downplay the problem's size, pointing out that most sites in their sample reached less than 1% of the online population in each country. 'By comparison, the most popular news websites in France (Le Figaro) and Italy (La Repubblica) had an average monthly reach of 22.3% and 50.9%, respectively.' [25]. Some false news outlets, though, proved exceptionally successful: 'In France, one false news outlet generated an average of over 11 million interactions per month-five times greater than more established news brands.' [25].
This generally low level of measured engagement may be due to the study design. Fletcher et al. focussed on 'outlets that consistently and deliberately publish "false news", which we have defined elsewhere as "for-profit fabrication, politically-motivated fabrication [and] malicious hoaxes" designed to masquerade as news' [3]. This means that they excluded sites that publish general low-quality news, including the occasional fabricated item. Moreover, one of the Italian blacklists they used was, according to its editor, incomplete and outdated [26]. Finally, employing time spent on site as a metric skews results, since users often read no more than the headline, or the abstract offered by Facebook, before hitting the share button. This latter issue was also pointed out by Coltelli [26].
Studying data that cover one year (2017), Fletcher et al. [25] did not study the longitudinal development of fake news. Covering a larger time span (Jan. 2015-July 2018), Allcott et al. [27] measure the volume of Facebook users' engagements with sites known to spread false stories and compare this to developments in the reach of mainstream news sites and business and culture sites. After an initial rise in fake news engagement, this declined sharply from the beginning of 2017 onwards. During the same period, engagement numbers for the other categories they sampled remained more or less stable. The declining reach of fake news could be the result of Facebook actions against bad actors after the 2016 US elections.
Most studies on disinformation focus on the US. The Netherlands differs in a number of respects. Actors specializing in commercially inspired political disinformation (of the kind peddled by the notorious Macedonian fake news producers targeting Trump supporters in 2016) do not exist in the Netherlands. Neither do outlets that serve nothing but fabricated news. In their capacity of third-party fact-checkers for Facebook, two of the authors reviewed hundreds of web links submitted by Dutch Facebook users as potentially 'fake news'. Although absence of evidence does not constitute evidence of absence, we can safely assume that 'Macedonian' sites or sites that only publish manufactured news stories, had they existed, would have come to our attention. Moreover, completely fake news sites feature neither in the (limited) academic literature dealing with disinformation in the Netherlands [28,29] nor in think tank reports [30], and they have not been detected by Dutch investigative journalists dealing with this topic [31].
In summary, the present study distinguishes itself from prior work in three respects: (a) it addresses commercial junk news as opposed to political junk news; (b) it addresses the phenomenon's reach on Facebook as opposed to Twitter; (c) it follows a data-driven approach, including over 117 thousand published Facebook posts and the user interactions associated with these posts.

Data and methods
We compiled two seed lists of sites that we included in our sample: one of junk news sites and one of mainstream news sites.

Criteria for data sampling
The criteria for including a website in the list of junk news sites were directly deduced from the definition of junk news provided above. The sites were initially brought to the attention of two of the authors in their capacity as third-party fact-checkers for Facebook. In 2017, we fact-checked more than 70 claims submitted by Facebook users; the reports were published on Nieuwscheckers.nl. Excluding the claims published on conspiracy sites (which are at least in part ideologically motivated) and on alternative health sites (which adopt a different business model: some of these also make money by selling health products) left us with some 50 claims originating from commercially driven junk news sites that do not focus on one single topic. Most news items were published on multiple sites. We found that all items we checked were lacking veracity and originality: they consisted of material that was lifted from other websites and reproduced without additional research. In many cases, the sites published manufactured stories copied from foreign sources (e.g., 'Oprah Winfrey (63) zwanger van eerste kind', i.e. 'Oprah Winfrey (63) pregnant with first child'). In a few exceptional cases, the stories were invented by the site's editors (e.g., a story about a Muslim girl from the Dutch town of Deventer who received death threats from fundamentalist Muslims because she performed as a singer).
By searching for other sites that had published the same news items, and by using domain information in order to identify other sites registered by the same producers, we were able to collect more junk news sites. Since many of these sites do not publish the names of their owners, we used open source information (e.g., matching Google Adsense ID numbers and public Chamber of Commerce records) to expand our list of junk news sites. We deduce the fact that the producers are not ideologically motivated from the relative absence of political content on their sites and from their personal social media use, which is also lacking political messages. We contacted seven owners and editors involved in this business, but without exception they declined to be interviewed.
In the list of mainstream news sites we included national, well-known, general news media that have their own Facebook page. In the Dutch media landscape the set of established news media is relatively small and well-defined, consisting of national newspapers, news magazines, and news broadcasts. We only included websites that predominantly publish original, well-edited content.

Data download and processing
For each domain in our junk news seed list we identified the corresponding Facebook page by crawling the site's homepage using Selenium [32] and Python, extracting the link to Facebook. For the mainstream news sites we manually identified the corresponding Facebook page. The resulting lists consist of 20 mainstream news pages and 63 junk news pages, both shown in S1 Appendix.
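The link-extraction step can be illustrated with a short Python sketch. It is not the crawler used in this study: it swaps Selenium for the standard-library HTML parser and works on a static HTML string, and the class name, function name and sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FacebookLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points to a facebook.com host."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        host = urlparse(href).netloc.lower()
        if host == "facebook.com" or host.endswith(".facebook.com"):
            self.links.append(href)


def find_facebook_page(html):
    """Return the first Facebook link found in the HTML, or None."""
    parser = FacebookLinkParser()
    parser.feed(html)
    return parser.links[0] if parser.links else None


# Invented homepage fragment for illustration only.
sample_html = (
    '<html><body><a href="/about">About</a>'
    '<a href="https://www.facebook.com/examplepage/">Follow us</a>'
    "</body></html>"
)
print(find_facebook_page(sample_html))
```

A sketch like this would also pick up share buttons that link to facebook.com, so in practice the extracted link still needs a manual check that it is the site's own page.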
We used the Facebook API [33] (version 2.8 accessed in the fall of 2017 for obtaining the junk news data, and version 3.0 accessed in the spring of 2018 for obtaining the mainstream news data) to download all posts published by these Facebook pages up until December 2017. The API did not return any junk news data before January 2013, most likely because the pages contained in the junk seed list were not yet active at that time. We sampled the same period for mainstream news to make both sets comparable. Thus, our sample contains all posts published by the 63 junk news pages between January 2013 and December 2017, and all posts published by the 20 mainstream news pages in that same period.
In December 2017, the Facebook API allowed us to get the unique identifiers of the users who posted a reaction or comment. Using these unique identifiers we were able to determine the number of people who interacted with a junk news post at least once. From February 6, 2018 it was no longer possible to retrieve information about user ids [34]. As a result, this information is missing for the mainstream news data, which we downloaded in the spring of 2018.
Table 1 summarizes the total size of the collected data sample. Fig 1 shows the number of posts published by the individual pages, for junk news and mainstream news. The table shows that the 20 mainstream news pages have altogether published almost the same number of posts as the 63 junk news pages in the same time period. This is further illustrated by Fig 1: each of the mainstream news pages has published more than 2,000 posts in the five-year time period. Seven junk news pages have published more than 2,000 posts as well, but the large majority of the junk news pages were much less active than the mainstream news pages.
Table 1. Total size of the collected data sample.
Reactions, comments and shares are three types of user interactions with posts on Facebook. A 'reaction' is what is commonly referred to as a 'like', which can have the form of a thumbs-up, a heart, a crying emoticon, a shocked emoticon, or an angry emoticon. Together, reactions, comments and shares constitute the user engagement. We were unable to assess reach in terms of page views and clicks, as these are not publicly available. The publication date is needed for the longitudinal analysis of the publication activity and engagement with Facebook pages.
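Counting the number of people who interacted with a junk news post at least once amounts to deduplicating user identifiers across all reactions and comments. The Python sketch below illustrates this under assumed inputs: the record layout (post id, user id, interaction type) and all identifiers are invented and do not reflect the actual structure of the downloaded data.

```python
def count_unique_users(interactions):
    """Count distinct user ids across all interaction records.

    `interactions` is an iterable of (post_id, user_id, kind) tuples,
    where kind is "reaction" or "comment" (hypothetical layout).
    """
    return len({user_id for _post_id, user_id, _kind in interactions})


# Invented records: user "u1" interacts with two posts and is counted once.
records = [
    ("post_1", "u1", "reaction"),
    ("post_1", "u2", "comment"),
    ("post_2", "u1", "comment"),
    ("post_3", "u3", "reaction"),
]
print(count_unique_users(records))
```

Shares are absent from this sketch because, as described above, user identifiers were available only for reactions and comments.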
We used R for the quantitative analysis of the collected data. We generated two types of statistics: statistics of the publication activity of the Facebook pages in our sample (number of posts published per month), and statistics of the user engagement with the published posts: numbers of reactions, comments and shares.
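The two types of statistics can be illustrated with a small aggregation sketch. The analysis itself was done in R; the Python code below is a swapped-in illustration, and the post records, field names and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Invented post records; the field names are assumptions for this sketch.
posts = [
    {"page": "page_a", "published": date(2017, 1, 5),
     "reactions": 10, "comments": 2, "shares": 1},
    {"page": "page_a", "published": date(2017, 1, 20),
     "reactions": 30, "comments": 4, "shares": 3},
    {"page": "page_b", "published": date(2017, 2, 1),
     "reactions": 20, "comments": 0, "shares": 2},
]


def posts_per_page_per_month(posts):
    """Publication activity: number of posts per (page, year, month)."""
    counts = defaultdict(int)
    for post in posts:
        key = (post["page"], post["published"].year, post["published"].month)
        counts[key] += 1
    return dict(counts)


def mean_interactions_per_post(posts):
    """User engagement: mean reactions, comments and shares per post."""
    n = len(posts)
    return {
        metric: sum(post[metric] for post in posts) / n
        for metric in ("reactions", "comments", "shares")
    }


print(posts_per_page_per_month(posts))
print(mean_interactions_per_post(posts))
```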

The reach of junk news and mainstream news on Facebook
In this section we address our first objective: to assess the total reach of junk news on Facebook, compared to mainstream news, in terms of user engagement. Table 2 lists the numbers of interactions per post over the complete five-year period, for junk news and mainstream news. An independent-samples t-test was conducted to compare the number of reactions, comments and shares on junk news and mainstream news. There was a significant difference between junk news and mainstream news for reactions, comments, and shares (p < 0.0001 for all three comparisons). Thus, junk news on Facebook has received significantly more user interactions than mainstream news.
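For readers unfamiliar with the test, the following Python sketch computes an independent-samples t statistic. It assumes the unequal-variance (Welch) form, which is the default of R's t.test; the text does not state which variant was used, and the sample values are invented. The p-value is omitted because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test on two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented interaction counts for two small groups of posts.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```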

The development of junk news and mainstream news over time
In this section, we address our second objective: to investigate how junk news develops over time, in terms of publication activity of the junk news producers' Facebook pages, and of user engagement with the published posts.
Publication activity over time. Fig 3 shows the publication activity over time. The average number of published posts per page per month is 50 for mainstream news (stdev = 6) and 53 for junk news (stdev = 21). A Mann-Whitney-Wilcoxon test indicates that the distributions in the two groups do not differ significantly (n1 = n2 = 60, p = 0.76); thus the average publication activity per page per month is comparable between junk news pages and mainstream news pages. However, the post activity for junk news on Facebook is more irregular, with a much larger standard deviation, than the post activity for mainstream news. In addition, Fig 3B shows [. . .]
Looking at the number of user interactions over time, we see that the lines for junk news and mainstream news have different peaks. We quantitatively analysed the development of the user engagement by computing a linear least squares regression line (line of best fit) for each graph. We found that the user engagement with both types of news is growing over time, but the engagement with junk news grows faster: the slope of the trend line for reactions on junk news posts is 7.55 compared to 3.99 for mainstream news. For comments the slopes are 4.05 for junk news and 1.94 for mainstream news. For shares, the slopes are 2.40 and 0.18 respectively. An independent-samples t-test was conducted to compare the slopes of the regression lines for the change in numbers of interactions over time. There was a significant difference between junk news and mainstream news for reactions, comments, and shares. The difference between the increase of reactions on junk news (b = 7.55, s.e. = 167.4) and the increase of reactions on mainstream news (b = 3.99, s.e. = 155.2) was significant with t(116) = 2.1, p = 0.039. The difference between the increase of comments on junk news (b = 4.05, s.e. = 39.6) and the increase of comments on mainstream news (b = 1.94, s.e. = 21.5) was highly significant with t(116) = 5.5, p < 0.0001. The difference between the increase of shares of junk news (b = 2.40, s.e. = 77.9) and the increase of shares of mainstream news (b = 0.18, s.e. = 50.5) was highly significant with t(116) = 3.2, p = 0.0017. Thus, the posts published by junk news pages increasingly receive more user interactions than mainstream news. However, there is one caveat to this analysis: the numbers of reactions, comments and shares for junk news pages have decreased since the summer of 2017. This is striking because Fig 3 indicates that the junk news pages have become increasingly active in publishing posts in the same period, with a steep growth since September 2017.
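The trend comparison can be sketched as follows. The Python code fits a least squares line to a monthly series and compares two slopes with one common form of the test, t = (b1 - b2) / sqrt(se1^2 + se2^2) with n1 + n2 - 4 degrees of freedom; with two series of 60 months this gives the 116 degrees of freedom reported above. This form is an assumption, since the exact procedure is not specified in the text, and the sketch uses invented series and does not attempt to reproduce the reported values.

```python
from math import sqrt


def fit_line(y):
    """Least squares fit of y against month index 0..n-1.

    Returns (slope, standard error of the slope).
    """
    n = len(y)
    mx = (n - 1) / 2
    my = sum(y) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (yi - my) for x, yi in enumerate(y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * x)) ** 2 for x, yi in enumerate(y))
    se_slope = sqrt(rss / (n - 2) / sxx)
    return slope, se_slope


def compare_slopes(y1, y2):
    """Return (t, df) for the difference between two regression slopes."""
    b1, se1 = fit_line(y1)
    b2, se2 = fit_line(y2)
    t = (b1 - b2) / sqrt(se1 ** 2 + se2 ** 2)
    df = len(y1) + len(y2) - 4  # two parameters estimated per line
    return t, df


# Invented monthly series of 60 points each, with some deterministic noise.
series_a = [3 * i + (1 if i % 2 else -1) for i in range(60)]
series_b = [1 * i + (1 if i % 3 else -2) for i in range(60)]
t_stat, dof = compare_slopes(series_a, series_b)
print(dof)
```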

Discussion
Quantitative research has mostly overlooked the phenomenon of money-driven junk news, focusing on junk news and fake news characterized by political content and ideological motivation. Whereas the audience for political fake news is relatively small, consisting of politically polarized, heavy media users [23,24], commercial junk news appears to reach the broad audience it aims for. We have shown that commercial junk news receives significantly more user interactions (reactions, comments and shares) than mainstream news on Facebook. Hiding in plain sight, this category does not strive for brand recognition or loyalty. We have demonstrated that the reach of this kind of news warrants academic attention.
In fact, the figures we present likely underestimate the reach of junk news distributed by Facebook in the Netherlands, because we estimated reach by the number of interactions with a post. The reach of content, however, can be larger than the number of interactions: the number of Facebook users who consumed at least part of the story is probably higher than the number of people who interacted with the post [35]. The data for shares, reactions, and comments are the most robust indication of the reach of junk news among Dutch Facebook users, but the number of users who must have at least scanned the headline is most likely even larger. Similarly, the number of individual users reached by the pages we sampled must be higher than the 5.3 Million users who added a reaction or comment to at least one of the posts.
During the period covered by our data (January 2013 until December 2017) Facebook's popularity in the Netherlands grew slightly, from 9.6 million users in 2014 to 10.4 million users in 2017 [14]. A similar development could be expected for its popularity as a medium for spreading content and in user engagement with the news pages. However, the user engagement with those pages shows a sharper increase than the overall Facebook popularity in the same time period. Moreover, the increase of interactions with junk news is even significantly stronger than the increase of interactions with mainstream news.

Comparison to related studies
Two recent studies that attempt to compare the reach of mainstream news versus fake news or junk news on Facebook present findings that are less dramatic than ours. Assessing the reach of fake news in France and Italy (including money-driven fake news), Fletcher et al. [25] state that most sites in their sample reached less than 1% of the online population. A French outlier however generated an average of over 11 million interactions per month, outperforming more established news brands.
Our results deviate from Fletcher et al.'s finding that ' [. . .] in most cases, in both France and Italy, false news outlets do not generate as many interactions as established news brands.' [25]. Our findings and our interpretation are less optimistic than those by Fletcher et al. on 'fake' news in Italy and France: their findings are restricted to sites that predominantly publish completely fabricated items and their use of time spent on site as a metric neglects the fact that many users will not read beyond the headline.
Our findings also differ from those of Allcott et al. [27], who compared the Facebook reach of sites known for spreading false stories with that of other news, business or culture sites. The decrease of false stories they note since early 2017 is only partly reflected in the Dutch junk news data: Our data show that junk news pages have become increasingly active in publishing posts in the second half of 2017, with a steep growth since September 2017. However, we have also observed that the numbers of reactions, comments and shares for junk news pages have decreased since the summer of 2017. We speculate that there might be a relation with Facebook's efforts to reduce the visibility of junk news on the platform (listed in the Appendix, Table 1 of Allcott et al., 2018 [27]). In May 2017, Facebook announced that 'misinformation, sensationalism, clickbait and posts that fall outside of [their] Community Standards' will be demoted [36].
Facebook data provided by the platform itself could possibly clarify this matter, but the lack of transparency about its algorithms and about the effectiveness of its actions against bad actors is a recurring obstacle for researchers in this field. As government pressure on the platforms increases [37], this may change in the future. In fact, in April 2018 Facebook and Social Science One announced a partnership in which the tech company shares data with social scientists studying fact-checking and misinformation on the platform [38,39].
Dutch junk news Facebook pages frequently promote fake, i.e. fabricated, stories [40]. Although these stories are not representative of the output as a whole, they can reach a sizeable audience. This is worrying, since some of these stories contain misleading health advice or false information about social groups. A completely bogus story about animal abuse by asylum seekers was published on 13 different websites [41]. Using Netvizz [42], we found that between its first publication on 17 March 2017 and 13 May 2017, the story was shared 55,292 times.
However, focusing on fabricated stories with possible social and political consequences obscures the bigger point about junk news: thriving on the core components of social media use, this highly spreadable, low-quality category of news threatens to drown out better-quality news [15].

Conclusions
We studied the reach of commercial junk news on Facebook, by analysing 117 thousand posts published by 63 junk news pages and 20 mainstream news pages in the Netherlands.
In our five-year sample, there is significantly larger user engagement with junk news items than with mainstream news items, for each of the three interaction metrics (reactions, comments, and shares). In terms of the number of different people reached, junk news is widespread on Facebook: 5.3 Million individual Facebook users commented on or reacted to a junk news post at least once. Out of a total of 10 Million Facebook users in the Netherlands, this is an impressive volume of engagement.
Junk news pages have been increasingly successful in attracting user engagement over the five-year time period 2013-2017, and the increase is significantly stronger than for mainstream news. From the beginning of 2016 junk news has consistently attracted more user interactions per post than mainstream news.
In conclusion, junk news pages are more successful than mainstream news in generating user engagement with posts. This user engagement feeds the business success for commercial junk news outlets on social media.