An investigation of social media labeling decisions preceding the 2020 U.S. election

Since it is difficult to determine whether social media content moderators have assessed particular content, it is hard to evaluate the consistency of their decisions within platforms. We study a dataset of 1,035 posts on Facebook and Twitter to investigate this question. The posts in our sample made 78 misleading claims related to the 2020 U.S. presidential election. These posts were identified by the Election Integrity Partnership, a coalition of civil society groups, and sent to the relevant platforms, where employees confirmed receipt. The platforms labeled some (but not all) of these posts as misleading. For 69% of the misleading claims, Facebook labeled the posts that included a given claim consistently, either always or never adding a label; it labeled the remaining 31% of misleading claims inconsistently. The findings for Twitter are nearly identical: 70% of the claims were labeled consistently and 30% inconsistently. We investigated these inconsistencies and found that, based on publicly available information, most of the platforms' decisions were arbitrary. However, in about a third of the cases we found plausible reasons that could explain the inconsistent labeling, although these reasons may not be aligned with the platforms' stated policies. Our strongest finding is that Twitter was more likely to label posts from verified users, and less likely to label identical content from non-verified users. This study demonstrates how academic–industry collaborations can provide insights into typically opaque content moderation practices.


B. Categories: In-scope narratives
The Election Integrity Partnership flagged the following types of narratives:
• Procedural interference: Misleading content related to actual election procedures, e.g., the mechanics of voting (time, place, and manner of voting).
• Participation interference: Content designed to deter participation in the election process, e.g., threats to personal safety.
• Fraud: Content that encouraged people to misrepresent themselves to affect the electoral process or illegally cast or destroy ballots.
• Delegitimization: Content that aimed to delegitimize election results on the basis of false or misleading claims.
• Premature claims of victory: Content that called the results of an election before they were officially called by an authoritative source.
• Calls to violent action: Specific calls that could lead to violence, specific claims of violence from the "other side" that are not true or use misleading evidence, or hints of violence if the poster does not get their way.

C. Facebook and Twitter's labels
Figures 1 and 2 show some of the labels the platforms applied ahead of the 2020 election.

D. Script for assessing platform action
While we manually evaluated the status of all posts in the dataset for this paper, we also wrote a script to automate the process of loading and assessing posts in our dataset. We report on this process here as it was highly accurate and may be useful for future researchers. We first ran this script on December 7, 2020 to provide real-time findings for the Partnership's final report. However, at the time we were not able to manually double-check the script's output. For this paper we rely on the dataset created as a result of our manual inspection on June 4, 2021. The disadvantage of this decision is that our dataset may not reflect enforcement measures that were taken in the run-up to the election or soon afterwards; it may instead reflect enforcement that took place weeks later. However, we prefer to use the more accurate dataset, and believe it is unlikely that platforms labeled or removed much content weeks after the election.
The script scraped every single post in the dataset and checked whether:
1. the content was on a relevant platform (Facebook, TikTok, Instagram, Twitter, or YouTube),
2. the content was labeled, or
3. the content was removed.
The script worked by launching a "headless" instance of Chrome (version 91) that visited each post and relayed information about the page back to the script, which was then used to determine how the content was actioned. The script processed what a U.S.-based visitor would encounter when visiting the posts. To bypass "login walls" on Facebook and Instagram, our script authenticated with those platforms using a real user account.
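The core of this approach can be reproduced with standard browser-automation tooling. Below is a minimal sketch, assuming Python and Selenium with ChromeDriver; it is an illustration rather than the authors' actual script, and the cookie file used for authentication is a hypothetical stand-in for however a logged-in session is persisted.

```python
# Minimal sketch (not the authors' script): launch headless Chrome and
# reuse a logged-in session's cookies to bypass login walls.
import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")             # run without a visible window
options.add_argument("--window-size=1280,1024")
driver = webdriver.Chrome(options=options)

# Cookies can only be added while on a matching domain; the file name and
# its contents are assumptions, e.g. exported from a real user account.
driver.get("https://www.facebook.com")
with open("facebook_cookies.json") as f:
    for cookie in json.load(f):
        driver.add_cookie(cookie)
```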
While we experimented with extracting engagement data (number of likes, shares, views, etc.) from each post to supplement our analysis, we found that the anti-scraping measures employed by the platforms made this difficult to do reliably. Additionally, since the script examines each post in isolation, it has no way of detecting less overt moderation actions (such as downranking).
The script used textual signals (e.g., the presence of particular text on a page, such as 'This Content Isn't Available Right Now'), visual signals (e.g., the presence of a particular warning icon used in content labels), and semantic signals (e.g., the underlying structure of the web page) to determine the correct coding. The script also took a screenshot of every post in the dataset from a relevant platform so that its codings could be easily verified. Our manual review process largely relied on these screenshots.
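As an illustration of how such signals can be combined, the hedged sketch below codes a single post; it reuses the `driver` from the previous sketch and the removal-notice string quoted above, while the CSS selector is a hypothetical stand-in for the platforms' actual label markup.

```python
# Hedged sketch of the coding step; the selector is illustrative, not
# the platforms' real page structure.
from selenium.webdriver.common.by import By

def code_post(driver, url: str, screenshot_path: str) -> str:
    """Visit one post, save a screenshot, and return a coarse coding."""
    driver.get(url)
    driver.save_screenshot(screenshot_path)   # kept for manual verification
    html = driver.page_source
    # Textual signal: a removal notice shown in place of the post.
    if "This Content Isn't Available Right Now" in html:
        return "removed"
    # Semantic/visual signal: an element used in content labels
    # (hypothetical selector, for illustration only).
    if driver.find_elements(By.CSS_SELECTOR, "div[aria-label*='warning']"):
        return "labeled"
    return "no_visible_action"
```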
The script performed reasonably well; when we tested it on the 4,681 posts in June 2021 and compared its results to our manual results, it made mistakes on only 163 URLs, mostly due to multiple tweets appearing on the same page (e.g., in a thread). Note that these numbers reference the full post dataset, not the analysis dataset used in this paper.

E. Platform policies
Platform policies are spread across community standards, terms of service, and policy-change blog posts that may never be incorporated into those standards (Miller, 2021), which makes it difficult to pinpoint what a platform prohibits. In this section we describe the relevant platform policies in more detail. See Appendix F of Election Integrity Partnership (2021) for a detailed description of which policies changed across platforms.
Facebook's Community Standards discuss election-related content under "do not post" instructions and in describing groups that are not allowed on Facebook (Facebook, N.d.). The platform tells users not to post misleading content about voting; calls for actions that would interfere with an individual's ability to participate in any aspect of an election; calls for violence related to any aspect of an election; calls to bring weapons to any place related to an election (and, sometimes, to "certain locations where there are temporarily signals of a heightened risk of violence or offline harm. [...] for example, when there is a known protest and counter-protest planned or violence broke out at a protest in the same city within the last 7 days"); threats against election officials; attacks on institutions or practices that may lead to harm, particularly during elections; and expressions of support for intimidating voters. Facebook says it does not allow groups that repeatedly threaten to "violently disrupt an election process." In September 2020 a Facebook blog post (much of which has not been incorporated into its Community Standards) announced that the platform would remove posts that use COVID-19 to discourage voting and would label posts that seek to delegitimize aspects of the election (Meta, 2020). The blog post was unique in describing enforcement actions for violative content; the Community Standards do not do this explicitly.
Twitter's election policies fall under its Civic Integrity Policy (Twitter, 2021), which summarizes the policy as prohibiting posts that "may suppress participation or mislead people about when, where, or how to participate in a civic process. In addition, we may label and reduce the visibility of Tweets containing false or misleading information about civic processes in order to provide additional context." The policy covers four categories of content: misleading information about how to participate, suppression and intimidation, misleading information about outcomes, and false or misleading affiliations. Twitter also prohibits accounts with a false or misleading affiliation. Twitter's violent threats policy forbids "[threatening] violence against an individual or a group of people" and "the glorification of violence" (Twitter, 2019), and a blog post specific to the election says that Twitter will remove tweets that call for violence (Gadde, 2020).

F. Multilingual analysis
While the vast majority of the user-generated content that we analyzed was in English, our analysis does include some content not in English. We used Google Translate to translate non-English content for our analysis.
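Translation of scraped text can also be scripted. The snippet below is a minimal sketch assuming the third-party deep-translator package, which wraps Google Translate; it is an illustration, not necessarily the workflow used for this paper.

```python
# Hedged sketch: translate post text to English before coding.
# Assumes `pip install deep-translator`; illustrative, not the paper's tooling.
from deep_translator import GoogleTranslator

translator = GoogleTranslator(source="auto", target="en")

post_texts = ["Ein Beispieltext auf Deutsch"]   # placeholder example text
english = [translator.translate(text) for text in post_texts]
print(english)
```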

Figure 1: Labels Facebook applied ahead of the 2020 election. Note: figures have been recreated.

Figure 2: Labels Twitter applied ahead of the 2020 election. Note: figures have been recreated.