Abstract
Hyperpartisan news consists of articles with strong biases that support specific political parties. The spread of such news increases polarization among readers, which threatens social unity and democratic stability. Automated tools can help identify hyperpartisan news in the daily flood of articles, offering a way to tackle these problems. With recent advances in machine learning and deep learning, there are now more methods available to address this issue. This literature review collects and organizes the different methods used in previous studies on hyperpartisan news detection. Using the PRISMA methodology, we reviewed and systematized approaches and datasets from 81 articles published from January 2015 to 2024. Our analysis includes several steps: differentiating hyperpartisan news detection from similar tasks, identifying text sources, labeling methods, and evaluating models. We found some key gaps: there is no clear definition of hyperpartisanship in Computer Science, and most datasets are in English, highlighting the need for more datasets in minority languages. Moreover, deep learning models tend to outperform traditional machine learning, but Large Language Models’ (LLMs) capacities in this domain have received limited study. This paper is the first to systematically review hyperpartisan news detection, laying a solid groundwork for future research.
Citation: Maggini MJ, Bassi D, Piot P, Dias G, Otero PG (2025) A systematic review of automated hyperpartisan news detection. PLoS ONE 20(2): e0316989. https://doi.org/10.1371/journal.pone.0316989
Editor: Pablo Henríquez, Universidad Diego Portales, CHILE
Received: August 12, 2024; Accepted: December 19, 2024; Published: February 21, 2025
Copyright: © 2025 Maggini et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All the files regarding the collection and screening process are publicly available at the following GitHub repository: https://github.com/MichJoM/Hyperpartisan_News_Detection_Systematic_Review/tree/main. No additional access arrangements are needed, since the data are already open.
Funding: This work is supported by the EU HORIZON 2021 European Union’s Horizon Europe research and innovation programme (https://cordis.europa.eu/project/id/101073351/es) under the Marie Skłodowska-Curie Grant No. 101073351. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them. The authors have no relevant financial or non-financial interests to disclose. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
The foundation of democratic governments rests on the voting process conducted by citizens [1]. Political parties, in their quest for votes, heavily rely on news media to disseminate their messages during campaigns. While transparent information and active political participation are crucial for a healthy democracy, political entities increasingly employ hyperpartisan communication strategies. These tactics aim to discredit opposing factions and distort reality, potentially impacting how governments represent their constituents. Although hyperpartisan campaign methods may increase voter participation [2] and strengthen the connection between voting decisions and specific ideologies, they can have significant negative consequences. As [3] demonstrates, this communication style can highlight divisive tensions within society, complicating governance and potentially alienating citizens when opposing sides gain power. Consequently, hyperpartisanism poses a threat to the proper functioning of democracy [4] by polarizing and dividing the social fabric, reducing trust in governmental entities and mainstream news [5], and exacerbating tensions between governments and their oppositions [6].
The rise of alternative media outlets further amplifies these threats to democracy [7], as they often share polarizing content [8]. In the online sphere, hyperpartisanship proliferates through various channels, such as social networks and publishers’ websites. The dissemination of hyperpartisan news, characterized by highly polarized political and ideological content, capitalizes on the virality facilitated by platform algorithms [9]. While the term gained prominence during the 2016 U.S. election [10], there is no evidence suggesting that this specific event triggered a systemic hyper-polarization [11].
The digital realm has become a significant arena for political influence [12], affecting the entire infosphere [13]. The close relationship between hyperpartisanship and online interactions has led to increased attention on these manipulative forms of communication [14]. On the policy front, the EU Commission’s 2018 expert report [15] addressed related topics such as disinformation, defamation, hate speech, and incitement to violence. More recently, the European Parliament adopted the Digital Services Act (DSA) [16] in 2022, aiming to provide "a secure, predictable and trustworthy online environment" (Article 1. 1). In line with [9] and [17], we categorize hyperpartisan news under the broader umbrella of misinformation, closely related to fake news detection. Hyperpartisan news detection as a classification task is specifically related to the news domain and can focus on linguistic, semantic, and meta-data features. The objective is for an algorithm to predict a text’s political affiliation or determine if the content is hyperpartisan. The rising academic interest in hyperpartisan detection is testified by the high participation of 42 teams at task 4 of SemEval-2019 [18].
For this systematic review, we only considered automated text-based strategies applied to news articles. Manual detection of hyperpartisan news has been proposed, mainly focusing on discourse analysis [19–21]. Despite its effectiveness, this approach does not scale to the daily volume of news. Hence, automated methods such as deep learning, social network analysis, or cross-methodologies like [22] are more effective. These approaches rely on different features, so hyperpartisan news detection may be tackled using content-, source-, and user-based data [23].
The article is organized as follows: the Related Works section covers the relevant surveys on similar topics, highlighting their main features and comparing their limitations with regard to our study; the Methodology section discusses the methodology adopted for this systematic review, including research questions, search strategy, selection criteria, and selection procedure; the section Hyperpartisan news detection: description of the phenomenon focuses on the definition of hyperpartisanship, highlighting its multi-task and cross-disciplinary nature. Afterward, we present the textual frames where hyperpartisan traits are traceable and the spectrum of methodologies used in different computational sub-fields. Then, we cover the diverse strategies and scales used to label hyperpartisanship. Section Approaches for automatic hyperpartisan news detection contains a global categorization and discussion of the best-performing models in the papers screened and selected, distinguishing between model typology, results, features, and approaches employed. Section Datasets is a descriptive overview of the datasets used in this domain: we collected the cited datasets and their features. Finally, section Conclusions and future works concludes the article by presenting the main findings of our literature review.
The main contributions of this study are:
- Comparing the different definitions of hyperpartisan news detection;
- Collecting and discussing the diverse approaches and algorithms used in the selected literature, specifically for the news domain;
- Reporting evaluation metrics, features and embeddings considered in the studies;
- Presenting the main findings, the engineering innovations and research designs;
- Collecting and analyzing 38 datasets used in the literature, focusing on English and less-represented languages;
- Delineating prevailing research gaps and challenges in the hyperpartisan news detection task.
Related works
The current state of the literature lacks a systematic review specialized in automatic hyperpartisan news detection. While there are various relevant survey papers, they predominantly focus on fake news and bias detection tasks. For instance, [24] examined fake news detection while considering the relation between the factuality and political bias of news sources, without presenting any dataset or discussing methodologies. [25] started from a theoretical introduction of the fake news phenomenon and then covered the technical methodologies from different perspectives, from content to style analysis. [26] compared manual and automated approaches to identify media bias, distinguishing several forms of bias occurring at distinct steps of news production. Similarly, [27] investigated the application of deep learning algorithms in fake news detection, building upon a taxonomy proposed by [17], in which hyperpartisan news detection overlaps with fake news detection. [28] covers the broad field of disinformation by designing a taxonomy, without considering either automated approaches or the datasets used in the literature. Similarly, [29] analyzes the general phenomenon of media bias detection by describing its diverse manifestations (e.g., spin bias, ideology bias, coverage bias), distinguishing the techniques to detect them, and reporting 17 datasets. Except for this last work, none of these surveys pays particular attention to hyperpartisan news detection.
Methodology
In this section, we present and describe the methodology adopted to conduct this systematic review, following [30]’s guidelines. The planning and execution phases of this study are detailed in the following subsections, while the results are discussed in the sections Hyperpartisan news detection: description of the phenomenon, Approaches for automatic hyperpartisan news detection, and Datasets.
Research Questions
The Research Questions (RQ) that motivated the need for this systematic review are the following:
- RQ1 Does a categorization for hyperpartisan news detection methods exist?
- RQ2 Is hyperpartisan news detection a stand-alone or overlapping task?
- RQ3 What are the proposed solutions using textual data?
- RQ4 Does the task keep up with the new Natural Language Processing technologies like autoregressive models?
- RQ5 What are the results of the models developed?
- RQ6 What are the datasets used for this task? How are they structured? Have they been updated to cover the latest political global and regional trends?
- RQ7 How can the current state of research on hyperpartisan detection be characterized in diverse languages and countries?
Search strategy
We adopted the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [31], consisting of a checklist (http://www.prisma-statement.org/documents/PRISMA_2020_checklist.pdf) and a flow diagram (Fig 1) to illustrate the steps taken in a simplified and clear way. To retrieve papers, several different academic online databases were used to overcome their respective limitations [32] in terms of topic coverage and papers available: ACM Digital Library, Google Scholar, Scopus, ProQuest, and IEEE Xplore. Our query archetype was: ((hyperpartisan OR “political bias” OR “hyper-partisan” OR partisanship OR hyperpartisanship OR “political polarization”) AND (news OR bias OR articles) AND (detection OR classification)). The first set of words contains the different homographs. We also searched in all subject fields to capture as many semantically similar papers as possible, including potentially miscategorized ones. We selected the 2015-2024 timeframe to capture trends from before the term was coined, across a period in which studies on this topic grew and increasingly powerful models were employed.
The Flow Diagram illustrates the steps taken during document collection and evaluation. We screened more than 1553 papers and finally selected a subset of 81.
To obtain papers pertinent to our research questions, the queries reported in Table 1 were the refined result of a structured process based on the steps introduced by [29]. Queries within each database were structured to match titles, abstracts, and keywords. We extended this pipeline by introducing the following step: “Network visualization and exploration”. This step involved using the research software ResearchRabbit (https://researchrabbitapp.com) to exhaustively capture possibly omitted papers through the citation link structure.
- Keywords domain extrapolation: Initial reviews on similar topics helped us identify the keywords used in this domain. We noticed a lack of scientific agreement on how to write “hyperpartisan”. To cover all these morphologically diverse forms (hyper-partisan, hyperpartisan, hyperpartisanship), we included them in our queries, treating them as synonyms;
- Iterative searches: This process allowed us to select the most appropriate terms by comparing the results retrieved using different keyword combinations. We examined how closely titles and abstracts related to the queries;
- Verifying against established literature: To ensure the efficiency of our search terms, we compared the results to a list of papers in the domain of hyperpartisan detection;
- Network visualization and exploration: To further validate our verification, we used ResearchRabbit, a tool to visualize the citation links between papers in the same collection. It suggested similar papers written by the same or different authors, highlighting stored publications in the user’s folder. This tool helped us in gauging the coherence of our results.
Selection criteria
Before describing the screening process, we illustrate the criteria employed for the paper selection.
- Inclusion criteria
- Papers primarily focused on automated hyperpartisan news detection;
- Papers that used the related task (e.g. fake news detection) as a synonym of hyperpartisan news detection;
- Publications from 2015 to 2024;
- Exclusion criteria
- Exclusion of sources that either address the hyperpartisan news detection problem from a theoretical perspective, namely theory papers, or manual detection;
- Studies discussing only related topics, such as fake news detection, stance detection, or political bias;
- Findings that do not use news domain datasets as the main source for hyperpartisan news detection, i.e. social network analysis, comments analysis, and tweets detection-based approaches;
- Literature reviews, books, theses, and posters.
Screening and selection process
The following procedures were used for study selection and analysis. The study selection, quality assessment of the included studies, and thematic analysis were performed by one author (PP). However, the procedures and findings were discussed by all authors, and potential disagreements were resolved by consensus.
To manage the screening and selection processes, we utilized Rayyan (https://www.rayyan.ai/) for its AI-powered capabilities, which allowed the two reviewers to conduct a blinded selection process, preventing any mutual influence. Specific eligibility criteria were established to ensure the reliability of the study. These criteria were applied independently by each reviewer to maintain objectivity and consistency. The criteria included: relevance to the predefined inclusion criteria, evaluation of models using both accuracy and F1 score, and comprehensive reporting of the dataset used. Only papers that met the criteria and were accepted by both reviewers were selected. In cases of disagreement, a third reviewer was consulted to assess the paper’s eligibility. The initial dataset consisted of 723 papers from ACM Digital Library (https://dl.acm.org/), 571 from Google Scholar (https://scholar.google.com/), 1 from ScienceDirect (https://www.sciencedirect.com/), 97 from Scopus (https://www.scopus.com/home.uri), 159 from ProQuest (https://www.proquest.com/index), and 1 from IEEE Xplore (https://ieeexplore.ieee.org/Xplore/home.jsp). Notably, Google Scholar initially retrieved 1800 results, but we observed that beyond the first 500 or so it no longer produced relevant documents, which led us to manually collect only the first 571 papers.
We conducted the entire selection process using Rayyan, as described in Fig 1. Rayyan automatically detected 118 duplicates, which we removed after manual checks. For the remaining 1441 studies, the initial step was screening titles and abstracts. Following thorough evaluations, 67 papers were retained from a curated pool of 110, eliminating 43 papers that did not meet specific focus or dataset criteria.
Additionally, to examine the cohesion and coherence of our references, we used ResearchRabbit to visualize the citation network, identifying two prominent clusters centered on [33] and [18]. [18], a key work from the SemEval initiative, set a foundational benchmark for detecting hyperpartisanship in news articles, which informed our criteria for selecting relevant studies. This shared task saw the participation of 42 teams, who explored several approaches that later research expanded upon. Moreover, the two datasets described therein are important benchmarks for hyperpartisan news detection. Similarly, [33] compared linguistic and topical methodologies to discern between hyperpartisan and neutral news; it was one of the first works in the literature and established the importance of linguistic features in this task. 14 additional papers were included through this procedure by exploring similar works and citations. We also compared the several definitions of hyperpartisan news to stress the importance of having a specific and clear task that does not overlap with related ones. Our work offers an extensive and comprehensive investigation of state-of-the-art techniques, considering mixed approaches as well as machine and deep learning applications. To ensure our systematic review is both homogeneous and robust in terms of comparability, we focused on the most commonly used performance metrics in NLP: accuracy and F1 score. By collecting and analyzing these standard metrics, we aim to maintain consistency across the studies and enhance the reliability of our comparative analysis. We also retrieved and analyzed 38 datasets, reporting the evaluation metrics, embeddings, and features used by researchers. Finally, we present some descriptive results regarding the trend of publications over time (Fig 2) and the main publishers in the selected sample (Fig 3).
Transparency and replicability
Emphasis was placed on transparency and replicability, to adhere to rigorous academic standards and to PLOS ONE’s policy on Data Availability. Thus, a GitHub repository stores the queries employed and described in this paper, as well as the results of the screening process described above. This enables fellow researchers to replicate the methodology and verify the findings. The repository is accessible at https://github.com/MichJoM/Hyperpartisan_News_Detection_Systematic_Review/tree/main. In addition to the previous information, it contains an explanation of how missing data were handled.
Hyperpartisan news detection: Description of the phenomenon
In this section, we begin by examining the definitions of hyperpartisan news detection found in the reviewed literature. We then delve into the various biases that relate to and constitute our investigated phenomenon. Additionally, we examine the diverse hyperpartisan sources and provide an overview of the application domains. Finally, we discuss the different strategies for labeling an entity as hyperpartisan.
The problematics of the definition
Definition of hyperpartisanship.
The term Hyperpartisanship (https://claremontreviewofbooks.com/hyperpartisanship/) is not attested in any dictionary. A widely accepted definition considers hyperpartisan news as exhibiting an extreme bias toward a particular political ideology or party [18]. This type of news reporting often presents information in a highly sensationalized and one-sided manner, prioritizing ideological loyalty over objective reporting and critical analysis. This behavior denotes an extreme political allegiance to a party, leading to intense disagreement with the opposing faction [18].
Vagueness of the definition and overlap with similar tasks.
The minimalist definition of hyperpartisanship is widely adopted by computer scientists, who tend to simplify models of social phenomena when applying automated detection [26]. Hyperpartisanship coexists within the broader category of junk news and shares characteristics with tasks such as political, ideological, and fake news detection [34]. Due to the vagueness of the definition, hyperpartisan headlines are often difficult to cluster within the misinformation set, and there is a lack of consensus on what precisely constitutes hyperpartisanship [35]. The perception of news as hyperpartisan can depend on the reader’s epistemic bubble [36]. Additionally, left and right extremisms do not show significant stylistic differences, making hyperpartisanship a subject-shifting concept [33]. While humans can assess the degree of hyperpartisanship in a given text thanks to their cultural and linguistic awareness, machines lack this capability.
Hyperpartisan news detection often overlaps or is confused with other disinformation tasks, such as fake news detection [19,37–40,94], and stance detection [41]. Specifically, hyperpartisanship might be conveyed through elements of fake news, aimed at propagating a specific agenda and manipulating readers to adopt a particular position on a given topic [40].
Traits of hyperpartisan news.
From a linguistic perspective, hyperpartisan articles exhibit a high count of adjectives and adverbs [42,43], extensive use of pronouns, and words of disgust [44]. These articles tend to feature longer paragraphs written in a sensationalist style, full of emotional language and rare terms [45]. Right-wing media, in particular, often employ hyperpartisan headlines, corroborating earlier findings [46,47]. Hyperpartisan news articles display hyper-polarized linguistic traits in their titles as well. In this, hyperpartisanship stands in opposition to balanced news, which is intended to report facts with a neutral tone and informative intention.
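The surface cues above (adjective and adverb density, pronoun use, sensationalist punctuation) can be approximated with simple counts. The minimal Python sketch below illustrates the idea; the pronoun list and the "-ly" suffix heuristic are our own simplifying assumptions for demonstration, not features taken from the cited studies.

```python
import re

# Illustrative word list and suffix heuristic -- simplifying assumptions,
# not the feature sets used in the studies cited above.
PRONOUNS = {"i", "we", "you", "he", "she", "they", "us", "them", "our", "their"}

def stylometric_features(text: str) -> dict:
    """Count simple surface cues associated with a hyperpartisan style."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    return {
        "pronoun_ratio": sum(t in PRONOUNS for t in tokens) / n,
        "adverb_ratio": sum(t.endswith("ly") for t in tokens) / n,  # crude adverb proxy
        "exclamations": text.count("!"),
        "avg_token_len": sum(map(len, tokens)) / n,
    }

feats = stylometric_features("They utterly betrayed us! We will never forget!")
```

Feature vectors of this kind are what the handcrafted-feature classifiers discussed later in this review typically consume.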
Analogue biases.
Hyperpartisan news detection is a task in which certain textual features indicated above suggest that the writer is expressing an extremist, one-sided opinion. Moreover, the various degrees to which different typologies of bias occur contribute to making a text hyperpartisan. Several taxonomies have been proposed for junk news, such as [28,34]. We will use the bias categories collected by Oxford (https://catalogofbias.org/biases/spin-bias/) and [29] to discuss the founding biases of hyperpartisan articles.
Spin bias, or rhetoric bias [29], strictly concerns the linguistic structure of the article and its persuasive intent. It is the deliberate or inadvertent misrepresentation of outcomes, leading to unjustified indications of positive or negative results and potentially to misleading conclusions. Written language is the product of the conscious application of strategic discursive and persuasive patterns to engage readers. Words contribute to giving a particular meaning to the entire text, especially when they leverage an emotional lexicon with superlatives.
Ad hominem bias is a rhetorical strategy in which one moves away from the topic of the controversy by contesting not the statement of the interlocutor but the interlocutors themselves and their personal characteristics or traits [48]. This rhetorical strategy was frequently used in sophistry and is still widely used today in political discussions and journalistic controversies.
Presence bias or opinion statement involves the inclusion of subjective opinions within news articles, influencing readers’ perceptions. It occurs when factual reporting is mingled with subjective viewpoints or opinions [49]. In other words, it reflects the degree of agreement and statement sharing of an entity, i.e. users or publishers [50].
Ideological bias occurs when news reporting or content is influenced by a particular ideological stance or viewpoint, impacting the presentation and selection of news topics. Ideological detection differs from political bias detection because some ideologies can be shared even by opposing parties. Ideologies often contrast with each other, and it is precisely this contrast that makes them classifiable [51].
Framing bias involves presenting information to shape or influence people’s perceptions of an issue or event by emphasizing certain aspects while downplaying others [52,120]. In this case, the use of linguistic and rhetorical figures helps the author present the selected information partially. Framing therefore expresses a publisher’s leaning toward an ideology. Frames are tools that emphasize specific information while potentially favoring one aspect over another, with or without being slanted [53]. Framing bias manifests in the moral content and style used [21].
Coverage bias is not present in Table 2, since it is not a textual bias. It refers to the disproportionate attention to, or neglect of, topics or events in news reporting, leading to an imbalance in coverage across different subjects [54].
Political bias can easily be confused with ideological bias. Since a party is a combination of an ideology and a political leaning, this bias relates to the inclination of news media, information sources, or people to favor one political party’s agenda [55].
In this context, it is essential to avoid conflating the reification of the social phenomenon involving linguistic indicators with the entirety of the specified biases. Namely, not all categories of biases mentioned can be classified as hyperpartisan when they manifest. The linguistic element of exaggeration per se does not automatically denote hyperpartisanship; rather, it necessitates contextual positioning, such as aligning with a particular party or ideology. Simply adopting a stance is insufficient for categorization as hyperpartisan; it is the degree of exaggeration in that stance that holds significance. We propose some examples to illustrate this in Table 2.
Proposal for a definition.
By reviewing the various definitions collected in Table 3, several key observations emerge:
- the concept of hyperpartisanship intersects with, and shares features of, the typologies of media bias discussed in section Analogue biases. In the following list, the indexes correspond to the intrinsic characteristics of the hyperpartisan definitions reported in the “Characteristic” column of Table 3:
- spin bias;
- ad hominem bias;
- opinion statement bias;
- ideology bias;
- framing bias;
- coverage bias;
- political bias;
- it is commonly acknowledged that hyperpartisan news exhibits one-sided political bias, incorporating specific statements aligning with the ideology of a particular political party and/or agenda;
- the lack of a commonly shared definition across the various studies means that the characteristics used for detection are variable and mutually exclusive, undermining the integrity and scientific rigor of research in this field;
- while approaching this classification task, some studies, such as [56], lack methodological grounding because they do not introduce a definition of the phenomenon.
In light of these considerations, hyperpartisan detection must necessarily consider different variables simultaneously: positioning, the presence of a bias, and its degree of exaggeration. Does the current state of the art in detection methodologies do this? As mentioned earlier, no detection method that simultaneously considers the different types of biases and these three variables has been proposed. Various research works tend to focus individually on specific subsets of linguistic and content-based features, as outlined in the following sections.
Considering these elements, we propose the following definition to aid future research in addressing hyperpartisan news in Computer Science: Hyperpartisan news detection is the process of identifying news articles that exhibit extreme one-sidedness, characterized by a pronounced use of bias. The prefix "hyper-" highlights the exaggerated application of at least one specific type of bias—such as spin, ad hominem attacks, opinionated statements, ideological slants, framing, selective coverage, political leaning, or slant bias—to promote a particular ideological perspective. This strong ideological alignment is conveyed through amplified linguistic elements that reinforce one of these bias types within the text.
Where can hyperpartisanship be detected? Perspectives on the sources
In this section, we give a general overview of the main source typologies considered for detecting hyperpartisan news articles.
In light of the prevalence of hyperpartisan news dissemination online, the methodologies surveyed are applied specifically to online news outlets. Initially, when considering the domain of publishers, a linguistic approach can be applied to news analysis to detect hyperpartisanship. This approach involves studying textual information within articles using style-based or topic-based models [33,46,62,128]. Detection methods consider specific sections, such as the title [46,47], sentences [63], quotes in the body [42], or encompass several of these [46,58,64,65,94]. Alternatively, researchers have investigated the spread of hyperpartisanship starting from the entities involved in the writing and publishing process, such as a journalist’s [66] or a media outlet’s [18] leaning. Considering that publishers are entities often interconnected through economic and political bonds [67], they form a polarized network, which can be analyzed using metadata like external links [68–70,138]. While determining bias based on the source is feasible [66,71], an article from a biased media outlet may not always be hyperpartisan [49,72]. This issue was underscored by [72], which highlighted the inadequacy of the information source in determining an article’s hyperpartisanship. Their method generates a system capable of indicating bias scores in news and suggesting similar topics from different sources, encouraging readership of diverse perspectives and helping readers avoid extremely biased news.
Working with textual data enables the extraction of sentiment features [73]. For instance, [74] observed that sentiment analysis, applied to titles and sentences using TextBlob (https://textblob.readthedocs.io/en/dev/), improved evaluation metrics. Additionally, [75] noted that hyperpartisan articles tend to convey more aggressive and negative sentiments compared to other articles. Using VADER (https://github.com/cjhutto/vaderSentiment), [76] conducted experiments to analyze the contribution of sentiment features in indicating the author’s bias. Meanwhile, [77] approached hyperpartisan news detection by considering sentiment as a means to capture the polarity of articles.
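Lexicon-based tools such as TextBlob and VADER score polarity by aggregating valence values of individual words. The sketch below illustrates only the general idea with a toy, hand-made lexicon (an assumption for demonstration; real lexicons are far larger and human-rated, with additional rules for negation, punctuation, and capitalization):

```python
# Toy valence lexicon -- purely illustrative, not taken from TextBlob or VADER.
LEXICON = {"great": 1.0, "win": 0.5, "disaster": -1.0, "corrupt": -1.0, "attack": -0.5}

def polarity(text: str) -> float:
    """Average valence of known words; 0.0 when no lexicon word is present."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0
```

A score near -1.0 would flag the aggressive, negative sentiment that [75] associates with hyperpartisan articles.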
Moreover, [78] employed both textual and image features to detect hyperpartisanship. Their study revealed that automated methods outperformed humans and that incorporating additional information such as images and titles enhanced the accuracy of the model.
How is hyperpartisanship labeled?
Understanding the measurement of hyperpartisanship involves considering the diverse scales utilized. In the Social Sciences, a range of indexes and scales is employed for this purpose, leveraging features distinct from those used in automatic detection methodologies. For instance, polarization is calculated with the CSES Polarization Index (PI), a tool used to assess the distribution of political parties across the Left/Right ideological spectrum. Such metrics gauge ideological positioning and account for party sizes or vote shares, offering a comprehensive view of ideological stance and political influence [3]. Differently, automatic hyperpartisan detection relies on linguistic features. Some studies employ binary classification methods, utilizing labels such as hyperpartisan/mainstream (i.e., non-hyperpartisan) [18] or Left/Right [66,118]. However, such distinctions often overlook nuanced differences within diverse political leanings [33]. Few studies have extended their scope to include a more fine-grained polarization range [79]. For example, [80] approached hyperpartisan detection as a multi-class classification problem, employing both 7- and 5-point scales to define affiliations: 1-2.5 – far-left, 2.5-3.5 – center-left, 3.5-4.5 – center, 4.5-5.5 – center-right, 5.5-7 – far-right. Similarly, [73] used a comparable scale, and [81] sought to manage granularity by distinguishing between right, center, and left affiliations.
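As an illustration, the 7-point scale of [80] can be mapped to affiliation labels with a simple binning function. How exact boundary values (e.g., 2.5) are resolved is our own assumption, since the original convention is not specified here:

```python
def affiliation(score: float) -> str:
    """Map a 1-7 leaning score to the five bands reported in [80].
    Boundary handling (strict '<' at band edges) is an illustrative assumption."""
    if score < 2.5:
        return "far-left"
    if score < 3.5:
        return "center-left"
    if score < 4.5:
        return "center"
    if score <= 5.5:
        return "center-right"
    return "far-right"
```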
Approaches for automatic hyperpartisan news detection
The detection of hyperpartisan content encompasses a range of methodologies, varying from traditional non-deep learning approaches to cutting-edge deep learning techniques, as well as mixed learning algorithms. Non-deep learning methodologies often rely on traditional machine learning algorithms, leveraging handcrafted features and rule-based systems to identify linguistic patterns, stylistic markers, and network structures within textual and metadata sources. These approaches commonly include stylometric analysis and topic modeling methods to discern biased content. In contrast, deep learning methodologies harness the power of neural networks to automatically extract intricate features from raw data, enabling the identification of complex patterns and relationships in unstructured text or network data. These techniques, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and Transformers, excel in learning representations directly from the data.
Models discussion
In the following subsections, to account for the risk of bias, we grouped and discussed the mentioned studies by model architecture. We differentiate between Non-deep learning, Deep learning, and other methodologies adopted in the papers selected for the systematic review. The Deep Learning section covers both non-Transformer deep learning models and the Transformer family. In Tables 4, 5, 6 and 7, we categorize the best model for each paper, reporting its performance. When researchers compared multiple models on the same task using different datasets, as in the case of [118], we report only the best model's performance.
Non-deep learning methods.
In Table 4, we categorized papers using traditional machine learning approaches. The methodologies involved algorithms such as Support Vector Machines (SVM), Random Forest, and Logistic Regression, as well as Linear Regression, Naive Bayes, Linear SVC, KNN, XGBoost, and MaxEnt.
There are effective strategies built around the SVM model. For instance, [70] combined this algorithm with sentiment analysis via the National Research Council Canada Emotion Lexicon (NRC Emotion Lexicon) to analyze the emotional content of articles. Moreover, they extended the linguistic approach by applying Linguistic Inquiry and Word Count (LIWC), and also considered the articles' structure and metadata as features. [94] adopted n-grams, i.e., bi- and tri-grams, and dependency sub-trees that impacted performance. On the other hand, [83] experimented with several embeddings (Doc2Vec [95], GloVe [96], ELMo [97]) and found that "adding simple lexical and sentiment features hurts the performance". [43] studied the linguistic divergences between fake and hyperpartisan news employing an SVM. In this case, it emerged that hyperpartisan articles exhibit more sentences and a higher adjective count than unbiased news. When comparing extremely polarized articles against fake news, they noted that the former make heavy use of question/exclamation marks and adjectives. These sentence-level features delineate distinct linguistic patterns. [77] confirmed the strong potential of Logistic Regression, ranking second at SemEval-2019. They built representations with the Universal Sentence Encoder (USE) [98] and combined semantic and handcrafted features, paying attention to the degree of adjectives and subjectivity and distinguishing between two levels of polarity: sentence and article level. [37] used the Reuter Dataset for training and testing, combining ELMo embeddings with a logistic regression classifier as already done by [72] and [77], confirming the effectiveness of this method. [77] discovered that the most relevant features concern bias lexicon and polarity. [93] placed third at SemEval-2019 and found that article length was a distinctive trait of biased articles.
By working at the phrase level, they created a set of phrases to discern the different types of articles, taking care to remove n-grams containing publisher-style biases. [46] focused on news titles with a topic-based approach. They also built a dataset considering two distinct typologies of news titles, augmenting the granularity of the detection. The first category pertains to descriptions of confrontations or conflicts between opposing parties, suggesting a deeply polarized political climate. The second involves opinions expressing a biased, inflammatory, and aggressive stance against a policy, a political party, or a politician. [73] hypothesized an interdependence between factuality and political ideology bias and therefore introduced a multi-task learning setup with Copula Ordinal Regression (COR) [99]. They used the entire news outlet and considered distinct scales for measuring factuality (3-point scale) and political bias (7-point scale). Using Maximum Entropy modeling (MaxEnt), [40] bypassed linguistic features to build a model that generalizes as much as possible: they devised a document classification system that combines clustering features with simple local features, showcasing the effectiveness of distributional features from large in-domain unlabeled data. [85] approached the task using n-gram embeddings with article and title polarity, implementing an XGBoost model with all of these scalar features, but it performed poorly. They derived their stylometric-analysis methodology from [33], which utilized n-grams, readability scores and Part-of-Speech (PoS) tags followed by binary classification. Thanks to unmasking information, [33] simultaneously compared documents with opposite political leanings; in doing so, they investigated style variations depending on political orientation and compared them with topic-based bag-of-words models.
This methodology highlighted the limited usefulness of integrating corpus characteristics when performing a granular distinction among left, right and mainstream styles. Indeed, both political extremes show stylistic similarities that can produce confounding effects in the model. Hence, for style-based hyperpartisan detection, the categories should be limited to mainstream and hyperpartisan, without considering the specific leaning.
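The shallow stylistic cues that [43] found discriminative (sentence counts, question/exclamation marks) can be sketched as simple surface features. This is an illustrative reconstruction, not [43]'s actual feature extractor; adjective counts would additionally require a PoS tagger and are omitted.

```python
import re

def surface_features(article: str) -> dict:
    """Shallow stylistic cues of the kind [43] found discriminative.
    Adjective counts would need a PoS tagger and are omitted here."""
    sentences = [s for s in re.split(r"[.!?]+", article) if s.strip()]
    return {
        "n_sentences": len(sentences),
        "n_exclaim": article.count("!"),
        "n_question": article.count("?"),
        "exclaim_per_sentence": article.count("!") / max(len(sentences), 1),
    }

feats = surface_features("Unbelievable! Who voted for this? A total disgrace.")
print(feats["n_sentences"], feats["n_exclaim"], feats["n_question"])  # 3 1 1
```

Such scalar features are then typically concatenated with other representations and fed to a classifier such as an SVM.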
Furthermore, for a complete understanding of the approaches used in the literature, we summarized them in Table 5. Although ELMo, BERT, and Word2Vec embeddings were used as features of non-deep learning algorithms, Table 5 describes only the features used with the best models proposed in the non-deep learning approaches of Table 4. We distinguish between features (Morpho-syntactic, Lexicon, Semantic, Sentiment and Metadata) and approaches (style-based and topic-based).
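The bi-/tri-gram phrase features used by several non-deep learning approaches above ([93,94]), including the removal of n-grams carrying publisher-style bias, can be sketched as follows. The publisher-marker stop set is an illustrative assumption, not taken from the cited studies.

```python
from collections import Counter
from itertools import islice

PUBLISHER_TOKENS = {"breitbart", "huffpost"}  # illustrative publisher markers

def ngrams(tokens, n):
    """Yield consecutive n-grams from a token list."""
    return zip(*(islice(tokens, i, None) for i in range(n)))

def phrase_counts(text: str) -> Counter:
    """Count bi- and tri-grams, dropping those that leak publisher style."""
    tokens = text.lower().split()
    counts = Counter()
    for n in (2, 3):
        for gram in ngrams(tokens, n):
            if not PUBLISHER_TOKENS & set(gram):
                counts[" ".join(gram)] += 1
    return counts

counts = phrase_counts("the radical left the radical left agenda")
print(counts["the radical"])  # 2
```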
Deep learning methods.
In the following paragraphs, we analyze the deep learning methods adopted by diverse authors to solve the hyperpartisan detection task. In Table 6, we categorized papers using deep learning approaches. Lastly, at the end of the section, a comprehensive Table 7 collects the results, rounded to two decimals, reported by all the authors studied in our systematic review.
Deep learning: Non transformer-based architectures.
[42] employed a fusion of CNN and LSTM, utilizing quantitative linguistic features extracted through GloVe. In this way, they highlighted the crucial role of incorporating linguistic features alongside word-vector representations. Additionally, they built a meta-classifier to filter noisy data for application to the by-publisher dataset. [72] won SemEval-2019 Task 4 by combining rich morphological and contextual representations, averaging the three ELMo vectors per word. Their model was used in further studies by [105] for two pseudo-labeling frameworks: overlap-checking and meta-learning. Overlap-checking consists of adding data to help the model train, while meta-learning allows the model to be continually trained on a clean dataset and a pseudo dataset. This work inspired [107], who used a Hierarchical Attention Network (HAN) combined with ELMo embeddings. The HAN balances the information in its current state, deciding whether to update it and how much past information contributes to the new state. In this case, the information stems from the sentence level, confirming that richer article representations yield better performances. By encapsulating the articles' structure and connectors, and by paying attention to stylistic markers, handcrafted stylistic features and emotion lexicons, they reached the 2020 state of the art on the SemEval-2019 Task 4 dataset. [109] improved the standard HAN model by introducing Knowledge Encoding (KE) components. The HAN segment grasps word and sentence relationships within a news article, employing a structured hierarchy across three levels: word, sentence, and title. Meanwhile, the KE component integrates common and political knowledge associated with real-world entities into the prediction of the article's political stance. Since the model is not language-based, it could work with diverse languages beyond English.
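The core operation of a HAN at each level is attention pooling: lower-level vectors (words or sentences) are combined into a higher-level vector through learned weights. A toy, pure-Python sketch of the pooling step follows; in a trained HAN the scores come from a learned context vector, whereas here they are fixed illustrative values.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(sentence_vecs, scores):
    """Weighted sum of sentence vectors; weights come from attention scores.
    In a real HAN the scores are produced by a learned context vector."""
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    return [sum(w * v[d] for w, v in zip(weights, sentence_vecs))
            for d in range(dim)]

# The higher-scored first sentence dominates the document vector.
doc_vec = attention_pool([[1.0, 0.0], [0.0, 1.0]], scores=[2.0, 0.0])
print(doc_vec)
```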
[112] developed a pre-training framework encoding knowledge about entity mentions (masked tokens as frame indicators) and modeling propagation between users with a social information graph. They noted that models pre-trained on general sources and tasks have limited ability to focus on biased text segments. [113] introduced a voting system of LSTMs to build a controlled dataset to train another LSTM, demonstrating the importance of having a balanced and clean dataset to run experiments. Lastly, [120] built a Hierarchical-LSTM applied to subframes (n-grams) to tackle framing bias. In this paper, they introduced a pioneering framework for pretraining text models with signals derived from the abundant social and linguistic context available, encompassing elements such as entity mentions, news dissemination, and frame indicators.
Deep learning: Transformer-based architectures.
Regarding Transformer architectures, we observed massive utilization of BERTbase and BERTlarge. BERTbase is a smaller pre-trained BERT model, with fewer layers and parameters than BERTlarge, and comes in cased and uncased variants, depending on whether case information is preserved. [121] wanted to remove the bias introduced when modeling the medium. They observed that combining bias mitigation with triplet loss, Twitter bios and media-level representations increased model efficacy. [118] proposed a multi-task BERT-based model with contrastive learning to tackle framing bias in news articles. Using BERT with combinations of syntactic bigram counts and psycholinguistic features, [122] investigated the inference of political information and hyperpartisanship at the author and text level starting from linguistic data. [123] showed that fine-tuning the model yields better results. [124] introduced a semi-supervised framework trained using federated learning, in which algorithms are trained independently across diverse datasets. Furthermore, textual data are tagged to extract answers to wh-questions and temporal lexicon information. In the quest for precise detection and data denoising, the same author replicated this approach with variations in [125], employing an attention-based strategy to learn text representations, aiming to identify target expressions accurately while extracting pertinent contextual information. They generated a BERT attention embedding query utilizing lexicon expansion, content segmentation and temporal event analysis. Ultimately, this approach enhances the understanding of consecutive news articles within a temporal framework. [126] experimented with BERTbase and BERTlarge, feeding them embeddings of different lengths. They were interested in analyzing the parts of the articles, looking for a consistent level of hyperpartisanship, which they showed to exist.
By comparing BERT and ELMo models, [60] confirmed that input and embedding dimensions contributed positively to performance. [127] performed domain adaptation, showing its efficacy. [116] operated in a low-resource scenario with prompt-based learning, employing masked political phrase prediction and a frozen pre-trained language model, with RoBERTa (a robustly optimized BERT approach) as the backbone for their own model, MP-tuning. [117] focused on political ideology and stance detection, comparing triplets of documents on the same story to detect dissimilarities among them. They trained RoBERTa through continual learning. Meanwhile, [128] improved their model's performance with cross-domain contrastive learning; notably, they used GPT-2 to augment hyperpartisan textual data. Lastly, [129] addressed the detection of Persian hyperpartisan tweets by prompting GPT-3.5, a multilingual conversational generative LLM released in 2022, and open-weights models like Llama 2 [130]. [129] compared the capabilities of Large Language Models (LLMs) and BERT-based models like RoBERTa and ParsBERT to detect English and Persian tweets, providing instructions with different levels of specificity to the models. Despite the large size and extensive training of LLMs, fine-tuning ParsBERT and RoBERTa proved more efficient and practical for certain tasks.
Other methods.
Within the vast landscape of computational frameworks, certain algorithms defy classification within the traditional realms of deep learning or non-deep learning. This section explores these unique frameworks—sophisticated combinations of diverse models, labeling techniques and graph approaches—that operate beyond conventional categorizations.
[49] applied a framework for presentation bias, studying hyperpartisanship with a graph-based method. This three-step framework is structured as follows: collecting clusters of related articles on the same topic; applying Aspect-Based Sentiment Analysis (ABSA) with BERTbase to rate and classify fine-grained opinions in pairs of sentences; and computing the variation in bias between news sources within similar categories by contrasting the scores of matching pairs of articles. This comparison is done for every combination of news sources within these categories, and the differences in bias are averaged across all article groups, leading to a bias matrix. [69] proposed a Multi-View Document Attention Model (MVDAM) capable of simultaneously modeling title, structure and metadata (such as links) in order to estimate the political ideology of a news article. This framework, based on a Bayesian approach, utilizes different models to create the three-view representation: a convolutional neural network for the title, Node2Vec for the network and a HAN for the content. [131] worked mostly on manual features such as meta-topics, namely polarizing topics, using an end-to-end tool, the Gavagai Explorer, which performed poorly.
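The averaging step of [49]'s framework described above can be sketched as follows: for each pair of sources, differences in aspect-sentiment scores over matched article clusters are averaged into an entry of the bias matrix. The outlet names and ratings are made-up illustrative values, not data from [49].

```python
def bias_matrix(scores):
    """scores[source] = list of ABSA ratings for matched article clusters.
    Entry (i, j) is the mean score difference between sources i and j,
    mirroring the averaging step of [49]'s framework."""
    sources = sorted(scores)
    matrix = {}
    for a in sources:
        for b in sources:
            diffs = [x - y for x, y in zip(scores[a], scores[b])]
            matrix[(a, b)] = sum(diffs) / len(diffs)
    return matrix

# Illustrative ratings over three shared story clusters.
m = bias_matrix({"outlet_A": [0.8, 0.6, 0.7], "outlet_B": [0.2, 0.4, 0.3]})
print(m[("outlet_A", "outlet_B")])  # about 0.4
```

The resulting matrix is antisymmetric: swapping the two sources flips the sign of the averaged difference.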
[33] performed political orientation prediction and hyperpartisan classification using an unmasking technique with binary classifiers. For the first task, they found that left-wing news tends to be easily misclassified. This study noticed that individual political orientation is difficult to predict and that a style-based approach outperforms a content-based one. Moreover, they discovered subtle stylistic differences between hyperpartisan news of different political leanings. Using masking and transformer-based models, [62] showed that topic-based approaches lead to better results than style-based ones. In contrast, [132] comparatively examined BERT-based models and masking-based models, clarifying the strengths and constraints of varied approaches to bias detection and offering insights for future research; these models' contribution lies in their capacity to improve the precision, clarity, and interpretability of bias detection within political and social discussions. Furthermore, [133] investigated the use of large language models for automated stance detection in a lower-resource language, focusing on immigration. The authors annotated pro- and anti-immigration examples to compare performance across models, finding that GPT-3.5 matches supervised models' accuracy and thus offers a simpler alternative for hyperpartisan detection in media monitoring. Lastly, for the sake of exhaustiveness, we briefly cover other methods that do not focus on news textual features. For this reason, the following papers are not included in our final selection; however, they help the reader understand the variety of approaches to tackling hyperpartisanship. [134] maps linguistic divergence across the U.S. political spectrum using 1.5M social media posts (20M words) from 10k Twitter users.
By analyzing followers of 72 news accounts, it identifies variations in topics, sentiment, and lexical semantics. Methods combine data mining, lexicostatistics, machine learning, large language models, and human annotation. [135] analyzes language differences on Twitter among 5,373 Democratic and 5,386 Republican followers to explore psychological traits tied to political leanings. Using naturalistic data, it confirms hypotheses: liberals’ language shows uniqueness, swearing, anxiety, and emotions, while conservatives’ language reflects group identity, achievement, and religion, supporting prior research. To conclude, [136] introduced FAULTANA (FAULT-line Alignment Network Analysis), a computational method to identify societal fault lines and polarization drivers in online interactions. Using data from Birdwatch (Twitter) and DerStandard forums, it reveals two polarized groups aligned with political identities. FAULTANA tracks polarization over time, highlighting divisive issues and their impact. We present the best performances retrieved in the selected papers in Table 7.
Datasets
In the previous section, we provided an overview of methodologies employed for hyperpartisan detection. Effective models depend on high-quality data to function optimally. However, constructing a high-quality, well-balanced dataset can be both time-consuming and resource-intensive. This challenge is compounded by shifts in data policies across social networks since the Cambridge Analytica scandal, leading to potential difficulties or cost changes in obtaining data. Additionally, a trend has emerged among news sources of restricting access to data after its use in training models like GPT (https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/). Consequently, news sources have implemented paywalls and crawler restrictions (https://ilmanifesto.it/termini-e-condizioni), making it exceedingly challenging to gather suitable information for this and similar tasks.
Datasets presentation
To support upcoming studies on identifying hyperpartisan news and related tasks, we have created an extensive table: Table 8, which outlines key attributes of datasets relevant to hyperpartisan news detection. This table includes datasets referenced in different papers. Some are not primarily used for hyperpartisan detection but could be. It is important to note that when subsets or extended versions of earlier datasets exist, we consider them separate entities denoted by *. Additionally, datasets marked with ** signify merged collections. The column labeled Data indicates the number of articles gathered by the researchers.
To provide comprehensive insights into the table, we give a brief account of the provenance of the datasets marked with the symbols * and **. The Framing Triplet Dataset is a combination of the SemEval-2019 Task 4 dataset and [120]'s data. Furthermore, [120] expands the SemEval-2019 Task 4 dataset by incorporating articles collected from polarized sources and labeled through mediabiasfactcheck.com. Regarding BIGNEWS, collected by [117], it has two subsets: BIGNEWSBLN, a downsampled corpus maintaining an equal distribution of ideologies, and BIGNEWSALIGN, which clusters news stories from opposing sources on the same topic. In their research, [49] utilized a subset of All-the-news (https://www.kaggle.com/datasets/snapcrack/all-the-news). Furthermore, [33] worked with a subset of articles crawled from the URLs contained in the BuzzFeed-Webis Fake News Corpus collected by [140]. By cleaning [33]'s dataset, [62] obtained a new dataset. The same researchers created StereoImmigrants, a collection of Spanish news about immigrants, in [132].
Labeling and retrieval processes relied upon platforms like AllSides (https://allsides.com/), Media Bias/Fact Check (https://mediabiasfactcheck.com/) and PolitiFact (https://www.politifact.com/), both as ground truth for establishing the bias of an article and as sources from which to collect data. Indeed, on these platforms, experts assign news outlets a political orientation. AllSides.com is a media company that specializes in providing balanced news coverage by collecting and comparing news stories from various sources with different political leanings. The platform categorizes news articles based on their political bias—left, center, or right—and scores them according to their level of partisanship.
Since [121] noted that training models with big datasets reduces performance due to their noise, researchers started to prefer quality over size. Indeed, a deeper analysis of the SemEval-2019 dataset by [46] revealed several issues with this widely used ground-truth dataset: class imbalance, task-label misalignment, and distribution shift.
As we can see from Fig 4, there is an imbalanced distribution towards English data, leaving the context of minority languages understudied. Datasets are available at the respective links: [81] https://gitlab.com/checkthat_lab/clef2023-checkthat-lab/-/tree/main/task3?ref_type=heads, [109] https://github.com/yy-ko/khan-www23, [118] https://github.com/MSU-NLP-CSS/CLoSE_framing, [80] https://github.com/axenov/politik-news, [132] https://github.com/jjsjunquera/StereoImmigrants, [120] https://github.com/ShamikRoy/Subframe-Prediction, [141] https://urlis.net/zon9n8wr, [58] https://drive.google.com/drive/folders/1IyaKYeDkl7ubuabTI65G0nSBfxQNdeTr, [142] https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ULHLCB, [18] https://zenodo.org/records/1489920, [143] www.ccs.neu.edu/home/luwang/data.html, [33] https://github.com/BuzzFeedNews/2016-10-facebook-fact-check, Reuter http://about.reuters.com/researchandstandards/corpus/, [144] https://github.com/RWalecki/copula_ordinal_regression, [145] https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ZCXSKG, BuzzFeed https://github.com/BuzzFeedNews/2016-10-facebook-fact-check/blob/master/data/facebook-fact-check.csv.
Potential limitations
The studies collected hold significant value, yet several inherent limitations in the datasets could influence the comprehensiveness and applicability of future findings. Firstly, the absence of a distinct dataset designed to differentiate between hyperpartisan and partisan news poses a fundamental challenge, potentially impacting classification accuracy. Secondly, as Fig 4 shows, the predominant focus of the datasets is on English news articles. This raises concerns about minority languages and their respective democratic contexts, possibly skewing the representation and applicability of Anglo-American papers' conclusions to different socio-cultural environments. This discrepancy might lead to situations where certain democracies lack the necessary tools and datasets in their native language, hindering their ability to develop analytical tools as effective as those of over-represented democracies. Additionally, the phenomenon of hyperpartisanship varies significantly between countries due to the variety of party systems and different cultural backgrounds [6]. Consequently, models trained on linguistically non-representative data may be compromised in their ability to detect hyperpartisanship in under-represented democracies, thereby impacting their success rates. Furthermore, issues pertaining to dataset maintenance, such as broken URLs, may impede replicability and accessibility for future research endeavors [33], and temporal lexicon constraints might hinder capturing shifts in textual patterns, tones, and context, affecting the accuracy of temporal analysis [46]. We highlight that cross-lingual comparison of hyperpartisan traits has never been studied from a computational approach. Thus, it is not possible to determine whether the online environment flattens cultural-linguistic traits pertinent to hyperpartisanship independently of the country and its political system.
Another consideration regards the limited availability of data over time: paywalls and copyright restrictions pose a significant barrier, potentially restricting the depth and breadth of future analysis within certain timeframes. Lastly, despite their popularity and the good results researchers have achieved with them elsewhere, autoregressive models have, as far as we know, not been used for this task.
Conclusions and future works
In synthesizing insights from 81 studies, our systematic review illuminates the value of existing research in understanding hyperpartisan news. We summarized all the papers included in the systematic review in Table 9. With the support of this table, we are going to reply to the initial research questions.
RQ1: Does a categorization for hyperpartisan news detection methods exist? Currently, there is no widely adopted comprehensive categorization system in the literature. The field still lacks standardized mathematical models for quantifying textual exaggerations that define hyperpartisan content. One key contribution of this systematic review is that it represents the first attempt to systematize news-based approaches while also enhancing the traditional PRISMA methodology by integrating ResearchRabbit during the "Identification of studies via other methods" phase. ResearchRabbit facilitated a systematic, data-driven expansion of our literature pool by visualizing clusters based on citation linkages. This clustering approach provided a structured method for identifying and selecting relevant studies by uncovering both direct citation relationships and keyword-based topic similarities. As a result, the tool contributed to a more comprehensive and cohesive expansion of the selected literature base. Furthermore, we proposed a specific definition of the studied phenomenon that can be applied in Computer Social Science and Computer Science.
RQ2: Is hyperpartisan news detection a stand-alone or overlapping task? The complexity of hyperpartisan news detection hints at an overlapping task encompassing various forms of media bias, suggesting a shift towards multi-label detection for nuanced representations. Research shows that models with fine-grained label sets outperform binary classifications, yet the majority of studies use simplified, binary categories.
RQ3: What are the proposed solutions using textual data? Research commonly applies text-based methods, such as Natural Language Processing (NLP) techniques, to detect hyperpartisan content by identifying linguistic patterns of exaggeration and emotional tone. In terms of labels, fine-grained labels show improved model accuracy in detecting diverse biases, but in this case the annotation required is costly.
RQ4: Does the task keep up with new NLP technologies like autoregressive models? To date, the adoption of advanced autoregressive models in hyperpartisan news detection is limited, revealing a critical gap. This gap underscores a need to explore these models, which could improve detection accuracy with state-of-the-art language understanding.
RQ5: What are the results of the developed models? Since the release of BERT, this model architecture—and particularly its variants, such as RoBERTa—has achieved state-of-the-art performance in a wide range of classification tasks.
RQ6: What datasets are used for this task? How are they structured? Have they been updated to cover the latest political global and regional trends? Datasets predominantly comprise English-language news articles, which risks skewing results when applying models to non-English contexts. Limited representation of minority languages restricts model generalization and hampers analysis of unique democratic and socio-political dynamics. In addition, dataset maintenance issues (e.g., broken URLs) hinder replicability, and paywalls or copyright constraints restrict access to time-sensitive data, impacting longitudinal research.
RQ7: How can the current state of research on hyperpartisan detection be characterized in diverse languages and countries? The absence of linguistically diverse datasets is a significant limitation, especially in minority and underrepresented cultures. This restricts the field’s capacity to develop effective hyperpartisan detection models for varied linguistic environments. Current datasets’ Anglo-American focus may limit models’ efficacy when applied to global democracies with different political and cultural contexts, exacerbating bias and misinformation issues in these areas. Moreover, the lack of cross-lingual studies leaves the impact of online environments on cultural-linguistic variations in hyperpartisan traits unexplored.
In conclusion, while existing research provides insights into hyperpartisan news, limitations in dataset diversity, language inclusion, and methodology highlight the need for more robust, globally representative resources. Future research could benefit from exploring autoregressive models and expanding cross-lingual analysis for a broader understanding of hyperpartisanship in diverse political systems and cultural contexts.
References
- 1. Falkenbach M, Bekker M, Greer SL. Do parties make a difference? A review of partisan effects on health and the welfare state. Eur J Public Health 2020;30(4):673–82. pmid:31334750
- 2. Ellger F. The mobilizing effect of party system polarization. Evidence from Europe. Comparat Politic Stud 2023;57(8):1310–38.
- 3. Dalton RJ. Modeling ideological polarization in democratic party systems. Elect Stud. 2021;72:102346.
- 4. Lorenz-Spreen P, Oswald L, Lewandowsky S, Hertwig R. A systematic review of worldwide causal and correlational evidence on digital media and democracy. Nat Hum Behav 2023;7(1):74–101. pmid:36344657
- 5. Guess AM, Barberá P, Munzert S, Yang J. The consequences of online partisan media. Proc Natl Acad Sci U S A 2021;118(14):e2013464118. pmid:33782116
- 6. Dalton RJ. Party identification and nonpartisanship. Int Encyclop Soc Behav Sci. 2015;6–6.
- 7. McCoy J, Somer M. Toward a theory of pernicious polarization and how it harms democracies: comparative evidence and possible remedies. Ann Am Acad Politic Soc Sci 2018;681(1):234–71.
- 8. Holt K, Ustad Figenschou T, Frischlich L. Key dimensions of alternative news media. Digit Journalism 2019;7(7):860–9.
- 9. Tucker J, Guess A, Barbera P, Vaccari C, Siegel A, Sanovich S, et al. Social media, political polarization, and political disinformation: a review of the scientific literature. SSRN Electron J. 2018.
- 10. Anthonio T. Robust document representations for hyperpartisan and fake news detection. 2019.
- 11. Bartels LM. Partisanship in the trump era. J Politics. 2018;80:1483–94.
- 12. Hawdon J, Ranganathan S, Leman S, Bookhultz S, Mitra T. Social media use, political polarization, and social capital: is social media tearing the U.S. apart? In: Meiselwitz G, editor. Social computing and social media. Design, ethics, user behavior, and social network analysis. Springer; 2020. p. 243–60. https://doi.org/10.1007/978-3-030-49570-1_17
- 13. Bawden D, Robinson L. Curating the infosphere: Luciano Floridi’s philosophy of information as the foundation for library and information science. J Documentation 2018;74(1):2–17.
- 14. Nannini L, Bonel E, Bassi D, Maggini MJ. Beyond phase-in: assessing impacts on disinformation of the EU Digital Services Act. AI Ethics. 2024.
- 15. European Commission. A multi-dimensional approach to disinformation – report of the independent High Level Group on fake news and online disinformation. Publications Office; 2018.
- 16. European Parliament Council. Proposal for a regulation of the European parliament and of the council on a single market for digital services (digital services act) and amending directive 2000/31/EC. 2020.
- 17. Bondielli A, Marcelloni F. A survey on fake news and rumour detection techniques. Inf Sci. 2019;497:38–55.
- 18. Kiesel J, Mestre M, Shukla R, Vincent E, Adineh P, Corney D, et al. SemEval-2019 task 4: hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation. 2019.
- 19. Sousa-Silva R. Fighting the fake: a forensic linguistic analysis to fake news detection. Int J Semiot Law 2022;35(6):2409–33. pmid:35505837
- 20. Dykstra A. Critical reading of online news commentary headlines: stylistic and pragmatic aspects. Topics Linguist 2019;20(2):90–105.
- 21. Xu WW, Sang Y, Kim C. What drives hyper-partisan news sharing: exploring the role of source, style, and content. Digital Journalism 2020;8(4):486–505.
- 22. Pescetelli N, Barkoczi D, Cebrian M. Bots influence opinion dynamics without direct human-bot interaction: the mediating role of recommender systems. Appl Netw Sci 2022;7(1):46.
- 23. Pitoura E, Tsaparas P, Flouris G, Fundulaki I, Papadakos P, Abiteboul S, et al. On measuring bias in online information. arXiv preprint 2017
- 24. Nakov P, Sencar HT, An J, Kwak H. A survey on predicting the factuality and the bias of news media. arXiv preprint 2021
- 25. Kondamudi MR, Sahoo SR, Chouhan L, Yadav N. A comprehensive survey of fake news in social networks: attributes, features, and detection approaches. J King Saud Univ – Comput Inf Sci 2023;35(6):101571.
- 26. Hamborg F, Donnay K, Gipp B. Automated identification of media bias in news articles: an interdisciplinary literature review. Int J Digit Libr 2018;20(4):391–415.
- 27. Medeiros FDC, Braga RB. Fake news detection in social media: a systematic review. In: XVI Brazilian Symposium on Information Systems; 2020. p. 1–8. https://doi.org/10.1145/3411564.3411648
- 28. Kapantai E, Christopoulou A, Berberidis C, Peristeras V. A systematic literature review on disinformation: toward a unified taxonomical framework. New Media Soc 2020;23(5):1301–26.
- 29. Rodrigo-Ginés F-J, Carrillo-de-Albornoz J, Plaza L. A systematic review on media bias detection: what is media bias, how it is expressed, and how to detect it. Expert Syst Appl. 2024;237:121641.
- 30. Kitchenham B, Charters S. Guidelines for performing systematic literature reviews in software engineering. Technical Report. 2007.
- 31. Moher D, Altman DG, Liberati A, Tetzlaff J. PRISMA statement. Epidemiology. 2011;22(1):128; author reply 128. https://doi.org/10.1097/EDE.0b013e3181fe7825 pmid:21150360
- 32. Gusenbauer M, Haddaway NR. Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Res Synth Methods 2020;11(2):181–217. pmid:31614060
- 33. Potthast M, Kiesel J, Reinartz K, Bevendorff J, Stein B. A stylometric inquiry into hyperpartisan and fake news. In: Gurevych I, Miyao Y, editors. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); 2018. p. 231–40.
- 34. Zannettou S, Sirivianos M, Blackburn J, Kourtellis N. The web of false information. J Data Inf Quality 2019;11(3):1–37.
- 35. Altay S, Berriche M, Heuer H, Farkas J, Rathje S. A survey of expert views on misinformation: definitions, determinants, solutions, and future of the field. Harvard Kennedy School Misinf Rev 2023;4:1–34.
- 36. Ross Arguedas A, Robertson C, Fletcher R, Nielsen R. Echo chambers, filter bubbles, and polarisation: a literature review. Technical report; 2022.
- 37. Garg S, Sharma DK. Role of ELMo embedding in detecting fake news on social media. In: 2022 11th International Conference on System Modeling & Advancement in Research Trends (SMART); 2022. p. 57.
- 38. Ross RM, Rand DG, Pennycook G. Beyond “fake news”: analytic thinking and the detection of false and hyperpartisan news headlines. Judgm Decis Mak 2021;16(2):484–504.
- 39. Mourão RR, Robertson CT. Fake news as discursive integration: an analysis of sites that publish false, misleading, hyperpartisan and sensational information. Journalism Stud 2019;20(14):2077–95.
- 40. Agerri R. Doris Martin at SemEval-2019 Task 4: hyperpartisan news detection with generic semi-supervised features. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 944.
- 41. Bourgonje P, Moreno Schneider J, Rehm G. From clickbait to fake news detection: an approach based on detecting the stance of headlines to articles. In: Proceedings of the 2017 EMNLP Workshop: Natural Language Processing Meets Journalism; 2017. p. 84.
- 42. Pérez-Almendros C, Espinosa-Anke L, Schockaert S. Cardiff University at SemEval-2019 Task 4: linguistic features for hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 929.
- 43. Dumitru VC, Rebedea T. Fake and hyper-partisan news identification. In: Moldoveanu A, Dix AJ, editors. 16th International Conference on Human-Computer Interaction, RoCHI 2019; 2019 Oct 17–18; Bucharest, Romania. Matrix Rom; 2019, p. 60–7.
- 44. Knauth J. Orwellian-times at SemEval-2019 Task 4: a stylistic and content-based classifier. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 976.
- 45. Sengupta S, Pedersen T. Duluth at SemEval-2019 Task 4: the Pioquinto Manterola hyperpartisan news detector. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 949.
- 46. Lyu H, Pan J, Wang Z, Luo J. Computational assessment of hyperpartisanship in news titles. ICWSM. 2024;18:999–1012.
- 47. Amason E, Palanker J, Shen MC, Medero J. Harvey Mudd College at SemEval-2019 Task 4: the D.X. Beaumont hyperpartisan news detector. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics; 2019. p. 967–70. https://doi.org/10.18653/v1/s19-2166
- 48. Walton D. Ad hominem arguments. University of Alabama Press; 1998.
- 49. Tran M. How biased are American media outlets? A framework for presentation bias regression. In: 2020 IEEE International Conference on Big Data (Big Data); 2020. p. 4359.
- 50. Anand B, Di Tella R, Galetovic A. Information or opinion? Media bias as product differentiation. Econ Manag Strategy 2007;16(3):635–82.
- 51. Sharma A, Kaur N, Sen A, Seth A. Ideology detection in the Indian mass media. In: 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM); 2020. p. 627.
- 52. Baumer E, Elovic E, Qin Y, Polletta F, Gay G. Testing and comparing computational approaches for identifying the language of framing in political news. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; 2015. p. 1472.
- 53. Kong H-K, Liu Z, Karahalios K. Frames and slants in titles of visualizations on controversial topics. In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems; 2018. p. 1.
- 54. Leeson PT, Coyne CJ. Manipulating the media. SSRN. 2011.
- 55. Honeycutt N, Jussim L. Political bias in the social sciences: a critical, theoretical, and empirical review. In: Ideological and political bias in psychology: nature, scope, and solutions. 2023. p. 97–146. https://doi.org/10.1007/978-3-031-29148-7_5
- 56. Patankar A, Bose J, Khanna H. A bias aware news recommendation system. In: 2019 IEEE 13th International Conference on Semantic Computing (ICSC); 2019. p. 232–8. https://doi.org/10.1109/icosc.2019.8665610
- 57. Barnidge M, Peacock C. A third wave of selective exposure research? The challenges posed by hyperpartisan news on social media. MaC 2019;7(3):4–7.
- 58. Gangula RRR, Duggenpudi SR, Mamidi R. Detecting political bias in news articles using headline attention. In: Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP; 2019. p. 77.
- 59. Pierri F, Artoni A, Ceri S. HoaxItaly: a collection of Italian disinformation and fact-checking stories shared on Twitter in 2019. arXiv preprint 2020
- 60. Huang GKW, Lee JC. Hyperpartisan news and articles detection using BERT and ELMo. In: 2019 International Conference on Computer and Drone Applications (IConDA); 2019. p. 29.
- 61. Dumitru VC, Rebedea T. Topic-based models with fact checking for fake news identification. In: RoCHI – International Conference on Human-Computer Interaction; 2021. p. 182.
- 62. Sanchez-Junquera J, Rosso P, Montes M, Ponzetto S. Masking and transformer-based models for hyperpartisanship detection in news. 2021. p. 1244–51. https://doi.org/10.26615/978-954-452-072-4_140
- 63. Jeong Lim S, Jatowt A, Yoshikawa M. Understanding characteristics of biased sentences in news articles. In: CIKM Workshops; 2018.
- 64. Naredla NR, Adedoyin FF. Detection of hyperpartisan news articles using natural language processing technique. Int J Inf Manag Data Insights 2022;2(1):100064.
- 65. Papadopoulou O, Kordopatis-Zilos G, Zampoglou M, Papadopoulos S, Kompatsiaris Y. Brenda Starr at SemEval-2019 Task 4: hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 924–8. https://doi.org/10.18653/v1/s19-2157
- 66. Alzhrani K. Political ideology detection of news articles using deep neural networks. Intell Automat Soft Comput 2022;33(1):483–500.
- 67. Herman ES, Chomsky N. Manufacturing consent: the political economy of the mass media. 1994.
- 68. Hrckova A, Moro R, Srba I, Bielikova M. Quantitative and qualitative analysis of linking patterns of mainstream and partisan online news media in Central Europe. OIR 2021;46(5):954–73.
- 69. Kulkarni V, Ye J, Skiena S, Wang WY. Multi-view models for political ideology detection of news articles. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
- 70. Alabdulkarim A, Alhindi T. Spider-Jerusalem at SemEval-2019 Task 4: hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 985.
- 71. Alzhrani K. Ideology detection of personalized political news coverage. In: Proceedings of the 2020 4th International Conference on Compute and Data Analysis; 2020. p. 10.
- 72. Jiang Y, Petrak J, Song X, Bontcheva K, Maynard D. Team Bertha von Suttner at SemEval-2019 Task 4: hyperpartisan news detection using ELMo sentence representation convolutional network. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 840–4. https://doi.org/10.18653/v1/s19-2146
- 73. Baly R, Karadzhov G, Saleh A, Glass J, Nakov P. Multi-task ordinal regression for jointly predicting the trustworthiness and the leading political ideology of news media. In: Burstein J, Doran C, Solorio T, editors; 2019. p. 2109.
- 74. Chen C, Park C, Dwyer J, Medero J. Harvey Mudd College at SemEval-2019 Task 4: the Carl Kolchak hyperpartisan news detector. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 957.
- 75. Palić N, Vladika J, Čubelić D, Lovrenčić I, Buljan M, Šnajder J. TakeLab at SemEval-2019 Task 4: hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 995.
- 76. Anthonio T, Kloppenburg L. Team Kermit-the-frog at SemEval-2019 Task 4: bias detection through sentiment analysis and simple linguistic features. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1016.
- 77. Srivastava V, Gupta A, Prakash D, Sahoo SK, R.R R, Kim YH. Vernon-fenwick at SemEval-2019 Task 4: hyperpartisan news detection using lexical and semantic features. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics; 2019. p. 1078–82. https://doi.org/10.18653/v1/s19-2189
- 78. Spezzano F, Shrestha A, Fails JA, Stone BW. That’s fake news! Reliability of news when provided title, image, source bias & full article. Proc ACM Hum-Comput Interact. 2021;5(CSCW1):1–19. https://doi.org/10.1145/3449183
- 79. Sridharan ASN. An automated news bias classifier using Caenorhabditis elegans inspired recursive feedback network architecture. arXiv preprint 2022
- 80. Aksenov D, Bourgonje P, Zaczynska K, Ostendorff M, Moreno-Schneider J, Rehm G. Fine-grained classification of political bias in German news: a data set and initial experiments. In: Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021); 2021. p. 121.
- 81. Azizov D, Nakov P, Liang S. Frank at CheckThat! 2023: detecting the political bias of news articles and news media. In: Conference and Labs of the Evaluation Forum; 2023.
- 82. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B. Support vector machines. IEEE Intell Syst Their Appl 1998;13(4):18–28.
- 83. Yeh C-L, Loni B, Schuth A. Tom Jumbo-Grumbo at SemEval-2019 Task 4: hyperpartisan news detection with GloVe vectors and SVM. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1067.
- 84. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016. p. 785–94. https://doi.org/10.1145/2939672.2939785
- 85. Gupta V, Kaur Jolly BL, Kaur R, Chakraborty T. Clark Kent at SemEval-2019 Task 4: stylometric insights into hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 934.
- 86. Merow C, Smith MJ, Silander JA Jr. A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography 2013;36(10):1058–69.
- 87. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.
- 88. Chakravartula N, Indurthi V, Syed B. Fermi at SemEval-2019 Task 4: the Sarah-Jane-Smith hyperpartisan news detector. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics; 2019.
- 89. Cruz A, Rocha G, Sousa-Silva R, Lopes Cardoso H. Team Fernando-Pessa at SemEval-2019 Task 4: back to basics in hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 999.
- 90. Stevanoski B, Gievska S. Team Ned Leeds at SemEval-2019 Task 4: exploring language indicators of hyperpartisan reporting. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1026.
- 91. Saleh A, Baly R, Barrón-Cedeño A, Da San Martino G, Mohtarami M, Nakov P, et al. Team QCRI-MIT at SemEval-2019 Task 4: propaganda analysis meets hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1041.
- 92. Bestgen Y. Tintin at SemEval-2019 Task 4: detecting hyperpartisan news article with only simple tokens. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics; 2019. p. 1062–6. https://doi.org/10.18653/v1/s19-2186
- 93. Hanawa K, Sasaki S, Ouchi H, Suzuki J, Inui K. The Sally Smedley hyperpartisan news detector at SemEval-2019 Task 4. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1057.
- 94. Nguyen D-V, Dang T, Nguyen N. NLP@UIT at SemEval-2019 Task 4: the paparazzo hyperpartisan news detector. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 971.
- 95. Le QV, Mikolov T. Distributed representations of sentences and documents. arXiv preprint 2014
- 96. Pennington J, Socher R, Manning C. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP); 2014. p. 1532–43.
- 97. Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, et al. Deep contextualized word representations. arXiv preprint 2018
- 98. Cer D, Yang Y, Kong S, Hua N, Limtiaco N, John RS, et al. Universal Sentence Encoder. arXiv preprint 2018
- 99.
Walecki R, Rudovic O, Pavlovic V, Pantic M. Copula ordinal regression for joint estimation of facial action unit intensity. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 4902–10. https://doi.org/10.1109/cvpr.2016.530
- 100. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. 1986.
- 101. Dorogush AV, Gulin A, Gusev G, Kazeev N, Prokhorenkova LO, Vorobev A. Fighting biases with dynamic boosting. arXiv preprint 2017
- 102. Huang GKW, Lee JC. Hyperpartisan news classification with ELMo and bias feature. J Inf Sci Eng. 2021;37.
- 103. Färber M, Qurdina A, Ahmedi L. Team Peter Brinkmann at SemEval-2019 Task 4: detecting biased news articles using convolutional neural networks. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1032.
- 104. Zehe A, Hettinger L, Ernst S, Hauptmann C, Hotho A. Team Xenophilius Lovegood at SemEval-2019 Task 4: hyperpartisanship classification using convolutional neural networks. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1047.
- 105. Ruan Q, Mac Namee B, Dong R. Bias bubbles: using semi-supervised learning to measure how many biased news articles are around us. In: AICS. 2021. p. 153–64.
- 106. Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E. Hierarchical attention networks for document classification. In: Knight K, Nenkova A, Rambow O, editors. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, California: Association for Computational Linguistics; 2016. p. 1480–9. https://doi.org/10.18653/v1/n16-1174
- 107. Cruz AF, Rocha G, Cardoso HL. On document representations for detection of biased news articles. In: Proceedings of the 35th Annual ACM Symposium on Applied Computing; 2020. p. 892.
- 108. Moreno JG, Pitarch Y, Pinel-Sauvagnat K, Hubert G. Rouletabille at SemEval-2019 Task 4: neural network baseline for identification of hyperpartisan publishers. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 981.
- 109. Ko Y, Ryu S, Han S, Jeon Y, Kim J, Park S, et al. KHAN: knowledge-aware hierarchical attention networks for accurate political stance prediction. In: Proceedings of the ACM Web Conference 2023 (WWW 2023). Austin, TX: ACM; 2023. p. 1572–83. https://doi.org/10.1145/3543507.3583300
- 110. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput 1997;9(8):1735–80. pmid:9377276
- 111. Isbister T, Johansson F. Dick-Preston and Morbo at SemEval-2019 Task 4: transfer learning for hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 939.
- 112. Li C, Goldwasser D. Using social and linguistic information to adapt pretrained representations for political perspective identification. In: Zong C, Xia F, Li W, Navigli R, editors; 2021. p. 4569.
- 113. Cramerus R, Scheffler T. Team Kit Kittredge at SemEval-2019 Task 4: LSTM voting system. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1021.
- 114. Zhang C, Rajendran A, Abdul-Mageed M. UBC-NLP at SemEval-2019 Task 4: hyperpartisan news detection with attention-based Bi-LSTMs. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1072.
- 115. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint 2019
- 116. Kim K-M, Lee M, Won H-S, Kim M-J, Kim Y, Lee S. Multi-stage prompt tuning for political perspective detection in low-resource settings. Appl Sci 2023;13(10):6252.
- 117. Liu Y, Zhang XF, Wegsman D, Beauchamp N, Wang L. POLITICS: pretraining with same-story article comparison for ideology prediction and stance detection. In: Carpuat M, de Marneffe MC, Meza Ruiz IV, editors; 2022. p. 1354.
- 118. Kim MY, Johnson KM. CLoSE: contrastive learning of subframe embeddings for political bias classification of news media. In: Calzolari N, Huang CR, Kim H, Pustejovsky J, Wanner L, Choi KS, et al., editors. Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics. 2022. p. 2780–2793.
- 119. Devlin J, Chang M, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint 2018
- 120. Roy S, Goldwasser D. Weakly supervised learning of nuanced frames for analyzing polarization in news media. In: Webber B, Cohn T, He Y, Liu Y, editors; 2020. p. 7698.
- 121. Baly R, Da San Martino G, Glass J, Nakov P. We can detect your bias: predicting the political ideology of news articles. In: Webber B, Cohn T, He Y, Liu Y, editors; 2020. p. 4982.
- 122. Da Silva SC, Paraboni I. Politically-oriented information inference from text. JUCS 2023;29(6):569–94.
- 123. Shaprin D, Da San Martino G, Barrón-Cedeño A, Nakov P. Team Jack Ryder at SemEval-2019 Task 4: using BERT representations for detecting hyperpartisan news. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1012.
- 124. Ahmed U, Lin JC, Srivastava G. Temporal positional lexicon expansion for federated learning based on hyperpatism detection. Exp Syst 2022;40(5):e13183.
- 125. Ahmed U, Lin JC-W, Srivastava G. Semisupervised federated learning for temporal news hyperpatism detection. IEEE Trans Comput Soc Syst 2023;10(4):1758–69.
- 126. Drissi M, Sandoval Segura P, Ojha V, Medero J. Harvey Mudd College at SemEval-2019 Task 4: the Clint Buchanan hyperpartisan news detector. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 962.
- 127. Mutlu O, Can OA, Dayanik E. Team Howard Beale at SemEval-2019 Task 4: hyperpartisan news detection with BERT. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics; 2019. p. 1007–11. https://doi.org/10.18653/v1/s19-2175
- 128. Smădu RA, Echim SV, Cercel DC, Marin I, Pop F. From fake to hyperpartisan news detection using domain adaptation. arXiv preprint 2023
- 129. Omidi Shayegan S, Nejadgholi I, Pelrine K, Yu H, Levy S, Yang Z, et al. An evaluation of language models for hyperpartisan ideology detection in Persian Twitter. In: Ojha AK, Ahmadi S, Cinkova S, Fransen T, Liu CH, McCrae JP, editors; 2024. p. 51. https://aclanthology.org/2024.eurali-1.8
- 130. Touvron H, Martin L, Stone K, Albert P, Almahairi A, Babaei Y, et al. Llama 2: open foundation and fine-tuned chat models. arXiv preprint 2023
- 131. Afsarmanesh N, Karlgren J, Sumbler P, Viereckel N. Team Harry Friberg at SemEval-2019 Task 4: identifying hyperpartisan news through editorially defined metatopics. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1004.
- 132. Sanchez-Junquera J. On the detection of political and social bias. 2021.
- 133. Mets M, Karjus A, Ibrus I, Schich M. Automated stance detection in complex topics and small languages: the challenging case of immigration in polarizing news media. PLoS One 2024;19(4):e0302380. pmid:38669237
- 134. Karjus A, Cuskley C. Evolving linguistic divergence on polarizing social media. Humanit Soc Sci Commun 2024;11(1):422.
- 135. Sylwester K, Purver M. Twitter language use reflects psychological differences between democrats and republicans. PLoS One 2015;10(9):e0137422. pmid:26375581
- 136. Fraxanet E, Pellert M, Schweighofer S, Gómez V, Garcia D. Unpacking polarization: antagonism and alignment in signed networks of online interaction. PNAS Nexus. 2024;3(12):pgae276. https://doi.org/10.1093/pnasnexus/pgae276 pmid:39703230
- 137. Lee N, Liu Z, Fung P. Team yeon-zi at SemEval-2019 Task 4: hyperpartisan news detection by de-noising weakly-labeled data. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1052.
- 138. Joo Y, Hwang I. Steve Martin at SemEval-2019 Task 4: ensemble learning model for detecting hyperpartisan news. In: Proceedings of the 13th International Workshop on Semantic Evaluation. Association for Computational Linguistics; 2019. p. 990–4. https://doi.org/10.18653/v1/s19-2171
- 139. Ning Z, Lin Y, Zhong R. Team Peter-Parker at SemEval-2019 Task 4: BERT-based method in hyperpartisan news detection. In: Proceedings of the 13th International Workshop on Semantic Evaluation; 2019. p. 1037.
- 140. Silverman C, Strapagiel L, Shaban H, Hall E, Singer-Vine J. Hyperpartisan Facebook pages are publishing false and misleading information at an alarming rate. BuzzFeed News. 2016.
- 141. Gebhard L, Hamborg F. The POLUSA dataset: 0.9m political news articles balanced by time and outlet popularity. 2020.
- 142. Norregaard J, Horne BD, Adali S. NELA-GT-2018: a large multi-labelled news dataset for the study of misinformation in news articles. arXiv preprint 2019
- 143. Fan L, White M, Sharma E, Su R, Choubey PK, Huang R, et al. In plain sight: media bias through the lens of factual reporting. In: Inui K, Jiang J, Ng V, Wan X, editors; 2019. p. 6343.
- 144. Baly R, Karadzhov G, Alexandrov D, Glass J, Nakov P. Predicting factuality of reporting and bias of news media sources. In: Riloff E, Chiang D, Hockenmaier J, Tsujii J, editors; 2018. p. 3528.
- 145. Horne BD, Dron W, Khedr S, Adali S. Sampling the news producers: a large news and feature data set for the study of the complex media landscape. arXiv preprint 2018
- 146. Szwoch J, Staszkow M, Rzepka R, Araki K. Creation of Polish online news corpus for political polarization studies. In: Afli H, Alam M, Bouamor H, Casagran CB, Boland C, Ghannay S, editors. Proceedings of the LREC 2022 workshop on Natural Language Processing for Political Sciences. European Language Resources Association; 2022. p. 86–90.
- 147. Lim S, Jatowt A, Yoshikawa M. Creating a dataset for fine-grained bias detection in news articles. 2020.
- 148. Li C, Goldwasser D. Encoding social information with graph convolutional networks for political perspective detection in news media. In: Korhonen A, Traum D, Marquez L, editors. In: Korhonen A, Traum D, Marquez L, editors; 2019. p. 2594–2594.