Abstract
The rapid propagation of rumors on social media can give rise to various social issues, underscoring the necessity of swift and automated rumor detection. Existing studies typically identify rumors based on their textual or static propagation structural information, without considering the dynamic changes in the structure of rumor propagation over time. In this paper, we propose the Temporal Tree Transformer model, which simultaneously considers text, propagation structure, and temporal changes. By observing the growth of propagation tree structures in different time windows, we use a Gated Recurrent Unit (GRU) to encode these trees and obtain better representations for the classification task. We evaluate our model’s performance on the PHEME dataset. In most existing studies, information leakage occurs when conversation threads from all events are randomly divided into training and test sets. We instead perform Leave-One-Event-Out (LOEO) cross-validation, which better reflects real-world scenarios. The experimental results show that our model achieves a state-of-the-art accuracy of 75.84% and a Macro F1 score of 71.98%. These results demonstrate that extracting temporal features from propagation structures leads to improved model generalization.
Citation: Wu S, Deng Y, Liu J, Luo X, Sun G (2025) Rumor detection on social networks based on Temporal Tree Transformer. PLoS ONE 20(4): e0320333. https://doi.org/10.1371/journal.pone.0320333
Editor: Michal Ptaszynski, Kitami Institute of Technology, JAPAN
Received: September 13, 2024; Accepted: February 15, 2025; Published: April 7, 2025
Copyright: © 2025 Wu et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: This work is partially supported by Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College [project code 2022B1212010006], [UIC research grant R0400001-22]; National Natural Science Foundation of China [grant number 12231004], [UIC research grant UICR0600048]; National Natural Science Foundation of China [grant number 1272054], [UIC research grant UICR0600036]; Guangdong University Innovation and Enhancement Programme Funds Featured Innovation Project [grant number 2018KTSCX278], [UIC research grant R5201910], [UIC research grants R201809]; European Research Council (ERC) under the European Union’s Horizon 2020 Research and Innovation Programme [grant agreement number 101002240]. The computations in this paper were performed using the Bayes Cluster (USBC) provided by the Department of Statistics and Data Science, BNU-HKBU United International College.
Competing interests: The authors have declared that no competing interests exist.
1. BERT-Tokenizer is retrieved from https://huggingface.co/bert-base-uncased.
Introduction
Social networks have become an important medium for information dissemination, and more than half of the world’s population uses social media. For hot-button issues that people are interested in, many content producers try to attract views, reposts, and likes by taking statements out of context and then exaggerating and distorting them into popular content, which eventually becomes well known to the general public and can easily form online rumors [1,2]. Since online rumors spread suddenly and quickly, they can easily cause unnecessary panic and confusion if they are not promptly screened and controlled [3,4]. Therefore, timely detection of online rumors can reduce the occurrence of unnecessary negative public affairs.
In this paper, we aim to identify the authenticity of a claim made in a conversation formed of related posts on social networks. It can be formulated as a binary classification problem, i.e., we need to identify a claim as a rumor or non-rumor. Classical deep learning methods are usually employed to extract semantic features from post content to learn classification models. Recurrent Neural Networks (RNNs) are considered by Ma et al. [5] and Wu et al. [6], and Convolutional Neural Networks (CNNs) by Yu et al. [7] and Lu et al. [8]. Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) are also used to capture the latent representation of a rumor [1,9,10]. To overcome the problem of short-term memory, Transformer-based models are used to extract the semantic features among different words via the self-attention mechanism [11–13]. However, classification models based solely on semantic features often have poor generalization ability, since certain types of rumor events are more likely to show unique textual features. Thus, it is necessary to utilize more features of rumors to learn a supervised classifier.
Rumors on social networks have the characteristic of spreading as a social contagion, since online media users like sharing their opinions, conjectures, and evidence of inaccurate information with others [14,15]. Thus, the propagation patterns of rumors generated by interactions among posts can help us identify the authenticity of a claim made in an event. Posts and their interactions are usually modeled as nodes and edges in graphs. Current research on the propagation characteristics of rumors can be categorized into two groups.
One line of work utilizes the tree structure, a directed acyclic graph, to model propagation structure. Shang et al. [16] pointed out that the information propagation in social networks exhibits a tree-like structure. A tree-structured model can effectively and intuitively represent the changes in the direction of rumor propagation and the evolution of semantics. Kumar and Carley [17] used binarized constituency trees to compare features in source posts and their replies. Ma et al. [18] used recursive neural networks with a tree structure to learn the structure of rumor spread. Ahmed et al. [19] proposed Tree Transformer for sentence representations by performing recursive traversal only with attention. Ma and Gao [11] further improved the representation power of the Tree Transformer [19] by adding a post-level self-attention module and achieved better rumor detection performance.
Another line of work introduces Graph Neural Networks (GNNs) to model the complex topological structures involved in rumor propagation [8,20–22]. In the spread of a rumor, some comments may question and present evidence contradicting the original post, leading to repeated semantic evolution through these contradictions. Ye et al. [23] combine rumor content and propagation structure information based on a graph model to explore their interactions during the rumor propagation process, thereby obtaining a representation of the rumor. Although GNNs are adept at capturing global information and effectively characterizing the features of nodes and edges, they inherently treat all nodes as existing on the same hierarchical level. This uniform approach overlooks the dynamic and nuanced semantic evolution that occurs during rumor propagation. As a result, GNNs are limited in their ability to account for and adapt to the continuous changes in semantics introduced by the interplay of supporting and contradicting information. Tao et al. [24] enhanced their approach by integrating the encoding of parent-child node pairs with GNNs to capture the semantic changes between tweets and their responses more effectively.
In the above graph structure models, nodes from the same generation are hierarchically aggregated to generate an entire conversation thread. However, this spatio-only propagation structure does not capture the growth of a real conversation thread over time. Thus, it is necessary to consider temporal information, which allows us to observe finer-grained propagation structures and extract more information from the propagation patterns of rumors. Yu et al. [25] divided the cascade into sub-cascade graphs based on temporal development and used a graph-based network to learn their local structural information. Their research showed that extracting dynamic features is crucial for predicting cascade size. Huang et al. [26] also claimed that propagation tree structures can be further differentiated by their temporal structures, since these reveal differences in the propagation paths of information. For instance, Fig 1 shows an example from the PHEME dataset collected from Twitter by Zubiaga et al. [27]. The claim is labeled as a rumor about the crash of a Germanwings plane. Fig 1a shows the real conversation thread with spatial and temporal information. For simplicity, the root post and the other posts are denoted as {r} and {x1, x2, x3, x4, x5}, respectively. Fig 1b shows the spatio-only propagation tree structure obtained by hierarchically aggregating the root post r’s first generation {x1, x2} and second generation {x3, x4, x5}. In this structure, x3 and x5 are in the same hierarchy. However, as shown in Fig 1a, x3 replies to x1 within twenty minutes, while x5 replies to it after almost eleven hours, during which the user is exposed to more external information. Thus, the information provided by these two propagation nodes is different for the detection of rumors. Fig 1c shows a spatio-temporal propagation tree structure with five hierarchies generated from the original conversation thread.
It allows us to observe finer-grained propagation structure with both spatial and temporal information, and then to better extract the propagation patterns of rumors.
A claim labeled as a rumor about the Germanwings crash event. (a) The real conversation thread with spatial and temporal information. (b) The spatio-only propagation tree structure obtained by hierarchically aggregating the root post r’s first generation {x1,x2} and second generation {x3,x4,x5}. (c) The spatio-temporal propagation tree structure with five hierarchies generated from (a). It allows us to observe finer-grained propagation structure with both spatial and temporal information, and then to better extract the propagation patterns of rumors.
Although the semantic features in posts and the spatial and temporal features in propagation structures are all significant for detecting rumors on social networks, few studies consider these three features simultaneously [5,11,21,24,28,29]. In this paper, we propose a composite architecture, the Temporal Tree Transformer model, which can simultaneously extract semantic, spatial, and temporal propagation features to detect rumors. An equal-depth time window is first applied to observe the growth of propagation trees from subtrees to an entire tree. Then the Tree Transformer model proposed by Ma and Gao [11] is used to hierarchically encode the subtrees in different time windows. Since the Gated Recurrent Unit (GRU) demonstrates greater advantages in extracting temporal information, particularly in tasks involving long sequences, limited data, strong local dependencies, or noisy time-series data [30–32], we employ a GRU on all subtrees to extract features of rumors. To avoid the information leakage caused by randomly splitting conversation threads from all events into training and test sets, and to evaluate the performance of our model in a more realistic scenario, we adopt the Leave-One-Event-Out (LOEO) principle: one event is used as the test set and the remaining events are used as the training set in each iteration. The LOEO principle constructs a test environment closer to real-world scenarios and better evaluates the model's generalization ability.
The contributions of this work are as follows:
- We characterize post propagation from a more fine-grained perspective, i.e., temporal as well as spatial features are extracted from propagation structures to achieve high performance in our classification model.
- Rather than randomly slicing the conversation threads from all events into training and test sets, a more realistic principle, Leave-One-Event-Out (LOEO), is used for our validation. It is more suitable for practical application scenarios, and thus better evaluates the model's generalization ability.
The rest of this paper is organized as follows. We define our problem statement in Section Problem Statement and introduce the proposed model in Section Proposed Model. In Section Experiments and Results, we describe the dataset and baselines used in our experiments and present our experimental results. We then conclude with future work in Section Conclusions and Future Discussion.
Problem statement
We aim to predict the authenticity of a claim made in a conversation given its source post, response posts, and their interactions. Let V(r) = {r, x1, x2, …, xn} denote the conversation thread of an event, where r is the source post, xi is the i-th response post in chronological order, and n is the number of response posts in the thread. To capture the spatial and temporal features of thread propagation, we use a tree structure to model the propagation and observe the growth of the trees in different time windows. Since the time interval between a post and its response varies greatly across conversation threads, using a time window of equal width to observe the growth is not practical. Thus, an equal-depth time window is used, that is, each time window contains the same number of posts. Let k denote the number of posts in each time window; then a conversation thread with a root post r and n response posts can be divided into ⌈(n+1)∕k⌉ time windows in chronological order, where ⌈ ⋅ ⌉ denotes rounding up to the nearest integer. The posts from the first time window form an initial subtree; the posts from the second time window are then aggregated with the initial subtree to form the second subtree, and so forth. The aggregation stops when the entire propagation tree is formed. For instance, in the conversation thread shown in Fig 1a, we observe the growth of the propagation tree through three subtrees {r, x1}, {r, x1, x2, x3}, and {r, x1, x2, x3, x4, x5} from three different time windows if k = 2 is applied to divide the thread. We propose the Temporal Tree Transformer model to extract semantic, spatial, and temporal features from these propagation subtree structures, and thus to detect rumors on social networks.
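As a concrete illustration of this windowing scheme, the division of a thread into equal-depth windows and cumulative subtrees can be sketched in a few lines of Python (the flat-list representation and the function name are ours, for illustration only):

```python
from math import ceil

def equal_depth_windows(posts, k):
    """posts: all posts of one thread in chronological order, the source
    post r first; k: number of posts per time window.
    Returns the ceil(len(posts)/k) windows and the cumulative subtrees
    obtained by aggregating one window at a time."""
    n_windows = ceil(len(posts) / k)
    windows = [posts[i * k:(i + 1) * k] for i in range(n_windows)]
    # Each subtree is the set of all posts observed so far.
    subtrees = [posts[:(i + 1) * k] for i in range(n_windows)]
    return windows, subtrees

# The Fig 1a thread: root r plus five responses, with k = 2.
windows, subtrees = equal_depth_windows(["r", "x1", "x2", "x3", "x4", "x5"], k=2)
```

With k = 2 this reproduces the three subtrees of the Fig 1a example, growing from {r, x1} to the entire tree.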
Proposed model
Temporal Tree Transformer consists of three components: (1) Token-level Transformer Encoder: all posts in a conversation thread are encoded using the Token-level Transformer Encoder; (2) Tree Transformer: all posts in a subtree formed in a given time window are hierarchically encoded using the Post-level Transformer Encoder on a continuous basis, which enhances the representations of the posts in the subtree; one of two integration schemes, Bottom-up or Top-down, is employed to integrate information and obtain a subtree representation; (3) Temporal GRU Encoder: all subtrees in the conversation thread are encoded using a GRU. Finally, a Softmax function is employed to obtain the probability of the conversation thread being a rumor. Fig 2 gives an overview of Temporal Tree Transformer, the details of which are explained in the following subsections.
(a) Token-level Transformer Encoder. (b) Post-level Transformer Encoder. (c) and (d) are Tree Transformer with Bottom-up and Top-down integration, respectively. (e) Temporal GRU Encoder.
Token-level Transformer Encoder
All nonalphabetic characters and stop words are removed from each post, and the remaining words are converted to lowercase. BERT-Tokenizer1 is used to obtain the initial token embedding for each word in a post [33]. The i-th post xi can then be initially represented as Xi = [w1, w2, …, w|xi|], where wt is the d-dimensional embedding vector of the t-th word and |xi| denotes the number of words in xi. As shown in Fig 2a, Multi-Head Attention (MHA) is applied to Xi, then the output is fed to two normalization sublayers (LayerNorm) and a fully connected feed-forward sublayer (FFN). Finally, max-pooling is applied to the output matrix to obtain si, the vector representation for post xi. It can be formulated as

Hi = LayerNorm(Xi + MHA(Xi)),
Oi = LayerNorm(Hi + FFN(Hi)),
si = MaxPooling(Oi).
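The encoder can be sketched numerically. The NumPy sketch below simplifies the module to a single attention head with one projection matrix per role and a one-hidden-layer feed-forward sublayer; the paper's model uses 12-head MHA with trained weights, so all matrices here are random stand-ins:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def token_level_encoder(X, Wq, Wk, Wv, W1, W2):
    """X: |x_i| x d token-embedding matrix of one post.
    Attention -> Add & LayerNorm -> FFN -> Add & LayerNorm
    -> max-pooling over tokens, yielding a d-vector s_i."""
    d = X.shape[1]
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d)) @ V      # scaled dot-product attention
    H = layer_norm(X + A)                      # first Add & Norm
    F = np.maximum(0.0, H @ W1) @ W2           # feed-forward sublayer (ReLU)
    O = layer_norm(H + F)                      # second Add & Norm
    return O.max(axis=0)                       # max-pooling over tokens

rng = np.random.default_rng(0)
d, n_tokens = 8, 5
X = rng.normal(size=(n_tokens, d))
weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(5)]
s_i = token_level_encoder(X, *weights)
```

The max-pooling over the token axis is what collapses the variable-length post into a fixed-size vector si.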
Tree Transformer
Within a given time window, all the posts in the propagation subtree are encoded by the Tree Transformer proposed by Ma and Gao [11]. It hierarchically encodes and integrates posts using the Post-level Transformer Encoder on a continuous basis. As shown in Fig 2b, the Post-level Transformer Encoder is similar to the Token-level Transformer Encoder but without the max-pooling layer. One of two schemes, Bottom-up or Top-down, is applied to hierarchically integrate the encoded representations containing semantic and propagation information. In the Bottom-up scheme, let xj denote one of the deepest parent nodes in the subtree, and sj its post-level representation. For every sj, a Post-level Transformer Encoder is used to encode sj and its child nodes. Since the deepest parent nodes are also child nodes of other higher-level parent nodes, their first representations as well as the token-level representations of their parents are encoded again using the Post-level Transformer Encoder. The above encoding process is implemented continuously until the root node is reached, and thus the post-level representation S(r) for an entire subtree in a given time window, where r is the source post, is obtained. For instance, Fig 2c shows Bottom-up integration on the tree structure given in Fig 1b. x1 and x2 are the two deepest parent nodes in the tree, and their token-level representations are s1 and s2. Two subtrees {s1,s3,s5} and {s2,s4} are first encoded through the Post-level Transformer Encoder, respectively. The parent nodes s1 and s2 are also child nodes in the higher-level subtree {r,s1,s2}, so their first representations together with the token-level representation of their parent node r are encoded again through the Post-level Transformer Encoder. Finally, the post-level representation for the entire tree, i.e. S(r), is obtained. Similarly, Top-down integration integrates information from top to bottom. For instance, as shown in Fig 2d, the top subtree {r,s1,s2} is first encoded. Then its child nodes s1 and s2 are encoded again in the lower-level subtrees {s1,s3,s5} and {s2,s4}, since they are parent nodes in these two subtrees. Finally, the post-level representation S′(r) for the entire tree is obtained. The Top-down integration scheme is more similar to the nature of information propagation in social networks.
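The Bottom-up traversal order can be made concrete in a short Python sketch. Here `encode_family` is a hypothetical stand-in for the Post-level Transformer Encoder (it simply mixes each family member with the family mean), and the adjacency follows the Fig 1 thread; both are ours, for illustration only:

```python
def bottom_up_encode(tree, reps, encode_family, root="r"):
    """tree: {parent: [children]} adjacency of one subtree;
    reps: {node: representation} from the token-level encoder;
    encode_family: stand-in for the Post-level Transformer Encoder,
    re-encoding a parent together with its children.
    The deepest families are encoded first; their updated parent
    representations are then re-encoded in higher-level families."""
    out = dict(reps)

    def visit(node):
        children = tree.get(node, [])
        for child in children:
            visit(child)                       # recurse to the deepest level
        if children:
            family = [node] + children
            updated = encode_family([out[m] for m in family])
            out.update(zip(family, updated))   # store re-encoded family

    visit(root)
    return out

def encode_family(vecs):
    # Hypothetical stand-in: pull each member toward the family mean.
    mean = sum(vecs) / len(vecs)
    return [0.5 * v + 0.5 * mean for v in vecs]

# The Fig 1 thread: r -> {x1, x2}, x1 -> {x3, x5}, x2 -> {x4}.
tree = {"r": ["x1", "x2"], "x1": ["x3", "x5"], "x2": ["x4"]}
reps = {"r": 1.0, "x1": 2.0, "x2": 3.0, "x3": 4.0, "x4": 5.0, "x5": 6.0}
encoded = bottom_up_encode(tree, reps, encode_family)
```

Reversing the recursion (encode the family before visiting the children) gives the Top-down variant.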
After all nodes are encoded by the Tree Transformer, some of them carry significant information about the claims and stances stated in the entire subtree. An attention layer is therefore employed to weight each node, so that the representation for the entire subtree can be written as a weighted sum of the post-level representations of all nodes, which contain both semantic and structural information:

v = Σi αi ŝi,  αi = softmax(W ŝi),

where ŝi is the post-level representation of the i-th node, W is the transformation weight, and αi is the attention weight of ŝi obtained by applying the softmax function on W ŝi. It measures the importance of the information carried by different nodes.
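A minimal NumPy sketch of this attention pooling, assuming a single learned weight vector W that scores each node (the example weights are random stand-ins):

```python
import numpy as np

def subtree_representation(S_hat, W):
    """S_hat: N x d matrix of post-level node representations;
    W: d-dimensional transformation weight vector.
    Returns the attention-weighted sum over all N nodes."""
    u = S_hat @ W                        # one scalar score per node
    alpha = np.exp(u - u.max())
    alpha = alpha / alpha.sum()          # softmax over nodes
    return alpha @ S_hat                 # weighted sum, a d-vector

rng = np.random.default_rng(1)
S_hat = rng.normal(size=(6, 4))
v = subtree_representation(S_hat, rng.normal(size=4))
```

When all scores are equal the weights are uniform and the pooling reduces to the mean of the node representations, so the softmax is what lets informative nodes dominate the subtree vector.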
Temporal GRU Encoder
To extract temporal features in propagation, we observe the growth of propagation trees in different time windows. Since the text and structural feature extraction methods we employed are based on Transformer models, it might seem natural to also use a Transformer model for extracting temporal features. However, as Lim B. et al. [31] highlighted, the Gated Recurrent Unit (GRU) outperforms the Transformer in scenarios involving small-scale datasets and long time-series tasks due to its simpler structure and more precise modeling of local temporal dependencies. Therefore, we opted to use GRU for temporal feature extraction. Let vt denote the representation of the subtree from the t-th time window obtained by the above Tree Transformer. Then all these representations can be encoded using a GRU, as shown in Fig 2e, to extract temporal features. The GRU layer [34] is formulated as

zt = σ(Uz vt + Wz ht−1),
rt = σ(Ur vt + Wr ht−1),
h̃t = tanh(Uh vt + Wh (rt ⊙ ht−1)),
ht = (1 − zt) ⊙ ht−1 + zt ⊙ h̃t,

where Uz, Wz, Ur, Wr, Uh, and Wh are weight matrices to be learned in the GRU, σ is the sigmoid function, and ⊙ denotes element-wise multiplication. The output is the hidden vector hN of the last GRU unit, where N is the number of time windows. Then we use a fully connected layer and a Softmax function to obtain the probability of the conversation thread being a rumor, and thus the binary classification output ŷ ∈ { Rumor, Non-rumor }. In our experiments, we also compare the performance of Transformer and GRU for temporal feature extraction. As presented in Table 2, the results demonstrate that GRU achieved superior performance.
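The recurrence over time windows can be sketched directly in NumPy. This follows a standard GRU update with the U matrices acting on the input and the W matrices on the hidden state; the gate-combination convention shown is one common variant, and the weights are random stand-ins for trained parameters:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(V, Uz, Wz, Ur, Wr, Uh, Wh):
    """V: sequence of subtree representations v_1..v_N (N x d_in).
    Returns h_N, the hidden state after the last time window."""
    h = np.zeros(Wz.shape[0])
    for v in V:
        z = sigmoid(Uz @ v + Wz @ h)               # update gate
        r = sigmoid(Ur @ v + Wr @ h)               # reset gate
        h_cand = np.tanh(Uh @ v + Wh @ (r * h))    # candidate state
        h = (1 - z) * h + z * h_cand               # gated combination
    return h

rng = np.random.default_rng(0)
d_in, d_h, N = 4, 3, 5
V = rng.normal(size=(N, d_in))
Uz, Ur, Uh = (rng.normal(size=(d_h, d_in)) * 0.5 for _ in range(3))
Wz, Wr, Wh = (rng.normal(size=(d_h, d_h)) * 0.5 for _ in range(3))
h_N = gru_encode(V, Uz, Wz, Ur, Wr, Uh, Wh)
```

Only the final hidden state h_N is passed on to the classifier, so it has to summarize how the tree grew across all N windows.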
Model loss
The model loss is written as the summation of the categorical cross-entropy loss (CrossEntropy) and an L2 regularization term, formulated as

L = −Σi Σc yi,c log ŷi,c + λ‖Θ‖²₂,

where ŷi denotes the predicted class of the i-th training sample and yi denotes the corresponding true label, ‖Θ‖²₂ denotes the L2 regularization term over all the model parameters Θ, and λ denotes the trade-off coefficient.
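A minimal NumPy sketch of this loss, assuming the per-sample cross-entropy is averaged over the batch and the predictions are given as class probabilities (both assumptions are ours, for illustration):

```python
import numpy as np

def model_loss(probs, labels, params, lam):
    """probs: N x C predicted class probabilities; labels: length-N
    integer class indices; params: iterable of weight arrays (Θ);
    lam: trade-off coefficient λ."""
    # Cross-entropy: negative log-probability of the true class.
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    # L2 regularization over all parameter arrays.
    l2 = sum(float((p ** 2).sum()) for p in params)
    return ce + lam * l2

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
loss = model_loss(probs, labels, [np.ones((2, 2))], lam=0.01)
```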
Experiments and results
Dataset
We evaluate our model on the public PHEME dataset collected from Twitter by Zubiaga et al. [27]. It is the only dataset that contains both the propagation paths of rumors and the response posts. The PHEME dataset covers five breaking events: the shooting at Charlie Hebdo (CH), the hostage situation in Sydney (SS), the Ferguson unrest (FG), the Ottawa shooting (OS), and the crash of a Germanwings plane (GC). Each thread in the PHEME dataset is labeled with one of two classes, rumor and non-rumor. The events differ drastically in size and have different class-label proportions, illustrating the unbalanced distribution of rumors. Table 1 shows the basic statistics of the PHEME dataset.
Experimental setup
In the training process, the parameters of the Temporal Tree Transformer are updated by backpropagation with the Adaptive Moment Estimation (Adam) optimizer. We set the word embedding dimension d to 768 and the hidden dimension of the fully connected layer to 600. We apply one Transformer layer in the token-level encoder and six Transformer layers in the post-level encoder, and the number of heads in both MHA modules is 12. During training, the learning rate is set to 0.000042 and the dropout rate to 0.2. The time window depth k, which denotes the number of posts in each observation time window, is set to 8 according to the experimental results for k shown in Fig 3.
The red lines represent Top-down, while the blue lines represent Bottom-up. In both figures, the TRANS results are marked as black lines.
Models for comparison
To show the effectiveness of our proposed model, we compare its results with those of the following models on the rumor detection classification problem.
- RNN: A RNN-based model [5] is used to learn semantic information with sequential structure.
- CNN: A CNN-based model [7] is used to extract key semantic features and their interactions from conversation sequences.
- MTL3: A multi-task learning model is used for joint tasks for rumor detection, rumor tracking, and stance classification [35].
- GAN-RNN: Generative model GAN is used with RNN to enhance semantic representation [29].
- GACL: A GNN-based model with contrastive learning [21] that considers the propagation structure information of rumors.
- GARD: The latest rumor detection model [24] that integrates the encoding of parent-child node pairs with GNNs to effectively capture both local semantic changes and global structural information.
- TRANS: A Tree Transformer model [11] that considers both semantic and propagation information based on tree structures.
- TTT - T, TTT - G: Our proposed Temporal Tree Transformer models with Transformer and GRU in temporal encoders, respectively.
Rumor classification performance
The five events in the PHEME dataset suffer from the class imbalance problem, which significantly hinders a classification task. In such situations, relying solely on the accuracy score as an evaluation metric is insufficient due to its limitations in handling class imbalance. Thus, to accurately capture the performance of models on the PHEME dataset, we refer to the evaluation framework in Zhang et al. [36], which includes accuracy, precision, recall, Macro F1 score, and AUC. Rather than randomly slicing the conversation threads from all five events into training and test sets, a more realistic principle, Leave-One-Event-Out (LOEO), is used for our validation. It is more suitable for practical application scenarios, and thus better evaluates the model's generalization ability.
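The LOEO protocol itself is easy to make precise in code. A minimal Python sketch, assuming threads are stored as (event, thread) pairs (the data layout is ours, for illustration only):

```python
def loeo_splits(threads):
    """threads: list of (event, thread) pairs.
    Yields one (held_out_event, train, test) triple per event: the
    held-out event's threads form the test set, and the threads of
    all remaining events form the training set."""
    events = sorted({event for event, _ in threads})
    for held_out in events:
        train = [t for e, t in threads if e != held_out]
        test = [t for e, t in threads if e == held_out]
        yield held_out, train, test

# Toy threads tagged with the five PHEME events.
threads = [("CH", 1), ("CH", 2), ("SS", 3), ("FG", 4), ("OS", 5), ("GC", 6)]
splits = list(loeo_splits(threads))
```

Because a whole event is held out at a time, no thread from the test event can leak event-specific wording into training, unlike a random thread-level split.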
The results of our experiments are presented in Table 2, which compares our proposed Temporal Tree Transformer model (TTT) with the baseline models. T and G indicate Transformer and GRU in the temporal encoders, respectively; BU and TD indicate Bottom-up and Top-down, respectively. The notation ‘-’ indicates that the original paper did not report the relevant result. The table shows that our Temporal Tree Transformer models achieve higher accuracy and Macro F1 scores than the other models. Among all the baselines, the TRANS model performs best, as it fully utilizes semantic and spatial propagation information with the strong representation power of the Transformer model. It is important to highlight that GACL and GARD are models grounded in GNNs. These models leverage a robust GNN encoder to capture the global structural features of rumor propagation. However, the tree-structured model TRANS demonstrates superior generalization ability in LOEO validation scenarios compared to the GNN-based models. Among our Temporal Tree Transformer models, TTT - G (TD) achieves the best performance, because the GRU Encoder, with its lower complexity, is more suitable for rumor time-series data, and TD integration is more similar to the nature of information propagation in social networks. The TTT - G (TD) model significantly improves over the TRANS model, by 2.18%–2.94% in accuracy and 3.74%–4.38% in Macro F1 score.
Table 3 shows the details of our model's performance for each unseen event. The ‘Event’ column shows the five different events used as the test set in LOEO validation. The model achieves the best results on the test event Charlie Hebdo (CH), because the corresponding training set has a reasonable proportion of the two classes.
We further investigate the effect of the hyperparameter k, which denotes the number of posts in each time window. The specific experimental results are provided in S1 Table and S2 Table. To intuitively determine the most suitable value of k, we plot a line chart, as shown in Fig 3. With k set to 8, our model TTT - G (TD) achieves the highest Macro F1 score and relatively high accuracy in LOEO validation. Even when the value of k increases to 20, our model remains robust and outperforms the TRANS model. Since the assessment criterion is task-oriented, we set the hyperparameter k of our framework to 8 for comparison with the other models.
To examine the effectiveness of our conversation representations for classification task, we use t-SNE proposed by Van der Maaten and Hinton [37] to visualize the separability of representations obtained from TTT - G (TD). Fig 4 shows the clusters of points obtained from t-SNE. Orange and blue dots denote the Rumor and Non-rumor classes predicted by our model, respectively. Two clusters are distinct from each other in five events. This further shows that our Temporal Tree Transformer model generates good representations for rumor detection in unseen events.
Orange dots represent the Rumor class, while blue dots represent the Non-rumor class. The two clusters are separated in all five subplots, demonstrating the good predictive ability of our classification model on unseen events.
Conclusions and future discussion
In this paper, we aim to identify the authenticity of a claim made in a conversation thread formed of related posts on social networks. To achieve this, we use an equal-depth time window to observe the growth of propagation tree structures. In each time window, the tree is encoded using the Token-level Transformer Encoder and the Tree Transformer in one of two ways, Bottom-up or Top-down. The tree representations in different time windows are then encoded using the Temporal GRU Encoder. Finally, we use the resulting representation of the conversation thread to calculate the probability of it being a rumor. We evaluate our model's performance on the PHEME dataset, and instead of randomly dividing conversation threads from all events into training and test sets, we perform Leave-One-Event-Out (LOEO) cross-validation, which is closer to a realistic scenario. The results show that our proposed Temporal Tree Transformer model with Top-down integration achieves state-of-the-art classification results across multiple evaluation metrics. They suggest that extracting temporal features from propagation structures leads to better generalization of model predictions.
There are numerous avenues to explore in future research. One possibility involves incorporating non-textual data, such as images and videos, to improve the representation of individual posts. Additionally, assessing user credibility in the dissemination of misinformation presents a promising opportunity.
Supporting information
S1 Table. Experimental results of varying the value of k in LOEO validation for TTT - G (BU).
https://doi.org/10.1371/journal.pone.0320333.s001
(PDF)
S2 Table. Experimental results of varying the value of k in LOEO validation for TTT - G (TD).
https://doi.org/10.1371/journal.pone.0320333.s002
(PDF)
References
- 1. Allcott H, Gentzkow M, Yu C. Trends in the diffusion of misinformation on social media. R&P 2019;6(2):1–8.
- 2. Kaur K, Gupta S. Towards dissemination, detection and combating misinformation on social media: a literature review. J Bus Ind Mark. 2023.
- 3. Jin Z, Cao J, Guo H, Zhang Y, Wang Y, Luo J. Detection and analysis of 2016 US presidential election related rumors on Twitter. In: Proceedings of the international conference on social computing, behavioral-cultural modeling and prediction and behavior representation in modeling and simulation. 2017. p. 14–24.
- 4. Zhou X, Shu K, Phoha V, Liu H, Zafarani R. This is fake! Shared it by mistake: Assessing the intent of fake news spreaders. In: Proceedings of the ACM web conference 2022. 2022. p. 3685–3694.
- 5. Ma J, Gao W, Mitra P, Kwon S, Jansen BJ, Wong KF, et al. Detecting rumors from microblogs with recurrent neural networks. 2016.
- 6. Wu L, Rao Y, Zhao Y, Liang H, Nazir A. DTCA: Decision tree-based co-attention networks for explainable claim verification. arXiv preprint. 2020.
- 7. Yu F, Liu Q, Wu S, Wang L, Tan T. A convolutional approach for misinformation identification. In: Proceedings of the international joint conference on artificial intelligence (IJCAI). 2017. p. 3901–3907.
- 8. Lu Y, Li C. GCAN: Graph-aware co-attention networks for explainable fake news detection on social media. arXiv preprint. 2020.
- 9. Hamidian S, Diab M. Rumor detection and classification for twitter data. arXiv preprint. 2019.
- 10. Cheng M, Nazarian S, Bogdan P. VRoC: Variational autoencoder-aided multi-task rumor classifier based on text. In: Proceedings of the web conference 2020. 2020. p. 2892–2898. https://doi.org/10.1145/3366423.3380054
- 11. Ma J, Gao W. Debunking rumors on Twitter with tree transformer. In: Proceedings of the annual meeting of the association for computational linguistics. ACL. 2020.
- 12. Khoo LMS, Chieu HL, Qian Z, Jiang J. Interpretable rumor detection in microblogs by attending to user interactions. AAAI 2020;34(05):8783–90.
- 13. Yu J, Jiang J, Khoo LMS, Chieu HL, Xia R. Coupled hierarchical transformer for stance-aware rumor verification in social media conversations. Association for computational linguistics. 2020.
- 14. Li Q, Zhang Q, Si L. Rumor detection by exploiting user credibility information, attention and multi-task learning. In: Proceedings of the 57th annual meeting of the association for computational linguistics. 2019. p. 1173–1179.
- 15. Mosallanezhad A, Karami M, Shu K, Mancenido MV, Liu H. Domain adaptive fake news detection via reinforcement learning. In: Proceedings of the ACM web conference 2022. 2022. p. 3632–3640. https://doi.org/10.1145/3485447.3512258
- 16. Shang K-K, Li T-C, Small M, Burton D, Wang Y. Link prediction for tree-like networks. Chaos 2019;29(6):061103. pmid:31266316
- 17. Kumar S, Carley K. Tree LSTMs with convolution units to predict stance and rumor veracity in social media conversations. In: Proceedings of the 57th annual meeting of the association for computational linguistics. 2019. p. 5047–5058. https://doi.org/10.18653/v1/p19-1498
- 18. Ma J, Gao W, Joty S, Wong K-F. An attention-based rumor detection model with tree-structured recursive neural networks. ACM Trans Intell Syst Technol 2020;11(4):1–28.
- 19. Ahmed M, Samee M, Mercer R. You only need attention to traverse trees. In: Proceedings of the 57th annual meeting of the association for computational linguistics. 2019. p. 316–322.
- 20. Bian T, Xiao X, Xu T, Zhao P, Huang W, Rong Y, et al. Rumor detection on social media with bi-directional graph convolutional networks. AAAI 2020;34(01):549–56.
- 21. Sun T, Qian Z, Dong S, Li P, Zhu Q. Rumor detection on social media with graph adversarial contrastive learning. In: Proceedings of the ACM web conference 2022. 2022. p. 2789–2797. https://doi.org/10.1145/3485447.3511999
- 22. Jia H, Wang H, Zhang X. Early detection of rumors based on source tweet-word graph attention networks. PLoS One 2022;17(7):e0271224. pmid:35816493
- 23. Ye N, Yu D, Zhou Y, Shang K, Zhang S. Graph convolutional-based deep residual modeling for rumor detection on social media. Mathematics 2023;11(15):3393.
- 24. Tao X, Wang L, Liu Q, Wu S, Wang L. Semantic evolvement enhanced graph autoencoder for rumor detection. In: Proceedings of the ACM web conference 2024. 2024. p. 4150–4159. https://doi.org/10.1145/3589334.3645478
- 25. Yu D, Zhou Y, Zhang S, Li W, Small M, Shang K. Information cascade prediction of complex networks based on physics-informed graph convolutional network. New J Phys 2024;26(1):013031.
- 26. Huang Q, Zhou C, Wu J, Liu L, Wang B. Deep spatial–temporal structure learning for rumor detection on Twitter. Neural Comput Appl. 2020:1–11.
- 27. Zubiaga A, Liakata M, Procter R, Wong Sak Hoi G, Tolmie P. Analysing how people orient to and spread rumours in social media by looking at conversational threads. PLoS One 2016;11(3):e0150989. pmid:26943909
- 28. Kochkina E, Liakata M. Estimating predictive uncertainty for rumour verification models. arXiv preprint. 2020.
- 29. Cheng M, Li Y, Nazarian S, Bogdan P. From rumor to genetic mutation detection with explanations: a GAN approach. Sci Rep 2021;11(1):5861. pmid:33712675
- 30. Karim F, Majumdar S, Darabi H, Chen S. LSTM fully convolutional networks for time series classification. IEEE Access. 2018;6:1662–9.
- 31. Lim B, Arık SÖ, Loeff N, Pfister T. Temporal Fusion Transformers for interpretable multi-horizon time series forecasting. Int J Forecast 2021;37(4):1748–64.
- 32. Zhang X, Zhong C, Zhang J, Wang T, Ng WWY. Robust recurrent neural networks for time series forecasting. Neurocomputing. 2023;526:143–57.
- 33. Devlin J, Chang MW, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint. 2018.
- 34. Cho K, Van Merriënboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint. 2014.
- 35. Kochkina E, Liakata M, Zubiaga A. All-in-one: Multi-task learning for rumour verification. arXiv preprint. 2018.
- 36. Zhang Y, Feng M, Shang K, Ran Y, Wang C-J. Peeking strategy for online news diffusion prediction via machine learning. Physica A Stat Mech Appl. 2022;598:127357.
- 37. Van der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579–2605.