Abstract
Detecting anomalies in the Bitcoin transaction network is critical for ensuring blockchain security and stability. The network’s heterogeneous structure and dynamic nature, coupled with scarce labeled anomalies, pose significant challenges for traditional graph-based methods. To address these challenges, we propose the Bidirectional Fusion Heterogeneous Graph Network (BF-HGN), a semi-supervised model for anomaly detection in dynamic Bitcoin transaction graphs. BF-HGN uses multi-type feature embedding and alignment strategies to unify features across heterogeneous transaction and address nodes. A bidirectional temporal fusion mechanism captures long-range temporal dependencies that unidirectional models often miss. To alleviate class imbalance and limited annotations, a Class-balanced Classifier (CBC), constrained by Adjacency Adaptation (AA) and Adaptive Feature Space Regulation (AFSR) losses, generates pseudo-anomalous nodes that closely resemble real anomalies, sharpening the decision boundary. Experiments on the Elliptic++ dataset show that BF-HGN outperforms existing methods, achieving F1 scores of 0.6301 and 0.5784 for transaction and address nodes, respectively, and establishing a new benchmark for Bitcoin transaction anomaly detection.
Citation: Xiao B, Yin W (2026) Bidirectional fusion heterogeneous graph networks for semi-supervised Bitcoin transaction anomaly detection in dynamic transaction graphs. PLoS One 21(6): e0351051. https://doi.org/10.1371/journal.pone.0351051
Editor: Yang (Jack) Lu, Beijing Technology and Business University, CHINA
Received: February 4, 2026; Accepted: May 21, 2026; Published: June 8, 2026
Copyright: © 2026 Xiao, Yin. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The datasets used in this study are publicly available: https://www.github.com/git-disl/EllipticPlusPlus.
Funding: This work was supported by National Social Science Fund of China Program (24BJY093). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Bitcoin, as a decentralized digital currency [1,2], has gained widespread attention by virtue of its transparency and security. Although Bitcoin transactions are observable on-chain, its anonymity [3] also facilitates illegal activities such as money laundering and dark-net transactions [4–7], threatening financial health and stability. Anomaly detection in Bitcoin transactions has therefore become a research focus: identifying anomalous transactions accurately can enhance network security and support regulation. In this context, data from exchange-based Know Your Customer (KYC) procedures are particularly important, as they enable the identification of target deposit addresses and the perpetrators behind them [8].
At the data processing stage, labeling abnormal samples in Bitcoin transaction data relies on expert knowledge and is costly, whereas labeling normal samples is relatively easy. This study therefore proposes a semi-supervised learning framework in which only a small number of normal samples are labeled. In addition, on-chain transaction data comprise both transaction and address nodes, which differ in their semantic meanings and topological roles. However, most previous studies [9–11] modeled transactions as a single node type, thereby overlooking important cross-type relational information. Based on these data characteristics, we build on the Elliptic++ dataset [12], which enriches the Elliptic dataset [13] with heterogeneous information, to construct the first Bitcoin dynamic heterogeneous graph dataset.
In recent years, graph neural networks (GNNs) [14] have become a research hotspot in Bitcoin transaction network analysis. Compared with traditional methods, GNNs efficiently model graph structure, learn node embeddings, and integrate multi-level information, which allows them to handle network complexity and produce low-dimensional embedding representations. Commonly used techniques include the graph convolutional network (GCN) [15] and the graph autoencoder (GAE) [16]. However, GCN-based methods have difficulty capturing long-range dependencies, while GAE-based methods suit unsupervised and semi-supervised tasks but are prone to overfitting when the data are extremely imbalanced. Among related work, Pareja et al. [17] proposed EvolveGCN, which extends GCN to learn representations of dynamic graphs; Zhao et al. [18] introduced GraphSMOTE, which focuses solely on class imbalance in transaction data; and Liu et al. [19] developed EvolveGAN, which emphasizes capturing the temporal evolution of dynamic graphs. None of these methods fully addresses the combined challenges of heterogeneity, class imbalance, and temporal dynamics in transaction data. To address these issues simultaneously, we propose the Bidirectional Fusion Heterogeneous Graph Network (BF-HGN), which consists of a multi-feature fusion-based feature extraction module and the Class-balanced Classifier (CC).
In the feature extraction stage, the heterogeneous graph contains two node classes (transactions and addresses) with distinct feature dimensions and semantic spaces. To handle this heterogeneity, we experimented with several heterogeneous feature preprocessing strategies and selected the best-performing one. As illustrated in Fig 1a, the proposed scheme first applies independent GCNs to the homogeneous subgraphs composed of transaction nodes and of address nodes, respectively, to unify feature dimensions and align semantics within each class. From the processed embeddings, a dynamic heterogeneous graph is reconstructed, on which an RGCN-based [20] network extracts cross-type, high-order relational features, enhancing the model’s ability to represent heterogeneous information. Temporal evolution is also a crucial aspect of modeling the dynamic Bitcoin transaction network, but the basic RGCN cannot explicitly encode temporal dependencies between consecutive time steps. To address this limitation, we adopt the idea of EvolveGCN [17], which couples Long Short-Term Memory (LSTM) [21] with graph convolutions, and propose an extended framework named EvolveRGCN. Nevertheless, conventional dynamic models typically employ a unidirectional LSTM, which limits their ability to learn long-range dependencies when the temporal span is large. To overcome this limitation, and inspired by the use of Bidirectional LSTM (Bi-LSTM) [22] in dynamic signal anomaly detection [23,24], we further propose two bidirectional variants, Bi-EvolveGCN and Bi-EvolveRGCN, for feature extraction in our task. They fuse node features from forward and reverse temporal iterations to strengthen the modeling of long-term transaction dynamics. As shown in Fig 1b, taking the subgraph at time point 2 as an example, the fused feature is generated by aggregating the forward feature at time point 2 with the reverse feature at time point 2, which the reverse iteration propagates back from time point 5. This mechanism strengthens the model’s ability to mine associations between nodes over long time spans.
(a) Bidirectional temporal feature fusion: captures forward and reverse temporal dependencies via Bi-EvolveGCN and Bi-EvolveRGCN to mine long-range temporal associations; (b) Homogeneous-heterogeneous feature transformation: unifies feature dimensions of transaction and address nodes and constructs dynamic heterogeneous graphs; (c) Class-balanced Classifier: generates pseudo-anomalous nodes constrained by AA and AFSR losses to alleviate class imbalance in semi-supervised scenarios.
In the classification stage, to address the severe class imbalance caused by the lack of labeled abnormal samples, we propose the Class-balanced Classifier (CC). Its aim is to generate pseudo-abnormal nodes whose characteristics resemble those of real hard abnormal nodes. The generation process considers two core characteristics of real abnormal nodes. The first stems from the differences in adjacency relationships between real abnormal nodes and normal nodes; we design the Adjacency Adaptation (AA) loss function to adjust the adjacency relationships of pseudo-abnormal nodes accordingly. Specifically, as shown on the left side of Fig 1c, the pseudo-abnormal transaction and address node sets are generated by transforming some normal transaction and address node sets, adjusting their adjacency relationships with reference to those of the real abnormal transaction and address node sets. The second characteristic, shown on the right side of Fig 1c, reflects the core difficulty of the task: some abnormal nodes are highly similar to normal nodes in feature space, which makes them hard to detect. We propose the Adaptive Feature Space Regulation (AFSR) loss function, which improves the model’s ability to recognize such hard abnormal cases by controlling the distribution of the pseudo-abnormal and normal node sets in feature space. In summary, the main innovations of this paper are as follows:
- The task of “semi-supervised bitcoin anomaly detection for dynamic heterogeneous graphs” is defined for the first time, and the Bidirectional Fusion Heterogeneous Graph Network (BF-HGN) is designed, which provides a preliminary modeling framework for this field.
- To address the dynamic and heterogeneous nature of Bitcoin transaction data, we propose EvolveRGCN—a relational graph convolutional network designed for feature extraction in dynamic settings. Building upon EvolveGCN, we further develop a progressive feature learning strategy that transitions from homogeneous to heterogeneous representations, enabling deeper semantic mining of transaction structures. To better capture under-explored temporal dependencies, we introduce Bi-EvolveGCN and Bi-EvolveRGCN, which incorporate bidirectional feature fusion. By integrating node representations from both forward and reverse temporal directions, these models effectively capture complex dependencies across temporal subgraphs.
- To tackle the class imbalance caused by unlabeled abnormal samples in transactions, we design the CC module to generate pseudo-abnormal nodes whose feature structures resemble those of real hard abnormal nodes. This process is constrained by two proposed loss functions: the AA loss constrains the relationships of pseudo-abnormal nodes with their adjacent nodes, and the AFSR loss constrains their distribution in feature space.
2 Related work
2.1 Bitcoin transaction anomaly detection
Bitcoin has been widely applied in financial transactions owing to its anonymity and immutability. Yet these same characteristics also facilitate illicit transactions through Bitcoin, which has given rise to the task of Bitcoin transaction anomaly detection. Its core is analyzing blockchain transaction records to identify anomalies, drawing on the graph structure of the transaction network, node attributes, and dynamic patterns [25]. Current mainstream detection methods fall broadly into two categories. The first is feature engineering-based methods, which extract features such as transaction frequency, amount, and node connectivity and apply traditional machine learning algorithms such as support vector machines (SVM) [26] and random forests (RF) [27]; such methods do not take the relationships between transactions into account. The second is deep learning-based methods [28], which construct dynamic graph neural networks (DGNNs) or self-supervised learning models to learn latent features, thereby enhancing detection accuracy and robustness; these methods can effectively capture time-series and node interaction information [19,29]. Despite this progress, challenges such as scarce labeled data and the strong concealment of malicious behavior remain. In this context, leveraging unsupervised and semi-supervised learning techniques to enhance detection capabilities has become an important research direction [30].
2.2 Dynamic graph model for Bitcoin transaction anomaly detection
With the development of GNNs, the dynamic Graph Anomaly Detection (DGAD) methods [31,32] have also demonstrated excellent performance in the task of Bitcoin transaction anomaly detection. This section mainly discusses the three DGAD settings: supervised, unsupervised, and semi-supervised.
Supervised learning builds models from labeled data to detect anomalies in new dynamic graphs. Such methods can achieve high accuracy because they learn the differences between normal and anomalous patterns directly. They fall into two main categories. The first is classification-based methods, such as SVM and decision trees; in a dynamic social network, for example, such methods perform anomaly detection solely from node features at different time points. The second is deep learning methods, such as using CNNs and RNNs to process dynamic graph sequence data, extract features, and apply them to classification tasks [33,34].
Unsupervised learning does not require labeled data; it identifies anomalies from the distribution and structural characteristics of the data, assuming that normal data follow specific patterns while anomalous data deviate from them. This approach is suitable when labeled data are difficult to obtain. Commonly used methods include density-based algorithms (e.g., LOF [35] identifies anomalies by computing local density), clustering methods (normal data form tight clusters while anomalous points lie far from cluster centers), and reconstruction-based graph autoencoders (e.g., DOMINANT [36] uses GCN to implement GAE-based graph structure reconstruction, while AnomalyDAE [37] improves structural reconstruction performance). Additionally, researchers have proposed TAM [38] to explore the relationships between nodes and subgraphs in depth, but there is still room for improvement in how it uses labeled information.
Semi-supervised learning lies between supervised and unsupervised learning: it uses a small amount of labeled data and a large amount of unlabeled data, reducing the labeling burden while improving the model’s generalization ability. Current methods can be divided into traditional classification algorithms (such as semi-supervised SVM, which builds an initial classifier and iteratively updates it) and GCN-based deep learning algorithms. Although one-class classification [39–41] is widely applied to visual data, it is rarely used for node-level anomaly detection in dynamic graphs, where methods commonly require both normal and abnormal nodes to be labeled, which is costly. By comparison, the one-class semi-supervised setting, which requires labels only for normal nodes, is more practical for Bitcoin transaction anomaly detection.
2.3 Dynamic heterogeneous graph model for Bitcoin transaction anomaly detection
The key distinction between GNNs and other neural networks lies in their adherence to a message-passing mechanism [42], whereby each node aggregates information from its neighboring nodes. Heterogeneous GNNs further consider the heterogeneity of the graph, independently distinguishing between various types of nodes and edges. Dynamic heterogeneous graph neural networks (DHGNNs) further explore time-based information in dynamic graphs. In real-world scenarios, such as the Elliptic++ dataset [12], Bitcoin transactions at each time step are formed through interactions between transaction nodes and address nodes. Essentially, this type of data can be regarded as a form of dynamic heterogeneous graph data. Zhang et al. proposed DHGAS, which encodes temporal information [43] before performing heterogeneous message passing; others employed sequence-based models to aggregate information from different time slices [19]. Additionally, in semi-supervised heterogeneous graph anomaly detection tasks, researchers often adopt methods based on Generative Adversarial Networks (GANs) [44,45], which leverage adversarial training with a small amount of label information to enhance the model’s ability to perceive anomalies. Along this line of research, Nair et al. proposed a data-driven risk analytics framework using semi-supervised heterogeneous graph modeling for blockchain transaction fraud [46]. Similarly, Santos et al. studied adaptive graph neural analytics for cryptocurrency anomaly detection under limited labeled data conditions [47]. Finally, recent advances have presented new ideas for the task of heterogeneous and temporal feature fusion. H2CAN [48] uses heterogeneous hypergraph attention to model high-order cross-modal interactions and adopts counterfactual learning to reduce bias for multimodal sentiment analysis. 
STEAM [49] mines structural-temporal features via motif-augmented hypergraphs and fine-grained temporal autoencoder to detect motif-level anomalies in dynamic graphs.
3 Method
Facing the semi-supervised, dynamic, and heterogeneous characteristics of real-world Bitcoin transaction data, we propose the Bidirectional Fusion Heterogeneous Graph Network (BF-HGN) for anomaly detection. BF-HGN comprises two core modules: the Multi-type Feature Fusion Extractor (MFFE) and the Class-balanced Classifier (CC). During feature extraction, MFFE enriches the heterogeneous information between transaction and address nodes on top of the extracted basic features, and employs a bidirectional temporal fusion mechanism to embed temporal information, enabling in-depth mining of dependencies across time subgraphs. This module includes the Bi-EvolveRGCN and Bi-EvolveGCN core sub-networks, detailed in Section 3.1. In the classification phase, CC generates pseudo-anomalous nodes similar to real hard anomalous nodes to mitigate class imbalance; its implementation is detailed in Section 3.5. Section 3.6 describes the model’s loss functions, where the AA and AFSR loss functions guide the generation of the pseudo-anomalous node set.
Given the dynamic nature of the data, the transaction network is divided into subgraphs at T consecutive time points. At each time point, the transaction graph and the address graph are considered separately. Taking the transaction graph at a given time point as an example, its node feature matrix holds one feature vector per transaction node, whose length is the feature dimension, and its adjacency matrix describes the edge relationships: an entry is 1 if there is an edge between the source node and the target node, and 0 otherwise.
3.1 Multi-type feature fusion extractor
For dynamic graphs, EvolveGCN captures the dependencies between dynamic subgraphs only in the forward time direction and cannot mine dependencies in the reverse time direction. To address this, we propose a bidirectional temporal feature fusion mechanism and construct the Bi-EvolveGCN network to aggregate features of dynamic subgraphs in both time directions. Additionally, we design Bi-EvolveRGCN, a feature extraction network for dynamic heterogeneous graphs based on LSTM and RGCN. Moreover, the preprocessing of heterogeneous features is crucial for the model to subsequently learn the complex heterogeneous relationships between nodes, as it directly influences how efficiently key information is extracted and how well abnormal situations are identified.
3.1.1 Bi-EvolveGCN.
Taking time point t as an example, (a) and (b) in Fig 2 are each implemented with an EvolveGCN network, and the core difference between the two lies in the opposite time directions in which their parameters are updated. Bi-EvolveGCN fuses the forward and reverse feature information by integrating the node features extracted by both.
First, the address graph and the transaction graph are processed by the Bi-EvolveGCN networks (a) and (b), respectively, to unify the feature dimensions of all nodes and construct the heterogeneous graph. Then, the heterogeneous graph passes through (c) Bi-EvolveRGCN for feature extraction to obtain the output subgraph. Finally, this subgraph is input into (d) CC for node classification.
Next, we describe the l-th layer of Bi-EvolveGCN, starting with the common parameters used in the following formulas: the forward and backward weight matrices are the learnable parameters for the transaction node set at time t, updated in the forward and backward time directions, respectively. We first describe how these two weight matrices are updated. LSTM, a special type of RNN, can effectively capture dependencies in time series, so the forward weight at time t is obtained by feeding the forward weight at time t-1 into an LSTM unit. Correspondingly, the backward weight at time t is obtained from the backward weight at time t+1. Through these forward and backward LSTM update mechanisms, the model can better fit the temporal dynamics of the dynamic graph.
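A plausible form of the two weight updates, written out for reference, is given below. It assumes the EvolveGCN-O style, in which the weight matrix itself is passed through the LSTM; the symbols (an arrow marking the time direction) are introduced here for illustration, and whether the forward and backward LSTMs share parameters is not stated in the text.

```latex
W^{(l)}_{t,\rightarrow} = \mathrm{LSTM}_{\rightarrow}\!\left(W^{(l)}_{t-1,\rightarrow}\right),
\qquad
W^{(l)}_{t,\leftarrow} = \mathrm{LSTM}_{\leftarrow}\!\left(W^{(l)}_{t+1,\leftarrow}\right),
\qquad t = 1,\dots,T
```

Under this reading, the forward chain runs from an initial weight before time point 1 towards T, and the backward chain runs from an initial weight after time point T back towards 1.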
Subsequently, the node features are updated in the forward and backward time directions using these weights. The forward node feature of layer l+1 at time t is computed from the transaction node adjacency matrix at time t, which describes node connectivity, the layer-l forward transaction node features at time t, the forward weight matrix, and the corresponding intercept (bias), followed by a ReLU [50] activation, which introduces nonlinearity to enhance the model’s representational capacity. Correspondingly, the reverse node feature is updated in the same way from the output of the backward weight update:
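A minimal sketch of one forward layer update, assuming the standard GCN form ReLU(A·H·W + b) with the adjacency matrix used as given (the text does not say whether it is normalized or includes self-loops). Plain Python lists stand in for tensors, and the toy matrices are hypothetical. The reverse update would be the same call with the backward weight and bias.

```python
def matmul(X, Y):
    """Multiply an n x k matrix by a k x m matrix, both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gcn_layer(A, H, W, b):
    """One forward layer: aggregate neighbours (A @ H), transform (@ W), add bias, ReLU.

    Assumption: A is used as-is, without normalization or added self-loops.
    """
    Z = matmul(matmul(A, H), W)
    return [[max(0.0, z + bj) for z, bj in zip(row, b)] for row in Z]


# Hypothetical toy transaction graph: 3 nodes, 2 input and 2 output features.
A = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]
H = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W_fwd = [[1.0, -1.0],
         [0.5, 1.0]]
b_fwd = [0.0, -0.5]
H_next = gcn_layer(A, H, W_fwd, b_fwd)
```

In this toy run, the negative bias drives the second feature of the first two nodes below zero, so the ReLU clips it to 0.0.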
Finally, with L the total number of Bi-EvolveGCN layers, the forward and reverse node features output by layer L are fused by splicing (concatenating) them along dimension 1, the feature dimension, to obtain the transaction node features extracted by Bi-EvolveGCN. The address node features are obtained in the same way.
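The sketch below strings these steps together for a single layer: forward weights evolve from t-1 to t, backward weights from t+1 to t, each time point runs both GCN passes, and the two outputs are concatenated per node along the feature dimension. The function toy_evolve is a hypothetical stand-in for the LSTM (it is not an LSTM), biases are omitted, and all names and values are illustrative.

```python
import math


def matmul(X, Y):
    """Multiply an n x k matrix by a k x m matrix (lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gcn(A, H, W):
    """Single GCN layer without bias: ReLU(A @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(A, H), W)]


def toy_evolve(W):
    """Hypothetical stand-in for the LSTM weight update; NOT an LSTM."""
    return [[0.5 * w + math.tanh(w) for w in row] for row in W]


def bi_evolve_gcn(graphs, W0_fwd, W0_bwd):
    """graphs: list of (A_t, X_t) for t = 0..T-1; returns fused node features per time point."""
    T = len(graphs)
    W_fwd, W_bwd = [None] * T, [None] * T
    W = W0_fwd
    for t in range(T):                 # forward direction: W_fwd[t] derived from W_fwd[t-1]
        W = toy_evolve(W)
        W_fwd[t] = W
    W = W0_bwd
    for t in reversed(range(T)):       # backward direction: W_bwd[t] derived from W_bwd[t+1]
        W = toy_evolve(W)
        W_bwd[t] = W
    fused = []
    for t, (A_t, X_t) in enumerate(graphs):
        H_f = gcn(A_t, X_t, W_fwd[t])
        H_b = gcn(A_t, X_t, W_bwd[t])
        # Splice each node's forward and backward vectors along the feature dimension.
        fused.append([hf + hb for hf, hb in zip(H_f, H_b)])
    return fused


A = [[1, 1], [1, 1]]
graphs = [(A, [[1.0, 0.0], [0.0, 1.0]]),
          (A, [[1.0, 1.0], [0.0, 0.0]]),
          (A, [[0.5, 0.5], [1.0, 0.0]])]
W0 = [[1.0, 0.0], [0.0, 1.0]]
fused = bi_evolve_gcn(graphs, W0, W0)
```

Concatenation doubles the feature width, so each fused node vector here has four entries; the backward weight at the first time point has been evolved through all later steps.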
3.1.2 Bi-EvolveRGCN.
Building on the design of Bi-EvolveGCN, (c) in Fig 2 is based on RGCN and likewise updates its learnable parameters in both time directions, yielding Bi-EvolveRGCN for dynamic heterogeneous graphs. The core difference between RGCN and GCN is that RGCN extracts heterogeneous graph features by aggregating the neighbors connected by each edge type separately, whereas GCN has no dedicated mechanism for aggregating neighbors across multiple edge types.
Next, we describe the l-th layer of Bi-EvolveRGCN, starting with the common parameters. For a node i and an adjacency relationship (edge type) r drawn from the set of edge classes, the neighbor set of i under r contains the nodes adjacent to i through edges of type r. The forward and backward features of node i at time t are updated by the l-th RGCN layer in the forward and backward time directions, respectively. In the forward update, for each relationship r, the features of the neighbors of node i under r are transformed by a relation-specific weight matrix and normalized by the number of neighbors of node i under r, and a learnable relation-specific bias is added; the results for all relationships are combined and passed through a ReLU activation. The relation-specific forward weight matrix at time t is updated from its value at the previous time point and is used to aggregate the neighbor node set of the target node under relationship r in the forward time direction. Accordingly, the backward feature is updated by the same equation, where the relation-specific weight matrix is updated from its value at the next time point.
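A minimal sketch of the forward RGCN update described above. It assumes that (i) the relation-specific bias is added only when node i has at least one neighbor under that relation, (ii) the contributions of all relations are summed, and (iii) no separate self-loop term is used, since the text mentions none. Edge-type names and weights are hypothetical; the backward update would be the same call with the backward weights.

```python
def vec_mat(h, W):
    """Row vector h (length k) times matrix W (k x m)."""
    return [sum(h[k] * W[k][m] for k in range(len(h))) for m in range(len(W[0]))]


def rgcn_layer(H, edges, W_rel, b_rel):
    """One forward RGCN layer.

    H: node feature vectors; edges: (src, dst, relation) triples;
    W_rel / b_rel: relation -> weight matrix / bias vector (forward weights at time t).
    """
    d_out = len(next(iter(b_rel.values())))
    neigh = {}                               # (target node, relation) -> neighbour list
    for src, dst, rel in edges:
        neigh.setdefault((dst, rel), []).append(src)
    out = []
    for i in range(len(H)):
        acc = [0.0] * d_out
        for rel, W in W_rel.items():
            nbrs = neigh.get((i, rel), [])
            if not nbrs:                     # assumption: no message and no bias for empty relations
                continue
            c = len(nbrs)                    # normalization: number of neighbours of i under rel
            for j in nbrs:
                acc = [a + m / c for a, m in zip(acc, vec_mat(H[j], W))]
            acc = [a + bb for a, bb in zip(acc, b_rel[rel])]
        out.append([max(0.0, a) for a in acc])  # ReLU over the sum across relations
    return out


# Hypothetical toy graph: nodes 0 and 1 are transactions, node 2 is an address.
H = [[1, 0], [0, 1], [1, 1]]
edges = [(1, 0, "tx-tx"), (2, 0, "addr-tx"), (0, 2, "tx-addr")]
W_rel = {"tx-tx": [[1, 0], [0, 1]],
         "addr-tx": [[2, 0], [0, 2]],
         "tx-addr": [[1, 0], [0, 1]]}
b_rel = {"tx-tx": [0, 0], "addr-tx": [0, 0], "tx-addr": [0, -5]}
H_next = rgcn_layer(H, edges, W_rel, b_rel)
```

Node 1 has no incoming edges in this toy graph, so its output stays at zero; a self-loop term, as in some RGCN variants, would change that.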
3.1.3 Feature transformation method.
To identify the best feature transformation [51] method, we compare and analyze several schemes in Fig 3, including node-set feature transformation with fully connected (FC) layers or GCNs, and feature expansion for node sets with low-dimensional features. The influence of the different fusion methods on model performance is examined in Section 4.4.
(a) shows node-set feature transformation using FC layers; (b) depicts the transformation of graph node-set features through GNN-based modeling; and (c) shows feature expansion for node sets with fewer feature dimensions.
3.2 Fully connected layer feature transformation methods
As shown in Fig 3a, we adopt two FC layers to transform the features of the two node classes separately, so that all nodes have the same feature dimension. The two transformed node sets, x and y, are then fused with all adjacent edges to generate a heterogeneous graph, which is input into the multi-layer Bi-EvolveRGCN for feature extraction.
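A minimal sketch of the FC scheme, with hypothetical input sizes (three transaction features and two address features) mapped to a shared two-dimensional space. The fusion step is reduced here to stacking both node sets into one feature list with a type tag; attaching the edges is left out.

```python
def linear(X, W, b):
    """Affine map applied row-wise: X @ W + b."""
    return [[sum(x * w for x, w in zip(row, col)) + bj for col, bj in zip(zip(*W), b)]
            for row in X]


# Hypothetical sizes: 3 transaction features and 2 address features -> 2 shared dims.
X_tx = [[1.0, 2.0, 3.0]]
X_addr = [[1.0, 1.0]]
W_tx, b_tx = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0]
W_addr, b_addr = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]

Z_tx = linear(X_tx, W_tx, b_tx)            # separate FC layer for transaction nodes
Z_addr = linear(X_addr, W_addr, b_addr)    # separate FC layer for address nodes
nodes = Z_tx + Z_addr                      # one feature list for the heterogeneous graph
node_type = ["tx"] * len(Z_tx) + ["addr"] * len(Z_addr)
```

Because each FC layer sees only its own node class, this scheme ignores how nodes of the same class are connected, which is the gap the GCN-based scheme targets.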
3.3 GCN feature transformation methods
As shown in Fig 3b, we employ two Bi-EvolveGCN networks to pre-adjust the transaction and address homogeneous graphs, respectively. The main difference from the FC scheme is that this method fully considers the connections between nodes of the same class. Specifically, the transaction homogeneous graph and the address homogeneous graph are the fundamental homogeneous components, and the heterogeneous graph is formed by integrating these two homogeneous subgraphs.
3.4 Random forest feature expansion methods
Fig 3c adopts a feature expansion method that augments the features of the node class with lower feature dimension (the address node class) so that the data dimensions of different node classes match. Given the advantages of the Random Forest (RF) algorithm in assessing feature importance, we use it in a pre-training step to obtain the importance value of each feature. This yields the feature importance sequence shown in Fig 4, whose values sum to 1. Each feature is expanded according to its importance ratio to unify the feature dimensions of the different node classes; the expansion operation is applied to the address node set based on this importance array.
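The text does not specify how an importance ratio becomes a number of copies. One plausible reading, sketched below as an assumption, replicates each address feature in proportion to its importance, using largest-remainder rounding so that the expanded width exactly matches the target dimension; features whose share rounds to zero are dropped in this sketch. The importance values would come from a trained random forest (scikit-learn's `feature_importances_` attribute, for instance, also sums to 1); here they are hypothetical and hard-coded.

```python
import math


def expansion_counts(importance, target_dim):
    """Copies per feature, proportional to importance, summing to target_dim."""
    raw = [p * target_dim for p in importance]
    counts = [math.floor(r) for r in raw]
    leftover = max(0, target_dim - sum(counts))
    # Largest-remainder rounding: hand leftover columns to the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


def expand_features(X, importance, target_dim):
    """Replicate each column of X according to expansion_counts (hypothetical scheme)."""
    counts = expansion_counts(importance, target_dim)
    return [[x for x, c in zip(row, counts) for _ in range(c)] for row in X]


importance = [0.5, 0.3, 0.2]          # hypothetical RF importances (sum to 1)
X_addr = [[10.0, 20.0, 30.0]]
X_expanded = expand_features(X_addr, importance, target_dim=7)
```

Largest-remainder rounding is chosen here only because plain rounding can overshoot or undershoot the target width.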
Based on the comparison experiments in Section 4.4, the feature extraction scheme fusing Bi-EvolveGCN and Bi-EvolveRGCN is selected. This scheme effectively realizes multi-category node feature conversion and extraction with excellent performance.
3.5 Class-balanced classifier
In the classification stage, we design a Class-balanced Classifier to classify nodes of multiple classes effectively. As shown in Fig 2d, its input at time t is the subgraph produced by the feature extraction stage, consisting of a node feature matrix of size N × d and an adjacency matrix, where N is the total number of nodes in the graph and d is the dimension of the output node features.
First, we randomly select a proportion of the labeled normal nodes as the node set to be converted, recording its node ID set and total number. This proportion controls how many abnormal nodes are generated, which helps BF-HGN learn the feature differences between normal and abnormal nodes. Next, the selected node set embeds its corresponding neighbor nodes through its adjacency matrix, which yields the neighbor embedding features of the initial set of pseudo-abnormal nodes. Meanwhile, a noise matrix drawn from a normal distribution with a predefined mean and variance, and with the same dimensions as the pseudo-abnormal node features, is pre-generated and added to the features of the corresponding normal nodes, yielding noisy normal node features. These noisy features are random and cannot simulate real abnormal nodes, so they cannot be used directly as pseudo-abnormal nodes. The purpose of adding noise is to form features that are symmetric to the normal node set in feature space, which is consistent with the characteristics of hard abnormal nodes and guides the generation of pseudo-abnormal nodes towards hard abnormal nodes.
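A minimal sketch of these three steps. It assumes mean aggregation for the neighbor embedding (the text only says the selected nodes embed their neighbors through the adjacency matrix), an isolated selected node keeping its own features, and noise added to the normal features of the selected set; the ratio, mean, and variance values are hypothetical. Python's `random.gauss` takes a standard deviation, so the variance is square-rooted first.

```python
import math
import random


def select_nodes_to_convert(normal_ids, ratio, rng):
    """Randomly pick a `ratio` share of labeled normal node IDs (at least one)."""
    k = max(1, round(ratio * len(normal_ids)))
    return sorted(rng.sample(normal_ids, k))


def neighbor_embedding(ids, adj, H):
    """Initial pseudo-abnormal features: mean of each selected node's neighbours (assumed)."""
    out = []
    for i in ids:
        nbrs = [j for j, a in enumerate(adj[i]) if a]
        if not nbrs:                      # assumption: isolated node keeps its own features
            out.append(list(H[i]))
            continue
        out.append([sum(H[j][k] for j in nbrs) / len(nbrs) for k in range(len(H[i]))])
    return out


def add_gaussian_noise(X, mean, variance, rng):
    """Add i.i.d. normal noise with the given mean and variance to every entry."""
    std = math.sqrt(variance)             # random.gauss expects a standard deviation
    return [[x + rng.gauss(mean, std) for x in row] for row in X]


# Hypothetical toy graph: 4 labeled normal nodes with 2-dimensional features.
H = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
adj = [[0, 1, 1, 0],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]]
rng = random.Random(0)
ids = select_nodes_to_convert([0, 1, 2, 3], ratio=0.5, rng=rng)
init_pseudo = neighbor_embedding(ids, adj, H)
noisy_normal = add_gaussian_noise([H[i] for i in ids], mean=0.0, variance=0.01, rng=rng)
```

Seeding a dedicated `random.Random` instance keeps the selection and the noise reproducible across runs.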
Subsequently, an FC layer and a ReLU activation layer are applied in sequence to the initial pseudo-abnormal node features to produce the pseudo-abnormal node features. Then, the features of the labeled normal nodes, indexed by their ID set, are spliced with the pseudo-abnormal node features along the node dimension. Meanwhile, the pseudo-abnormal node features replace the features of the selected nodes in the node feature matrix, using the selected node IDs to locate the corresponding nodes. Finally, a multilayer perceptron (MLP) [52] consisting of three FC layers and two ReLU activation layers performs node classification.
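A minimal sketch of the splice and replace operations under stated assumptions: node features are rows of a list, labels follow the BCE convention of this section (1 for normal, 0 otherwise), splicing stacks the two node sets along the node dimension, and the normal rows are taken from the unmodified feature matrix. The MLP classifier is omitted; the IDs and features are hypothetical.

```python
def splice_and_replace(H, normal_ids, convert_ids, pseudo_feats):
    """Return (H_replaced, X_train, y_train).

    H_replaced: copy of H whose converted rows hold pseudo-abnormal features.
    X_train / y_train: labeled normal rows (label 1) stacked with pseudo-abnormal rows (label 0).
    """
    H_replaced = [list(row) for row in H]
    for idx, feat in zip(convert_ids, pseudo_feats):   # node IDs locate the rows to replace
        H_replaced[idx] = list(feat)
    X_train = [list(H[i]) for i in normal_ids] + [list(f) for f in pseudo_feats]
    y_train = [1] * len(normal_ids) + [0] * len(pseudo_feats)
    return H_replaced, X_train, y_train


H = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
H_rep, X_train, y_train = splice_and_replace(H, normal_ids=[0, 1], convert_ids=[1],
                                             pseudo_feats=[[9.0, 9.0]])
```

The original matrix is copied rather than modified in place, so the normal features remain available for splicing.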
3.6 Loss function
To generate abnormal nodes that are more conducive to model learning, we propose an AA loss function and AFSR loss function to constrain the generation of pseudo-abnormal nodes. Meanwhile, the Binary Cross-entropy (BCE) loss is set as the basic loss function.
3.6.1 Loss function for the degree of neighbor aggregation.
To incorporate the graph structure prior of abnormal nodes into the generation of pseudo-abnormal nodes, BF-HGN exploits the difference in adjacency aggregation degree, forcing the aggregation degree of pseudo-abnormal nodes towards their neighbors to be lower than that of normal nodes. Given the features of all nodes, the AA loss first computes a normalized similarity matrix between each target node and its neighbor nodes of the various types, based on the Euclidean distance between their feature vectors; this similarity matrix covers all nodes. The similarity matrix is then multiplied by the original adjacency matrix to obtain the weighted similarity matrix.
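The text does not give the normalization that turns Euclidean distances into similarities. The sketch below assumes s = 1 / (1 + distance), which maps distances into (0, 1], and reads "multiplied by the adjacency matrix" as an element-wise product, so only similarities to actual neighbors survive. The toy features are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def weighted_similarity(H, A):
    """S[i][j] = 1 / (1 + ||h_i - h_j||) (assumed normalization), masked by A element-wise."""
    n = len(H)
    S = [[1.0 / (1.0 + euclidean(H[i], H[j])) for j in range(n)] for i in range(n)]
    return [[S[i][j] * A[i][j] for j in range(n)] for i in range(n)]


H = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
W_sim = weighted_similarity(H, A)
```

Identical neighbors get similarity 1.0, and the entry for a distance of 5 drops to 1/6; non-neighbors are zeroed by the mask.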
Next, based on the weighted similarity matrix, we calculate the aggregation degree of each target node towards each class of adjacent nodes in the graph structure. For node i of class p and its adjacent nodes of class q, the numerator is the total weighted similarity between node i and those adjacent nodes, and the denominator is the number of adjacent nodes of class q of node i in the graph structure. Dividing the two yields the aggregation degree; the higher its value, the closer the relationship between node i and its neighbors.
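A minimal sketch of the aggregation degree as described: for each node and each neighbor class, the summed weighted similarity divided by the number of neighbors of that class. The degree is left undefined (absent from the result) when a node has no neighbor of a given class, which is an assumption; the weighted similarity values and class names below are hypothetical.

```python
def aggregation_degree(W_sim, A, node_class):
    """deg[(i, q)] = sum of W_sim[i][j] over neighbours j of class q, divided by their count."""
    n = len(A)
    deg = {}
    for i in range(n):
        for q in sorted(set(node_class)):
            nbrs = [j for j in range(n) if A[i][j] and node_class[j] == q]
            if nbrs:                       # assumption: skip classes with no neighbours
                deg[(i, q)] = sum(W_sim[i][j] for j in nbrs) / len(nbrs)
    return deg


node_class = ["tx", "tx", "addr"]
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
W_sim = [[0.0, 0.5, 0.2],
         [0.5, 0.0, 0.0],
         [0.2, 0.0, 0.0]]
deg = aggregation_degree(W_sim, A, node_class)
```

Dividing by the neighbor count keeps high-degree nodes from dominating merely because they have more neighbors.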
Then, based on the aggregation degrees, the mean affinities of the normal and abnormal nodes across the node classes are calculated. For normal nodes, the aggregation degrees of the nodes in the normal node index set of each class are averaged, using the number of normal nodes of that class as the denominator; the mean affinity of abnormal nodes is computed in the same way over the abnormal node index sets. The two means reflect the average aggregation degree of the normal and abnormal node sets, respectively.
Finally, the adjacency adaptation loss is derived from these two means: the overall difference between the normal and abnormal mean aggregation degrees is subtracted from a predefined parameter. This intermediate result measures how far the normal-versus-abnormal aggregation degree difference of the pseudo-abnormal nodes generated by BF-HGN deviates from the expected difference.
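A minimal sketch of the two means and the AA loss. Three points are assumptions rather than statements from the text: each node's affinity is a single number (for example, its aggregation degrees averaged over neighbor classes), the class-wise means are averaged with equal weight across node classes, and the subtraction from the predefined parameter is clipped at zero (a hinge), so the loss vanishes once normal nodes out-aggregate pseudo-abnormal nodes by at least that margin. All values are hypothetical.

```python
def class_mean(affinity, ids, node_class):
    """Mean affinity within each node class, then averaged over classes (assumed)."""
    per_class = {}
    for i in ids:
        per_class.setdefault(node_class[i], []).append(affinity[i])
    means = [sum(v) / len(v) for v in per_class.values()]
    return sum(means) / len(means)


def aa_loss(affinity, normal_ids, pseudo_ids, node_class, margin):
    """Hinge on the normal-minus-abnormal affinity gap (clipping at zero is assumed)."""
    gap = (class_mean(affinity, normal_ids, node_class)
           - class_mean(affinity, pseudo_ids, node_class))
    return max(0.0, margin - gap)


affinity = [0.9, 0.7, 0.3, 0.1]            # hypothetical node-level aggregation degrees
node_class = ["tx", "addr", "tx", "addr"]
loss_small_margin = aa_loss(affinity, [0, 1], [2, 3], node_class, margin=0.5)
loss_large_margin = aa_loss(affinity, [0, 1], [2, 3], node_class, margin=1.0)
```

Here the gap is about 0.6, so the loss is zero for a margin of 0.5 and about 0.4 for a margin of 1.0.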
3.6.2 Adaptive feature space regulation loss function.
The AA loss function considers only the adjacency aggregation degree when generating pseudo-anomalous nodes and ignores how the node sets are distributed in the feature space. Motivated by the observation that the feature distribution of hard anomalous node sets is more conducive to model learning, we design the Adaptive Feature Space Regulation (AFSR) loss function. It models anomalous node sets that are highly similar to the normal node set in the feature space. This strengthens the model’s ability to learn complex decision boundaries and to capture the key feature differences at class boundaries during training. The AFSR loss function is:

$$\mathcal{L}_{AFSR} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\sum_{j=1}^{d}\left(\hat{x}_{ij} - \tilde{x}_{ij}\right)^{2}}$$

From the innermost term outward: $\hat{x}_{ij}$ is the feature value of the generated pseudo-abnormal node in the $i$th sample and $j$th embedding dimension, and $\tilde{x}_{ij}$ is the feature value at the corresponding position of the noisy normal node. Squaring their difference removes its sign and weights larger deviations more heavily. The intermediate layer sums the squared differences over all $d$ embedding dimensions and takes the square root, giving a value that reflects the overall feature difference of the sample. Finally, the function sums these values over all nodes and divides by the number of nodes $N$ to obtain the average AFSR loss.
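The AFSR loss amounts to the mean Euclidean distance between each generated pseudo-abnormal embedding and its paired noisy normal embedding. The plain-Python sketch below mirrors the three nested operations described above. Representing embeddings as lists of floats and pairing them by index are assumptions made for illustration.

```python
import math


def afsr_loss(pseudo_abnormal, noisy_normal):
    """Mean Euclidean distance between paired pseudo-abnormal and noisy normal embeddings."""
    if len(pseudo_abnormal) != len(noisy_normal) or not pseudo_abnormal:
        raise ValueError("need one noisy normal embedding per pseudo-abnormal node")
    total = 0.0
    for x_hat, x_tilde in zip(pseudo_abnormal, noisy_normal):
        if len(x_hat) != len(x_tilde):
            raise ValueError("embedding dimensions must match")
        # Innermost: squared difference in every embedding dimension j.
        squared = sum((a - b) ** 2 for a, b in zip(x_hat, x_tilde))
        # Intermediate: square root over all dimensions (per-node Euclidean distance).
        total += math.sqrt(squared)
    # Outermost: average over the N nodes.
    return total / len(pseudo_abnormal)


print(afsr_loss([[3.0, 4.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 1.0]]))  # 2.5
```

The first pair is 5.0 apart and the second pair coincides, so the average is 2.5. Under this reading, minimizing the loss pulls pseudo-abnormal nodes toward the region occupied by normal nodes, which is what makes them hard anomalies.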
3.6.3 Overall loss function.
On top of the AA and AFSR loss functions, we complete the classification loss system by introducing the binary cross-entropy (BCE) loss [53], defined as:

$$\mathcal{L}_{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[w_{1}\,y_{i}\log p_{i} + w_{0}\,(1-y_{i})\log\left(1-p_{i}\right)\right]$$

where $p_i$ denotes the probability, output by the classifier, that node $i$ is a normal node; $y_i$ is the label of node $i$, equal to 1 if the node is labeled normal and 0 otherwise; and $w_1$ and $w_0$ are the class weights.
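A class-weighted BCE matching these definitions can be sketched in plain Python. Section 4.3 sets the two class weights to 0.35 and 0.65, but this text does not say which weight multiplies which term. The sketch assumes the larger weight goes to the non-normal (minority) term. The averaging over nodes and the clamping constant `eps` are also assumptions.

```python
import math


def weighted_bce(probs, labels, w_normal=0.35, w_other=0.65, eps=1e-12):
    """Class-weighted binary cross-entropy averaged over nodes.

    probs[i]  -- predicted probability that node i is normal
    labels[i] -- 1 if node i is labeled normal, else 0
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equally long")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= w_normal * y * math.log(p) + w_other * (1 - y) * math.log(1.0 - p)
    return total / len(probs)


print(round(weighted_bce([0.9, 0.2], [1, 0]), 4))
```

With both weights set to 1.0, the function reduces to the ordinary mean binary cross-entropy.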
Finally, a scaling hyperparameter performs a weighted summation of the three loss functions to adjust the influence of each loss on BF-HGN, and the resulting loss is accumulated over the set of time steps.
4 Experiments
4.1 Dataset
Elliptic++ is currently the largest publicly available heterogeneous dataset of Bitcoin transactions [13,14]. It contains two types of nodes, transaction (Tr) and address (Addr), and four types of edge relationships: Tr-Tr, Addr-Addr, Tr-Addr, and Addr-Tr. The dataset is divided into 49 time points: time points 1–30 form the training set, 31–35 the validation set, and 36–49 the test set. To fit the research setting, the dataset is preprocessed in two ways. First, the edge relations are divided by time point to make the temporal correlation explicit. Second, the labeled abnormal samples are relabeled as the unknown class.
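The temporal split and the two preprocessing steps can be sketched as below. The label strings `"normal"`, `"abnormal"`, and `"unknown"` and the `(src, dst, t)` edge triples are hypothetical stand-ins. The Elliptic++ files use their own encoding, which this sketch does not reproduce.

```python
from collections import defaultdict


def split_of(t):
    """Assign a time step (1-49) to the train/validation/test partition."""
    if 1 <= t <= 30:
        return "train"
    if 31 <= t <= 35:
        return "val"
    if 36 <= t <= 49:
        return "test"
    raise ValueError(f"time step {t} is outside 1-49")


def edges_by_time(edges):
    """Preprocessing 1: group (src, dst, t) edges into one edge list per time point."""
    grouped = defaultdict(list)
    for src, dst, t in edges:
        grouped[t].append((src, dst))
    return dict(grouped)


def hide_abnormal_labels(labels):
    """Preprocessing 2: relabel every labeled abnormal node as unknown."""
    return {node: ("unknown" if y == "abnormal" else y) for node, y in labels.items()}


sizes = {s: sum(1 for t in range(1, 50) if split_of(t) == s) for s in ("train", "val", "test")}
print(sizes)  # {'train': 30, 'val': 5, 'test': 14}
```

After the second step, only normal nodes keep a label. This is what turns the task into semi-supervised detection with anomalies hidden among the unknown nodes.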
The node data are shown in Fig 5. The Tr node set contains 167-dimensional features (including time information) and a total of 203,769 nodes, of which 42,019 are positive samples (legitimate transactions). The Addr node set contains 56-dimensional features (including time information) and a total of 920,691 nodes, of which 338,871 are positive samples (legitimate addresses). All remaining nodes are labeled as unknown. As shown in Fig 6, the edge data comprise four types: Tr-Tr (234,355), Addr-Addr (2,868,964), Tr-Addr (837,124), and Addr-Tr (477,117). The edges carry no features.
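As a quick arithmetic check on the counts above, the following snippet computes the share of each node type that carries a normal label and the total number of edges. It only restates the reported statistics.

```python
node_counts = {
    "Tr": {"total": 203_769, "labeled_normal": 42_019},
    "Addr": {"total": 920_691, "labeled_normal": 338_871},
}
edge_counts = {
    "Tr-Tr": 234_355,
    "Addr-Addr": 2_868_964,
    "Tr-Addr": 837_124,
    "Addr-Tr": 477_117,
}

# Fraction of each node type that carries a normal (legitimate) label.
normal_share = {k: v["labeled_normal"] / v["total"] for k, v in node_counts.items()}
total_edges = sum(edge_counts.values())

for node_type, share in normal_share.items():
    print(f"{node_type}: {share:.1%} labeled normal")
print(f"total edges: {total_edges:,}")
```

About 20.6% of Tr nodes and 36.8% of Addr nodes are labeled normal, and the four edge types total 4,417,560 edges.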
4.2 Evaluation protocol
The binary confusion matrix is the core framework for quantifying the agreement between ground truth and predictions. Three key metrics are derived from it: precision, recall, and F1 value.
Precision is calculated as:
$$\text{Precision} = \frac{TP}{TP + FP}$$
where TP is the number of positive cases correctly identified and FP is the number of negative cases misclassified as positive. This metric reflects how accurately the model identifies positive cases. Recall is calculated as:
$$\text{Recall} = \frac{TP}{TP + FN}$$
where FN is the number of missed positive cases. Recall measures the model’s ability to capture actual positive cases.
The F1 value is the harmonic mean of precision and recall:
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
By balancing precision and recall, this metric gives a comprehensive measure of the model’s overall classification performance.
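The three formulas translate directly into code. This sketch counts TP, FP, and FN from binary label lists, treating `positive=1` as the positive class for illustration. It returns 0.0 when a denominator is zero, which is a common convention rather than something the text specifies.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and their harmonic mean (F1); 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


tp, fp, fn = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(tp, fp, fn)  # 2 1 1
print(precision_recall_f1(tp, fp, fn))
```

For example, with 6 true positives, 2 false positives, and 4 false negatives, precision is 0.75, recall is 0.6, and F1 is 2/3.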
4.3 Implementation
The experiments are implemented in PyTorch and use NVIDIA RTX 3090 GPUs to accelerate model training and inference. Adam [54] is used as the optimizer, with the initial learning rate set to 0.001 to balance training stability and convergence speed. For data sampling, subgraphs from five consecutive time points are randomly selected as each training sample to enhance the model’s generalization ability. The model is trained for 100 epochs so that the data features can be fully learned. To handle class imbalance, a binary cross-entropy loss function is applied with the two class weights set to 0.35 and 0.65, respectively, to improve recognition of the minority class. In addition, the variance and mean in Eq. (23) are set to 0.005 and 0.015, respectively, and the neighbor aggregation difference C in Eq. (21) is set to 0.7.
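The sampling scheme described above draws five consecutive time points per training sample from the training range 1–30. It can be sketched with the standard library. Drawing the window start uniformly at random and the helper name `sample_windows` are assumptions: the text does not say how windows are drawn or how many are used per epoch. The Adam/PyTorch training loop is not shown.

```python
import random


def sample_windows(train_steps, window=5, num_samples=1, seed=None):
    """Randomly draw windows of `window` adjacent entries from the sorted train_steps.

    The windows are consecutive time points only if train_steps has no gaps.
    """
    steps = sorted(train_steps)
    if len(steps) < window:
        raise ValueError("not enough time steps for one window")
    rng = random.Random(seed)
    starts = range(len(steps) - window + 1)  # every valid window start index
    return [steps[s:s + window] for s in (rng.choice(starts) for _ in range(num_samples))]


for w in sample_windows(range(1, 31), window=5, num_samples=3, seed=0):
    print(w)  # each window holds five consecutive training time steps
```

Sampling windows rather than always training on the full sequence exposes the model to many different temporal contexts. This is the generalization argument made above.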
4.4 Comparison with state-of-the-art methods
In this paper, we propose a new baseline method for Bitcoin transaction anomaly detection and compare it with existing mainstream static and dynamic, semi-supervised and unsupervised methods for homogeneous graphs. Table 1 presents the comparison of BF-HGN with multiple baseline methods on the Elliptic++ dataset. To ensure reliable results, each method is run 10 times and the average value is reported. The analysis shows that BF-HGN outperforms the comparison methods.
First, compared with the static semi-supervised methods GCN [15] and Skip-GCN [55], BF-HGN achieves +19.97%/+8.24%@F1 on the Tr and Addr node sets, respectively. This result supports the significant role of embedding temporal information in node features and of introducing heterogeneous information across node classes. Second, among the static unsupervised methods AEGIS [56] and TAM [38], TAM performs better, and BF-HGN improves on it by +11.8%/+8.27%@F1 on the Tr and Addr node sets, respectively. Compared with the dynamic unsupervised method GADY [57], BF-HGN also achieves +10.27%/+7.4%@F1. This indicates that training with partially labeled normal nodes makes the model more targeted and improves both performance and generalization. Finally, among dynamic semi-supervised methods, GPN [58], which builds on EvolveGCN-O [17], is tailored to the Bitcoin anomaly detection task, yet BF-HGN still achieves +2.49%/+2.17%@F1 on the Tr and Addr node sets. This confirms that embedding heterogeneous information between node classes enriches the model input and enhances its feature expression capability.
Furthermore, to verify the advantage of BF-HGN in heterogeneous graph modeling, we compare BF-HGN with three heterogeneous graph methods: HAN [59], HGT [60], and HetGNN [61]. On the Tr and Addr node sets, BF-HGN outperforms HAN by +14%/+4.94%@F1, HGT by +18.9%/+14.69%@F1, and HetGNN by +8.37%/+2.64%@F1. This demonstrates that the bidirectional fusion and heterogeneous feature extraction mechanism of BF-HGN is better suited to dynamic Bitcoin transaction graphs than general heterogeneous graph models.
Additionally, this section compares the three feature transformation/expansion variants proposed in Section 2.1: BF-HGN (& FC), BF-HGN (& FE), and BF-HGN. They achieve 61.84%/62.2%/63.01%@F1 on the Tr node set and 57.43%/57.07%/57.84%@F1 on the Addr node set, respectively. These results show that the joint feature extraction strategy of Bi-EvolveGCN and Bi-EvolveRGCN performs best, confirming the pivotal role of embedding node class relationship information in the feature extraction stage.
Overall, BF-HGN performs better on the Tr node set than on the Addr node set. The dataset characteristics show that the Tr node set is smaller and has a lower proportion of abnormal nodes. This suggests that the method is particularly advantageous on small datasets with a low proportion of anomalies.
In summary, BF-HGN has two distinctive advantages. First, it effectively mines associative information across node classes, providing richer semantic information for classifying single-category node sets. Second, it balances the sample classes when generating pseudo-abnormal node samples, which significantly improves anomaly detection performance.
4.5 Ablation experiment
To verify the effectiveness of BF-HGN and the contribution of each innovative module to performance, this section conducts ablation experiments on the core modules: Bi-EvolveGCN, Bi-EvolveRGCN, CC, the AA loss function, and the AFSR loss function.
The following methods are validated in turn: 1) Base (EvolveGCN + EvolveRGCN + MLP); 2) Base + Bi-EvolveGCN (Bi-EvolveGCN + EvolveRGCN + MLP); 3) Base + Bi-EvolveRGCN (EvolveGCN-O + Bi-EvolveRGCN + MLP); 4) Base + MFFE (Bi-EvolveGCN-O + Bi-EvolveRGCN + MLP); 5) Base + one loss (Base embeds a class-balanced classifier and removes one of the two loss functions); 6) Base + the other loss (Base embeds a class-balanced classifier and removes the other loss function); 7) Base + CC (Base embeds a class-balanced classifier with both loss functions); and 8) BF-HGN. As shown in Table 2, compared with the Base method, BF-HGN achieves +10.61%/+10.58%@F1, +10.48%/+10.86%@Pre, and +10.35%/+9.92%@Re on the Tr and Addr node sets, respectively.
Next, methods 2), 3), and 4) differ mainly in the feature extraction module. Compared with the Base model on the Tr and Addr node sets, 2) achieves +1.19%/+1.09%@F1, 3) achieves +0.97%/+1.65%@F1, and 4) achieves +4.06%/+3.86%@F1. These results demonstrate the clear advantage of bidirectional temporal feature embedding over the unidirectional mechanism: fusing forward and reverse temporal information captures contextual dependencies more effectively.
Additionally, to examine the model’s ability to capture long-range temporal dependencies, this section compares the methods on the subgraphs of time points 35–49; the results are shown in Fig 7. Taking F1 as the core metric, every method performs well over time points 35–42, while performance fluctuates from time point 43 onward. Further analysis shows that the bidirectional temporal embedding methods (2), 3), 4), and 8)) significantly outperform Base in capturing long-range temporal dependencies.
Next, methods 5), 6), and 7) differ mainly in the classification phase, and all improve significantly on Base. On the Tr and Addr node sets, 5) achieves +3.11%/+2.8%@F1, 6) achieves +4.96%/+4.9%@F1, and 7) achieves +7.22%/+8.07%@F1. Further analysis yields three observations. First, CC balances samples across classes by generating discrete abnormal nodes. This lets the model learn feature representations from richer class distributions and prevents a single class from dominating training, improving its ability to capture multi-class patterns. Second, the AA loss function plays a key role in optimizing the distribution of pseudo-abnormal nodes. It integrates the distance difference between real abnormal samples and their neighboring nodes and adjusts the neighbor aggregation features of the generated pseudo-abnormal nodes accordingly, which enhances the model’s ability to discriminate neighboring anomalous node patterns. Finally, the AFSR loss function aligns the feature boundaries of the pseudo-abnormal and hard abnormal sample classes in feature space, prompting the model to learn complex anomaly class boundaries.
Finally, to further validate that the bidirectional fusion module captures the long-range temporal dependencies typically overlooked by unidirectional models, we perform a node-level case study. Specifically, we randomly select one Tr node (ID: 54771037) and one Addr node (ID: 1Ej…isz) at time step 35. We then analyze the cross-timestep anomaly correlation patterns of these two nodes under both the Base model and BF-HGN. The statistics are summarized in Table 3, where the Truth column gives the ground-truth number of associated anomalous nodes.
Between time steps 36 and 42, the anomaly detection performance of the Base model and BF-HGN differs little. After time step 42, however, BF-HGN correctly detects more anomalous nodes than the Base model at almost every time step for both the Tr and Addr node categories. In particular, at time step 46, BF-HGN detects 13 more anomalous Addr nodes associated with 1Ej…isz than the Base model. These results indicate that BF-HGN has stronger long-range anomaly detection capability than the Base method.
Overall, the anomaly detection performance of BF-HGN shows an upward trend after time step 35. This quantitatively supports the conclusion that the proposed bidirectional fusion model has stronger long-range anomaly mining ability than the unidirectional Base model.
4.6 Computational complexity analysis
Because the bidirectional temporal fusion mechanism processes the sequence in both the forward and reverse temporal directions, it could potentially double the computational cost. To clarify the actual overhead, this section presents a systematic time and space complexity analysis of BF-HGN and compares it with the baseline method.
First, we analyze the complexity of the baseline model EvolveGCN-O. For a dynamic graph with time steps,
transaction nodes,
address nodes and d-dimensional node features, the time complexity of EvolveGCN-O is
, and its space complexity is
, which is used to store node features and model parameters.
For the proposed BF-HGN, its bidirectional temporal fusion module consists of Bi-EvolveGCN and Bi-EvolveRGCN. The time complexity of Bi-EvolveGCN is and the time complexity of Bi-EvolveRGCN is
. Benefiting from the parameter sharing mechanism in the forward and reverse temporal update processes, the overall time complexity of the bidirectional fusion stage is
, which only increases by a constant factor (less than twice) compared with the baseline. Meanwhile, the space complexity of BF-HGN remains
, which is completely consistent with that of the baseline model.
Furthermore, the feature dimensions of the Elliptic++ dataset are only 56 (Addr) and 167 (Tr), and the feature extraction stage has only 4 neural network layers. This shallow structure yields very few model parameters and low memory usage, so on modern GPU hardware the extra memory overhead is almost negligible.
In summary, compared with the baseline EvolveGCN-O, BF-HGN only introduces a small amount of additional computational overhead, but achieves significant performance improvement in Bitcoin transaction anomaly detection. The proposed model strikes a desirable balance between computational efficiency and detection performance.
4.7 Quantitative experiment
In this section, quantitative experiments are conducted to optimize the key hyperparameters of BF-HGN, aiming to determine the optimal parameter configuration. Figs 8–10 present the experimental results of the three core hyperparameters, with detailed analyses carried out in sequence.
First, we examine the layer-number combination [x, y] of Bi-EvolveGCN and Bi-EvolveRGCN, with each layer count ranging from 1 to 3. As shown in Fig 8, BF-HGN performs best with one Bi-EvolveGCN layer and two Bi-EvolveRGCN layers. Specifically, the combination [1, 1] loses effective information during classification because of insufficient feature extraction. In contrast, [1, 2] achieves the best synergy between feature extraction and the classifier, giving excellent node detection performance. Configurations beyond [1, 2], such as [2, 1] or [3, 2], tend to overfit, which reduces generalization ability and ultimately degrades detection performance.
During model training, we systematically tune the scaling parameter of the loss function in Eq. (24) to achieve balanced optimization of multiple loss terms. The experimental results are shown in Fig 9. In experiments on the Tr node set, model performance peaks at
, which can effectively balance the optimization objectives of each loss term. When
deviates from the optimal value in either direction, model performance declines. This is because an excessively high weight on a single loss term unbalances the optimization process and suppresses the adjustment of model parameters by the other loss terms. In experiments on the Addr node set, the model achieves optimal performance at
. Based on the above experimental results,
is selected as the final value.
Finally, we systematically analyze how the proportion of pseudo-anomalous samples, introduced in Section 3.2, influences BF-HGN. As shown in Fig 10, on the Tr node set, when
, the model performance reaches the optimum, and both the F1 value and key evaluation indicators show peak values; as
continues to increase, the detection accuracy shows a downward trend, so
is determined as the optimal value. On the Addr node set, the best detection effect is also achieved when
, and the recognition ability of BF-HGN for normal/anomalous samples reaches the optimal balance. Therefore,
is finally set.
4.8 Visualization experiment
To explore the characteristics of the pseudo-abnormal nodes generated by BF-HGN, this experiment selects the subgraph at time step 5 for visual analysis and uses the t-SNE dimensionality reduction algorithm to visualize the feature space. Fig 11 compares the base method and BF-HGN on the Tr and Addr node sets. Normal nodes are marked with blue circles and pseudo-abnormal nodes generated by BF-HGN with green circles, while green dashed lines frame the region where the pseudo-abnormal nodes are distributed.
As shown in Fig 11a, the normal nodes in the base method form a clear, aggregated boundary in their spatial distribution. The node clusters generated by BF-HGN, by contrast, are concentrated in a specific region at the lower-left boundary of the feature space, where the pseudo-abnormal nodes are mainly located. Although the distributions of pseudo-abnormal and normal nodes overlap substantially, closer inspection reveals a subtle spatial offset between them. As shown in Fig 11b, the node distributions of both the base method and BF-HGN exhibit clear boundary features. The difference is that the pseudo-abnormal nodes generated by BF-HGN cluster toward the left boundary of the space and do not fully overlap with the normal node set, forming a distinctive distribution difference.
The visualization results show that the pseudo-abnormal nodes generated by BF-HGN are well concealed and hard to distinguish, because their spatial distribution is similar to, yet distinct from, that of normal nodes. Such samples provide more challenging training data for the anomaly detection model and can effectively enhance its ability to recognize anomalies in complex scenarios.
5 Conclusion and outlook
5.1 Conclusion
In this paper, we carry out systematic research on the Bitcoin transaction anomaly detection task and propose several innovative methods. First, to meet practical requirements, we define the dynamic heterogeneous graph semi-supervised Bitcoin anomaly detection task and design the Bidirectional Fusion Heterogeneous Graph Network (BF-HGN) as the basic framework. Second, for feature extraction, we extend RGCN into EvolveRGCN and combine it with EvolveGCN in a progressive scheme. This scheme introduces LSTM to capture temporal features and mines dynamic features in depth through a fusion strategy. Third, we propose the Multi-type Feature Fusion Extractor, which improves dynamic relationship modeling by capturing the associations between subgraphs at preceding and succeeding time points. Finally, we address the class imbalance caused by unlabeled anomalous samples by designing a Class-balanced Classifier. It balances the class distribution of the training data by generating pseudo-abnormal nodes constrained by the AA and AFSR loss functions.
5.2 Outlook
Future research can extend this work to a broader range of financial transaction scenarios, thereby strengthening risk prevention and control capabilities. Further exploring the design space of feature extraction and fusion strategies could reveal latent associations in complex data and inject richer semantic information into the model. Meanwhile, the loss functions should be further refined to improve the quality of the generated pseudo-anomalous nodes, promoting the security and stability of anomaly detection in Bitcoin transactions and related fields. Beyond technical advances, future studies should incorporate regulatory, ethical, and societal considerations into the design of anomaly detection systems. Following the sociotechnical framework proposed by Rahman et al. [62], such work could better support responsible and trustworthy FinTech development in blockchain transaction surveillance, particularly with respect to regulatory compliance, transparency, and social accountability.
References
- 1. Nakamoto S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. http://bitcoin.org/bitcoin.pdf
- 2. Nerurkar P, Bhirud S, Patel D, Ludinard R, Busnel Y, Kumari S. Supervised learning model for identifying illegal activities in Bitcoin. Appl Intell. 2020;51(6):3824–43.
- 3. Shahen Shah AFM, Karabulut MA, Akhter AFMS, Mustari N, Pathan A-SK, Rabie KM, et al. On the Vital Aspects and Characteristics of Cryptocurrency—A Survey. IEEE Access. 2023;11:9451–68.
- 4. Zheng B, Zhu L, Shen M, Du X, Guizani M. Identifying the vulnerabilities of bitcoin anonymous mechanism based on address clustering. Sci China Inf Sci. 2020;63(3).
- 5. Lee S, Yoon C, Kang H, Kim Y, Kim Y, Han D, et al. Cybercriminal minds: an investigative study of cryptocurrency abuses in the dark web. 26th Annual Network and Distributed System Security Symposium (NDSS) 2019. Inf Sci. 2019.
- 6. Paquet-Clouston M, Haslhofer B, Dupont B. Ransomware payments in the bitcoin ecosystem. J Cybersec. 2019;5(1):tyz003.
- 7. Liu J, Chen J, Wu J, Wu Z, Fang J, Zheng Z. Fishing for Fraudsters: Uncovering Ethereum Phishing Gangs With Blockchain Data. IEEE Trans Inform Forensic Secur. 2024;19:3038–50.
- 8. Wu J, Liu J, Zhao Y, Zheng Z. Analysis of cryptocurrency transactions from a network perspective: An overview. J Netw Comput Appl. 2021;190:103139.
- 9. Cholevas C, Angeli E, Sereti Z, Mavrikos E, Tsekouras GE. Anomaly Detection in Blockchain Networks Using Unsupervised Learning: A Survey. Algorithms. 2024;17(5):201.
- 10. Sanjay Rai G, Goyal SB, Chatterjee P. Anomaly detection in blockchain using machine learning. Computational Intelligence for Engineering and Management Applications: Select Proceedings of CIEMA 2022. Singapore. Springer Nature Singapore. 2023. p. 487–99. https://doi.org/10.1007/978-981-19-8493-8_37
- 11. Siddique Q. Anomaly Detection in Blockchain Transactions within the Metaverse Using Anomaly Detection Techniques. J Curr Res Blockchain. 2024;1(2):155–65.
- 12. Weber M, Domeniconi G, Chen J, Weidele DKI, Bellei C, Robinson T, et al. Anti-money laundering in bitcoin: Experimenting with graph convolutional networks for financial forensics. arXiv preprint 2019.
- 13. Elmougy Y, Liu L. Demystifying fraudulent transactions and illicit nodes in the bitcoin network for financial forensics. Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD). 2023: 3979–90. https://doi.org/10.1145/3580305.3599803
- 14. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks. arXiv preprint. 2016.
- 15. Zhang S, Tong H, Xu J, Maciejewski R. Graph convolutional networks: a comprehensive review. Comput Soc Netw. 2019;6(1):11. pmid:37915858
- 16. Bandyopadhyay S, N L, Vivek S V, Murty M N. Outlier resistant unsupervised deep architectures for attributed network embedding. Proceedings of the 13th international conference on web search and data mining (WSDM). 2020: 25–33. https://doi.org/10.1145/3336191.3371788
- 17. Pareja A, Domeniconi G, Chen J, Ma T, Suzumura T, Kanezashi H, et al. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs. AAAI. 2020;34(04):5363–70.
- 18. Zhao T, Zhang X, Wang S. Graphsmote: Imbalanced node classification on graphs with graph neural networks. Proceedings of the 14th ACM international conference on web search and data mining (WSDM). 2021: 833–41. https://doi.org/10.1145/3437963.3441720
- 19. Liu C, Xu Y, Sun Z. Directed dynamic attribute graph anomaly detection based on evolved graph attention for blockchain. Knowl Inf Syst. 2024;66(2):989–1010.
- 20. Schlichtkrull M, Kipf TN, Bloem P, Van Den Berg R, Titov I, Welling M. Modeling relational data with graph convolutional networks. European semantic web conference (ESWC). Cham: Springer International Publishing, 2018: 593–607.
- 21. Greff K, Srivastava RK, Koutnik J, Steunebrink BR, Schmidhuber J. LSTM: A Search Space Odyssey. IEEE Trans Neural Netw Learn Syst. 2017;28(10):2222–32. pmid:27411231
- 22. Jin R, Chen Z, Wu K, Wu M, Li X, Yan R. Bi-LSTM-Based Two-Stream Network for Machine Remaining Useful Life Prediction. IEEE Trans Instrum Meas. 2022;71:1–10.
- 23. Li C, Zhan G, Li Z. News text classification based on improved Bi-LSTM-CNN. 2018 9th International conference on information technology in medicine and education (ITME). IEEE, 2018: 890–3. https://doi.org/10.1109/ITME.2018.00199
- 24. Roy DK, Sarkar TK, Kamar SSA, Goswami T, Muktadir MA, Al-Ghobari HM, et al. Daily Prediction and Multi-Step Forward Forecasting of Reference Evapotranspiration Using LSTM and Bi-LSTM Models. Agronomy. 2022;12(3):594.
- 25. Nayyer N, Javaid N, Akbar M, Aldegheishem A, Alrajeh N, Jamil M. A New Framework for Fraud Detection in Bitcoin Transactions Through Ensemble Stacking Model in Smart Cities. IEEE Access. 2023;11:90916–38.
- 26. Yue S, Li P, Hao P. SVM classification: Its contents and challenges. Appl Math Chin Univ. 2003;18(3):332–42.
- 27. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32.
- 28. Zhang Y, Tino P, Leonardis A, Tang K. A Survey on Neural Network Interpretability. IEEE Trans Emerg Top Comput Intell. 2021;5(5):726–42.
- 29. Ekle OA, Eberle W. Anomaly Detection in Dynamic Graphs: A Comprehensive Survey. ACM Trans Knowl Discov Data. 2024;18(8):1–44.
- 30. Zhou Y, Luo X, Zhou M. Cryptocurrency Transaction Network Embedding From Static and Dynamic Perspectives: An Overview. IEEE/CAA J Autom Sinica. 2023;10(5):1105–21.
- 31. Duan M, Zheng T, Gao Y, Wang G, Feng Z, Wang X. DGA-GNN: Dynamic Grouping Aggregation GNN for Fraud Detection. AAAI. 2024;38(10):11820–8.
- 32. Lo WW, Kulatilleke GK, Sarhan M, Layeghy S, Portmann M. Inspection-L: self-supervised GNN node embeddings for money laundering detection in bitcoin. Appl Intell. 2023;53(16):19406–17.
- 33. Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN. Convolutional sequence to sequence learning. International conference on machine learning (ICML). 2017: 1243–1252.
- 34. Graves A. Long short-term memory. Supervised sequence labelling with recurrent neural networks. 2012:37–45.
- 35. Alghushairy O, Alsini R, Soule T, Ma X. A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams. BDCC. 2020;5(1):1.
- 36. Ding K, Li J, Bhanushali R, Liu H. Deep anomaly detection on attributed networks. Proceedings of the 2019 SIAM international conference on data mining. Society for Industrial and Applied Mathematics (SDM). 2019: 594–602. https://doi.org/10.1137/1.9781611975673.67
- 37. Fan H, Zhang F, Li Z. Anomalydae: Dual autoencoder for anomaly detection on attributed networks. ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2020: 5685–9. https://doi.org/10.1109/ICASSP40776.2020.9053387
- 38. Qiao H, Pang G. Truncated affinity maximization: One-class homophily modeling for graph anomaly detection. Advances in Neural Information Processing Systems (NeurIPS). 2023;36:49490–512.
- 39. Cai J, Zhang Y, Fan J. Self-discriminative modeling for anomalous graph detection. arXiv preprint 2023.
- 40. Liu Y, Ding K, Lu Q, Li F, Zhang L Y, Pan S. Towards self-interpretable graph-level anomaly detection. Advances in Neural Information Processing Systems (NeurIPS). 2023;36:8975–87.
- 41. Ma R, Pang G, Chen L, Van Den Hengel A. Deep graph-level anomaly detection by glocal knowledge distillation. Proceedings of the fifteenth ACM international conference on web search and data mining (WSDM). 2022:704–14. https://doi.org/10.1145/3488560.3498473
- 42. Hamilton W L, Ying R, Leskovec J. Representation learning on graphs: Methods and applications. arXiv preprint 2017.
- 43. Zhang Z, Zhang Z, Wang X, Qin Y, Qin Z, Zhu W. Dynamic Heterogeneous Graph Attention Neural Architecture Search. AAAI. 2023;37(9):11307–15.
- 44. Li Q, Wu G, Ni H, You T. Anomaly detection with dual-channel heterogeneous graph based on hypersphere learning. Inf Sci. 2024;681:121242.
- 45. Khazaei M, Ashrafi-Payaman N. An unsupervised anomaly detection model for weighted heterogeneous graph. J AI Data Min. 2023;11(2):237–45.
- 46. Nair AK, Iyer MS, Raman K. Data-driven risk analytics for blockchain transaction fraud: A semi-supervised heterogeneous graph modeling framework. Journal of Business and Data Analytics. 2025;3(3):43–61.
- 47. Santos MR, Rivera CMD, Villanueva AL. Adaptive Graph Neural Analytics for Cryptocurrency Anomaly Detection under Limited Labeled Data. Journal of AI Analytics and Applications. 2025;3(3):36–55.
- 48. Huang C, Lin Z, Huang Q, Huang X, Jiang F, Chen J. H²CAN: heterogeneous hypergraph attention network with counterfactual learning for multimodal sentiment analysis. Complex Intell Syst. 2025;11(4).
- 49. Huang C, Yu B, Gao C, Tu Y, Jiang F, Huang X. Structural-temporal mining for motif-level anomaly detection in dynamic graphs. Knowledge-Based Systems. 2025;325:113962.
- 50. Gao Y, Fang J, Sui Y, Li Y, Wang X, Feng H, et al. Graph anomaly detection with bi-level optimization. Proceedings of the ACM Web Conference. 2024: 4383–94. https://doi.org/10.1145/3589334.3645673
- 51. Zhu R, Zhao B, Liu J, Sun Z, Chen CW. Improving contrastive learning by visualizing feature transformation. Proceedings of the IEEE/CVF international conference on computer vision (ICCV). 2021: 10306–15. https://doi.org/10.1109/ICCV48922.2021.01014
- 52. Taud H, Mas JF. Multilayer perceptron (MLP). Geomatic approaches for modeling land change scenarios. Cham: Springer International Publishing; 2017. 451–5.
- 53. Mao A, Mohri M, Zhong Y. Cross-entropy loss functions: Theoretical analysis and applications. International conference on Machine learning (ICML). 2023: 23803–28.
- 54. Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint 2014.
- 55. Cates J, Lewis J, Hoover R, Caudle K. Session11: Skip-GCN: A Framework for Hierarchical Graph Representation Learning. 2023.
- 56. Ding K, Li J, Agarwal N, Liu H. Inductive anomaly detection on attributed networks. Proceedings of the twenty-ninth international conference on international joint conferences on artificial intelligence (IJCAI). 2021: 1288–94.
- 57. Lou S, Zhang Q, Yang S, Tian Y, Tan Z, Luo M. Gady: Unsupervised anomaly detection on dynamic graphs. arXiv preprint 2023.
- 58. Xiao B, Yin W. Generative pseudo-labeling network based dynamic semi-supervised anomaly detection in bitcoin. Neurocomputing. 2026;:133386.
- 59. Wang X, Ji H, Shi C, Wang B, Ye Y, Cui P, et al. Heterogeneous graph attention network. The world wide web conference. 2019: 2022–32. https://doi.org/10.1145/3308558.3313562
- 60. Hu Z, Dong Y, Wang K, Sun Y. Heterogeneous graph transformer. Proceedings of the web conference. 2020: 2704–10. https://doi.org/10.1145/3366423.3380027
- 61. Zhang C, Song D, Huang C, Swami A, Chawla N V. Heterogeneous graph neural network. Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (KDD). 2019: 793–803. https://doi.org/10.1145/3292500.3330961
- 62. Rahman NA, Shen W, Liu DL, Yusof FM. Blockchain transaction surveillance and responsible fintech innovation: A sociotechnical framework for graph-based anomaly detection. Journal of Technology Innovation and Society. 2025;3(3):30–46.