
Dual contrastive learning-based reconstruction for anomaly detection in attributed networks

  • Hossein Rafieizadeh,

    Roles Conceptualization, Methodology, Software, Visualization, Writing – original draft

    Affiliation Department of Data Science and Technology, School of Intelligent Systems Engineering, University of Tehran, Tehran, Iran

  • Hadi Zare ,

    Roles Conceptualization, Formal analysis, Project administration, Resources, Supervision, Validation, Writing – review & editing

    h.zare@ut.ac.ir

    Affiliation Department of Data Science and Technology, School of Intelligent Systems Engineering, University of Tehran, Tehran, Iran

  • Mohsen Ghassemi Parsa,

    Roles Conceptualization, Investigation, Project administration, Supervision, Validation, Writing – review & editing

    Affiliation Department of Data Science and Technology, School of Intelligent Systems Engineering, University of Tehran, Tehran, Iran

  • Hocine Cherifi

    Roles Conceptualization, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliation ICB UMR, CNRS, University of Burgundy (Université de Bourgogne), Dijon, France

Abstract

Anomaly detection in attributed networks is critical for identifying threats such as financial fraud and intrusions across social, e-commerce, and cyber-physical domains. Existing graph-based methods face two limitations: (i) embedding-based approaches obscure fine-grained structural and attribute patterns, and (ii) reconstruction-based methods operate on a single view, leaving cross-view discrepancies underutilized. To address these gaps, we propose Dual Contrastive Learning-based Reconstruction (DCOR), a dual autoencoder with a shared graph neural network (GNN) encoder. Instead of contrasting embeddings, DCOR reconstructs both adjacency and attributes for the original graph and for an augmented view, then contrasts the reconstructions across views. This preserves fine-grained, view-specific information and improves the fidelity of both structure and attributes. Across six benchmarks (Enron, Amazon, Facebook, Flickr, ACM, and Reddit), DCOR achieves the best Area Under the Receiver Operating Characteristic curve (AUROC) on all six datasets. Compared with the best-performing non-DCOR baseline on each dataset, DCOR improves AUROC by 11.3% on average, with a maximum gain of 21.3% on Enron. On Amazon, ablating the reconstruction-level contrast (RLC) reduces AUROC by 25.5% relative to the full model, underscoring the necessity of reconstruction-level contrastive learning. Code and datasets are publicly available at https://github.com/Hossein1998/DCOR-Graph-Anomaly-Detection.git.

Introduction

Anomaly detection in attributed networks is crucial across various domains, including social media, e-commerce, finance, cybersecurity, and the Internet of Things (IoT). In these graphs, nodes carry attributes in addition to links, enabling rich modeling of behaviors and interactions. Detecting anomalies matters because they often correspond to security breaches, fraud, fake accounts, or sensor failures [1]. Prior estimates suggest that online payment fraud alone could amount to hundreds of billions of dollars over a few years [2], and fabricated influencer activity has incurred substantial losses annually [3]. Similar concerns arise in cyber-physical infrastructures, where anomalous nodes may indicate compromised devices or faulty sensors [4–7]. These examples motivate robust, scalable detectors for attributed graphs.

Anomalies in attributed networks typically fall into three categories (Fig 1) [1,8,9]: (i) Structural: unexpectedly dense communities or cliques, spam links, or isolated nodes; (ii) Attribute: unusual or implausible feature values (e.g., abnormal transaction rates or incomplete profiles); (iii) Interaction: inconsistencies between structure and attributes (e.g., a highly connected merchant with suspicious transactional attributes). Accurately capturing these diverse anomaly types is nontrivial, especially when structural and attribute signals conflict or evolve dynamically.

Fig 1. Anomalies in attributed networks.

Structural anomaly: an unexpected inter-community bridge, where the orange node sits in the cut between two communities and forms a shortcut (dashed) edge to the right cluster. Attribute anomaly: the orange node’s feature vector (colored bars) deviates from those of its neighbors even though its connectivity looks normal. Interaction anomaly: a structure–attribute mismatch, where the orange node’s attributes align with the left community while its links embed it in the right community. Visual encoding: blue = normal nodes and edges; orange = anomalous node or edge; dashed orange edge = anomalous link; colored bars = node attributes.

https://doi.org/10.1371/journal.pone.0335135.g001

To effectively detect these diverse anomaly types, researchers have developed various approaches. Early detectors such as Local Outlier Factor (LOF) [10] compute density-based scores in an embedding space and implicitly assume that proximity captures normal behavior, treating anomalies as sparsely connected or isolated points. Structural Clustering Algorithm for Networks (SCAN) [11] clusters nodes by structural similarity and labels those not belonging to any structural cluster as outliers. While simple and efficient, these heuristics are ill-suited to high-dimensional attributes and cannot capture anomalies arising from subtle feature deviations or complex interactions. They also presuppose homophily (connected neighbors tend to have similar attributes), which fails in many modern applications (user–item graphs, anti-fraud networks, knowledge graphs). In such heterophilous graphs (connected neighbors tend to have dissimilar attributes), purely structural signals can mislead detectors: normal nodes near anomalous ones may be falsely flagged, and anomalous nodes surrounded by normal neighbors may be overlooked, leading to high false-positive and false-negative rates when attribute-driven or mixed anomalies are present [1,12].

To address these limitations, recent self-supervised learning (SSL) methods learn node representations without labeled anomalies by creating multiple graph views via augmentations (edge dropping, feature masking, subgraph sampling, or diffusion) and training encoders to maximize agreement for the same node across views, while minimizing agreement between different nodes. Early variants such as GraphCL (graph contrastive learning), GRACE (graph contrastive representation learning), and MVGRL (multi-view graph representation learning) [13–15] rely on random augmentations, whereas newer frameworks like GADAM (graph anomaly detection with adaptive message passing), AD-GCL (adversarial graph augmentation to improve graph contrastive learning), CONAD (contrastive attributed network anomaly detection with data augmentation), and UniGAD (unifying multi-level graph anomaly detection) [16–19] add adaptive message passing, adversarial view generation, and multi-level stitching. Despite their effectiveness, most graph contrastive pipelines use a narrow augmentation set (edge or node dropping, feature masking, subgraph sampling, or diffusion [20–22]), which can overlook key anomaly facets and sometimes produce unrealistic augmented views. We therefore adopt a richer, taxonomy-aligned augmentation suite that instantiates all three anomaly categories in Fig 1. Further details are provided in the proposed framework section.

Nonetheless, in spite of strong downstream performance, these contrastive pipelines still compare compressed node embeddings produced by message passing. This compression can blur fine-grained anomaly cues, particularly in heterophilous graphs where anomalous nodes are largely surrounded by normal ones [23–26]. Moreover, message passing inherently smooths features across neighborhoods [27,28], leading to over-smoothing, whereby distinctive signals may be washed out [16,29] (see Fig 2).

Fig 2. Embedding vs. reconstruction-level contrast (RLC).

Top: the encoder consumes two inputs (the original graph and an augmented view) and contrasts their node embeddings in the embedding space. Bottom: DCOR uses dual autoencoders to reconstruct the adjacency and the attribute matrices for the original graph and the augmented view, then applies contrastive learning directly to the two sets of reconstructions, which preserves cross-view discrepancies that message passing may smooth out.

https://doi.org/10.1371/journal.pone.0335135.g002

Complementary to contrastive learning approaches, autoencoder- and graph neural network (GNN)-based architectures model graph data directly to address some of these limitations. DOMINANT (deep anomaly detection on attributed networks) [30], AnomalyDAE (anomaly detection through a dual autoencoder) [31], GAD-NR (graph anomaly detection via neighborhood reconstruction) [32], CurvGAD (leveraging curvature for enhanced graph anomaly detection) [33], and MTGAE (mirror temporal graph autoencoder) [34] are representative examples in this category. These models score anomalies via reconstruction error but typically operate on a single view and therefore miss cross-view inconsistencies that often signal subtle anomalies. As a result, valuable discrepancies between reconstructions from different augmentations (e.g., a node well reconstructed in one view but poorly in another) remain underexplored.

These observations reveal two principal gaps:

  • (A) Embedding-based methods compare low-dimensional embeddings, erasing fine-grained anomaly signals (particularly in heterophilous networks where distinctive features become over-smoothed [26]).
  • (B) Reconstruction-based methods reconstruct the adjacency matrix A and nodal features X but do not compare reconstructions across augmented views, leaving cross-view discrepancies (and the opportunity to improve reconstruction quality) unexploited.

We introduce Dual Contrastive Learning-based Reconstruction (DCOR), a dual autoencoder framework trained with a reconstruction-level contrastive objective that directly compares reconstructed adjacency and attribute matrices across two augmented views. By contrasting reconstructions rather than embeddings, DCOR preserves view-specific cues, improves reconstruction fidelity, and enhances anomaly separability. Across six public benchmarks, DCOR attains best or competitive performance in terms of Area Under the Receiver Operating Characteristic curve (AUROC). We also adopt a taxonomy-aligned augmentation suite that augments the structure, attributes, and their interaction, providing comprehensive self-supervised signals.

Our contributions are threefold: (i) a reconstruction-level contrastive objective over decoded structure and attributes; (ii) a domain-informed augmentation suite that covers structural, attribute, and interaction anomalies; and (iii) a practical dual-autoencoder design with a shared GNN encoder.

This work extends our earlier conference paper [35] with substantial modifications, including new sections, additional experiments and evaluation metrics, as invited for journal publication.

This paper is organized as follows. The Related work section reviews graph-based anomaly detection, including traditional methods, autoencoder- and GNN-based models, and contrastive learning. The Proposed framework section introduces DCOR, detailing the dual-autoencoders, the reconstruction-level contrastive objective, and taxonomy-aligned augmentations. The Experimental results section describes datasets, evaluation metrics, and implementation, and reports empirical findings, ablations, and robustness analyses. The Discussion section examines limitations, practical considerations, and future directions. The Conclusion section summarizes the paper and highlights the key findings.

Related work

This section reviews four strands: (i) traditional detectors, (ii) autoencoder- and GNN-based models, (iii) contrastive learning (including augmentation-oriented, adversarial, and RL-assisted variants), and (iv) domain applications. Table 2 provides a side-by-side summary of these approaches.

Table 2. Condensed summary of graph anomaly detection methods and their key strengths and limitations.

https://doi.org/10.1371/journal.pone.0335135.t002

Traditional anomaly detection in graphs

Classical methods rely on local density, structural similarity, or low-rank and sparse decompositions. LOF [10] assigns density-based outlier scores and is effective in many tabular settings. However, LOF is sensitive to the neighborhood size k and the choice of distance metric, and repeated runs at scale can be costly. SCAN [11] clusters nodes by structural similarity and flags non-members as outliers. By design, SCAN focuses on topology and ignores node attributes, which causes limited sensitivity to attribute-driven irregularities.

The second category of works relies on modeling attributed graphs via matrix-based formulations. Residual Analysis for Anomaly Detection in Attributed Networks (RADAR) [36] couples structure-attribute effects with a low-rank and sparse decomposition and neighborhood regularization. The smoothness assumptions of RADAR often align with homophily and can degrade under heterophily. ANOMALOUS (joint modeling for anomaly detection on attributed networks) [37] factors the reconstructed structure-attribute matrix via column–row decomposition (CUR) and scores residuals. Like other decomposition methods, its cost grows with graph size and feature dimensionality.

Autoencoder and GNN-based approaches

These models reconstruct topology and attributes and typically score anomalies via reconstruction residuals. While effective on attributed graphs, they may suffer from over-smoothing under heterophily, and when trained in a single view, they overlook cross-view discrepancies. DOMINANT [30] jointly reconstructs A and X and mixes adjacency and feature errors to score anomalies; it often performs well. However, small reconstruction gaps can reduce sensitivity, and under strong heterophily, message passing may over-smooth distinctive cues [26,29]. AnomalyDAE [31] uses dual decoders to reconstruct adjacency and attributes, improving coverage of structural and attribute anomalies, yet training can be heavy on very large graphs, and it does not compare reconstructions across augmented views. GAD-NR [32] regularizes neighborhood reconstruction of structure and attributes to capture complex structural anomalies and often scales better than global matrix methods, but it depends on neighborhood assumptions and can degrade under pronounced heterophily or irregular local mixtures. CurvGAD [33] integrates discrete Ricci curvature to encode higher-order geometry at additional computational cost and still without cross-view comparison. MTGAE [34] is a multi-task graph autoencoder for topology and attributes (with temporal or auxiliary heads) and shares the trade-offs above: potential over-smoothing [26,29] and no explicit cross-view comparison.

Contrastive learning for graph representation

Beyond GraphCL [13], frameworks such as GRACE [14] and MVGRL [15] construct augmented graph views via random edge and node dropping, feature masking, subgraph sampling, or diffusion, and maximize agreement of the same node across views using the InfoNCE loss (Information Noise-Contrastive Estimation) [38]. While effective for unsupervised representation learning, these pipelines compare node embeddings in embedding space, which can attenuate fine-grained structural or attribute cues due to message passing and over-smoothing, particularly on heterophilous graphs. More recent variants add adaptive or adversarial guidance to better align with anomaly patterns: GADAM [16] adapts message passing; AD-GCL crafts adversarial augmentations to produce harder views [17]; and UniGAD [19] unifies multi-level detection (e.g., graph stitching). Nevertheless, most pipelines still rely on a limited augmentation set and compare in embedding space, where over-smoothing and over-squashing [39] can blur fine-grained cues, especially under heterophily. Our approach targets both gaps by enforcing reconstruction-level contrast across views on topology and attributes.

A related line detects anomalies by comparing learned node representations, including CONAD. Representative examples include GCCAD (graph contrastive learning for anomaly detection) [40], GAD-MSCL (graph anomaly detection via multi-scale contrastive learning) [41], EAGLE (efficient contrastive-learning-based anomaly detector on graphs) [42], CoModality [43], and semi- and weakly supervised variants [44]. These methods inherit the strengths of contrastive learning yet remain vulnerable to information loss when embeddings are smoothed or when perturbations are subtle.

Recent surveys systematize a broad toolbox of graph augmentations, including edge perturbation (addition, removal, rewiring), node or edge dropping, feature masking or denoising, subgraph sampling, and diffusion-style transforms [20–22]. In practice, however, many graph anomaly detection (GAD) studies instantiate only a narrow subset (for example, random edge dropping or adding and simple feature masking), and overly aggressive or poorly matched perturbations can yield unrealistic structures or erase salient signals, thereby harming anomaly separability. In contrast, we adopt a broader, controlled suite with explicit budgets and topology and attribute constraints, covering structure-level patterns (clique injection, node isolation, random connections, inter- and intra-community rewiring or removal) and feature-level patterns (copying, scaling, masking) derived from domain knowledge.

Hybrid and robustness-oriented variants

Augmentation-based semi-supervised variants.

AugAN (augmentation for anomaly and normal distributions) [45] tackles generalized graph anomaly detection in a semi-supervised setting. It expands the scarce labeled set of normal and anomalous nodes via data augmentation and adopts a tailored episodic training strategy so that the learned representations and classifier remain effective on both unseen subgraphs and entire graphs. NodeAug (node-parallel augmentation) [46] applies feature and edge augmentations to regularize semi-supervised node classifiers. In contrast, we use structural and feature augmentations solely to synthesize training views and train via reconstruction-level contrast without any ground-truth labels. The final anomaly scores are derived from reconstruction discrepancies.

Reinforcement learning-assisted and adversarial variants. Subgraph-centric approaches (e.g., CoLA (contrastive self-supervised learning framework for anomaly detection) [47]) form positive and negative pairs between a node and random-walk subgraphs from its neighborhood versus other nodes to capture local structure-attribute dependencies, but constructing many subgraphs incurs non-trivial cost, as do multi-view and multi-scale methods [48–50], which face scalability limits. RL-assisted methods surface informative structure either via neighborhood selection (RAND [51]) or mutual information (MI)-driven pooling (SUGAR [52]). Adversarial formulations include embedding regularization (ARANE (adversarially regularized attributed network embedding) [53]), data synthesis (GAAN (generative adversarial attributed network) [54]), domain-specific generators (RegraphGAN (graph generative adversarial network for dynamic network anomaly detection) [55], AdvGraLog (graph-based log anomaly detection via adversarial training) [56], SGAT-AE (self-learning graph attention network autoencoder) [57]), and inductive anomaly-aware layers (AEGIS, adversarial graph differentiation networks [58]) with better robustness. These variants broaden robustness but add training complexity, and many still operate at the embedding level, risking information loss.

Anomaly detection in specialized domains

Graph-based anomaly detection is effective across multiple domains. In social networks, GNN-based detectors flag irregular user behavior and structural patterns such as fake profiles and misinformation spread [59]. In e-commerce, graph autoencoders uncover unusual co-purchase patterns and fraudulent transactions [60]. In IoT, modeling device-device interactions enables detection of sensor faults and malicious activities. Recent surveys review GNN- and AI-based IoT anomaly detection [61,62]. Financial networks benefit from large-scale graph benchmarks for fraud and risk detection [63]. In healthcare, graph analysis over patient-provider-claim relations has been used to detect anomalous or fraudulent behavior [64].

Synthesis and positioning. Traditional detectors (e.g., LOF) score local density and typically ignore joint modeling of structure and attributes [10]. Autoencoder- and GNN-based models (e.g., DOMINANT) jointly reconstruct A and X but are trained in a single view and can over-smooth signals [30]. Contrastive GAD (e.g., CONAD) compares augmented views in the embedding space, where fine-grained cues may be compressed [18]. RL and adversarial approaches (e.g., SUGAR) add complexity [52]. DCOR differs by addressing both gaps: it enforces reconstruction-level contrast across two augmented views with dual-autoencoders and a controlled augmentation suite, explicitly preserving cross-view discrepancies in A and X.

Proposed framework

In this section, we formalize the task and present DCOR’s end-to-end pipeline. The overall structure of the proposed approach is as follows:

  • Employ domain-informed graph augmentations to induce realistic structural and attribute anomalies, including clique injection, node isolation, shortcuts, community-level changes, and feature copying, scaling, and masking.
  • Use a sampling strategy to create mini-batch subgraphs maintaining alignment between original and augmented views for efficient, robust training.
  • Implement a dual-autoencoder model with shared graph-attention encoder and separate reconstruction heads for adjacency and features, enabling fine-grained anomaly detection.
  • Apply reconstruction-level contrastive loss to pull reconstructions close for normal nodes and enforce a learnable margin separation for augmented (anomalous) nodes in both modalities.
  • Optimize a total loss combining reconstruction fidelity and contrastive separation, with adaptive margin to balance calibration and enhance anomaly differentiation.
  • Define node-level anomaly scores as reconstruction discrepancies in structure and features, ranking nodes by deviation for anomaly detection as the output.

In the following, the required definitions and notation are presented; then each step is detailed.

Notations and definitions

Let us consider an attributed, undirected, simple graph G = (V, E), where V is the set of n nodes and E ⊆ V × V is the set of edges between nodes. On the representation level, the graph is presented as G = (A, X), where A ∈ {0,1}^(n×n) is the adjacency matrix (A = Aᵀ, A_ii = 0) and X ∈ ℝ^(n×d) stacks the node features (n nodes, d features per node).

A graph augmentation is an operator T that maps (A, X) to a new view

(Ã, X̃) = T(A, X) = (T_A(A), T_X(X)),    (1)

where T_A applies edge-level perturbations (adding or deleting edges) while preserving symmetry and the absence of self-loops, and T_X applies attribute-level perturbations (e.g., copying, scaling, or masking feature entries). For clarity, we use “augmentation” for the operator that generates the view, and “perturbations” for the concrete changes applied to (A, X).

Let [n] = {1, …, n} denote the node index set. For each augmented view, we record the nodes whose structure or attributes are perturbed, yielding subsets S_A, S_X ⊆ [n] (which may overlap), and define binary indicators

y_i^(A) = 1 if i ∈ S_A and 0 otherwise;   y_i^(X) = 1 if i ∈ S_X and 0 otherwise.    (2)

Equivalently, y_i^(A) = 1[‖Ã_i − A_i‖₀ > 0] and y_i^(X) = 1[‖X̃_i − X_i‖₀ > 0], where ‖·‖₀ counts nonzero entries and, for any matrix M, M_i denotes row i. The superscripts (A) and (X) indicate whether the perturbation arises from the structural or attribute side, respectively. Because the model is trained in a self-supervised manner without access to ground-truth labels, these indicators serve only to gate the training loss and are never used during inference.
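As a concrete illustration, the indicators in Eq (2) can be computed by comparing each node’s adjacency row and feature row across the two views. A minimal sketch assuming dense NumPy arrays (the function name is illustrative):

```python
import numpy as np

def perturbation_indicators(A, A_aug, X, X_aug):
    """Binary indicators y^(A), y^(X): 1 iff the node's adjacency row
    (resp. feature row) differs between the original and augmented view."""
    y_A = (np.count_nonzero(A_aug - A, axis=1) > 0).astype(int)
    y_X = (np.count_nonzero(X_aug - X, axis=1) > 0).astype(int)
    return y_A, y_X
```

In practice these would be recorded directly while applying the augmentations, but the row-difference form above matches the equivalent definition via the zero-“norm”.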

Graph data augmentation

We employ domain-informed augmentations to synthesize realistic structural and attribute anomalies. Interaction mismatches arise implicitly when structure and attributes are augmented on different (possibly overlapping) node subsets. An overview of the two views is shown in Fig 3 (left: original graph G, right: augmented graph G̃).

Fig 3. Structural and node-level augmentations for graph anomalies.

Left: original attributed network G. Right: augmented attributed network G̃. Middle: augmentation methods: (1) feature copying (attribute mimicking across distant nodes); (2) feature scaling (multiplying or dividing continuous attributes); (3) node isolation (dropping all incident edges of selected nodes); (4) random shortcut connections and clique injection (adding shortcuts or small dense cliques across and within communities). Color coding: green nodes are normal; red nodes are augmented (selected for structure or feature augmentations; isolated nodes appear red with no incident edges); gray edges are original connections; orange or red edges indicate injected connections (random shortcuts or clique edges); solid feature bars are original attributes; cross-hatched bars mark augmented features; the thin gray curved arrow in the feature panel indicates attribute copying. Collectively, these augmentations induce structural, attribute, and interaction anomalies, creating cross-view discrepancies leveraged by our reconstruction-level contrast.

https://doi.org/10.1371/journal.pone.0335135.g003

On the structure side, several augmentations are employed including clique injection, node isolation, random shortcut edges, and community-level augmentations (inter-community bridging and intra-community edge removal). On the feature side, we apply feature copying (from another node), feature scaling, and feature masking. Optionally, light Gaussian noise is added to mimic measurement noise.

Sampling via GraphSAINT.

We adopt the GraphSAINT random-walk sampler [65] to construct mini-batch subgraphs. At each training step, a fresh subgraph is drawn from short random walks on the original graph, and the same node set is used to slice both the original and augmented views to maintain alignment. This resampling bounds memory usage and improves throughput while approximately preserving local connectivity. It also increases robustness by exposing the model to diverse, overlapping neighborhoods and by reducing brittleness to batch boundaries and occasional noisy or missing edges. Sampler settings are chosen to balance coverage and GPU (Graphics Processing Unit) budget.
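The sampling idea can be sketched as follows, assuming an adjacency-list graph; the actual GraphSAINT sampler [65] includes additional machinery (e.g., normalization coefficients), so the names and parameters here are purely illustrative:

```python
import random

def random_walk_nodes(adj_list, num_roots=3, walk_len=4, seed=0):
    """GraphSAINT-style mini-batch sketch: take the union of short random
    walks from random roots. The returned node set is used to slice BOTH
    the original and the augmented view so they stay aligned."""
    rng = random.Random(seed)
    nodes = list(adj_list)
    batch = set()
    for _ in range(num_roots):
        v = rng.choice(nodes)
        batch.add(v)
        for _ in range(walk_len):
            nbrs = adj_list[v]
            if not nbrs:          # dead end (isolated node): stop this walk
                break
            v = rng.choice(nbrs)
            batch.add(v)
    return sorted(batch)
```

Because a fresh subgraph is drawn at each step, memory stays bounded while overlapping neighborhoods are still covered over the course of training.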

Structural graph augmentation

We inject controlled topological augmentations to mimic common anomaly patterns [20,21]. These augmentations are used only to generate augmented views for reconstruction-level contrast. This stage consists of four types of augmentations:

  • (i) clique injection,
  • (ii) node isolation,
  • (iii) random shortcut edges,
  • (iv) community-level augmentations, comprising inter-community bridging and intra-community edge removal.

Clique injection.

In social networks, tightly connected groups of anomalous nodes (cliques) often indicate coordinated malicious activities (e.g., fraud rings, botnets, organized misinformation). To reveal this behavior, a subset C ⊆ V is randomly selected and then every pair within C is connected to form a complete subgraph. Formally,

Ã_uv = Ã_vu = 1 for all u, v ∈ C with u ≠ v,    (3)

where C ⊆ V with |C| = c and c ≪ n, and Ã denotes the augmented (binary) adjacency matrix (entries equal 1 when an edge is present and 0 otherwise); self-loops are disallowed and symmetry is preserved. For all other off-diagonal pairs, we keep the original connectivity, i.e., Ã_uv = A_uv. By adding these densely connected substructures, the model learns to recognize unusually dense local connectivity (Fig 3).
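A minimal sketch of the clique-injection primitive on a dense NumPy adjacency (illustrative helper, not the paper’s implementation):

```python
import numpy as np

def inject_clique(A, clique_nodes):
    """Connect every pair inside `clique_nodes`, keeping the adjacency
    symmetric and self-loop-free; all other entries are left unchanged."""
    A_aug = A.copy()
    for u in clique_nodes:
        for v in clique_nodes:
            if u != v:            # skip the diagonal (no self-loops)
                A_aug[u, v] = 1
    return A_aug
```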

Node isolation.

Sometimes, anomalies such as compromised accounts or system failures can appear as users who become structurally isolated. To recognize this type of anomaly, we assign the isolation primitive to a random subset of nodes and delete all their incident edges, thereby zeroing the corresponding rows and columns in the augmented adjacency. This encourages the model to identify structural isolation as an anomalous connectivity pattern (Fig 3).

Let I ⊆ V be the nodes assigned isolation. We set

Ã_uv = Ã_vu = 0 for all u ∈ I and all v ∈ V.    (4)

For all pairs (u, v) with u, v ∉ I, the original adjacency holds, i.e., Ã_uv = A_uv. We also set Ã_uu = 0 and enforce symmetry as defined above.
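The isolation primitive amounts to zeroing the selected rows and columns; a minimal NumPy sketch (helper name illustrative):

```python
import numpy as np

def isolate_nodes(A, isolated):
    """Remove all incident edges of the selected nodes by zeroing their
    rows and columns, preserving symmetry for the remaining entries."""
    A_aug = A.copy()
    A_aug[list(isolated), :] = 0
    A_aug[:, list(isolated)] = 0
    return A_aug
```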

Random shortcut edges.

We simulate unexpected links by adding a few shortcut edges between previously non-adjacent nodes. Additions are symmetric and exclude self-loops. Also, we avoid connecting to isolated nodes (Fig 3).

Ã_uv = Ã_vu = A_uv ∨ 1[u ∈ S, v ∈ N_u⁺],    (5)

where S ⊆ V specifies seed nodes for additions, each N_u⁺ is a small non-neighbor set of u, and ∨ denotes logical OR.
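A sketch of the shortcut primitive, assuming a dense NumPy adjacency; the `num_edges` budget and the `forbidden` set (e.g., nodes already isolated) are illustrative parameters:

```python
import numpy as np
import random

def add_shortcuts(A, rng, num_edges=2, forbidden=()):
    """Add symmetric edges between currently non-adjacent pairs, skipping
    self-loops and any node in `forbidden`."""
    A_aug = A.copy()
    n = A.shape[0]
    candidates = [(u, v) for u in range(n) for v in range(u + 1, n)
                  if A[u, v] == 0 and u not in forbidden and v not in forbidden]
    rng.shuffle(candidates)
    for u, v in candidates[:num_edges]:
        A_aug[u, v] = A_aug[v, u] = 1
    return A_aug
```

For example, `add_shortcuts(A, random.Random(0), num_edges=2)` injects two random shortcut edges while keeping the matrix symmetric and self-loop-free.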

Inter-community bridging.

Initially, a community partition of V is obtained by the Louvain algorithm [66], where {C_1, …, C_K} denotes the communities and K is the number of communities. For distinct communities C_k ≠ C_l, we simulate unexpected cross-community ties by adding edges between non-neighbor pairs (measured w.r.t. A). Let

P = {(u, v) : u ∈ C_k, v ∈ C_l, A_uv = 0};

a subset B ⊆ P is uniformly sampled without replacement with |B| = b (a small addition budget). On the augmented side, set:

Ã_uv = Ã_vu = 1 for all (u, v) ∈ B.    (6)

To preserve isolation, additions incident to nodes in I are skipped. This weakens community separation just enough to expose unusual cross-community interaction patterns.

Intra-community edge removal.

We first obtain a community partition (via the Louvain method [66]) and pick one community C. To simulate weakened internal cohesion, a small subset of edges is removed inside C:

Ã_uv = Ã_vu = 0 for all (u, v) ∈ J,    (7)

where J ⊆ {(u, v) ∈ E : u, v ∈ C} denotes the set of removed intra-community edges with |J| = r for a small budget r. Symmetry and the zero-diagonal property are preserved by setting Ã_vu = Ã_uv and Ã_uu = 0.
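The two community-level primitives can be sketched together. Here a precomputed partition stands in for Louvain, and the budgets are illustrative:

```python
import numpy as np
import random

def bridge_and_thin(A, comm_a, comm_b, rng, add_budget=1, remove_budget=1):
    """Inter-community bridging plus intra-community edge removal sketch:
    add `add_budget` edges between non-adjacent cross-community pairs and
    remove `remove_budget` existing edges inside `comm_a`."""
    A_aug = A.copy()
    # bridging: sample non-adjacent pairs across the two communities
    cross = [(u, v) for u in comm_a for v in comm_b if A[u, v] == 0]
    for u, v in rng.sample(cross, min(add_budget, len(cross))):
        A_aug[u, v] = A_aug[v, u] = 1
    # thinning: sample existing edges inside comm_a and delete them
    inside = [(u, v) for u in comm_a for v in comm_a if u < v and A[u, v] == 1]
    for u, v in rng.sample(inside, min(remove_budget, len(inside))):
        A_aug[u, v] = A_aug[v, u] = 0
    return A_aug
```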

Node-level feature augmentation

Controlled feature augmentations are applied to imitate attribute-driven anomaly patterns [20,21]. These augmentations are used only to generate augmented views for reconstruction-level contrast. Three methods are applied: (i) Feature Copying, (ii) Feature Scaling, and (iii) Feature Masking.

Feature copying (Attribute Mimicking).

To induce attribute mismatches, for each node i ∈ S_X we sample a candidate set C_i ⊆ V \ {i} of size s uniformly without replacement, and copy the features from the farthest candidate in C_i:

k(i) = argmax_{j ∈ C_i} ‖x_i − x_j‖₂,   x̃_i = x_{k(i)}.    (8)

Ties in the argmax are broken arbitrarily. Here, V is the node set, S_X the feature-side augmented nodes, s the pool size, C_i the candidate set for node i, ‖·‖₂ the Euclidean norm, k(i) the index of the farthest candidate, X the feature matrix, and x̃_i the updated feature vector of node i (Fig 3).
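A minimal sketch of the attribute-mimicking step (helper name and pool size illustrative):

```python
import numpy as np
import random

def copy_farthest_features(X, target_nodes, rng, pool_size=3):
    """For each target node, sample a candidate pool (excluding the node
    itself) and overwrite its features with those of the Euclidean-farthest
    candidate."""
    X_aug = X.copy()
    n = X.shape[0]
    for i in target_nodes:
        pool = rng.sample([j for j in range(n) if j != i], pool_size)
        k = max(pool, key=lambda j: np.linalg.norm(X[i] - X[j]))
        X_aug[i] = X[k]
    return X_aug
```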

Feature scaling.

To simulate magnitude shifts, for each node i ∈ S_X, its (continuous) feature vector is rescaled by a fixed factor α > 1, randomly up or down,

x̃_i = α^{s_i} x_i,   s_i ∈ {+1, −1},    (9)

where x_i is the original feature vector of node i, x̃_i the scaled one, S_X the set of nodes chosen for feature-side augmentation, α the scale factor, and s_i selects multiplication (+1) or division (−1) with equal probability. Only continuous features are scaled (Fig 3).
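Eq (9) translates directly into code; a sketch with an illustrative α:

```python
import numpy as np
import random

def scale_features(X, target_nodes, rng, alpha=2.0):
    """Rescale each selected node's feature vector by alpha or 1/alpha
    with equal probability (continuous features assumed)."""
    X_aug = X.copy()
    for i in target_nodes:
        s = rng.choice([1, -1])          # +1: multiply, -1: divide
        X_aug[i] = X[i] * (alpha ** s)
    return X_aug
```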

Feature masking.

Missing fields are simulated by zeroing a small random subset of feature dimensions for nodes on the feature side (i ∈ S_X). For node i with d features, sample I_i ⊆ {1, …, d} uniformly without replacement with |I_i| = ⌊qd⌋ (masking rate q), and define the binary mask m_i ∈ {0,1}^d by m_ij = 0 if j ∈ I_i and 1 otherwise. Then

x̃_i = m_i ⊙ x_i,    (10)

where x_i is the original feature vector of node i, x̃_i the masked one, S_X the set of nodes chosen for feature-side augmentation, d the feature dimension, q the masking rate, I_i the zeroed indices, m_i the binary mask, ⊙ denotes element-wise multiplication, and ⌊·⌋ denotes the floor operator.
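A sketch of the masking primitive following Eq (10) (helper name illustrative):

```python
import math
import random
import numpy as np

def mask_features(X, target_nodes, rng, q=0.5):
    """Zero a random floor(q*d) subset of feature dimensions for each
    selected node, simulating missing fields."""
    X_aug = X.copy()
    d = X.shape[1]
    k = math.floor(q * d)
    for i in target_nodes:
        idx = rng.sample(range(d), k)    # the zeroed index set I_i
        X_aug[i, idx] = 0.0
    return X_aug
```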

The structural and node-level feature augmentations illustrated in this section apply small, controlled modifications that mirror realistic anomaly patterns in social networks, enhancing the model’s ability to detect subtle yet critical irregularities and improving robustness. We control augmentation budgets with per-dataset node-wise rates and sample independently (Bernoulli). Rates are tuned empirically via small ablation sweeps, and mean ± std are reported over three seeds. Formally, for each node i, y_i^(A) ∼ Bernoulli(p_A) and y_i^(X) ∼ Bernoulli(p_X), independently across nodes and across the two views, with small rates p_A, p_X ∈ (0, 1).

Augmented-view diversity metrics. To sanity-check that our augmented views are realistic yet diverse, we quantify the similarity between G and G̃ with scale-free metrics:

J_edge = |E ∩ Ẽ| / |E ∪ Ẽ|  (11a)
J_neigh = (1/n) Σ_i |N(i) ∩ Ñ(i)| / |N(i) ∪ Ñ(i)|  (11b)
D_sym = KL(p ‖ p̃) + KL(p̃ ‖ p)  (11c)
cos_mean = (1/n) Σ_i (x_i · x̃_i) / (‖x_i‖_2 ‖x̃_i‖_2 + ε)  (11d)

where E and Ẽ are edge sets; N(i) and Ñ(i) are neighbor sets. p and p̃ denote the (normalized) empirical degree distributions of A and Ã, and KL(· ‖ ·) is the Kullback–Leibler divergence. In (11d), x_i and x̃_i are the i-th rows of X and X̃, · is the dot product, ‖·‖_2 is the Euclidean norm, and ε is a small constant.
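These checks reduce to a few set and vector operations; a minimal NumPy sketch (our own helper names, with binning and smoothing choices that are assumptions) is:

```python
import numpy as np

def edge_jaccard(E1, E2):
    """Edge-set Jaccard similarity (cf. Eq 11a); edges given as frozensets."""
    E1, E2 = set(E1), set(E2)
    return len(E1 & E2) / len(E1 | E2)

def degree_sym_kl(A1, A2, eps=1e-12):
    """Symmetric KL between normalized empirical degree distributions
    (cf. Eq 11c); eps-smoothing avoids log(0)."""
    def degree_dist(A):
        deg = A.sum(axis=1).astype(int)
        hist = np.bincount(deg, minlength=max(A1.shape[0], A2.shape[0]) + 1)
        return hist / hist.sum()
    p, q = degree_dist(A1) + eps, degree_dist(A2) + eps
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return kl(p, q) + kl(q, p)

def mean_feature_cosine(X1, X2, eps=1e-12):
    """Mean row-wise cosine similarity between original and augmented
    features (cf. Eq 11d)."""
    num = (X1 * X2).sum(axis=1)
    den = np.linalg.norm(X1, axis=1) * np.linalg.norm(X2, axis=1) + eps
    return float(np.mean(num / den))

# toy check: a 4-node ring compared with itself
A = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
X = np.eye(4)
```

Identical views give zero degree divergence and unit mean cosine, so any nonzero values directly measure the augmentation's footprint.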

Contrastive learning framework

Contrastive learning distinguishes similar from dissimilar samples in a self-supervised manner. On graphs, comparing node embeddings across views can blur fine-grained structural or attribute cues, especially in heterophilous settings where message passing smooths distinctive signals [23–26,29]. In addition, reconstruction-based detectors usually operate on a single view and thus miss cross-view inconsistencies that are informative for anomalies. To address both issues, we contrast at the reconstruction level. Two views are considered,

G = (A, X),  G̃ = (Ã, X̃)  (12)

Then a dual-branch reconstructor f_θ is applied to both views in (12) with shared weights to obtain

(Â, X̂) = f_θ(A, X),  (Â′, X̂′) = f_θ(Ã, X̃)  (13)

where f_θ denotes the shared dual-branch reconstructor comprising structure and attribute autoencoders. A, X are the original adjacency and feature matrices, and Ã, X̃ are their augmented counterparts (hats indicate reconstructions; primes mark reconstructions of the augmented view).

During augmentation, we record which nodes were manipulated via the indicators in Eq (2). Our objective keeps cross-view reconstructions of non-augmented nodes close and applies an adaptive-margin penalty to augmented nodes: minimize structural and feature discrepancies for i ∉ S_A and i ∉ S_X, and add a margin penalty when i ∈ S_A or i ∈ S_X. “Non-augmented” and “augmented” are determined per node by the indicators, i.e., 1{i ∈ S_A} for structure and 1{i ∈ S_X} for features. The chosen discrepancy measures and the adaptive margin are detailed in the next subsection. Here, 1{i ∈ S_A} and 1{i ∈ S_X} are the structure- and feature-side augmentation indicators.

Dual autoencoder model

Fig 4 illustrates our reconstruction-level contrast architecture for anomaly detection in attributed networks. The model uses a shared GAT-based encoder that produces node embeddings Z, followed by two reconstruction heads: a structural head that reconstructs the adjacency via an inner-product decoder, and an attribute head that reconstructs node attribute vectors via a linear decoder. The same weights are applied to both views G and G̃, yielding (Â, X̂) and (Â′, X̂′) that feed the contrastive objective. The rationale is specialization with sharing: structural and attribute anomalies have different signatures, so separate reconstruction heads preserve fine-grained cues in each modality, while a shared encoder ties the representations across views. By reconstructing both A and X for the original and augmented graphs and contrasting the two sets of reconstructions, the model highlights cross-view differences for augmented nodes, while maintaining consistency for non-augmented ones.

thumbnail
Fig 4. Dual autoencoder with reconstruction-level contrast.

Left: an attributed network G and an augmented view G̃ produced by graph data augmentation. Middle: a shared graph-attention encoder yields node embeddings Z, which feed two decoders: a structure decoder reconstructing the adjacency and an attribute decoder reconstructing the attributes for the two views, yielding (Â, X̂) and (Â′, X̂′). Right: reconstruction-level contrast compares, for each node i, the reconstructions via D(Â_i, Â′_i) and D(X̂_i, X̂′_i); it minimizes D when i ∉ S_A and i ∉ S_X, and enforces a learnable margin m when i ∈ S_A or i ∈ S_X. Color coding: green nodes denote non-augmented nodes; red nodes denote augmented nodes; the dotted green arc indicates minimization of D; the dashed orange arc indicates margin enforcement; cross-hatched bars mark augmented features; gray edges are neutral; blue heatmaps depict reconstructed matrices.

https://doi.org/10.1371/journal.pone.0335135.g004

Structure autoencoder

We use a shared GAT-style encoder to map node features to latent embeddings Z, which feed both decoders.

Encoder.

First project node features to an intermediate representation

H = X W_1 + b_1  (14)

with W_1 ∈ ℝ^{d×h} and b_1 ∈ ℝ^{h} learnable. Then apply a GAT-style attention layer [67] to obtain node embeddings

Z = GAT(H, A + I) ∈ ℝ^{n×d_z}  (15)

where d_z is the latent embedding dimension and A + I is the adjacency used for message passing after adding self-loops. Here I denotes the identity matrix, and N(i) is the neighbor set of node i.

The (additive) pre-softmax attention score between nodes i and j is

e_{ij} = LeakyReLU(aᵀ [H_i ‖ H_j])  (16)

with learnable a ∈ ℝ^{2h} and ‖ denoting concatenation; here H_i denotes the i-th row of H.

Masked softmax over the neighbor set gives

α_{ij} = exp(e_{ij}) / Σ_{k ∈ N(i) ∪ {i}} exp(e_{ik})  (17)

where I is the identity matrix (the self-loops from A + I include i in its own neighborhood); this ensures Σ_{j ∈ N(i) ∪ {i}} α_{ij} = 1 for each i. Node embeddings then aggregate neighbor messages:

Z_i = φ(Σ_{j ∈ N(i) ∪ {i}} α_{ij} H_j)  (18)

with the nonlinearity φ applied element-wise.
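The attention steps of Eqs (16)–(18) can be sketched densely in NumPy; the tanh nonlinearity and the function name are our assumptions, and a real implementation would use sparse operations:

```python
import numpy as np

def gat_layer(H, A, a):
    """Dense sketch of a GAT-style layer: additive pre-softmax scores
    (Eq 16), masked softmax over neighbors incl. self-loops (Eq 17),
    then weighted aggregation with an element-wise nonlinearity (Eq 18).
    `a` is the learnable attention vector of length 2h."""
    n, h = H.shape
    A_loop = A + np.eye(n)                           # add self-loops
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            z = a @ np.concatenate([H[i], H[j]])     # a^T [H_i || H_j]
            scores[i, j] = z if z > 0 else 0.2 * z   # LeakyReLU
    mask = A_loop > 0                                # softmax only over N(i) ∪ {i}
    scores = np.where(mask, scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ H), alpha                 # Z_i = phi(sum_j alpha_ij H_j)

rng = np.random.default_rng(3)
H = rng.normal(size=(4, 3))
A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
a = rng.normal(size=6)
Z_out, alpha = gat_layer(H, A, a)
```

Each attention row sums to one, and non-neighbors receive exactly zero weight, matching the masked softmax of Eq (17).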

Decoder.

The adjacency matrix is reconstructed as,

Â = σ(Z Zᵀ)  (19)

where Z is the latent representation obtained from the encoder and σ denotes the sigmoid activation. This yields a symmetric edge-likelihood matrix Â ∈ (0, 1)^{n×n}. Following standard practice, we adopted the inner-product decoder used in graph autoencoders [68].
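Eq (19) is a one-liner in practice; a small sketch:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def decode_adjacency(Z):
    """Inner-product decoder (Eq 19): A_hat = sigmoid(Z Z^T), a symmetric
    matrix of edge likelihoods in (0, 1)."""
    return sigmoid(Z @ Z.T)

# nodes 0 and 1 have similar embeddings; node 2 points the opposite way
Z = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
A_hat = decode_adjacency(Z)
```

Nodes with aligned embeddings receive a high edge likelihood, while anti-aligned pairs score low.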

Attribute autoencoder

This branch reconstructs node features by combining the shared node embeddings Z with a global attribute factor F learned from the current view’s features X (via the feature–feature summary S = Xᵀ X).

Encoder (global feature factorization).

We form a feature–feature summary matrix S = Xᵀ X ∈ ℝ^{d×d} and encode it as

F_1 = φ(S W_2 + b_2)  (20)

followed by

F = φ(F_1 W_3 + b_3)  (21)

where W_2 ∈ ℝ^{d×h_a}, b_2 ∈ ℝ^{h_a}, W_3 ∈ ℝ^{h_a×d_z}, and b_3 ∈ ℝ^{d_z}; here h_a is the attribute-encoder hidden width, and φ is applied element-wise. The same encoder is applied per view.

Decoder.

Attribute reconstruction is factorized as

X̂ = Z Fᵀ  (22)

where Z ∈ ℝ^{n×d_z} are the node embeddings from the shared GAT (graph attention network) encoder. For the augmented view, we analogously have X̂′ = Z̃ F̃ᵀ.

Taken together, the shared encoder with dual decoders reconstructs the topology via Â = σ(Z Zᵀ) and the attributes via X̂ = Z Fᵀ (and Â′, X̂′ for G̃).
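The attribute branch of Eqs (20)–(22) amounts to two dense layers on the feature–feature summary followed by a factorized reconstruction; a sketch with assumed weight shapes and a tanh nonlinearity standing in for φ:

```python
import numpy as np

def attribute_autoencoder(X, Z, W2, b2, W3, b3):
    """Sketch of the attribute branch: encode S = X^T X into a global
    attribute factor F (d x d_z), then reconstruct X_hat = Z F^T (n x d).
    Shapes follow the stated dimensions (W2: d x h_a, W3: h_a x d_z)."""
    S = X.T @ X                                    # d x d feature-feature summary
    F = np.tanh(np.tanh(S @ W2 + b2) @ W3 + b3)    # global factor F
    X_hat = Z @ F.T                                # factorized reconstruction
    return X_hat, F

rng = np.random.default_rng(4)
n, d, h_a, d_z = 5, 4, 8, 3
X = rng.normal(size=(n, d))
Z = rng.normal(size=(n, d_z))
W2, b2 = rng.normal(size=(d, h_a)), np.zeros(h_a)
W3, b3 = rng.normal(size=(h_a, d_z)), np.zeros(d_z)
X_hat, F = attribute_autoencoder(X, Z, W2, b2, W3, b3)
```

The factorization is dimensionally consistent: Z (n × d_z) times Fᵀ (d_z × d) recovers an n × d feature matrix.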

Contrastive loss function

We use a reconstruction-level contrastive objective: given the original graph and an augmented view, the dual-autoencoders reconstruct both, and the loss pulls together reconstructions for non-augmented nodes while pushing apart those for augmented nodes via an adaptive margin. This focuses learning on structure and feature discrepancies introduced by augmentation and sharpens the separation between normal and anomalous nodes.

Structural contrastive loss.

The structural contrastive loss compares the reconstructed adjacency information from the original and augmented graph views. It encourages a small reconstruction discrepancy for normal nodes and enforces at least a margin for nodes flagged as augmented, as depicted in the top-right panel of Fig 4:

L_str = (1/n) Σ_{i=1}^{n} [ (1 − 1{i ∈ S_A}) ‖Â_i − Â′_i‖²_F + 1{i ∈ S_A} max(0, m − ‖Â_i − Â′_i‖²_F) ]  (23)

where ‖·‖²_F denotes the squared Frobenius norm, Â_i and Â′_i are the i-th rows of the reconstructed adjacencies, 1{·} is the indicator, 1{i ∈ S_A} flags structural augmentation, and m > 0 is a learnable margin.

Attribute contrastive loss.

Analogously, the attribute contrastive loss compares reconstructed features across views (visualized in the bottom-right panel of Fig 4):

L_attr = (1/n) Σ_{i=1}^{n} [ (1 − 1{i ∈ S_X}) ‖X̂_i − X̂′_i‖²_F + 1{i ∈ S_X} max(0, m − ‖X̂_i − X̂′_i‖²_F) ]  (24)

where X̂_i and X̂′_i are the i-th rows of the reconstructed feature matrices, and 1{i ∈ S_X} flags feature-side augmentation; 1{·} and m are as defined above.

Positive and negative semantics. We treat each modality independently. For structure, the pair (Â_i, Â′_i) is positive if i ∉ S_A (minimize D) and negative if i ∈ S_A (enforce the adaptive margin m) as in Eq (23). For attributes, the pair (X̂_i, X̂′_i) is positive if i ∉ S_X and negative if i ∈ S_X as in Eq (24). Thus, a node can be positive in one modality and negative in the other, depending on the applied augmentation. We do not construct inter-node negatives; contrast is performed intra-node across views only.

Combined contrastive loss. The reconstruction-level contrast is defined as,

L_con = L_str + L_attr  (25)
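The per-modality objective reduces to a masked pull-together / push-apart computation; a minimal sketch, assuming a per-node mean and squared row discrepancies (our naming, not the released code):

```python
import numpy as np

def reconstruction_level_contrast(R, R_aug, aug_flags, m):
    """Per-modality reconstruction-level contrast: squared cross-view row
    discrepancies are minimized for non-augmented nodes and pushed above a
    margin m for augmented nodes. R / R_aug are rows of the two views'
    reconstructions (of A_hat or X_hat)."""
    D = ((R - R_aug) ** 2).sum(axis=1)           # squared row discrepancy
    pos = (1 - aug_flags) * D                    # positives: pull together
    neg = aug_flags * np.maximum(0.0, m - D)     # negatives: hinge past margin
    return (pos + neg).mean()

A_hat     = np.array([[0.9, 0.1], [0.2, 0.8]])
A_hat_aug = np.array([[0.9, 0.1], [0.8, 0.2]])
flags     = np.array([0.0, 1.0])                 # node 1 was augmented
loss = reconstruction_level_contrast(A_hat, A_hat_aug, flags, m=1.0)
```

Node 0 contributes zero (identical cross-view rows); node 1's discrepancy of 0.72 falls short of the margin 1.0, leaving a hinge penalty of 0.28 and a mean loss of 0.14.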

Margin parameter.

The margin m > 0 in Eq (23) and Eq (24) sets the minimum discrepancy required between cross-view reconstructions for augmented nodes via the hinge term max(0, m − D). We treat m as a learnable scalar, and optimize it jointly with all model parameters by backpropagation, so the separation strength adapts to the data and augmentation difficulty. Unless stated otherwise, a single shared m is used for both structure and attributes; extending to modality-specific margins is straightforward.

Why learn m? An adaptive margin reduces manual tuning, calibrates the loss across datasets and augmentation budgets, and mitigates under- or over-separation (collapsed positives or trivially large gaps). In contrast, most prior graph contrastive or reconstruction-based detectors rely on a fixed margin (or no margin at all), which can be miscalibrated across settings.

Total loss function

A single objective is optimized to balance fidelity and separation by combining a reconstruction term and a reconstruction-level contrast term. Concretely, the total loss is a weighted sum of the reconstruction loss L_rec and the reconstruction-level contrast loss L_con. Notably, L_con also acts as a cross-view regularizer: it enforces consistency of reconstructions for non-augmented nodes, thereby refining Â and X̂ and improving overall reconstruction fidelity.

L = γ_1 L_rec + γ_2 L_con  (26)

where L_con = L_str + L_attr (with L_str defined in Eq (23) and L_attr in Eq (24)), and γ_1, γ_2 ≥ 0 control the trade-off between accurate reconstruction and discriminative separation.

Reconstruction term. We combine structural and attribute reconstruction errors with a modality weight η ∈ [0, 1]:

L_rec = η ‖A − Â‖_F + (1 − η) ‖X − X̂‖_F  (27)

where ‖·‖_F denotes the Frobenius norm and Â, X̂ are the reconstructions of Eqs (19) and (22). We use the Frobenius norm (element-wise, without squaring) to linearly penalize reconstruction errors, complementing the squared discrepancies used in the contrastive terms.

Discussion and settings. Inside L_rec, η interpolates between structural and attribute fidelity. At the outer level, γ_1 and γ_2 control the trade-off between pure reconstruction and reconstruction-level contrast. Because the Frobenius terms in L_rec and the squared discrepancies in L_con can have different scales, each term is normalized by its mini-batch moving average before weighting, then fixed coefficients are applied. This objective yields faithful reconstructions for non-augmented nodes and, via L_con, enlarges cross-view discrepancies for augmentation-affected nodes, while regularizing reconstructions to remain view-consistent.

Anomaly scoring

The proposed approach assigns a node-level anomaly score from reconstruction discrepancies across both modalities. Intuitively, nodes whose reconstructed features or adjacency rows deviate substantially from the originals are more likely to be anomalous. The score combines structure- and attribute-side errors for each node as,

score(i) = λ ‖X_i − X̂_i‖_2 + (1 − λ) ‖A_i − Â_i‖_2  (28)

Here, λ ∈ [0, 1] balances attribute vs. structural error. Nodes with larger score(i) are ranked as more anomalous. The dual-autoencoder is trained to reproduce prevalent (normal) structural and attribute patterns; augmented nodes are reconstructed poorly in the augmented view, and the reconstruction-level contrast further enlarges their cross-view discrepancies, yielding higher score(i).
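The scoring rule can be sketched as a convex combination of row-wise reconstruction errors; the row-wise Euclidean norm and the names below are our assumptions:

```python
import numpy as np

def anomaly_scores(A, A_hat, X, X_hat, lam):
    """Node-level anomaly score (cf. Eq 28): lam weights the attribute-side
    error against the structure-side error, per node."""
    err_attr   = np.linalg.norm(X - X_hat, axis=1)   # attribute row error
    err_struct = np.linalg.norm(A - A_hat, axis=1)   # adjacency row error
    return lam * err_attr + (1 - lam) * err_struct

A     = np.array([[0.0, 1.0], [1.0, 0.0]])
A_hat = np.array([[0.0, 1.0], [0.0, 0.0]])   # node 1's row is poorly reconstructed
X     = np.array([[1.0, 0.0], [0.0, 1.0]])
X_hat = X.copy()                             # attributes reconstructed perfectly
scores = anomaly_scores(A, A_hat, X, X_hat, lam=0.5)
```

Node 1, whose adjacency row is badly reconstructed, is ranked above node 0 even though both have perfect attribute reconstructions.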

Experimental results

The proposed approach DCOR is evaluated on six standard attributed network datasets. This section details the datasets and the experimental setup. Then, the main results are presented along with training dynamics, ablations, augmentation-diversity checks, and robustness to anomaly prevalence shifts.

Datasets and evaluation metric

Six widely used attributed network datasets are used in our study, which are given in Table 3 with important statistics, application domains, and anomaly rates. Enron [69] (an employee email communication network that captures interaction patterns and organizational relationships), Amazon [70] (a product co-purchase network in which nodes are products and edges indicate frequently co-purchased pairs, reflecting consumer buying behavior), Facebook [71] (a social network where nodes represent users and edges denote friendships (social ties)), Flickr [72] (an online photo-sharing network in which nodes are users and edges represent interactions among users), ACM [73] (an academic citation network whose nodes and edges capture publication entities and citation links; we follow the processed split commonly used for attributed graphs), and Reddit [74] (an online discussion forum network in which nodes represent users and edges reflect interactions such as replies or mentions; node attributes summarize content and metadata).

thumbnail
Table 3. Description of the datasets statistics and their anomaly ratios.

https://doi.org/10.1371/journal.pone.0335135.t003

Evaluation metric. Following prior work, we report the area under the receiver operating characteristic curve (AUROC) [75]. AUROC is threshold-free and rank-based, making it robust to severe class imbalance that is typical in anomaly detection.

Implementation details

Our implementation uses Python and PyTorch [76] and runs on a single NVIDIA T4 GPU (Google Colab). We fix random seeds across Python, NumPy, and PyTorch, and enable deterministic settings in cuDNN (the CUDA Deep Neural Network library) for reproducibility. Raw graphs are loaded from MATLAB .mat files (adjacency, attributes, labels). Graphs are treated as undirected and unweighted: A is symmetrized, self-loops are added for message passing, and Louvain communities are computed on the unweighted graph without self-loops. The model input is the symmetrically normalized adjacency D̃^{−1/2}(A + I)D̃^{−1/2}, while A + I serves as the structure-reconstruction target. Here, I is the identity and D̃ is the degree matrix of A + I. The architecture is a dual-autoencoder (GAT encoder; inner-product adjacency decoder; linear attribute decoder). Training uses Adam [77]. To scale to large graphs, we adopt GraphSAINT-style random-walk mini-batch sampling without reweighting; since our objective contrasts reconstructions across two views on matched node sets, we avoid estimator reweighting and accept the mild sampling bias for efficiency.

Complexity and runtime. Let n be the number of nodes, |E| the number of edges, d the input feature dimension, and d_z the embedding size. Message passing with sparse ops is O(|E| d_z). The inner-product decoder involves forming (sub)matrices like Z Zᵀ, at O(n² d_z) cost if materialized. Parameter memory depends on layer sizes (e.g., O(d d_z)) and is independent of n; the dominant memory terms are activations and any explicit reconstruction buffers for Â (up to O(n²) if a full matrix is stored). On Amazon (d_z = 128), we measured ∼190k parameters (∼0.7 MB), ∼265 MFLOPs (million floating-point operations) per forward pass, and ∼3 ms inference per 1k nodes; wall-clock time scales roughly linearly with the number of epochs.

Comparative note (vs. SOTA). Training DCOR is costlier than single-view reconstructors (e.g., DOMINANT [30], AnomalyDAE [31]) because each training step involves encoding and decoding two views. Unlike common InfoNCE-style pipelines (e.g., CONAD [18]), it typically does not build dense similarity matrices or maintain large negative banks; and unlike adversarial schemes (e.g., GAAN [54]), it avoids generator and discriminator updates. With GraphSAINT subgraph sampling [65], the inner-product decoder’s O(n²) pairwise scoring reduces to O(B²) per mini-batch (full-pair scoring), where B is the number of nodes in the sampled subgraph (mini-batch size). Runtime memory is dominated by node and edge activations (and any optional buffers). In inference, DCOR is single-pass (no augmentation or contrast), yielding a runtime comparable to DOMINANT and AnomalyDAE.

Anomaly detection performance

We compare against strong baselines on six datasets and report AUROC (higher is better) in Table 4. DCOR attains the best AUROC on all six datasets.

thumbnail
Table 4. Anomaly detection performance (AUROC).

Best per column in bold.

https://doi.org/10.1371/journal.pone.0335135.t004

Analysis. Relative to the strongest non-DCOR baseline per dataset, absolute AUROC gains are +15.6 percentage points on Enron, +14.4 on Amazon, +2.9 on Facebook, +4.0 on Flickr, +6.8 on ACM, and +4.0 on Reddit (avg. +8.0 pp). In relative terms, these correspond to an average improvement of 11.3%, with a maximum relative gain of 21.3% on Enron.

Beyond final AUROC, training dynamics are examined. Normalized losses over epochs are plotted, where each curve is divided by its own value at the first epoch to remove scale effects (Eq (29)). For DCOR, we report both the reconstruction term and the total objective (with RLC); for baselines we report the reconstruction-only term (Fig 5).

ℓ̃_t = ℓ_t / ℓ_1,  t = 1, …, E  (29)
thumbnail
Fig 5. Normalized training loss vs. baselines (Facebook).

DCOR reports both reconstruction-only and total (with RLC); baselines report reconstruction-only. Each curve is normalized as in Eq (29) by dividing by its epoch-1 value and EMA-smoothed (exponential moving average) with smoothing factor β ∈ (0, 1), where the EMA is computed as ℓ̄_t = β ℓ̄_{t−1} + (1 − β) ℓ̃_t with ℓ̄_1 = ℓ̃_1. This normalization enables fair visual comparison across methods with different objectives and scales; the plot therefore emphasizes relative convergence trends (shape and stability) rather than raw magnitudes. Consistent with DCOR’s design, RLC regularizes late-phase training: the reconstruction curve decreases more conservatively than methods that minimize reconstruction alone, while the total objective continues to decrease.

https://doi.org/10.1371/journal.pone.0335135.g005

where ℓ_t denotes the per-epoch loss at epoch t and E is the number of epochs.
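The curve preparation of Eq (29) plus EMA smoothing is straightforward to reproduce; a sketch, with initialization at the first normalized value as an assumption:

```python
import numpy as np

def normalize_and_smooth(losses, beta):
    """Eq (29)-style curve preparation: divide each per-epoch loss by its
    epoch-1 value (so every curve starts at 1.0), then smooth with an
    exponential moving average ema_t = beta*ema_{t-1} + (1-beta)*l_t."""
    losses = np.asarray(losses, dtype=float)
    tilde = losses / losses[0]                # scale-free normalized curve
    ema = np.empty_like(tilde)
    ema[0] = tilde[0]                         # assumed initialization
    for t in range(1, len(tilde)):
        ema[t] = beta * ema[t - 1] + (1 - beta) * tilde[t]
    return tilde, ema

tilde, ema = normalize_and_smooth([4.0, 2.0, 1.0], beta=0.5)
```

Because every curve starts at 1.0, methods with different loss scales become directly comparable, which is exactly the point of the normalization.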

Ablation study

We ablate three components on Amazon: (i) structural augmentation, (ii) feature augmentation, and (iii) the reconstruction-level contrast (RLC). Each variant removes exactly one component; the encoder and decoders, schedule, and all other hyperparameters are fixed. Results are summarized in Table 5.

thumbnail
Table 5. Ablation on Amazon.

Δ is the signed difference vs. the full model (negative indicates a drop).

https://doi.org/10.1371/journal.pone.0335135.t005

To visualize how RLC affects optimization, the total objective is tracked over epochs and a scale-free version, normalized as in Eq (29), is reported (Fig 6).

Augmentation diversity. Table 6 reports diversity metrics on representative datasets. Structural diversity is moderate (up to 0.31), while the symmetric Kullback–Leibler (KL) divergence on degrees remains small, indicating local edge perturbations without global distortion. Feature-level diversity differs by dataset: Facebook shows small changes, whereas Flickr exhibits stronger shifts (sparser bag-of-tags). DCOR’s reconstruction-level contrast is trained to be invariant to such moderate view differences while preserving anomaly separability.

thumbnail
Table 6. Diversity of augmented graphs.

Similarities (↑ larger = more overlap) and complementary diversities (↑ larger = more diverse). Lower is better for the degree-distribution divergence D_sym.

https://doi.org/10.1371/journal.pone.0335135.t006

thumbnail
Fig 6. Normalized total loss on the Facebook dataset (DCOR with and without RLC).

To enable a fair visual comparison of training dynamics, each curve is normalized as in Eq (29) by dividing by its epoch-1 value and EMA-smoothed (exponential moving average) with smoothing factor β ∈ (0, 1). The EMA is computed as ℓ̄_t = β ℓ̄_{t−1} + (1 − β) ℓ̃_t with ℓ̄_1 = ℓ̃_1. This normalization emphasizes relative convergence behavior (shape and stability) rather than raw magnitudes: with RLC, the total objective continues to decrease in late epochs, whereas without RLC it plateaus, consistent with the ablation trends in Table 5.

https://doi.org/10.1371/journal.pone.0335135.g006

As summarized in Table 5, removing RLC yields the largest AUROC drop (0.203). Using only feature augmentation reduces AUROC by 0.083, and using only structural augmentation reduces it by 0.122. On Amazon, feature-only outperforms structural-only by 0.039 (0.712 vs. 0.673), indicating a stronger self-supervised signal from attribute augmentations.

Notation. J_edge and J_neigh correspond to Edge-Jaccard and Neigh-Jaccard in Eq (11a) and Eq (11b); D_sym equals Deg-symKL in Eq (11c); cos_mean equals Feat-cosine-mean in Eq (11d); and each complementary diversity is one minus the corresponding similarity.

Robustness to varying anomaly ratios

To stress-test the robustness of our method to variations in anomaly prevalence, we keep the training procedure unchanged and vary, only during evaluation, the fractions of labeled structural and feature anomalies, using the same augmentation-based labeling protocol described in Subsection Graph data augmentation. On Enron, a mild setting (20% structural, 10% feature) yields an AUROC of 0.783, and a moderate setting (30%, 20%) yields 0.747. On Flickr, a mild setting (10%, 10%) yields 0.822, and a moderate setting (30%, 40%) yields 0.815. These results indicate that the ranking performance of our approach remains largely invariant under moderate shifts in anomaly prevalence.

Discussion

This section summarizes the paper’s scientific contributions, highlights open challenges observed in practice, discusses the limitations and practical considerations of DCOR, and outlines future research directions that are closely aligned with our reconstruction-level contrast framework. The objective is to provide a transparent perspective on what DCOR accomplishes, where it faces challenges, and how it can be further extended.

Contributions

The key contributions of this work are summarized, ensuring consistency with our formulation and experimental findings.

  • Reconstruction-level contrast (RLC) on decoded structure and attributes. Instead of contrasting embeddings, DCOR performs contrastive learning directly on the reconstructions across two views, i.e., on (Â, X̂) and (Â′, X̂′) (Eq 23 to Eq 26), as illustrated in Fig 2. This preserves cross-view discrepancies that message passing may smooth out and improves anomaly separability.
  • Domain-informed augmentation suite. We design a comprehensive and domain-informed augmentation suite that integrates both structural and attribute-level transformations. On the structural side, we employ techniques such as clique injection, node isolation, inter-community bridging, and intra-community edge removal. On the attribute side, we utilize feature copying, scaling, and masking to enrich feature diversity. This carefully controlled augmentation strategy provides self-supervised signals covering all three major anomaly taxonomies (structural, attribute, and interaction anomalies) while ensuring that the generated views remain realistic and faithful to the underlying graph semantics.
  • Learnable adaptive margin. A positive, learnable margin is used within the hinge terms of the reconstruction-level contrast loss. This adaptive margin automatically calibrates the separation strength across different datasets and augmentation budgets, thereby reducing the need for manual tuning and improving the robustness of the framework.
  • Scalable training with GraphSAINT. We leverage GraphSAINT to enable scalable training through random-walk-based mini-batches with matched node sets across views. This strategy bounds memory consumption, preserves local connectivity, and stabilizes the reconstruction-level contrast during training.
  • Empirical validation across six benchmarks. Extensive experiments on six real-world benchmarks revealed that DCOR outperforms state-of-the-art competitors in terms of AUROC (Table 4).

Challenges

Extending the framework to million-node graphs remains challenging due to computational and memory constraints, even with efficient sampling strategies such as GraphSAINT. While the dual-autoencoder architecture is effective, it introduces additional training overhead. Furthermore, balancing the reconstruction and contrastive objectives requires careful tuning, as over-weighting either objective can degrade overall anomaly detection performance. Another practical challenge lies in selecting a compact yet representative set of anomaly scenarios for augmentation. In this study, we prioritized patterns most likely to occur in practice, including structural, attribute, and structure-attribute mismatches, consistent with established taxonomies [7880]. This choice trades off some diversity for greater realism, a decision supported by our ablation study (Table 5). Finally, although DCOR effectively captures subtle irregularities, distinguishing true anomalies from naturally occurring network dynamics such as community evolution or legitimate attribute updates remains difficult. Prior work in dynamic community detection and temporal graph anomaly analysis highlights the prevalence of such phenomena [9,81,82]. Incorporating temporal information, domain-specific constraints, and post hoc validation pipelines may help mitigate false positives and enhance robustness in real-world deployments.

Limitations

Although the proposed graph augmentation suite is designed to capture the three principal anomaly categories in graphs (structural, attribute, and interaction), it cannot fully encompass the diversity of real-world scenarios. Rare or domain-specific anomalies may fall outside this augmentation design space. Since DCOR relies on realistic augmented views, misaligned or overly aggressive augmentations can generate implausible structures or provide insufficient contrast between normal and anomalous nodes, thereby reducing detection accuracy. To mitigate this risk, we adopt a taxonomy-guided set of augmentations and examine sensitivity to augmentation budgets; nevertheless, broader and domain-adapted augmentation strategies remain necessary.

Another limitation pertains to the learnable margin in the contrastive loss. While the adaptive margin is intended to enhance separation, it can introduce training instabilities during early epochs if it adapts too rapidly or too slowly relative to batch difficulty. Achieving stable convergence therefore benefits from safeguards such as careful initialization, mild regularization, explicit lower and upper bounds on the margin, a brief warm-up phase, and gradient clipping.

Future work

In future work, we aim to explore LLM (large language model)-guided, semantically coherent graph augmentations that produce context-aware modifications while preserving the underlying graph statistics and attribute semantics. These augmented views will be integrated into our dual-autoencoder contrastive framework to improve the detection of subtle, domain-specific anomalies.

We plan to perform comprehensive evaluations on datasets including Enron, Amazon, Flickr, and Facebook, with the goal of achieving higher AUROC scores. Furthermore, we intend to extend our approach to dynamic graphs and knowledge-rich networks, enabling more robust and temporally aware anomaly detection.

Conclusion

DCOR introduces a novel paradigm for anomaly detection by contrasting reconstructed structures and attributes (rather than embeddings) across augmented graph views. This design preserves fine-grained, view-specific cues and significantly enhances the fidelity of both structural and attribute reconstructions, leading to superior anomaly separation.

Across six diverse benchmarks, including social, e-commerce, and academic networks, DCOR establishes new state-of-the-art results, achieving the highest AUROC on all six datasets. It outperforms the strongest prior baseline by 11.3% on average, with a peak gain of 21.3% on Enron. Ablation studies validate the method’s robustness: removing the reconstruction-level contrast causes a 25.5% AUROC drop on Amazon. These findings underscore the critical synergy between reconstruction-level contrast and complementary augmentations.

The effectiveness of DCOR can be attributed to four main aspects: the use of reconstruction-level contrast on decoded structure and attributes, a domain-informed augmentation suite that covers structural, attribute, and interaction patterns, a learnable margin that adapts the separation strength, and a dual-autoencoder architecture with a shared GAT encoder trained with GraphSAINT sampling for scalability.

Future work will extend DCOR to heterogeneous and dynamic graphs (e.g., temporal fraud networks) and optimize decoders for web-scale deployments, leveraging LLM-guided augmentations to handle complex data distributions.

References

  1. 1. Akoglu L, Tong H, Koutra D. Graph based anomaly detection and description: a survey. Data Min Knowl Disc. 2014;29(3):626–88.
  2. 2. Juniper Research. Online Payment Fraud: Emerging Threats, Segment Analysis & Market Forecasts 2021 –2025. Juniper Research. 2021. https://www.experian.com/blogs/global-insights/wp-content/uploads/2022/07/2021_04_Juniper_Online-Payment-Fraud.pdf
  3. 3. Cavazos R. The economic cost of bad actors on the internet: fake influencer marketing in 2019 . CHEQ in collaboration with the University of Baltimore. 2019. https://info.cheq.ai/hubfs/Research/THE_ECONOMIC_COST_OF_BAD_ACTORS_Influencers.pdf
  4. 4. Irofti P, Pătraşcu A, Băltoiu A. Fraud detection in networks. Studies in computational intelligence. Springer; 2020. p. 517–36. https://doi.org/10.1007/978-3-030-52067-0_23
  5. 5. Abshari D, Sridhar M. A survey of anomaly detection in cyber-physical systems. arXiv preprint 2025. https://arxiv.org/abs/2502.13256
  6. 6. Zibaeirad A, Koleini F, Bi S, Hou T, Wang T. A comprehensive survey on the security of smart grid: challenges, mitigations, and future research opportunities. arXiv preprint 2024. https://arxiv.org/abs/2407.07966
  7. 7. Gulzar Q, Mustafa K. Interdisciplinary framework for cyber-attacks and anomaly detection in industrial control systems using deep learning. Sci Rep. 2025;15(1):26575. pmid:40695948
  8. 8. Qiao H, Tong H, An B, King I, Aggarwal C, Pang G. Deep graph anomaly detection: a survey and new perspectives. arXiv preprint 2025. https://arxiv.org/abs/2409.09957
  9. 9. Ekle OA, Eberle W. Anomaly detection in dynamic graphs: a comprehensive survey. ACM Trans Knowl Discov Data. 2024;18(8):1–44.
  10. 10. Breunig MM, Kriegel H-P, Ng RT, Sander J. LOF: identifying density-based local outliers. SIGMOD Rec. 2000;29(2):93–104.
  11. 11. Xu X, Yuruk N, Feng Z, Schweiger TAJ. SCAN: a structural clustering algorithm for networks. In: Proceedings of the 13th ACM SIGKDD international conference on knowledge discovery and data mining. 2007. p. 824–33. https://doi.org/10.1145/1281192.1281280
  12. 12. Alghushairy O, Alsini R, Soule T, Ma X. A review of local outlier factor algorithms for outlier detection in big data streams. BDCC. 2020;5(1):1.
  13. 13. You Y, Chen T, Sui Y, Chen T, Wang Z, Shen Y. Graph contrastive learning with augmentations. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20. Red Hook, NY, USA: Curran Associates Inc.; 2020. p. 5812–23.
  14. 14. Zhu Y, Xu Y, Yu F, Liu Q, Wu S, Wang L. Deep graph contrastive representation learning. arXiv preprint 2020. https://arxiv.org/abs/2006.04131
  15. 15. Hassani K, Khasahmadi AH. Contrastive multi-view representation learning on graphs. In: Proceedings of the 37th International Conference on Machine Learning. 2020. p. 4116–26.
  16. 16. Chen J, Zhu G, Yuan C, Huang Y. Boosting graph anomaly detection with adaptive message passing. arXiv preprint 2024. https://openreview.net/forum?id=CanomFZssu
  17. 17. Suresh S, Li P, Hao C, Neville J. Adversarial graph augmentation to improve graph contrastive learning. In: Advances in Neural Information Processing Systems, 2021. https://openreview.net/forum?id=ioyq7NsR1KJ
  18. 18. Xu Z, Huang X, Zhao Y, Dong Y, Li J. Contrastive attributed network anomaly detection with data augmentation. Springer; 2022. p. 444–57. https://doi.org/10.1007/978-3-031-05936-0_35
  19. 19. Lin Y, Tang J, Zi C, Zhao HV, Yao Y, Li J. UniGAD: unifying multi-level graph anomaly detection. arXiv preprint 2024. https://arxiv.org/abs/2411.06427
  20. 20. Ding K, Xu Z, Tong H, Liu H. Data augmentation for deep graph learning. SIGKDD Explor Newsl. 2022;24(2):61–77.
  21. 21. Zhou J, Xie C, Gong S, Wen Z, Zhao X, Xuan Q, et al. Data augmentation on graphs: a technical survey. ACM Comput Surv. 2025;57(11):1–34.
  22. 22. Marrium M, Mahmood A. Data augmentation for graph data: recent advancements. arXiv preprint 2022.
  23. 23. Dou Y, Liu Z, Sun L, Deng Y, Peng H, Yu PS. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020. p. 315–24. https://doi.org/10.1145/3340531.3411903
  24. 24. Liu Y, Ao X, Qin Z, Chi J, Feng J, Yang H, et al. Pick and choose: a GNN-based imbalanced learning approach for fraud detection. In: Proceedings of the Web Conference 2021 . 2021. p. 3168–77. https://doi.org/10.1145/3442381.3449989
  25. 25. Shi F, Cao Y, Shang Y, Zhou Y, Zhou C, Wu J. H2-FDetector: A GNN-based fraud detector with homophilic and heterophilic connections. In: Proceedings of the ACM Web Conference 2022 . 2022. p. 1486–94. https://doi.org/10.1145/3485447.3512195
  26. 26. Gao Y, Wang X, He X, Liu Z, Feng H, Zhang Y. Addressing heterophily in graph anomaly detection: a perspective of graph spectrum. In: Proceedings of the ACM Web Conference 2023 . 2023. p. 1528–38. https://doi.org/10.1145/3543507.3583268
  27. 27. Li Q, Han Z, Wu X. Deeper insights into graph convolutional networks for semi-supervised learning. AAAI. 2018;32(1).
  28. 28. Oono K, Suzuki T. Graph neural networks exponentially lose expressive power for node classification. In: International Conference on Learning Representations; 2020. https://openreview.net/forum?id=S1ldO2EFPr
  29. 29. Choi Y, Choi J, Ko T, Kim CK. Better not to propagate: understanding edge uncertainty and over-smoothing in signed graph neural networks. arXiv preprint 2024. https://arxiv.org/abs/2408.04895
  30. 30. Ding K, Li J, Bhanushali R, Liu H. Deep anomaly detection on attributed networks. Proceedings of the 2019 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics; 2019. p. 594–602. https://doi.org/10.1137/1.9781611975673.67
  31. 31. Fan H, Zhang F, Li Z. Anomalydae: dual autoencoder for anomaly detection on attributed networks. In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2020. https://doi.org/10.1109/icassp40776.2020.9053387
  32. 32. Roy A, Shu J, Li J, Yang C, Elshocht O, Smeets J, et al. GAD-NR: graph anomaly detection via neighborhood reconstruction. Proc Int Conf Web Search Data Min. 2024;2024:576–85. pmid:40018365
  33. 33. Grover K, Gordon GJ, Faloutsos C. CurvGAD: leveraging curvature for enhanced graph anomaly detection. In: Forty-second International Conference on Machine Learning; 2025. https://openreview.net/forum?id=O3dsbpAcqJ
  34. 34. Ren Z, Li X, Peng J, Chen K, Tan Q, Wu X, et al. Graph autoencoder with mirror temporal convolutional networks for traffic anomaly detection. Sci Rep. 2024;14(1):1247. pmid:38218745
  35. 35. Zade HR, Zare H, Ghassemi PM, Davardoust H, Bagheri MS. DCOR: anomaly detection in attributed networks via dual contrastive learning reconstruction. In: Complex Networks & Their Applications XIII. Springer Nature Switzerland; 2025. p. 3–15. https://doi.org/10.1007/978-3-031-82435-7_1
  36. 36. Li J, Dani H, Hu X, Liu H. Radar: residual analysis for anomaly detection in attributed networks. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence. 2017. p. 2152–8. https://doi.org/10.24963/ijcai.2017/299
  37. 37. Peng Z, Luo M, Li J, Liu H, Zheng Q. ANOMALOUS: a joint modeling approach for anomaly detection on attributed networks. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence. 2018. p. 3513–9. https://doi.org/10.24963/ijcai.2018/488
  38. 38. van den Oord A, Li Y, Vinyals O. Representation learning with contrastive predictive coding. arXiv preprint 2018. ://doi.org/10.48550/arXiv.1807.03748
  39. 39. Alon U, Yahav E. On the bottleneck of graph neural networks and its practical implications. In: International Conference on Learning Representations; 2021. https://openreview.net/forum?id=i80OPhOCVH2
  40. 40. Chen B, Zhang J, Zhang X, Dong Y, Song J, Zhang P, et al. GCCAD: graph contrastive learning for anomaly detection. IEEE Trans Knowl Data Eng. 2022:1–14.
  41. 41. Duan J, Wang S, Zhang P, Zhu E, Hu J, Jin H, et al. Graph anomaly detection via multi-scale contrastive learning networks with augmented view. AAAI. 2023;37(6):7459–67.
  42. 42. Ren J, Hou M, Liu Z, Bai X. EAGLE: contrastive learning for efficient graph anomaly detection. IEEE Intell Syst. 2023;38(2):55–63.
  43. 43. Qian Y, Zhang C, Zhang Y, Wen Q, Ye Y, Zhang C. Co-modality graph contrastive learning for imbalanced node classification. In: Advances in Neural Information Processing Systems. 2022. https://openreview.net/forum?id=f_kvHrM4Q0
  44. 44. Dong H, Zhao J, Yang H, He H, Zhou J, Feng Y, et al. Semi-supervised graph anomaly detection via multi-view contrastive learning. In: 2024 International Joint Conference on Neural Networks (IJCNN). 2024. p. 1–8. https://doi.org/10.1109/ijcnn60899.2024.10650001
  45. 45. Zhou S, Huang X, Liu N, Zhou H, Chung F-L, Huang L-K. Improving generalizability of graph anomaly detection models via data augmentation. IEEE Trans Knowl Data Eng. 2023;35(12):12721–35.
  46. 46. Wang Y, Wang W, Liang Y, Cai Y, Liu J, Hooi B. NodeAug: semi-supervised node classification with data augmentation. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020. p. 207–17. https://doi.org/10.1145/3394486.3403063
  47. 47. Liu Y, Li Z, Pan S, Gong C, Zhou C, Karypis G. Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE Trans Neural Netw Learn Syst. 2022;33(6):2378–92. pmid:33819161
  48. 48. Jin M, Liu Y, Zheng Y, Chi L, Li Y-F, Pan S. ANEMONE. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021. p. 3122–6. https://doi.org/10.1145/3459637.3482057
  49. 49. Zheng Y, Jin M, Liu Y, Chi L, Phan KT, Chen Y-PP. Generative and contrastive self-supervised learning for graph anomaly detection. IEEE Trans Knowl Data Eng. 2023;35(12):12220–33.
  50. 50. Xu F, Wang N, Wen X, Gao M, Guo C, Zhao X. Few-shot message-enhanced contrastive learning for graph anomaly detection. In: 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS). 2023. p. 288–95. https://doi.org/10.1109/icpads60453.2023.00051
  51. 51. Wang Z, Zhou S, Dong J, Yang C, Huang X, Zhao S. Graph anomaly detection with noisy labels by reinforcement learning. arXiv prerpint 2024. https://doi.org/arXiv:2407.05934
  52. 52. Sun Q, Li J, Peng H, Wu J, Ning Y, Yu PS, et al. SUGAR: subgraph neural network with reinforcement pooling and self-supervised mutual information mechanism. In: Proceedings of the Web Conference 2021 . 2021. p. 2081–91. https://doi.org/10.1145/3442381.3449822
  53. 53. Tian C, Zhang F, Wang R. Adversarial regularized attributed network embedding for graph anomaly detection. Pattern Recognition Letters. 2024;183:111–6.
  54. 54. Chen Z, Liu B, Wang M, Dai P, Lv J, Bo L. Generative adversarial attributed network anomaly detection. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 2020. p. 1989–92. https://doi.org/10.1145/3340531.3412070
  55. 55. Guo D, Liu Z, Li R. RegraphGAN: a graph generative adversarial network model for dynamic network anomaly detection. Neural Netw. 2023;166:273–85. pmid:37531727
  56. 56. He Z, Tang Y, Zhao K, Liu J, Chen W. Graph-based log anomaly detection via adversarial training. In: Dependable Software Engineering. Theories, Tools, and Applications.Springer Nature Singapore; 2023. p. 55–71. https://doi.org/10.1007/978-981-99-8664-4_4
  57. 57. Zheng B, Ming L, Zeng K, Zhou M, Zhang X, Ye T, et al. Adversarial graph neural network for multivariate time series anomaly detection. IEEE Trans Knowl Data Eng. 2024;36(12):7612–26.
  58. 58. Ding K, Li J, Agarwal N, Liu H. Inductive anomaly detection on attributed networks. In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020. p. 1288–94. https://doi.org/10.24963/ijcai.2020/179
  59. 59. Chaudhary A, Mittal H, Arora A. Anomaly detection using graph neural networks. In: 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon). 2019. p. 346–50. https://doi.org/10.1109/comitcon.2019.8862186
  60. 60. Wu J, Qiu Z, Zeng Z, Xiao R, Rida I, Zhang S. Graph autoencoder anomaly detection for E-commerce application by contextual integrating contrast with reconstruction and complementarity. IEEE Trans Consumer Electron. 2024;70(1):1623–30.
  61. 61. Dong G, Tang M, Wang Z, Gao J, Guo S, Cai L, et al. Graph neural networks in IoT: a survey. ACM Trans Sen Netw. 2023;19(2):1–50.
  62. 62. DeMedeiros K, Hendawi A, Alvarez M. A survey of AI-based anomaly detection in iot and sensor networks. Sensors (Basel). 2023;23(3):1352. pmid:36772393
  63. 63. Huang X, Yang Y, Wang Y, Wang C, Zhang Z, Xu J, et al. DGraph: a large-scale financial dataset for graph anomaly detection. In: Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track; 2022. https://openreview.net/forum?id=2rQPxsmjKF
  64. 64. De Meulemeester H, De Smet F, van Dorst J, Derroitte E, De Moor B. Explainable unsupervised anomaly detection for healthcare insurance data. BMC Med Inform Decis Mak. 2025;25(1):14. pmid:39789541
  65. 65. Zeng H, Zhou H, Srivastava A, Kannan R, Prasanna V. GraphSAINT: graph sampling based inductive learning method. In: International Conference on Learning Representations; 2020. https://openreview.net/forum?id=BJe8pkHFwS
  66. 66. Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech. 2008;2008(10):P10008.
  67. 67. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph attention networks. In: International Conference on Learning Representations; 2018. https://openreview.net/forum?id=rJXMpikCZ
  68. 68. Kipf TN, Welling M. Variational graph auto-encoders. arXiv preprint 2016. https://doi.org/arXiv:1611.07308
  69. 69. Liu K, Dou Y, Zhao Y, Ding X, Hu X, Zhang R. BOND: benchmarking unsupervised outlier node detection on static attributed graphs. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022. p. 27021–35.
  70. 70. Sanchez PI, Muller E, Laforet F, Keller F, Bohm K. Statistical selection of congruent subspaces for mining attributed graphs. In: 2013 IEEE 13th International Conference on Data Mining. 2013. p. 647–56. http://dx.doi.org/10.1109/icdm.2013.88https://doi.org/10.1109/icdm.2013.88
  71. 71. McAuley J, Leskovec J. Discovering social circles in ego networks. arXiv preprint 2012. https://doi.org/arXiv:1210.8182
  72. 72. Huang X, Li J, Hu X. Label informed attributed network embedding. In: Proceedings of the tenth ACM international conference on web search and data mining. 2017. p. 731–9. https://doi.org/10.1145/3018661.3018667
  73. 73. Tang J, Zhang J, Yao L, Li J, Zhang L, Su Z. ArnetMiner. In: Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining. 2008. p. 990–8. https://doi.org/10.1145/1401890.1402008
  74. 74. Kumar S, Zhang X, Leskovec J. Predicting dynamic embedding trajectory in temporal interaction networks. KDD. 2019;2019:1269–78. pmid:31538030
  75. 75. Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters. 2006;27(8):861–74.
  76. 76. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, et al. PyTorch: an imperative style, high-performance deep learning library. arXiv preprint 2019. https://arxiv.org/abs/1912.01703
  77. 77. Kingma DP, Ba J. Adam: a method for stochastic optimization. arXiv preprint 2014. https://arxiv.org/abs/1412.6980
  78. 78. Lamichhane PB, Eberle W. Anomaly detection in graph structured data: a survey. arXiv preprint 2024.
  79. 79. Xing L, Li S, Zhang Q, Wu H, Ma H, Zhang X. A survey on social network’s anomalous behavior detection. Complex Intell Syst. 2024;10(4):5917–32.
  80. 80. Yu R, Qiu H, Wen Z, Lin C, Liu Y. A survey on social media anomaly detection. SIGKDD Explor Newsl. 2016;18(1):1–14.
  81. 81. Rossetti G, Cazabet R. Community discovery in dynamic networks. ACM Comput Surv. 2018;51(2):1–37.
  82. 82. Zhou R, Zhang Q, Zhang P, Niu L, Lin X. Anomaly detection in dynamic attributed networks. Neural Comput & Applic. 2020;33(6):2125–36.