Accurate graph classification via two-staged contrastive curriculum learning

Given a graph dataset, how can we generate meaningful graph representations that maximize classification accuracy? Learning representative graph embeddings is important for solving various real-world graph-based tasks. Graph contrastive learning aims to learn representations of graphs by capturing the relationship between the original graph and the augmented graph. However, previous contrastive learning methods neither capture semantic information within graphs nor consider both nodes and graphs while learning graph embeddings. We propose TAG (Two-staged contrAstive curriculum learning for Graphs), a two-staged contrastive learning method for graph classification. TAG learns graph representations at two levels, node-level and graph-level, by exploiting six degree-based model-agnostic augmentation algorithms. Experiments show that TAG outperforms both unsupervised and supervised methods in classification accuracy, achieving up to 4.08% points and 4.76% points higher accuracy on average than the second-best unsupervised and supervised methods, respectively.


Introduction
How can we generate graph representations for accurate graph classification? Graph neural networks (GNNs) have drawn the attention of researchers since they are applicable to real-world graph-structured data including social networks, molecular graphs, etc. Various GNNs have been proposed to solve graph classification [1-7].
A main challenge of accurate graph classification is to learn graph embeddings that reflect the crucial information within graphs. Contrastive learning has been widely used to address this issue and has achieved superior performance on the graph classification task. Graph contrastive learning produces the representations of graphs based on the similarity between graphs. The learning algorithm can be used in both unsupervised [8-14] and supervised [15, 16] settings.
Recent graph contrastive learning methods utilize data augmentation to ensure the similarity of the original graph and the newly generated graph. Random-based augmentations are used to generate graphs in [9, 10, 13], but information loss is inevitable in those methods. Graph contrastive learning methods with carefully designed augmentations [8, 11, 12, 14] preserve more graph semantics compared to those with random-based ones; however, these methods increase the complexity of their models. Furthermore, none of the previous approaches optimize node embeddings, which are the basis of graph embeddings.
In this paper, we propose TAG (Two-staged contrAstive curriculum learning for Graphs), an accurate graph contrastive learning approach that can be applied to both supervised and unsupervised graph classification. We design six model-agnostic augmentation algorithms that preserve the semantic information of graphs. Three algorithms change the features of nodes, and the other three modify the structure of graphs based on degree centrality. We then conduct graph contrastive learning at two levels: node-level and graph-level. Node-level contrastive learning learns node embeddings based on the relationship between nodes. Graph-level contrastive learning learns the embeddings of graphs based on node embeddings. The embeddings of all nodes within a graph are aggregated to generate a graph embedding. Thus, the relationships of both nodes and graphs are reflected in the graph representations. Furthermore, TAG exploits a curriculum learning strategy to enhance performance. Fig 1 shows the overall performance of TAG; note that TAG outperforms the competitors in both unsupervised and supervised settings.
Our main contributions are summarized as follows:
• Data augmentation. We propose six model-agnostic augmentation algorithms for graphs. Every augmentation method considers node centrality to preserve the semantic information of original graphs.
• Method. We propose TAG, a two-staged contrastive curriculum learning method for accurate graph classification. The two-staged approach embeds the relational information of both nodes and graphs into the graph representations.
• Experiments. We perform experiments on seven benchmark datasets in supervised and unsupervised settings, achieving the best performance.
Table 1 describes the symbols used in this paper. The code is available at https://github.com/snudatalab/TAG.

Node-level graph contrastive learning
Node-level graph contrastive learning methods are designed to handle the node classification task by capturing the relationship between nodes. DGI [17] is the first work that applies the concept of contrastive learning to the graph domain. JGCL [18] combines supervised, semi-supervised, and unsupervised settings to learn optimal node representations. GMI [19] defines the concept of graph mutual information (GMI) and aims to maximize the mutual information in terms of node features and topology of graphs. GCC [20] learns transferable structural representations across various networks to guide the pre-training of graph neural networks. GRACE [21] jointly considers both the topology and node attribute levels for corruption to generate graph views, and maximizes the agreement in the views at the node level. Zhu et al. [22] propose GCA which removes unimportant edges by giving them large removal probabilities on the topology level, and adds more noise to unimportant feature dimensions on the node attribute level for adaptive augmentation. BGRL [23] is a scalable method with two encoders that learns by predicting alternative augmentations of the input. Graph Barlow Twins (G-BT) [24] replaces negative samples with a cross-correlation-based loss function and does not introduce asymmetry in the network. However, those previous approaches for node-level graph contrastive learning address only the node classification problem, making them unsuitable for the graph classification problem.

Graph-level graph contrastive learning
Graph-level graph contrastive learning aims to obtain graph representations to solve the graph classification task. Previous graph-level contrastive learning methods are divided into two types: model-specific and model-agnostic ones. Model-agnostic approaches use augmentation algorithms which do not engage in the training process. GraphCL [10] brings the contrastive learning method for images to the graph domain. CuCo [13] extends GraphCL by applying curriculum learning to properly learn from the negative samples. MVGRL [9] learns graph-level representations by contrasting encodings from first-order neighbors and graph diffusion. These methods use random-based graph augmentations that cannot preserve the core information of graphs well. We propose a graph contrastive learning method along with degree-based augmentations to address the issue. Model-specific augmentation approaches directly participate in the training process. InfoGraph [8] learns graph representations by contrasting them with patch-level representations obtained from the training process. You et al. [11] propose JOAO which makes the simple augmentations learnable. AD-GCL [12] adopts the structure of an adversarial attack to obtain graph representations. AutoGCL [14] generates new graphs by changing the softmax function into the Gumbel-Softmax function. However, those approaches for graph-level graph contrastive learning are more complex than model-agnostic methods, significantly increasing the training time. Therefore, we propose a contrastive learning method with simple augmentations for computational efficiency.

Graph augmentation
Data augmentation has garnered significant attention recently, due to its successful application to many domains including image classification [25], natural language processing (NLP) [26], human activity recognition (HAR) [27, 28], and cognitive engagement classification [29]. Among them, graph augmentation methods are actively studied for improving the performance of graph contrastive learning.
Graph augmentation algorithms are divided into two types: model-specific and model-agnostic augmentation. Model-specific augmentation algorithms are restricted to a certain model. Thus, those augmentation methods are not easily applied directly to graph contrastive learning.
Model-agnostic graph augmentations can be applied to any graph neural network. You et al. [10] suggest DropNode and ChangeAttr for graph contrastive learning. DropNode discards randomly selected nodes with their connections, and ChangeAttr converts features of randomly selected nodes into random values. DropEdge [30] changes graph topology by removing a certain ratio of edges. GraphCrop [31] selects a subgraph from a graph through a random walk. Wang et al. [32] introduce NodeAug which contains three different augmentations: ReplaceAttr, RemoveEdge, and AddEdge. ReplaceAttr substitutes the feature of a chosen node with the average of its neighboring nodes' features. RemoveEdge discards edges based on the importance score of edges. AddEdge attaches new edges to a central node which is designated based on the importance score for nodes. Motif-similarity [33] adds and deletes edges from motifs that are frequent in a particular graph. Yoo et al. [34] propose NodeSam and SubMix. NodeSam performs split and merge operations on nodes. SubMix replaces a subgraph of a graph with another subgraph cut off from another graph. SFA [35] proposes a spectral feature augmentation for contrastive learning on graphs.
However, previous model-agnostic augmentation algorithms [10, 31-34] change randomly selected nodes or edges, easily overlooking the semantic information of the original graphs. Another limitation is that previous approaches change only node attributes [35] or graph structures [30, 33], restricting the diversity of augmented examples. On the other hand, TAG changes both node attributes and graph structures based on the degree centrality to preserve crucial information of graphs.

Preliminary on graph contrastive learning
In this section, we describe the preliminaries of our work. Contrastive learning aims to learn embeddings by capturing the relational information between instances. For each instance, positive and negative samples need to be defined to maximize the similarity between a given instance and a positive sample compared to negative samples. Graph contrastive learning operates on graph-structured data. Recent works utilize data augmentation to generate positive samples. Previous graph contrastive learning methods are divided into two categories: node-level and graph-level contrastive learning.
Node-level graph contrastive learning methods obtain node embeddings of a graph. Given a graph, previous approaches augment the given graph and contrast nodes of the given graph and the augmented graph. A pair of nodes from the two graphs at the same position is defined as a positive sample, and all other nodes except for positive samples are defined as negative samples. The model then learns the similarity of a positive pair against a negative pair. Graph-level graph contrastive learning methods learn graph embeddings by contrasting the graphs. Previous approaches set two augmented graphs with the same origin as positive samples and all other graphs in the training set except for the original graph as negative samples. Graph-level contrastive learning models then capture the similarity between a positive pair of graphs compared to a negative pair.
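To make these definitions concrete, the following sketch enumerates the index pairs used at each level. It is our illustration rather than code from the paper, and it assumes that a node and its counterpart at the same index of the augmented view correspond to each other.

```python
# Illustrative sketch (not from the paper): enumerate contrastive pairs, with
# nodes/graphs referred to by (view, index) tags.
def node_level_pairs(K, j):
    """Pairs for anchor node j in a graph with K nodes and one augmented view."""
    positive = (("orig", j), ("aug", j))  # same position in both views
    negatives = [(("orig", j), ("orig", k)) for k in range(K) if k != j] + \
                [(("orig", j), ("aug", k)) for k in range(K) if k != j]
    return positive, negatives

def graph_level_pairs(N, i):
    """Pairs for graph i in a training set of N graphs and two augmented views."""
    positive = (("view1", i), ("view2", i))  # same origin graph
    negatives = [(("view1", i), ("view2", k)) for k in range(N) if k != i]
    return positive, negatives
```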
Despite the decent performance of graph contrastive learning, there is still room for improvement. First, the relationship between node and graph embeddings has not been studied. Even though graph embeddings are obtained based on node embeddings, previous graph contrastive learning methods do not consider node embeddings. Second, most augmentation algorithms for contrastive learning randomly select nodes or edges to be modified. Since node features and graph topology are the most essential components of graph-structured data, it is important to augment graphs while preserving the crucial information within these pivotal components. However, previous methods rely on random-based augmentation algorithms which inevitably involve information loss. Finally, the influence of both positive and negative samples has not been studied. Previous methods focus on either positive or negative samples. To improve the performance of graph contrastive learning, defining both positive and negative samples well is important. In this work, we propose TAG which addresses these three issues.

Proposed method
We propose TAG, a two-staged contrastive curriculum learning framework for graphs. The main challenges and our approaches are as follows:
1. How can we generate graph representations in both unsupervised and supervised settings? We propose a two-staged graph contrastive curriculum learning method that is applied to both settings through two types of loss functions.

2. How can we design augmentations for contrastive learning to preserve the semantics well? We propose six data augmentation algorithms for graph contrastive learning. The augmentation algorithms consider degree centrality to minimize information loss.

3. How can we determine the order of feeding the negative examples in contrastive learning? We exploit curriculum learning to determine the order of negative samples and maximize the performance of the model.

Data augmentation
Our goal is to design data augmentation algorithms that minimize the information loss of graphs. Data augmentation is used to ensure the similarity between samples in contrastive learning. The most important challenge of augmentation is preserving the semantics, i.e., keeping the information crucial for determining graph labels. If the semantics are not preserved well in the process of augmentation, the original graph and the augmented graph would have different labels, resulting in increased dissimilarity. Therefore, we propose six model-agnostic graph augmentation algorithms based on degree centrality to minimize information loss. Our idea is to change low-degree nodes to minimize the loss of semantics. We categorize the six augmentation methods into two types: feature and structure modification. Feature modification algorithms generate new graphs by changing only the node features. On the other hand, structure modification algorithms change the graph structure. We propose three algorithms for each type. The three algorithms designed for feature augmentation are listed as follows:
1. Edit feature. Randomly change the features of the n nodes with the lowest degrees.
2. Mix feature. Mix the features of two selected nodes and then substitute the mixed features for the features of nodes with lower degrees. Repeat this process n times.
3. Add noise. Add noise to the features of selected nodes. The n nodes with the lowest degrees are selected to be modified.
The algorithms for structure augmentation are as follows:
1. Delete node. Discard the n nodes with the lowest degrees along with their connections.
2. Delete edge. Select m edges from nodes with the lowest degrees, and remove the selected edges.
3. Cut subgraph. Select a subgraph with high-degree nodes.
Here, n and m denote the numbers of nodes and edges to be modified, respectively. n and m are decided according to the augmentation ratio which is given as a hyperparameter. All algorithms consider degree centrality to keep semantic information; a minimal sketch of this shared selection rule follows.
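Below is a minimal sketch of the shared degree-based selection rule, instantiated for "edit feature" and "delete node". It reflects our reading of the description above; the function names and the networkx-based graph representation are our assumptions, not the official implementation (which is linked in the Introduction).

```python
import numpy as np
import networkx as nx

# Sketch (our reading, not the official code): all six augmentations share a
# degree-based selection rule that targets the lowest-degree nodes, so that
# high-degree, semantically central nodes are preserved. Assumes nodes are
# labeled 0..K-1 so they can index the feature matrix X.
def lowest_degree_nodes(G: nx.Graph, ratio: float):
    n = int(np.ceil(ratio * G.number_of_nodes()))  # n from the augmentation ratio
    return sorted(G.nodes, key=lambda v: G.degree[v])[:n]

def edit_feature(G, X, ratio, rng=None):
    """'Edit feature': randomize features of the n lowest-degree nodes."""
    rng = rng or np.random.default_rng()
    X = X.copy()
    for v in lowest_degree_nodes(G, ratio):
        X[v] = rng.random(X.shape[1])
    return X

def delete_node(G, ratio):
    """'Delete node': drop the n lowest-degree nodes with their connections."""
    H = G.copy()
    H.remove_nodes_from(lowest_degree_nodes(G, ratio))
    return H
```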
Algorithm 1 TAG (Two-staged Contrastive Curriculum Learning for Graphs)

Two-staged contrastive learning
We propose a graph contrastive learning model for accurate graph classification utilizing all the proposed augmentation algorithms. Graph contrastive learning is a self-supervised approach that allows a model to learn the representations of graphs without labels by teaching the model which graph instances are similar or different. We use the data augmentation algorithms proposed in the Data augmentation section to generate similar graphs. Considering the fact that graph embeddings are obtained based on node embeddings, learning representative embeddings from both nodes and graphs is important. We propose TAG which conducts graph contrastive learning in two stages: node-level and graph-level.
Algorithm 1 shows the overall training process of TAG. Given a training set D of graphs, we first augment the graphs in D before training, and then perform two-staged contrastive curriculum learning. Node-level contrastive curriculum learning captures the relational information between nodes in a graph G i and a feature-modified graph G f,i (line 8 in Algorithm 1). Graph-level contrastive learning extracts representative graph embeddings by maximizing the similarity between graphs G f,i and G s,i with the same origin (line 9 in Algorithm 1). A graph neural network is trained by minimizing the proposed two-staged contrastive loss (line 12 in Algorithm 1).
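The following sketch mirrors our reading of Algorithm 1. The helper callables stand for the proposed augmentations and the losses of Algorithms 2 and 3; their exact signatures are our assumptions, not the authors' API.

```python
# Sketch of the overall training loop of Algorithm 1 (our reading). The helper
# callables stand for the proposed augmentations and the losses of Algorithms 2
# and 3; their signatures are assumptions, not the authors' API.
def train_tag(D, f, optimizer, feature_augment, structure_augment,
              contrast_node, contrast_graph, epochs=5):
    G_f = [feature_augment(G) for G in D]    # feature-modified views
    G_s = [structure_augment(G) for G in D]  # structure-modified views
    for _ in range(epochs):
        loss = 0.0
        for i, G in enumerate(D):
            loss = loss + contrast_node(f, G, G_f[i])        # node level, Eq (1)
            loss = loss + contrast_graph(f, G_f[i], G_s, i)  # graph level, Eq (2)
        loss = loss / len(D)                                 # two-staged loss, Eq (3)
        optimizer.zero_grad(); loss.backward(); optimizer.step()
    return f
```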
In the following, we first explain the two-staged approach of TAG in detail.Then, we describe how to apply TAG for supervised graph classification and how to exploit curriculum learning for determining the order of negative samples.
Algorithm 2 (ContrastNode in TAG) takes a graph G i, a feature-augmented graph G f,i, and a graph neural network f with parameters θ as input, and outputs the node-level contrastive loss l n (i) for the graph G i. For each positive pair, it computes the similarity scores S of the negative nodes, sorts the negative nodes according to S in ascending order (line 12), and computes l n (i) by Eq 1 (line 14).

Node-level contrastive learning. The objective of the node-level contrastive learning in TAG is to learn meaningful node representations by embedding the nodes into a latent space where positive pairs of nodes are located more closely than negative ones. Positive pairs (v j, u j) of nodes are obtained by selecting a node v j from an original graph G i, and a node u j from a feature-augmented graph G f,i at the same position. We utilize all of the proposed augmentations by randomly selecting an augmentation algorithm for each graph from the proposed algorithms.
There are two types of negative node pairs: 1) pairs (v j, v k) of nodes both sampled from the original graph G i, and 2) pairs (v j, u k) of nodes sampled from G i and G f,i, respectively. All nodes in G i which are not selected for the positive pairs are used to generate the negative samples v k. Similarly, every node u k from G f,i except for the selected positive node u j is treated as a negative sample. The process of sampling positive and negative pairs of nodes for the node-level contrastive learning is illustrated in Fig 4.
The node-level contrastive loss l n is defined as follows:

$$\ell_n(i) = -\frac{1}{K}\sum_{j=1}^{K} \log \frac{\exp\!\left(\mathrm{sim}(\mathbf{v}_j, \mathbf{u}_j)/\tau\right)}{\exp\!\left(\mathrm{sim}(\mathbf{v}_j, \mathbf{u}_j)/\tau\right) + \sum_{k \neq j} \exp\!\left(\mathrm{sim}(\mathbf{v}_j, \mathbf{v}_k)/\tau\right) + \sum_{k \neq j} \exp\!\left(\mathrm{sim}(\mathbf{v}_j, \mathbf{u}_k)/\tau\right)} \qquad (1)$$

where sim(·) denotes the cosine similarity function, τ is the temperature parameter, and K is the number of nodes in a graph. Vectors v j and u j are the hidden representations of nodes v j and u j, respectively. Algorithm 2 shows the process of calculating the node-level contrastive loss. We exploit curriculum learning and compute the loss with reordered negative samples whose ordering is determined in line 12 of Algorithm 2. We feed negative samples from easy to hard ones, where the difficulty of a negative sample is defined as the cosine similarity of the sample and its paired positive sample.
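Under the definitions above, Eq (1) can be implemented as follows. This is our sketch rather than the authors' code, and it assumes the positive node pairs sit at the same row index of the two embedding matrices; the curriculum reordering is shown separately in the Curriculum learning section.

```python
import torch
import torch.nn.functional as F

# Sketch of Eq (1): v and u are (K, d) node embeddings of the original and
# feature-augmented graph; row j of v and row j of u form the positive pair,
# and all other rows of both matrices serve as negatives.
def node_level_loss(v: torch.Tensor, u: torch.Tensor, tau: float = 0.5):
    v, u = F.normalize(v, dim=1), F.normalize(u, dim=1)  # rows: unit vectors
    sim_vu = v @ u.t() / tau  # sim(v_j, u_k) / tau, shape (K, K)
    sim_vv = v @ v.t() / tau  # sim(v_j, v_k) / tau, shape (K, K)
    K = v.size(0)
    off_diag = ~torch.eye(K, dtype=torch.bool, device=v.device)  # mask k != j
    pos = torch.exp(sim_vu.diag())  # exp(sim(v_j, u_j) / tau)
    neg = (torch.exp(sim_vv) * off_diag).sum(1) + (torch.exp(sim_vu) * off_diag).sum(1)
    return -torch.log(pos / (pos + neg)).mean()
```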

Algorithm 3 ContrastGraph in TAG
Algorithm 3 (ContrastGraph in TAG) takes a feature-augmented graph G f,i, the structure-augmented graphs {G s,i′} for i′ = 1, …, N, and a graph neural network f with parameters θ as input, and outputs the graph-level contrastive loss l g (i) for a graph G i. It selects (G f,i, G s,i) as the positive pair of graphs, treats each G s,i′ with i′ ≠ i as a negative graph, sorts the negative graphs according to S in ascending order (line 9), and computes l g (i) by Eq 2 (line 10).

Graph-level contrastive learning. Graph-level contrastive learning in TAG aims to obtain representative graph embeddings. Graph embeddings are learned by aggregating all node embeddings within a graph with the average function. As with node-level contrastive learning, positive and negative samples in graph-level contrastive learning are defined using augmentation.
A positive pair (G f,i, G s,i) of graphs contains a feature-modified graph G f,i and a structure-modified graph G s,i of a graph G i. Feature modification and structure modification algorithms are randomly chosen from the proposed augmentation algorithms. Negative pairs are (G f,i, G s,i′) where i ≠ i′. The graph-level contrastive loss l g is written as below:

$$\ell_g(i) = -\log \frac{\exp\!\left(\mathrm{sim}(\mathbf{z}_{f,i}, \mathbf{z}_{s,i})/\tau\right)}{\sum_{i'=1,\, i' \neq i}^{N} \exp\!\left(\mathrm{sim}(\mathbf{z}_{f,i}, \mathbf{z}_{s,i'})/\tau\right)} \qquad (2)$$

where z ·,i is a representation of graph G ·,i and N is the number of graphs for training. Algorithm 3 describes the process of calculating the graph-level contrastive loss, where graph representations are obtained based on node representations in line 5. We reorder the negative samples in line 9 of Algorithm 3 to maximize the performance of TAG by exploiting curriculum learning. TAG trains the samples gradually from easy to hard ones, where a negative pair of graphs with low similarity is regarded as an easy sample.
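Eq (2) admits a similarly compact sketch (again our re-implementation, not the authors' code), assuming the i-th rows of the two view matrices share the same origin graph:

```python
import torch
import torch.nn.functional as F

# Sketch of Eq (2): z_f and z_s are (N, d) graph embeddings of the feature- and
# structure-augmented views; each is the mean of its graph's node embeddings.
# (z_f[i], z_s[i]) is the positive pair; (z_f[i], z_s[i']) with i' != i are negatives.
def graph_level_loss(z_f: torch.Tensor, z_s: torch.Tensor, tau: float = 0.5):
    z_f, z_s = F.normalize(z_f, dim=1), F.normalize(z_s, dim=1)  # for cosine sim
    sim = torch.exp(z_f @ z_s.t() / tau)  # (N, N); diagonal holds positive pairs
    pos = sim.diag()
    neg = sim.sum(dim=1) - pos            # sum over all i' != i
    return -torch.log(pos / neg).mean()
```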
The final loss function L for TAG jointly uses the node-level and graph-level contrastive losses. Given a set D of graphs for training,

$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left( \ell_n(i) + \ell_g(i) \right) \qquad (3)$$

where l n (i) and l g (i) are node- and graph-level losses for a graph G i, respectively.

Supervised contrastive learning. To further improve the performance of TAG, we design the proposed method to operate in the supervised setting as well. In supervised graph classification, the labels of graphs are available while training. To exploit the information of the given labels, we use the typical cross-entropy loss l ce (·). Specifically, the loss l ce (y i, ŷ i) between the one-hot encoded label y i and the prediction probability ŷ i of a graph G i is computed as follows:

$$\ell_{ce}(\mathbf{y}_i, \hat{\mathbf{y}}_i) = -\sum_{c=1}^{C} \mathbf{y}_i(c) \log \hat{\mathbf{y}}_i(c)$$

where C is the number of classes, y i (c) is the c-th element of y i, and ŷ i (c) is the prediction probability of a graph G i for class c. Node and graph representation vectors in Eqs 1 and 2 are learned using a graph neural network. For supervised graph classification, we attach a fully-connected layer to the final layer of the graph neural network to construct TAG as an end-to-end model. The probability vector ŷ i is obtained through the softmax function after a fully-connected layer.
To fully exploit both the result of the two-staged contrastive learning and the information of the given labels while training, we minimize the supervised loss l ce (y i, ŷ i) and the two-staged contrastive loss L simultaneously. Thus, the loss L sup for supervised learning is computed by adding the cross-entropy loss to the loss in Eq (3):

$$\mathcal{L}_{sup} = \frac{1}{N} \sum_{i=1}^{N} \left( \ell_n(i) + \ell_g(i) + \ell_{ce}(\mathbf{y}_i, \hat{\mathbf{y}}_i) \right)$$

where l n (i) and l g (i) are node- and graph-level losses for a graph G i, respectively, and N denotes the size of the set D.
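As a sketch (our re-implementation under the definitions above), the supervised objective adds a per-graph cross-entropy term to the two contrastive losses before averaging:

```python
import torch.nn.functional as F

# Sketch of L_sup (our re-implementation): per-graph cross-entropy added to the
# node- and graph-level contrastive losses, averaged over the training set.
def supervised_loss(l_n, l_g, logits, labels):
    # l_n, l_g: (N,) per-graph contrastive losses; logits: (N, C); labels: (N,)
    l_ce = F.cross_entropy(logits, labels, reduction="none")  # (N,)
    return (l_n + l_g + l_ce).mean()
```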

Curriculum learning
Curriculum learning imitates the learning process of humans, who start learning from easier samples and then learn more from harder samples. To further improve the performance of TAG, we reorder the samples for training by exploiting the curriculum learning strategy. A naive approach would define negative samples that are misclassified with high probability as hard samples. However, this is not directly applicable to contrastive learning methods including TAG since the labels may not be given.
To determine the difficulty of samples regarding the two-staged contrastive loss, we utilize the similarity between positive and negative samples. A sample with a large loss is hard to learn because its loss is difficult to minimize. However, it is hard to use the loss itself as a difficulty measure since reordering should be done before the loss calculation. Thus, we define the cosine similarity of a negative pair, which affects the size of the loss, as the difficulty score. If a negative sample is similar to a positive sample, the model struggles to find the difference between the samples, causing a large loss. We feed negative samples with lower similarity first, and then move on to harder negative samples as training continues to facilitate effective training. Both node-level and graph-level contrastive learning train negative samples gradually from easy to hard ones.
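The ordering rule can be sketched as follows; this is our illustration, and the function name is hypothetical:

```python
import torch
import torch.nn.functional as F

# Sketch (hypothetical helper): order negative embeddings from easy to hard,
# where difficulty is the cosine similarity to the paired positive sample.
def order_negatives(positive: torch.Tensor, negatives: torch.Tensor) -> torch.Tensor:
    # positive: (d,) embedding; negatives: (M, d) embeddings
    difficulty = F.cosine_similarity(positive.unsqueeze(0), negatives, dim=1)  # (M,)
    return negatives[torch.argsort(difficulty)]  # ascending similarity: easy first
```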

Experiments
We perform experiments to answer the following questions:
Q1. Performance on Unsupervised Classification. How fast and accurate is TAG compared to previous methods for unsupervised graph classification?
Q2. Performance on Supervised Classification. Does TAG show superior performance over other baselines in the supervised graph classification task?
Q3. Effectiveness of Proposed Augmentations. Do the proposed augmentation algorithms improve the performance of TAG?
Q4. Ablation Study. Does each step of TAG contribute to the performance of the unsupervised graph classification task?

Experimental settings
We introduce our experimental settings including datasets, competitors, and hyperparameters. All of our experiments are conducted on a single GPU machine with a GeForce GTX 1080 Ti.
Datasets. We use seven benchmark datasets for the graph classification task in our experiments, which are summarized in Table 2. MUTAG, PROTEINS, NCI1, NCI109, DD, and PTC-MR [36] are molecular datasets where the nodes stand for atoms and are labeled by the atom type, while edges are bonds between the atoms. DBLP [37] is a citation network dataset in the computer science field whose nodes represent scientific publications.
Competitors. We compare TAG in supervised and unsupervised settings. For the unsupervised setting, we compare TAG with ten previous approaches for unsupervised graph classification, including those for contrastive learning.
• DGK [38] learns latent representations of graphs by adopting the concept of the skip-gram model.
• sub2vec [39] is an unsupervised learning algorithm that captures two properties of subgraphs: neighborhood and structure.
• graph2vec [40] extends neural networks for document embedding to the graph domain, by viewing the graphs as documents.
• InfoGraph [8] generates graph representations by maximizing mutual information between graph-level and patch-level representations.
• MVGRL [9] learns graph representations by contrasting two diffusion matrices transformed from the adjacency matrix.
• GraphCL [10] brings the contrastive learning framework for images to the graph domain with random graph augmentations.
• JOAO [11] jointly optimizes augmentation selection together with the contrastive objectives.
• AD-GCL [12] uses an adversarial training strategy for edge-dropping augmentation of graphs.
• CuCo [13] adopts curriculum learning to graph contrastive learning for performance improvement.
• AutoGCL [14] uses node representations to predict the probability of selecting a certain augment operation.
We use support vector machine (SVM) and multi-layer perceptron (MLP) as base classifiers to evaluate the competitors and TAG in the unsupervised setting. We select an SVM classifier among various machine learning classifiers for a fair comparison since the competitors use SVM to evaluate their methods. To evaluate methods in deep learning as well as in machine learning, we exploit an MLP classifier.
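A typical evaluation script for this protocol might look as follows. It is a sketch of the setup we assume (frozen graph embeddings scored by downstream classifiers with 10-fold cross-validation), not the authors' exact pipeline:

```python
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Sketch: score frozen graph embeddings Z (N x d) with labels y (N,)
# using the two downstream classifiers and 10-fold cross-validation.
def evaluate_embeddings(Z, y):
    svm_acc = cross_val_score(SVC(), Z, y, cv=10).mean()
    mlp_acc = cross_val_score(MLPClassifier(max_iter=500), Z, y, cv=10).mean()
    return svm_acc, mlp_acc
```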
In the supervised setting, we compare the accuracy of TAG with four baselines:
• GCN+GMP [41] uses the graph convolutional network (GCN) to learn the node representations, and the global mean pooling (GMP) is applied to obtain the graph representation.
• GIN [5] uses multi-layer perceptrons (MLP) to update node representations, and sums them up to generate the graph representation.
• ASAP [6] alternately clusters nodes in a graph and gathers the representations of clusters to obtain graph representations.
• GMT [7] designs a graph pooling layer based on multi-head attention.
We run 10-fold cross-validation to evaluate the competitors and TAG.
Hyperparameters. We use GCN [41] to learn node embeddings and apply the global mean pooling algorithm to generate a graph embedding. We set the augmentation ratio, which decides the amount of data to be changed, to 0.4. The ratio is the only hyperparameter for the data augmentation of TAG. Thus, TAG does not suffer from hyperparameter optimization problems. We train each model using the Adam optimizer with a learning rate of 0.0001. We set the number of epochs to 5.
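For concreteness, a minimal encoder matching this description might be set up as follows; the number of layers and the hidden dimensionality are our illustrative assumptions:

```python
import torch
from torch_geometric.nn import GCNConv, global_mean_pool

class Encoder(torch.nn.Module):
    """GCN encoder with global mean pooling (layer sizes are illustrative)."""
    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)

    def forward(self, x, edge_index, batch):
        h = self.conv1(x, edge_index).relu()
        h = self.conv2(h, edge_index)           # node embeddings
        return h, global_mean_pool(h, batch)    # node and graph embeddings

# Training configuration from the text: Adam with lr 0.0001, 5 epochs.
# model = Encoder(in_dim=dataset.num_features)
# optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
```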

Performance on unsupervised classification
We evaluate the unsupervised graph classification accuracy and running time of TAG. The graph classification accuracy of TAG and previous unsupervised methods is described in Table 3. We adopt support vector machine (SVM) and multi-layer perceptron (MLP) as base classifiers for TAG and the baselines. Note that TAG achieves the best accuracy, giving 4.08% points and 2.14% points higher accuracy than the second-best competitors on average with SVM and MLP classifiers, respectively.
The overall performance of TAG in the unsupervised setting with the two classifiers, including the running time, is summarized in Figs 6 and 7. Fig 6 shows the results of TAG and previous approaches with an SVM classifier. Note that TAG shows the highest classification accuracy in most cases with the shortest running time. This shows that TAG effectively and efficiently finds the graph representations for unsupervised graph classification from large graphs. Fig 7 shows the accuracy and running time of TAG and the competitors measured with an MLP classifier. TAG outperforms the competitors for most datasets.

Performance on supervised classification
TAG also operates on the supervised graph classification task in addition to the unsupervised one. We compare TAG with four baselines for supervised graph classification in Table 4. We use classification accuracy and running time as the evaluation metrics. Note that TAG gives the highest accuracy, with 4.76% points higher average accuracy than the second-best method. Specifically, TAG in the supervised setting achieves 4.50% points and 13.26% points higher average accuracy than in the unsupervised setting with SVM and MLP classifiers, respectively.
Fig 8 shows the classification accuracy and the running time of TAG and the baselines in the supervised setting. Note that TAG gives the shortest running time with the highest accuracy in most of the cases. This shows that TAG efficiently learns meaningful graph representations not only for unsupervised graph classification, but also for the supervised one.

Effectiveness of proposed augmentations
We compare the proposed augmentations of TAG with eight previous model-agnostic augmentation algorithms for graphs. ChangeAttr modifies features, and the other methods change the structure of graphs. Recall that TAG performs graph contrastive learning at two levels: node-level and graph-level. For the node level, TAG needs feature-augmented graphs. For the graph level, TAG needs both feature and structure augmentations. Thus, both types of augmentation algorithms are necessary for TAG. MVGRL [9], GraphCL [10], and CuCo [13] are previous methods that adopt model-agnostic graph augmentations. However, MVGRL causes out-of-memory errors for large-scale graph datasets. CuCo is more elaborate than GraphCL since it additionally performs curriculum learning. Therefore, we compare TAG with previous augmentation algorithms by applying them to CuCo. Table 5 shows the classification results using different augmentations. The accuracy is measured with an SVM classifier. TAG outperforms the baselines in most cases. Specifically, TAG achieves 5.05% points higher average accuracy than the strongest baseline SubMix. Note that the random-based augmentations DropNode, DropEdge, GraphCrop, and ChangeAttr degrade the performance of CuCo for all datasets. This proves that random-based augmentation methods have difficulty preserving the semantics. In contrast, TAG with the proposed augmentations helps enhance the performance.
We also show the effectiveness of the degree-based node and edge selection of TAG for graph augmentation. We compare TAG with two different selection methods: TAG-random and TAG-reverse. TAG-random randomly selects the nodes or edges to be changed. TAG-reverse selects the nodes or edges from high to low degrees. Table 6 reports the classification accuracy of TAG and the baselines. We use SVM and MLP classifiers to measure the accuracy. Note that TAG outperforms the baselines in all datasets. Specifically, TAG achieves up to 4.36% points and 4.19% points higher average accuracy than the second-best baselines with SVM and MLP classifiers, respectively. This shows that the proposed augmentations of TAG, which consider the degree centrality, effectively improve the graph classification accuracy.

Ablation study
We perform an ablation study for TAG and report the results in Table 7. The method w/o curriculum is TAG without curriculum learning, and w/o node-level is TAG without the two-staged structure, performing only graph-level contrastive learning. We also run TAG while fixing the proposed augmentations. Since TAG needs both feature and structure augmentation algorithms to conduct the two-staged contrastive learning, we evaluate the performance of pairs of algorithms. For example, 'Edit feature + Delete node' runs TAG using the 'edit feature' and 'delete node' algorithms for feature and structure modification, respectively.
TAG with curriculum learning improves the classification performance with SVM and MLP by 6.20% and 3.35% points on average, respectively, compared to that without curriculum learning. Using both node-level and graph-level contrastive learning in TAG achieves 6.55% and 4.16% points higher average accuracy than using only graph-level contrastive learning, with SVM and MLP classifiers, respectively. Experimental results of fixing the proposed augmentations show higher accuracies than the methods w/o curriculum and w/o node-level. The results prove that the proposed augmentation algorithms preserve the semantics well since the accuracies of the fixed augmentation methods are comparable to TAG. Furthermore, TAG achieves the best performance when it utilizes all the proposed augmentation algorithms. The results show that the proposed ideas, i.e., the two-staged framework, exploitation of curriculum learning, and the proposed augmentation algorithms for contrastive learning, improve the accuracy of graph classification.

Conclusion
We propose TAG, a two-staged contrastive curriculum learning model for graphs. We introduce two types of data augmentations for graphs and propose six model-agnostic augmentation algorithms that minimize information loss. TAG conducts contrastive curriculum learning in two stages. In the first stage, TAG gathers the relational information between nodes from an original graph and a feature-modified graph. In the second stage, the proposed method utilizes both feature-modified and structure-modified graphs to learn the similarity between them. We exploit curriculum learning to effectively train the model via a carefully selected ordering of negative samples. We evaluate TAG by measuring the graph classification accuracy and running time. TAG shows the fastest running time and the best accuracy, achieving up to 4.08% points and 4.76% points higher average accuracy than the second-best competitors in unsupervised and supervised settings, respectively. Future works include designing an accurate graph classification method for hypergraphs.

Fig 1.
Fig 1. Overall performance of TAG in unsupervised and supervised graph classification. (a-d) show the performance in the unsupervised setting, and (e-h) show that in the supervised one. Note that TAG shows the highest classification accuracy with the shortest running time in both settings. https://doi.org/10.1371/journal.pone.0296171.g001

Figs 2 and 3.
Fig 2 explains how the proposed method learns a training set. Fig 3 illustrates the details of performing augmentation and contrastive learning. Given a graph dataset, we first augment graphs, and then perform contrastive curriculum learning at two levels: nodes and graphs.

Fig 2.
Fig 2. Overview of the proposed method. TAG first augments all graphs in a training set D, and then performs node-level and graph-level contrastive curriculum learning. For contrastive learning, TAG defines positive and negative samples, and computes the similarity between them. The proposed method learns negative samples from easy to hard ones, which is determined based on the similarity. https://doi.org/10.1371/journal.pone.0296171.g002
Fig 5 explains positive and negative samples designed for graph-level contrastive learning.

Fig 4.
Fig 4. Positive and negative samples of the node-level contrastive learning. Nodes v j and v k are selected from the original graph G i, while nodes u j and u k are sampled from a feature-augmented graph G f,i at the same positions. https://doi.org/10.1371/journal.pone.0296171.g004

Fig 5.
Fig 5. Illustration of positive and negative samples for graph-level contrastive learning. (G f,i, G s,i) is a positive pair originated from a graph G i, and (G f,i, G s,i′) for i ≠ i′ are negative pairs. https://doi.org/10.1371/journal.pone.0296171.g005


Fig 6.
Fig 6. Overall performance of TAG and previous unsupervised graph classification methods with an SVM classifier. Note that TAG shows the highest classification accuracy with the shortest running time in most cases. https://doi.org/10.1371/journal.pone.0296171.g006

Fig 7.
Fig 7. Overall performance of TAG and previous unsupervised graph classification methods with an MLP classifier. (a-g) show the accuracy and running time on each dataset. TAG outperforms the competitors in most cases. https://doi.org/10.1371/journal.pone.0296171.g007

Fig 8.
Fig 8. Overall performance of TAG with supervised graph classification methods. (a-g) show the performance on each dataset. Note that TAG shows the highest classification accuracy with the shortest running time for most datasets. https://doi.org/10.1371/journal.pone.0296171.g008

Table 1. Description of symbols.
G s,i′: structure-modified graph originated from G i′ for i ≠ i′
v j: j-th node in a graph G i
u j: j-th node in a graph G f,i
https://doi.org/10.1371/journal.pone.0296171.t001

Table 3. Accuracy of graph classification in the unsupervised setting.
Bold and underlined text denote the best and the second-best accuracy, respectively. OOM and Avg. denote the out-of-memory error and the average accuracy, respectively. Note that TAG shows the best classification accuracy.

Table 4. Accuracy of graph classification in the supervised setting.
Bold and underlined text denote the best and the second-best accuracy, respectively. Avg. denotes the average accuracy. Note that TAG shows the best accuracy. https://doi.org/10.1371/journal.pone.0296171.t004

Table 5. Comparison of the augmentation methods.
We report the best and the second-best accuracy with bold and underlined text, respectively. Avg. denotes the average accuracy. Note that TAG presents the best accuracy among the models.
https://doi.org/10.1371/journal.pone.0296171.t005