
Diffusion characteristics classification framework for identification of diffusion source in complex networks

  • Fan Yang,

    Roles Conceptualization, Methodology, Software, Validation, Writing – original draft

    Affiliations Key Laboratory of Intelligent Information Processing and Graph Processing, Guangxi University of Science and Technology, Liuzhou, Guangxi, China, School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, Guangxi, China

  • Jingxian Liu,

    Roles Methodology, Software, Supervision

    Affiliations Key Laboratory of Intelligent Information Processing and Graph Processing, Guangxi University of Science and Technology, Liuzhou, Guangxi, China, School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, Guangxi, China

  • Ruisheng Zhang,

    Roles Methodology, Supervision, Writing – original draft

    Affiliation School of Information Science and Engineering, Lanzhou University, Lanzhou, Gansu, China

  • Yabing Yao

    Roles Conceptualization, Supervision, Validation, Writing – original draft

    yaoyabing@lut.edu.cn

    Affiliation School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu, China

Abstract

The diffusion phenomena taking place in complex networks, such as the spread of diseases, rumors and viruses, are usually modelled as diffusion processes. Identifying the diffusion source is crucial for developing strategies to control these harmful processes, yet accurate source identification remains an open challenge. In this paper, we define a set of diffusion characteristics composed of the diffusion direction and time information recorded by observers, and propose a neural networks based diffusion characteristics classification framework (NN-DCCF) to identify the source. NN-DCCF contains three stages. First, the diffusion characteristics are utilized to construct a network snapshot feature. Then, a graph LSTM auto-encoder is proposed to convert the network snapshot feature into low-dimensional representation vectors. Finally, a source classification neural network is proposed to identify the diffusion source by classifying the representation vectors. With NN-DCCF, source identification is converted into a classification problem. Experiments on a series of synthetic and real networks show that NN-DCCF is feasible and effective in accurately identifying the diffusion source.

Introduction

Most complex systems in the real world take the form of networks [1], in which nodes and edges denote units and the interactions between them, respectively. Various diffusion phenomena taking place in networks are usually modelled as diffusion processes [1], such as disease spreading [2], rumor diffusion [3] and computer virus propagation [4]. The ubiquity of these harmful diffusion processes has inflicted huge losses on human society. Therefore, it is of great theoretical and practical significance to develop effective strategies to control harmful diffusion processes. One important measure is identifying the diffusion source that initiates the diffusion process on a network, which has attracted widespread attention in recent years [5]. Many existing source identification methods have provided effective solutions to important real-world problems, such as identifying the source of SARS [6], COVID-19 [7] and Cholera [8], and finding the source of foodborne disease [9]. However, accurately identifying the diffusion source is still an open challenge.

The success of artificial neural networks has boosted research in many scientific fields [10–12]. In particular, the emergence of graph neural networks (GNNs) [13, 14] and network embedding [15, 16] has facilitated the application of artificial neural networks to the irregular structure of networks. GNNs are neural network models that address different graph tasks in an end-to-end way [13]. The most common GNNs include recurrent graph neural networks [17], convolutional graph neural networks [18], graph auto-encoders [13], etc. Network embedding comprises various methods designed for the same task, i.e., network representation learning [13]. Recently, GNNs and network embedding have been successfully applied to important problems in complex networks [13, 16], such as link prediction and node classification. However, only a few artificial neural network based methods have addressed the diffusion source identification problem [19, 20]. Li et al. [19] proposed a label propagation framework to locate the diffusion source. Owing to the common characteristics shared by the label propagation framework and graph convolutional networks (GCNs), source identification is converted into a multi-classification problem. Dong et al. [20] detected multiple sources by utilizing wavefront information. Since existing GNNs are not a suitable solution for the wavefront based method, they developed a novel multi-task learning model based on an encoder-decoder structure. Different from the methods in [19] and [20], this paper utilizes the diffusion time and direction information recorded by limited observers to identify the diffusion source. These two types of information have been proved to be helpful in accurately identifying the source [21–26]. We define them as diffusion characteristics, and identify the diffusion source by classifying the diffusion characteristics.
Although existing GNNs and network embedding are powerful models for processing graph data, neither is well suited to processing the diffusion characteristics, which are dynamically generated during a diffusion process. Therefore, we develop a novel neural networks based diffusion characteristics classification framework, which contains the following three stages: (i) the diffusion characteristics are utilized to construct a network snapshot feature; (ii) a graph LSTM auto-encoder is proposed, by which the network snapshot feature is represented as low-dimensional vectors; (iii) a source classification neural network is proposed to identify the diffusion source by classifying the representation vectors of the network snapshot feature. With the proposed framework, source identification is converted into a classification problem. Further, the feasibility and effectiveness of the framework are validated by the experimental results.

The rest of this paper is organized as follows. Existing related works are briefly reviewed in Section Related work. The neural networks based diffusion characteristics classification framework is proposed in Section Materials and methods. The experimental results are discussed in Section Results. We conclude this work in Section Conclusion.

Related work

The early diffusion source identification methods were developed for unweighted networks. A systematic method was pioneered by Shah et al. [27], who constructed a source estimator based on a topological quantity termed Rumor Centrality (RC). RC has been extended to identify the diffusion source in more complex environments [28–30]. Zhu et al. [31] proposed a sample path based method termed Jordan Center (JC), which has been improved to identify the diffusion source with limited observations [32–34]. Meanwhile, many methods based on various ideas were proposed for unweighted networks, including the Dynamic Message Passing based method [35], the Belief Propagation based method [36], the Monte Carlo based method [37], the Rationality Observation based method [38], the Label Ranking framework based method [39], the Time Aggregated Graph based method [40], etc. The above methods are effective in unweighted networks. In reality, however, we have to consider the significant weights associated with the edges of networks, such as traffic, time delay and so on.

For weighted networks, Brockmann et al. [6] modeled the Global Mobility Network as a weighted graph, and identified the epidemic source based on a novel effective distance. This method has been extended to multi-source identification by Jiang et al. [41]. Meanwhile, several methods based on various ideas were proposed to identify the diffusion source in weighted networks [42–44]. However, these methods require knowledge of the states of all nodes. In reality, it is often the case that only a limited number of node states can be observed [45]. For this problem, many methods were proposed that utilize limited observers, including the Time-Reversal Backward Spreading algorithm [24], the Backward Diffusion-based method [46], the improved Gaussian estimator [47], the Gromov matrix based method [25], the Greedy Optimization based algorithm [26], the Sequential Neighbour Filtering algorithm [48], the Estimated Propagation Delay based algorithms [49], etc. These methods [24–26, 46–49] mainly utilized the diffusion time information of observers to identify the source. Pinto et al. [50] proposed a Gaussian estimator, the first method to identify the source by utilizing the diffusion direction information of observers; however, that direction information is only exploited on tree graphs. Yang et al. [21] improved the accuracy of the Gaussian estimator on general graphs by utilizing the diffusion direction information of observers. Zhu et al. [22, 23] also proposed a path-based source identification method utilizing the diffusion direction information of observers. Clearly, the diffusion time and direction information of observers play important roles in accurately identifying the diffusion source.

Different from the traditional source identification methods mentioned above, in recent years a few artificial neural network based methods have been developed to identify the source. Li et al. [19] proposed a Source Identification Graph Convolutional Network (SIGN) framework; this method requires knowledge of the complete observation. Dong et al. [20] proposed a graph constraint based sequential source identification model. To obtain the wavefront information, this method [20] also requires knowledge of the complete observation. However, in reality, it is often the case that only a limited number of node states can be observed [45]. In this paper, we identify the diffusion source by utilizing limited observers. We define the diffusion time and direction information of observers as diffusion characteristics, and propose an artificial neural networks based framework to identify the source by classifying the diffusion characteristics. The feasibility and effectiveness of the proposed framework are validated on a series of synthetic and real networks.

Materials and methods

Problem description and overview

The network in which the diffusion process takes place is modelled as a finite and undirected graph G = (V, E), where V and E represent the node set and edge set, respectively. θ = {θvu}, where θvu is the random propagation delay associated with an edge vu, vu ∈ E. Generally, the topology of G is assumed to be known. We consider the {θvu} associated with E to be independent and identically distributed (i.i.d.) random variables.

Diffusion model.

Assume that the diffusion process taking place in the network follows a simple diffusion model similar to that of reference [50]. At time t, each node v ∈ V is in exactly one of two states: (i) informed, if it has received the information from any neighbour, or (ii) ignorant, if it has not been informed so far. Any node v is equally likely to be the source. The diffusion process is initiated by a single source s* at an unknown start time; all nodes are ignorant except s*, which is informed. Let N(v) denote the neighbours of v. Suppose an ignorant node v receives the information for the first time from an informed neighbour w, thus becoming informed at time tv. Then, v will attempt to retransmit the information to all its other neighbours along the edges, so that each neighbour u receives the information with success probability β at time tv + θvu. If two or more informed neighbours have the same propagation delay to u, u is informed by only one of them. Once the diffusion process terminates, a network snapshot is generated.
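As an illustration, the diffusion model described above can be sketched as an event-driven simulation in which each transmission attempt succeeds with probability β and arrives after the edge's propagation delay, and a node keeps only the earliest arrival. This is a minimal sketch; all function and variable names are illustrative choices, not part of the original model specification.

```python
import heapq
import random

def simulate_diffusion(adj, delays, source, beta=1.0, seed=None):
    """Simulate the diffusion model: each newly informed node attempts to pass
    the information to its other neighbours, each attempt succeeding with
    probability beta and arriving after the edge's propagation delay.

    adj    -- dict mapping node -> list of neighbour nodes
    delays -- dict mapping (v, u) -> propagation delay theta_vu
    Returns (informed_time, informed_from): per-node arrival time and the
    neighbour the information arrived from (the diffusion direction info).
    """
    rng = random.Random(seed)
    informed_time = {source: 0.0}
    informed_from = {source: None}
    heap = [(0.0, source, None)]            # (arrival time, node, sender)
    while heap:
        t, v, w = heapq.heappop(heap)
        if t > informed_time.get(v, float("inf")):
            continue                         # stale entry; v was informed earlier
        for u in adj[v]:
            if u == w:
                continue                     # do not retransmit to the sender
            if rng.random() <= beta:         # transmission attempt succeeds
                tu = t + delays[(v, u)]
                if tu < informed_time.get(u, float("inf")):
                    informed_time[u] = tu    # u is informed by exactly one neighbour
                    informed_from[u] = v
                    heapq.heappush(heap, (tu, u, v))
    return informed_time, informed_from
```

The returned `informed_time` and `informed_from` correspond to the time and direction information an observer could record in a network snapshot.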

For an arbitrary network, a network snapshot is generated by the diffusion model introduced above. Generally, only the states of a subset of nodes can be observed; we call these nodes observers. The observations made by the observers provide two types of information [21, 50]: (i) the direction from which the information arrives at each observer and (ii) the time at which the information arrives at each observer. These two types of information reveal the true details of the diffusion process and have been proved to be helpful in accurately identifying the diffusion source [21–26]. In this paper, they are defined as diffusion characteristics. The goal is to find the diffusion source s* by utilizing the diffusion characteristics recorded by the observers. We propose a neural networks based diffusion characteristics classification framework (NN-DCCF) to identify the diffusion source, by which source identification is converted into a classification problem. NN-DCCF is composed of the following three stages.

  1. By selecting vital nodes in a given network and extending their neighbours, we build the observation areas. Then, for a network snapshot, we construct the network snapshot feature by utilizing the recorded diffusion characteristics.
  2. We propose a graph LSTM auto-encoder (GLSTM-AE), by which the network snapshot feature is represented as low-dimensional vectors.
  3. We propose a source classification neural network (SCNN) to estimate the diffusion source by classifying the representation vectors.

The overview of NN-DCCF is shown in Fig 1. Frequently used notations are summarized in Table 1.

Fig 1. Overview of NN-DCCF.

(a) The vital nodes selected by degree centrality [51] include node 8 and node 44. Observation areas set . . consists of node 8 and its neighbours within 1 hop distance. . consists of node 44 and its neighbours within 1 hop distance. . (b) . is composed of the sequence features constructed with the diffusion characteristics recorded in the observers of . is composed of the sequence features constructed with the diffusion characteristics recorded in the observers of . (c) By using GLSTM-AE, is converted into low-dimension representation vectors, i.e. . (d) With as the input of SCNN, we can estimate the diffusion source.

https://doi.org/10.1371/journal.pone.0285563.g001

Stage 1: Constructing network snapshot feature

To utilize the diffusion characteristics of observers to construct the network snapshot feature, the observer set is built with the following strategy. Given a network, we first rank the importance of nodes with a vital node identification method [51]. Next, with the ranking results, we select the K most important nodes as vital nodes. Then, for each vital node, we extend its neighbours within h hops. Further, each vital node and its extended neighbours are combined to form an observation area, giving K observation areas in total, where o denotes a unique observer. When the diffusion process occurs and generates a snapshot, we construct the network snapshot feature by utilizing the diffusion characteristics, i.e., the diffusion direction and time information, recorded in each observation area. The procedure is summarised in Algorithm 1.

Algorithm 1 Network snapshot feature constructing algorithm

Input: , and

Output:

1: initialize an empty

2: sort all in according to the average informed time of

3: for each in do

4:  for each do

5:   initialize an empty seq to record a single sequence feature

6:   set current node c = o

7:   while do

8:    if c is in the informed state then

9:     add c into seq

10:     get next node n according to the diffusion direction information recorded in c and set c = n

11:    end if

12:   end while

13:   reverse the nodes order in seq

14:   if 1 < |seq| ≤ lmax then

15:    add seq into

16:   else if |seq| > lmax then

17:    remove the last |seq| − lmax nodes from seq

18:    add seq into

19:   end if

20:  end for

21: end for

22: for each in do

23:  remove duplicated sequence features from

24:  sort all sequence features in according to their length

25:  if then

26:   remove the last sequence features from

27:  end if

28: end for

In Algorithm 1, the inputs are the network topology, the observation areas set and the network snapshot. The average informed time in step 2 is the average of the diffusion time information recorded in an observation area. Steps 4–20 construct the sequence features by traversing each observation area. Here, steps 7–12 generate a single sequence feature, denoted by seq. A single seq is a basic unit of the network snapshot feature. Obviously, generating a single seq depends on the diffusion direction information of observers. Step 13 reverses the order of the current seq. Steps 14–19 add the seq into the feature, where 2 ≤ |seq| ≤ lmax. Further, from step 3 to step 21, the sequence features of each observation area are constructed. Steps 22–28 remove redundant sequence features and limit the size of the feature. A schematic of obtaining the network snapshot feature with Algorithm 1 is shown in Fig 1(a) and 1(b).
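The core of steps 6–13, backtracking from an observer along the recorded diffusion directions and then reversing the result, can be sketched as follows. The `informed_from` mapping (the neighbour each node was informed by) is a hypothetical encoding of the diffusion direction information; the paper itself does not prescribe a data structure.

```python
def build_sequence_feature(observer, informed_from, informed, l_max):
    """Trace back from an observer through informed nodes along the recorded
    diffusion directions, then reverse so the sequence runs in diffusion order
    (a sketch of steps 6-19 of Algorithm 1).

    informed_from -- dict: informed node -> the neighbour it was informed by
                     (None for the source); the diffusion direction information
    informed      -- set of informed nodes in the snapshot
    """
    seq = []
    c = observer
    while c is not None and c in informed:
        seq.append(c)
        c = informed_from.get(c)     # follow the diffusion direction backwards
    seq.reverse()                     # diffusion order: source side first
    if len(seq) > l_max:
        seq = seq[:l_max]            # step 17: drop the last |seq| - lmax nodes
    return seq if len(seq) > 1 else None   # sequences with |seq| <= 1 are discarded
```

For a chain diffusion 1 → 2 → 3 observed at node 3, this yields the sequence [1, 2, 3], truncated to [1, 2] when lmax = 2.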

Stage 2: GLSTM-AE based network snapshot feature representation

From Algorithm 1, we know that each sequence feature seq consists of several ordered informed nodes; seq is therefore a type of sequential data. Inspired by the fact that long short-term memory networks (S1 File) are a powerful tool for modelling sequential data [52–54], we use LSTM networks to learn the representation of seq. However, seq differs from traditional sequential data since it is composed of ordered informed nodes. We therefore propose a graph LSTM auto-encoder (GLSTM-AE) to learn the low-dimensional representation of seq. A GLSTM-AE consists of two LSTMs, the encoder LSTM and the decoder LSTM, as shown in Fig 2. GLSTM-AE works as follows. For an arbitrary seq, each node in seq is represented as a one-hot vector of dimension |V|. The input to GLSTM-AE is the one-hot representation of seq. The output of the encoder LSTM after the last input has been read is a low-dimensional representation of the one-hot vectors of seq, denoted by r, where dr denotes the representation dimension. r is the representation result obtained from the GLSTM-AE. The decoder LSTM reconstructs the input from r. The target of GLSTM-AE is the same as its input.

Fig 2. The structure of GLSTM-AE, where, d denotes the dimension of a vector.

https://doi.org/10.1371/journal.pone.0285563.g002

Obviously, GLSTM-AE must be trained before it is applied to learn the representations of the sequence features. A simple way to obtain the training data of GLSTM-AE is to generate fixed-length sequence features from the network.

Because the mean squared error (MSE) loss is commonly used for regression tasks, it is suitable for the reconstruction task of GLSTM-AE. Therefore, we adopt the MSE loss as the loss function of GLSTM-AE:

L_MSE = (1/n) Σ_{i=1}^{n} (Y_i − Y*_i)²   (1)

where Y denotes the output of the decoder LSTM in GLSTM-AE and Y* denotes the one-hot representation of seq.
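The encoder-decoder structure above can be sketched in PyTorch as a minimal LSTM auto-encoder trained with the MSE reconstruction loss. The layer sizes and the strategy of feeding r at every decoding step are assumptions made for this sketch, not details specified by the paper.

```python
import torch
import torch.nn as nn

class GLSTMAE(nn.Module):
    """Minimal LSTM auto-encoder sketch: the encoder's final hidden state is
    the low-dimensional representation r of a sequence of one-hot node
    vectors; the decoder tries to reconstruct the one-hot inputs from r."""
    def __init__(self, n_nodes, d_r):
        super().__init__()
        self.encoder = nn.LSTM(n_nodes, d_r, batch_first=True)
        self.decoder = nn.LSTM(d_r, n_nodes, batch_first=True)

    def forward(self, x):                       # x: (batch, seq_len, |V|)
        _, (h, _) = self.encoder(x)             # h: (1, batch, d_r)
        r = h.squeeze(0)                        # representation vectors r
        # feed r at every step so the decoder reconstructs the whole sequence
        rep = r.unsqueeze(1).expand(-1, x.size(1), -1)
        y, _ = self.decoder(rep)
        return y, r

n_nodes, d_r, seq_len = 20, 8, 4                # hypothetical sizes
model = GLSTMAE(n_nodes, d_r)
x = torch.eye(n_nodes)[torch.randint(0, n_nodes, (3, seq_len))]  # one-hot batch
y, r = model(x)
loss = nn.functional.mse_loss(y, x)             # Eq (1): MSE reconstruction loss
```

After training, only r is kept as the low-dimensional representation of each sequence feature.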

Then, with the trained GLSTM-AE, we get the low-dimension representation of , denoted by . This process is summarised in Algorithm 2.

Algorithm 2 Network snapshot feature representation algorithm

Input:

Output:

1: initialize an empty ,

2: set

3: set

4: for each in do

5:  for each seq in do

6:   input=one-hot(seq), inputR|seq|×|V|

7:   r = GLSTM-AE (input),

8:   if |seq| < lmax then

9:    k = lmax − |seq|

10:    pad r with pl for k times

11:   end if

12:   add r into

13:  end for

14:  if then

15:   

16:   pad with pη for k times

17:  end if

18: end for

In Algorithm 2, the input is the network snapshot feature. The one-hot(⋅) function in step 6 obtains the one-hot representation of the current seq. In step 7, the representation result r of seq is obtained with the trained GLSTM-AE. Further, steps 8–11 pad r to a fixed length, and steps 14–17 pad the output to a fixed size.

Stage 3: Identify the diffusion source with SCNN

With Algorithm 2, we obtain the representation of the network snapshot feature. In this section, taking this representation as input, we propose a source classification neural network (SCNN) to identify the diffusion source by classifying it. SCNN is mainly composed of two fully connected layers. To speed up convergence, we add a normalization layer. The structure of SCNN is shown in Fig 3, where LogSoftmax is used for multi-class classification.

SCNN must also be trained before it is applied to identify the diffusion source. The training data of SCNN can be generated by Algorithm 3.

Algorithm 3 SCNN training data generating algorithm

Input: and

Output: training data collector C

1: specify the number of loops N

2: initialize an empty training data collector C

3: set , βi ∈ (0, 1), ∀i, j ∈ [1, M], βiβj

4: while N > 0 do

5:  for βiβ do

6:   for each node vV do

7:    generate by running the diffusion model (see Diffusion model) on with v as diffusion source and βi as propagation rate

8:    generate by Algorithm 1

9:    construct corresponding to by Algorithm 2

10:    add a training data (, one-hot(v)) into C

11:   end for

12:  end for

13:  N = N − 1

14: end while

In Algorithm 3, the inputs are the network topology and the observation areas set. From step 7 to step 10, given a node v and a propagation rate βi, a single training sample is generated, composed of the representation and the one-hot encoding of v. Obviously, the SCNN training dataset size is N ⋅ |β| ⋅ |V|.

Because the cross entropy loss is mainly used for classification, we adopt it as the loss function of SCNN:

L_CE = −Σ_{i=1}^{|V|} Z*_i log(Z_i)   (2)

where Z denotes the estimated source distribution output by SCNN and Z* denotes the one-hot representation of the true diffusion source.
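Under the structure described above (two fully connected layers, a normalization layer, and a LogSoftmax output), a minimal PyTorch sketch of SCNN and its cross-entropy training loss might look as follows; the hidden width and the ReLU activation are hypothetical choices not stated in the paper.

```python
import torch
import torch.nn as nn

class SCNN(nn.Module):
    """Sketch of the source classification neural network: two fully connected
    layers with a batch-normalization layer and a LogSoftmax output over the
    |V| candidate source classes."""
    def __init__(self, d_in, n_nodes, d_hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.BatchNorm1d(d_hidden),   # normalization layer for faster convergence
            nn.ReLU(),
            nn.Linear(d_hidden, n_nodes),
            nn.LogSoftmax(dim=1),       # log-probabilities for multi-class output
        )

    def forward(self, x):
        return self.net(x)

d_in, n_nodes = 32, 20                  # hypothetical sizes
model = SCNN(d_in, n_nodes)
x = torch.randn(5, d_in)                # 5 flattened snapshot representations
log_probs = model(x)
target = torch.randint(0, n_nodes, (5,))
loss = nn.functional.nll_loss(log_probs, target)  # cross entropy on log-probs, Eq (2)
source = log_probs.argmax(dim=1)        # estimated diffusion sources
```

Since the network emits log-probabilities, the negative log-likelihood loss on them is equivalent to the cross entropy of Eq (2).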

Finally, by combining Algorithms 1 and 2 with the trained SCNN, the overall NN-DCCF procedure is summarised as Algorithm 4.

Algorithm 4 Diffusion source identification algorithm

Input: , and

Output:

 1: generate according to Algorithm 1

 2: construct corresponding to according to Algorithm 2

 3: output = SCNN , outputR|V|

 4:

Results

Main experimental environment

Hardware: Dell R740 with 2 Intel(R) Xeon(R) Gold 6254 CPUs, 1 TB RAM, and 1 NVIDIA Tesla V100S GPU with 32 GB of GPU memory. Software: Python 3.8.10 + PyTorch 1.10.2 + CUDA 10.2.

Methods for comparison

Essentially, the proposed NN-DCCF is an observer based method. To validate its feasibility and effectiveness, three existing state-of-the-art observer based methods are selected for comparison: the time-reversal backward spreading (TRBS) algorithm [24], the sequential neighbour filtering (SNF) algorithm [48] and the estimated propagation delay (EPD) algorithm [49].

Datasets

We compare the four diffusion source identification methods on a series of synthetic and real networks. The synthetic networks are generated with the scale-free (BA) [55] and small-world (WS) [56] models. The parameters for generating the synthetic networks are summarised in Tables 2 and 3. The real networks are of different types, including NetworkScience (https://networkrepository.com/ca-netscience.php) [57], Euroroads (https://networkrepository.com/subelj-euroroad.php) [57], Email (https://networkrepository.com/email-univ.php) [57] and Blogs (https://doi.org/10.1007/978-3-642-01206-8_5) [58]. The topological properties of all networks are summarised in Table 4.

Evaluation metrics

The performance of diffusion source identification methods is commonly evaluated with two metrics [5, 21, 25, 27, 34]: precision and average error distance. Precision evaluates the capability of a method for exact identification (i.e., the proportion of identifications with 0 error hops). For each network, we randomly select 100 nodes as test seeds. For precision, the higher the value, the better the algorithm; for average error distance, the smaller the value, the better the algorithm.
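The two metrics can be computed from the hop distances between true and estimated sources; a minimal sketch, assuming an unweighted adjacency-list representation of the network:

```python
from collections import deque

def hop_distance(adj, a, b):
    """BFS hop distance between nodes a and b in an unweighted graph."""
    if a == b:
        return 0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        v, d = frontier.popleft()
        for u in adj[v]:
            if u == b:
                return d + 1
            if u not in seen:
                seen.add(u)
                frontier.append((u, d + 1))
    return float("inf")             # b unreachable from a

def evaluate(adj, true_sources, estimated_sources):
    """Precision (proportion of 0-error-hop identifications) and average
    error distance over a set of test seeds."""
    errors = [hop_distance(adj, s, e)
              for s, e in zip(true_sources, estimated_sources)]
    precision = sum(1 for e in errors if e == 0) / len(errors)
    return precision, sum(errors) / len(errors)
```

For example, on a path 1-2-3, identifying sources [1, 2] as [1, 3] gives a precision of 0.5 and an average error distance of 0.5 hops.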

Parameters setting

For an arbitrary network, we assume the delays θ follow a Gaussian distribution [24, 49] whose μ and σ2 are known [50]; here, we set μ/σ = 4 [21, 50]. We assume that the diffusion process on the networks follows the model introduced in Diffusion model. To investigate the performance of NN-DCCF under different propagation rates, we set a relatively large range for β, β ∈ [0.1, 0.9]. The diffusion process terminates when there is no ignorant node.
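For instance, i.i.d. Gaussian delays with μ/σ = 4 can be drawn as below; the truncation at a small positive value is an assumption made here to keep delays physically meaningful, not a detail stated in the paper.

```python
import random

def sample_delays(edges, mu=4.0, sigma=1.0, seed=0):
    """Draw i.i.d. Gaussian propagation delays with mu/sigma = 4, truncated
    at a small positive value so that delays stay positive."""
    rng = random.Random(seed)
    return {e: max(rng.gauss(mu, sigma), 1e-6) for e in edges}
```

With μ = 4σ, the probability of drawing a non-positive delay is negligible, so the truncation rarely triggers.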

The choice of a suitable observer placement strategy may depend on the topology of the network [61]. Although many methods [51] can be used to select the observers, sometimes there may be no significant difference between placement strategies in terms of source identification performance [62]. In this paper, the observers are selected by the strategy introduced in Section Stage 1. Here, the K vital nodes are selected by degree centrality (DC) [51], owing to its simplicity and efficiency. For each network, we select 1% of the nodes as vital nodes. Then, by extending the neighbours within 1 hop of the vital nodes, we obtain the observation areas; the details are shown in Table 5. Other general parameters are also summarised in Table 5. All four compared methods adopt the same observers to identify the diffusion source.

The parameter settings of GLSTM-AE are summarised in Table 6. Meanwhile, we generate the training dataset of GLSTM-AE for each network with the simple method introduced in Section Stage 2. To emphasize local structure, we set l ∈ [2, 4]. The training dataset sizes of GLSTM-AE on the different networks are shown in Table 7, and the training parameter settings are summarised in Table 8. Because the purpose is to identify the diffusion source, we show the accuracy of GLSTM-AE through the source identification results, which can be found in Figs 4–11 and Table 10.

Fig 4. The error distance of TRBS, SNF, EPD and NN-DCCF methods on BA model (1).

https://doi.org/10.1371/journal.pone.0285563.g004

Fig 5. The error distance of TRBS, SNF, EPD and NN-DCCF methods on BA model (2).

https://doi.org/10.1371/journal.pone.0285563.g005

Fig 6. The error distance of TRBS, SNF, EPD and NN-DCCF methods on WS model (1).

https://doi.org/10.1371/journal.pone.0285563.g006

Fig 7. The error distance of TRBS, SNF, EPD and NN-DCCF methods on WS model (2).

https://doi.org/10.1371/journal.pone.0285563.g007

Fig 8. The error distance of TRBS, SNF, EPD and NN-DCCF methods on NetworkScience network.

https://doi.org/10.1371/journal.pone.0285563.g008

Fig 9. The error distance of TRBS, SNF, EPD and NN-DCCF methods on Euroroads network.

https://doi.org/10.1371/journal.pone.0285563.g009

Fig 10. The error distance of TRBS, SNF, EPD and NN-DCCF methods on Email network.

https://doi.org/10.1371/journal.pone.0285563.g010

Fig 11. The error distance of TRBS, SNF, EPD and NN-DCCF methods on Blogs network.

https://doi.org/10.1371/journal.pone.0285563.g011

Table 7. The training dataset size of GLSTM-AE and SCNN on different networks.

https://doi.org/10.1371/journal.pone.0285563.t007

Table 8. The training parameters set of GLSTM-AE and SCNN on different networks.

https://doi.org/10.1371/journal.pone.0285563.t008

The parameter settings of SCNN are summarised in Table 9. Further, for each network, we generate the training dataset of SCNN by Algorithm 3. The training dataset sizes of SCNN on the different networks are shown in Table 7, and the training parameter settings are summarised in Table 8. In SCNN, we adopt batch normalization as the normalization layer [19, 20]. Since the purpose is to identify the diffusion source, we validate the performance of SCNN through the source identification results, which can be found in Figs 4–11 and Table 10.

Table 10. The average error distance of TRBS, SNF, EPD and NN-DCCF on different networks.

https://doi.org/10.1371/journal.pone.0285563.t010

Experimental results and discussion

Figs 4–11 show the error distance of the four methods on the different networks, and Table 10 shows their average error distance. From Figs 4 to 11, we can see that the precisions (i.e., the proportion of 0 error hops) achieved by NN-DCCF on the eight networks are 74%, 83%, 58%, 66%, 22%, 19%, 83% and 49%, respectively. Except on WS model (2), NN-DCCF achieves the best precision. On WS model (2), the precision of NN-DCCF is inferior only to TRBS, and superior to the other two methods. From Table 10, we see that NN-DCCF is superior to the other three methods in average error distance on all networks. Therefore, NN-DCCF is a feasible and effective method for accurately identifying the diffusion source. Additionally, from Table 4, we know that the eight networks differ in their topological properties, which indicates that NN-DCCF can effectively identify the source on different types of networks by simply adjusting the training parameters. NN-DCCF is therefore a general source identification framework.

Conclusion

This paper defines the diffusion direction and time information of observers as diffusion characteristics, and develops NN-DCCF to identify the diffusion source by classifying these characteristics. First, we utilize the diffusion characteristics to construct a network snapshot feature. Then, we propose a GLSTM-AE, by which the network snapshot feature is represented as low-dimensional vectors. Further, we propose an SCNN to identify the diffusion source. With NN-DCCF, source identification is converted into a classification problem. The feasibility and effectiveness of NN-DCCF are validated by experimental results on a series of synthetic and real networks. In future work, we will generalize NN-DCCF to the multi-source case.

Supporting information

References

  1. Boccaletti S, Latora V, Moreno Y, Chavez M, Hwang DU. Complex networks: Structure and dynamics. Physics Reports. 2006; 424(4): 175–308.
  2. Chang S, Pierson E, Koh PW, Gerardin J, Redbird B, Grusky D, et al. Mobility network models of COVID-19 explain inequities and inform reopening. Nature. 2021; 589(7840): 82–87. pmid:33171481
  3. Zhu L, Yang F, Guan G, Zhang Z. Modeling the dynamics of rumor diffusion over complex networks. Information Sciences. 2021; 562: 240–258.
  4. Wang Y, Wen S, Xiang Y, Zhou W. Modeling the Propagation of Worms in Networks: A Survey. IEEE Communications Surveys & Tutorials. 2014; 16(2): 942–960.
  5. Jiang J, Sheng W, Shui Y, Yang X, Zhou W. Identifying Propagation Sources in Networks: State-of-the-Art and Comparative Studies. IEEE Communications Surveys & Tutorials. 2017; 19(1): 465–481.
  6. Brockmann D, Helbing D. The hidden geometry of complex, network-driven contagion phenomena. Science. 2013; 342(6164): 1337–1342. pmid:24337289
  7. Wang Y, Zhong L, Du J, Gao J, Wang Q. Identifying the shifting sources to predict the dynamics of COVID-19 in the US. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2022; 32(3): 033104.
  8. Li J, Manitz J, Bertuzzo E, Kolaczyk ED. Sensor-based localization of epidemic sources on human mobility networks. PLoS Computational Biology. 2021; 17(1): e1008545. pmid:33503024
  9. Horn AL, Friedrich H. Locating the source of large-scale outbreaks of foodborne disease. Journal of the Royal Society Interface. 2019; 16(151): 20180624. pmid:30958197
  10. Liu W, Wang Z, Liu X, Zeng N, Liu Y, Alsaadi FE. A survey of deep neural network architectures and their applications. Neurocomputing. 2017; 234: 11–26.
  11. Chamberlain B, Rowbottom J, Gorinova MI, Bronstein M, Webb S, Rossi E. GRAND: Graph Neural Diffusion. Proceedings of the 38th International Conference on Machine Learning. 2021; 139: 1407–1418. Available: http://proceedings.mlr.press/v139/chamberlain21a/chamberlain21a.pdf
  12. Zhang C, Zhao S, Yang Z, Chen Y. A reliable data-driven state-of-health estimation model for lithium-ion batteries in electric vehicles. Frontiers in Energy Research. 2022; 10.
  13. Wu Z, Pan S, Chen F, Long G, Zhang C, Yu PS. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems. 2021; 32(1): 4–24. pmid:32217482
  14. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. Computational Capabilities of Graph Neural Networks. IEEE Transactions on Neural Networks. 2009; 20(1): 81–102. pmid:19129034
  15. Cui P, Wang X, Pei J, Zhu W. A Survey on Network Embedding. IEEE Transactions on Knowledge and Data Engineering. 2019; 31(5): 833–852.
  16. Zhang D, Yin J, Zhu X, Zhang C. Network Representation Learning: A Survey. IEEE Transactions on Big Data. 2020; 6: 3–28.
  17. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G. The Graph Neural Network Model. IEEE Transactions on Neural Networks. 2009; 20(1): 61–80. pmid:19068426
  18. Kipf TN, Welling M. Semi-Supervised Classification with Graph Convolutional Networks. International Conference on Learning Representations. 2017.
  19. Li L, Zhou J, Jiang Y, Huang B. Propagation source identification of infectious diseases with graph convolutional networks. Journal of Biomedical Informatics. 2021; 116: 103720. pmid:33640536
  20. Dong M, Zheng B, Li G, Li C, Zheng K, Zhou X. Wavefront-Based Multiple Rumor Sources Identification by Multi-Task Learning. IEEE Transactions on Emerging Topics in Computational Intelligence. 2022; 6(5): 1068–1078.
  21. Yang F, Yang S, Peng Y, Yao Y, Wang Z, Li H, et al. Locating the propagation source in complex networks with a direction-induced search based Gaussian estimator. Knowledge-Based Systems. 2020; 195: 105674.
  22. Zhu P, Cheng L, Gao C, Wang Z, Li X. Locating Multi-Sources in Social Networks With a Low Infection Rate. IEEE Transactions on Network Science and Engineering. 2022; 9(3): 1853–1865.
  23. 23. Cheng L, Li X, Han Z, Luo T, Ma L, Zhu P. Path-based multi-sources localization in multiplex networks. Chaos, Solitons & Fractals. 2022; 159: 112139.
  24. 24. Shen Z, Cao S, Wang W, Di Z, Stanley HE. Locating the source of diffusion in complex networks by time-reversal backward spreading. Physical Review. E. 2016; 93(3): 032301. pmid:27078360
  25. 25. Tang W, Ji F, Tay WP. Estimating Infection Sources in Networks Using Partial Timestamps. IEEE Transactions on Information Forensics and Security. 2018; 13(12): 3035–3049.
  26. 26. Hu Z. Wang L, Tang C. Locating the source node of diffusion process in cyber-physical networks via minimum observers. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2019; 29(6): 063117. pmid:31266325
  27. 27. Shah D, Zaman T. Rumors in a Network: Who’s the Culprit? IEEE Transactions on Information Theory. 2011; 57(8): 5163–5181.
  28. 28. Luo W, Tay WP, Leng M. Identifying Infection Sources and Regions in Large Networks. IEEE Transactions on Signal Processing. 2013; 61(11): 2850–2865.
  29. 29. Wang Z, Dong W, Zhang W, Tan CW. Rumor Source Detection with Multiple Observations: Fundamental Limits and Algorithms. SIGMETRICS Perform. Eval. Rev. 2014; 42(1): 1–13.
  30. 30. Wang Z, Dong W, Zhang W, Tan CW. Rooting our Rumor Sources in Online Social Networks: The Value of Diversity From Multiple Observations. IEEE Journal of Selected Topics in Signal Processing. 2015; 9(4): 663–677.
  31. 31. Zhu K, Ying L. Information Source Detection in the SIR Model: A Sample-Path-Based Approach. IEEE/ACM Transactions on Networking. 2016; 24(1): 408–421.
  32. 32. Zhu K, Ying L. A Robust Information Source Estimator with Sparse Observations. Computational Social Networks. 2014; 1(1): 1–21.
  33. 33. Luo W, Tay WP, Leng M. How to Identify an Infection Source With Limited Observations. IEEE Journal of Selected Topics in Signal Processing. 2014; 8(4): 586–597.
  34. 34. Jiang J, Wen S, Yu S, Xiang Y, Zhou W. Rumor Source Identification in Social Networks with Time-Varying Topology. IEEE Transactions on Dependable and Secure Computing. 2018; 15(1): 166–179.
  35. 35. Lokhov AY, Mézard M, Ohta H, Zdeborová L. Inferring the origin of an epidemic with a dynamic message-passing algorithm. Physical Review E. 2014; 90(1): 012801. pmid:25122336
  36. 36. Altarelli F, Braunstein A, Dall’Asta L, Lage-Castellanos A, Zecchina R. Bayesian inference of epidemics on networks via belief propagation. Physical Review Letters. 2014; 112(11): 118701. pmid:24702425
  37. 37. Antulov-Fantulin N, Lančić A, Šmuc T, Štefančić H, Šikić M. Identification of Patient Zero in Static and Temporal Networks: Robustness and Limitations. Physical Review Letters. 2015; 114(24): 248701. pmid:26197016
  38. 38. Yang F, Zhang R, Yao Y, Yuan Y. Locating the propagation source on complex networks with Propagation Centrality algorithm. Knowledge-Based Systems. 2016; 100: 112–123.
  39. 39. Zhou J, Jiang Y, Huang B. Source identification of infectious diseases in networks via label ranking. PLoS ONE. 2021; 16(1): e0245344. pmid:33444390
  40. 40. Chai Y, Wang Y, Zhu L. Information Sources Estimation in Time-Varying Networks. IEEE Transactions on Information Forensics and Security. 2021; PP(99): 2621–2636.
  41. 41. Jiang J, Wen S, Yu S, Xiang Y, Zhou W. K-Center: An Approach on the Multi-Source Identification of Information Diffusion. IEEE Transactions on Information Forensics and Security. 2015; 10(12): 2616–2626.
  42. 42. Cai K, Hong X, Lui JCS. Information Spreading Forensics via Sequential Dependent Snapshots. IEEE/ACM Transactions on Networking. 2018; 26(1): 478–491.
  43. 43. Feizi S, Médard M, Quon G, Kellis M, Duffy K. Network Infusion to Infer Information Sources in Networks. IEEE Transactions on Network Science and Engineering. 2019; 6(3): 402–417.
  44. 44. Chang B, Chen E, Zhu F, Liu Q, Xu T, Wang Z. Maximum a Posteriori Estimation for Information Source Detection. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2020; 50(6): 2242–2256.
  45. 45. Caputo JG, Hamdi A, Knippel A. Inverse source problem in a forced network. Inverse Problems. 2019; 35(5): 055006.
  46. 46. Fu L, Shen Z, Wang W, Fan Y, Di Z. Multi-source localization on complex networks with limited observers. EPL. 2016; 113(1): 18006.
  47. 47. Paluch R, Lu X, Suchecki K, Szymański BK, Holyst JA. Fast and accurate detection of spread source in large complex networks, Scientific Reports. 2018; 8(1): 2508. pmid:29410504
  48. 48. Wang H, Sun K. Locating source of heterogeneous propagation model by universal algorithm. Europhysics Letters. 2020; 131(4): 48001. https://dx.doi.org/10.1209/0295-5075/131/48001
  49. 49. Wang H, Zhang F, Sun K. An algorithm for locating propagation source in complex networks. Physics Letters A. 2021; 393: 127184.
  50. 50. Pinto PC, T Patrick, V Martin. Locating the source of diffusion in large-scale networks. Physical Review Letters. 2012; 109(6): 068702. pmid:23006310
  51. 51. Lü L, Chen D, Ren X, Zhang Q, Zhang Y, Zhou T. Vital nodes identification in complex networks. Physics Reports. 2016; 650: 1–63.
  52. 52. Sutskever I, Vinyals O, Le QV. Sequence to Sequence Learning with Neural Networks. Advances in Neural Information Processing Systems. 2014; 27. Available: https://proceedings.neurips.cc/paper/2014/file/a14ac55a4f27472c5d894ec1c3c743d2-Paper.pdf
  53. 53. Srivastava N, Mansimov E, Salakhudinov R. Unsupervised learning of video representations using lstms. International conference on machine learning. 2015. pp. 843–852. Available: http://proceedings.mlr.press/v37/srivastava15.pdf
  54. 54. Dai AM, Le QV. Semi-supervised Sequence Learning. Advances in Neural Information Processing Systems. 2015; 28. Available: https://proceedings.neurips.cc/paper/2015/file/7137debd45ae4d0ab9aa953017286b20-Paper.pdf
  55. 55. Barabasi AL, Albert R. Emergence of Scaling in Random Networks. Science. 1999; 286(5439): 509–512. pmid:10521342
  56. 56. Watts DJ, Strogatz SH. Collective dynamics of’small-world’ networks. Nature. 1998; 393(6684): 440–442. pmid:9623998
  57. 57. Rossi RA, Ahmed NK. The Network Data Repository with Interactive Graph Analytics and Visualization. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015; 29(1): 4292–4293. Available: https://ojs.aaai.org/index.php/AAAI/article/view/9277
  58. 58. Gregory S. Finding overlapping communities using disjoint community detection algorithms. Complex networks. 2009; 207: 47–61.
  59. 59. Newman MEJ. Assortative Mixing in Networks. Physical Review Letters. 2002; 89(20): 208701. pmid:12443515
  60. 60. Yang F, Li X, Xu Y, Liu X, Wang J, Zhang Y, et al. Ranking the spreading influence of nodes in complex networks: An extended weighted degree centrality based on a remaining minimum degree decomposition. Physics Letters A. 2018; 382(34): 2361–2371.
  61. 61. Gajewski Ł, Paluch R, Suchecki K, Sulik A, Szymanski B, Hołyst J. Comparison of observer based methods for source localisation in complex networks. Scientific Reports. 2022; 12: 5079. pmid:35332184
  62. 62. Zhang X, Zhang Y, Lv T, Yin Y. Identification of efficient observers for locating spreading source in complex networks. Physica, A. Statistical mechanics and its applications. 2016; 442: 100–109.