The Impact of the Network Topology on the Viral Prevalence: A Node-Based Approach

This paper addresses the impact of the structure of the viral propagation network on the viral prevalence. For that purpose, a new epidemic model of computer viruses, known as the node-based SLBS model, is proposed. Our analysis shows that the maximum eigenvalue of the underlying network is a key factor determining the viral prevalence. Specifically, the value range of the maximum eigenvalue is partitioned into three subintervals: viruses tend to extinction very quickly, approach extinction, or persist, depending on the subinterval into which the maximum eigenvalue of the propagation network falls. Consequently, computer viruses can be contained by adjusting the propagation network so that its maximum eigenvalue falls into the desired subinterval.


Introduction
The rapidly popularized Internet has brought us lots of benefits. On the flip side of the coin, computer viruses can propagate their replicates through the Internet much more rapidly than ever before, resulting in great disruptions. Although antivirus software is recognized as the major means of defending against electronic viruses, there is a marked lag from the appearance of a new virus to the availability of its vaccine.
As an important supplement to antivirus techniques, the epidemic dynamics of computer viruses aims to understand the laws governing the spread of malware on networks and, thereby, to work out proper strategies to contain the viral prevalence. Since Kephart and White's seminal work on the compartment modeling of computer viruses in the early 1990s [1,2], a multitude of compartment-based computer virus propagation models, ranging from the SIR models [3] and the SIRS models [4,5] to the SEIRS models [6], have been suggested. Most of these models are suited to infectious diseases and computer viruses equally well. In reality, however, some computer viruses have peculiarities that most infectious diseases do not have. For most infectious diseases, there is a non-negligible interval between the time when an individual gets infected and the time when it can infect other individuals. By contrast, for most computer viruses, a computer can infect other computers as soon as it gets infected. To capture this common feature of most computer viruses, a series of epidemic models of computer viruses, named the SLBS models, were proposed [7,8].
The network through which computers communicate with one another is frequently used to propagate viruses, and it has been recognized that the structure of the network has significant impact on the prevalence of virus [9]. In the early 2000s, it was empirically found that many real-world networks, ranging from the Internet and the World Wide Web to some email networks, are highly structured [10][11][12]. Later, a wave of research on virus epidemic dynamics was initiated, with focus on the propagation of virus on scale-free networks [13][14][15][16][17][18][19].
One common defect of all compartment-based epidemic models is that only partial knowledge on the network topology (the degree distribution or the degree correlation, say) can be used when establishing such models. In sharp contrast to this, when establishing a node-based epidemic model, one can make the best of the complete knowledge on the network topology [20]. As a result, some interesting properties concerning the viral spread, ranging from the mean propagation time and the expected number of infected nodes to the most probable network state, have been found [21][22][23][24][25].
With the aid of a node-based epidemic model, Wang et al. [26] found that whether viruses approach extinction depends heavily on the spectral radius of the underlying network. Next, by studying the N-intertwined SIS model, Van Mieghem et al. [27] found that whether viruses decline toward extinction depends on the maximum eigenvalue of the network. Later, by examining a node-based SIR model, Youssef and Scoglio [28] indicated that the maximum number of infected nodes is closely related to the spectrum of the network. For more information on this topic, see Refs. [29][30][31][32][33][34].
This paper addresses the impact of the network topology on the viral prevalence, provided that a computer can infect other computers as soon as it gets infected. For that purpose, a node-based virus epidemic model, known as the node-based SLBS model, is proposed. The analysis shows that the maximum eigenvalue of the underlying network is a key factor determining the viral prevalence. Specifically, the value range of the maximum eigenvalue is partitioned into three subintervals: viruses tend to extinction very quickly, approach extinction, or persist, depending on the subinterval into which the maximum eigenvalue of the network falls. Consequently, computer viruses can be contained by adjusting the network topology so that its maximum eigenvalue falls into the desired subinterval. Numerical examples support our results.
The rest of this paper is organized as follows: Preliminary knowledge is presented in Section 2, and the compartment-based SLBS models are briefly reviewed in Section 3. Section 4 describes the node-based SLBS model, Section 5 conducts a comprehensive analysis of this model, Section 6 gives some numerical examples, and Section 7 discusses the potential applications of the proposed model. Finally, Section 8 summarizes this work and presents some topics that are worthy of study.

Preliminaries
In this paper, the underlying network through which viruses propagate is denoted by a simple graph $G = (V, E)$ on $N$ non-isolated nodes numbered 1 through $N$, where nodes stand for terminal devices of the network, and edges stand for network links through which viruses can propagate. Let $A = [a_{ij}]_{N \times N}$ denote the adjacency matrix of graph $G$, let $\{d_k, 1 \le k \le N\}$ denote the degree sequence of $G$, and let $\{\lambda_k, 1 \le k \le N\}$ denote the spectrum of $A$. As $A$ is real and symmetric, we may assume $\lambda_{\max} = \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$.
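The notation above can be made concrete with a short numerical sketch. It assumes NumPy is available; the helper names `adjacency_spectrum`, `complete_graph`, and `star_graph` are illustrative and are not taken from the paper.

```python
import numpy as np

def adjacency_spectrum(A):
    """Return the eigenvalues of a symmetric adjacency matrix, largest first."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("the adjacency matrix of a simple graph must be symmetric")
    # eigvalsh exploits symmetry and returns real eigenvalues in ascending order
    return np.linalg.eigvalsh(A)[::-1]

def complete_graph(n):
    """Adjacency matrix of the complete graph on n nodes (hypothetical helper)."""
    return np.ones((n, n)) - np.eye(n)

def star_graph(n):
    """Adjacency matrix of a star: node 0 is the hub, nodes 1..n-1 are leaves."""
    A = np.zeros((n, n))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    return A

lam_complete = adjacency_spectrum(complete_graph(100))
lam_star = adjacency_spectrum(star_graph(100))
print("complete graph, lambda_max:", lam_complete[0])
print("star graph, lambda_max:", lam_star[0])
```

For the complete graph on $N$ nodes, $\lambda_{\max} = N - 1$, and for the star on $N$ nodes, $\lambda_{\max} = \sqrt{N - 1}$; both are standard facts that the sketch reproduces numerically.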
For the purpose of analyzing the new computer virus epidemic model introduced in the next section, we need the following lemmas.

Lemma 1 [35]
Consider a smooth dynamical system $\frac{dx(t)}{dt} = f(x(t))$ defined at least in a compact set $C$. $C$ is positively invariant if for any smooth point $y$ of $\partial C$, $f(y)$ is pointing into $C$.

Lemma 2 [36]
Consider an n-dimensional dynamical system

Lemma 3 [37]
For a graph $G$ with $\{d_k, 1 \le k \le N\}$ as the degree sequence, its largest eigenvalue $\lambda_{\max}$ has the following bounds.
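As a hedged illustration of how degree information brackets $\lambda_{\max}$, the sketch below checks three standard spectral-graph-theory facts, which are not necessarily the bounds stated in Lemma 3: the average degree and the square root of the maximum degree are lower bounds, and the maximum degree is an upper bound. The random test graph, its seed, and the function name `degree_bounds` are assumptions made for illustration only.

```python
import numpy as np

def degree_bounds(A):
    """Return (lower, lambda_max, upper) using standard degree-based bounds.

    lower = max(average degree, sqrt(maximum degree)); upper = maximum degree.
    These are textbook bounds, not necessarily those of Lemma 3.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                      # degree sequence d_1, ..., d_N
    lam_max = np.linalg.eigvalsh(A)[-1]    # largest eigenvalue of symmetric A
    lower = max(d.mean(), np.sqrt(d.max()))
    upper = d.max()
    return lower, lam_max, upper

# Illustrative random simple graph on 50 nodes (edge probability 0.2, fixed seed)
rng = np.random.default_rng(0)
N = 50
upper_tri = np.triu(rng.random((N, N)) < 0.2, k=1)
A = (upper_tri | upper_tri.T).astype(float)

lower, lam_max, upper = degree_bounds(A)
print(f"{lower:.3f} <= lambda_max = {lam_max:.3f} <= {upper:.3f}")
```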

A brief review of the compartment-based SLBS models
This section gives a brief review of the previously proposed SLBS models. Under an SLBS model, every node in a network is assumed to be in one of three possible states: susceptible, i.e., uninfected; latent, i.e., infected with all viruses in the node being in their latent phase; and exploding, i.e., infected with at least one virus in the node being in its exploding phase. For a compartment-based SLBS model, all nodes in a network are grouped into three classes (i.e., compartments) according to their states, and the change in the fraction of each compartment is the focus of study.
The original compartment-based SLBS models were established under the assumption that the propagation network is homogeneously mixed [7,8]. However, most real-world networks, including the World Wide Web and the Internet, have been empirically found to be highly structured rather than homogeneous [11]. Therefore, a new compartment-based SLBS model was later suggested based on the assumption that the propagation network admits a prescribed degree distribution [18].
All of the above mentioned SLBS models suffer from a common defect that it is not possible to make full use of the knowledge concerning the structure of the propagation network. As a result, it is extremely difficult to deeply understand the impact of the network topology on the viral prevalence by solely studying such compartment-based models.

The new computer virus epidemic model
As with the traditional compartment-based SLBS models [8,18], at any time, each and every node in the network is in one of three possible states: susceptible, latent, and exploding. Let $X_i(t) = 0$ (respectively, 1, 2) stand for the event that node $i$ is susceptible (respectively, latent, exploding) at time $t$. Then the state of the whole network at time $t$ can be represented by the vector $X(t) = [X_1(t), X_2(t), \ldots, X_N(t)]$. Let $l_i(t)$ (respectively, $b_i(t)$) denote the probability of the event that node $i$ is latent (respectively, exploding) at time $t$, so that node $i$ is susceptible at time $t$ with probability $1 - l_i(t) - b_i(t)$. Now, let us impose a set of statistical assumptions on the state transitions of a node.
(H1) A susceptible node is infected by a latent (respectively, exploding) neighbor with probability per unit time $\beta_1$ (respectively, $\beta_2$). As a result, when the number of infected nodes is small, a susceptible node $i$ gets infected approximately with average probability per unit time $\beta_1 \sum_j a_{ij} l_j(t) + \beta_2 \sum_j a_{ij} b_j(t)$. As the mission of all the viruses staying in a latent node is to infect other nodes, whereas the mission of all the exploding viruses staying in an exploding node is to damage the system, we assume $\beta_1 > \beta_2$.
(H2) Some virus in a latent node breaks out with probability per unit time $\alpha$.
(H3) A latent (respectively, exploding) node gets cured with probability per unit time $\gamma_1$ (respectively, $\gamma_2$). As an exploding node has more chance to be cured than a latent node, we assume $\gamma_2 > \gamma_1$.
The major task in the subsequent sections is to study the dynamical properties of system (2) (equivalently, system (1)).
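To make hypotheses (H1)-(H3) concrete, the following is a minimal simulation sketch. It ASSUMES that the model takes the mean-field form $\frac{dl_i}{dt} = (1 - l_i - b_i)\sum_j a_{ij}(\beta_1 l_j + \beta_2 b_j) - (\alpha + \gamma_1) l_i$ and $\frac{db_i}{dt} = \alpha l_i - \gamma_2 b_i$, which is a reconstruction from (H1)-(H3) rather than a quotation of the paper's system. The function names, the RK4 integrator, the step size, and the star-graph scenario are illustrative choices.

```python
import numpy as np

def slbs_rhs(x, A, beta1, beta2, alpha, gamma1, gamma2):
    """Right-hand side of the ASSUMED mean-field node-based SLBS equations."""
    N = A.shape[0]
    l, b = x[:N], x[N:]
    # (H1): infection pressure on each node from latent and exploding neighbors
    pressure = beta1 * (A @ l) + beta2 * (A @ b)
    # gain: susceptible -> latent; loss: latent -> exploding (H2), latent -> cured (H3)
    dl = (1.0 - l - b) * pressure - (alpha + gamma1) * l
    # gain: latent -> exploding (H2); loss: exploding -> cured (H3)
    db = alpha * l - gamma2 * b
    return np.concatenate([dl, db])

def simulate(A, x0, params, t_end=100.0, dt=0.05):
    """Integrate with classical RK4; return the final state and the prevalence series."""
    N = A.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    steps = int(round(t_end / dt))
    p = np.empty(steps + 1)
    p[0] = (x[:N] + x[N:]).mean()   # fraction of infected nodes at time 0
    for k in range(steps):
        k1 = slbs_rhs(x, A, **params)
        k2 = slbs_rhs(x + 0.5 * dt * k1, A, **params)
        k3 = slbs_rhs(x + 0.5 * dt * k2, A, **params)
        k4 = slbs_rhs(x + dt * k3, A, **params)
        x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        p[k + 1] = (x[:N] + x[N:]).mean()
    return x, p

# Illustrative scenario: star graph on 100 nodes, hub (node 0) initially latent
N = 100
A = np.zeros((N, N))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
params = dict(beta1=0.01, beta2=0.006, alpha=0.1, gamma1=0.2, gamma2=0.3)
x0 = np.zeros(2 * N)
x0[0] = 1.0
x_final, p = simulate(A, x0, params)
print("initial prevalence:", p[0], "final prevalence:", p[-1])
```

With these illustrative parameter values, the prevalence in this sketch decays toward zero; other graphs or rates can be tried by changing `A` and `params`.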

Model analysis
Obviously, system (2) always has the origin as an equilibrium. This trivial equilibrium corresponds to the situation in which all viruses in the network die out almost surely. This section is focused on the stability properties of the trivial equilibrium.
First, consider the asymptotic stability of the trivial equilibrium of system (2). For that purpose, let $\Omega = \{(x_1, \ldots, x_{2N}) : x_i \ge 0,\ x_{N+i} \ge 0,\ x_i + x_{N+i} \le 1,\ i = 1, \ldots, N\}$. Let $x(t) = (l_1(t), \ldots, l_N(t), b_1(t), \ldots, b_N(t))^T$, and rewrite system (2) in matrix notation as $\frac{dx(t)}{dt} = Bx(t) + G(x(t))$ (3) with initial condition $x(0) \in \Omega$.
We are ready to present a criterion for the asymptotic stability of the trivial equilibrium.
Theorem 1 Consider system (2).
(a) The trivial equilibrium is asymptotically stable if $\lambda_{\max} < R_0$.
(b) The trivial equilibrium is unstable if $\lambda_{\max} > R_0$.
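For orientation, a threshold of this kind can be recovered by linearizing at the origin. The derivation below is a sketch under the ASSUMPTION that the model has the mean-field form implied by (H1)-(H3); the explicit expressions for $a_k$, $b_k$, and $R_0$ are reconstructions and should be checked against the paper's own definitions.

```latex
% Assumed mean-field form of the node-based SLBS model (reconstructed from (H1)-(H3)).
\begin{align*}
\frac{dl_i}{dt} &= (1 - l_i - b_i)\sum_{j=1}^{N} a_{ij}\,(\beta_1 l_j + \beta_2 b_j) - (\alpha + \gamma_1)\, l_i, \\
\frac{db_i}{dt} &= \alpha\, l_i - \gamma_2\, b_i, \qquad 1 \le i \le N.
\end{align*}
% Jacobian at the origin, in the ordering x = (l_1, ..., l_N, b_1, ..., b_N)^T.
\[
J = \begin{pmatrix} \beta_1 A - (\alpha + \gamma_1) I & \beta_2 A \\ \alpha I & -\gamma_2 I \end{pmatrix}.
\]
% On each eigenvector of A with eigenvalue \lambda_k, J reduces to a 2 x 2 block
% whose characteristic equation is a quadratic in \eta.
\begin{align*}
\eta^2 + a_k \eta + b_k &= 0, \\
a_k &= \alpha + \gamma_1 + \gamma_2 - \beta_1 \lambda_k, \\
b_k &= \gamma_2(\alpha + \gamma_1) - (\beta_1\gamma_2 + \alpha\beta_2)\,\lambda_k .
\end{align*}
% b_k > 0 is equivalent to \lambda_k < R_0 with the threshold below, and b_k > 0
% forces a_k > 0 because R_0 < (\alpha + \gamma_1)/\beta_1 < (\alpha + \gamma_1 + \gamma_2)/\beta_1.
\[
R_0 = \frac{\gamma_2(\alpha + \gamma_1)}{\beta_1\gamma_2 + \alpha\beta_2}.
\]
```

Under this assumed form, $\eta = -\gamma_2 - \frac{\alpha\beta_2}{\beta_1}$ is exactly the value at which the coefficient multiplying $A$ in the characteristic determinant vanishes, which is why it is treated as a separate case in the proof.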
Case 2: $\beta_1(\gamma_2 - \gamma_1) \ne \alpha(\beta_1 - \beta_2)$. Then, $-\gamma_2 - \frac{\alpha\beta_2}{\beta_1}$ is not a root of Eq (5). Thus, $\eta$ is a root of Eq (5) if and only if for some $k$ ($1 \le k \le N$), $\eta$ is a root of the equation $\eta^2 + a_k\eta + b_k = 0$. (6)
If $\lambda_{\max} < R_0$, we have $a_k > 0$ and $b_k > 0$. So, it follows from the Hurwitz criterion [38] that the two roots of Eq (6) have negative real parts. As a result, all roots of Eq (5) have negative real parts. Hence, the trivial equilibrium is asymptotically stable [38]. Otherwise, if $\lambda_{\max} > R_0$, the equation has a root with positive real part. As a result, Eq (5) has a root with positive real part. Hence, the trivial equilibrium is unstable [38]. The proof is complete.
Remark 2 This theorem can also be formulated as (a)
Second, consider the global stability of the trivial equilibrium of system (2). For that purpose, the following lemma is indispensable.
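Lemma 4 The set $\Omega$ is positively invariant for system (2). That is, $x(0) \in \Omega$ implies $x(t) \in \Omega$ for all $t > 0$.
Proof $\partial\Omega$ consists of the following 3N hyperplanes: $x_i = 0$, $x_{N+i} = 0$, and $x_i + x_{N+i} = 1$ ($1 \le i \le N$), which have $n_i = -e_i$, $n_{N+i} = -e_{N+i}$, and $n_{2N+i} = e_i + e_{N+i}$ as their respective outer normal vectors, where $e_k$ denotes the $k$-th standard basis vector of $\mathbb{R}^{2N}$. Let $x$ be a smooth point of $\partial\Omega$. We distinguish among three possibilities.
Case 1: $x_i = 0$ for some $1 \le i \le N$. Then, $x_{N+i} < 1$, and $x_j > 0$ for all $j \ne i$. As graph $G$ has no isolated node, we have $\langle Bx + G(x), n_i \rangle < 0$.
Case 2: $x_{N+i} = 0$ for some $1 \le i \le N$. Then, $x_i > 0$. Thus, $\langle Bx + G(x), n_{N+i} \rangle = -\alpha x_i < 0$.
Case 3: $x_i + x_{N+i} = 1$ for some $1 \le i \le N$. Then, $\langle Bx + G(x), n_{2N+i} \rangle < 0$.
Combining the above discussions, we get that $Bx + G(x)$ is pointing into $\Omega$. The claimed result then follows from Lemma 1. The proof is complete.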
We are ready to present a criterion for the global stability of the trivial equilibrium.
Theorem 2 The trivial equilibrium of system (2) is globally asymptotically stable if $\lambda_{\max} < R_0$.
Proof Look at system (3). As matrix $B^T$ is irreducible and its off-diagonal entries are all nonnegative, it follows from [36] that $B^T$ has a positive eigenvector $z = (z_1, z_2, \cdots, z_{2N})$ belonging to its eigenvalue $s(B^T)$. Let $r = \min_i z_i$ ($> 0$). Then, for all $x \in \Omega$, we have $\langle G(x), z \rangle \le 0$. Moreover, $\langle G(x), z \rangle = 0$ implies that $x = 0$. In view of Theorem 1 and Lemma 3, the claimed result follows from Lemma 2. The proof is complete.

Remark 3
The global stability of the trivial equilibrium of system (2) implies that, almost surely, the viruses in the network decline toward extinction.
Next, consider the global exponential stability of the trivial equilibrium of system (2). For that purpose, let
Now, let us give a criterion for the global exponential stability of the trivial equilibrium.
Theorem 3 The trivial equilibrium of system (2) is globally exponentially stable if $\lambda_{\max} < R_1$.
Finally, let us consider what happens if $\lambda_{\max} > R_0$. By applying Lemma 2 to system (2) and in view of Theorem 1, we get the following result.
Theorem 4 Consider system (2).
Remark 4 This theorem shows that if $\lambda_{\max} > R_0$, then, almost surely, viruses in the network persist.
Remark 5 As the largest eigenvalue of a network is an indicator of the structure of the network, Theorems 1-4 clearly reveal the impact of the network topology on the viral prevalence; a network with a smaller largest eigenvalue is more conducive to the containment of viruses.
It follows from Theorems 1-4 that it is proper to partition the value range $(0, \infty)$ of $\lambda_{\max}$ into three subintervals: $I_1 = (0, R_1)$, $I_2 = (R_1, R_0)$, and $I_3 = (R_0, \infty)$. When $\lambda_{\max} \in I_1$, viruses in the network tend to extinction almost surely, at an exponential speed. When $\lambda_{\max} \in I_2$, viruses in the network decline toward annihilation almost surely. When $\lambda_{\max} \in I_3$, viruses in the network persist.
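The three-way partition can be expressed as a small helper. This is only a sketch: the function name `classify_regime` is hypothetical, the thresholds $R_1$ and $R_0$ are taken as given inputs rather than computed, and the numerical values in the usage lines are made up for illustration.

```python
def classify_regime(lam_max, R1, R0):
    """Map lambda_max to one of the open subintervals I1, I2, I3.

    R1 and R0 are supplied by the caller; the partition requires 0 < R1 < R0.
    """
    if not (0 < R1 < R0):
        raise ValueError("expected 0 < R1 < R0")
    if 0 < lam_max < R1:
        return "I1: extinction at an exponential speed"
    if R1 < lam_max < R0:
        return "I2: extinction"
    if lam_max > R0:
        return "I3: persistence"
    # lam_max equals a threshold or is non-positive
    return "outside the three open subintervals"

# Hypothetical thresholds, chosen only to exercise each branch
for lam in (1.0, 3.0, 7.0, 5.0):
    print(lam, "->", classify_regime(lam, R1=2.0, R0=5.0))
```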

Numerical examples
In this section, we shall verify the results obtained in the previous section using numerical examples. For that purpose, let $p(t)$ denote the percentage of infected nodes in all nodes at time $t$, $p(t) = \frac{1}{N}\sum_{i=1}^{N}\left(l_i(t) + b_i(t)\right)$.
Example 1 Consider the node-based SLBS model, and take a complete graph on 100 nodes as the viral propagation network. Then, $\lambda_{\max} = 99$.
1. Suppose $\beta_1 = 0.01$, $\beta_2 = 0.006$, $\gamma_1 = 0.2$, $\gamma_2 = 0.3$, and $\alpha = 0.1$. As $\lambda_{\max} \in I_1$, Theorem 3 predicts that $p(t) \to 0$ at an exponential speed. Fig 3(1) shows the trend of $p(t)$ provided (a) the hub is initially latent, and the remaining 99 nodes are initially susceptible, or (b) one leaf node is initially latent, and the remaining 99 nodes are initially susceptible. It can be seen that viruses tend to extinction very quickly, coinciding with the prediction.
Example 3 Consider the node-based SLBS model, and take an Erdős-Rényi graph on 50 nodes, which is produced randomly with connection probability 0.2, as the viral propagation network. Numerical calculation gives $\lambda_{\max} = 10.19$.
3. Suppose $\beta_1 = 0.05$, $\beta_2 = 0.03$, $\gamma_1 = 0.2$, $\gamma_2 = 0.4$, and $\alpha = 0.2$. As $\lambda_{\max} \in I_3$, Theorem 4 predicts that $p(t) \nrightarrow 0$. Fig 4(3) shows the trend of $p(t)$ provided there are initially 10 latent nodes and 40 susceptible nodes. It can be seen that viruses persist, in agreement with the prediction.
In a word, the above given numerical examples are all in perfect agreement with the theoretical results.
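The classification in Example 3 can be spot-checked numerically. The sketch below ASSUMES the threshold $R_0 = \frac{\gamma_2(\alpha + \gamma_1)}{\beta_1\gamma_2 + \alpha\beta_2}$, which is what a linearization of the mean-field equations implied by (H1)-(H3) gives; it is a reconstruction, not a formula quoted from the paper. The sampled Erdős-Rényi graph uses an arbitrary seed, so its $\lambda_{\max}$ differs from the reported 10.19. The function names are illustrative.

```python
import numpy as np

def r0_assumed(beta1, beta2, alpha, gamma1, gamma2):
    """ASSUMED threshold obtained by linearizing the mean-field equations at the origin."""
    return gamma2 * (alpha + gamma1) / (beta1 * gamma2 + alpha * beta2)

def erdos_renyi_adjacency(n, p, seed):
    """Sample a simple undirected G(n, p) graph as a 0/1 adjacency matrix."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)   # independent edges above the diagonal
    return (upper | upper.T).astype(float)

# Parameters of item 3 of Example 3
R0 = r0_assumed(beta1=0.05, beta2=0.03, alpha=0.2, gamma1=0.2, gamma2=0.4)

reported_lam_max = 10.19                          # value quoted in Example 3
A = erdos_renyi_adjacency(50, 0.2, seed=0)        # a different random sample
sampled_lam_max = np.linalg.eigvalsh(A)[-1]

print("assumed R0:", round(R0, 4))
print("reported lambda_max exceeds R0:", reported_lam_max > R0)
print("sampled lambda_max exceeds R0:", sampled_lam_max > R0)
```

Under the assumed formula, the reported $\lambda_{\max}$ lies above $R_0$, which agrees with the placement of this case in $I_3$.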

Further discussions
It can be seen from the main results in Section 5 that an effective approach to the containment of electronic viruses is to adjust the system parameters so that $R_0$ or $R_1$ is large enough. Simple calculations yield $\frac{\partial R_0}{\partial \beta_1} < 0$, $\frac{\partial R_0}{\partial \beta_2} < 0$, $\frac{\partial R_0}{\partial \gamma_1} > 0$, $\frac{\partial R_0}{\partial \gamma_2} > 0$, $\frac{\partial R_1}{\partial \alpha} > 0$. As a result, the following practical measures are strongly recommended.
• Install and timely update antivirus software on computers, so as to raise the two cure rates of infected computers.
• Filter and block suspicious messages with a firewall located at the gateway of a domain, so as to lower the two infecting rates of susceptible computers.
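The signs of the partial derivatives of $R_0$ quoted above can be spot-checked with central finite differences. The sketch ASSUMES $R_0 = \frac{\gamma_2(\alpha + \gamma_1)}{\beta_1\gamma_2 + \alpha\beta_2}$, a reconstruction from a linearization of the mean-field equations implied by (H1)-(H3) rather than a formula quoted from the paper; $R_1$ is not covered. The parameter values and the helper `partial_sign` are illustrative.

```python
def r0_assumed(beta1, beta2, alpha, gamma1, gamma2):
    """ASSUMED threshold obtained by linearizing the mean-field equations at the origin."""
    return gamma2 * (alpha + gamma1) / (beta1 * gamma2 + alpha * beta2)

def partial_sign(name, params, h=1e-6):
    """Sign of the central finite difference of R0 with respect to one parameter."""
    up = dict(params)
    up[name] += h
    down = dict(params)
    down[name] -= h
    diff = r0_assumed(**up) - r0_assumed(**down)
    return 1 if diff > 0 else (-1 if diff < 0 else 0)

# Illustrative parameter point satisfying beta1 > beta2 and gamma2 > gamma1
params = dict(beta1=0.01, beta2=0.006, alpha=0.1, gamma1=0.2, gamma2=0.3)
signs = {name: partial_sign(name, params)
         for name in ("beta1", "beta2", "gamma1", "gamma2")}
print(signs)
```

A negative sign for the infecting rates and a positive sign for the cure rates match the two recommended measures: lowering $\beta_1$, $\beta_2$ or raising $\gamma_1$, $\gamma_2$ enlarges $R_0$.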
On the other hand, adjusting the structure of the propagation network so that its maximum eigenvalue is small enough also helps inhibit viruses. As there is no closed-form formula for the maximum eigenvalue of a general adjacency matrix, it is difficult to verify this condition. To circumvent this difficulty, let us present an easily verified condition for the final extinction of viruses as follows. This theorem suggests that simultaneously reducing the number of links and the maximum node degree in a network should contribute to the annihilation of viruses.

Conclusions and remarks
To understand the way that the spread of virus on a network is affected by the structure of the network, a new epidemic model of computer virus has been proposed. The model analysis reveals that the maximum eigenvalue of the network is a key factor determining the viral prevalence; viruses tend to extinction very quickly or approach extinction or persist depending on where the maximum eigenvalue of the network lies. As a result, viruses can be contained by properly adjusting the structure of the propagation network.
Much work remains to be done in this direction. For instance, our model assumes that all computers have the same infection rate, the same bursting rate, and the same curing rate. In reality, however, these rates vary from computer to computer. Hence, our model should be generalized so that different nodes have different infection rates, different bursting rates, and different curing rates. Additionally, the fact that computers are likely to be infected by removable storage media [39] may lead to the emergence of a non-trivial steady state. In this situation, it makes sense to suppress the fraction of the infected nodes. Third, the immunization strategy we adopt also has significant impact on the viral prevalence. To a certain extent, the static immunization problem reduces to that of assigning different curing rates to different nodes so that the best virus containment effect is achieved, given that the sum of curing rates of all nodes is fixed [33,40], while the dynamic immunization problem can be solved by means of optimal control theory [41]. Last, but not least, the methodology developed in this work can be applied to the situation of infectious diseases [42][43][44][45].