Advertisement
Browse Subject Areas
?

Click through the PLOS taxonomy to find articles in your field.

For more information about PLOS Subject Areas, click here.

  • Loading metrics

A Combinatorial Model of Malware Diffusion via Bluetooth Connections

A Combinatorial Model of Malware Diffusion via Bluetooth Connections

  • Stefano Merler, 
  • Giuseppe Jurman
PLOS
x

Abstract

We outline here the mathematical expression of a diffusion model for cellphones malware transmitted through Bluetooth channels. In particular, we provide the deterministic formula underlying the proposed infection model, in its equivalent recursive (simple but computationally heavy) and closed form (more complex but efficiently computable) expression.

Introduction

The spreading of malware, i.e., malicious self-replicating codes, has rapidly grown in the last few years, becoming a substantial threat to the wireless devices, and mobile (smart)phones represent nowadays the most appetible present and future target. Papers studying the problem from both theoretical and technical points of view already appeared in literature since 2005 [1][9], and nowadays a number of different approaches to modeling the virus diffusion are already available to the community. With the present work we want to contribute to this topic by proposing a more accurate model for the spread of a malware through the Bluetooth channel, providing both a recursive and a combinatorial equivalent deterministic formulation of the described solution.

The Model

The dynamics of the proposed model is the following: at a certain time , a number of infected mobiles come in contact with a number of clean (non-infected) cellphones ; hereafter we will denote this configuration as .

All telephones are in the Bluetooth transmission range of each other and they all have their Bluetooth device on. Each infected mobile tries to establish a connection with another device, clearly not knowing whether it is trying to pair to a clean or to an infected phone. All these connections are established instantaneously at time . However, for the sake of simplicity we assume that the infected mobiles establish connections following a given sequence, starting from down to . In other words, is the first to try to establish a connection, is the last one. Moreover, each connection is chosen uniformly at random among all possible available choices. Connections between infected and clean mobiles deterministically result in infection transmission: when a clean mobile gets paired to an infected one, it becomes infected. All these events occur in the time interval , where is the minimal time allowing all infected mobiles to establish a connection and eventually transmit the virus: in practice, it may be considered of the order of a few tens of seconds. We assume that in this time interval clean cellphones do not try to establish any connections, e.g., for non-malware purposes. We also assume that in this time interval no other mobile enters the Bluetooth transmission range of the mobiles and, when a connection between two mobiles is established, the two mobiles remain connected for the whole time interval. Basically, we are assuming that the initial configuration is given and it does not change in the time interval . Note that, given the definition of , new infections do not result in configuration changes in the time interval .

All the aforementioned assumptions are reasonably realistic, due to the very short time-scale considered.

The task here is to discover the probability that, in this situation, a given clean mobile gets paired to an infected one, and thus it becomes itself infected.

Summarizing, the setup and the constraints of the model are the following:

Setup

infected mobiles and clean mobiles are in a room (i.e., in the Bluetooth transmission range of each other).

Dynamics

Starting from down to , each infected mobile tries to connect with a yet unconnected device, regardless of whether it is infected or not.

Constraint #1

Since the connection channel is Bluetooth, once a connection between two mobiles is established, these two devices become unavailable to further connection, or, in other words, each device can have at most one connection to another cellphone.

Constraint #2

For each , when it is 's turn to choose, must connect to one of the still available devices, if any.

Let us consider the generic configuration with unpaired infected mobiles and unpaired clean mobiles . According to the setup, the first mobile establishing a connection is . In Fig. 1 a possible evolution is displayed starting from an initial configuration with infected and clean mobiles, together with an explanatory description of the occuring dynamics.

thumbnail
Figure 1. An example of model dynamics starting from the initial configuration (7,5).

In red, the pairing that it is established at each step. (a) At time , infected mobile phones and clean mobiles are all within their mutual Bluetooth connection range. (b) chooses a mobile among ; it chooses establishing connection . (c) Now it 's turn to choose, and and are not available anymore for pairing (marked by a grey circle ○ ). (d) connects to through pairing . (e) The two mobiles and become unavailable for pairing, too and the next infected mobile in line pairs to via . (f) Only and remain available for pairing with , which chooses (connection ). (g) Now the last mobile must connect to the remaining unpaired clean phones : it chooses creating pairing . (h) There are no more unpaired infected mobiles: the process ends at time .

https://doi.org/10.1371/journal.pone.0059468.g001

Due to the described dynamics, all the infected mobiles succeed in paring, with the exception of at most one , which can remain unpaired if there are no more available mobiles. This case can only happen when there are more infected mobiles than clean ones, their sum is odd and all the clean mobiles get paired:(†)where is the number of pairings between two infected mobiles. Henceforth, the last choosing infected mobile cannot find any available device to pair to. In what follows, we will refer to this case as the case ; an example of this situation in the initial configuration is shown in Fig. 2.

thumbnail
Figure 2. An example of the situation.

Starting from the initial configuration , infects the clean mobile , pairs to , infects and, finally, pairs to . Here the process ends, because there are no more mobiles available for pairing to which remains unconnected.

https://doi.org/10.1371/journal.pone.0059468.g002

The model is completely described by computing the probability that a certain clean mobile, for instance , gets infected in the time interval .

Although could be stochastically approximated by running repeated simulations, in the following Sections we will derive two equivalent exact (deterministic) formulæ for in the aforementioned setup. The former is a simple recursive expression, which follows straightforwardly from the model dynamics, while the latter is its corresponding closed form (thus with no recursion involved), which has a more complex expression and it heavily relies on combinatorics. Other than their alternative mathematical nature, the two formulæ show different behaviours also from a computational point of view, as discussed in a dedicated Section.

The Recursive Formula

Recursively, the probability of a given susceptible mobile to get infected starting from a given initial configuration can be written by the following expression:(1)where the trivial conditions , and initialize the recursion, thus covering all possible cases.

Since all clean mobiles share the same probability of getting infected, without loss of generality we may assume . The three terms , , and contributing to the general case of come from the three mutually exclusive cases which can occur starting from the initial configuration :

  1. establishes a pairing with . In this case gets infected and this event occurs with probability .
  2. establishes a pairing with one of the other clean mobiles . This event occurs with probability and of course does not get infected by . However, may be infected later by the remaining available infected phones (with only clean mobiles still available, because one clean mobile has been infected by ), thus falling back to a configuration.
  3. establishes a pairing with one of the other unpaired infected mobiles . This event occurs with probability and of course does not get infected by . However, similarly to the previous situation, may be infected later by the remaining unpaired infected phones, thus falling back to a configuration.

A worked out example illustrating the construction of Eq. 1 is shown in Fig. 3. The formula in Eq. 1 for relies on a recursive equation of second order with non constant coefficients, for which no general method is known to derive the corresponding non-recursive (closed) expression. Moreover, as detailed in a later Section, calculating by using Eq. 1 is computationally heavy. However, we will obtain the equivalent time-saving closed form solution in the next Section using combinatorial arguments.

thumbnail
Figure 3. Construction of the general case of the recursive formula Eq. 1.

Starting from the initial configuration , we want to compute the probability that a clean mobile ( without loss of generality) gets infected in the proposed model. At time , the first infected mobile tries to establish a pairing, and only one of the three following alternatives can occur. In green, the case when immediately infects (with probability ) and we are done. In blue, the case when pairs to one of the remaining another infected mobiles with probability ; then and becomes unavailable for pairing with the following choosing mobile , and we are moved into the case of computing the probability that gets infected when there are unlinked infected mobiles and clean ones, i.e., . Finally, in orange, the case when pairs to one of the other clean mobiles (with ) with probability ; then and becomes unavailable for pairing with the following choosing mobile , and we are moved into the case of computing the probability that gets infected when there are unlinked infected mobiles and unlinked clean ones, i.e., . The general case is obtained by summing the contributions of all three alternative cases described above.

https://doi.org/10.1371/journal.pone.0059468.g003

The Combinatorial Formula

To construct the explicit formula equivalent to Eq. 1, we need to employ a few combinatorial considerations. The key observation is that we can count all wirings (lists of pairings) that can occur at the end of the pairing process. Clearly, the fact that there is an order in setting up the connections between the mobiles heavily influences the probability that a given wiring can occur: in particular, this probability depends on the number of pairings between infected mobiles (bb-pairings, for short). As background material, we recall some definitions and results from combinatorics in the box in Fig. 4, together with the two following functions:

thumbnail
Figure 4. Basic definitions, examples and facts on dispositions, combinations and permutations.

https://doi.org/10.1371/journal.pone.0059468.g004

  • the Heaviside step function

  • the Kronecker delta function

As an example, the following indicator function can be written in the two equivalent formulations:where is the Euclidean remainder function, so is zero for even and one for odd .

Suppose now we are starting from an initial configuration ; then define the following quantities:

  • : the minimum number of bb-pairings in a wiring;
  • : the probability that a wiring with exactly bb-pairings occurs;
  • : the number of all possible ways to select bb-pairings;
  • : the number of all possible wirings with a given list of bb-pairings when a (generic) clean mobile gets paired;
  • : the number of all possible wirings with a given list of bb-pairings and where the clean mobile is paired;
  • : the number of all possible wirings with bb-pairings when a (generic) clean mobile gets paired;
  • : the number of all possible wirings with bb-pairings where the clean mobile is paired;
  • : in the case, with , the number of possible wirings with unpaired, for .

In the above notations, the (non recursive) closed form expression equivalent to Eq. 1 for the probability of a given susceptible mobile to get infected in a given initial configuration can be written as follows:

(2)

Eq. 2 has its roots on the following counting argument: the probability that a given clean mobile gets infected is the sum over all admissible values of of all possible wirings with bb-pairings weighted by the probability that a wiring with exactly bb-pairings occurs:(3)where is the minimum number of -pairings that can be established in an initial configuration .

The rationale of summing over the number of -pairings to compute relies on the observation that the probability of of getting infected depends on the number of available infected mobiles that will pair with clean mobiles, that is exactly the number of infected mobiles which are not already paired to another infected mobile, i.e., that are not involved in a bb-pairing.

In particular, the three terms between brackets in Eq. 2 match respectively the three factors in Eq. 3, while the term between double brackets ( to enhance readability) corresponds to .

In what follows we will show that the expansion of the right-hand member of Eq. 3 coincides with Eq 2. The expansions of all terms will be carried out first by separately considering all occurring cases, and then providing an unique closed form formula (without conditional expressions) by using the Heaviside step and the Kronecker delta functions.

Lemma 1

Given an initial configuration , the minimum number of bb-pairings in a wiring is the following:while the maximum number is .

In fact, while when it is possible not to have any bb-pairing, when they cannot be less than or respectively when is even or odd. This is due to the constraint #1 imposing that an infected mobile must connect to another device whenever available, when it is its turn to choose.

Lemma 2

Given a configuration, the probability that a wiring with exactly bb-pairings between two infected mobiles occurs is the following:

In fact, when there are bb-pairings in the admissible range, all possible wirings depend on the choice of infected devices and clean devices , i.e. elements from the original sets of . The first element has probability to be chosen, the second , the third and so on.

Lemma 3

Given an initial configuration in the case with , then the number of possible wirings with unpaired, for , is:

The idea is that all the infected mobiles must be part of a bb-pairing, so they must be connected to one of the . Once they have been chosen, the remaining bb-pairings must be selected among the mobiles that are yet unpaired. Both considerations can be exploited in terms of combinations using the definitions and the properties of Fig. 4.

Lemma 4

In the configuration, the number of all possible ways to select bb-pairings is:

Apart from the case, selecting bb-pairings is equivalent to consecutively choosing unordered pairs from the original set of infected mobiles. The first pair can be chosen in ways, the second pair in and so on. The division by is motivated by the fact that the particular ordering in which the pairs are chosen is irrelevant: the list is undistinguishable from the list . The number of these different ordering is precisely by definition of permutations. In the case, if there is only one way to choose bb-pairings, while if the unpaired infected mobile can only be , so from we have to subtract the case where the only bb-pairing involves , which is impossible. Finally, in the case with the unpaired infected mobile can be any with , and the total number of cases (which coincides with the number of cases where is selected, since all the clean mobiles are connected in these situations) is the sum of all cases with .

Lemma 5

In the configuration, with bb-pairings, the number of all possible cases when a particular is chosen is:

The result follows immediately from the cardinality equations in Fig. 4, in particular from the fact that among all combinations of objects in groups of elements, a particular element is selected exactly times. When is even and we follow the convention for . In case , since all the non infected mobiles are selected, the possible ways to select them are exactly their permutations.

This completes the expansion of Eq. 3 into Eq. 2.

Equivalence between the recursive and the closed formula can be proven by showing that Eq. 2 satisfies the recursive relations of Eq. 1. The analytical proof of the equivalence involves working out a large number of cumbersome identities of binomial coefficients and factorials: in the last Section, we will briefly outline a sketch of the proof in the simple case . Numerically, the differences between the two formulæ are below machine precision for .

We conclude the Section with the observation that the sum of the total number of cases weighted by their corresponding probabilities adds up correctly to one:because of the following counting lemma.

Lemma 6

In the configuration with bb-pairings, the number of all possible ways to select the remaining clean mobiles for pairing is:

Apart from the case, when there are bb-pairings, infected mobiles remain to be connected with clean devices. This is equivalent to compute the number of possible sets of elements from an initial set of clean mobiles: since here the ordering matters, this is the definition of dispositions (see Fig. 4) of elements from an original set of .

Note that, since in the case all the clean mobiles are selected, the two quantities and coincide.

Analytical and Computational Notes

Although defined only for positive integer values of and , it is possible to provide a graphical sketch of the shape of the function by linear interpolation on the non integer real values. In Fig. 5 we show both the tridimensional surface of and its corresponding contourplot for values of and ranging between 1 and 100. Asymptotically, the function converges to the following limits:(4)

thumbnail
Figure 5. Tridimensional surface (a) and corresponding levelplot (b) of for , linearly interpolated on the real non integer values.

https://doi.org/10.1371/journal.pone.0059468.g005

Graphical examples of the behaviour stated in Eq. 4 are provided in Fig. 6, where a few curves of are plotted when one of the two parameters is kept constant (and equal to 10, 50, 100) and the other ranges between 0 and 100, together with the curve corresponding to for . When one of the two parameter is equal to a constant , the smaller is , the faster converges to the limits in Eq. 4.

thumbnail
Figure 6. Plot of curves of for different configurations .

In blue, we show three curves of for constant ( solid line, dashed line and dotted line) and ranging from 0 to 100. All three curves approach the asymptotic value 0 for increasing , more rapidly for smaller values of . In black, we show the symmetric cases obtained keeping constant ( solid line, dashed line and dotted line) and letting range from 0 to 100. Again, all three curves approach the asymptotic value 1 for increasing , more rapidly for smaller values of . The sawtooth shape of the curve for is due to the effect of the case, which induces abrupt differences in for consecutive values of (changing from even to odd). Finally, the dotted-dashed red line shows the curve of for ranging between 0 and 100: in this case, the curve gets very close to its asymptotic value 0.5 even with small values of ; for instance, and .

https://doi.org/10.1371/journal.pone.0059468.g006

Apart from its intrinsic theoretical relevance, the non recursive closed formula is essential for numerically compute . In fact, the computational cost is notably different by using either the recursive formula Eq. 3 or its closed form counterpart Eq. 2: namely, the explicit formula is much faster, as shown by the values reported in Table 1 and the curves plotted in Fig. 7. For the recursive formula the computing time shows an exponentially growing trends for increasing values of and , while for the non recursive formula the computing time is very small and minimally growing for and ranging between 0 and 100. Actually, the average time over 10 values using a Python implementation of the non recursive formula on a 24 core Intel Xeon E5649 CPU 2.53GHz Linux workstation with 47 GB RAM is 11 milliseconds for and 60 milliseconds for , with very limited standard deviation. On the same hardware, a Python implementation of the recursive formula took about 12 milliseconds for , 2.4 seconds for , 6 minutes for and more than 9 hours for , which was the largest tested value.

thumbnail
Figure 7. Plot of the computing times (in scale) needed to compute for different values of as listed in Table 1.

Error bars range between minimum and maximum, while lines connect mean values; all values refer to 10 replicates. Solid line represents computing times obtained by using the recursive formula Eq. 1, while dotted line corresponds to the values produced by using the closed formula Eq. 2.

https://doi.org/10.1371/journal.pone.0059468.g007

thumbnail
Table 1. Computing times (in seconds) required to compute by the recursive formula in Eq. 1 and the equivalent closed formula in Eq. 2, for different values of the number of infected (I) and susceptible (S).

https://doi.org/10.1371/journal.pone.0059468.t001

Proof of Equivalence in the Case

In this Section we show the kind of arguments involved in proving the equivalence between Eq. 1 and Eq. 2 by outlining the main steps of the proof in a simple case, i.e., when there as many infected as clean mobiles, and their numnber is even. Clearly, the general case is computationally far more complex, but it used the same ideas.

Proving the equivalence between the recursive and the combinatorial formula requires substituting the explicit expression for of Eq. 2 in its three occurrences in Eq. 1. We are assuming , thus in this case the identity we need to prove reads as follows:or, equivalently:

(5)The expression for becomes:where the upper bound is since the right-hand member vanishes for and the product symbols were eliminated by using the factorial and double factorial notations:

.

Analogously, the expansions for and become respectively:

Then the left-hand member of Eq. 5 reads as follows:which, collecting common factors, reduces to:

Now, expanding the double factorial by the identity:and carrying the terms not involving outside the summation symbol, the above quantity becomes:

(6)Now, applying the following identityto Eq. 6 with , we obtain that

as claimed.

Acknowledgments

The authors thank two anonymous referees for their precious suggestions and notes, which helped in greatly improving the paper.

Author Contributions

Conceived and designed the experiments: SM GJ. Performed the experiments: SM GJ. Analyzed the data: SM GJ. Contributed reagents/materials/analysis tools: SM GJ. Wrote the paper: SM GJ.

References

  1. 1. Wang P, González M, Hidalgo C, Barabási AL (2009) Understanding the spreading patterns of mobile phones viruses. Science 324: 1071–1076.
  2. 2. Nguyen HN, Shinoda Y (2009) A novel analytical framework to model malware diffusion in heterogeneous wireless networks. In: Proceedings of IEEE WoWMoM 09. IEEE, 1–10.
  3. 3. Zanero S (2009) Wireless Malware Propagation: A Reality Check. IEEE Security & Privacy 7: 70–74.
  4. 4. Guanhua Y, Eidenbenz S (2009) Modeling Propagation Dynamics of Bluetooth Worms (Extended Version). IEEE Transactions on Mobile Computing 8: 353–368.
  5. 5. Shuai F, Chang-Guang W, Li-Jing B, Qing-Yang H, Jian-Feng M (2009) Modeling and Analysis Malware Spread in Short-Range Wireless Networks. In: Proceedings of WiCom 09. IEEE, 1–4.
  6. 6. Khouzani M (2011) Optimal control of mobile malware epidemics. Ph.D. thesis, University of Pennsylvania.
  7. 7. Ghallali M, El Ouadghiri D, Essaaidi M, Boulmalf M (2011) Mobile phones security: the spread of malware via MMS and Bluetooth, prevention methods. In: Proceedings of MoMM 11. ACM, 256–259.
  8. 8. La Polla M, Martinelli F, Sgandurra D (2012) A Survey on Security for Mobile Devices. IEEE Communications Surveys & Tutorials PP: 1–26.
  9. 9. Ghallali M, El Ouahidi B (2012) Designing a new framework in order to limit the spread of malware in mobile phones. Int J EnCoTe 1: 1–8.