The authors have declared that no competing interests exist.
Conceived and designed the experiments: KF RE MC. Performed the experiments: KF RE. Analyzed the data: KF RE MC. Wrote the paper: KF RE MC.
Traditional contact tracing relies on knowledge of the interpersonal network of physical interactions, where contagious outbreaks propagate. However, due to privacy constraints and noisy data assimilation, this network is generally difficult to reconstruct accurately. Communication traces obtained by mobile phones are known to be good proxies for the physical interaction network, and they may provide a valuable tool for contact tracing. Motivated by this assumption, we propose a model for contact tracing, where an infection is spreading in the physical interpersonal network, which can never be fully recovered; and contact tracing is occurring in a communication network which acts as a proxy for the first. We apply this dual model to a dataset covering 72 students over a 9 month period, for which both the physical interactions as well as the mobile communication traces are known. Our results suggest that a wide range of contact tracing strategies may significantly reduce the final size of the epidemic, by mainly affecting its peak of incidence. However, we find that for low overlap between the face-to-face and communication interaction network, contact tracing is only efficient at the beginning of the outbreak, due to rapidly increasing costs as the epidemic evolves. Overall, contact tracing via mobile phone communication traces may be a viable option to arrest contagious outbreaks.
There is great potential to deepen our understanding of disease dynamics through the analysis of digital traces of individual and collective behaviour
We already have some examples in the digital epidemiology direction which use large-scale digital traces for simulation. For instance, a large-scale sociotechnological network based on Facebook data was used to study the role of community structure in disease dynamics
While these previously investigated sources of digital sensing (Facebook and CPIs from wearable badges) are advantageous in that they capture large scale interactions in a continuous manner giving a more complete estimate of human interactions in reality, they also present some limitations. Online social networks represent online social behaviours which differ from physical proximity interactions whereby disease transmission occurs and may fail to capture the fine-grained, face-to-face interaction dynamics relevant for disease transmission
In this regard, mobile phones provide a promising resource as they are ubiquitously carried by the population, irrespective of socio-economic status, and provide a much larger-scale, data-driven opportunity for epidemiology. Further, mobile phones are carried by people when they travel overseas, potentially serving as a global physical proximity sensor. Their pervasiveness in developing countries, where pandemic prevention is most critical, makes them a viable option
Our present effort focuses on exploiting these phone communication and interaction traces for epidemic simulation and contact tracing
We develop a model where the infection takes place over the close-proximity physical network (which can never be fully recovered in reality), and assume contact tracing occurs on a different network, in this case one inferred from communication (phone calls, SMS). We explore the proposed contact tracing model in detail, focusing particularly on tracing efforts on noisy networks that represent a perturbed subset of the ideal network. Finally, we simulate our proposed model over the real mobile phone interaction data dynamics, demonstrating that mobile phone interactions are a promising tool for large-scale epidemic simulations, and that mobile phone communication logs can be used as a concrete source for contact tracing, reducing the effects of an epidemic. Just as optimizing immunization strategies is of great interest if only incomplete immunization is possible
We consider a population of
population size.
a small interval of time.
constant determining infection rate.
the ideal network in which the epidemic is actually spreading.
the number of infected neighbours of a given node.
constant determining random tracing rate.
constant determining contact tracing rate.
the dual network which is used for contact tracing.
the number of traced neighbours of a given node.
tracing-policy constant controlling the fading time for contact tracing.
Initially, the whole population is susceptible to infection. One node is subsequently randomly infected, which then starts to infect its neighbours and may initiate an outbreak. The probability that a susceptible node becomes infected is given by
We assume there is no spontaneous recovery, and individuals become traced for a certain period of time after which recovery takes place and the individual becomes removed. There are two types of tracing efforts to identify infected individuals, random checking and contact tracing. Random checking is done by choosing an arbitrary node with probability
Traced individuals are transformed into the removed state, or recovered state, and are unable to become infected again. A node can recover from the traced state with a probability given by
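The state transitions described above (susceptible, infected, traced, removed) can be illustrated with a minimal discrete-time sketch. The exact probability expressions of the model are not reproduced in this text, so the sketch assumes the simplest form, in which each probability is a rate multiplied by the small time interval. The names `beta`, `c_random`, `c_contact`, `c_recover` and `dt` are hypothetical labels chosen for illustration, not the paper's notation, and treating `c_recover` as the constant governing how long a node stays traced is likewise an assumption.

```python
import random

S, I, T, R = "S", "I", "T", "R"  # susceptible, infected, traced, removed


def step(state, ideal, dual, beta, c_random, c_contact, c_recover, dt, rng):
    """Advance all nodes by one time interval dt and return the new state.

    state: dict node -> one of S, I, T, R
    ideal: dict node -> neighbours in the physical (infection) network
    dual:  dict node -> neighbours in the communication (tracing) network
    All probabilities are ASSUMED to be rate * dt, capped at 1.
    """
    new_state = dict(state)
    for node, s in state.items():
        if s == S:
            # Infection spreads over the ideal network only.
            k_inf = sum(1 for nb in ideal[node] if state[nb] == I)
            if rng.random() < min(1.0, beta * k_inf * dt):
                new_state[node] = I
        elif s == I:
            # Tracing: random checking plus contact tracing over the dual network.
            k_traced = sum(1 for nb in dual[node] if state[nb] == T)
            p_trace = min(1.0, (c_random + c_contact * k_traced) * dt)
            if rng.random() < p_trace:
                new_state[node] = T
        elif s == T:
            # Traced nodes eventually become removed and cannot be reinfected.
            if rng.random() < min(1.0, c_recover * dt):
                new_state[node] = R
    return new_state


if __name__ == "__main__":
    rng = random.Random(0)
    ideal = {0: [1], 1: [0, 2], 2: [1]}
    dual = {0: [1], 1: [0], 2: []}
    state = {0: I, 1: S, 2: S}
    for _ in range(50):
        state = step(state, ideal, dual, 0.5, 0.05, 0.5, 0.2, 0.1, rng)
    print(state)
```

Note that the update is synchronous: every transition in a step is decided from the state at the start of that step, so the order in which nodes are visited does not matter.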
The contact tracing model can be summarized by the following equations.
We first study a dual network topology which accounts for differing edge formations between the infection and tracing networks. Given the contact tracing model defined by
We define the network of physical interactions as
Next, we propose a formal method for obtaining
We define below the process by which we generate the dual network from the ideal network. By removing a portion of the actual ties we simulate a scenario in which the communication traces are only capturing a subset of the actual links. By adding new ties, we simulate the case where communication traces provide dyadic interactions that do not happen in the real world, only in the digital realm. One important measure for our study is the overlap between the two networks, which corresponds to the proportion of links that are present in both networks. The dual network topology is generated as follows:
First generate the physical proximity network,
Generate the proximity network of
Generate N*K unique links, where
Second, generate the dual network,
Remove
Add
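The generation procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: links of the ideal network are drawn uniformly at random, the numbers of removed and added links are passed in as the hypothetical parameters `n_remove` and `n_add`, and the overlap is normalised by the number of links in the ideal network, which is only one possible reading of "proportion of links that are present in both networks".

```python
import itertools
import random


def generate_ideal(n_nodes, k, rng):
    """Draw n_nodes * k unique undirected links uniformly at random (assumed scheme)."""
    all_pairs = list(itertools.combinations(range(n_nodes), 2))
    return set(rng.sample(all_pairs, n_nodes * k))


def generate_dual(ideal, n_nodes, n_remove, n_add, rng):
    """Build the dual network by removing some ideal links and adding new ones."""
    # Keep a random subset of the ideal links (communication misses some real ties).
    kept = set(rng.sample(sorted(ideal), len(ideal) - n_remove))
    # Add links absent from the ideal network (ties that exist only digitally).
    candidates = [p for p in itertools.combinations(range(n_nodes), 2)
                  if p not in ideal]
    added = set(rng.sample(candidates, n_add))
    return kept | added


def overlap(ideal, dual):
    """Share of ideal links also present in the dual network (assumed normalisation)."""
    return len(ideal & dual) / len(ideal)


if __name__ == "__main__":
    rng = random.Random(42)
    ideal = generate_ideal(n_nodes=20, k=2, rng=rng)
    dual = generate_dual(ideal, n_nodes=20, n_remove=10, n_add=10, rng=rng)
    print(len(ideal), len(dual), overlap(ideal, dual))
```

Because the added links are drawn only from pairs absent in the ideal network, the overlap under this normalisation depends solely on how many links were removed.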
In the reported experiments we used
Average maximum and total number of infected people for a network overlap
Note that
These equations can be re-derived with the help of
Illustration of the overlap in terms of links between the ideal network and the dual network depending on
We present the dataset that motivates our dual model, and whose parameters, network structure, and dynamics are used in the rest of this paper. The participants in the study represent
We consider interaction data logged by the mobile phones. Bluetooth sensors monitored the physical proximity interaction. Other non-physical interactions were monitored by phone communication logs including phone calls and SMS activity. We only consider phone communication and proximity interaction with other study participants (known devices to the study). The data has been previously studied in the framework of real-life health and obesity diffusion
For each of the mobile phone proximity interaction (Bluetooth) and communication (call and SMS) events sensed, we consider the number of events (regardless of their duration), including missed calls. Users correspond to nodes, and undirected edges to interactions. The edges are weighted by the number of events. By considering the number of events, we can readily combine the two types of phone communication logs (calls and SMS). By considering undirected interactions, the proximity interactions can be easily compared to the communication data, since phone communication is directed but Bluetooth is undirected. The data is therefore symmetrized, and we assume undirected links. The static average daily networks for the phone communication and physical proximity interactions can be seen in
The static networks obtained by the overall average number of daily mobile phone (a) communication (call and sms) and (b) physical proximity interactions.
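The construction of the weighted, undirected networks from event logs can be sketched as below. The input format (a list of `(sender, receiver)` pairs per event) and the user identifiers are hypothetical; the dataset's actual schema is not described in this text. Each event counts once regardless of duration, direction is discarded by sorting the pair, and call and SMS logs are combined by simple concatenation.

```python
from collections import Counter


def build_weighted_network(events):
    """Return undirected edge weights, counting one unit per logged event.

    events: iterable of (user_a, user_b) pairs; direction is ignored.
    """
    weights = Counter()
    for a, b in events:
        if a == b:
            continue  # ignore self-events, which carry no interaction
        # Sorting the pair symmetrizes directed communication events.
        weights[tuple(sorted((a, b)))] += 1
    return weights


if __name__ == "__main__":
    calls = [("u1", "u2"), ("u2", "u1")]   # hypothetical call log
    sms = [("u1", "u2"), ("u3", "u1")]     # hypothetical SMS log
    communication = build_weighted_network(calls + sms)
    print(dict(communication))
```

Dividing each weight by the number of observed days would give an average daily weight, if that is the quantity of interest.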
Next, we consider the overlap between the real physical proximity and communication networks (
(a) Distribution of % overlap between the overall communication and Bluetooth networks on a log-log scale. (b) Monthly variations in the % overlap between the communication and Bluetooth networks averaged over all users.
In
First we simulate the various network configurations to compare the spread of infection characteristics over the full range of the overlap parameter
(a) The maximum number of infected individuals (representing the peak of the epidemic), (b) its time of occurrence, and (c)-(d) the overall number of infected individuals on log and non-log scales, respectively; all plotted as a function of
The infected population plot as a function of time for (a)
However, plots from
We are showing, in
The time-varying nature of the epidemic can be seen in
We observe the changing effects of the time-varying spread over
In
Our results have shown that even with very small overlap between the two networks, contact tracing was still effective in limiting the peak size of the epidemic. With low overlap this behaviour might be surprising. It is actually explained by a simple fact: when using contact tracing, an increase in the number of infected people causes an increase in the tracing effort. This adaptation phenomenon is not present when only random tracing is used. We aim here at quantifying whether it is still worth doing contact tracing with a relatively small network overlap or if increasing random tracing is preferable.
We measure the tracing effort defined as the sum of the effort due to random tracing and the effort due to contact tracing:
What the formula encodes is that the random tracing (with intensity
In
Only the last curve considers the case with complete network overlap (
Comparing the solid green curve (
In general, contact tracing does not require a great effort at the beginning of the outbreak, but rapidly becomes costly as the epidemic evolves. However, it is effective in reducing the size of an epidemic with low network overlap, as is random tracing alone. A promising direction for future work is to vary the random and contact tracing efforts over time to optimize costs as the epidemic evolves. A tracing policy that includes contact tracing makes it possible both to adapt tracing to the number of infected people and to exploit the known information about people’s interactions. Such policies have the potential to reduce the constant effort required by random tracing by applying contact tracing at particular intervals, while containing an epidemic outbreak with minimal cost.
We observed that one benefit of contact tracing over pure random tracing is that it adapts the tracing effort to the number of detected infections and thus has a varying effort (and cost) over time. To further explore the role of contact tracing, we consider a setup where we assume a fixed amount of tracing effort is available. In such a case, we expect and observe that contact tracing with a low overlap is not advantageous.
In the simulation, we allow a fixed tracing budget (400 units). We allocate a fixed part of this budget to random tracing, the rest goes to contact tracing. In practice, we continuously adapt the
Simulations consider a network overlap of
In this section, we consider experiments on the real data. We apply the dual contact tracing model to the full empirical interaction and communication dataset obtained by mobile sensing. While the physical interactions obtained by Bluetooth are not a complete picture of the interaction history, they do represent a large portion of the interactions (subjects were explicitly asked to leave their Bluetooth on all the time). We consider two timescales over which the real interaction data is aggregated, daily and weekly. These two easily interpretable timescales are chosen to account for the time-specific nature of real data in our evaluation. The results referred to as practice (as opposed to theory) are simulated considering only the empirical data (daily and weekly); the real interaction events occurring within the community are used to model the dynamics of the parameters
Only the real physical proximity interactions are used to obtain
Bluetooth physical proximity is used for
First, we evaluate the difference between using the physical proximity data in an ideal network scenario in comparison to the theoretical case by comparing the model outputs on this community of
Considering the ideal network scenario, we run the simulated contact tracing model with
After making a comparison of the theoretical case with the data-driven case, whereby only the physical proximity network is considered in simulating infectious spread, we evaluate the proposed dual network methodology entirely on the real dataset. First we consider the single network case (with
In
We explore a data-driven avenue for contact tracing in epidemic prevention using social interaction data from mobile phones. Data from a medium-sized real community are considered to gain insight into the relationship between physical interactions and mobile phone communication, and into whether the latter can be exploited to perform contact tracing on the former. We explore the effectiveness of such a strategy using data-driven simulations, first with realistic parameters extracted from the social network dataset, and then with the full dual realistic network model of physical and communication interactions. Across multiple realistic scenarios for contact tracing, we find that contact tracing is an effective means for epidemic prevention, even when there is low overlap between the physical and communication networks. When considering tracing effort, we observe that contact tracing is greatly beneficial when the epidemic is starting; however, this effort increases greatly as the epidemic grows. With low overlap between the physical and communication networks, we find that this effect is mainly due to the automatic adaptation of the tracing effort to the number of infected people. We also uncover the relationship between the network overlap and the proportion of effort spent in random tracing versus contact tracing. The study thus gives insight into what proportion of the effort should be spent in contact tracing depending on the estimated network overlap (how much we trust that the communication network represents the interaction network). While contact tracing is effective in reducing the number of infected cases, a dynamic approach considering a time-evolving combination of random and contact tracing is most promising, and optimization of costs as a function of varying random and contact tracing efforts over time will be considered in future work.
We are also able to uncover the nonlinear relationship between overlap (between physical and communication networks) and contact tracing effort. This is important, as different communication technologies, present and future, are likely to have a different link to physical interactions. Quantifying how the overlap interacts with the tracing effort can inform public health policies aiming to exploit digital communication traces for epidemiology. Overall, we find interactions sensed by mobile phones to be a promising tool for epidemic simulation, particularly for future large-scale scenarios, for example city-scale infectious disease transmission. This work demonstrates mobile phone communication history to be a useful data source in disease prevention by obtaining contact information readily for epidemic contact tracing.
We would like to thank Anmol Madan and Alex Pentland for collecting and providing the empirical dataset, as well as Juliette Stehlé for help with the network simulation procedure.