Ensemble inference of unobserved infections in networks using partial observations

doi:10.1371/journal.pcbi.1011355

Fig 1.

An illustration of the ensemble inference algorithm.

(A). Model state at time t and observations after time t. Two infected individuals are observed after time t. The red color represents the probability of infection. We use an ensemble of model states to represent the distribution of transmission rates and the distribution of infection probabilities. (B). We use Bayes' rule and master equations to propagate information backward in time and estimate the infection probability of observed individuals at time t: $P(x_i(t)=I \mid x_i(T)=O) \propto P(x_i(T)=O \mid x_i(t)=I)\,P(x_i(t)=I)$, where node i is observed to be in state O at time T. (C). We use cross-ensemble covariability to adjust the infection probability of individuals connected to observed nodes. For each ensemble member, denote the adjustment to the infection probability $P_i$ of observed node i as $\Delta P_i$. The adjustment to node j, a neighbor of node i, is $\Delta P_j = r_{ij}\,\Delta P_i$, where $r_{ij} = \sigma_{ij}/\sigma_i^2$. Here $\sigma_{ij}$ is the covariance between $P_i$ and $P_j$ and $\sigma_i^2$ is the variance of $P_i$, both computed across the ensemble members. Red arrows indicate that information on infection probability is propagated to the neighbors of observed nodes. (D). We integrate the master equations forward to time t+1. Red arrows show that information on infection probability is further propagated forward to neighbors. After these three procedures, information from the two observations reaches six other individuals in the network.
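Steps (B) and (C) can be illustrated with a small, stdlib-only Python sketch. It is not the authors' implementation: the likelihood values below are made-up stand-ins for quantities the method obtains by integrating master equations, and the helper names `bayes_posterior` and `covariance_adjust` are hypothetical. The neighbor update follows the regression form described in (C), $\Delta P_j = (\sigma_{ij}/\sigma_i^2)\,\Delta P_i$, applied member by member.

```python
import random
from statistics import mean


def bayes_posterior(prior, lik_if_infected, lik_if_not):
    """Posterior P(x_i(t)=I | observation) from Bayes' rule for one member.

    prior           -- P(x_i(t)=I) before conditioning on the observation
    lik_if_infected -- P(x_i(T)=O | x_i(t)=I)   (hypothetical input)
    lik_if_not      -- P(x_i(T)=O | x_i(t)!=I)  (hypothetical input)
    """
    num = lik_if_infected * prior
    den = num + lik_if_not * (1.0 - prior)
    return num / den if den > 0 else prior


def covariance_adjust(p_i, p_j, delta_i):
    """Pass node i's adjustment to neighbor j using cross-ensemble statistics:
    delta_j = (cov(P_i, P_j) / var(P_i)) * delta_i, for every member."""
    m = len(p_i)
    mi, mj = mean(p_i), mean(p_j)
    cov = sum((a - mi) * (b - mj) for a, b in zip(p_i, p_j)) / (m - 1)
    var = sum((a - mi) ** 2 for a in p_i) / (m - 1)
    if var == 0:
        return list(p_j)  # no spread in P_i, so nothing to regress on
    r = cov / var
    # Clipping to [0, 1] keeps probabilities valid (our assumption).
    return [min(1.0, max(0.0, b + r * d)) for b, d in zip(p_j, delta_i)]


# Toy ensemble of 50 members in which P_j co-varies with P_i.
rng = random.Random(0)
p_i_prior = [rng.uniform(0.1, 0.5) for _ in range(50)]
p_j_prior = [0.5 * p + rng.uniform(0.0, 0.1) for p in p_i_prior]

# Step (B): condition observed node i on its observation.
p_i_post = [bayes_posterior(p, 0.9, 0.2) for p in p_i_prior]
delta_i = [post - prior for post, prior in zip(p_i_post, p_i_prior)]

# Step (C): propagate the adjustment to neighbor j.
p_j_post = covariance_adjust(p_i_prior, p_j_prior, delta_i)
print(f"mean P_j: {mean(p_j_prior):.3f} -> {mean(p_j_post):.3f}")
```

Because the observation raises $P_i$ in every member and $P_j$ co-varies positively with $P_i$, the neighbor's infection probability rises as well, which is the effect the red arrows in (C) depict.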

Fig 2.

Performance of different methods in a tree graph.

The tree has 781 nodes and 780 edges. Five methods are compared: ensemble inference (ENS-I), modified dynamic message-passing with a fixed transmission rate (DMP1), modified dynamic message-passing with uniformly distributed transmission rates (DMP2), number of connections (Degree), and contact with observed infections (Contact). ROC curves for observation rates of 18.44% (A), 32.64% (B), and 55.95% (C) are shown, and the corresponding AUC values are compared in (D, E, F). The numbers of infections identified among high-risk individuals selected by each method are shown in (G, H, I).
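The ROC and AUC panels can be reproduced from any per-node risk score. The sketch below is a minimal stdlib version, assuming that positives are unobserved nodes that are truly infected and negatives are unobserved uninfected nodes (the caption does not spell out the evaluation set). `roc_points` and `auc` are hypothetical helper names; `auc` uses the pairwise definition (the probability that a random positive outranks a random negative, with ties counted as one half), which matches the trapezoidal area under the ROC curve.

```python
def roc_points(scores, labels):
    """ROC curve as (false positive rate, true positive rate) points,
    sweeping the threshold from the highest score down; tied scores
    are added together so the curve takes one diagonal step."""
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    pairs = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(scores, labels):
    """P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Toy example: 1 = hidden infection, 0 = uninfected.
scores = [0.9, 0.8, 0.7, 0.6, 0.55]
labels = [1, 1, 0, 1, 0]
print(auc(scores, labels), trapezoid_area(roc_points(scores, labels)))
```

Here two of the three positives outrank both negatives and the third outranks one, so both quantities equal 5/6.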

Fig 3.

Performance of different methods in homogeneous networks.

The ER network has 1,000 nodes with an average degree of 2.6. Five methods are compared: ensemble inference (ENS-I), modified dynamic message-passing with a fixed transmission rate (DMP1), modified dynamic message-passing with uniformly distributed transmission rates (DMP2), number of connections (Degree), and contact with observed infections (Contact). ROC curves for observation rates of 15.98% (A), 32.74% (B), and 49.72% (C) are shown, and the corresponding AUC values are compared in (D, E, F). The numbers of infections identified among high-risk individuals selected by each method are shown in (G, H, I).
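The caption gives only the size and mean degree of the ER network. One standard construction, assumed here rather than taken from the paper, is the $G(N, p)$ model with $p = \langle k \rangle/(N-1)$, which yields an expected $\langle k \rangle N/2 = 2.6 \times 1000/2 = 1{,}300$ edges. The function name `erdos_renyi` is hypothetical.

```python
import random


def erdos_renyi(n, mean_degree, seed=0):
    """Sample G(n, p) with p = mean_degree / (n - 1), so the expected
    average degree equals mean_degree. Returns an adjacency dict."""
    p = mean_degree / (n - 1)
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):      # each unordered pair is tried once
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj


adj = erdos_renyi(1000, 2.6)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
avg_degree = 2 * n_edges / len(adj)
print(n_edges, round(avg_degree, 3))
```

The realized edge count fluctuates around 1,300 from seed to seed; the same helper with `mean_degree=3` matches the ER setting described for Fig 7.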

Fig 4.

Performance of different methods in scale-free networks.

The scale-free network has 2,044 nodes with a power-law exponent of 2.5. Five methods are compared: ensemble inference (ENS-I), modified dynamic message-passing with a fixed transmission rate (DMP1), modified dynamic message-passing with uniformly distributed transmission rates (DMP2), number of connections (Degree), and contact with observed infections (Contact). ROC curves for observation rates of 14.48% (A), 29.01% (B), and 40.31% (C) are shown, and the corresponding AUC values are compared in (D, E, F). The numbers of infections identified among high-risk individuals selected by each method are shown in (G, H, I).

Fig 5.

Performance of different methods in real-world networks.

ROC curves of five methods (ensemble inference (ENS-I), modified dynamic message-passing with a fixed transmission rate (DMP1), modified dynamic message-passing with uniformly distributed transmission rates (DMP2), number of connections (Degree), and contact with observed infections (Contact)) in nine real-world networks. Observation rates range from 12.64% to 17.61% across these networks. Details on the networks, model parameter settings, and observation rates are provided in S1 Text.

Fig 6.

Inference of high-risk individuals in real-world networks.

The numbers of infections identified among high-risk individuals selected by each method are shown for nine real-world networks. Five methods are compared: ensemble inference (ENS-I), modified dynamic message-passing with a fixed transmission rate (DMP1), modified dynamic message-passing with uniformly distributed transmission rates (DMP2), number of connections (Degree), and contact with observed infections (Contact).
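The two simple baselines and the "infections identified among high-risk individuals" count can be sketched as follows. The Degree score is the number of neighbors, as the caption states; the Contact score is assumed here to be the number of observed infected neighbors, since the caption only says "contact with observed infections". Ranking ties are broken by node id, another assumption. All function names are hypothetical.

```python
def degree_scores(adj):
    """Degree baseline: score = number of neighbors."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def contact_scores(adj, observed_infected):
    """Contact baseline (assumed form): number of observed infected neighbors."""
    return {v: sum(1 for u in nbrs if u in observed_infected)
            for v, nbrs in adj.items()}


def hits_in_top_k(scores, candidates, infected, k):
    """Rank candidates by descending score (ties broken by node id) and
    count how many of the k highest-risk nodes are truly infected."""
    ranked = sorted(candidates, key=lambda v: (-scores[v], v))
    return sum(1 for v in ranked[:k] if v in infected)


# Toy graph with edges 0-1, 0-2, 0-3, 1-4, 3-5.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1], 5: [3]}
observed_infected = {0}
hidden_infected = {2, 5}          # ground truth, unknown to the methods
candidates = [v for v in adj if v not in observed_infected]

for name, scores in [("Degree", degree_scores(adj)),
                     ("Contact", contact_scores(adj, observed_infected))]:
    print(name, hits_in_top_k(scores, candidates, hidden_infected, k=2))
```

In this toy graph, Degree selects nodes 1 and 3 and finds no hidden infection, while Contact selects nodes 1 and 2 and finds one.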

Fig 7.

Computational complexity of the ensemble inference algorithm.

(A). Computation time for an increasing number of nodes with a fixed number of observations ($N_o = 200$). Experiments were run on ER random networks with an average degree of 3. Distributions of running time were obtained from 100 runs. Boxes show the median and interquartile range. Whiskers extend up to 1.5 times the interquartile range beyond the bottom or top of the boxes. The inset shows the fit of computation time against the number of nodes (both log-transformed). (B). Computation time for an increasing number of observations, with the number of nodes fixed at 3,000 and the average degree fixed at 3. (C). Computation time for an increasing average degree, with the number of nodes fixed at 3,000 and the number of observations fixed at 200. (D). Computation time for an increasing number of nodes with a fixed percentage of observations (15%). (E). Computation time for an increasing number of ensemble members in an ER random network with 3,000 nodes, an average degree of 3, and 200 observations.
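The inset in (A) fits computation time against network size on log-transformed axes; the slope of that line estimates the exponent $a$ in $t \propto N^a$. A minimal ordinary least-squares sketch is shown below. The timings are synthetic numbers generated to scale exactly as $N^{1.2}$ for illustration; they are not the paper's measurements, and `loglog_fit` is a hypothetical name.

```python
import math


def loglog_fit(sizes, times):
    """Least-squares fit of log10(time) = slope * log10(size) + intercept.

    The slope estimates the exponent a in time ~ size**a."""
    xs = [math.log10(n) for n in sizes]
    ys = [math.log10(t) for t in times]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic timings scaling exactly as N**1.2 (illustrative, not measured).
sizes = [1000, 2000, 4000, 8000, 16000]
times = [2e-4 * n ** 1.2 for n in sizes]
slope, intercept = loglog_fit(sizes, times)
print(f"fitted exponent: {slope:.3f}")
```

With real timings, one would typically fit the per-size medians (or all 100 runs) rather than noise-free values, and a slope near 1 would indicate roughly linear scaling in the number of nodes.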
