A graph-based evidence synthesis approach to detecting outbreak clusters: An application to dog rabies

doi:10.1371/journal.pcbi.1006554

Fig 1.

Schematic illustration of our graph approach for combining multiple data streams to identify outbreak clusters.

In this example, two data streams are considered: the spatial locations of the cases (A) and a phylogeny of pathogens from these cases (D). The three ‘actual’ outbreak clusters are identified in red, blue and green, using different shadings to identify individual cases. Each data source defines a fully connected graph where nodes represent cases and edges are weighted by the spatial (B) and genetic distances (E). Thicker edges represent smaller weights (distances) between cases. Each graph is then pruned separately, removing edges whose weight exceeds a given cutoff (C, F). The intersection of these graphs defines a new graph which retains only edges present in every pruned graph (G). The resulting clusters of cases indicate likely outbreak clusters.
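
The pruning-and-intersection procedure in Fig 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each data stream arrives as a symmetric pairwise distance matrix. Edges whose distance exceeds that stream's cutoff are removed. Outbreak clusters are taken to be the connected components of the intersected graph, found here with a simple union-find rather than a graph library. The function names (`pruned_edges`, `clusters_from_edges`, `combine_streams`) are hypothetical.

```python
from itertools import combinations


def pruned_edges(dist, cutoff):
    """Edges (i, j), i < j, kept after pruning: distance must not exceed the cutoff."""
    n = len(dist)
    return {(i, j) for i, j in combinations(range(n), 2) if dist[i][j] <= cutoff}


def clusters_from_edges(n, edges):
    """Label each case by its connected component (union-find with path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def combine_streams(distances, cutoffs):
    """Prune each stream's graph separately, intersect edge sets, return clusters."""
    n = len(distances[0])
    kept = None
    for dist, cutoff in zip(distances, cutoffs):
        edges = pruned_edges(dist, cutoff)
        kept = edges if kept is None else kept & edges
    return clusters_from_edges(n, kept)


# Toy example with four cases and two hypothetical data streams.
spatial = [[0, 1, 5, 6], [1, 0, 5, 6], [5, 5, 0, 1], [6, 6, 1, 0]]
genetic = [[0, 0, 3, 3], [0, 0, 3, 3], [3, 3, 0, 4], [3, 3, 4, 0]]
print(combine_streams([spatial], [2]))            # [0, 0, 1, 1]
print(combine_streams([spatial, genetic], [2, 2]))  # [0, 0, 1, 2]
```

In the toy example, cases 2 and 3 are close in space but far apart genetically, so the intersection separates them. This mirrors how adding a data stream can only split clusters, never merge them.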

Fig 2.

Visualization of the clusters of cases of rabies.

The top left panel shows the incidence of reported cases of rabies over time, by date of report; cases identified as belonging to the same outbreak cluster (using all distances) are shown in the same colour (grey indicates singletons). The top right panel shows the geographic locations of the reported cases, using the same colour coding as the incidence panel. The bottom panel shows the unrooted phylogeny obtained by Neighbour-Joining on Hamming distances (i.e. the number of differing nucleotides) between sampled sequences; the full tree, showing two distinct strains more than 700 nucleotides apart, is plotted on the left. Details of the two clades are provided in insets A) and B). A reporting rate of 20% was assumed, and pruning cutoff distances corresponding to the 95% quantiles of the input distance distributions were used (see S1 Text for sensitivity analyses on these assumptions).
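
The genetic distance used here is the Hamming distance, the count of positions at which two sequences differ. A minimal sketch follows. It assumes the sequences are already aligned to equal length and treats every mismatched character, including gaps or ambiguity codes, as one difference, which is a simplifying assumption. The function names are hypothetical, and the Neighbour-Joining step itself is not shown.

```python
def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def hamming_matrix(seqs):
    """Symmetric matrix of pairwise Hamming distances between sequences."""
    return [[hamming(a, b) for b in seqs] for a in seqs]


print(hamming_matrix(["AAAA", "AAAT", "TTTT"]))  # [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
```

A matrix like this is the genetic input that is pruned in Fig 1 and is also the input from which a Neighbour-Joining tree can be built.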

Fig 3.

Distribution of pairwise temporal (top), spatial (middle) and genetic (bottom) distances for rabies in Bangui. The temporal distance is defined as the time between reporting of the cases. The spatial distance is defined as the Euclidean distance between the geographic locations of cases. The genetic distance is defined as the Hamming distance between the sequenced isolates. The grey histograms show the observed pairwise distances between any two cases reported in Bangui. The solid black lines show the input distribution of distances between a case and its closest observed ancestor, given an assumed reporting rate of 20% (see S1 Text for sensitivity analyses on this assumption). Distributions have been rescaled to fit on the same graph as the histograms. The red vertical lines show the cutoffs corresponding to the 95% quantiles of these distributions. For a given data stream and a given choice of cutoff, pairs of cases with an observed distance above the cutoff are considered not connected, and the corresponding graph edges are removed at the pruning step (see Fig 1).
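
One way to obtain a distance distribution to the closest observed ancestor, and its 95% quantile cutoff, is sketched below under explicit assumptions; it is not presented as the authors' exact procedure. Assume each case is reported independently with probability `pi`. Then the closest observed ancestor lies k generations back with geometric probability pi(1 - pi)^(k - 1). Its distance is the k-fold convolution of a single-generation distance distribution `f`, for example a discretised serial interval for the temporal stream. The mixture is truncated at a hypothetical `max_gen` and renormalised. This sketch only covers one-dimensional, additive distances. It does not cover the spatial stream, where displacements add as vectors rather than as distances. The function names are hypothetical.

```python
import numpy as np


def closest_ancestor_pmf(f, pi, max_gen=50):
    """Mixture over k >= 1 generations of the k-fold convolution of f,
    weighted by the geometric probability pi * (1 - pi) ** (k - 1).
    f is a pmf on the integer grid 0, 1, 2, ...; the result is truncated
    at max_gen generations and renormalised."""
    f = np.asarray(f, dtype=float)
    mix = np.zeros(1)
    conv = np.array([1.0])  # 0-fold convolution: point mass at distance 0
    for k in range(1, max_gen + 1):
        conv = np.convolve(conv, f)  # distance after k generations
        weight = pi * (1.0 - pi) ** (k - 1)
        if conv.size > mix.size:
            mix = np.pad(mix, (0, conv.size - mix.size))
        mix[: conv.size] += weight * conv
    return mix / mix.sum()


def quantile_cutoff(pmf, q=0.95):
    """Smallest grid value x whose cumulative probability reaches q."""
    cdf = np.cumsum(pmf)
    return int(np.searchsorted(cdf, q))


f = [0.0, 0.5, 0.5]  # hypothetical single-generation distance pmf
print(quantile_cutoff(closest_ancestor_pmf(f, 1.0)))  # 2
print(quantile_cutoff(closest_ancestor_pmf(f, 0.2)))  # larger: unobserved intermediate cases
```

Lowering the assumed reporting rate shifts mass to longer chains of unobserved intermediate cases. This widens the distribution and raises the cutoff, which is why the cutoff depends on the reporting-rate assumption.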

Fig 4.

Pruned (A-C) and final graph (D) used to define clusters of cases in the rabies outbreak, obtained using A) only temporal distances, B) only spatial distances, C) only genetic distances and D) all three combined. Nodes represent cases, and edges potential epidemiological links, according to the corresponding data. The inner colours of the nodes indicate the final clusters obtained by combining all data streams (D), whilst the outer colours correspond to the clusters obtained using one data stream at a time. Grey indicates singletons. A reporting rate of 20% was assumed, and pruning cutoff distances corresponding to the 95% quantiles of the input distance distributions were used (see S1 Text for sensitivity analyses on these assumptions). In each graph, the transparency of the vertices was adjusted according to the number of vertices in the graph, with more transparency in graphs with more vertices, to improve readability.

Table 1.

Estimates of the reproduction number (R) and rate of importation of rabies into the canine population (total and unobserved).

The rate of importation was defined as the estimated number of outbreaks per unit of time over the whole study period. The rate of unobserved importation was defined as the estimated number of unobserved outbreaks per unit of time over the whole study period. A reporting rate of 20% was assumed, and pruning cutoff distances corresponding to the 95% quantiles of the input distance distributions were used (see S1 Text for sensitivity analyses on these assumptions).
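
Written as formulas, these definitions become the expressions below. The symbols are introduced here for illustration and are not the paper's notation. T is the length of the study period, \hat{N} is the estimated total number of outbreaks, and \hat{N}_{u} is the estimated number of those outbreaks that went unobserved. The last ratio follows directly from the first two and gives the estimated fraction of introductions that were missed entirely.

```latex
\hat{\lambda} = \frac{\hat{N}}{T}, \qquad
\hat{\lambda}_{u} = \frac{\hat{N}_{u}}{T}, \qquad
\frac{\hat{\lambda}_{u}}{\hat{\lambda}} = \frac{\hat{N}_{u}}{\hat{N}}
```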

Fig 5.

Summary of the simulation study.

A, B: For the baseline simulation (mimicking rabies transmission in Bangui), performance of the method under different reconstruction scenarios, which vary in the cutoff used at the pruning step and in the assumed reporting rate. C, D: Performance of the method using the control reconstruction scenario (i.e. assuming the transmission and evolution parameters, as well as the reporting rate, are known), applied to different simulation scenarios that vary in reporting rate and in the diversity of the pathogen in the imported cases. See Materials and Methods and S1 Text for definitions of all the simulation and reconstruction scenarios. Panels A and C show, across these scenarios, the ability of the method to correctly identify outbreak clusters, as measured by the true positive rate (TPR, the proportion of pairs of cases belonging to the same transmission tree that are inferred to be in the same outbreak cluster), the true negative rate (TNR, the proportion of pairs of cases not belonging to the same transmission tree that are assigned to different outbreak clusters), and the mean of TPR and TNR. Panels B and D show, across all scenarios, the distribution of the relative error in the estimated reproduction number (R) and importation rate.
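
The pairwise performance measures defined above can be computed directly from two labellings of the same cases. A minimal sketch follows. It assumes every unordered pair of cases is counted once. It also assumes the relative error is the signed quantity (estimate - truth) / truth, which is not stated in the caption. The function names are hypothetical. A rate is returned as NaN when its denominator is zero.

```python
from itertools import combinations


def pairwise_rates(true_tree, inferred_cluster):
    """TPR, TNR and their mean over all unordered pairs of cases.
    true_tree[i]: simulated transmission tree of case i;
    inferred_cluster[i]: inferred outbreak cluster of case i."""
    tp = fn = tn = fp = 0
    for i, j in combinations(range(len(true_tree)), 2):
        same_tree = true_tree[i] == true_tree[j]
        same_cluster = inferred_cluster[i] == inferred_cluster[j]
        if same_tree:
            if same_cluster:
                tp += 1
            else:
                fn += 1
        elif same_cluster:
            fp += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    tnr = tn / (tn + fp) if tn + fp else float("nan")
    return tpr, tnr, (tpr + tnr) / 2


def relative_error(estimate, truth):
    """Signed relative error of an estimate, e.g. of R or the importation rate."""
    return (estimate - truth) / truth


print(pairwise_rates([0, 0, 0, 1, 1], [0, 0, 1, 1, 1]))
```

In the example, case 2 is wrongly assigned to the second cluster. This splits one true tree, so TPR falls to 0.5. It also merges case 2 with cases from another tree, so TNR falls to 4/6. Each kind of error is penalised by a different measure, which is why both are reported.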
