Modeling Disease Vector Occurrence When Detection Is Imperfect II: Drivers of Site-Occupancy by Synanthropic Triatoma brasiliensis in the Brazilian Northeast

Background: Understanding the drivers of habitat selection by insect disease vectors is instrumental to the design and operation of rational control-surveillance systems. One pervasive yet often overlooked drawback of vector studies is that detection failures result in some sites being misclassified as uninfested; naïve infestation indices are therefore biased, and this can confound our view of vector habitat preferences. Here, we present an initial attempt at applying methods that explicitly account for imperfect detection to investigate the ecology of Chagas disease vectors in man-made environments.
Methodology: We combined triplicate-sampling of individual ecotopes (n = 203) and site-occupancy models (SOMs) to test a suite of pre-specified hypotheses about habitat selection by Triatoma brasiliensis. SOM results were compared with those of standard generalized linear models (GLMs) that assume perfect detection even with single bug-searches.
Principal Findings: Triatoma brasiliensis was strongly associated with key hosts (native rodents, goats/sheep and, to a lesser extent, fowl) in peridomestic environments; ecotope structure had, in comparison, small to negligible effects, although wooden ecotopes were slightly preferred. We found evidence of dwelling-level aggregation of infestation foci; when there was one such focus, same-dwelling ecotopes, whether houses or peridomestic structures, were more likely to become infested too. GLMs yielded negatively-biased covariate effect estimates and standard errors; both were, on average, about four times smaller than those derived from SOMs.
Conclusions/Significance: Our results confirm substantial population-level ecological heterogeneity in T. brasiliensis. They also suggest that, at least in some sites, control of this species may benefit from peridomestic rodent control and changes in goat/sheep husbandry practices. Finally, our comparative analyses highlight the importance of accounting for the various sources of uncertainty inherent to vector studies, including imperfect detection. We anticipate that future research on infectious disease ecology will increasingly rely on approaches akin to those described here.

Suppose you are trying to determine the pattern of infestation of a series of discrete ecotopes (houses, corrals, henhouses, etc.) by a disease vector. In any realistic scenario, available methods for detecting the vector will all be imperfect; that is, some observed "absences" will in fact be false-negative results due to detection failures. Therefore, the observed fraction of ecotopes in which the vector was detected (the naïve "infestation index") will be biased downward. Let $n$ be the number of ecotopes sampled and $x$ the number of ecotopes where vectors were detected; then, let $p$ be the probability that a vector is detected in an ecotope during one search, given that the ecotope is infested (or "occupied") by the vector. The probability of detecting the vectors at least once after $s$ searches in one ecotope is $p_s = 1 - (1 - p)^s$, and an estimator of the fraction of ecotopes that are occupied is $\hat{\Psi} = x / (n\,p_s)$ (see ref. [S1]).
One straightforward way of dealing with this pervasive problem is repeated ecotope sampling [S1,S4]. Repeatedly searching an ecotope yields a "detection history" consisting of a series of "0s" (non-detections) and "1s" (detections). For example, the results of searching an ecotope four times could be "0100", meaning that infestation was detected only during the second of four visits. If searches are made during a short time-period, so that we can safely exclude ecotope-level extinction or colonization (i.e., assuming population "closure"), we now know (i) that the ecotope was indeed infested and (ii) that we failed to detect the vector three times (we have three false-negative results).
Consider now a set of ecotopes, each searched four times, and their detection histories. Let $p$ be the probability of observing infestation (represented by "1") in a visit to an infested ecotope; then, that of observing "0" is $1 - p$. The probability of observing the first detection history ("1101") would equal $\Psi$ (the probability that the vector is present, which we know must be true because it was detected) times the observed detection probabilities for each visit: $\Pr(1101) = \Psi \, p \, p \, (1 - p) \, p = \Psi \, p^3 (1 - p)$. The probability of observing the second detection history ("0011") is $\Pr(0011) = \Psi \, (1 - p)^2 p^2$, and so on for the remaining histories. The two "0000" histories, however, require a slightly more complex treatment, because they imply two mutually-exclusive possibilities: either the vectors (i) were present ($\Psi$) but went undetected even after four searches [$(1 - p)^4$] or (ii) were truly absent ($1 - \Psi$); therefore, $\Pr(0000) = \Psi \, (1 - p)^4 + (1 - \Psi)$.
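The rule for turning a detection history into a probability can be sketched in a few lines of Python. This is only an illustration of the reasoning above under the constant-Ψ, constant-p ("null") model; the function name and the values chosen for Ψ and p are hypothetical, not estimates from the study.

```python
def history_probability(history: str, psi: float, p: float) -> float:
    """Probability of one detection history (e.g. "0100") when occupancy (psi)
    and per-search detection probability (p) are constant."""
    detections = history.count("1")
    misses = history.count("0")
    # Ecotope occupied, and this exact sequence of detections and misses observed.
    occupied_term = psi * p ** detections * (1 - p) ** misses
    if detections == 0:
        # All-zero history: occupied but never detected, OR truly unoccupied.
        return occupied_term + (1 - psi)
    return occupied_term


# Hypothetical parameter values, chosen only for illustration.
psi_example, p_example = 0.6, 0.5
for h in ("1101", "0011", "0000"):
    print(h, history_probability(h, psi_example, p_example))
```

Only the all-zero history receives the extra $(1 - \Psi)$ term, which is what allows the model to separate "absent" from "present but missed".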
Given that each ecotope represents an independent observation, the joint probability of the complete dataset (the likelihood) is obtained by multiplying the probabilities for each ecotope: $L(\Psi, p \mid h_1, \dots, h_n) = \prod_{i=1}^{n} \Pr(h_i)$, where $h_i$ is the detection history of the $i$th ecotope. In this simple situation (a "completely null" model without covariates) we have just two parameters to estimate: mean Ψ over ecotopes and mean p over ecotope-searches.
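As a rough illustration of how the two parameters of the null model can be estimated, the sketch below multiplies the per-ecotope probabilities (on the log scale) and searches a grid for the values of Ψ and p that maximize the likelihood. The detection histories are invented for the example, and the grid search is a deliberately simple stand-in for the numerical optimization performed by dedicated software; it is not the procedure used in the study.

```python
import math


def log_likelihood(histories, psi, p):
    """Sum of log-probabilities of independent detection histories."""
    total = 0.0
    for h in histories:
        detections, misses = h.count("1"), h.count("0")
        prob = psi * p ** detections * (1 - p) ** misses
        if detections == 0:
            prob += 1 - psi  # never detected: may also be truly unoccupied
        total += math.log(prob)
    return total


def fit_null_model(histories, steps=100):
    """Grid-search maximum-likelihood estimates of (psi, p)."""
    grid = [(i + 0.5) / steps for i in range(steps)]  # midpoints in (0, 1)
    best = max(
        (log_likelihood(histories, psi, p), psi, p)
        for psi in grid
        for p in grid
    )
    return best[1], best[2]


# Hypothetical dataset: eight ecotopes, four searches each.
example_histories = ["1101", "0011", "0100", "1000",
                     "0000", "0000", "0110", "0000"]
naive_index = sum("1" in h for h in example_histories) / len(example_histories)
psi_hat, p_hat = fit_null_model(example_histories)
print("naive index:", naive_index)
print("psi_hat:", psi_hat, "p_hat:", p_hat)
```

With these invented data the estimated occupancy exceeds the naïve index, reflecting the possibility that some "0000" ecotopes were infested but missed.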
Using the freely available program PRESENCE [S5], we can estimate Ψ and the sensitivity (p) of the vector-searching method, which in our example was ~54% (95% CI, ~35% to 72%) for each individual ecotope-search. We again emphasize that this is an extremely simplified example intended only as a basic illustration of the rationale of the method; interested readers can find an extensive and detailed treatment of this and other site-occupancy models in ref. [S1].
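To give a sense of what a per-search sensitivity of about 54% implies in practice, the sketch below applies the cumulative detection probability $1 - (1 - p)^s$ to the point estimate and confidence limits quoted above. The function name and the range of search numbers shown are illustrative choices of ours.

```python
def cumulative_detection(p: float, searches: int) -> float:
    """Probability of at least one detection in `searches` independent
    searches of an occupied ecotope, each with detection probability p."""
    return 1 - (1 - p) ** searches


# Point estimate and approximate 95% confidence limits quoted in the text.
for label, p in (("lower", 0.35), ("estimate", 0.54), ("upper", 0.72)):
    row = [round(cumulative_detection(p, s), 3) for s in range(1, 5)]
    print(label, row)
```

Under these assumptions, three searches at p = 0.54 would still miss roughly one in ten infested ecotopes, which is why the occupancy estimate has to be corrected rather than read directly from the raw detections.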
Using this approach, both key parameters (Ψ and p) can in addition be modeled as a function of covariates. For instance, we may be interested in finding out whether and to what extent infestation probabilities (Ψ) vary with the characteristics of ecotopes, among distinct localities, or as a consequence of a vector control intervention. Detection probabilities (p) may also be of interest, particularly when the performances of two or more bug-detection strategies (e.g., manual searches vs. vector-sensing devices, or different vector-sensing devices) are being compared [S4,S6]. Because both occupancy and detection probabilities are binomial variables, the natural link function is the logit, $\operatorname{logit}(\theta_i) = \ln\left(\frac{\theta_i}{1 - \theta_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_C x_{iC}$, where $\theta_i$ is the probability (or parameter) of interest for the $i$th sampling unit, $x_{i1}, \dots, x_{iC}$ are the values of covariates 1 to C for that unit, $\beta_0$ is the intercept of the regression equation, and the rest of the βs are the regression (slope) coefficients for covariates 1 to C [S1]. Because each estimate of Ψ, p, or β has an associated variance (which quantifies our uncertainty about the point estimate), we can compute confidence intervals that allow comparing the estimates; for example, we may ask whether infestation estimates are the same in different ecotope types, or to what extent detection probabilities change when a new vector-detection method is introduced.
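The way a logit-link regression maps covariates to a probability can be sketched as follows. The coefficient values and the two covariates (a host-presence indicator and a wooden-structure indicator) are hypothetical and serve only to show the mechanics; they are not estimates from the study.

```python
import math


def logit(theta: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(theta / (1 - theta))


def inv_logit(eta: float) -> float:
    """Back-transform a linear predictor to the probability scale."""
    return 1 / (1 + math.exp(-eta))


def predicted_probability(beta0, betas, covariates):
    """Probability implied by intercept beta0 and slopes betas for one unit."""
    eta = beta0 + sum(b * x for b, x in zip(betas, covariates))
    return inv_logit(eta)


# Hypothetical coefficients: intercept, host present (0/1), wooden ecotope (0/1).
beta0_example = -1.0
betas_example = [1.5, 0.4]
print(predicted_probability(beta0_example, betas_example, [0, 0]))
print(predicted_probability(beta0_example, betas_example, [1, 1]))
```

Because the back-transformation is bounded between 0 and 1, any combination of coefficients and covariate values yields a valid probability.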
In the case of covariate effects, the analyses will yield a point estimate, positive or negative, of each β coefficient, as well as its variance or SE; the statistical significance of each effect can then be assessed by constructing appropriate confidence intervals and checking whether they encompass zero.
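One common way of building such an interval is the Wald approximation, β ± 1.96 × SE, sketched below. The normal approximation and the numerical values are assumptions made for illustration; other interval types (e.g., profile-likelihood intervals) may be preferable in some settings.

```python
import math


def wald_interval(beta: float, se: float, z: float = 1.96):
    """Approximate 95% confidence interval for a coefficient (normal theory)."""
    return beta - z * se, beta + z * se


def excludes_zero(beta: float, se: float, z: float = 1.96) -> bool:
    """True when the interval does not encompass zero."""
    lower, upper = wald_interval(beta, se, z)
    return lower > 0 or upper < 0


# Hypothetical slope estimate on the logit scale and its standard error.
beta_example, se_example = 1.2, 0.5
lower, upper = wald_interval(beta_example, se_example)
print("95% CI for beta:", (lower, upper))
print("95% CI for odds ratio:", (math.exp(lower), math.exp(upper)))
print("excludes zero:", excludes_zero(beta_example, se_example))
```

Exponentiating the limits expresses the same interval on the odds-ratio scale, where the reference value is one rather than zero.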

Testing further potential sources of sampling-process heterogeneity
The fact that goodness-of-fit testing detected moderate overdispersion suggested that there were further, un-modeled sources of sampling-process heterogeneity in addition to those considered in the assessment of "null" models (see main text). Aiming to address this issue, we tested other factors that could possibly affect sampling outcomes, including covariates (i) describing, for each bug-search and ecotope, the result of the previous bug-search; (ii) distinguishing the third bug-search (performed after insecticide spraying) from the previous two; (iii) indexing whether any individual search was performed by the same team that performed the previous one (this happened in a few cases in searches 2/3); and (iv) indexing whether individual bug-searches were carried out with or without supervision from a member of the research team (this happened in some third visits).
As mentioned in the main text, none of these covariates improved model fit in terms of either QAICc or overdispersion (details not shown).
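For readers unfamiliar with QAICc, the sketch below shows one widely used form of the criterion, in which the log-likelihood is scaled by the overdispersion estimate (ĉ) and a small-sample correction is added. The log-likelihood, parameter count, and ĉ values are hypothetical; using the number of ecotopes (203) as the sample size, and not adding an extra parameter for ĉ, are assumptions of this example rather than details confirmed by the text.

```python
def qaicc(log_lik: float, n_params: int, n_sites: int, c_hat: float = 1.0) -> float:
    """Quasi-likelihood AIC with small-sample correction.

    log_lik  : maximized log-likelihood of the model
    n_params : number of estimated parameters (K)
    n_sites  : sample size used for the small-sample correction
    c_hat    : overdispersion estimate (1.0 means no overdispersion)
    """
    k = n_params
    correction = 2 * k * (k + 1) / (n_sites - k - 1)
    return -2 * log_lik / c_hat + 2 * k + correction


# Hypothetical comparison: adding one sampling covariate to a simpler model.
simple_model = qaicc(log_lik=-250.0, n_params=4, n_sites=203, c_hat=1.5)
extended_model = qaicc(log_lik=-249.6, n_params=5, n_sites=203, c_hat=1.5)
print("simple:", simple_model, "extended:", extended_model)
print("extra covariate supported:", extended_model < simple_model)
```

In this invented comparison the small gain in log-likelihood does not offset the penalty for the extra parameter, which is the kind of outcome described above for the additional sampling covariates.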