Negative density-dependent dispersal in tsetse (Glossina spp): An artefact of inappropriate analysis

Published analysis of genetic material from field-collected tsetse (Glossina spp, primarily from the Palpalis group) has been used to predict that the distance (δ) dispersed per generation increases as effective population densities (De) decrease, displaying negative density-dependent dispersal (NDDD). Using the published data we show this result is an artefact arising primarily from errors in estimates of S, the area occupied by a subpopulation, and thereby in De. The errors arise from the assumption that S can be estimated as the area (Ŝ) regarded as being covered by traps. We use modelling to show that such errors result in anomalously high correlations between δ̂ and Ŝ and the appearance of NDDD, with a slope of -0.5 for the regressions of log(δ̂) on log(D̂e), even in simulations where we specifically assume density-independent dispersal (DID). A complementary mathematical analysis confirms our findings. Modelling of field results shows, similarly, that the false signal of NDDD can be produced by varying trap deployment patterns. Errors in the estimates of δ in the published analysis were magnified because variation in estimates of S was greater than for all other variables measured, and accounted for the greatest proportion of variation in δ̂. Errors in census population estimates result from an erroneous understanding of the relationship between trap placement and expected tsetse catch, exacerbated through failure to adjust for variations in trapping intensity, trap performance, and in capture probabilities between geographical situations and between tsetse species. Claims of support in the literature for NDDD are spurious. There is no suggested explanation for how NDDD might have evolved. We reject the NDDD hypothesis and caution that the idea should not be allowed to influence policy on tsetse and trypanosomiasis control.


Introduction
Using critical assumptions about gene flow, a model developed by Rousset [1], and analyses of trap samples of tsetse flies (Glossina spp), de Meeûs et al. [2] claimed to have found strong support for the hypothesis that the dispersal distance per generation, in tsetse, increases as a power function of decreasing population density. The claim was based on genetic analyses of material from five different species of tsetse, sampled using traps in ten studies in six different countries in West and East Africa [3][4][5][6][7][8][9][10][11]. De Meeûs et al. concluded that negative density-dependent dispersal (NDDD) probably applied to all tsetse species [2]. They predicted that mean dispersal distance (δ) would increase 200-fold when the effective population density (De) of adult tsetse decreased from about 24,000 to 1 per square kilometre, the order of density decline commonly associated with tsetse control operations [12]. This led them to warn that such control measures could unleash enhanced invasion of areas cleared of tsetse, so prejudicing the long-term success of tsetse control campaigns. That, in turn, prompted them to suggest the necessity of using "area-wide and/or sequential treatments of neighboring sites" to counter the added invasion, and they implied that small control operations risk doing more harm than good.
The term "area-wide" applied to control campaigns is jargon for the eradication of whole infestations of tsetse up to natural barriers to reinvasion and is commonly associated with recommendations for the use of the costly and complex sterile insect technique [13]. It would be dismaying if area-wide control were indeed necessary, since small-scale operations run by local communities offer an economically viable way forward [14][15][16]. Moreover, if NDDD were a reality in tsetse, then it would imply risk even for the many large and seemingly successful operations which have already cleared tsetse from tens of thousands of square kilometres, but which have fallen short of tackling the whole fly-belt [17]. The implication is that such signal successes are in danger of becoming disasters due to long-term evolution of increased invasion rates [18]. All of these considerations are of added importance since the protocol developed for tsetse by de Meeûs et al. [2], henceforth called the "NDDD protocol", is potentially applicable to any trappable creature, including many insect vectors of neglected tropical diseases. Hence, if the protocol is valid it should be extended to all of these vectors to assess whether NDDD must be considered when formulating policies for their control.
Recognising the potential importance of the NDDD hypothesis, de Meeûs et al. called for it to be field tested [18]. The clearest and most direct means of doing this would be via a set of mark, release and recapture experiments to assess both the density of tsetse and their rate of displacement in a variety of locations before and after control. Unfortunately, such experiments would not only be prohibitively costly and protracted, but would also be effectively impossible, given the problem of obtaining meaningful numbers of recaptures in places where tsetse density is low. Hence, the only workable option for field tests would seem to be yet more genetic analyses of the sort already made [2], effectively involving nothing better than getting the analytical procedure to check itself.
Given the problems of conducting pertinent experiments, the only available means we have of checking the NDDD hypothesis is to examine closely the medley of procedures, assumptions, arguments and citations on which it depends. We note immediately that, whereas it was predicted that NDDD applies to all tsetse species [2], the evidence for it is based largely on tsetse of the Palpalis group. There was only one representative of the Morsitans group, G. pallidipes, and no Fusca group species. There seems, thus, to be no a priori support for the claim that NDDD probably applied to all tsetse [2]. Moreover, modelling studies have already shown that, even if the NDDD hypothesis were correct, threats to tsetse control would be minimal [19]. It remains to show, however, whether there is any valid evidence that NDDD in tsetse exists at all, and hence whether it and its associated protocol should feature in policies of tsetse research and control, or be addressed in the study, control and management of other creatures.
Accordingly, we dissect here the evidence adduced by de Meeûs et al. [2] in favour of NDDD in tsetse. We show that the methods they used to estimate parameters are subject to large errors, and that such errors create the false signal of NDDD, even in simulated populations where NDDD has been specifically proscribed. We stress that we make no explicit or implicit criticism of the model of Rousset [1], nor of the genetic analyses in any of the studies that generated the data used by de Meeûs et al. [2]. We simply enumerate the errors, inconsistencies and false signals that arise from the way in which the Rousset model, and the genetic data, have been used and interpreted.

The de Meeûs et al. procedure and its variables [2]
In designing the methods, we recognised that the main support for the NDDD hypothesis is built around Eq (1):

$$\delta \approx \sqrt{\frac{S}{\pi N_e b}} = \frac{1}{\sqrt{\pi D_e b}} \qquad (1)$$

where δ is the predicted dispersal distance per generation of a tsetse population, and Ne is the effective population size, roughly defined as the number of adults in a population that will leave a genetic signature to the next generation. Ne was estimated using linkage disequilibrium methods [2,20]. S is the surface area occupied by the effective population, and b is the slope of the linear regression of the genetic distance among subpopulations on the log-transformed geographic distance among those subpopulations. De is the effective population density, equal to Ne/S [2]. We will show that the insertion of highly uncertain values into Eq (1) can generate spurious negative correlations between the estimated values of De and δ. Here, and in what follows, we represent estimated, as opposed to true, values of the variables by Ŝ, N̂e, D̂e, b̂ and δ̂. Note that de Meeûs et al. did not measure δ: they simply calculated its value from Eq (1), using estimates b̂ and N̂e obtained from genetic analyses, and estimates Ŝ of the area occupied by the population, calculated from the disposition of the traps that sampled the population [2].

Analysis plan
We first looked at relationships between all of the variables in Eq (1). When we found several counter-intuitive correlations, we provided an heuristic explanation of how these could result from measurement error. Mathematical analysis, and a simulation study, were then used to confirm that errors in D̂e lead to a false signal of NDDD. This led to an investigation of the importance of the variable S in contributing to the errors in D̂e and a demonstration that there were indeed serious errors in the estimation of S in the de Meeûs et al. study [2]. Finally, we used data from two of the ten field studies used in the development of the NDDD protocol, carrying out simulations that tested the idea that variations in trap deployment patterns, or the way in which such patterns were interpreted, could lead to the false signal of NDDD. Details follow of the methods used in executing the above plan.
Analysis of relationships observed in the de Meeûs et al. study [2]. To elucidate the relative importance of the variables involved in Eq (1) we first took logarithms of both sides to get Eq (2):

$$\log(\hat{\delta}) \approx -0.5\log(\pi) + 0.5\log(\hat{S}) - 0.5\log(\hat{N}_e) - 0.5\log(\hat{b}) \qquad (2)$$

Then, using data kindly supplied by Dr Thierry de Meeûs (Table A in S1 Text and Table A in S1 Table), we looked for correlations among the variables in Eq (2).
How measurement errors generate a false signal of NDDD with a slope of -0.5. The above analysis showed some counter-intuitive correlations, so we investigated the possibility that these could result from measurement error. We provide an heuristic explanation for the way in which such errors lead to a false signal of NDDD, and support this with a mathematical analysis, full details of which are provided in S2 Text.
The false signal of NDDD in a simulated population with assumed density-independent dispersal (DID). We next carried out a simulation exercise confirming that errors in measures of De lead naturally to a false signal of NDDD in a situation where we specifically assume DID. This is illustrated in S2 Table, which provides full details of the simulation procedure, carried out in Excel, which can be executed by the reader. We simulate a group of populations where δ is constant, De varies within some arbitrary range, and b is calculated from δ and De according to Eq (1). We use these true values of De in this simulated population to generate "estimates" of D̂e with large error. For this, D̂e is calculated as true De multiplied by some random factor between 0.2 and 5; these errors can be made additive instead, without consequence to the conclusion. For simplicity, we assume that b is estimated without error (b̂ = b). We then calculate δ̂ using Eq (1), replicating the method used in the NDDD protocol [2].
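The simulation just described can be sketched in a few lines of Python. This is not the Excel procedure in S2 Table, only a minimal illustration under stated assumptions: Eq (1) is taken in the form δ = 1/√(π De b); the range of true densities (500 to 1500 per km²), the constant true δ (0.5 km), the log-uniform shape of the error factor, and the names `ols_slope` and `simulate_did_slope` are all hypothetical choices made for this sketch.

```python
import math
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def simulate_did_slope(n_pops=200, delta_true=0.5, max_fold=5.0, seed=1):
    """Slope of log(delta_hat) on log(De_hat) when true dispersal is constant."""
    rng = random.Random(seed)
    log_de_hat, log_delta_hat = [], []
    for _ in range(n_pops):
        de_true = rng.uniform(500.0, 1500.0)  # assumed range of true densities
        # Eq (1) rearranged: the b implied by the true De and a constant delta
        b = 1.0 / (math.pi * de_true * delta_true ** 2)
        # Assumed error model: log-uniform factor between 1/max_fold and max_fold
        fold = math.exp(rng.uniform(-math.log(max_fold), math.log(max_fold)))
        de_hat = de_true * fold
        delta_hat = 1.0 / math.sqrt(math.pi * de_hat * b)  # Eq (1), b known exactly
        log_de_hat.append(math.log10(de_hat))
        log_delta_hat.append(math.log10(delta_hat))
    return ols_slope(log_de_hat, log_delta_hat)


slope_no_error = simulate_did_slope(max_fold=1.0)
slope_with_error = simulate_did_slope(max_fold=5.0)
print(f"no error: {slope_no_error:.3f}, five-fold error: {slope_with_error:.3f}")
```

Under this error model log(δ̂) equals log(δ) minus half the log of the error factor, so the fitted slope is -0.5 multiplied by the share of the variance in log(D̂e) that is due to error. It is therefore essentially zero when there is no error and moves towards -0.5 as the error grows relative to the spread of true densities.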
The importance of Ŝ in determining the value of δ̂. In the previous section we investigated how errors in D̂e can lead to a false signal of NDDD. Since D̂e = N̂e/Ŝ, errors in Ŝ can be a major contributor to errors in D̂e, unless these are exactly cancelled out by other errors in N̂e. Accordingly, we investigated the importance of S in determining the δ value predicted by Eq (2), relative to contributions from other terms. Since the equation has the form of a multiple linear regression, we borrowed a tool from regression to estimate, for the data from the ten studies used by de Meeûs et al. [2], the relative importance of each of the predictor variables, log(N̂e), log(Ŝ), and log(b̂), in determining the predicted value of log(δ̂). We measure "relative importance" by the percentage of the total variance in log(δ̂) that is explained by variation in each of the three predictor variables. Since these three variables are inter-correlated across the ten studies, we used hierarchical partitioning [21][22][23] to estimate their relative importance. In this approach, one of the variables, say log(Ŝ), is added to each of the four possible regression models that contain neither, either, or both of the other two predictors. In each case, the increase in multiple R² due to the addition of log(Ŝ) is recorded, and the average of the four increases is the estimated proportion of total variance in log(δ̂) that is explained by log(Ŝ). For example, regression of log(δ̂) on log(N̂e) alone yields R² = 30%. The addition of log(Ŝ) to this regression model improves R² to 85%, an increase of 55%. Increases in R² are likewise calculated when log(Ŝ) is added to the three other possible regression models involving the other two predictors: the null model, the model containing log(b̂) alone, and the model containing both log(b̂) and log(N̂e). The mean of the four R² increases is 65%.
This procedure is then repeated for log(b̂), and again for log(N̂e), to obtain the mean increase in R² when each is added to the four regression models involving the other two predictors. Because log(N̂e), log(Ŝ), and log(b̂) were used to predict values of log(δ̂) in the first place, via Eq (2), they jointly explain 100% of its total variance. We applied hierarchical partitioning to the data, using either the hier.part or relaimpo package of the R language.
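The partitioning can be illustrated with a small, dependency-free Python sketch. It is not the R code used for the analysis, and the eight data points are hypothetical values invented for the example, not the data of the ten studies. One assumption should be noted: instead of a plain mean over the four models, the sketch averages the R² gain of each predictor over all orderings in which predictors can be entered, which weights the four models slightly differently and guarantees that the shares sum to the R² of the full model. The function names `r_squared` and `hierarchical_partition` are illustrative.

```python
import math
from itertools import permutations


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given predictor columns plus an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    if not columns:
        return 0.0
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y]
    a = [[sum(row[p] * row[q] for row in rows) for q in range(k)]
         + [sum(row[p] * yi for row, yi in zip(rows, y))] for p in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        piv = max(range(c, k), key=lambda i: abs(a[i][c]))
        a[c], a[piv] = a[piv], a[c]
        for i in range(k):
            if i != c:
                f = a[i][c] / a[c][c]
                a[i] = [u - f * v for u, v in zip(a[i], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
                 for row, yi in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def hierarchical_partition(predictors, y):
    """Mean R^2 gain of each predictor, averaged over all entry orderings."""
    orders = list(permutations(predictors))
    shares = {name: 0.0 for name in predictors}
    for order in orders:
        cols, r2_prev = [], 0.0
        for name in order:
            cols.append(predictors[name])
            r2 = r_squared(cols, y)
            shares[name] += (r2 - r2_prev) / len(orders)
            r2_prev = r2
    return shares


# Hypothetical log10 values for eight imaginary studies (illustration only)
predictors = {
    "log_S": [-1.7, 1.5, 0.3, -0.5, 2.0, -1.0, 0.9, 1.2],
    "log_Ne": [2.6, 1.6, 2.0, 2.9, 1.8, 2.3, 2.7, 1.5],
    "log_b": [-1.7, -1.8, -2.1, -1.5, -1.9, -2.3, -1.6, -2.0],
}
# Response built from Eq (2), so the three predictors explain it exactly
log_delta = [-0.5 * math.log10(math.pi) + 0.5 * s - 0.5 * ne - 0.5 * b
             for s, ne, b in zip(*predictors.values())]

shares = hierarchical_partition(predictors, log_delta)
for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}%")
```

Because the response is generated from Eq (2) itself, the three shares add up to 100%, mirroring the statement above that the predictors jointly explain all of the variance in log(δ̂).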
The importance of errors in Ŝ in creating a false signal of NDDD. The preceding analyses led to the demonstration of the central importance of Ŝ in determining the estimated value of δ. We then show that the true area (S) covered by a biological subpopulation bears no known relation to the area (Ŝ) estimated to be covered by a set of traps.
Simulations from field studies illustrating false signals of NDDD. Finally, we check the validity of the field estimates of population densities and resulting dispersal distances reported by de Meeûs et al. [2]. For each of their 10 populations they assumed a single true value of δ that is roughly applicable throughout the entire area occupied by the population. Thus, if investigators deployed traps in different patterns within the same population, while following the rules for estimating S [2], the resulting estimates of δ should be the same, regardless of the trap distribution adopted, subject only to experimental errors in estimating b and Ne. We investigate whether this is true, using data from studies carried out in Tanzania and Uganda [2,7,11].
For the Tanzania study, using G. pallidipes, the original report states that sampling of tsetse was carried out using two traps at each of seven sites [7]. Because GPS coordinates were apparently not available for these traps, de Meeûs et al. analysed the data imagining that only one trap was used at each site. For such analyses they state: "when only one trap was available per site or when the GPS coordinates of corresponding traps (one subsample) were not available, we computed S = π(Dmin/2)² where Dmin is the distance between the two closest sites taken as the distance between the centers of two neighboring subpopulations" [2].
However, there were actually two traps at each site, not one [7]. Suppose that GPS coordinates were available for the traps. Then it would be logical to employ the alternative definition for estimating S: "For all analyses, when more than one trap was available in a site, the surface area of the site was computed as S = π(Dmax)² where Dmax is the distance between the two most distant traps in a given site, taken as the radius of the corresponding subpopulation." [2].
We calculate, and compare, the values of δ̂ that result from the two different values of Ŝ.
In the study of G. fuscipes fuscipes in Uganda, six traps were deployed at each of 42 sites, spread across an area of about 4000 km² [11]. Traps at each site were separated by a distance of at least 100 m. For this trap spacing, a value of Ŝ = 0.02 km² was calculated, using Ŝ = π(Dmax)² (see above). Using finite estimates of N̂e, from 30 of the 42 sites, Opiro et al. calculated an arithmetic mean of N̂e = 425 flies for the effective population (Fig A of S3 Text) [11]. The other 12 sites in the Opiro et al. study did not provide finite estimates of N̂e. Similarly, they estimated b̂ = 0.0202 using information on genetic and geographic distances between all available sites.
Using the above estimates for Ŝ, b̂ and N̂e, de Meeûs et al. calculated δ̂ = 27 m per generation, the lowest among all of the 10 studies cited, and the one where the estimated effective population density, D̂e, was the second highest [2].
We carried out a "thought experiment" to investigate how the Ŝ, D̂e and δ̂ estimates are affected by changes in trap deployment patterns. This was a simulation exercise, based solely on the data from the Uganda study and the NDDD protocol [2,11]. We imagined that the study was replicated 10 times, with different trap placements for each replicate, giving rise to different values of Ŝ. We stipulated only that the traps were always distributed throughout the roughly 4000 km² in the study area [11]. See S3 Text for full details of the analytic procedure, and Table B in S3 Table for the Excel file in which the simulations were executed, and where the reader can make repeat runs of the procedure.

Analysis of relationships observed in the de Meeûs et al. study [2]
The primary observation of de Meeûs et al. was a strong linear correlation (R² = 0.85; P<0.01) between log(δ̂), the dispersal per generation, and log(D̂e), the effective population density, for the 10 populations studied, with a slope close to -0.5 (Fig 1A) [2]. Their results also showed a strong negative correlation (R² = 0.86; P<0.02) between their predicted values of log(δ̂) and the logs of the estimated census population densities (D̂c = N̂c/Ŝ), with N̂c, the census population, estimated from trap catches. These relationships form the central structure of the argument for NDDD. Notice that, in their data, log(b̂), the slope of the linear regression of genetic distance against log-transformed geographic distance, is not significantly correlated with log(D̂e) (Fig 1C; P > 0.05). Similarly, log(N̂e), the effective population size, is weakly correlated with log(D̂e) (Fig 1D), and not significantly correlated with log(Ŝ), the area estimated to be occupied by Ne (Fig 1E). Surprisingly, log(δ̂) is strongly correlated with log(Ŝ) (Fig 1B), and log(Ŝ) is strongly negatively correlated with log(D̂e) (Fig 1F). The following sections clarify the origins of these counter-intuitive results.

How measurement errors generate a false signal of NDDD with a slope of -0.5
Eq (1) is a rearrangement of the original derived by Rousset [1] to describe the value of b that would arise in a population as a result of given values of De and δ, where De = Ne/S and δ is a distance measure somewhat different from that used in [2]. If the parameters in this equation could be measured without error, Eq (1) could indeed be a valid tool for obtaining a value for δ̂ based on estimates of D̂e and b̂, and then correlating δ̂ with D̂e. But that is not the case where errors occur in the estimate D̂e, whether these errors occur randomly or otherwise. When Eq (1) is used to calculate δ̂ from D̂e, the error in D̂e will propagate to δ̂. This means that any overestimate of D̂e will lead to an underestimate of δ̂, and vice-versa. If D̂e and δ̂ are then plotted against each other, the result is the error in D̂e being plotted against itself. If the error in D̂e is large enough (and we will show later that errors can be >1000-fold) this autocorrelation will overwhelm any true relationship between De and δ. As the error in D̂e increases, b̂ approaches independence from D̂e, and d(ln(δ))/d(ln(De)), i.e., the rate of change of ln(δ) with changes in ln(De), can then be calculated from Eq (1) with b as a fixed parameter, which yields a slope of -0.5.
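The slope of -0.5 can be made explicit with a short derivation. It assumes Eq (1) in the form δ̂ = 1/√(π D̂e b̂), which is the form consistent with Eq (2) and with the numerical values quoted for the field studies; the multiplicative error factor ε is a symbol introduced here for illustration only.

```latex
% Logarithm of Eq (1) applied to the estimates
\ln\hat{\delta} = -\tfrac{1}{2}\ln\pi - \tfrac{1}{2}\ln\hat{D}_e - \tfrac{1}{2}\ln\hat{b}

% Holding \hat{b} fixed while \hat{D}_e varies gives the slope
\left.\frac{\partial \ln\hat{\delta}}{\partial \ln\hat{D}_e}\right|_{\hat{b}} = -\tfrac{1}{2}

% Error propagation: if \hat{D}_e = \varepsilon D_e and \hat{b} = b, then
\hat{\delta} = \frac{1}{\sqrt{\pi\,\varepsilon D_e\, b}} = \varepsilon^{-1/2}\,\delta
```

The last line shows why an overestimate of D̂e (ε > 1) is always paired with an underestimate of δ̂: the same factor ε enters both quantities, once directly and once as its inverse square root.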
The complementary mathematical analysis in S2 Text confirms that, as long as estimation errors strongly dominate the values of D̂e, the relationship between log(D̂e) and log(δ̂) will appear to suggest the presence of NDDD, with a slope of -0.5, even in circumstances where the true situation is DID. We now use a simulation exercise to confirm this result.
A false signal of NDDD in a simulated population with assumed DID. We tested the prediction that errors in measures of De lead naturally to the false signal of NDDD, using a simulated population where we specifically assumed DID. When De was measured without error, calculated values of b declined as a power function of De, to satisfy Eq (1) (Fig 2A). When there was error in D̂e, however, log(b̂) appeared uncorrelated with log(D̂e) (Fig 2B) and there was a negative correlation between log(D̂e) and log(δ̂), with a slope tending towards -0.5 (Fig 2C). This approximated the situation observed in plots of real field data (cf Figs 1C and 2B, and Figs 1A and 2C). That is to say, the population appears to exhibit NDDD, despite the fact that it has been set up such that dispersal is actually independent of population density. Since D̂e = N̂e/Ŝ, errors in Ŝ can lead to errors in D̂e. We now show that there are indeed serious errors in the estimation of S: these errors are central to the false signal of NDDD.

Fig 1. Relationships between various parameters in Eq (1) for data published in [2]. A. Predicted dispersal distance (log(δ̂)) vs effective population density (log(D̂e)). B. Predicted dispersal distance (log(δ̂)) vs surface area (log(Ŝ)) occupied by the effective population. C. Regression coefficient (log(b̂)) vs effective population density (log(D̂e)). D. Effective population size (log(N̂e)) vs effective population density (log(D̂e)). E. Effective population size (log(N̂e)) vs surface area (log(Ŝ)) occupied by the effective population. F. Surface area (log(Ŝ)) occupied by the effective population vs effective population density (log(D̂e)).

Fig 2. … Table A in S2 Table to observe the consequences for the slope of log(δ̂) against log(D̂e). Since the simulation is stochastic, the slope changes with each realisation of the process but, for fold-error greater than around 1.2, the slope is invariably less than zero, as illustrated in Fig A in S2 Table.
The importance of Ŝ in determining the value of δ̂. Given that Ŝ represents an estimate of the area covered by a collection of traps, there is no biological reason to expect the high correlations that de Meeûs et al. [2] found between log(D̂e) and log(Ŝ) and between log(δ̂) and log(Ŝ) (Fig 1F and 1B). The fact that strong correlations were found led us to explore whether the variation that de Meeûs et al. observed in D̂e was mostly driven by changes in the estimated area (Ŝ) covered by traps, rather than by true changes in De [2]. As described in the Methods section, we investigated the importance of S in determining the value of δ predicted by Eq (2), relative to contributions from other terms. These analyses yielded the following percentages for the three predictors: log(N̂e) = 25%, log(Ŝ) = 65%, and log(b̂) = 10%. Thus, the variation in log(Ŝ) explained almost twice as much of the variation in log(δ̂) as did the other two variables combined.
The importance of errors in Ŝ in creating a false signal of NDDD. The central importance of S in accounting for variation in the dispersal rate (δ) implies that errors in Ŝ will be crucial in determining errors in D̂e and thus δ̂. This leads inevitably to the main problem with the NDDD protocol, embodied in the statement: "The average surface (S) occupied by a subpopulation can be computed as the surface area occupied by the different traps used in a given survey site" [2]. In fact, the relationship between the true area (S) covered by a biological subpopulation and the area (Ŝ) estimated to be covered by a set of traps remains unknown. Methods for estimating Ne assume that there is no genetic structure within a sub-population [24], meaning that there should be no systematic genetic distinction between flies caught in different traps. Thus, Ne can be estimated by sampling from any area smaller than the geographic range over which the sub-population can be said to be well mixed genetically. Fig 3 shows three different possibilities for the trap-sampling areas (Ŝ1 to Ŝ3) in relation to the true area (St) occupied by a hypothetical, well-mixed subpopulation. Sampling flies from within either Ŝ1 or Ŝ2 will produce the same expected value of the estimate N̂e of the true value of Ne, but will use different values of Ŝ, which is the denominator for calculating D̂e. Thus, the estimated population density will vary according to the area covered by the traps, introducing error into the D̂e estimates.
If Ŝ3 is used (covering an area larger than the size of what can be considered a well-mixed subpopulation), then the assumptions underlying the estimates of N̂e will be violated, and the estimates of N̂e will be flawed, so introducing yet more error to D̂e. The extent to which these erroneous estimates of N̂e will scale with Ŝ, as it grows above the size of S, is not entirely clear, but previous work suggests that it will not scale linearly [25] and will thus continue to produce incorrect values of D̂e that are a function of trap placement. Correct values of D̂e can be obtained only in the special case where Ŝ = St. We will now show that, for the data used in the NDDD protocol, variation in D̂e is due largely to the choice of trap placement, and the way in which those placements are interpreted, rather than to true biological variation in Ne [2].

Simulations illustrating false signals of NDDD in field studies
Study on G. pallidipes in Tanzania [7]. These data were originally analysed in the NDDD protocol as if only one trap was used at each site, with a value of Dmin ≈ 6.7 km, so that Ŝ = π(Dmin/2)² = 34.87 km² [2]. Using the resulting estimated values of b̂ = 0.0168 and N̂e = 44, the authors calculated a value of δ̂ = 3875 m per generation from Eq (1), the highest estimate in any of the studies they cited [2]. As we noted in the Methods section, however, there were actually two traps at each site, not one. Although GPS coordinates were lacking for these traps, the distance between them was known (100 m). Using the definition appropriate for this situation gives Dmax = 0.1 km, resulting in a value of Ŝ = π(Dmax)² ≈ 0.0314 km², differing from the published estimate of Ŝ by a factor of more than 1000 [2].
Notice that, whichever concept of trap spacing was used, the same flies would have provided the genetic and geographic information employed in the published estimates of b̂ and N̂e [2], which would thus have been identical in each case. Accordingly, the revised estimate of effective population density is D̂e ≈ 44/0.0314 ≈ 1401 tsetse per km², and the revised dispersal distance is

$$\hat{\delta} \approx \sqrt{\frac{0.0314}{\pi \times 0.0168 \times 44}} \approx 0.116\ \text{km} = 116\ \text{m},$$

differing from the published estimate of δ by a factor of 33.4 [2]. We emphasise that either of the methods for estimating S presented by de Meeûs et al. [2] could reasonably be used for this dataset, and that the simple choice of which to apply caused this large difference in outcome.
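The arithmetic of this comparison can be checked with a few lines of Python. The sketch assumes Eq (1) in the form δ̂ = √(Ŝ/(π N̂e b̂)), which reproduces the published 3875 m; the function name `dispersal_distance_km` and the variable names are illustrative, while the numerical inputs are those quoted in the text.

```python
import math


def dispersal_distance_km(area_km2, n_e, b):
    """Eq (1): delta = sqrt(S / (pi * N_e * b)); result in km when S is in km^2."""
    return math.sqrt(area_km2 / (math.pi * n_e * b))


B_HAT = 0.0168   # published estimate of b for the Tanzania study
NE_HAT = 44      # published estimate of N_e for the Tanzania study

s_one_trap = 34.87                  # km^2, from pi * (D_min / 2)^2
s_two_traps = math.pi * 0.1 ** 2    # km^2, from pi * D_max^2 with D_max = 0.1 km

delta_one = dispersal_distance_km(s_one_trap, NE_HAT, B_HAT)
delta_two = dispersal_distance_km(s_two_traps, NE_HAT, B_HAT)

print(f"one-trap rule:  S = {s_one_trap:.2f} km^2, delta = {1000 * delta_one:.0f} m")
print(f"two-trap rule:  S = {s_two_traps:.4f} km^2, delta = {1000 * delta_two:.0f} m")
print(f"ratio of areas: {s_one_trap / s_two_traps:.0f}")
print(f"ratio of dispersal distances: {delta_one / delta_two:.1f}")
```

Because b̂ and N̂e are shared by both calculations, the ratio of the two dispersal estimates is simply the square root of the ratio of the two areas, so a difference of three orders of magnitude in Ŝ translates into a difference of more than 30-fold in δ̂.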
This analysis suggests that the values of δ̂ obtained in all of the studies used by de Meeûs et al. [2] were strongly dependent on the spacings of the traps used to sample the population, and also on the decision about how to interpret those spacings. Alternatively, if it is objected that, by regarding the trap deployment as either one, or two, traps per site, the procedure is measuring the movement rates in two different subpopulations, we would be forced to conclude that values of δ could differ by orders of magnitude between subpopulations of the same population. Either scenario is sufficient to undermine the basis of the NDDD protocol [2].
Study on G. f. fuscipes in Uganda. Analysis of data from the Opiro et al. study of G. f. fuscipes in Uganda [11] casts further doubt on the NDDD hypothesis. As described in the Methods and in S3 Text and S3 Table, we estimate values of D̂e and δ̂ for simulated studies where 10 different trap deployment patterns were used throughout the roughly 4000 km² of the study area [11]. If, as clearly assumed by the NDDD analysis [2], the expected value of δ is constant across the whole study area [11], then the analysis in each of our 10 estimation procedures, above, should provide roughly the same value of δ̂, allowing for errors in the measurement of b and Ne. That is to say, δ̂ should be independent of patterns of trap deployment. In particular, if log(δ̂) is plotted against log(D̂e) or against log(Ŝ), the results should approximate horizontal lines.
The reality is markedly different. The results of a single realisation of the simulation procedure are shown in Fig 4, from which it is seen that the simulation essentially reflects all of the properties of the NDDD picture provided in Fig 1. In particular we see that: log(δ̂) is strongly correlated with log(D̂e), with a slope around -0.5 (Fig 4A); log(δ̂) is strongly correlated with log(Ŝ), with a slope around 0.5 (Fig 4B); log(b̂) and log(D̂e) are poorly correlated (Fig 4C).
Moreover, log(N̂e) and log(D̂e) are positively, but rather weakly, correlated (Fig 4D), and log(Ŝ) declines linearly with increasing log(D̂e) (Fig 4F). The reader is invited to use the algorithm in Table B in S3 Table to make serial iterations of the stochastic procedure, with each iteration using a different randomly generated error for log(b̂) and log(N̂e).
Notice that we make no assumption about the true underlying nature of dispersal: it could be NDDD, DID or PDDD (positive density-dependent dispersal). Regardless of this, however, the output always strongly resembles NDDD. Moreover, while we have used only one of the ten studies involved in generating the NDDD hypothesis, the result is entirely general and the same problem will arise in the analysis of any of the studies cited [2,11]. Notice also that, if b and Ne are measured without error, the outcome still gives the appearance of NDDD (Table B and Fig A in S3 Table). That is to say, the false signal of NDDD is entirely due to the fundamentally erroneous assumption that Ŝ, as measured by the distribution of traps used in the sampling procedure, provides a good estimate of the true area (S) occupied by the tsetse population under study.
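The error-free case mentioned above is easy to reproduce. The following Python sketch is not the Excel procedure of S3 Table: it simply holds N̂e = 425 and b̂ = 0.0202 (the Uganda estimates) fixed, varies only Ŝ over ten hypothetical values chosen for illustration, and applies Eq (1) in the form δ̂ = √(Ŝ/(π N̂e b̂)). The helper `ols_slope` and all variable names are assumptions of the sketch.

```python
import math


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


NE_HAT = 425      # Uganda estimate quoted in the text
B_HAT = 0.0202    # Uganda estimate quoted in the text

# Hypothetical S-hat values (km^2) for ten imagined trap layouts within ~4000 km^2
areas_km2 = [0.02, 0.1, 0.5, 2.0, 10.0, 50.0, 200.0, 1000.0, 2000.0, 4000.0]

log_s = [math.log10(s) for s in areas_km2]
log_de = [math.log10(NE_HAT / s) for s in areas_km2]            # D_e-hat = N_e-hat / S-hat
log_delta = [0.5 * math.log10(s / (math.pi * NE_HAT * B_HAT))   # Eq (1) on the log scale
             for s in areas_km2]

slope_vs_density = ols_slope(log_de, log_delta)
slope_vs_area = ols_slope(log_s, log_delta)
delta_first_m = 1000 * math.sqrt(areas_km2[0] / (math.pi * NE_HAT * B_HAT))

print(f"slope of log(delta) on log(De): {slope_vs_density:.3f}")
print(f"slope of log(delta) on log(S):  {slope_vs_area:.3f}")
print(f"delta for S = 0.02 km^2: {delta_first_m:.0f} m")
```

With the same flies and the same genetic estimates, changing nothing but the area attributed to the traps produces a perfectly straight "NDDD" line: the slope against log(D̂e) is -0.5 and the slope against log(Ŝ) is +0.5, whatever the true dispersal behaviour.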
The foregoing analyses show that, depending on trap placement and the choice of how Ŝ is calculated from such placement, the Tanzanian and Ugandan studies can both be made to reflect either an extremely high effective population density and low dispersal rate, or completely the reverse [7,11]. Clearly, in each study, these scenarios cannot both be correct. Indeed, both are almost certainly incorrect because, as explained above, Ŝ is virtually never equal to the true area (S) occupied by a subpopulation.

Fig 4. … [11]. A. Dispersal distance (log(δ̂)) vs effective population density (log(D̂e)). B. Dispersal distance (log(δ̂)) vs surface area (log(Ŝ)) occupied by the effective population. C. Regression coefficient (log(b̂)) vs effective population density (log(D̂e)). D. Effective population size (log(N̂e)) vs effective population density (log(D̂e)). E. Effective population size (log(N̂e)) vs surface area (log(Ŝ)) occupied by the effective population. F. Surface area (log(Ŝ)) occupied by the effective population vs effective population density (log(D̂e)).

Discussion
The primary goal of this paper was to assess the evidence relating to the notion of NDDD in tsetse. If support had been found for the idea, then the very general nature of the model on which NDDD was based [1], might suggest that the phenomenon could occur in other creatures, including other insect vectors of disease. Our simulations and arguments show, however, that there are serious flaws in the arguments that led to the notion of NDDD in tsetse [2].
These flaws can be summarised by the following two points. First, the use of Eq 1 to calculate δ̂ from measurements of D̂_e is biased towards finding a spurious correlation between δ̂ and D̂_e whenever there is error in the measurement of D̂_e. Since population genetic parameter estimates will always contain some degree of error, this alone is concerning. Second, the methods used for estimating Ŝ introduce very large errors into the estimates of D̂_e = N̂_e/Ŝ. As the error in D̂_e becomes large, the predicted log slope of the spurious correlation with δ̂ will tend to -0.5. Hence, the procedure is inherently biased towards finding NDDD as marked as that actually observed by de Meeûs et al., even when dispersal is independent of density [2]. By far the largest errors in measurement appear to be in the estimate Ŝ, and hence in D̂_e, and these stem from the notion that the true areas inhabited by the various subpopulations under study are determined by the distance between traps. That is the root error, since inter-trap distances are not features of tsetse populations. Instead, they are strongly constrained by logistical issues, such as the numbers of available paths and traps, the mode of transport employed, the purpose of the trapping study and the whims of the researcher.
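The second point can be illustrated with a minimal simulation sketch. It is not the simulation code used in this paper. It assumes that Eq 1 takes the form δ̂ = 1/√(π D̂_e b), and the function name, parameter names and default values are hypothetical choices made only for illustration. True dispersal is held constant, so dispersal is density-independent by construction, and b and N_e are taken as error-free; the only error is a multiplicative lognormal error in Ŝ.

```python
import numpy as np


def simulate_slope(n_studies=10_000, sd_log_density=0.3,
                   sd_log_area_error=2.0, true_delta=1.0, seed=0):
    """Slope of log(delta_hat) on log(D_e_hat) under density-independent dispersal.

    Assumes Eq 1 has the form delta = 1 / sqrt(pi * D_e * b); all names and
    default values here are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    true_area = 1.0  # true area S occupied by each subpopulation

    # True effective density varies modestly between simulated studies.
    true_density = np.exp(rng.normal(0.0, sd_log_density, n_studies))
    n_e = true_density * true_area  # effective size, known without error

    # IBD regression slope b implied by the assumed Eq 1 and a constant delta.
    b = 1.0 / (np.pi * true_density * true_delta ** 2)

    # The only error: the trap-based area estimate differs from the true area.
    area_hat = true_area * np.exp(rng.normal(0.0, sd_log_area_error, n_studies))

    density_hat = n_e / area_hat
    delta_hat = 1.0 / np.sqrt(np.pi * density_hat * b)

    # np.polyfit returns coefficients from highest degree down: slope, intercept.
    slope, _intercept = np.polyfit(np.log(density_hat), np.log(delta_hat), 1)
    return slope


if __name__ == "__main__":
    print("slope with large error in S_hat:", simulate_slope())
    print("slope with no error in S_hat:  ", simulate_slope(sd_log_area_error=0.0))
```

Under these assumptions the fitted slope is close to -0.5 when the variance of the error in log(Ŝ) is large relative to the variance of the true log density, and close to zero when Ŝ equals S, although true dispersal is identical in both cases.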
The issues raised in the Results section provide good evidence, of themselves, to reject the NDDD hypothesis for tsetse. However, various other problems, explored in detail in S4 Text, cast further doubt on the credibility of the hypothesis and its associated arguments. Many of these problems relate to the interpretation of trap catches. Thus, errors in census population estimates result from an erroneous understanding of the relationship between trap placement and expected tsetse catch. This is exacerbated through failure to adjust for variations in trapping intensity, trap performance, and in capture probabilities between geographical situations and between tsetse species. Such problems lead to serious errors in estimates of census populations, and a consequent absence of the expected high correlation between the estimates of census and effective population numbers, N̂_c and N̂_e [2]. We also point out that there is no credible suggestion for any mechanism by which NDDD might have evolved and, contrary to claims in [2], no support in the literature for NDDD in tsetse. Indeed, available published evidence seems to suggest that PDDD is more likely than NDDD. Finally, we note that, even if we take the correlations in Fig 1 at face value, de Meeûs et al. [2] drew no distinction between correlation and causation, and did not consider the possibilities of reverse causality or confounding.
These considerations all give background support for our present results with tsetse and offer general warnings about problems that could occur if the NDDD protocol developed for tsetse were used with other creatures. As an important example, if trapping exercises and genetic analyses were carried out on mosquitoes, without due regard to the problems of correctly estimating the area S occupied by a subpopulation, then we can be sure that the same signal of NDDD would result, regardless of the true nature of dispersal in the mosquitoes. This could have serious consequences for decisions regarding vector control in support of efforts to combat the many human diseases that mosquitoes transmit.
Although NDDD was initially offered as a strongly supported hypothesis in need of testing, some of the co-authors are already treating the hypothesis as established fact. It is stated, for example, that: "through genetic studies, De Meeûs et al. (2019) have shown that a strong negative density-dependent dispersal occurs after control operations" [26], incorrectly implying that dispersal had been measured before and after control, so giving unwarranted weight to NDDD and the claimed risks that such dispersal poses for tsetse control. Such statements, taken with the warning that NDDD will unleash massive invasion from neighbouring untreated areas, must be seen as having potentially serious impacts on disease control policy. It is for that reason that we have felt it essential to expose so fully the errors involved in the NDDD hypothesis.
The danger in the over-hasty transformation of an intriguing hypothesis into an established "fact" is that caution is easily abandoned, and that support for the hypothesis can be seen where none occurs. For example, a claim for the existence of NDDD, which involves believing that sparse populations evolve to disperse especially widely, has been made while also stressing the contradictory proposition that such dispersal is markedly disadvantageous to the survival of sparse populations [26]. Similar problems have arisen in the history of tsetse control. For example, the hypothesis that tsetse have open "feeding grounds" and well-wooded "homes" [27][28][29] encouraged the unnecessary destruction of large areas of natural woodland in the name of tsetse control [30].
Nonetheless, it would clearly be beneficial if genetic analysis could be used to provide useful indications of the dispersal rate of tsetse, or indeed of any other creature. As we have seen, a central problem is that of providing accurate and meaningful estimates of population density, which requires in turn good estimates of the area (S) occupied by a population. The problem of providing estimates of absolute numbers, and densities, of tsetse populations has taxed workers for nearly a century, since C.H.N. Jackson pioneered the use of mark-recapture in a remarkable series of population studies with tsetse [27][28][29][31], and we still do not have good solutions. The best estimates of census population numbers and densities (N_c and D_c) have involved mark-recapture exercises applied to populations on an island of known area, and closed to immigration and emigration [32,33].
When such exercises are carried out on open populations, the results are much more difficult to interpret. For example, estimates of census numbers of G. m. morsitans and G. pallidipes near Rekomitjie Research Station, in Zimbabwe, varied by between 3 and 12-fold depending on whether numbers were estimated using mark-recapture, or a dynamic system that modelled rates of population change due to in- and out-migration as well as death and trapping [34,35]. Discrepancies were biggest for G. pallidipes, the larger and more mobile species, underlining the confusing effect of varied rates of fly dispersal. However, the problem lies more in estimating the area (S) occupied by the study population than in estimating the census numbers themselves. Moreover, while the effective population numbers (N_e) can be estimated from genetic data [36,37] there is still the problem of estimating S and, as we have seen in this paper, that difficulty lies at the heart of the efforts made to estimate dispersal rates from the analyses of genetic data [2]. We cannot ourselves suggest a solution to this problem but feel that it is important to highlight it for further consideration.
Supporting information

S1 ... Absence of correlation between N̂_c and N̂_e. D. Failure to allow for intensity and duration of trapping, differences in performance between traps, between species and between geographical/ecological regions. E. Contradictory evidence from trap catches. F. Inappropriate pooling of data for different situations. G. Unsupported claims of effects of NDDD reinvasion dynamics. H. No field evidence for larviposition pheromone in tsetse. I. Absence of any suggestion for a mechanism by which NDDD might have evolved. J. Errors in claimed support for NDDD. K. PDDD more likely than NDDD. L. Confusion between correlation and causation; possible reverse causality and confounding.

Fig A in S4 Text. Data and fitted IBD regression for G. tachinoides study in Ghana. Row 2, Table A reports regression statistics (Adam et al., 2014).

... supplied values of b and its CI, estimated via bootstrap-over-loci (BOL) for 7 studies, and also supplied Mantel test results for 3 studies. We calculated jackknife estimates of b and its 95% CI for 7 of the 10 studies, using genetic distances based on all available loci. We estimated squared Pearson correlations (R²) from all available genetic and geographic distances, with the full subset of loci included in genetic distances.

Fig B in S4 Text. Surface area occupied by a subpopulation, as estimated using the NDDD protocol. A. More than one trap deployed at a site. In the example shown (cf. Opiro et al., 2017), five traps are spaced at equal intervals on the circumference of a circle of radius r units, with a further trap at the centre of the circle. The protocol calculates the area of the site as Ŝ = π(D_max)², where D_max is the distance between the two most distant traps in a given site, taken as the radius of the corresponding subpopulation. With this pattern, D_max ≈ 1.9 r. B. Two trapping sites with one trap deployed at each site. For this scenario, the surface area occupied by the subpopulation sampled is calculated from Ŝ = π(D_min/2)², where D_min is the distance between the centres of two neighbouring subpopulations and is thus taken as the average diameter of a subpopulation.
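The two area rules described for Fig B in S4 Text can be written out as a short sketch. The function names and the coordinate layout below are hypothetical and serve only to make the arithmetic explicit; they are not code from the protocol or from this paper.

```python
import math
from itertools import combinations


def area_multi_trap(trap_xy):
    """Rule A: S_hat = pi * D_max**2, D_max being the largest inter-trap distance."""
    d_max = max(math.dist(p, q) for p, q in combinations(trap_xy, 2))
    return math.pi * d_max ** 2, d_max


def area_single_trap(d_min):
    """Rule B: S_hat = pi * (D_min / 2)**2, D_min being the distance between
    the centres of two neighbouring single-trap sites."""
    return math.pi * (d_min / 2.0) ** 2


def pentagon_layout(r):
    """Five traps equally spaced on a circle of radius r, plus one at the centre."""
    ring = [(r * math.cos(2 * math.pi * k / 5), r * math.sin(2 * math.pi * k / 5))
            for k in range(5)]
    return ring + [(0.0, 0.0)]


if __name__ == "__main__":
    r = 1.0
    s_hat, d_max = area_multi_trap(pentagon_layout(r))
    print(f"D_max / r = {d_max / r:.3f}")
    print(f"S_hat / (pi r^2) = {s_hat / (math.pi * r ** 2):.3f}")
```

For the layout in panel A, D_max is the diagonal of a regular pentagon, 2r sin(72°) ≈ 1.9 r, so rule A returns an area about 3.6 times that of the circle on which the traps were placed. In both rules the result depends only on where the traps were set and not on any measured property of the tsetse population.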