Population genomic evidence that human and animal infections in Africa come from the same populations of Dracunculus medinensis

Background Guinea worm–Dracunculus medinensis–was historically one of the major parasites of humans and has been known since antiquity. Now, Guinea worm is on the brink of eradication, as efforts to interrupt transmission have reduced the annual burden of disease from millions of infections per year in the 1980s to only 54 human cases reported globally in 2019. Despite the enormous success of eradication efforts to date, one complication has arisen. Over the last few years, hundreds of dogs have been found infected with this previously apparently anthroponotic parasite, almost all in Chad. Moreover, the relative numbers of infections in humans and dogs suggests that dogs are currently the principal reservoir on infection and key to maintaining transmission in that country. Principal findings In an effort to shed light on this peculiar epidemiology of Guinea worm in Chad, we have sequenced and compared the genomes of worms from dog, human and other animal infections. Confirming previous work with other molecular markers, we show that all of these worms are D. medinensis, and that the same population of worms are causing both infections, can confirm the suspected transmission between host species and detect signs of a population bottleneck due to the eradication efforts. The diversity of worms in Chad appears to exclude the possibility that there were no, or very few, worms present in the country during a 10-year absence of reported cases. Conclusions This work reinforces the importance of adequate surveillance of both human and dog populations in the Guinea worm eradication campaign and suggests that control programs aiming to interrupt disease transmission should stay aware of the possible emergence of unusual epidemiology as pathogens approach elimination.


Principal findings
In an effort to shed light on this peculiar epidemiology of Guinea worm in Chad, we have sequenced and compared the genomes of worms from dog, human and other animal infections. Confirming previous work with other molecular markers, we show that all of these worms are D. medinensis, and that the same population of worms are causing both a1111111111 a1111111111 a1111111111 a1111111111 a1111111111

Introduction
Guinea worm-Dracunculus medinensis (Linnaeus, 1758) Gallandant, 1773 -has been an important human parasite for most of recorded history. It is also one of the best known human pathogens and has been known since antiquity [1]. This infamy is presumably due to its distinctive life cycle, where the large adult female worm (up to 1m long) causes excruciating pain as it emerges from a skin lesion. As recently as 1986, there were probably around 3 million cases of Guinea worm disease (GWD) reported annually from 22 countries in Africa and Asia [2] and historically probably very many more [3]). Called the quintessential "forgotten disease of forgotten people," GWD was responsible for an enormous disease burden as patients are incapacitated for several weeks during worm emergence [4, and many other studies cited in 5], and subsequent complications and serious secondary infections of the resulting ulcer are common and occasionally fatal [1].
Following the eradication of smallpox in 1980, public health scientists at the US Centers for Disease Control and Prevention (CDC) recognised that Guinea worm disease was a potential target for eradication [e.g. 6,7]. Since 1986, Guinea worm has been the target of a large-scale control program aiming for complete, global eradication of the disease and extinction of the parasite responsible [8]. The introduction of interventions to encourage residents to report cases of GWD, prevent infected persons from contaminating source of drinking water, provide new sources of safe water and promote greater use of existing sources, promote the use of cloth and pipe ("straw") filters, and the application of vector control measures has subsequently reduced the incidence of GWD [5,9]. As the program has progressed, these measures have been complemented by work to ensure sources of infection are traced and treated for cases, containment of cases to prevent contamination of water, and active searching for new cases [9]. The global Guinea worm eradication campaign has been a great success-by 2000 there were only 74,258 cases of GWD in 15 countries in sub-Sharan Africa [10], and this had fallen to just 15 cases in each of Chad and Ethiopia as of 2017 [11], while in 2019 only Angola, Chad and South Sudan reported human infections [12]. Either Guinea worm disease or polio [13] will soon become the second human disease to be eradicated, and Guinea worm is on track to be the first to be wiped out without a vaccine, and probably the first animal species to be deliberately made extinct.
The eradication campaign was predicated on D. medinensis being an anthroponotic parasite, with transmission between people via drinking water. Sporadic reports of animal infections were assumed to either be due to misidentification of the worm involved or to represent spill-over infection of little or no epidemiological importance. However, experimental infections of non-human animals-and particularly of dogs-have been performed successfully on a number of occasions [1]. In natural conditions, worms have particularly frequently been reported as emerging from dogs, but generally at a low prevalence and sporadically. When human infections with Guinea worm have been eliminated from a region, dog infections from that region have subsequently disappeared [14,15]. There are some apparent exceptions: for example, in Bukhara, Uzbekistan, where a hotspot of very high Guinea worm prevalence (up to 20%) was eliminated in the 1930s, Guinea worm infections in dogs continued to be reported for a few years after this, but no human cases were found after 1931 [16,17].
From 2012, however, a distinct, and apparently unique situation became evident in Chad, where large numbers of infections in domestic dogs have appeared, against the background of a small number of human cases [15]. Dog infections became evident beginning in April 2012, when the Chad Guinea worm eradication program (with assistance from The Carter Center) launched active village-based surveillance in nearly 700 villages, following the detection in 2010 of human cases for the first time in 10 years in Chad. With increasing surveillance of dog populations, the number of dog infections reported has subsequently steadily increased, and since 2016, there have been over 1,000 infected dogs reported from Chad each year (1,935 in 2019 ; 12]. Small numbers of dog infections have also been identified in the other recently endemic countries (in 2019, 2 from Ethiopia, 8 from Mali and none from South Sudan, although this country did report a single dog infection in 2015). There is an urgent interest in understanding the epidemiology of worm infections in dogs and wildlife, and their relationship with human infections in the same areas [18,19]. Greater scrutiny of animals for potential Guinea worm infection has also revealed occasional infections in wildlife, such as cats and baboons (see [20] for a full description of the situation in 2016-2017).
In this context, there was some uncertainty as to whether the worms emerging from dog and human infections in Chad represented the same species. Most of the key defining morphological features for this group of nematodes are found on adult males, which are not recovered from natural infections [1,21]. The other described species of the genus Dracunculus are all from the New World and include D. insignis and D. lutrae from North American carnivorous mammals and D. fuelleborni from a Brazilian opossum [1,22,23]. There are also numerous reports of Dracunculus spp. in reptiles-particularly snakes [21]. Molecular phylogenetic work supports the mammalian parasites as a distinct clade to those found in other vertebrates [24][25][26][27]. The diversity and phylogeny of the genus has recently been reviewed [21]. There is a relative scarcity of parasitological work on wild mammals, and particularly of work looking beyond gastrointestinal species. There are also a number of reports of cryptic species of parasitic worms in wildlife (reviewed in [28]). It is thus possible that other, undescribed mammalinfective species exist, and these could explain the few reports of human or mammal Guinea worm infections from countries otherwise considered non-endemic (see Muller, 1971 for references to case reports). A number of comprehensive reviews of D. medinensis biology, epidemiology and control are available (e.g. [1,8]) and the reader is referred to the extensive literature on the Guinea worm eradication program [5,9,[29][30][31], including regularly updated surveillance data (most recently in [11]).
Previous molecular work established that D. medinensis and D. insignis could be differentiated at the 18S rRNA locus, and that a single dog worm from Ghana was identical at that locus to D. medinensis collected from nearby human cases [25]. We subsequently reported data from the 18S rRNA locus and a mitochondrial marker for 14 worms that emerged from humans and 17 from dogs in Chad, together with whole-genome data from 6 worms [15]. A draft reference genome assembly based on sequence data from a single worm from Ghana [32] recently gave the first picture of the genome content of this species and confirmed the phylogenetic position of D. medinensis within a large spiruromorph clade of parasites related to filarial nematodes. Here, we present an improved genome assembly for D. medinensis and whole genome sequence data for a much larger set of adult D. medinensis and from two closely related species. Together, these data give a detailed picture of the relationships between D. medinensis from different hosts and countries, and confirms existing microsatellite genotyping and mitochondrial sequence data from the same populations [33] which showed that human cases of dracunculiasis and animal infections all originate from the same populations of D. medinensis. We also investigate whether the pattern of genetic variation suggests a small population size (a population bottleneck) in Chad in the recent period of 10 years that no human cases were reported in the country, and more generally whether D. medinensis populations show signals of declining population sizes that we might expect to be due to the eradication program.

Methods
Worm material from D. medinensis was collected by the national Guinea worm eradication programs in the relevant countries, except that material from experimentally infected ferrets were obtained as previously described [34]. D. insignis material was collected from an American mink (Neovison vison) and D. lutrae material was collected from an otter (Lutra canadensis) in Ontario, Canada [26]. Specific ethical approval was not required as material was derived from standard containment and treatment procedures sanctioned by WHO and national governments and performed by national control program staff, and molecular testing is part of the standard case confirmation procedure. Human case samples were anonymized prior to inclusion in this study.
Genomic DNA was extracted from either 5-15mm sections of adult female worm specimens or from the pool of L1 larvae visible in sample tubes, wherever larvae were visible. DNA extraction was performed using the Promega Wizard kit, but with worm specimens cut into small pieces before digestion with 200μg of Proteinase K overnight in 300 μl of lysis buffer, then following the protocol described in the manual. PCR-free 200-400 bp paired-end Illumina libraries were prepared from genomic DNA as previously described [35] except that Agencourt AMPure XP beads were used for sample clean up and size selection. DNA was precipitated onto the beads after each enzymatic stage with a 20% (w/v) Polyethylene Glycol 6000 and 2.5 M sodium chloride solution, and beads were not separated from the sample throughout the process until after the adapter ligation stage. Fresh beads were then used for size selection. Where there was insufficient DNA for PCR-free libraries, adapter-ligated material was subjected to~8-14 PCR cycles. Libraries were run on an Illumina platform (HiSeq 2000, 2500 or HiSeq X) to generate 100 base pair or 150 bp paired-end reads.
Sequence data was compared to a reference genome assembled from a worm collected in Ghana in 2001. The sequence data and automated assembly of v2.0 of this reference is described fully elsewhere [32]. The v3.0 reference used here has undergone some manual improvement, with REAPR [36] used to identify problematic regions of the assembly to be broken, followed by iterative rounds of re-scaffolding as indicated by read-pair and coverage information visualised in GAP5 [37] and automated gap-filling with IMAGE [38] and Gapfiller v1. 11 [39] and a final round of sequence correction with iCORN v2.0 [40]. Assembly statistics for v2.0 and v3.0 of the D. medinensis genome are shown in S1 Table. All data generated in this study are available from the European Nucleotide Archive short read archive, a project ERP117282; accession numbers for individual samples are shown in S1 Table. Mapping was performed with SMALT v.0.7.4 (http://www.sanger.ac.uk/science/tools/ smalt-0), mapping reads with at least 95% identity to the reference, mapping paired reads independently and marking them as properly paired if the reads within a pair mapped in the correct relative orientation and within 1000bp of each other (parameters-x-y 0.95 -r 1 -i 1000). To avoid problems with mitochondrial data, mapping was also performed similarly against a reference containing mitochondrial genomes for dog, human and ferret. Duplicate reads were removed using Picard v2.6 MarkDuplicates. The BAM files produced were used as input to Genome Analysis Toolkit v3.4.0 for variant calling, following the 'best practice' guidelines for that software release: briefly, reads were realigned around indel sites, after which SNP variants were called using HaplotypeCaller with ploidy 2. Variants were then removed where they intersected with a mask file generated with the GEM library mappability tool [41] with kmer length 100 and 5 mismatches allowed, or were within 100bp of any gap within scaffolds. Finally, SNPs were then filtered to keep those with DP > = 10; DP < = 1.75 � (contig median read depth); FS < = 13.0 or missing; SOR < = 3.0 or missing, ReadPosRankSum < = 3.1 AND ReadPosRankSum > = -3.1; BaseQRankSum < = 3.1 AND BaseQRankSum > = -3.1; MQRankSum < = 3.1 AND MQRankSum > = -3.1; ClippingRankSum < = 3.1 AND Clippin-gRankSum > = -3.1. An additional mask was applied, based on the all-sites base quality information output by GATK HaplotypeCaller. The filters applied were DP > = 10, DP < = 1.75 � (contig median read depth) and GQ > = 10. Finally, sites with only reference or missing genotypes were then removed. Variant calling on the mitochondrion was performed similarly, except reads were first filtered to retain only those for which both reads in a pair mapped uniquely to the mitochondrion in the correct orientation with mapping quality at least 20, the read depth filter was 10 for all samples, all heterozygous calls were removed and the mask file was generated manually by examining dot plots and removing regions with a high density of heterozygotes.
Synteny between the D. medinensis v3.0 assembly and the published O. volvulus v4.0 assembly was confirmed using promer from the Mummer package [42] to identify regions of >50% identity between the two sequences over 250 codons. These results were then visualised using Circos v0.67pre5 [43]. The phylogenetic tree for D. medinensis samples was based on the proportion of alleles matching between each pair of samples at those sites for which both samples in a pair had a genotype call that passed the filter criteria. A phylogeny based on these distances was inferred by neighbour-joining using the program Neighbour from Phylip v3. 6 [44]. Principal components analysis was performed on a matrix of genotypes for sites with no missing data in R v3.3.0 [45] using the prcomp command. Population genetic summary statistics within and between populations were calculated for 10kb window of SNPs containing between 5 and 500 variants, using ANGSD v0.919-20-gb988fab [46]. This software estimates neutrality tests [47] or genetic differentiation between populations [48] following a probabilistic framework that employs genotype likelihoods. It is intended to be more robust to genotyping error than traditional calculations using the genotypes directly. Only sites with a minimal depth of 5 reads, a minimal base and mapping quality Phred score of 30 and a call rate of at least five individuals were used, and genotype likelihoods were estimated under the samtools [49] framework (GL = 1). In the absence of known ancestral states, folded site frequency spectra were generated to derive nucleotide diversity π, Watterson's θ and Tajima's D in 1kb windows. F ST estimates were computed from maximum-likelihood joint site frequency spectra between pairs of populations derived using the reference genome as the ancestral state. Estimates were generated for 10-kb sliding windows (with 1 kb overlaps) containing between 5 and 500 variants. We report genome-wide averages across these windows, and confidence intervals for these statistics were calculated for 10kb from 100 bootstrap replicates, resampling from 10kb windows. Values for absolute divergence (d xy ) were calculated similarly using pixy (https:// pixy.readthedocs.io/en/latest/about.html), with summary statistics calculated as above. Unless otherwise specified, plots were produced in R v3.3.0 with ggplot2 [50].
Bayesian clustering was performed with MavericK v1.0 using thermodynamic integration to estimate the number of clusters (K) best describing the data [51]. MavericK was run for 3 independent runs of 1,000 burnin generations and then 10,000 generations for inference, and with each rung of the thermodynamic integration run for 1,000 burnin generations and 5,000 generations for inference, for the default 21 rungs. For comparison, Structure v2.3.4 was run [52], using the deltaK method to select a value of K. Input to both of these was a set of 19,983 SNP variants samples across the D. medinensis scaffolds at 5 kb intervals. Structure was run using an admixture model, with a burn-in of 100,000 generations and using another 100,000 generations for inference.
Population history of D. medinensis was inferred using BPP v4 [53] to infer the number and branching pattern of populations, and then GPhoCS v1.2.2 [54] to infer branching times and effective population sizes on the maximum posterior probability history. GPhoCS requires populations and the phylogeny to be specified a-priori, but is able to perform inference using a larger set of loci more efficiently. For GPhoCS a total of 781 loci were chosen as contiguous 1kb regions spaced every 100kb across all autosomal scaffolds. Results were scaled to time and effective population time using a mutation rate of 2.7 x 10 −9 per generation, as estimated for Caenorhabditis elegans [55] and a generation time of 12 months: D. medinensis females emerge 10-14 months after infection [6]. At least three (3-6) independent MCMC chains were run for each of 5 different prior assumptions, with each chain running for at least 25,000 MCMC generations. In each case, identical priors were used for all θ and τ (population sizes and divergence times, respectively) parameters; priors for GPhoCS are specified as gamma distributions parameterised with a shape (α) and rate (β) parameters (hyperparameters). We held the β hyperparameter constant at 0.1, and chose α values varying by 4 orders of magnitude, from 10 −4 to 10 −8 , so that the means of the prior distributions varied from 6.43335-64,335 for θs following scaling and from 25.7342 to 257,342 for τ parameters. The variance of the prior distributions also varied linearly with changes in the α hyperparameter. Convergence was confirmed by visual inspection of the chains for each prior. For inference, the first 15,000 generations of each chain were removed and the remaining steps concatenated; highest posterior density estimates and effective sample sizes were calculated using the R packages HDInterval and mcmcse respectively. For all parameters the effective sample size was at least 250.
BPP attempts to identify reproductively isolated populations and estimate the phylogeny underlying those populations in a joint Bayesian framework [56,57]. Population size parameters were assigned the default inverse gamma priors with mean 0.002 and shape parameter (alpha) = 3, the root divergence time an inverse gamma prior with mean 0.001 and alpha 3, other divergence time parameters default Dirichlet prior. Each analysis is run at least twice to confirm consistency between runs, and each chain was run for 10,000 burnin generations and 50,000 generations for inference. Convergence was assessed by inspection of these chains in Tracer v.1.6. Due to difficulties in getting the software to run successfully to convergence using our complete dataset for BPP, a subset of 100 loci was chosen at random from these 781 loci. Three different random sets of loci gave essentially identical results (97.2%, 98.2% and 98.4% support for the same maximum-probability reconstruction; duplicate runs of the same loci varied by less than 0.5%).
Kinship between samples was calculated using King v1. 4 [58]. Distances between latitude and longitude points were calculated using the online calculator at the US National Oceanic and Atmospheric Administration at https://www.nhc.noaa.gov/gccalc.shtml. The map in S7 Fig was based on the 1:50m land shape file from www.naturalearth.com, and drawn using the R packages maps (v3.3.0), maptools (v0.9) and the theme_map from ggthemes (v4.1).

Whole-genome sequence data from Dracunculus specimens from a range of host species and geographic regions
In this study, we attempted to generate whole-genome shotgun sequence data for 90 D. medinensis specimens; for four samples we could not make sequencing libraries. We also sequenced two samples of D. insignis and one sample of D. lutrae. To aid interpretation of these data, we used the original Illumina data used to improve the v2.0.4 reference genome assembly for D. medinensis-based on a worm collected in Ghana in 2001 [32]-with a combination of both manual and automated approaches to produce an improved (v3.0) assembly for D. medinensis. This substantially improved contiguity and reduced misassemblies, for example the average scaffold length is twice that of the previously published assembly version (Table 1).
Despite extensive sequencing effort, mapping our data against this reference showed that we achieved a median depth of 10× coverage for only about one-third (33) of D. medinensis samples. Unless otherwise specified, subsequent analyses were restricted to this set of 33 D. medinensis samples. These samples were collected from a number of African countries (Fig 1), with 22 from Chad, 5 from Ethiopia, 2 each from Ghana and South Sudan and 1 each from Mali and Côte d'Ivoire. It included 15 samples from humans, 15 from dogs and 3 from other animals (2 Ethiopian baboons, Papio anubis, and one from a Chad cat, Felis catus). Full details of the samples and coverage achieved are shown in S1 Table. The low coverage we achieved was due to extensive contamination with bacterial and, in some cases, host DNA, so that the percentage of reads mapping to the reference varied from 0.07% up to 94.8% (S1 Fig; S2 Fig); even within the genome-wide coverage set of 33 samples as few as 17.9% of reads mapped in one case. For 9 D. medinensis samples, sequence libraries were generated from both adult female material and L1 larvae present in the same sample tubes (representing the offspring of that female). Coverage also varied across the genome, most strikingly for two of the longest 5 scaffolds, which were often lower in coverage than other large scaffolds, varying from around threequarters of the expected coverage to approximately similar coverage. We hypothesised that these scaffolds could represent all or part of a sex chromosome (X) in D. medinensis. The L1 larval samples showed coverage on these scaffolds around 75% that for other large scaffolds, as expected for an XY or XO sex determination system if the pool of larvae consisted of an approximately equal ratio of male and female larvae. More surprisingly, most of the DNA samples extracted from female worms showed similarly low relative coverage of these two scaffolds. We suggest that this is because much of the material extracted from these specimens is actually from L1 larvae remaining in the body of the female worm section. This seems plausible, as much of the female body comprises uterus containing several million larvae [8], and if the female body was largely degraded that explains the difficulty we had in extracting DNA from many samples.
To confirm this, we generated sequence data for juvenile worms harvested from a domestic ferret experimentally infected with D. medinensis [34]. These worms were 83 days old and prepatent (and thus comprised only or largely somatic tissue), but could be morphologically identified as male and female. Analysis of data from these worms (Fig 1A) confirmed that the scaffolds with low coverage showed this pattern specifically in male worms, while the female worm showed essentially even coverage across the largest scaffolds, including the putative X and likely autosomal scaffolds. Further evidence comes from a comparison with Onchocerca : these data suggest that this was already present in Dracunculus, as well as filarial nematodes as previously suggested [60].
We thus used the ratio of mean coverage between the 3 largest autosomal scaffolds and 2 longest X chromosome scaffolds as a measure of the proportion of genomic DNA in our sample derived from larval vs female tissue, under the assumption that larvae are an equal mixture of the two sexes ( Fig 1B). These data confirm that many samples contain substantial amounts of larval-derived DNA. One sample had a particularly high value for this statistic-for this sample, the mean coverage on scaffold X_DME_002 was inflated by the presence of a small region of extremely high read depth. X chromosome scaffolds were also excluded from subsequent population genetic analyses likely to be sensitive to the different dosage of these chromosomes (see Methods).

African D. medinensis is highly divergent from other mammalian Dracunculus species
While most sequencing reads from high-quality D. medinensis samples mapped against our reference assembly (median across samples of 68.48%), the reads from the other two Dracunculus species mapped less comprehensively against the D. medinensis reference (S2 Table), which given the mapping parameters used suggests that many regions of the genome are more than 5% divergent between species. This was confirmed by variant calls in those regions of good read mapping: even given the poorer mapping quality, around 2.9 million sites varied between the three species, suggesting genome-wide divergence of at least 3% of the 103.8 Mb genome, as the mapping difficulty meant this is likely a significant underestimate. While interpretation of absolute divergence levels is difficult, our lower-bound estimate of divergence between these species is much greater than between different species of Onchocerca (O. ochengi and O. volvulus, respectively), which are less than 1% divergent (Cotton et al., 2016) and is consistent with having hundreds of thousands of years of independent evolutionary history. A principal components analysis (PCA) of SNP variants between these samples confirmed that samples from each species cluster closely together, and that the different species are well separated (Fig 2A). The first two principal component axes shown here explain 79.9% and 16.5% of the variation, respectively. More than 3-fold more sites were called as varying between Dracunculus spp. than observed across all 33 of our genome-wide D. medinensis samples, where about 981,198 sites vary.

Geography rather than host species explains the pattern of variation within African D. medinensis
Clear geographic structure was observed in the pattern of genome-wide variation within D. medinensis. PCA (Fig 2B) of the variants show distinct clusters of parasites from Ghana, Mali and Côte d'Ivoire (referred to as the 'West African' cluster) the Ethiopia, South Sudan and one Chad sample (an 'East African' cluster), and a group of parasites from Chad. The first two principal components explain only 22% of the variation in these data (14% and 8% respectively). Additional principal components axes, up to the 8 th axis, together explain 54% of the variation but none of these axes partition the genetic variation between host species (S4 Fig). Phylogenetic analysis (Fig 2C) supported this pattern, with clear clades of West African and East African worms. The Chad sample visible as being part of the East African cluster in the PCA (2015-5ChD, a worm from a dog infection emerging in 2015) was part of the East African clade in the phylogenetic tree. A second Chad worm, from a human case in 2011, also appeared to be divergent from any other worm on our phylogeny. There was no apparent clustering by host species in Chad or Ethiopia, the two countries for which worms from multiple dog and human infections were included, and no clear clustering by year of worm emergence. In all three cases where both L1 larvae and adult sections from a single emerging worm yielded high-quality data, these two samples clustered very closely together.
Other approaches to investigate population structure support these conclusions. Bayesian clustering using MavericK strongly supported a model of only 2 populations (K = 2) for these data, with posterior probability of 1.0 for this value of K. The two populations divided worms collected in Chad from those collected elsewhere, with the exception of the Chad worm 2015-5ChD, which clustered with those from other countries, as in the PCA and phylogeny. Analysis with Structure suggested that K values of between 2 and 4 fitted the data well. In all cases these analyses clustered worms largely by geographical origin, and not by host. In the highest K values, most Chad worms had mixed ancestry between two Chad populations, and in no analysis did worms from different host species cluster together more than expected (S5 Fig). As expected, differences in allele frequencies between worms from dog infections and human cases within Chad are low (mean F st 0.01806, 99% confidence interval 0.0172-0.0189; median F st 0.0114, CI 0.0109-0.0118) as are absolute levels of genetic differentiation (mean d xy 0.00357; 99% CI 0.00352-0.00364; median dxy 0.00265; 99% CI 0.00260-0.00269) and remain consistently low across the genome (S6 Fig), confirming that there is no genetic difference between worms infecting dogs and humans.

Mitochondrial genome data confirms the geographic structure of the D. medinensis population
To allow us to study a wider range of samples, we called variants against the mitochondrial genome of D. medinensis for a total of 65 samples that had median coverage of at least 10× across this sequence. The additional samples included 14 dog and 18 human samples and included a single sample from Niger, slightly expanding the geographical range of samples included. Our variant calling approach identified 182 variable sites that could be reliably genotyped across those samples. The results of this analysis (Fig 3) are congruent with those from nuclear genome variation, with a strong signal of clustering by geographical origin. The worm collected in Niger joined a tight cluster that included all West African samples (Ghana, Mali and Côte d'Ivoire) with the exception of two worms from Mali collected in 2014: one was closely related to two worms from Chad cases in 2014 and 2015, and the second appears as an outgroup to a large clade of Chad worms. The two other exceptions to the clear geographical structure were a worm from a dog in Chad in 2015 which was most similar to one from a South Sudan case from 2014 within a small clade of Ethiopia and South Sudan worms, and one from a human case in Chad in 2014 that groups as part of a more diverse group of Ethiopia worms. As with the nuclear data, worms from human cases and infections in dogs and other animals often group together, with extremely similar mitochondrial haplotypes; there is no clear signature of clustering by host species.

D. medinensis from Chad are genetically diverse but are in decline
Phylogenetic analysis of both nuclear and mitochondrial data, and the nuclear genome PCA appear to show that worms in Chad are considerably more diverse than those from the other regions included in our analysis, although this could be due to our much smaller number of samples from other countries. To ensure an adequate sample size for comparison, we combined samples from countries with small numbers of samples into three regional groups, combining Ethiopia and South Sudan samples into an East African group, and samples from Mali, Ghana and Côte d'Ivoire into a West African group, while Chad was considered alone. Population genetic summary statistics ( Table 2) for these groups confirmed the pattern suggested by phylogenies and PCA: we see highest nucleotide diversity (π) in Chad, while the East African group is slightly, but significantly less diverse and the West Africa group has an orderof-magnitude lower nucleotide diversity. A second estimator of genetic diversity (Watterson's θ) shows lower values for the East African and Chad populations, but is higher in East Africa than Chad. For neutral variants in a population at equilibrium π and θ are expected to be equal, but Watterson's estimator is heavily influenced by rare alleles. The difference between π and θ that we observe indicates an excess of common variants in the East Africa and Chad populations over neutral expectations (see e.g. pp28-30 and pp288-289 of [61] for a full description). This is captured by high Tajima's D values, which are simply a normalised difference between π and θ. While a variety of population genetic processes can influence these statistics, high Tajima's D across the genome in these two regions is most likely indicative of a demographic process, such as a recent sharp decline in the worm populations [62].

Coalescent models suggest a large population has been continuously present in Chad
To confirm the population structure of D. medinensis in Africa, we constructed coalescent models based on 1kb loci spaced every 100kb-much longer than the distance over which linkage disequilibrium decays to approximately background levels-across the large scaffolds of the D. medinensis reference genome assembly. An initial analysis used samples grouped by country and human vs animal hosts, which aimed to assign these samples to the most likely set of discrete populations, resulted in the highest posterior probability supporting a single group of West African parasites (samples from Mali, Ghana and Côte d'Ivoire into a West African group (96.44% posterior probability), an East African group of Ethiopia and South Sudan samples (58.26%) and uniting human and animal infections from Chad into their own group (58.83% support). There was weaker support for a group uniting East Africa and Chad samples (38.5%). No alternative grouping of samples-including the solution of having a single population for each country-received more than 3% posterior probability support. Due to our small number of samples, we thus combined samples into these three regional groups; except that given our more extensive sample of worms from Chad we could also investigate whether Chad worms were best explained as two host-specific populations of worms from dog infections and human cases, or as a single group. Only two scenarios for the population structure received support in the posterior sample from the Markov chain Monte Carlo (MCMC) procedure (see Fig 4A). In both scenarios, worms from Chad were more closely related to those from the East African group than to the West African group. By far the strongest support (average 97.9% of posterior samples, over 3 replicate sets of 100 random loci) supported a single Chad population of worms that emerged from both human and animal hosts.
Using this highly supported population history, we used a second coalescence approach to estimate parameters describing the demographic history of the three regional present-day populations and the ancestral populations that gave rise to them (Fig 4B, S3 Table). Assuming a similar per-generation mutation rate to C. elegans and a generation time of 1 year, these analyses suggest that the Chad and East African populations have been separated for at least several thousand years, and that divergence from the West African population was about 5-fold older. The long-term effective population sizes of the Chad and East Africa populations reflect the higher nucleotide and phylogenetic diversity, with Chad being around 4-fold higher with an estimated 20 to 40 thousand breeding individuals.

Relatedness between D. medinensis isolates
Our population genetic evidence supports the idea that a single, diverse population of Guinea worms exists in Chad and is infecting both humans and animal species. More direct evidence of transmission between host species would be genetic relatedness between worms that emerged in different species. We employed a method to estimate pairwise relatedness between isolates based on SNP variants that is intended to be robust to population structure. Kinship is the probability that a random allele sampled from each of two individuals at a particular locus are identical by descent. The expected value in an outbred diploid population is 0.5 for monozygotic twins and 0.25 for full sibs or parent-offspring pairs. Excluding the L1 larval samples that have matching adult worm samples, the median kinship across all pairs of samples is low, but non-zero (0.0077; approximately that expected for third cousins), but worms from the same countries are much more highly related (e.g. median kinship of 0.087 for pairs within Chad). There is clear geographic structure to kinship in these data, as most worm samples from the same countries are related to at least one other sample from that country with kinship of close to 0.25 or higher (Fig 5), while only a single pair of closely related worms are from different countries. Notably, six pairs of worms have relatedness of higher than 0.45, close to the maximum possible value of 0.5 (Fig 5). These six pairs include all 3 sets of matched adult and larval samples in the whole-genome coverage set, providing support for the hypothesis that other pairs with a similarly high relatedness could represent first-degree relatedness (such as parent-offspring or full siblings). The inflated kinship values are explained by a high level of inbreeding within each country, with high median kinship even when these 6 pairs of related worms excluded in Chad (median remains 0.087) and more generally (median kinship for all remaining pairs 0.0038). The other three pairs of highrelatedness samples are all from different worms and from consecutive years, so in light of the semelparous biology of Guinea worm we interpret these as being parent-offspring pairs and these links thus represent putative direct transmission events between guinea worm infections.
The three pairs we identify are all of significant epidemiological interest. One appears to confirm cross-border transmission, proposing that a worm emerging from a dog in Chad in 2015 was caused by a human case detected in South Sudan in 2014. A second pair links a human case in 2014 in Chad with a dog infection in 2015, apparently confirming transmission is possible between human cases and dog infections, while a third links two Chad dog infections in 2014 and 2015. One important note of caution is that all three of these events would imply long range transmission of the infection, with 1812km, 378km and 432km separating the three pairs of infections above, respectively; the two transmission events within Chad also imply movement in different directions on the Chari river basin (S7 Fig).

Discussion
Our data shows that a single population of D. medinensis is responsible for both dog infections and human cases in Chad, with genetic structure in D. medinensis being apparently driven by geographic separation rather than definitive host species. Our data suggest that all Guinea worm infections in African mammals are caused by a single species, D. medinensis. The two other species of Dracunculus with mammalian hosts for which we have sequence data are highly divergent from any D. medinensis specimen we investigated. Genetic variation does exist within D. medinensis in Africa, but follows a spatial pattern, with populations from South Sudan and Ethiopia being more closely related to worms from Chad, and more divergent population of D. medinensis being present in West African countries prior to the recent elimination of the parasite from that region. The set of samples we have investigated from Chad and East Africa show a particularly high genetic diversity, but also signals of a recent decline in population size (positive Tajima's D), consistent with that expected to be caused by the ongoing work to eradicate Guinea worm in those areas.
We have identified three pairs of worms with high kinship that emerged in consecutive years, which we propose may represent transmission events. If so, our data confirm that crossborder transmission of Guinea worm infection can occur (from South Sudan to Chad in this case) and that infections can be passed from dog to dog and from humans to dogs. Unfortunately, we did not observe dog to human transmission directly in these data. This could be due to the small number of transmission events we could reconstruct, because these transmissions are rare, or a combination of these factors. Interpreting kinship in an inbred population is difficult, so these genealogical links must be considered only provisional, although we note that all three pairs of larval-adult samples for which we had good sequence coverage were correctly identified by this approach. While our data do not speak directly to the changes in lifecycle that might be driving transmission through dog hosts in Chad, both the long-range nature of these 3 transmission events and the fact that they imply different directions of movement along the Chari river basin would seem to lend some support to the idea that a paratenic or transport host could be involved. In particular, as one event is between two dog hosts, human movement may be less likely to be involved. It has been demonstrated experimentally that D. medinensis can pass through tadpoles as paratenic hosts and fish as transport hosts and that both routes can successfully infect ferrets [34,63]. Furthermore, a frog naturally infected with D. medinensis has been found in Chad [64]. Wildlife infections are also being reported, for example with a number of infections recently reported in Baboons in Ethiopia [11].
Our coalescent models of the Guinea worm population genetic data appear to confirm the geographical structure of these populations, and that worms from human cases and dog infections in Chad form a single population. The estimates of population divergence dates imply that the genetic structure we observe between different regions of Africa predates recent control efforts and likely represents historical population structure. The oldest subdivision we observe, at around 20,000 years ago, coincides with the last glacial maximum when Africa was likely to be extremely arid, even compared to present-day conditions [65]. Similarly the more recent divergence between East African and Chad populations at around 4,000 years ago is during the drying-out of the Sahara at the end of the African humid period [65], which was probably accompanied by a major collapse in human habitation of much of this region [66].
Although our qualitative results appear robust, there are more caveats with the specific quantitative results, due both to the nature of our samples as mixtures of maternal and offspring genetic material and to our limited knowledge of the basic transmission genetics of Guinea worm. In particular, these estimates depend on assumptions about the mutation rate and generation time of D. medinensis. It is generally accepted that Guinea worm infections take approximately 10-14 months to reach patency in human infections [1,8]. Less certain is whether larvae can remain viable in copepods or within a paratenic host for extended periods of time. No direct measurement of the mutation rate is available for D. medinensis or any related parasitic nematode, and while mutation rates are reasonably consistent across eukaryotes with similar genome sizes [67], variation of several fold from the value we have assumed would not be very surprising. We also note that the relative values of divergence time and population size estimates will remain unchanged under different mutation rates.
Our quantitative model suggests that all three present-day populations have large average effective population size (N e ) (of the order of thousands to low tens of thousands) over thousands of years. The modelling approach we have used is not able to detect more recent changes in these populations, and interpreting these estimates is challenging, as genetic effective population sizes are influenced by many factors such as breeding systems, demography and selection. In particular, historical fluctuations in population size have a strong influence on N e , approximated by the geometric mean of the population sizes across generations (pp225-226 of [61]). While we have not exhaustively investigated possible demographic scenarios that might be consistent with these genomic data, the high N e in Chad appears to exclude the possibility that the population of worms in Chad either disappeared or was reduced to a very small bottleneck during the decade without reported human cases. Our N e is difficult to reconcile with a population size during this time much below hundreds of worms, but we cannot exclude the possibility of a much shorter bottleneck during the period without cases, or a very much higher ancestral population or some combination of these factors. In the absence of Chad samples prior to 2000 or more extensive sampling from neighbouring countries, we cannot exclude the possibility that the Chad worms we analyse-which all emerged in Chad following the 10-year gap in reported cases-migrated from elsewhere. However, we see few Chad worms that are closely related to worms from any of the neighbouring countries for which we had access to samples, so this possibility is purely speculation, and it would seem that quite large-scale influx would be required to explain the level of diversity we see in Chad by migration. Without historical samples, it also remains uncertain to what extent the population structure we see in African Guinea worm today would have been different 30 years ago, when the census population size of the worms was more than three orders of magnitude higher and worms were still widespread in Africa. Our coalescence model suggests that at least Chad and the East African populations we have sampled were still largely distinct at this time, but we have not been able to obtain worm samples suitable for molecular analysis from much of the ancestral range of D. medinensis.
A limitation is the nature of samples available to us, and in particular the very small quantity of genuinely adult material present in specimens despite these being very large for a parasitic nematode. Enrichment methods targeting parasite over host DNA cannot enrich for adult versus larval DNA and it is operationally difficult to alter the way that material is collected in the field in the context of the eradication campaign. The nature of our existing samples as mixtures of many diploid individuals makes some forms of analysis challenging, and is likely to have had an effect on many of our quantitative results-for example in the population genetic summary statistics and coalescent models-in a way that is hard to estimate without more data from genuinely 'individual worms'-i.e without larval material present. Some forms of analysis appear impossible with our current data: for example, many of the most sensitive signatures of inbreeding we expect to see appearing as the population size declines rely on changes in the level and distribution of homozygous and heterozygous sites [68]. These are not readily apparent in analysis of the data presented here, presumably because of the mixture of genotypes present in each sample. These may be particularly complex if, like many other parasitic nematodes D. medinensis is polyandrous [69][70][71]. We are currently generating sequence data from individual L1 larvae which should let us look for these signals, dissect the contribution of different males to a brood, and infer recombination and mutation rates in D. medinensis, avoiding the need to rely on estimates from C. elegans, which is both very distantly related to D. medinensis and has, of course, a very different life history. We have recently demonstrated the feasibility of this approach in a different parasitic nematode system [71]. Efforts to extract useful genome-wide information from the low-quality D. medinensis samples not analysed here are ongoing, with results from a sequence capture approach showing some promise. Our results are consistent with the findings of previously published targeted genotyping with mitochondrial and microsatellite markers, which also produced additional insights into the population genetics of D. medinensis from a much more extensive set of parasite samples [33].
Finally, the data we present here, together with other data from Guinea worm populations [15,25,33] preserve something of the genetics of D. medinensis in the final foci of infection. The genome sequence should help preserve some of the biology of this important human pathogen following the extinction of D. medinensis with eradication, but more importantly we expect these data to be crucial in the final steps of the eradication process. By defining much of the currently existing diversity of Guinea worm, these data will act as a reference to determine whether future cases for which the source of infection is unclear represent continuing transmission from these foci or previously unidentified worm populations. At the time the global campaign to eradicate Guinea worm began, the strong expert consensus was that Guinea worm was anthroponotic, and transmitted exclusively through ingestion of contaminated drinking water [9]. The emergence of large numbers of dog infections in Chad could not have been predicted, and the eradication campaign has uncovered other unexpected aspects of Guinea worm biology or epidemiology, such as the possibility of paratenic or transport hosts being involved in the lifecycle [34,64]. For example, a recent surprise is the emergence of Guinea worm infections in Angola, which has no previous history of Guinea worm disease [72]. The long incubation time may make it particularly easy for Guinea worm to take advantage of human movement to spread, leading to these kinds of sporadic cases. It is also likely that incorrect reports of emerging worms will appear post-eradication [73]: given the paucity of morphological features defining D. medinensis, molecular tools will be key in providing certainty about the pathogen involved, and thus ultimately in allowing the WHO to declare that the world is free of Guinea worm.
Our work has clear implications for other parasite systems as we move into an era intended to see enhanced control efforts, regional elimination and even eradication for several neglected tropical disease parasites [74]. The Guinea worm eradication program has in many ways set the scene for these efforts in other parasites. The small size of the remaining Guinea worm populations means it should be particularly feasible to employ whole-genome approaches to track changes in Guinea worm populations during the final stages of eradication [75], but the particular difficulties in generating high-quality sequence data and in interpreting these data for D. medinensis highlight the fact that every pathogen system is unique, and genetic surveillance will likely face unique challenges in each case. Whether the particular challenges of an apparently emerging zoonotic transmission cycle in the endgame of eradication are unique remains to be seen as programs for other pathogens advance. It seems clear that the endgame of elimination has different requirements to much of the process of reducing disease burden [76] and the strong selection pressure on pathogen populations to evade control measures near eradication will result in evolutionary responses. The ecological changes apparently occurring in Guinea worm may be the equivalent of the evolution of drug resistance in chemotherapy-lead campaigns [77].
Our results are entirely consistent with a single population of D. medinensis infecting both dogs and humans in Chad. We show genetic variation within D. medinensis is largely geographical, with significant differentiation between populations present in Chad, and those present in countries in East Africa (South Sudan and Ethiopia) and West Africa (Côte d'Ivoire, Ghana, Mali and Niger). Worms that were genetically very similar were recovered from human cases and animal infections in both Chad and Ethiopia. We find a particularly diverse population of worms in Chad and East Africa that appears to be shrinking, presumably due to the eradication program. Coalescent models confirm that a single population of worms infects both dogs and humans in Chad, and the long-term effective population size suggests that a significant Guinea worm population persisted in Chad during the ten-year period prior to 2010 during which no cases were reported. Kinship analysis shows that the Guinea worm population is highly inbred, as we might expect in a small and shrinking population, and suggests direct relatedness between 3 pairs of worms, including two recovered from human cases in one year and recovered from dogs in a subsequent season. In the context of epidemiological data and previous genetic data, this suggests that dog infections are likely to be central to maintaining Guinea worm transmission in Chad. Continued efforts to understand the biology of transmission in Chad, as well as sustained surveillance among both human and non-human hosts, will help ensure the continuing success of the eradication program.

S1 Fig. Proportion of sequencing reads mapping to the reference genome assembly for
Dracunculus medinensis samples. Each bar indicates the proportion of sequencing reads from each sample that mapped against the reference genome assembly. The density of each bar indicates whether whole-genome data is included in our analysis, only mitochondrial genome data or whether insufficient data was available for that sample.