
Quantifying connectivity between local Plasmodium falciparum malaria parasite populations using identity by descent

  • Aimee R. Taylor ,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft

    ataylor@hsph.harvard.edu

    Affiliations Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America, Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America

    ORCID http://orcid.org/0000-0002-2337-8992

  • Stephen F. Schaffner,

    Roles Data curation, Formal analysis, Investigation, Methodology, Software, Writing – review & editing

    Affiliation Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America

    ORCID http://orcid.org/0000-0001-6699-3568

  • Gustavo C. Cerqueira,

    Roles Data curation, Formal analysis, Investigation

    Affiliation Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America

    ORCID http://orcid.org/0000-0002-7692-3886

  • Standwell C. Nkhoma,

    Roles Data curation

    Affiliation Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, United States of America

    ORCID http://orcid.org/0000-0002-7137-7935

  • Timothy J. C. Anderson,

    Roles Funding acquisition, Supervision, Writing – review & editing

    Affiliation Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, United States of America

  • Kanlaya Sriprawat,

    Roles Data curation, Investigation

    Affiliation Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand

  • Aung Pyae Phyo,

    Roles Data curation, Investigation

    Affiliation Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand

    ORCID http://orcid.org/0000-0002-0383-9624

  • François Nosten,

    Roles Investigation, Writing – review & editing

    Affiliations Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand, Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine Research building, University of Oxford, Old Road campus, Oxford, United Kingdom

    ORCID http://orcid.org/0000-0002-7951-0745

  • Daniel E. Neafsey ,

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    ‡ DEN and COB are joint senior authors on this work.

    Affiliations Infectious Disease and Microbiome Program, Broad Institute, Cambridge, Massachusetts, United States of America, Department of Immunology and Infectious Disease, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America

  • Caroline O. Buckee

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    ‡ DEN and COB are joint senior authors on this work.

    Affiliation Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America


Abstract

With the rapidly increasing abundance and accessibility of genomic data, there is growing interest in using population genetic approaches to characterize fine-scale dispersal of organisms, providing insight into biological processes across a broad range of fields including ecology, evolution and epidemiology. For sexually recombining haploid organisms such as the human malaria parasite P. falciparum, however, there have been no systematic assessments of the type of data and methods required to resolve fine-scale connectivity. This analytical gap hinders the use of genomics for understanding local transmission patterns, a crucial goal for policy makers charged with eliminating this important human pathogen. Here we use data collected from four clinics with a catchment area spanning approximately 120 km of the Thai-Myanmar border to compare the ability of divergence (FST) and relatedness based on identity by descent (IBD) to resolve spatial connectivity between malaria parasites collected from proximal clinics. We found no relationship between inter-clinic distance and FST, likely due to sampling of highly related parasites within clinics, but a significant decline in IBD-based relatedness with increasing inter-clinic distance. This association was contingent upon the data set type and size. We estimated that approximately 147 single-infection whole genome sequenced parasite samples or 222 single-infection parasite samples genotyped at 93 single nucleotide polymorphisms (SNPs) were sufficient to recover a robust spatial trend estimate at this scale. In summary, surveillance efforts cannot rely on classical measures of genetic divergence to measure P. falciparum transmission on a local scale. Given adequate sampling, IBD-based relatedness provides a useful alternative, and robust trends can be obtained from parasite samples genotyped at approximately 100 SNPs.

Author summary

The spatiotemporal dispersal of organisms can inform efforts to conserve endangered species, to contain the spread of drug resistance, and to eliminate disease. As genomic data become increasingly affordable and accessible via public repositories, the demand for methods capable of extracting fine-scale population structure from genomic data grows. However, to the best of our knowledge, there are no guidelines regarding the type of data and methods required to resolve local spatial trends in sexually recombining haploid organisms, such as the malaria parasite. The approach we present here compares relatedness based on identity by descent, which accounts for recombination while distinguishing genetic identity due to inheritance from genetic identity due to chance, with a classic population genetic measure of divergence, using data from sexually recombining malaria parasites. Using identity by descent, we uncover a significant decrease, with increasing inter-clinic distance, in the proportion of highly related malaria parasites collected from proximal clinics on the Thai-Myanmar border, a region where human mobility is high. Our results demonstrate the power of analyses based on identity by descent to detect recent and local trends. Similar analyses could be used to inform the molecular epidemiology of other sexually recombining organisms.

Introduction

Molecular tools show great promise for helping us understand and contain the spatial spread of pathogens, and the application of population genetic approaches to monitoring and controlling infectious diseases is becoming routine. The routes and volumes of spread of non-sexually recombining pathogens, such as influenza and Ebola viruses, can be tracked using genomic surveillance [1], enabling time-calibrated phylogenies [2] that can be spatially projected [3] and used to jointly estimate transmission chains [4–7]. For sexually recombining pathogens such as the human malaria parasite Plasmodium falciparum, however, these methods are not readily applicable, especially on local spatial scales relevant for control and elimination strategies [8]. Furthermore, P. falciparum evolves more slowly than viral pathogens, and non-sampled asymptomatic infections, complex within-host dynamics, and extensive within-host diversity (multiple-genotype infections) obscure inference [5,7,9]. As a result, despite increasing efforts to collect genomic data for epidemiological surveillance of malaria on local spatial scales, methods for making sense of them, and guidelines for study design, are lacking.

On large or continental scales, or where recombination is limited, P. falciparum genetic data have been usefully employed to identify spatial relationships between parasite populations using standard approaches [10]. For example, microsatellite analyses have been used to infer the origins of drug resistant genotypes [11–13] or outbreaks [14], to monitor population dynamics [15,16] and to explore population structure in West Africa [17]; single nucleotide polymorphisms (SNPs) in non-recombining mitochondrion and apicoplast genomes have been used to infer the evolutionary trajectory of the parasite [18]; and whole genome data have been used to interrogate population structure across continents and within Southeast Asia, including Cambodia, the epicenter of drug resistant malaria [19–22]. Explicitly spatial methods applied to these data include tests and scans of spatial autocorrelation [23,24], which are typically suited to highly resolved geo-referenced data. Non-spatial methods include principal component and phylogenetic analyses, as well as many model-based Bayesian methods, including STRUCTURE [25], ChromoPainter and fineSTRUCTURE [26].

Measuring connectivity among proximal populations of P. falciparum is more challenging, however [27,28]. Classical measures include Wright’s fixation index (FST) [29,30], a measure of divergence between population pairs, which has been used to recover large-scale population structure in malaria [17,21], but has been shown to be less reliable at smaller spatial scales [20]. More recent studies have investigated relatedness using identity by descent (IBD) and identity by state (IBS), with some promise for smaller spatial scales. Henden and colleagues, for example, constructed networks of related parasites within and across countries using IBD inferred under a probabilistic model that accounts for recombination [31]. On a micro-geographic scale, Omedo and colleagues reported trends in relatedness using IBS [32], which approximates IBD [33], and has been used elsewhere to infer relatedness between malaria parasites [34]. These studies emphasize the need for tools on local scales that can account for transmission between local hotspots, particularly in areas considering or implementing elimination programs, and suggest that IBD-based measures are promising approaches.

IBD is a fundamental concept in population genetics, relating ancestry to variability due to recombination [35]. FST can also be interpreted as a measure of IBD stemming from remote inbreeding [36], but unlike IBD, FST relies on allelic variation providing a traceable history of co-ancestry. Since recombination works on shorter time scales than mutation and genetic drift, estimates of IBD provide insight into more recent demographic events than FST [35], and IBD-based analyses have been used extensively in human genetics (e.g. to impute genotypes, to map disease loci, and to infer demographic histories [35,37]). Increasingly, it is thought that much of the useful signal in the malaria genome lies in the pattern of recombination, rather than variation at any one locus, and IBD is gaining popularity in malaria research and policy (e.g. to monitor disease transmission [38], to estimate relatedness within multiple-genotype infections [39], to aid surveillance of antimalarial resistance [40], and to detect signals of selection [31]).

To explore the utility of IBD for estimating connectivity between very local parasite populations, we analyzed one of the largest joint data sets of both genotyping and sequencing data, collected between 2001–2014 from four Shoklo Malaria Research Unit (SMRU) clinics on the Thai-Myanmar border (Fig 1). The border is characterized by mobile migrant populations, villagers, and refugees from Myanmar, and is an area of low and declining malaria transmission [41–43]. This region is therefore representative of many near-elimination settings where remaining pockets of transmission are often found in border areas, and where human mobility is potentially difficult to measure for political or logistic reasons. Here, we focus specifically on measures that capture connectivity: FST between population pairs and relatedness between parasite sample pairs. We show that, unlike FST, IBD-based relatedness decreases significantly over inter-clinic distance. Importantly, where a tradeoff must be made between sequencing effort and sample sizes, we show that robust spatial trends can be recovered using 93-SNP barcodes, providing a cheap and simple approach to implementing these analyses in the field.

Fig 1. SMRU clinics on the Thai-Myanmar border.

The border between Myanmar and Thailand (also the Moei river) is depicted in blue. Grey edges are proportional to inter-clinic proportions of highly related barcode parasite sample pairs (pairs whose expected fraction IBD exceeds 0.5). Latitudes and longitudes, respectively, are: 17.128107, 98.382152 (Maela); 16.83014, 98.53737 (Wang Pha); 16.5781479, 98.5846176 (Mae Kon Ken); 16.3258896, 98.670166 (Mawker Thai).

https://doi.org/10.1371/journal.pgen.1007065.g001
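Inter-clinic distance is the spatial predictor throughout the Results. The text does not say how distances were computed; the following is a minimal sketch, assuming great-circle (haversine) distances on a sphere of mean radius 6371 km, using the coordinates listed in the Fig 1 caption and the clinic abbreviations used in later figures.

```python
import math

# Clinic coordinates (latitude, longitude in decimal degrees), from the Fig 1 caption.
CLINICS = {
    "MLA": (17.128107, 98.382152),    # Maela
    "WPA": (16.83014, 98.53737),      # Wang Pha
    "MKK": (16.5781479, 98.5846176),  # Mae Kon Ken
    "MKT": (16.3258896, 98.670166),   # Mawker Thai
}

# Assumed mean Earth radius; the paper does not state its distance calculation.
EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def inter_clinic_distances(clinics):
    """Distance for every unordered clinic pair, keyed by alphabetically ordered names."""
    names = sorted(clinics)
    return {
        (x, y): haversine_km(clinics[x], clinics[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


if __name__ == "__main__":
    for (x, y), km in sorted(inter_clinic_distances(CLINICS).items(), key=lambda kv: kv[1]):
        print(f"{x}-{y}: {km:.1f} km")
```

Under these assumptions, Maela and Mawker Thai, the northern- and southernmost clinics, lie roughly 94 km apart in a straight line; road or river distances would be longer.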

Results

The genomic data included 1173 single-infection parasite samples genotyped at 93 SNPs [44,45], henceforth referred to as barcode data, and 178 single-infection parasite samples for which whole genome sequences (WGS) were available, henceforth referred to as WGS data [40]. The combined SMRU catchment area spans approximately 120 km of the border, with a population that includes villagers, mobile migrant workers, and refugees entering Thailand from Myanmar [41,44,46]. Previous analyses of these data have shown that despite a decrease in P. falciparum multiple-genotype infections over the last decade, there has been no evidence of a change in genetic diversity, geographic divergence or effective population size (see S1 Table), likely due to extensive human mobility [45].

Divergence based on FST does not correlate with inter-clinic distance

First we explored spatial structure between parasites collected from different clinics on the Thai-Myanmar border using FST, a standard measure of divergence between populations, and one that has been applied frequently in the context of malaria. FST estimates were calculated using Hudson’s estimator [47–49], which is recommended for small and unequal sample sizes [49,50]. Estimates based on barcode and WGS data across all available years were statistically different from zero with p-values < 0.001 (Tables A and B of S1 Text). Those based on barcode data were low (Fig 2), indicative of migration between populations. Those based on WGS data were an order of magnitude larger (Fig 3), but there was no evidence of spatial trends between clinics based on either barcode or WGS data (Table 1). We observed a positive correlation between FST estimates and within-clinic relatedness based on IBD (Fig G in S1 Text), and this appears to explain not only the difference in estimates using barcode versus WGS, but also the lack of spatial trend (S1 Text).

Fig 2. FST estimates based on 2001–2014 barcode data plotted with respect to inter-clinic distance.

Annotations refer to site comparisons using abbreviated clinic names (MLA for Maela, 212 parasite samples; WPA for Wang Pha, 457 parasite samples; MKK for Mae Kon Ken, 116 parasite samples; and MKT for Mawker Thai, 388 parasite samples). All parasite samples were single-infection. For a clinic pair, A and B say, the FST estimate was based on nA + nB parasite samples, where n denotes the number of parasite samples per clinic. Error bars represent 95% confidence intervals based on bootstrapping over SNPs.

https://doi.org/10.1371/journal.pgen.1007065.g002

Fig 3. FST estimates based on 2001–2014 WGS data plotted with respect to inter-clinic distance.

Annotations refer to site comparisons using abbreviated clinic names (MLA for Maela, 55 parasite samples; WPA for Wang Pha, 103 parasite samples; MKK for Mae Kon Ken, 4 parasite samples; and MKT for Mawker Thai, 16 parasite samples). All parasite samples were single-infection. For a clinic pair, A and B say, the FST estimate was based on nA + nB parasite samples, where n denotes the number of parasite samples per clinic. Error bars represent 95% confidence intervals, based on bootstrapping over SNPs.

https://doi.org/10.1371/journal.pgen.1007065.g003

Relatedness based on IBD decreases with inter-clinic distance

We define relatedness using the expected fraction IBD, r̂, a probabilistic measure of the fraction of the genome that a pair of parasites inherited from a recent common ancestor [52]. For a given pair of clinics (e.g. Maela and Wang Pha), we obtained a single FST estimate but many r̂ estimates (nMaela × nWang Pha, where n denotes the number of parasite samples per clinic). For comparison with FST estimates, we plotted proportions of highly related parasite sample pairs (those with r̂ > 0.5) within and across clinics (Figs 4 and 5 and Figs O and P of S2 Text). However, to leverage the wealth of information across the many parasite sample pairs, spatial trends were estimated using individual r̂ estimates. Specifically, we regressed highly related parasite sample pair labels (equal to one if r̂ > 0.5 and zero otherwise) onto spatial and temporal predictors within a logistic regression framework (see Materials and methods).
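The labelling and regression step can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the authors' R code: it thresholds each pair's expected fraction IBD at 0.5 and fits a logistic regression by Newton-Raphson. The predictor layout (for instance distance in km, time difference in weeks and their product) is an assumption, and the sketch returns coefficients only; the paper obtained p-values by permutation (see Materials and methods).

```python
import math


def highly_related_label(r_hat, threshold=0.5):
    """Label a parasite sample pair 1 if its expected fraction IBD exceeds the threshold."""
    return 1 if r_hat > threshold else 0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a x = b for a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(rows, labels, iterations=25):
    """Maximum-likelihood coefficients [intercept, b1, b2, ...] by Newton-Raphson.

    rows: one list of predictor values per sample pair (no intercept column).
    labels: 0/1 highly related labels, one per row.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in zip(X, labels):
            p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * x[i]  # score vector X'(y - p)
                for j in range(k):
                    hess[i][j] += w * x[i] * x[j]  # information matrix X'WX
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta
```

For a temporally adjusted model, each row might be `[distance_km, delta_weeks, distance_km * delta_weeks]` (hypothetical names). If the labels are perfectly separated by the predictors, maximum-likelihood estimates do not exist and the iterations diverge.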

Fig 4. Logit-transformed proportions of highly related 2001–2014 barcode parasite sample pairs with respect to inter-clinic distance.

Annotations refer to site comparisons using abbreviated clinic names (MLA for Maela, 212 parasite samples; WPA for Wang Pha, 457 parasite samples; MKK for Mae Kon Ken, 116 parasite samples; and MKT for Mawker Thai, 388 parasite samples). All parasite samples were single-infection. For an inter-clinic pair, A and B say, the proportion was based on nA × nB parasite sample pairs, where n denotes the number of parasite samples per clinic. Error bars represent 95% confidence intervals based on bootstrapping over highly related parasite sample pair labels (equal to 1 if the expected fraction IBD exceeded 0.5 and 0 otherwise), and are therefore zero where there are no highly related pairs.

https://doi.org/10.1371/journal.pgen.1007065.g004

Fig 5. Logit-transformed proportions of highly related 2001–2014 WGS parasite sample pairs plotted with respect to inter-clinic distance.

Annotations refer to site comparisons using abbreviated clinic names (MLA for Maela, 55 parasite samples; WPA for Wang Pha, 103 parasite samples; MKK for Mae Kon Ken, 4 parasite samples; and MKT for Mawker Thai, 16 parasite samples). All parasite samples were single-infection. For an inter-clinic pair, A and B say, the proportion was based on nA × nB parasite sample pairs, where n denotes the number of parasite samples per clinic. Error bars represent 95% confidence intervals based on bootstrapping over highly related parasite sample pair labels (equal to 1 if the expected fraction IBD exceeded 0.5 and 0 otherwise), and are therefore zero where there are no highly related pairs.

https://doi.org/10.1371/journal.pgen.1007065.g005

Unlike FST, IBD-based relatedness decreased with inter-clinic distance (Figs 1, 4 and 5), even after adjusting for heterogeneous temporal sampling within the regression model (Tables 2 and 3). Considering barcode data collected from 2001–2010, highly related parasite sample pairs were negatively associated with distance (km) both before (βunadjusted ΔDistance = -0.026, p-value = 0.002) and after (βadjusted ΔDistance = -0.023, p-value = 0.002) adjusting for temporal differences between parasite sample collection dates (Table 2). The spatial trend was of the same order as the temporal trend (βadjusted ΔWeeks = -0.021, p-value = 0.002). The impact of distance decreased with time, but the interaction was very small (βadjusted ΔWeeks × ΔDistance = 0.0001, p-value = 0.002). Importantly, the spatial and temporal trends were also negative upon exclusion of repeat barcodes within clinics (Table C of S2 Text).
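Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. A short worked example with the adjusted distance coefficient reported above (βadjusted ΔDistance = -0.023 per km). It assumes uncentred predictors, so that with the interaction term in the model this coefficient describes pairs collected at the same time (ΔWeeks = 0).

```python
import math

# Temporally adjusted distance coefficient (per km) for 2001-2010 barcode data, from the text above.
BETA_DISTANCE_PER_KM = -0.023


def odds_ratio(beta, delta):
    """Multiplicative change in the odds of a pair being highly related when a predictor changes by delta."""
    return math.exp(beta * delta)


for km in (1, 10, 50):
    print(f"{km:>3} km: odds ratio {odds_ratio(BETA_DISTANCE_PER_KM, km):.3f}")
```

Each additional kilometre multiplies the odds of a highly related pair by about 0.98. Over 10 km this compounds to about 0.79, and over 50 km to about 0.32.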

WGS data showed similarly negative spatial trends (Fig 5 and Table 3). Since contemporaneous WGS data from all four clinics were only available in 2014, we excluded prior years in the regression model, and found significant negative trends both before (βunadjusted ΔDistance = -0.023, p-value = 0.002) and after (βadjusted ΔDistance = -0.026, p-value = 0.002) adjustment for temporal differences between parasite sample collection dates. The trend based on data across all years was also significant, but only after adjustment for an overall increase in highly related parasite sample pairs in 2014 (βadjusted 2014 ΔDistance = -0.020, p-value = 0.035).

Independent evidence of spatial trends present in the data

IBD-based analyses recovered spatial and temporal trends where FST did not. Together with evidence of phenotypic differences in parasites across SMRU clinics [44], our results suggest IBD-based analyses are sensitive to local spatial genetic structure. To further validate these findings, we used ChromoPainter [26] to estimate average numbers of segments donated and received between parasites from different clinics, hereafter referred to as clinic-averaged co-ancestries (S3 Text). Like IBD-based analyses, ChromoPainter accounts for recombination [26], and it has been used to study malaria parasite populations in Cambodia [20].

Clinic-averaged WGS co-ancestry estimates showed a similar pattern to proportions of highly related parasite pairs, declining with inter-clinic distance (Fig B in S3 Text), thereby supporting the spatial genetic structure observed in our IBD-based analyses.

Clinic-averaged barcode co-ancestry estimates declined with distance only when considering both within-clinic and inter-clinic estimates (Fig D in S3 Text). Plots of pairwise estimates suggest that differences between inter-clinic averaged barcode co-ancestry estimates were unresolved because the range of estimates was narrow compared with that of the expected fraction IBD (Fig E in S3 Text), despite a positive correlation with the expected fraction IBD (Fig F in S3 Text). This is expected, however, since ChromoPainter is not intended for sparse barcode data.

Calculating sample sizes and sequencing effort required to recover spatial trends

To assess the sample sizes required to measure connectivity between proximal sites using IBD, we re-estimated trends using random subsets of the data across all years under temporally adjusted logistic regression models. Subsets ranged in size from 50 to 1171 barcode parasite samples, and from 50 to 176 WGS parasite samples. We also considered the impact of sequencing fewer SNPs, since many studies use a 24-SNP barcode (e.g. [38]). We used an ideal set of 24 SNPs with high minor allele frequency (Fig A in S4 Text), following the experimental design of a molecular barcode [53]. Our 24-SNP barcode results therefore represent a best-case scenario; “true” barcodes, which are constructed a priori, will almost surely deviate from this ideal due to spatiotemporal variations in minor allele frequencies.
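The "ideal" 24-SNP subset is defined by high minor allele frequency (MAF). The following is a minimal sketch of one way to pick such a subset, assuming a simple top-k ranking by MAF computed from haploid calls coded 0/1 with `None` for missing calls; the exact selection rule behind Fig A in S4 Text may differ.

```python
def minor_allele_frequency(column):
    """MAF of one biallelic SNP from haploid 0/1 calls, ignoring missing (None) calls."""
    calls = [c for c in column if c is not None]
    if not calls:
        return 0.0
    p = sum(calls) / len(calls)
    return min(p, 1.0 - p)


def top_maf_snps(genotypes, k=24):
    """Indices (in genome order) of the k SNPs with the highest MAF.

    genotypes: samples x SNPs list of lists (an assumed layout).
    """
    n_snps = len(genotypes[0])
    mafs = [minor_allele_frequency([row[j] for row in genotypes]) for j in range(n_snps)]
    ranked = sorted(range(n_snps), key=lambda j: mafs[j], reverse=True)
    return sorted(ranked[:k])
```

Because the ranking uses the study's own allele frequencies, the result is a best case. A barcode fixed in advance cannot adapt to local or changing frequencies.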

Fig 6 shows the relationship between sample size and the proportion of significant negative spatial trend estimates for different sequencing approaches. We estimated that approximately 147 WGS parasite samples, 222 93-SNP barcode parasite samples, and 344 24-SNP barcode parasite samples were sufficient to recover significant negative trends 95% of the time (Table 4). It is important to note, however, that spatial trend estimates based on only 24 SNPs converged to -0.004, whereas equivalent estimates based on 93 SNPs and on WGS data converged to -0.023 and -0.020, respectively (Tables 2 and 3). The comparatively weak spatial trend estimate based on only 24 SNPs was not unexpected. Previous studies have shown that barcode size limits the resolution of low genome-wide identity [38], while simulated data show rapidly decreasing accuracy in the expected fraction IBD with fewer SNPs [52]. Moreover, because the expected fraction IBD is bounded between 0 and 1, very wide error at low IBD is liable to bias 24-SNP estimates upward, so that they exceed both genome-wide estimates and the 0.5 threshold for highly related parasite sample pairs (Fig B in S4 Text), leading to poor resolution of spatiotemporal trends. We therefore do not recommend the use of 24-SNP barcodes for IBD-based analyses.
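The subsampling experiment summarized in Fig 6 can be outlined as follows. This is a sketch under stated assumptions. `trend_test` is a hypothetical callable that stands in for the full pipeline: subset the samples, label pairs, fit the temporally adjusted model and test the distance coefficient. The number of replicate subsets per size is also an assumption. The last helper simply takes the first size whose proportion reaches 95%; the paper instead fitted models to these proportions (Table 4).

```python
import random


def proportion_significant(n_samples_total, subset_size, trend_test, replicates=100, seed=1):
    """Fraction of random subsets for which trend_test reports a significant negative trend.

    trend_test: hypothetical callable taking a list of sample indices and returning
    True when the spatial trend estimated from those samples is significantly negative.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        subset = rng.sample(range(n_samples_total), subset_size)  # without replacement
        if trend_test(subset):
            hits += 1
    return hits / replicates


def smallest_sufficient_size(sizes, proportions, target=0.95):
    """First subset size (in increasing order) whose proportion reaches the target, else None."""
    for size, prop in sorted(zip(sizes, proportions)):
        if prop >= target:
            return size
    return None
```

Interpolating a fitted curve, as the paper did, gives a smoother estimate than reading off the first grid size that reaches the target.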

Fig 6. Proportions of significant negative spatial trend estimates with respect to data subset sample size.

Spatial trend estimates were based on regression of highly related parasite sample pair labels (equal to 1 if the expected fraction IBD exceeded 0.5 and 0 otherwise) onto distance, within temporally adjusted models (βadjusted ΔDistance (km) for barcode data, and βadjusted 2014 ΔDistance (km) for WGS data).

https://doi.org/10.1371/journal.pgen.1007065.g006

Table 4. Models fit to proportions of significant negative spatial trend estimates.

https://doi.org/10.1371/journal.pgen.1007065.t004

The 93 SNP barcode provided a surprisingly robust estimate of geographic structuring, however. Like chromosome painting methods [26], IBD-based analyses capture information based on dependence between SNPs [35]. Although inter-SNP distances are large on the 93-SNP barcode, barcode SNPs are mostly dependent under hmmIBD because the recombination rate is low (Fig C in S4 Text).

Discussion

Despite the potential utility of genetic data for resolving fine-scale differences in connectivity among proximal populations in P. falciparum and other species, there are minimal guidelines about how to quantify gene flow between proximal locations. Here we show that IBD-based relatedness provides a more robust measure of local spatial structure than FST. Moreover, where a trade-off must be made between sample size and sequencing effort, 93 SNPs were sufficient to recover robust spatial trends using relatively few additional parasite samples compared with WGS. This is an important practical insight given the wide availability of historical barcode data, and the relative cost-effectiveness and ease of generating barcode data compared to whole genome sequences. We therefore propose that IBD-based relatedness is not only a useful metric of gene flow between proximal populations, but also that it can be efficiently estimated using 93-SNP barcodes, which are inexpensive and can be generated from parasite DNA extracted from dried blood spots on filter papers.

FST estimates were strongly affected by clinics characterized by highly related parasites, and this association appeared to overwhelm spatial trends. With a view to monitoring malaria parasite populations, we consider this apparent sensitivity of FST potentially problematic for its routine use. This is especially true in regions of declining transmission, where fewer infections coincide with the emergence of increasingly clonal hotspots. Although IBD-based analyses were not completely impervious to high within-clinic relatedness, they retained their ability to recover spatial trends. Furthermore, since IBD-based analyses allow explicit estimation of within-clinic relatedness, its impact on relatedness across populations can be assessed.

We estimated that approximately 147 single-infection WGS parasite samples, or 222 single-infection 93-SNP barcode parasite samples, were required to recover robust spatial trends. In very low transmission settings, such as those where the number of cases has dropped below the World Health Organization’s pre-elimination threshold of 1 infection per 1000 persons per year, the number of parasite samples required to estimate spatial trends would in many cases exceed the number of cases. Here, the expected fraction IBD could still be used to assess relatedness between individual cases and suspected source populations. This may be critical given the World Health Organization’s definition of elimination, which requires no local cases for 3 years but allows for imported ones. Although our analyses suggest 93-SNP barcodes are sufficient to recover robust spatial trends at the population level, we do not recommend using 93-SNP barcodes for standalone analyses of individual parasite sample pairs due to the large expected error in that application ([52] and Fig B in S4 Text). In high transmission settings, overall relatedness will likely decrease due to increased recombination. To account for low population-level relatedness, one could genotype more SNPs and lower the threshold for highly related sample pairs. Ideally one would also use a model capable of estimating IBD from complex samples of multiple-genotype infections, since these are liable to increase in abundance with transmission [54]. Henden et al. recently proposed an IBD model that can support parasite samples with one or two parasite strains [31], although it does not currently output the expected fraction IBD directly. Models capable of supporting parasite samples with three or more strains are lacking.

Although this combined set of barcode and WGS data is one of the largest of its kind, the sampling design was not intended for the question at hand. However, despite uneven sampling in time and space we find evidence of spatial genetic structure on the Thai-Myanmar border, which is consistent with earlier reports of phenotypic differences between parasites from different clinics [44]. Evidence of spatial structure is also supported by results from an independent method, ChromoPainter [26]. Akin to IBD-based analyses, ChromoPainter leverages the wealth of haplotypic information in WGS data, but struggles to resolve variation in 93-SNP barcode data, for which it was never intended. Regardless of the method used, evidence of spatial structure calls for a better understanding of the drivers that sustain spatial trends. Epidemiological models parameterized by human mobility data have been used to estimate the spatial spread of pathogens in some cases [55–58], but data on human migration are difficult to obtain, particularly in sparsely populated areas and in regions near international borders, where there are political sensitivities around measuring migration.

Analyses of spatial genetic structure are common beyond malaria (e.g. studies of pollen dispersal [59–62]). Measures used are largely variants of IBS and therefore sensitive to the marker system and reference population [62]. Unlike IBS-based methods, IBD-based methods explicitly account for the marker scheme by conditioning on allele frequencies. They could thus prove useful as IBS surrogates in spatial studies of other recombining organisms [63–66].

In summary, we propose that IBD-based relatedness will prove useful in the malaria field and in other infectious disease systems to compare data collected from local sites, from areas with more complex topologies, and where data are available, to compare human and parasite movement. IBD-based relatedness could also prove useful beyond epidemiological applications to complement spatial analyses of other sexually recombining organisms.

Materials and methods

Data

The barcode data were generated as part of a longitudinal trial of artemisinin resistance and its genetic heritability [44], and later reanalyzed to identify correlates of declining malaria transmission [45]. Full details of sample collection and laboratory methods can be found in [44] and [45]. Briefly, 1731 filter paper blood spots were collected between 2001 and 2010 from hyper-parasitaemic patients (> 4% infected red blood cells) with uncomplicated P. falciparum malaria presenting at four SMRU clinics on the Thai-Myanmar border (Fig 1). DNA extracted using a two-step protocol was successfully genotyped at 93 SNPs using the Illumina GoldenGate platform. The 93 SNPs were distributed across the P. falciparum genome (Fig A in S4 Text) but excluded regions likely to be under strong selection (supporting information of [45]). In total, 558 parasite samples were considered multiple-infection (containing more than one P. falciparum genotype), based on 6 or more heteroallelic genotyping outcomes [45], while 1173 were considered single-infection. Analyses in this study were based on single-infection parasite samples only (S2 Table).

The WGS data were generated from 178 parasite samples collected between 2001 and 2014 from the same four clinics (S3 Table). Full details of sample collection and the sequencing workflow can be found in [40]. Briefly, parasite samples collected before 2010 were derived from a single-infection subset of the aforementioned dried blood spots, selected such that no two showed identical 93-SNP genotypes, and were sequenced following hybrid selection on an Illumina HiSeq 2500 platform. Parasite samples collected from 2010 onwards were collected as venous blood, leukocyte-depleted, and sequenced directly on an Illumina HiSeq 2500 platform. As described by Cerqueira and colleagues [40], reads were aligned to the P. falciparum 3D7 v3 reference genome, genotypes were called, and sites were filtered. Sites removed in [40] included heterozygous sites, indels, sites with QUAL < 60 or GQ < 30, polymorphic sites located in pericentromeric, subtelomeric and hypervariable regions, and sites in genes belonging to large antigenic gene families. In addition, we removed 121 sites whose reference or alternative allele assignments indicated potential indels, monomorphic sites, sites lacking genotype calls in 20% or more of the parasite samples, and mitochondrial and apicoplast sites, leaving a total of 34911 polymorphic biallelic SNPs.
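The two added site filters that depend only on the genotype matrix (removing monomorphic sites and sites lacking calls in 20% or more of samples) can be sketched in Python as follows. This is an illustration rather than the pipeline used here: it assumes a hypothetical haploid genotype matrix coded 0 (reference), 1 (alternative) and -1 (missing), and it omits the indel, organelle and quality filters.

```python
import numpy as np


def site_filter_mask(genotypes, max_missing=0.2):
    """Boolean mask of sites to keep: polymorphic and missing in < max_missing of samples.

    genotypes: (n_samples, n_sites) array coded 0 = reference allele,
    1 = alternative allele, -1 = missing call (a hypothetical encoding).
    """
    g = np.asarray(genotypes)
    # Fraction of samples lacking a call at each site.
    missing_frac = (g < 0).mean(axis=0)
    # A site is polymorphic if both alleles are observed among called samples.
    has_ref = (g == 0).any(axis=0)
    has_alt = (g == 1).any(axis=0)
    # Drop sites lacking calls in max_missing (here 20%) or more of the samples.
    return has_ref & has_alt & (missing_frac < max_missing)
```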

Data analyses

Aside from IBD estimates generated using hmmIBD (v2.0.0) [52] and co-ancestry estimates generated by the ChromoPainter package within fineSTRUCTURE version 2 [26], all data analyses were performed in R [67]. P-values were calculated by permutation and considered significant if less than 0.05. They were exact if the number of possible permutations was less than 1000; otherwise they were Monte Carlo estimates [68]. Monte Carlo p-values can overestimate true p-values [69]; however, the overestimation is small when the number of randomly sampled permutations, n, is large (at least 99 [68]). We used n = 100 when assessing the sensitivity of spatial trends to sample size (see below) and in sensitivity tests (Fig Q in S2 Text), and n = 1000 otherwise. All p-values were two-tailed, except those for FST estimates (Tables 1 and 2 of S1 Text), which were one-tailed because FST is non-negative. Two-tailed p-values were calculated by summing over the left- and right-hand tails.
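As a compact illustration of these p-value conventions, the Python sketch below converts an observed statistic and its permutation null distribution into a two-tailed p-value by summing the left and right tails about zero. The (r + 1)/(n + 1) form for Monte Carlo estimates follows the convention in [68]; that this exact form was used here, that the null is centred on zero (as for a regression slope), and the small tie tolerance are our assumptions.

```python
import numpy as np


def two_tailed_p(observed, null_stats, exact):
    """Two-tailed permutation p-value obtained by summing left and right tails.

    exact=True: null_stats holds the statistic for every possible permutation
    (including the observed ordering), so p = r / N.
    exact=False: null_stats holds n randomly sampled permutations and the
    Monte Carlo estimate p = (r + 1) / (n + 1) is returned.
    Assumes the null distribution is centred on zero (e.g. a regression slope).
    """
    null = np.asarray(null_stats, dtype=float)
    tol = 1e-12  # treat floating-point near-ties as ties
    left = np.sum(null <= -abs(observed) + tol)
    right = np.sum(null >= abs(observed) - tol)
    r = int(left + right)
    n = null.size
    p = r / n if exact else (r + 1) / (n + 1)
    return min(1.0, p)
```

For an exact test, one would pass the statistic computed under all n! orderings (for example 6! = 720) with exact=True; otherwise 1000 (or 100) sampled permutations with exact=False.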

Estimates of divergence between population pairs.

Pairwise divergence estimates were based on Wright’s fixation index (FST) [29,30]. To estimate FST, we used Hudson’s estimator [48], whose explicit formulation can be found in [47] and [49]. We chose this particular estimator because it is recommended for small and unequal sample sizes [49,50], and is asymptotically consistent in the number of loci [47,49]. The significance of each pairwise FST estimate was tested by 1000 permutations of the clinic labels, thereby assuming FST = 0 under the null hypothesis. Following [70], 95% confidence intervals were obtained by bootstrapping over SNPs 1000 times.
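A minimal Python sketch of the pairwise FST procedure follows, assuming haploid genotypes coded 0/1 with no missing calls. The per-SNP numerator and denominator follow the ratio-of-averages form of Hudson's estimator as written in [49], with sample sizes taken as numbers of haploid parasite genomes. The percentile bootstrap interval and the (r + 1)/(n + 1) one-tailed p-value are illustrative choices, and fst_with_uncertainty is a hypothetical helper rather than the code used for this study.

```python
import numpy as np


def hudson_fst(g1, g2):
    """Hudson FST as a ratio of averages over SNPs for haploid 0/1 genotypes.

    g1, g2: (samples, SNPs) arrays for two clinics with no missing calls.
    """
    n1, n2 = g1.shape[0], g2.shape[0]
    p1, p2 = g1.mean(axis=0), g2.mean(axis=0)
    # Per-SNP numerator with finite-sample corrections, and per-SNP denominator.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()


def fst_with_uncertainty(g1, g2, n_boot=1000, n_perm=1000, seed=0):
    """Point estimate, 95% bootstrap CI over SNPs, and one-tailed permutation p."""
    rng = np.random.default_rng(seed)
    g1, g2 = np.asarray(g1, dtype=float), np.asarray(g2, dtype=float)
    obs = hudson_fst(g1, g2)
    n_snps = g1.shape[1]
    # Bootstrap: resample SNPs (columns) with replacement.
    boots = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_snps, size=n_snps)
        boots.append(hudson_fst(g1[:, idx], g2[:, idx]))
    lower, upper = np.percentile(boots, [2.5, 97.5])
    # Permutation: shuffle clinic labels across pooled samples (FST = 0 under H0).
    pooled = np.vstack([g1, g2])
    n1 = g1.shape[0]
    null = []
    for _ in range(n_perm):
        order = rng.permutation(pooled.shape[0])
        null.append(hudson_fst(pooled[order[:n1]], pooled[order[n1:]]))
    p = (np.sum(np.asarray(null) >= obs) + 1) / (n_perm + 1)
    return obs, (lower, upper), p
```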

Estimates of relatedness between parasite sample pairs.

Relatedness estimates were based on the expected fraction IBD, a probabilistic measure of the fraction of the genome inherited by a pair of parasites from a recent common ancestor. For all pairwise comparisons of parasite samples in the barcode and WGS data sets, we estimated the expected fraction IBD using hmmIBD [52]. Specifically, we calculated the posterior probability of the IBD state at the position of each SNP (equation 38 in [71]) using the forward-backward algorithm as described in [71], then averaged these probabilities over the T SNPs across the genome. The resulting estimate can be interpreted as the expected number of IBD state assignments over the genome divided by the number of SNPs, and thus estimates the fraction of the genome inherited by a pair of parasites from a recent common ancestor. Unlike estimates based on the Viterbi path [71], it does not depend on a single state sequence assignment. hmmIBD requires the position of each SNP in base pairs, since the probability of transitioning between IBD and non-IBD states is modeled as a function of the distance between SNPs (S4 Text and [52]). Although sequencing data are sometimes pruned to account for linkage disequilibrium when inferring IBD [72], we did not prune our WGS data. hmmIBD also requires point estimates of allele frequencies in order to calculate the probability of observing concordant or discordant genotype calls in a pair of parasite samples. Given little evidence of differences in allele frequency point estimates across clinics or over years (S4 Table), we used allele frequencies estimated from data pooled across all years and clinics, separately for the barcode and WGS data.
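To make the averaging of posterior IBD probabilities concrete, the Python sketch below runs a scaled forward-backward pass over a simplified two-state (non-IBD, IBD) hidden Markov model for one pair of haploid genotypes on a single chromosome. It is not hmmIBD's exact parametrization: the emission model (independent allele-frequency draws when not IBD, shared alleles up to a genotyping error rate eps when IBD), the distance-dependent switching rule, and the values of k, pi_ibd and eps are assumptions. The default recombination rate reuses the 7.4e-7 Morgans per base pair quoted in the co-ancestry section.

```python
import numpy as np


def posterior_ibd(g1, g2, freqs, pos_bp, k=10.0, rho=7.4e-7, pi_ibd=0.5, eps=0.001):
    """Posterior P(IBD) at each SNP for one haploid pair on a single chromosome.

    g1, g2: 0/1 allele calls; freqs: population frequency of allele 1 per SNP;
    pos_bp: sorted SNP positions in base pairs. k (generations), pi_ibd (prior
    IBD probability) and eps (genotyping error rate) are assumed values.
    """
    g1, g2 = np.asarray(g1), np.asarray(g2)
    f = np.asarray(freqs, dtype=float)
    pos = np.asarray(pos_bp, dtype=float)
    n_snps = g1.size
    # Probability of each observed allele under the population allele frequencies.
    f1 = np.where(g1 == 1, f, 1 - f)
    f2 = np.where(g2 == 1, f, 1 - f)
    # Emissions: column 0 = not IBD (independent draws), column 1 = IBD (shared
    # allele, discordant calls only through error rate eps).
    emit = np.column_stack([f1 * f2, f1 * np.where(g1 == g2, 1 - eps, eps)])
    stationary = np.array([1 - pi_ibd, pi_ibd])

    def transition(d_bp):
        # Chance of no ancestry switch decays with genetic distance k * rho * d.
        stay = np.exp(-k * rho * d_bp)
        return stay * np.eye(2) + (1 - stay) * np.tile(stationary, (2, 1))

    # Scaled forward pass.
    alpha = np.empty((n_snps, 2))
    scale = np.empty(n_snps)
    alpha[0] = stationary * emit[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, n_snps):
        alpha[t] = (alpha[t - 1] @ transition(pos[t] - pos[t - 1])) * emit[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]
    # Scaled backward pass, reusing the forward scale factors.
    beta = np.ones((n_snps, 2))
    for t in range(n_snps - 2, -1, -1):
        beta[t] = transition(pos[t + 1] - pos[t]) @ (emit[t + 1] * beta[t + 1])
        beta[t] /= scale[t + 1]
    # Posterior state probabilities; column 1 is P(IBD | data).
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma[:, 1]
```

The expected fraction IBD for the pair is then the mean of the returned vector; with several chromosomes, one would run the model per chromosome and average over all SNPs.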

Highly related parasite sample pairs.

To investigate spatial trends, we focused on highly related parasite sample pairs (pairs classified as highly related using a threshold of 0.5 on the expected fraction IBD), since such pairs were considered more likely to reflect recent migration (S2 Text). The 0.5 threshold was based on empirical density plots of the expected fraction IBD (Fig A to J in S2 Text) and is approximately equal to the mean relatedness between progeny derived from experimental P. falciparum crosses [73]. For barcode parasite sample pairs, even large shifts of the threshold away from 0.5 recovered significant negative spatial trends among highly related pairs (see below for details; Fig Q in S2 Text, top row), whereas for WGS parasite sample pairs only small shifts around 0.5 recovered significant spatial trends (Fig Q in S2 Text, bottom row).

Logit-transformed proportions of highly related parasite sample pairs were calculated within and across clinics and plotted against inter-clinic distance (non-transformed equivalents can be found in S2 Text). 95% confidence intervals were generated by bootstrapping the highly related parasite sample pair labels (equal to 1 if the pair is highly related and 0 otherwise) 1000 times, and are thus zero where there are no highly related parasite sample pairs.
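The proportion and its bootstrap interval can be sketched in Python as follows, assuming a 95% percentile interval (the interval type is not stated above); the function names are hypothetical. Because every bootstrap resample of an all-zero label vector has proportion zero, the interval collapses to zero when there are no highly related pairs, and the logit of zero is negative infinity.

```python
import numpy as np


def bootstrap_proportion(labels, n_boot=1000, seed=0):
    """Proportion of highly related pairs with a 95% percentile bootstrap interval.

    labels: 1 if a pair is highly related, 0 otherwise.
    """
    y = np.asarray(labels, dtype=float)
    rng = np.random.default_rng(seed)
    boots = [rng.choice(y, size=y.size, replace=True).mean() for _ in range(n_boot)]
    lower, upper = np.percentile(boots, [2.5, 97.5])
    return y.mean(), lower, upper


def logit(p):
    """Logit transform; returns -inf at p = 0 and +inf at p = 1."""
    p = np.asarray(p, dtype=float)
    with np.errstate(divide="ignore"):
        return np.log(p) - np.log1p(-p)
```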

Clinic-averaged co-ancestry estimates.

To support results based on highly related parasite sample pairs, we sought additional evidence of spatial structure in the data. We used the ChromoPainter package, implemented within the standard pipeline of fineSTRUCTURE version 2 (fs-2.1.1) [26], to estimate the co-ancestry matrices of the WGS and barcode data. Linked analyses were performed using a uniform recombfile with a recombination rate of 7.4 × 10⁻⁷ Morgans per base pair [74], while unlinked analyses were performed by omitting the recombfile. For both linked and unlinked analyses, we set ploidy equal to one. The co-ancestry matrix contains estimated counts of DNA segments received and donated between parasites, where each SNP is considered an independent segment under the unlinked analysis. To construct clinic-averaged co-ancestry estimates, we averaged the received and donated counts within and across clinics. To explore spatial structure and concordance with IBD-based analyses, we regressed clinic-averaged co-ancestry estimates onto inter-clinic distance and compared them with corresponding estimates based on IBD (S3 Text).
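A minimal Python sketch of the clinic averaging step is given below. It assumes a co-ancestry matrix C in which C[i, j] is the estimated count of segments sample i receives from donor j (so C[j, i] is what i donates to j) and that self-comparisons are excluded; reading ChromoPainter output files is not shown, and the function name is hypothetical.

```python
import numpy as np


def clinic_averaged_coancestry(C, clinics):
    """Average received and donated co-ancestry counts within and across clinics.

    C[i, j]: estimated count of segments that sample i receives from donor j.
    clinics: clinic label for each sample (row/column order of C).
    Returns {(clinic_a, clinic_b): average}, NaN where no sample pairs exist.
    """
    C = np.asarray(C, dtype=float)
    labels = np.asarray(clinics)
    names = sorted(set(labels.tolist()))
    averages = {}
    for ia, a in enumerate(names):
        for b in names[ia:]:
            values = []
            for i in np.flatnonzero(labels == a):
                for j in np.flatnonzero(labels == b):
                    if i != j:
                        # i receives from j, and i donates to j (= j receives from i).
                        values.extend([C[i, j], C[j, i]])
            averages[(a, b)] = float(np.mean(values)) if values else float("nan")
    return averages
```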

Spatial trends in divergence.

Divergence trends were assessed by regressing FST estimates onto inter-clinic great circle distance in kilometers (km), where great circle distance is the shortest distance between two clinics along the earth's surface. Specifically, we estimated spatial trends using untransformed variables under the regression model FST = intercept + (*β × inter-clinic distance) + ε, where *β represents the spatial trend and ε is assumed to be normally distributed with mean zero and variance σ². In addition, following [51], we explored trends in transformed variables. We fit FST / (1 − FST) = intercept + (†β × inter-clinic distance) + ε, where †β represents the spatial trend under a one-dimensional model of isolation by distance, and FST / (1 − FST) = intercept + (‡β × log(inter-clinic distance)) + ε, where ‡β represents the spatial trend under a two-dimensional model of isolation by distance; in both models, ε is as defined above. With four clinics there are six pairwise FST estimates, so two-tailed exact p-values were generated using all 6! = 720 permutations of the FST estimates, assuming no trend under the null hypothesis.
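The distance calculation and the exact permutation test can be sketched in Python as follows. The haversine formula and the 6371 km mean Earth radius are our assumptions about how great circle distance was computed; the three model options mirror the untransformed, one-dimensional and two-dimensional regressions above, and near-ties are handled with a small tolerance.

```python
import itertools
import math

import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def ols_slope(x, y):
    """Least-squares slope of y on x (the intercept is implicit)."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))


def fst_distance_trend(fst, dist_km, model="untransformed"):
    """Spatial trend in FST and its exact two-tailed permutation p-value."""
    fst = np.asarray(fst, dtype=float)
    d = np.asarray(dist_km, dtype=float)
    if model == "untransformed":  # FST ~ distance
        x, y = d, fst
    elif model == "1d":  # FST/(1-FST) ~ distance
        x, y = d, fst / (1 - fst)
    elif model == "2d":  # FST/(1-FST) ~ log(distance)
        x, y = np.log(d), fst / (1 - fst)
    else:
        raise ValueError(f"unknown model: {model}")
    observed = ols_slope(x, y)
    # Enumerate every permutation of the responses (6! = 720 for six clinic pairs).
    null = np.array([ols_slope(x, y[list(perm)])
                     for perm in itertools.permutations(range(len(y)))])
    tol = 1e-12  # treat floating-point near-ties as ties
    r = np.sum(null <= -abs(observed) + tol) + np.sum(null >= abs(observed) - tol)
    return observed, min(1.0, r / len(null))
```

With only 720 permutations, the smallest attainable p-value is 1/720, so exact p-values from six clinic pairs are necessarily coarse.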

Spatial trends in relatedness.

IBD-based relatedness trends were assessed by regressing highly related parasite sample pair labels (equal to 1 if the pair is highly related and 0 otherwise) onto inter-clinic distance (km) under a logistic regression framework, with and without temporal predictors, as outlined below and in S2 Text. Models fit without temporal predictors included an intercept term, inter-clinic distance (ΔDistance, measured in km as defined above for FST), and a predictor per clinic (equal to one if both parasite samples in a pair were collected at the corresponding clinic and zero otherwise) to account for variation between clinics at ΔDistance = 0. The structure of the temporally adjusted models was based on a preliminary analysis of eight models fit to the barcode data (full details can be found in S2 Text). The eight models were compared using the Akaike information criterion (AIC), a model comparison score for which lower values indicate a preferred model [75]. Temporal predictors included ΔTime, the time between parasite sample collection dates measured to the nearest day, week, month or year; Season, equal to one if the two parasite samples in a pair were collected in spring, summer or both; and interactions between ΔTime, Season and ΔDistance. The chosen model included four temporal predictors: Season, ΔTime measured in weeks (ΔWeeks), and two interaction terms (ΔWeeks × Season and ΔWeeks × ΔDistance), allowing the effects of season and inter-clinic distance to vary with the number of weeks between parasite sample collection dates. To account for an increase in IBD in 2014 (Fig N in S2 Text), we introduced an additional predictor for models fit to the WGS data, year 2014, which was true if both parasite samples in a pair were collected in 2014 and false otherwise. Regression coefficient estimates fit under models with and without temporal predictors were denoted βadjusted and βunadjusted, respectively, and those fit under the model with the additional year 2014 predictor were denoted βadjusted 2014.
Two-tailed Monte Carlo p-values for the regression coefficients were generated by 1000 permutations of the highly related parasite sample pair labels, assuming non-intercept regression coefficients equal to zero under the null hypothesis.
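The unadjusted logistic model and its permutation p-values can be sketched in Python as below. The Newton-Raphson (IRLS) fitting routine and the helper names are our own; the temporal predictors (Season, ΔWeeks and their interactions) and the year 2014 indicator would enter as extra design-matrix columns and are omitted for brevity. The (r + 1)/(n + 1) Monte Carlo form is assumed.

```python
import numpy as np


def design_matrix(dist_km, clinic_a, clinic_b, clinics):
    """Intercept, ΔDistance, and one within-clinic indicator per clinic."""
    d = np.asarray(dist_km, dtype=float)
    a, b = np.asarray(clinic_a), np.asarray(clinic_b)
    cols = [np.ones_like(d), d]
    for c in clinics:
        # 1 if both samples in the pair come from clinic c, else 0.
        cols.append(((a == c) & (b == c)).astype(float))
    return np.column_stack(cols)


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS); returns (coefs, converged)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            return beta, True
    return beta, False


def permutation_p(X, y, coef, n_perm=1000, seed=0):
    """Two-tailed Monte Carlo p-value for one coefficient by permuting the labels."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = fit_logistic(X, y)[0][coef]
    null = np.array([fit_logistic(X, rng.permutation(y))[0][coef]
                     for _ in range(n_perm)])
    r = np.sum(null <= -abs(observed)) + np.sum(null >= abs(observed))
    return observed, (r + 1) / (n_perm + 1)
```

In this sketch, a clinic with no highly related within-clinic pairs makes its indicator coefficient diverge, and fit_logistic then reports converged=False.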

The sensitivity of spatial trends to the sample size.

Sensitivity to sample size was assessed by sequentially sampling random subsets of the data and re-estimating spatial trends. Spatial trend estimates were generated under the temporally adjusted logistic regression framework, with the year 2014 predictor for the WGS data, thereby accounting for fluctuations in sample sizes over clinics and time induced by random sampling. Random subsets ranged from 50 to 176 parasite samples for the WGS data, and from 50 to 1171 for the barcode data. We also considered a smaller barcode by reanalyzing the barcode data using the 24 of the 93 SNPs with the highest minor allele frequencies. For each subset size, subsampling was repeated 100 times. For each random subset, two-tailed Monte Carlo p-values were generated by 100 permutations of the highly related parasite sample pair labels (equal to one if the pair is highly related and zero otherwise). We then calculated the proportion of spatial trend estimates that were negative and significant for each subset size, x, including only estimates from logistic regression analyses that successfully converged, and fit logistic curves to these proportions using nonlinear least squares (Table 4). We added two artificial data points equal to 0.025 at x = 0 to constrain the curves to tend to 0.025 as x tends to 0; 0.025 is the proportion expected under the null hypothesis of no spatial trend, since half of the 5% of two-tailed tests that are significant by chance would be negative. Finally, we found the sample sizes at which 95% of spatial trend estimates are expected to be negative and significant by inverting the fitted nonlinear curves.
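The curve-fitting and inversion step can be sketched in Python as follows, assuming a two-parameter logistic curve p(x) = 1/(1 + exp(-(a + b x))) fit by Gauss-Newton from a logit-scale linear starting point. The text specifies only nonlinear least squares, so the parametrization, solver and helper names are assumptions. Inverting the fitted curve gives x = (logit(target) - a)/b.

```python
import numpy as np


def logistic_curve(x, a, b):
    """Two-parameter logistic curve p(x) = 1 / (1 + exp(-(a + b x)))."""
    return 1.0 / (1.0 + np.exp(-(a + b * np.asarray(x, dtype=float))))


def fit_logistic_curve(x, y, n_iter=100):
    """Nonlinear least-squares fit of logistic_curve by Gauss-Newton iterations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    # Starting values from a straight-line fit on the logit scale.
    yc = np.clip(y, 1e-3, 1 - 1e-3)
    b, a = np.polyfit(x, np.log(yc / (1 - yc)), 1)
    for _ in range(n_iter):
        p = logistic_curve(x, a, b)
        # Jacobian of the curve with respect to (a, b).
        jac = np.column_stack([p * (1 - p), p * (1 - p) * x])
        step, *_ = np.linalg.lstsq(jac, y - p, rcond=None)
        a, b = a + step[0], b + step[1]
        if np.max(np.abs(step)) < 1e-12:
            break
    return a, b


def sample_size_for(target, a, b):
    """Invert the fitted curve: subset size x at which the predicted proportion is target."""
    return (np.log(target / (1 - target)) - a) / b
```

The two artificial points (x = 0, y = 0.025) are simply appended to the observed subset sizes and proportions before calling fit_logistic_curve.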

Supporting information

S1 Table. Summary of previous population genetic analyses of the barcode data in the context of declining malaria transmission, as reported in Nkhoma SC, et al. Mol Ecol. 2013;22(2) [45].

Figure references refer to figures in [45].

https://doi.org/10.1371/journal.pgen.1007065.s001

(PDF)

S2 Table. Single-infection barcode data parasite sample counts per clinic and year.

Clinic code: MLA (Maela), WPA (Wang Pha), MKK (Mae Kon Ken) and MKT (Mawker Thai).

https://doi.org/10.1371/journal.pgen.1007065.s002

(PDF)

S3 Table. Single-infection WGS data parasite sample counts per clinic and year.

Clinic code: MLA (Maela), WPA (Wang Pha), MKK (Mae Kon Ken) and MKT (Mawker Thai).

https://doi.org/10.1371/journal.pgen.1007065.s003

(PDF)

S4 Table. Numbers of SNPs for which clinic and year variables were significant in a simple linear model.

The simple linear model was as follows: non-reference allele frequency = intercept + βclinic × clinic + βyear × year + ε, where ε was normally distributed with mean 0 and variance σ².

https://doi.org/10.1371/journal.pgen.1007065.s004

(PDF)

S1 Text. Additional details and analyses regarding FST.

https://doi.org/10.1371/journal.pgen.1007065.s005

(PDF)

S2 Text. Additional details and analyses regarding the expected fraction IBD.

https://doi.org/10.1371/journal.pgen.1007065.s006

(PDF)

S3 Text. Additional details and plots of co-ancestry estimates.

https://doi.org/10.1371/journal.pgen.1007065.s007

(PDF)

S4 Text. Additional details and plots of 93- and 24-SNP expected fraction IBD estimates.

https://doi.org/10.1371/journal.pgen.1007065.s008

(PDF)

Acknowledgments

We thank Salma Al Saai, Shalini Nair and Marina McDew-White for generating the barcode data, Seth Redmond for help mapping barcode SNPs to P. falciparum 3D7 v3, and Hsiao-Han Chang, Daniel Parker, Pierre Jacob and Nick Patterson for helpful discussions and suggestions. We also thank the reviewers for many helpful and illuminating comments and suggestions.

References

1. Gardy J, Loman NJ, Rambaut A. Real-time digital pathogen surveillance: the time is now. Genome Biol. 2015;16(1):155. pmid:27391693
2. Neher RA, Bedford T. nextflu: Real-time tracking of seasonal influenza virus evolution in humans. Bioinformatics. 2015;31(21):3546–8. pmid:26115986
3. Bielejec F, Baele G, Vrancken B, Suchard MA, Rambaut A, Lemey P. SpreaD3: Interactive Visualization of Spatiotemporal History and Trait Evolutionary Processes. Mol Biol Evol. 2016;33(8):2167–9. pmid:27189542
4. Hall M, Woolhouse M, Rambaut A. Epidemic Reconstruction in a Phylogenetics Framework: Transmission Trees as Partitions of the Node Set. PLoS Comput Biol. 2015;11(12):1–36.
5. De Maio N, Wu CH, Wilson DJ. SCOTTI: Efficient Reconstruction of Transmission within Outbreaks with the Structured Coalescent. PLoS Comput Biol. 2016;12(9):1–23.
6. Klinkenberg D, Backer J, Didelot X, Colijn C, Wallinga J. New method to reconstruct phylogenetic and transmission trees with sequence data from infectious disease outbreaks; 2016. Preprint. Available from bioRxiv 10.1101/069195. Cited 4 October 2017.
7. Didelot X, Fraser C, Gardy J, Colijn C. Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks. Mol Biol Evol. 2017;34(4):997–1007. pmid:28100788
8. Carter R, Mendis KN, Roberts D. Spatial targeting of interventions against malaria. Bull World Health Organ. 2000;78(12):1401–11. pmid:11196487
9. Worby CJ, Lipsitch M, Hanage WP. Within-Host Bacterial Diversity Hinders Accurate Reconstruction of Transmission Networks from Genomic Distance Data. PLoS Comput Biol. 2014;10(3):e1003549. pmid:24675511
10. Volkman SK, Neafsey DE, Schaffner SF, Park DJ, Wirth DF. Harnessing genomics and genome biology to understand malaria biology. Nat Rev Genet. 2012;13:315. pmid:22495435
11. Mita T, Venkatesan M, Ohashi J, Culleton R, Takahashi N, Tsukahara T, et al. Limited geographical origin and global spread of sulfadoxine-resistant dhps alleles in Plasmodium falciparum populations. J Infect Dis. 2011;204(12):1980–8. pmid:22021623
12. Ariey F, Fandeur T, Durand R, Randrianarivelojosia M, Jambou R, Legrand E, et al. Invasion of Africa by a single pfcrt allele of South East Asian type. Malar J. 2006;5(1):34.
13. Roper C, Pearce R, Nair S, Sharp B, Nosten F, Anderson T. Intercontinental Spread of Pyrimethamine-Resistant Malaria. Science. 2004;305(5687):1124. pmid:15326348
14. Patel JC, Taylor SM, Juliao PC, Parobek CM, Janko M, Gonzalez LD, et al. Genetic evidence of importation of drug-resistant Plasmodium falciparum to Guatemala from the Democratic Republic of the Congo. Emerg Infect Dis. 2014;20(6):932–40. pmid:24856348
15. Branch OH, Sutton PL, Barnes C, Castro JC, Hussin J, Awadalla P, et al. Plasmodium falciparum genetic diversity maintained and amplified over 5 years of a low transmission endemic in the Peruvian Amazon. Mol Biol Evol. 2011;28(7):1973–86. pmid:21109587
16. Orjuela-Sánchez P, Da Silva-Nunes M, Da Silva NS, Scopel KKG, Gonçalves RM, Malafronte RS, et al. Population dynamics of genetically diverse Plasmodium falciparum lineages: community-based prospective study in rural Amazonia. Parasitology. 2009;136(10):1097–105. pmid:19631016
17. Mobegi VA, Loua KM, Ahouidi AD, Satoguina J, Nwakanma DC, Amambua-Ngwa A, et al. Population genetic structure of Plasmodium falciparum across a region of diverse endemicity in West Africa. Malar J. 2012;11(1):223.
18. Conway DJ, Fanello C, Lloyd JM, Al-Joubori BMAS, Baloch AH, Somanath SD, et al. Origin of Plasmodium falciparum malaria is traced by mitochondrial DNA. Mol Biochem Parasitol. 2000;111(1):163–71. pmid:11087926
19. Miotto O, Amato R, Ashley EA, MacInnis B, Almagro-Garcia J, Amaratunga C, et al. Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat Genet. 2015;47(3):226–34. pmid:25599401
20. Miotto O, Almagro-Garcia J, Manske M, Macinnis B, Campino S, Rockett KA, et al. Multiple populations of artemisinin-resistant Plasmodium falciparum in Cambodia. Nat Genet. 2013;45(6):648–55. pmid:23624527
21. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, Maslen G, et al. Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature. 2012;487(7407):375–9. pmid:22722859
22. Campino S, Auburn S, Kivinen K, Zongo I, Ouedraogo JB, Mangano V, et al. Population genetic analysis of Plasmodium falciparum parasites using a customized Illumina GoldenGate genotyping assay. PLoS One. 2011;6(6):e20251. pmid:21673999
23. Coleman M, Coleman M, Mabuza AM, Kok G, Coetzee M, Durrheim DN. Using the SaTScan method to detect local malaria clusters for guiding malaria control programmes. Malar J. 2009;8:68. pmid:19374738
24. Epperson BK, Li T. Measurement of genetic structure within populations using Moran's spatial autocorrelation statistics. Proc Natl Acad Sci. 1996;93(19):10528–32. pmid:8927628
25. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics. 2000;155(2):945–59. pmid:10835412
26. Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of Population Structure using Dense Haplotype Data. PLoS Genet. 2012;8(1):e1002453. pmid:22291602
27. Sturrock HJW, Bennett AF, Midekisa A, Gosling RD, Gething PW, Greenhouse B. Mapping Malaria Risk in Low Transmission Settings: Challenges and Opportunities. Trends Parasitol. 2016;32(8):635–45. pmid:27238200
28. Greenhouse B, Smith DL. Malaria genotyping for epidemiologic surveillance. Proc Natl Acad Sci. 2015;112(22):6782–3. pmid:26016526
29. Wright S. Isolation by distance. Genetics. 1943;28:114–38. pmid:17247074
30. Wright S. The Genetical Structure of Populations. Ann Eugen. 1951;15:322–54.
31. Henden L, Lee S, Mueller I, Barry A, Bahlo M. Detecting Selection Signals In Plasmodium falciparum Using Identity-by-Descent Analysis; 2016. Preprint. Available from bioRxiv 10.1101/088039. Cited 4 October 2017.
32. Omedo I, Mogeni P, Bousema T, Rockett K, Amambua-Ngwa A, Oyier I, et al. Micro-epidemiological structuring of Plasmodium falciparum parasite populations in regions with varying transmission intensities in Africa. Wellcome Open Res. 2017;2(10).
33. Gusev A, Lowe JK, Stoffel M, Daly MJ, Altshuler D, Breslow JL, et al. Whole population, genome-wide mapping of hidden relatedness. Genome Res. 2009;19:318–26. pmid:18971310
34. Anderson TJC, Williams JT, Nair S, Sudimack D, Barends M, Jaidee A, et al. Inferred relatedness and heritability in malaria parasites. Proc R Soc London B Biol Sci. 2010;277(1693):2531–40.
35. Thompson EA. Identity by Descent: Variation in Meiosis, Across Genomes, and in Populations. Genetics. 2013;194(2):301–26. pmid:23733848
36. Wright S. The Interpretation of Population Structure by F-Statistics with Special Regard to Systems of Mating. Evolution. 1965;19(3):395–420.
37. Browning SR, Browning BL. Identity by Descent Between Distant Relatives: Detection and Applications. Annu Rev Genet. 2012;46:617–33. pmid:22994355
38. Daniels RF, Schaffner SF, Wenger EA, Proctor JL, Chang H-H, Wong W, et al. Modeling malaria genomics reveals transmission decline and rebound in Senegal. Proc Natl Acad Sci. 2015;112(22):7067–72. pmid:25941365
39. Wong W, Griggs AD, Daniels RF, Schaffner SF, Ndiaye D, Bei AK, et al. Genetic relatedness analysis reveals the cotransmission of genetically related Plasmodium falciparum parasites in Thiès, Senegal. Genome Med. 2017;9(1):5. pmid:28118860
40. Cerqueira GC, Cheeseman IH, Schaffner SF, Nair S, McDew-White M, Phyo AP, et al. Longitudinal genomic surveillance of Plasmodium falciparum malaria parasites reveals complex genomic architecture of emerging artemisinin resistance. Genome Biol. 2017;18(1):78. pmid:28454557
41. Carrara VI, Lwin KM, Phyo AP, Ashley E, Wiladphaingern J, Sriprawat K, et al. Malaria Burden and Artemisinin Resistance in the Mobile and Migrant Population on the Thai-Myanmar Border, 1999–2011: An Observational Study. PLoS Med. 2013;10(3):1999–2011.
42. WHO. World malaria report 2016. World Health Organization. 2016.
43. Paul REL, Hackford I, Brockman A, Muller-Graf C, Price R, Luxemburger C, et al. Transmission intensity and Plasmodium falciparum diversity on the northwestern border of Thailand. Am J Trop Med Hyg. 1998;58(2):195–203. pmid:9502604
44. Phyo AP, Nkhoma S, Stepniewska K, Ashley EA, Nair S, McGready R, et al. Emergence of artemisinin-resistant malaria on the western border of Thailand: A longitudinal study. Lancet. 2012;379(9830):1960–6. pmid:22484134
45. Nkhoma SC, Nair S, Al-Saai S, Ashley E, McGready R, Phyo AP, et al. Population genetic correlates of declining transmission in a human pathogen. Mol Ecol. 2013;22(2):273–85. pmid:23121253
46. Kajeechiwa L, Thwin MM, Shee PW, Yee NL, Elvina E, Peapah P, et al. The acceptability of mass administrations of anti-malarial drugs as part of targeted malaria elimination in villages along the Thai–Myanmar border. Malar J. 2016;15(1):494. pmid:27677694
47. Reich D, Thangaraj K, Patterson N, Price AL, Singh L. Reconstructing Indian Population History. Nature. 2009;461(7263):489–94. pmid:19779445
48. Hudson RR, Slatkin M, Maddison WP. Estimation of levels of gene flow from DNA sequence data. Genetics. 1992;132(2):583–9. pmid:1427045
49. Bhatia G, Patterson N, Sankararaman S, Price AL. Estimating and interpreting FST: The impact of rare variants. Genome Res. 2013;23(9):1514–21. pmid:23861382
50. Willing EM, Dreyer C, van Oosterhout C. Estimates of genetic differentiation measured by FST do not necessarily require large sample sizes when using many SNP markers. PLoS One. 2012;7(8):e42649. pmid:22905157
51. Rousset F. Genetic Differentiation and Estimation of Gene Flow from F-Statistics Under Isolation by Distance. Genetics. 1997;145(4):1219–28. pmid:9093870
52. Schaffner SF, Taylor AR, Wong W, Wirth DF, Neafsey DE. hmmIBD: software to infer pairwise identity by descent between haploid genotypes; 2017. Preprint. Available from bioRxiv 10.1101/188078. Cited 4 October 2017.
53. Daniels R, Volkman SK, Milner DA, Mahesh N, Neafsey DE, Park DJ, et al. A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking. Malar J. 2008;7(1):223.
54. Anderson TJC, Haubold B, Williams JT, Estrada-Franco JG, Richardson L, Mollinedo R, et al. Microsatellite Markers Reveal a Spectrum of Population Structures in the Malaria Parasite Plasmodium falciparum. Mol Biol Evol. 2000;17(10):1467–82. pmid:11018154
55. Wesolowski A, Eagle N, Tatem AJ, Smith DL, Noor AM, Snow RW, et al. Quantifying the Impact of Human Mobility on Malaria. Science. 2012;338(6104):267–70. pmid:23066082
56. Kraemer MUG, Faria NR, Reiner RC, Golding N, Nikolay B, Stasse S, et al. Spread of yellow fever virus outbreak in Angola and the Democratic Republic of the Congo 2015–16: a modelling study. Lancet Infect Dis. 2016;17(3):330–8. pmid:28017559
57. Merler S, Ajelli M, Pugliese A, Ferguson NM. Determinants of the spatiotemporal dynamics of the 2009 H1N1 pandemic in Europe: Implications for real-time modelling. PLoS Comput Biol. 2011;7(9):e1002205. pmid:21980281
58. Merler S, Ajelli M. The role of population heterogeneity and human mobility in the spread of pandemic influenza. Proc R Soc London B Biol Sci. 2010;277(1681):557–65.
59. Roser LG, Ferreyra LI, Ewens M, Vilardi JC, Saidman BO. Isolation by distance and stochastic migration in a fragmented population of Prosopis alba. Am J Bot. 2017;104(2):313–21. pmid:28143832
60. Bessega C, Pometti CL, Ewens M, Saidman BO, Vilardi JC. Fine-scale spatial genetic structure analysis in two Argentine populations of Prosopis alba (Mimosoideae) with different levels of ecological disturbance. Eur J For Res. 2016;135(3):495–505.
61. Bezemer N, Krauss SL, Phillips RD, Roberts DG, Hopper SD. Paternity analysis reveals wide pollen dispersal and high multiple paternity in a small isolated population of the bird-pollinated Eucalyptus caesia (Myrtaceae). Heredity (Edinb). 2016;117(6):450–71.
62. Pardini EA, Hamrick JL. Inferring recruitment history from spatial genetic structure within populations of the colonizing tree Albizia julibrissin (Fabaceae). Mol Ecol. 2008;17(12):2865–79. pmid:18510586
63. Hadziabdic D, Vito LM, Windham MT, Pscheidt JW, Trigiano RN, Kolarik M. Genetic differentiation and spatial structure of Geosmithia morbida, the causal agent of thousand cankers disease in black walnut (Juglans nigra). Curr Genet. 2014;60(2):75–87. pmid:24177436
64. Kyrkjeeide MO, Hassel K, Flatberg KI, Shaw AJ, Yousefi N, Stenøien HK. Spatial genetic structure of the abundant and widespread peatmoss Sphagnum magellanicum Brid. PLoS One. 2016;11(2):1–19.
65. Chávez-Galarza J, Henriques D, Johnston JS, Carneiro M, Rufino J, Patton JC, et al. Revisiting the Iberian honey bee (Apis mellifera iberiensis) contact zone: Maternal and genome-wide nuclear variations provide support for secondary contact from historical refugia. Mol Ecol. 2015;24(12):2973–92. pmid:25930679
66. Criscione CD, Anderson JD, Sudimack D, Subedi J, Upadhayay RP, Jha B, et al. Landscape genetics reveals focal transmission of a human macroparasite. PLoS Negl Trop Dis. 2010;4(4):e665. pmid:20421919
67. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2015.
68. Davison AC, Hinkley DV. Bootstrap methods and their application. 1st ed. Cambridge University Press; 1997.
69. North BV, Curtis D, Sham PC. A Note on the Calculation of Empirical P Values from Monte Carlo Procedures. Am J Hum Genet. 2002;71:439–41. pmid:12111669
70. de Meeûs T, Goudet J. A step-by-step tutorial to use HierFstat to analyse populations hierarchically structured at multiple levels. Infect Genet Evol. 2007;7(6):731–5. pmid:17765664
71. Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proc IEEE. 1989;77(2):257–86.
72. Browning SR, Browning BL. High-Resolution Detection of Identity by Descent in Unrelated Individuals. Am J Hum Genet. 2010;86(4):526–39. pmid:20303063
73. Vaughan AM, Pinapati RS, Cheeseman IH, Camargo N, Fishbaugher M, Checkley LA, et al. Plasmodium falciparum genetic crosses in a humanized mouse model. Nat Methods. 2015;12(7):631–3. pmid:26030447
74. Miles A, Iqbal Z, Vauterin P, Pearson R, Campino S, Theron M, et al. Indels, structural variation and recombination drive genomic diversity in Plasmodium falciparum. Genome Res. 2016;26(9):1288–99. pmid:27531718
75. Akaike H. Information Theory and an Extension of the Maximum Likelihood Principle. In: Selected Papers of Hirotugu Akaike. Springer New York; 1998. p. 199–213.