Plasmodium vivax Diversity and Population Structure across Four Continents

Plasmodium vivax is the geographically most widespread human malaria parasite. To analyze patterns of microsatellite diversity and population structure across countries of different transmission intensity, genotyping data from 11 microsatellite markers was either generated or compiled from 841 isolates from four continents collected in 1999–2008. Diversity was highest in South-East Asia (mean allelic richness 10.0–12.8), intermediate in the South Pacific (8.1–9.9) Madagascar and Sudan (7.9–8.4), and lowest in South America and Central Asia (5.5–7.2). A reduced panel of only 3 markers was sufficient to identify approx. 90% of all haplotypes in South Pacific, African and SE-Asian populations, but only 60–80% in Latin American populations, suggesting that typing of 2–6 markers, depending on the level of endemicity, is sufficient for epidemiological studies. Clustering analysis showed distinct clusters in Peru and Brazil, but little sub-structuring was observed within Africa, SE-Asia or the South Pacific. Isolates from Uzbekistan were exceptional, as a near-clonal parasite population was observed that was clearly separated from all other populations (F ST>0.2). Outside Central Asia F ST values were highest (0.11–0.16) between South American and all other populations, and lowest (0.04–0.07) between populations from South-East Asia and the South Pacific. These comparisons between P. vivax populations from four continents indicated that not only transmission intensity, but also geographical isolation affect diversity and population structure. However, the high effective population size results in slow changes of these parameters. This persistency must be taken into account when assessing the impact of control programs on the genetic structure of parasite populations.


Introduction
Plasmodium vivax is the human malaria parasite with the largest geographical expansion, and the predominant malaria parasite outside of Africa [1]. Transmission intensity (according to annual parasite incidence as a surrogate measure) ranges from very low and seasonal in temperate zones and in countries approaching malaria elimination to very high mainly in Asian and South Pacific countries [1]. Prior to malaria control starting early in the 20 th century, P. vivax transmission even occurred in large parts of Europe, Russia and the US [2]. P. vivax is difficult to control due to relapsing liver stages, fast and constant formation of gametocytes and a large proportion of asymptomatic carriers contributing to transmission [3,4]. As a consequence, P. vivax has become the predominant malaria parasite in several countries where P. falciparum transmission has been successfully reduced [5,6].
Along with other parameters, parasite diversity can be used to assess the effect of interventions, as reduced transmission is expected to result in reduced diversity. This relationship was observed for P. falciparum [7,8]. Moreover, knowledge on parasite diversity is the basis to study gene flow between populations, or to track the source of imported infections [9]. Thus, global comparisons of population genetic data help to develop and validate molecular tools for surveillance of antimalarial interventions.
Typing of highly polymorphic microsatellites has proven useful to describe the diversity and structure of parasite populations [10][11][12][13][14], to study patterns of relapses [15][16][17], multiplicity and molecular force of infection [18,19], and for distinguishing reinfection from recrudescence in drug trials [20][21][22]. Several studies reported extensive P. vivax microsatellite diversity even in regions of moderate endemicity, and high multiplicity of infection was frequently observed. While some P. vivax populations showed pronounced structuring on small geographical scale [10,13], this was not the case for other populations [11,23].
Differences in method of sampling with respect to geographical space as well as different panels of markers used for genotyping make direct comparison of results difficult [24]. Lacking so far was a comprehensive comparison of P. vivax diversity based on samples collected across many different sites and typed with the same set of markers. Therefore we compiled data from AIDS, Tuberculosis and Malaria (www.theglobalfund. org) through the Organismo Andino de Salud-Convenio Hipolito Unanue (grant MAA-305-G01-M) and by the Directorate General for Development Cooperation (DGCD) of the Belgian Government (framework agreement 03, project 95502). Sample collection in Armenia, Azerbaijan and Uzbekistan was supported by COPERNICUS-2 RTD project contract ICA2-CT-2000-10046 (VIVAXNIS) of the European Commission. This work was made possible through Victorian State Government Operational Infrastructure Support and Australian Government NHMRC IRIISS. TA is funded by a Sir Henry Wellcome postdoctoral fellowship (WT100066MA). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This global data set included 841 isolates from regions of different levels of transmission intensity ( Fig 1A and Table 1) and representing 6 out of 9 malaria transmission zones recently shown to differ in relapse patterns [29]. The highest P. vivax prevalence ever has been recorded in the lowlands of PNG (e.g. reaching >50% by PCR in children in East Sepik Province [19]). These South Pacific parasite populations are relatively isolated due to limited migration of human hosts. In Southeast (SE)-Asia transmission is also high, yet often focal [30]. Migration of hosts is high SE-Asia and a major complication of eradication efforts. In Latin America transmission is lower, but increased since the 1960s when the number of P. vivax cases was very low due to successful spraying campaigns [5]. Transmission in Central America is low, and parasite populations are separated from those in South America by the Isthmus of Panama, with no road connecting South and Central America. Also within Central America sub-structuring is likely; e.g. different P. vivax subpopulations were observed in Mexico, following vector distribution [31]. In Africa, P. vivax transmission occurs mostly in Madagascar and East Africa, i.e. Ethiopia and Sudan. In other parts of Sub-Saharan Africa P. vivax transmission is very low as most individuals carry the Duffy-negative blood type, which largely prevents P. vivax infection [32]. Thus, Madagascan parasites are isolated from other P. vivax populations in northern Africa, and transmission in Madagascar is relatively low. In Central Asia, transmission is very low; in Uzbekistan malaria had been eradicated in 1961, but was reintroduced later, and is characterized by small outbreaks in border areas [33].
The wide distribution of parasite populations included in this study permit for the first time assessing P. vivax microsatellite population structure on a global scale. Previous studies on P. vivax population structure across different continents mostly genotyped polymorphic antigens [34,35] or mitochondrial DNA [36,37]. These markers differ from microsatellites because antigens are under immune selection and mitochondria are maternally inherited, thus excluding recombination. It remains unclear whether similar results would be obtained from analyses of mtDNA, antigen-coding genes or of putatively neutral microsatellites. The present study allowed a global comparison of parasite microsatellite diversity and population structure made possible by harmonization of methods, and the definition of a minimal subset of markers for epidemiological studies.

Ethics statement
Informed written consent was obtained from all individuals or their parents or guardians prior to the study. Details on ethical approval are published for samples from Peru [10], Brazil [14,21], Sudan [25], Vietnam [27], Cambodia [26] and the South Pacific [11]. For samples collected from returning travelers to the US (samples from Central America, Mexico, Africa, India and Indonesia) the study protocol was approved by the Ethical Committee for Research with Human Subjects of the Institute of Biomedical Sciences, University of São Paulo (960/CEP). In Madagascar, the study protocol was approved by the Ethics Committee of the Ministry of Health of Madagascar (007/SANPF/2007). The study was approved by the WEHI Human Research Ethics Committee.  [1]. Note that populations named 'Central America', 'Africa', 'Central Asia', 'India' and 'Indonesia' include isolates from various locations within these regions (B) Cluster analysis for K = 2 to K7. Azj = Azerbaijan, Uzb = Uzbekistan Study sites and sample collection Details of samples included in this study are given in Table 1. In case of cohort studies, only 1 sample per individual was included. From Africa, India and Indonesia samples from returning travelers were utilized [38]. While the number of travelers' samples was too small to assess intra-population diversity, linkage disequilibrium, or F ST compared to other populations, they were still useful for clustering analysis and principal component analysis (PCA).
Samples from Central America and Mexico had been collected from travelers returning from several countries spanning from Panama to Mexico. Due to the limited number of isolates and lack of precise information on sample origin (some samples derived from travelers visiting several countries), these isolates were combined as 'Central American' population, despite possible substructuring. From Armenia and Azerbaijan 14 isolates were available; these were combined because migration between both countries is frequent and it was not always clear in which of the two countries the infection had been acquired. From Uzbekistan 20 isolates were available. For calculation of allelic richness all Central Asian countries were pooled to reach the required number of 25 samples.

Genotyping and data analysis
All samples were typed with the same set of 11 published microsatellite markers [28]. Three additional markers of that panel were excluded. MS3 and MS16 had not been typed in all populations, and MS8 showed signals of positive selection and variation in allele sizing between labs (details below). Microsatellites are considered neutral markers; however, some of them might lie within coding regions or be linked to genes under selection. Selection had been tested using Lositan software [39]. Only marker MS8 showed a weak signal for positive selection. A relationship between P. vivax microsatellite alleles and clinical disease or acquired immunity has never been reported, thus it was not to be expected that different age groups sampled across populations or different proportions of febrile and asymptomatic individuals would influence population genetic parameters. Minor differences in typing protocols did not affect the results; i.e. samples from PNG, Solomon Islands, Madagascar Sudan and Central Asia were amplified by nested PCR, while a single round of PCR was performed on all other samples. As the nested PCR primers were identical to the single-round primers, allele sizes can be compared directly. From each lab a subset of samples was typed again starting from DNA to ensure comparability of results. Allele lengths were very consistent for all markers, except for MS8, where variation of up to 1.5 base pairs was observed. Due to this variation and possible positive selection MS8 was excluded from analysis.
In case of multi-clone infections the predominant peak only was included into the analysis. Occasionally this can result in incorrect haplotype assembly. This affects analysis such as linkage disequilibrium or clustering, where individual haplotypes are needed. Diversity and F ST values are not affected because allelic frequencies are assessed at population level. Clustering analysis and calculation of LD were repeated with only those samples that harbored a single allele at each marker. To obtain sufficient samples from Madagascar and Cambodia, we permitted in the analysis also isolates with >1 alleles at one of the markers. It should be noted that isolates of low multiplicity were selected for genotyping for several parasite populations. Thus the difference between the number of samples of the full data set (including predominant peak haplotypes) and the number of only single clone infections does not reflect the proportion of multiple clone infections.
Alleles were binned using TANDEM software [40] and formatted using PGDspider [41]. Expected heterozygosity (H E ) of markers and allelic richness were calculated using FSTAT [42]. H E is the chance that two unrelated parasites carry a different allele of a given marker, and allelic richness is a measure of alleles per each marker adjusted for the different numbers of isolates per site. When the effective population size is reduced, e.g. as consequence of intensified malaria control, rare alleles are expected to disappear first. As a result, allelic richness changes more rapidly than H E , as the influence of low-frequency alleles on H E is small. The number of unique haplotypes was calculated by Dropout [43] and linkage disequilibrium (LD) by LIAN 3.5 with 100,000-fold re-sampling [44]. LIAN compares the observed association of markers to the values expected for random association based on the population diversity. Only unique haplotypes with no missing data were included for calculating LD, resulting in 633 samples in this analysis. Three of the markers used (MS2, MS4, MS5) localize to chromosome 6 and two markers (MS12, MS15) to chromosome 5, thus these markers are physically linked. To assess linkage disequilibrium irrespective of physical linkage of markers, LD was also calculated excluding markers MS2, MS4 and MS12, i.e. with 8 markers located on 8 different chromosomes. As compared to the 11-marker panel, fewer isolates had to be excluded due to missing data, but identical 8-marker haplotypes occurred more often, resulting in 637 samples for this analysis.
Relatedness between haplotypes was assessed by pairwise comparison of all samples within a population and calculating the proportion of shared alleles. Only samples with at least 7 markers available for comparison were included.
Effective population size N e (i.e. the estimated number of unique haplotypes circulating in each site) was calculated using step-wise mutations models (SMM) as well as infinite allele models (IAM), using mutation rates observed in P. falciparum studies of 1.59 Ã 10 −4 (95% confidence interval = 3.7 Ã 10 −4 , 6.98 Ã 10 −5 ) [45]. While some of the markers harbor simple tri-nucleotide repeats, and SMM are likely applicable, other markers contain more complex repeat structures (e.g. MS2, MS6, MS10, MS20) and IAM are more appropriate [46], thus both values are given.
The software STRUCTURE was used to assess clustering of isolates [47]. This method detects clusters without prior information on the origin of samples. Twenty iterations for K = 1 to K = 12 (K being the number of clusters) were run, each with a burn-in period of 10'000 steps and then 100'000 MCMC iterations. A method developed through simulation studies [48] was applied to estimate the most likely number of clusters. In addition, the optimal number of clusters was assessed using the program STRUCTURAMA [49]. F ST values among populations were calculated using FSTAT [42]. To compute Principal Components Analysis (PCA) the smartPCA application of EIGENSOFT was used [50]. In contrast to STRUCTURE analysis, PCA attempts to maximize variance between populations based on the known origin of samples. While smartPCA was designed for SNPs it can be used with microsatellites; each microsatellite allele was treated as a SNP. Preliminary analysis had shown no population substructuring between samples collected in Brazil in 2004 and 2006 [21], in the lowlands of PNG [11] and in Madagascar, thus samples were combined for calculation of F ST values and PCA.
In the absence of the same measures of transmission intensity for all populations, such as entomological inoculation rate, force of infection or parasite prevalence, populations were broadly classified as low, medium and high transmission (Fig 2).

Results
A total of 841 P. vivax isolates were available for this study. Fig 1A shows the origin of isolates on a published map of P. vivax endemicity in 2010, represented as parasite rate [1]. All isolates were typed for the same set of 11 microsatellite markers. For all 841 isolates genotyping results were obtained from !7 markers. For marker MS4, no amplification product was obtained from 70 isolates (i.e. data was obtained from 91.7% of all isolates), for all other markers data was obtained from 823-834 (97.9-99.2%) out of 841 isolates.

Genetic diversity, linkage disequilibrium and effective population size
Pronounced differences in population diversity were observed. While expected heterozygosity (H E ) was generally high, it was strongly reduced in isolates from Uzbekistan. With the exception of Uzbekistan, H E of all except 2 markers (MS5 and MS7) was >0.5 in all populations ( Table 2,  When all isolates and all markers were analyzed, linkage disequilibrium (LD) was strong and significant in South American, Madagascan, Central Asian and SE-Asian populations ( Table 3). No or limited LD was detected in Central America, Sudan and the South Pacific. Trends were similar when only single-clone infections were analyzed, with exception of Peru and Madagascar, where LD observed in the analysis of all data was no longer detected. Overall levels of LD were lower in the single-clone data set. This can be explained by the reduction in sample size, since similarly low levels of LD were observed in an equally low number of randomly selected multi-clone infections. Most representative results were obtained when only 1 marker per chromosome was included, thus excluding physical linkage of markers (total of 8 markers). All populations in the South Pacific, Peru, Central America and Sudan were in full linkage equilibrium (Table 3 and Fig 2).
Effective population N e size was about twice as high in Cambodia and Vietnam (IAM: 6378 and 5553) as compared to South America (2423-2829, Table 4). Values for Madagascar, Sudan and the South Pacific were intermediate, and low for Azerbaijan (2422) and Uzbekistan (1606). Estimates based on SMM were 2-3 fold higher. Samples from Central America and Mexico were highly diverse and showed high N e (5159), likely because they originated from different countries and thus represent different subpopulations.

Haplotype diversity
Across all populations a total of 759 individual haplotypes were found in 818 isolates. 11-loci haplotypes were shared only within the same country. Again isolates from Uzbekistan were  Relatedness among haplotypes was calculated for each population (Fig 2). In the pairwise comparisons isolates from Cambodia and Vietnam shared the same allele on average in 12.6% and 15.0% of markers (i.e. a mean of 1.4 and 1.65 out of 11 microsatellites carried the same allele). Samples from Central American, African and South Pacific populations shared the same allele in 16.5-22.5% of markers. Relatedness was 29.3-32.5% in South American populations, and in Azerbaijan 33.3% and in Uzbekistan 48.5% of markers shared the same allele.
To assess discrimination power of a smaller panel of markers compared to the full set of markers, microsatellites were removed successively and haplotype counts recorded from each individual population, as well as for the full dataset (Fig 3). MS4 was removed first, as no data for this marker could be obtained from 70 isolates. The remaining 10 markers were sequentially removed according to their diversity across all populations, starting with the least diverse one.
Taking into account haplotypes observed in several populations, a single haplotype was shared between Peru and Vietnam when the panel was reduced to 5 markers. With four markers 15 additional haplotypes were shared. While some were shared between populations from the same continent (5 identical 4-loci haplotypes in 251 isolates from PNG), 11 haplotypes occurred on different continents (e.g. Peru/Azerbaijan, Madagascar/Vietnam, Brazil/Cambodia/Vietnam, Sudan/Solomon Islands). As a consequence, when haplotypes shared between populations were taken into account, 4 markers identified 687/759 (91.5%) haplotypes, and 3 markers 595/759 (78.4%) haplotypes (black line in Fig 3, panel 1).

Population structure
Clustering analysis indicated clearly distinct clusters, mainly following geographical lines ( Fig  1B). First South American samples (but not those from Central America) and samples from Azerbaijan were separated from all other populations, with admixture in samples from Central America and Africa. When the number of clusters (K) was 3, the South Pacific populations formed a separate cluster, as well as the Brazilian samples, while samples from Peru, Central America, Madagascar and Asia clustered together. Peruvian samples formed a separate cluster when K was set to 4, K = 5 led to the separation of Madagascar, Sudan and Azerbaijan from SE-Asia. Central American samples formed an individual cluster when K = 6. The clonal population in Uzbekistan clustered with different populations in individual STRUCTURE runs for a given value of K, indicating that no clear relationship to any other population was observed. When the optimal number of clusters was calculated as described [48], high values were observed for K = 2, K = 5 and K = 7 (Fig 4). In addition the program STRUCTURAMA was used to assess the best number of clusters. The 99% CI for estimates of cluster number showed a wide range (79-115 clusters) and was thus not informative As the clear separation of Latin American and South Pacific isolates from those from Africa and Asia might interfere with more subtle population structure within Africa and Asia, Fig 3. Loss of discrimination power between haplotypes with a reduced panel of markers. Markers were sequentially removed (X-axis: number of remaining marker and marker removed), and the percentage clustering analysis by STRUCTURE was repeated including the latter isolates only (Fig 5A). For K = 2 African and Azerbaijani isolates clustered separately from SE-Asian ones. Indian isolates clustered with Africa. As with the full set of isolates, the clonal samples from Uzbekistan clustered with different populations in individual runs for the same number of clusters (e.g. with Africa or with SE-Asia if K = 2). For K = 3 these isolates formed a separate cluster. Both within Africa and SE-Asia admixture was high, and when the number of clusters was set to 4 or 5, the separation between countries was limited. Analysis with STRUCTURE was repeated for single clone infections only. Results were similar as for the full data set (Fig 5B).
Results of clustering analysis were confirmed by F ST values ( F ST values were also compared between continents or sub-continents, i.e. South America, Africa, Central Asia, SE-Asia and the South Pacific (Table 6). F ST was highest between South America and the South Pacific as well as between Central Asia and all other populations (0.11), and lowest between SE-Asia and the South Pacific (0.042) and SE-Asia and Africa (0.058).
In principal component analysis (PCA), PC1 differentiated isolates from Brazil and Peru on the one hand and the South Pacific on the other hand from a cluster containing African and Asian isolates (Fig 6). PC2 separated isolates from Peru und Brazil.
In summary, clustering analysis, F ST values and PCA showed similar results. South American populations were separated from all others, and populations from Brazil and Peru formed clearly separated groups with little admixture. African, SE-Asian and South Pacific populations each formed a large cluster with little sub-structuring. Samples from Central America, Indonesia and India grouped with SE-Asia.

Discussion
Analysis of 841 P. vivax isolates of global origin, typed with the same markers, allowed direct comparison between populations differing in transmission intensity, geographical isolation and history of malaria control. Determination of the optimal set of microsatellite markers required for differentiation of individual parasite infections will improve strategies for genotyping. Knowledge on population structure can be used to assess the effect of interventions and help to track imported infections.

Selection of molecular markers for epidemiological studies
Diversity of a marker, expressed as expected heterozygosity (H E ), is a key criterion for choosing a subset of microsatellite markers for a variety of genotyping questions [51]. Markers of a lower diversity are more suitable to assess population structure, as with less diverse markers, a smaller number of isolates is needed to detect genetic differences among populations. In contrast, many applied genotyping studies aim at distinguishing between "identical" and "different" clones. Samples from treatment failures in a drug trial for example require distinction between new infection versus recrudescence (reappearance of a pre-treatment clone) [18][19][20][21][22]. In this scenario the phylogenetic relationship of genotypes observed is not of interest. Such applied typing tasks often include large numbers of isolates, and thus a minimal set of markers is desirable that is able to reduce the genotyping workload and price without impairing the discrimination power for distinguishing clones. For phylogeographical studies a larger panel of markers is needed. The same is required for tracking the origin of imported malaria cases.
The step-wise removal of markers showed that as little as 3 highly diverse markers were sufficient to detect around 90% of all haplotypes in most populations, only in South American populations up to 40% of haplotypes were missed. Thus, typing only 2-3 markers in SE-Asia, the South Pacific and Africa, and 4-6 markers in South America would lead to only a small underestimation of multiplicity of infection or of treatment efficiency (when a clone during follow-up could not be distinguished from the clone at baseline and a new infection was taken for a recrudescence). Other limitations affect drug trial results, such as imperfect detectability of minority clones [52] and as a consequence substantial day-to-day variation in the alleles detected [53]. Longitudinal P. vivax studies involving genotyping are also complicated by relapsing hypnozoites, which can be homologous or heterologous to the clone at baseline [16,54]. Longitudinal studies can also be affected by re-infection with a genotype observed earlier in the same individual. This is a particular threat when the overall diversity in the parasite population is low, as observed in Uzbekistan, or in case of clonal expansion during an outbreak [46]. Thus, when the population diversity is intermediate or high, using a reduced panel of markers is acceptable as this reduces the ability to differentiate clones only in a minor way and is justified in view of substantial savings in time and costs.

Transmission intensity shapes population parameters
Overall parasite diversity measured as H E or allelic richness was high and reflected transmission levels. It was lowest in South America and Central Asia, where transmission is low, and highest in SE-Asia. In the South Pacific diversity was intermediate despite highest levels of transmission (prevalence >50% in Ilaita [19]). This could be due to the relative geographical isolation of South Pacific populations, and thus limited introduction of parasites from other regions. In contrast migration of infected hosts is high among SE-Asian countries, thus high parasite diversity can be maintained, even when transmission is reduced.
Linkage disequilibrium can be the result of selfing in the mosquito of male and female gametes from the same parasite clone [55], as opposed to recombination of different clones resulting in a break up of linkage. Recombination can occur if a mosquito feeds on a host infected with different parasite clones. The level of LD is expected to decrease with increasing transmission intensity, while diversity is expected to increase. In line with this expectation, LD was detected in Brazil, Central Asia and Madagascar, where transmission is low [1]. It is, however, noteworthy that levels of diversity did not fully correlate with transmission intensity. Highest diversity and strong LD was observed in SE-Asia. In the South Pacific an intermediate diversity but no LD was found. Different processes influence diversity and LD; likely the local ecology add to the high parasite diversity in SE-Asia, while the high proportion of individuals carrying multi-clone infections in the South Pacific (up to 75% of all infected [19]) lead to high levels of parasite recombination. High LD is also expected when closely related parasites are sampled. Yet no LD was observed in Ilaita in PNG, despite most dense sampling of all populations with 132 isolates collected across hamlets approx. 5 km from each other [56].
Incorrect assembly of multi-locus haplotypes in multi-clone infections within a host would be expected to lead to incorrect low levels of LD, but the opposite trend was observed: lower LD was found when only single-clone infections were included in the analysis. This unexpected finding was attributed to the smaller sample size after removing multi-clone infections.
Analysis of population structure revealed significant F ST values between all populations. While clustering analysis and PCA differentiated among those populations separated by high F ST values, no within-continent subdivision was observed in the South Pacific, Africa and SE-Asia. The difference in clustering between populations from Latin American (clearly separated clusters in Peru, Brazil and Central America and Mexico) on the one hand and from the South Pacific (no subdivision between different provinces in PNG and Solomon Islands) or Africa (no subdivision between Madagascar and Sudan) on the other hand is striking, given comparable distances between sites. While high levels of human migration in SE-Asia could explain parasite gene flow, no clusters were found in the South Pacific despite limited human movement (only air and sea transport between East Sepik and Madang provinces, and between PNG and Solomon Islands). Likewise limited sub-structuring was evident between Madagascar and Sudan, despite open sea and countries with very low P. vivax transmission separating sampling locations. In Central Asia, clear subdivision was observed between parasites east and west of the Caspian Sea. Parasites from Azerbaijan and Armenia clustered with those from Africa, while parasites from Uzbekistan were highly clonal and formed a separate cluster.
Distinct parasite subpopulations might be the result of expansion of parasite clones after the near elimination of malaria in the second half of the 20 th century in some countries. In Latin America they could also reflect different independent introductions of P. vivax, as it has been shown for P. falciparum [57]. In Uzbekistan P. vivax had been reintroduced after its elimination, most likely from other Central Asian countries. In contrast, in SE-Asia and the South Pacific P. vivax was present much longer than in Latin America, and even during the peak of the eradications campaigns in the 1960s prevalence remained high [58]. Central American and Mexican samples were exceptional as they were highly diverse and LD was low despite low transmission intensity. This is most likely caused by the fact that these samples represent different isolated subpopulations over a large geographical range. Further studies involving additional parasite populations and different molecular markers are needed to establish differences between South and Central American samples, to understand why Central America clusters with SE-Asia, and to identify potential routes for P. vivax colonization of South and Central America.
Beside microsatellites, other molecular markers have been used to assess P. vivax population structure, most importantly mtDNA and polymorphic antigens. In agreement with the present study, sequencing of mtDNA identified a separate subgroup in Latin America (highest support of all subgroups in Bayesian tree analysis) [36], as well as highly diverse populations in Asia and the South Pacific [37]. However, no pronounced separation between South-Pacific, SE-Asian and some South American isolates was observed using mtDNA [36]. Isolates from South and Central America had been found in the same subgroup, yet, only 3 haplotypes from Central America were sequenced [36]. The same study found different subgroups in East Asia (China and Korea) and SE-Asia (Cambodia, Thailand and Indonesia), plus a third subgroup including isolates from locations across Asia and PNG. The present study includes no isolates from China or Korea, and only few from Indonesia, thus no such structure within East Asia could be found.
In concordance with microsatellite results, mtDNA diversity was highest in SE-Asia and high in the South Pacific [36]. In contrast to microsatellite-derived measurements of diversity, mtDNA diversity was lower for Madagascar, Central America and Africa, whereby results of the latter two populations are likely affected by a very small sample size [36]. The same study indicated that overall P. vivax diversity in Latin America was as high as in SE-Asia, despite locally reduced diversity, a result confirmed by analysis of 3 whole-genome sequences from Latin America [59]. The present study found microsatellite diversity across Central America and Mexico to be similarly high as that of SE-Asia, but when isolates from Peru and Brazil were combined diversity remained lower.
SNPs in antigens are expected to be under strong balancing selection, limiting their use to study the underlying population structure. In line with this, many antigens showed strong clustering but in contrast to microsatellites many clusters were shared between continents [34,35]. Like microsatellites, AMA-1 and MSP-1 alleles from South Pacific populations showed very little admixture with any other parasite populations [34,60]. DBP-II alleles in contrast were more evenly distributed across continents [35]. A study using putatively neutral SNPs covering a 200-kb genomic region confirmed subdivision between Brazil and SE-Asia [61], and a barcode of 42 SNPs across the genome was recently published and tested on a small number of clinical samples from three continents [62]. However, this barcode was not yet tested for assessing local population structure or for typing asymptomatic, low-density infections.

Application of genotyping in malaria control
Continuous malaria control is expected to reduce parasite diversity and effective population size, and to increase differences between populations due to clonal expansion of remaining parasite strains [63]. Indeed, near-clonal expansion of parasites has been observed for P. falciparum in the highlands of PNG [8], in Solomon Islands [64] and in South America [7]. Likewise, Artemisinin-resistant clones have expanded in SE-Asia [65].
In striking contrast to these findings, nearly all studies assessing P. vivax diversity found high parasite diversity, even in countries now aiming to eliminate malaria [66][67][68][69]. The clonal expansion in Uzbekistan, a country that had successfully eliminated malaria in the 1960-ies, is the first such population structure reported for P. vivax. Low microsatellite diversity was also found in South Korea, where transmission has been low for decades and the parasite population is relatively isolated [70], as well as from a rural, isolated site in Peru [71]. The high P. vivax diversity in countries with low transmission likely indicates a high underlying effective population size and thus a large number of infected individuals. Two hallmarks of P. vivax biology add to this, namely hypnozoites in the liver, and a large proportion of asymptomatic, lowdensity infections that escape screenings conducted by light microscopy or rapid diagnostic test and thus a substantially underestimated parasite reservoir [4].
The isolates studied here were collected prior to the renewed call for malaria elimination. Only few studies have typed samples collected after up-scaling control measures, but diversity remained high [72][73][74]. It seems that control has little short-term effect on population size, and diversity measures changes slowly as long as the effective population size remains high (above 100 genetically distinct parasite clones) [75]. Therefore diversity measures will only be useful to assess the impact of control programs once transmission is very low after several years of intensified control.
In recent years malaria control has been intensified reducing prevalence and incidence in PNG [76], many Asian countries [69,77] and South America [78]. It will be important to evaluate whether reduced prevalence is paralleled by increased sub-structuring on small scale, i.e. breaking up of the South Pacific and SE-Asian clusters, indicating local hotspots of transmission. A pronounced reduction of genetic diversity and increase in population structure will implicate success of control and interruption of parasite gene flow from neighboring populations.
In previously malaria-free regions, microsatellite typing can help to study outbreaks. Because of their high discrimination power between clones, genotyping outbreak samples can clarify whether a single clone was imported and spread across a local region, or whether steady gene flow from neighboring regions with ongoing transmission occurs, resulting in a diverse parasite population [9].

Conclusions
Microsatellite typing remains an important tool to study P. vivax, as it can be done in any lab equipped for PCR. For epidemiological studies and drug trials, a limited set of 2-6 markers, depending on transmission intensity, provides sufficient resolution to distinguish individual clones. The full panel of 11 microsatellite markers showed clear population structure on a global scale, and differences in diversity reflect transmission intensity and isolation of parasite populations. These population genetic measures could potentially be used as tools to measure the impact of control programs; however, due to the large effective population size even in countries of moderate endemicity, these parameters are likely to change slowly.