Admixture in Humans of Two Divergent Plasmodium knowlesi Populations Associated with Different Macaque Host Species

Human malaria parasite species were originally acquired from other primate hosts and subsequently became endemic, then spread throughout large parts of the world. A major zoonosis is now occurring with Plasmodium knowlesi from macaques in Southeast Asia, with a recent acceleration in numbers of reported cases particularly in Malaysia. To investigate the parasite population genetics, we developed sensitive and species-specific microsatellite genotyping protocols and applied these to analysis of samples from 10 sites covering a range of >1,600 km within which most cases have occurred. Genotypic analyses of 599 P. knowlesi infections (552 in humans and 47 in wild macaques) at 10 highly polymorphic loci provide radical new insights on the emergence. Parasites from sympatric long-tailed macaques (Macaca fascicularis) and pig-tailed macaques (M. nemestrina) were very highly differentiated (FST = 0.22, and K-means clustering confirmed two host-associated subpopulations). Approximately two thirds of human P. knowlesi infections were of the long-tailed macaque type (Cluster 1), and one third were of the pig-tailed-macaque type (Cluster 2), with relative proportions varying across the different sites. Among the samples from humans, there was significant indication of genetic isolation by geographical distance overall and within Cluster 1 alone. Across the different sites, the level of multi-locus linkage disequilibrium correlated with the degree of local admixture of the two different clusters. The widespread occurrence of both types of P. knowlesi in humans enhances the potential for parasite adaptation in this zoonotic system.


Introduction
The epidemiological emergence of infections can be traced by genotypic analyses, with a high level of resolution when pathogens have a high mutation rate, as illustrated by recently emerged viruses that now have a massive impact on global public health [1,2]. Such analysis is more challenging for eukaryote pathogens with low mutation rate, although it is now clear that the major human malaria parasites Plasmodium falciparum and P. vivax have been endemic for many thousands of years after having been acquired as zoonotic infections from African apes [3,4]. In contrast, natural human infections by P. knowlesi were almost unknown [5] until a large focus of cases in Malaysian Borneo was described a decade ago [6]. Infections have since been reported from throughout southeast Asia, within the geographical range of the long-tailed and pig-tailed macaque reservoir hosts (Macaca fascicularis and M. nemestrina) and mosquito vectors (of the Anopheles leucosphyrus group) [7]. The most highly affected country is Malaysia, where there have been thousands of reported cases and P. knowlesi is now the leading cause of malaria in most areas [8,9].
It is vital to determine the causes of this apparent emergence, as P. knowlesi can cause severe clinical malaria with a potentially fatal outcome [10][11][12]. Increasing rates of case detection may reflect better diagnosis, increased transmission by mosquitoes from reservoir host macaques to humans, or parasite adaptation to humans. Molecular tools to discriminate P. knowlesi from other malaria parasite species were not widely applied until the zoonosis became known, but analysis of DNA in archived blood samples from Malaysia and Thailand shows that it was already widespread twenty years ago [13,14]. Sequences of parasite mitochondrial genomes and a few nuclear gene loci indicate ongoing zoonotic infection, as human P. knowlesi genotypes share most alleles identified in parasites sampled from wild macaques [15][16][17].
To understand this zoonosis, and to identify whether human-to-human mosquito transmission is occurring, analyses of parasite population genetic structure in humans and macaques should be performed by extensive population sampling and characterisation of multiple putatively neutral loci. This study presents a P. knowlesi microsatellite genotyping toolkit and its application to the analysis of a large sample of isolates from human cases at ten different sites, as well as from both species of wild macaque reservoir hosts. Results reveal a profound host-associated sympatric subdivision within this parasite species, as well as geographical differentiation indicating genetic isolation by distance. The existence of two divergent parasite subpopulations, and their admixture in human infections provides unparalleled opportunity for parasite hybridisation and adaptation. Observations of some clinical infections with parasite types that appear intermediate between the two subpopulations may reflect this process, and are a possible result of human-to-human mosquito transmission.

P. knowlesi microsatellites as genetic markers for population studies
Hemi-nested PCR assays were developed for amplification of 19 tri-nucleotide simple sequence repeat loci from throughout the genome of P. knowlesi and tested for species-specificity using control DNA from all 10 known parasite species of humans, long-tailed or pig-tailed macaques, as well as human and macaque DNA to identify those suitable for genotyping samples from all hosts (S1 Table). Assays for 11 loci were entirely species-specific for P. knowlesi, and 10 of these gave a clear single electrophoretic peak for each allele without any stutter bands (S2 Table). These were used to genotype P. knowlesi infections in a total of 599 humans and wild macaques with a high rate of success, 556 (92.8%) scoring clearly for all 10 loci (S3 Table and S1 Dataset). Numbers of alleles at each locus ranged from 7 (for locus NC03_2) to 21 (for locus CD05_06) (S1 Dataset).
Host-dependent genetic structure of P. knowlesi We first compared parasites from different host species sampled from Kapit where high numbers of clinical cases are seen (Fig 1), analysing the infections with complete 10-locus genotype data. Almost all P. knowlesi infections in macaques contained multiple genotypes, with no significant difference between long-tailed macaques (88% of 34 were mixed, a mean of 2.71 genotypes per infection) and pig-tailed macaques (100% of 10 were mixed, a mean of 2.70 genotypes per infection; P = 0.65 for comparison between macaque host species), whereas only a minority of human P. knowlesi infections had multiple genotypes (35% of 167, a mean of 1.40 genotypes per infection; P < 10 -15 for comparison between humans and macaques) (Fig 2A). To allow equally weighted sampling per host, the predominant allele at each locus within each infection was counted for subsequent analysis (S1 Dataset).
Pairwise comparisons of each of the complete 10-locus profiles revealed that all infections in Kapit were genotypically distinct, except for one identical pair and one identical triplet of human infections (Fig 2B, S4 Table). There was a much higher average proportion of shared alleles among pig-tailed macaque infections than among those in long-tailed macaques or humans (medians of 5, 3 and 2 identical alleles out of 10 loci respectively). Analysis of allele frequencies revealed that P. knowlesi parasites from pig-tailed macaques are very highly divergent from those in long-tailed macaques (F ST = 0.217, P < 0.001), whereas those in humans have an intermediate level of relatedness (F ST = 0.067 versus long-tailed macaques, F ST = 0.104 versus pig-tailed macaques; P < 0.001 for both).
A Bayesian model-based STRUCTURE analysis of multi-locus genotype data from all hosts sampled in Kapit clearly indicated the existence of two sub-population clusters of P. knowlesi (K = 2; ΔK = 936.75 based on Evanno's estimation of K-population) (Fig 3A, S1 Fig and S1 Dataset). An individual infection genotype was assigned to be predominantly of a particular cluster if the STRUCTURE analysis score exceeded 0.5 for that cluster. All except one of the long-tailed macaque infections were assigned to the Cluster 1 subpopulation, whereas all pigtailed macaque infections were assigned to the Cluster 2 subpopulation, while 71% of human infections were assigned to Cluster 1 and 29% to Cluster 2 ( Fig 3A, S5 Table). A small minority of those which were primarily assigned to either cluster appeared to have a degree of mixed diversity among infections in humans and long-tailed macaques, but a higher average identity among infections from pig-tailed macaques. All infections had a different 10-locus genotype, except for 5 of the 167 human infections (there was one pair sharing an identical 10-locus genotype, and a triplet of infections sharing another 10-locus genotype, indicated with asterisks here and shown in S4 Table).
doi:10.1371/journal.ppat.1004888.g002 assignment, with scores nearer 0.5 than either zero or 1.0 for the alternative clusters (S1 Dataset), which is analysed in a separate section below. An independent scan by principal component analysis (PCA) showed an almost complete separation between parasites from long-tailed macaques and pig-tailed macaques along the first principal component, while parasites from humans covered the whole distribution and overlapped with all of the samples from both of the macaque hosts ( Fig 3B).
Geographical population genetic structure of P. knowlesi We analysed a further 367 human P. knowlesi infections from nine other geographical sites (Fig 1). Most human infections had single P. knowlesi genotypes (Fig 4A and S1 Dataset), and there were no differences in the proportions of mixed genotype infections across all sites (Comparison across 10 sites including Kapit: Pearson's X 2 , P = 0.096; 32% of infections having > 1 genotype overall). There were no differences in allelic diversity among the different sites (H E  Table). Pairwise comparisons among genotypes from different infections showed a similar level of diversity at each site, with a median of 2 or 3 identical alleles out of 10 loci in each site (Fig 4B, S4 Table). Every infection had a different multi-locus genotype, and there were virtually none that shared alleles at more than 7 loci, except for nine pairs of identical haplotypes (three pairs in Betong, three in Miri, and one in each of Sarikei, Tenom and Kelantan) ( Fig 4B). Each identical haplotype pair was shared by infections from different individuals sampled at the same site within the same year, except for two of the identical haplotype pairs in Miri, shared by individual infections sampled one and two years apart (S4 Table).
There were two subpopulation clusters (K = 2, ΔK = 174.94, S1 Fig) throughout all of these sites, as had been seen in Kapit, but the relative frequency of the clusters varied geographically (P < 0.0001, Fig 4C). The Cluster 1 subpopulation was more frequent overall, but Cluster 2 was also common at each of the sites in Sarawak, particularly in Miri and Kanowit where it was more frequent than Cluster 1 (S5 Table and S1 Dataset). Over all human infections, there was a similarly high level of divergence in allele frequencies between the two subpopulation clusters as was seen between parasites from the two different macaque host species (F ST = 0.194, P < 0.001). As expected, the degree of cluster admixture at each sampling site (p1 Ã p2, where p1 and p2 are the local frequencies of Cluster 1 and Cluster 2 respectively) correlated positively with the (I S A ) index of multi-locus linkage disequilibrium (Spearman's Rho = 0.678, P = 0.015, Fig 5 and S5 Table).
Analysis of geographical divergence on the basis of F ST indices derived from population allele frequencies (S6 Table) identified a pattern strongly consistent with isolation by distance (Mantel test of matrix correlation P < 0.0001, Fig 6A). The greatest level of divergence was seen between peninsular Malaysia and Borneo as expected, although isolation by distance was also apparent within Borneo (Mantel test P = 0.0016). The overall pattern consistent with isolation by distance remained when only infections with Cluster 1 genotypes were analysed (P = 0.0016, Fig 6A). There was a similar trend for the smaller number of samples with Cluster 2 genotypes, although this was not significant (P = 0.0922, S2 Fig), indicating that the majority of the geographical differentiation is independent of the Cluster subpopulation structure. A principal    Fig 7A), whereas both of these independently had higher indices than the macaque infections (P < 0.001 for both comparisons). When analysis was focused on Kapit alone, the distribution of intermediate cluster assignment indices were not significantly different between human and macaque infections (P = 0.25, Fig 7B). However, there were geographical differences, with human infections from Kelantan having a significantly higher distribution of values compared to five of the sites in Borneo (Mann-Whitney test P < 0.05 for each comparison after Bonferroni correction, Fig 7B). Across the different sites, there was no significant correlation between the local population admixture of both clusters (p1 Ã p2, S5 Table)

Discussion
We show that human P. knowlesi is an admixture of two divergent parasite populations associated with different forest-dwelling macaque reservoir hosts. In human infections, the longtailed macaque-associated P. knowlesi type (Cluster 1) is most common overall and at most of the geographical sites, while the pig-tailed macaque-associated type (Cluster 2) is also common at sites in Sarawak. The estimate of divergence between these two sympatric parasite subpopulations (F ST index of~0.22 averaged over 10 microsatellite loci) may be conservative, due to high allelic diversity of the microsatellite loci which restricts the potential upper range of fixation indices [18,19]. The differentiation varied among the loci, with two of the microsatellite loci being particularly highly differentiated between the clusters (F ST~0 .35), so the robustness of the two assigned clusters was confirmed by repeat analyses which excluded these. Previous analysis of P. knowlesi mitochondrial DNA sequences from a relatively small number of human and long-tailed macaque infections in Kapit did not indicate two divergent lineages [15], although analysis of samples from Sabah suggests that sequences from pig-tailed macaque infections are differentiated from sequences from long-tailed macaque infections [20].
The results confirm that humans have mostly single genotype P. knowlesi infections whereas macaques have polyclonal infections, supporting the expectation that there is a higher rate of transmission among macaques [15,21]. The estimated number of genotypes per infection here is a minimum number based on the alleles detected, and it is possible that some infections may have contained additional parasite clones that were not detected, due to having low density in the blood or having similar alleles to the ones detected. The number of P. knowlesi genotypes detected per infection in humans is lower than was previously seen in microsatellite analyses of the endemic human malaria parasites P. falciparum and P. vivax in some of the same areas in Malaysia [22,23], whereas the number of P. knowlesi genotypes per infection in macaques is much higher. Levels of multi-locus linkage disequilibrium in P. knowlesi here are lower than reported in P. vivax or P. falciparum in these areas [22,23], indicating that recombination in P. knowlesi probably commonly occurs in mosquitoes containing a macaque blood meal with multiple parasite genotypes.
It is unknown how the two sympatric P. knowlesi subpopulations are genetically isolated. The observation of a single long-tailed macaque with a P. knowlesi Cluster 2 type infection (otherwise only seen in pig-tailed macaques and humans) suggests there is not an absolute barrier in terms of primate host susceptibility, although there are differences in ecology. Additional sampling of both long-tailed and pig-tailed macaques will be important to confirm the host associations of different parasites [20]. Both macaque species are widespread, but long-tailed macaques prefer secondary forest near human settlements where they have access to farms for food, whereas pig-tailed macaques spend more time in ground foraging in primary forests, generally having less frequent contact with humans [24]. There may be differential susceptibility of mosquito species to the respective parasite types, as suggested for subpopulations of another malaria parasite elsewhere [25], or different mosquitoes may feed on the respective macaque host species. Genetic differentiation in P. knowlesi was also strongly correlated with geographical distance, overall and for the Cluster 1 parasites. The observation of highest F ST values between populations from Malaysian Borneo and Peninsular Malaysia was expected, as the South China Sea has separated macaques in these areas since the last glacial period [26], but a test for isolation by distance remained significant when analysing only sites within Borneo.
A small minority of human infections had intermediate cluster assignment indices, which could potentially result from occasional crossbreeding between the two genotypic clusters, although this cannot be concluded from these data alone. Hybridisation between species or subspecies can offer opportunities for adaptation, and has been associated with emergence of novel host-specificity or pathogenicity in other parasitic protozoa [27] and fungi [28]. Switching of host species has occurred repeatedly in malaria parasites of birds [29] and small mammals [30], as well as apes and humans [3,4], but the occurrence of parasite hybridisation and introgression has not been investigated. The potential occurrence of inter-cluster hybridisation in even a minority of human P. knowlesi infections, combined with the possibility of human-mosquitohuman transmission, may increase the potential for P. knowlesi adaptation to the human host or to mosquito species that are more abundant than the currently known forest-associated vectors.
Genome-wide analysis of P. knowlesi populations would enable further evaluation of the genetic structure of this zoonotic parasite species, and allow scanning for loci under selection within each of the two subpopulations. Human clinical isolates containing single species infections would be relatively straightforward to analyse, as P. knowlesi sequences would be unmixed with those of other human malaria species. In contrast, as natural macaque infections usually contain a mixture of different malaria parasite species [15], to obtain unambiguous genome sequences it may be necessary to sequence from individual parasites isolated from these hosts [31]. Although experimental studies on P. knowlesi are usually conducted in vivo in nonhuman primates [32][33][34], new approaches to adapt the parasites to in vitro growth using human erythrocytes have been successful [35,36]. Analysis of phenotypic differences between the different host-associated types may be investigated using both in vivo and in vitro experimental systems, while continued epidemiological and clinical surveillance for increasing incidence or disease severity is of the highest priority.

Ethics statement
Human blood samples were taken after written informed consent had been obtained from patients. This study was approved by the Medical Research and Ethics Committee of the Malaysian Ministry of Health (Reference number: NMRR-12-1086-13607), which operates in accordance to the International Conference of Harmonization Good Clinical Practice Guidelines. Animal sampling was carried out as previously described [15] in strict accordance with the recommendations by the Sarawak Forestry Department for the capture, use and release of wild macaques. A veterinarian took a venous blood sample from each macaque following anesthesia by intramuscular injection of tiletamine and zolazepam, and all efforts were made to minimize suffering by collecting blood at the trap sites and releasing the animals immediately after the blood samples had been obtained. The Sarawak Forestry Department approved the study protocol for capture, collection of blood samples and release of wild macaques (Permit Numbers: NPW.907

P. knowlesi samples from humans and macaques
A total of 599 DNA samples from different P. knowlesi infections of humans and macaques were analysed from collections performed at 10 different geographical sites (Fig 1). 552 samples were from human P. knowlesi malaria patients from all of the sites, eight in Malaysian Borneo

Development of P. knowlesi microsatellite genotyping markers
A combination of three microsatellite mining tools (iMEX [39], mreps [40], and MSATCOM-MANDER [41]) were used to identify simple sequence repeat loci from the P. knowlesi reference genome [42]. Loci with perfect tri-nucleotide simple repeat sequences were carefully selected using customised perl-script commands based on narrow criteria to maximise their likely utility for genotyping: i) a minimum of 7 repeat copies in each microsatellite in the reference sequence, ii) located at non-telomeric chromosomal regions as defined by regions syntenic with the P. vivax reference genome [43], iii) absence of any homopolymeric tracts adjacent to the microsatellite sequence that could give rise to additional size polymorphism. As a result, 19 trinucleotide repeat loci widely spaced in the genome (S1 Table) were shortlisted and PCR primers were designed using PrimerSelect software (DNASTAR, USA) for hemi-nested PCR assays. The specificity of PCR was tested using DNA controls of all human Plasmodium species, common malaria parasites of the Southeast Asian macaques (P. knowlesi, P. coatneyi, P. inui, P. cynomolgi and P. fieldi), as well as human, long-tailed and pig-tailed macaque DNA. Loci for which primers showed complete specificity of amplification from P. knowlesi were tested further for genotyping performance.

PCR and genotyping protocols
Genotyping of each microsatellite locus was performed using a hemi-nested protocol with a fluorescent dye-labelled inner primer during the second round PCR amplification (primers listed in S1 Table). Both first and second round PCR amplifications were conducted in individual tubes or wells for each locus, in 11 μl reaction volume containing 0.2 mM each dNTP (Bioline, UK), 2 mM MgSO 4 , 1X ThermoPol II reaction buffer (NEB, UK), 0.275 U Taq DNA polymerase (NEB, UK), 0.1 μM of each forward and reverse primer, and 1 μl sample DNA template. The PCR cycling conditions were as follows: initial denaturation at 94°C for 2 min, followed by 28 cycles of 94°C for 30 sec, annealing at 56°C for 30 sec and elongation at 68°C for 30 sec, with a final elongation step at 68°C for 1 min. Final PCR products were pooled into three groups of loci with different product size and dye profiles together with Genescan 500 LIZ molecular size standards (Applied Biosystems, UK) and run on a Genetic Analyzer 3730 capillary electrophoretic system (Applied Biosystems, UK). GENEMAPPER version 4.0 software (Applied Biosystems, UK) was used for scoring of allele electrophoretic size, and quantification of peak heights.

Genotypic and statistical analyses
Infections containing multiple haploid parasite genotypes were apparent as multiple electrophoretic peaks for a locus corresponding to different alleles. The apparent genotypic multiplicity of infection (MOI) was determined by the locus with the most alleles detected in the infection, considering peaks with height of at least 25% relative to the predominant allele within each isolate. The predominant allele per locus within each infection was counted for subsequent population genetic analyses. Allelic diversity at each locus was measured as the virtual heterozygosity (H E ) using FSTAT software version 2.9.3.2 (http://www2.unil.ch/popgen/ softwares/fstat.htm), and allele frequency distributions were also inspected using GenAlEx version 6 [44] within the Microsoft Excel platform. Genetic differentiation between each population was measured by pairwise fixation indices (F ST ) using FSTAT, with Bonferroni correction on a nominal significance level of 0.05 applied for multiple comparisons across the population pairs. To test for correlation between genetic differentiation and geographical distance, a Mantel test for isolation by distance was performed with Rousset's linearised F ST /(1-F ST ) plotted against the natural log of geographic distance using Genepop version 4.2 [45].
The relatedness of haplotypes between individual isolates was assessed by measuring the pairwise proportion of shared alleles, excluding samples with missing data at any locus. A matrix of pairwise similarity among isolates was calculated based on the identical or mismatched alleles from a complete set of loci and the distribution of shared alleles between sample pairs for each population was visualised using a customised perl-script command. To test for nonrandom allele assortment, multi-locus linkage disequilibrium (LD) was assessed by the standardised index of association (I A S ) using LIAN version 3.6 [46], with significance of the I A S values tested by Monte-Carlo simulation with 10,000 data permutations to generate the null distribution under linkage equilibrium.
To explore evidence of population substructure in the entire population, a Bayesian analysis was performed using the STRUCTURE version 2.3.4 software [47] using samples with no missing data at any locus. Individuals in the population pool were clustered to the most likely population (K) by measuring the probability of ancestry using the multi-locus genotype data. The program parameters were set to admixture model with correlated allele frequency, with 50,000 burn-in period and 100,000 Markov chain (MCMC) iterations. To run the simulation, K value was predefined from 1-10 and the run was performed in 20 replicates for each K. The most probable K value was then calculated according to Evanno's method [48] using the webpage interface STRUCTURE Harvester [49]. The assignment of a sample to a subpopulation cluster was based on the inferred cluster scores by STRUCTURE analysis, where samples with inferred cluster scores within a range in relation to the K-value were assigned together as one subpopulation cluster. The intermediate cluster assignment indices were calculated based on the proportion of shared cluster ancestries per individual isolate inferred by the cluster scores from the STRUCTURE analysis.
We also independently performed a principal component analysis (PCA) using the GenA-lEx package for the same purpose. Samples with missing data at any locus were excluded, and the genetic distance matrix was generated based on the allelic mismatches between pairs of isolates. A two-dimensional PCA plot was generated considering the first two highest eigenvalues, and genetic clusters were determined based on the eigenvector coordinates along the axes of variation.   Table. Pairwise measures of fixation indices (F ST values above diagonal) and geographical distance (in kilometres below diagonal) across 10 populations of P. knowlesi from human infections. (DOCX) S1 Dataset. Genotypes at P. knowlesi microsatellite loci in infections from macaques (n = 47) and humans (n = 552).