THE REAL McCOIL: A method for the concurrent estimation of the complexity of infection and SNP allele frequency for malaria parasites

  • Hsiao-Han Chang, 
  • Colin J. Worby, 
  • Adoke Yeka, 
  • Joaniter Nankabirwa, 
  • Moses R. Kamya, 
  • Sarah G. Staedke, 
  • Grant Dorsey, 
  • Maxwell Murphy, 
  • Daniel E. Neafsey, 
  • Anna E. Jeffreys


As many malaria-endemic countries move towards elimination of Plasmodium falciparum, the most virulent human malaria parasite, effective tools for monitoring malaria epidemiology are urgent priorities. P. falciparum population genetic approaches offer promising tools for understanding transmission and spread of the disease, but a high prevalence of multi-clone or polygenomic infections can render estimation of even the most basic parameters, such as allele frequencies, challenging. A previous method, COIL, was developed to estimate complexity of infection (COI) from single nucleotide polymorphism (SNP) data, but relies on monogenomic infections to estimate allele frequencies or requires external allele frequency data which may not be available. Estimates limited to monogenomic infections may not be representative, however, and when the average COI is high, they can be difficult or impossible to obtain. Therefore, we developed THE REAL McCOIL, Turning HEterozygous SNP data into Robust Estimates of ALelle frequency, via Markov chain Monte Carlo, and Complexity Of Infection using Likelihood, to incorporate polygenomic samples and simultaneously estimate allele frequency and COI. This approach was tested via simulations and then applied to SNP data from cross-sectional surveys performed in three Ugandan sites with varying malaria transmission. We show that THE REAL McCOIL consistently outperforms COIL on simulated data, particularly when most infections are polygenomic. Using field data we show that, unlike with COIL, we can distinguish epidemiologically relevant differences in COI between and within these sites. Surprisingly, for example, we estimated high average COI in a peri-urban subregion with lower transmission intensity, suggesting that many of these cases were imported from surrounding regions with higher transmission intensity. THE REAL McCOIL therefore provides a robust tool for understanding the molecular epidemiology of malaria across transmission settings.

Author Summary

Monitoring malaria epidemiology is critical for evaluating the impact of interventions and designing strategies for control and elimination. Population genetics has been used to inform malaria epidemiology, but it is limited by the fact that a fundamental metric needed for most analyses—the frequency of alleles in a population—is difficult to estimate from blood samples containing more than one genetically distinct parasite (polygenomic infections). A widely used approach has been to restrict analysis to monogenomic infections, which may represent a biased subset and potentially ignores a large amount of data. Therefore, we developed a new analytical approach that uses data from all infections to simultaneously estimate allele frequency and the number of distinct parasites within each infection. The method, called THE REAL McCOIL, was evaluated using simulations and was then applied to data from cross-sectional surveys performed in three regions of Uganda. Simulations demonstrated accurate performance, and analyses of samples from Uganda using THE REAL McCOIL revealed epidemiologically relevant differences within and between the three regions that previous methods could not. THE REAL McCOIL thus facilitates population genetic analysis when there are polygenomic infections, which are common in many malaria endemic areas.

This is a PLOS Computational Biology Methods Paper.

Introduction


Malaria has declined significantly over the past decade, but continues to cause half a million deaths annually [1]. Calls for elimination have shifted research efforts towards developing new approaches for transmission reduction, including the identification of source and sink regions and hotspots that sustain transmission [2–4]. Plasmodium falciparum population genetic tools are increasingly being used to inform these efforts [5–12] and have been proposed as a means to establish the direction of parasite flows and to determine elimination status both by identifying the source of imported infections and by establishing that no local transmission is occurring [13–17]. However, in malaria-endemic regions, infections are frequently characterized by multiple different genotypes (polygenomic infections), which makes interpreting genetic data challenging. As a result, population genetic analyses of malaria parasites have often been limited to monogenomic infections, greatly reducing the utility of available data and potentially introducing biases into results.

Rapid technological developments have led to a proliferation of approaches for characterizing malaria parasite genomes, each with different implications for cost, suitability for field samples across a range of transmission settings, and applicability to different research questions [5,18–23]. Many genotyping approaches are based on a small number of single nucleotide polymorphisms (SNPs). SNP data are cheap and straightforward to obtain from commonly used dried blood spot (DBS) samples, collected in a variety of field settings, and remain the most common approach for genotyping studies. However, a high prevalence of polygenomic infections can render estimation of even the most basic parameters from SNP data, such as population allele frequencies, difficult.

Population allele frequencies are usually estimated from monogenomic infections [6,7,24], because of the challenge of estimating the true proportion of each lineage from heterozygous SNP loci resulting from high complexity of infection (COI, the number of clones in an individual). However, constraining data sets to only monogenomic infections may introduce systematic biases because these infections may not be representative. Such constraint also greatly limits the precision of estimates when the majority of samples are polygenomic. It is common to use the proportion of heterozygous calls in each individual or the fraction of polygenomic infections to compare genetic diversity between populations [6,7,16,25–27]. However, the complexity of infection underlying polygenomic infections can vary dramatically, and the probability of a particular locus being heterozygous will depend on its allele frequency in the population. COIL (estimating COI using likelihood) was recently developed to provide a more quantitative measure of genetic diversity [28], but unless supplied with external allele frequency data, it relies on monogenomic infections to estimate allele frequencies and is therefore problematic when a large fraction of infections are polygenomic. While external allele frequency data can be obtained from parasite population genomic data such as the Pf3K project, these estimates are only available in specific locations, and may exhibit considerable heterogeneity in space and time.

Here we introduce a new Bayesian approach, Turning HEterozygous SNP data into Robust Estimates of ALelle frequency, via Markov chain Monte Carlo, and Complexity Of Infection using Likelihood (THE REAL McCOIL), to additionally incorporate polygenomic samples, using Markov chain Monte Carlo methods to simultaneously estimate allele frequency and COI. We tested two versions of our method on a series of simulations and then applied it to data on 105 SNP loci in 868 samples from cross-sectional surveys in three regions of varying endemicity in Uganda [29–31]. The allele frequencies estimated by our new approach were used to calculate FST [32], a measure of genetic differentiation between sites, and FWS [33], a measure of the within-host genetic diversity. These results demonstrate the utility of THE REAL McCOIL to obtain accurate estimates of COI and allele frequency from SNP data, which can be used to characterize genetic diversity and perform population genetic analyses of parasite populations even in very high transmission settings.

Materials and methods

Ethics statement

The cross sectional survey was approved by IRBs at the University of California, San Francisco (#11–07138) and SOMREC at Makerere University, Uganda (#2011–203).

Methods to estimate population allele frequency and complexity of infection

We developed a Markov chain Monte Carlo (MCMC) method to simultaneously estimate population allele frequency for each SNP and COI for each individual. Because estimates of COI and allele frequency depend strongly on each other, our approach explored the uncertainty of both at the same time and, by doing so, incorporated information from polygenomic infections. Assuming there are n individuals and k loci, the parameters to be estimated include the complexity of infection for each individual (M = [m1, m2, …, mn]) and the population allele frequency for each locus (P = [p1, p2, …, pk]). We used the data in two ways: a categorical method, in which we considered the SNP at locus j of individual i, Bij, to be heterozygous or homozygous (0 [homozygous minor allele], 0.5 [heterozygous], 1 [homozygous major allele]), and a proportional method, in which the proportion of the major allele at locus j of individual i, Sij, was calculated from the relative signal intensity for each allele ($S_{ij} = A_1 / (A_1 + A_2)$, where A1 and A2 represent the signal intensities of the major and minor alleles, respectively, obtained from Sequenom or similar types of SNP assays [34]). The notations are summarized in Table A in S1 File. Similar to COIL, THE REAL McCOIL assumed that different loci are independent, that different samples are independent and polygenomic infections are obtained from multiple independent infections, and that the samples were collected from a single homogeneous population.

Categorical method: Modeling heterozygous/homozygous calls.

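The likelihood of observing heterozygous/homozygous calls depends on COI, population allele frequency, and the probability of erroneously calling homozygous loci heterozygous (e1) and conversely calling heterozygous loci homozygous (e2). We have

$$P(B^O_{ij} \mid m_i, p_j, e_1, e_2) = \sum_{B^T_{ij}} P(B^O_{ij} \mid B^T_{ij}) \, P(B^T_{ij} \mid m_i, p_j) \tag{1}$$

where BTij and BOij represent the true and observed heterozygosity at locus j of individual i (BTij and BOij ∈ {0, 0.5, 1}). We specify P(BOij|BTij) to take the form given in Table 1, depending on the values of BTij and BOij, and

$$P(B^T_{ij} \mid m_i, p_j) = \begin{cases} (1 - p_j)^{m_i} & \text{if } B^T_{ij} = 0 \\ 1 - p_j^{m_i} - (1 - p_j)^{m_i} & \text{if } B^T_{ij} = 0.5 \\ p_j^{m_i} & \text{if } B^T_{ij} = 1 \end{cases} \tag{2}$$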
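
The categorical likelihood can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation. The probabilities of the true call follow from the stated assumption that the lineages in a host are independent draws from the population. The error table in `error_prob` is an assumption of this sketch (Table 1 is not reproduced in this text): it splits e2 evenly between the two homozygous calls and never flips one homozygous call into the other. All function names are hypothetical.

```python
def true_call_probs(m, p):
    """P(true call) for COI m and major-allele frequency p, assuming the
    m lineages each draw their allele independently from the population."""
    hom_major = p ** m
    hom_minor = (1.0 - p) ** m
    return {1.0: hom_major, 0.0: hom_minor, 0.5: 1.0 - hom_major - hom_minor}


def error_prob(observed, true, e1, e2):
    """ASSUMED error table: a homozygous locus is miscalled heterozygous
    with probability e1; a heterozygous locus is miscalled homozygous with
    probability e2, split evenly between the two homozygous calls."""
    if true == 0.5:
        return 1.0 - e2 if observed == 0.5 else e2 / 2.0
    if observed == true:
        return 1.0 - e1
    return e1 if observed == 0.5 else 0.0


def call_likelihood(observed, m, p, e1=0.05, e2=0.05):
    """P(observed call | m, p, e1, e2), summing over the unknown true call."""
    return sum(error_prob(observed, t, e1, e2) * prob
               for t, prob in true_call_probs(m, p).items())
```

For a monogenomic sample (m = 1) the true call can never be heterozygous, so under this error table a heterozygous observation has probability e1.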

We assumed uniform priors for M and P and updated them sequentially using a Metropolis-Hastings algorithm over N = 100,000 iterations, discarding the first 1,000 iterations as burn-in, to obtain the posterior distributions of M and P. If e1 and e2 were not pre-specified, THE REAL McCOIL estimated their posterior distributions along with M and P. The details of the sampling procedure are described in Text A in S1 File.
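
The sequential Metropolis-Hastings update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure in Text A in S1 File: the proposals (m ± 1 for COI, a small uniform step for allele frequency), the error table, the starting values, and the absence of missing calls are all choices made for this sketch, and the error rates are held fixed rather than estimated. With uniform priors and symmetric proposals, the acceptance ratio reduces to a likelihood ratio.

```python
import numpy as np


def log_lik(calls, m, p, e1=0.05, e2=0.05):
    """Per-locus log-likelihood of calls coded 0, 0.5, 1 (no missing data).
    The error table (e2 split evenly between the two homozygous calls)
    is an assumption of this sketch."""
    hom_major, hom_minor = p ** m, (1.0 - p) ** m
    het = np.clip(1.0 - hom_major - hom_minor, 0.0, 1.0)
    lik = np.where(
        calls == 1.0, (1 - e1) * hom_major + 0.5 * e2 * het,
        np.where(calls == 0.0, (1 - e1) * hom_minor + 0.5 * e2 * het,
                 e1 * (hom_major + hom_minor) + (1 - e2) * het))
    return np.log(lik)


def run_mcmc(B, n_iter=2000, burn_in=200, max_coi=25, step=0.05, seed=0):
    """Metropolis-Hastings over M (COI per sample) and P (frequency per locus)."""
    rng = np.random.default_rng(seed)
    n, k = B.shape
    M = np.full(n, 2)          # arbitrary starting COI
    P = np.full(k, 0.5)        # arbitrary starting frequencies
    M_draws, P_draws = [], []
    for it in range(n_iter):
        for i in range(n):                      # propose m_i +/- 1
            prop = M[i] + rng.choice([-1, 1])
            if 1 <= prop <= max_coi:            # uniform prior on 1..max_coi
                delta = (log_lik(B[i], prop, P).sum()
                         - log_lik(B[i], M[i], P).sum())
                if np.log(rng.random()) < delta:
                    M[i] = prop
        for j in range(k):                      # propose p_j plus a small step
            prop = P[j] + rng.uniform(-step, step)
            if 0.0 < prop < 1.0:                # uniform prior on (0, 1)
                delta = (log_lik(B[:, j], M, prop).sum()
                         - log_lik(B[:, j], M, P[j]).sum())
                if np.log(rng.random()) < delta:
                    P[j] = prop
        if it >= burn_in:                       # keep post-burn-in draws only
            M_draws.append(M.copy())
            P_draws.append(P.copy())
    return np.array(M_draws), np.array(P_draws)
```

Point estimates can then be taken as posterior medians, for example `np.median(M_draws, axis=0)`, which matches how the COI estimates are summarized in Fig 2.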

Proportional method: Modeling frequency data.

The likelihood of obtaining the raw frequency of signals is composed of the observational model (f, the likelihood of the observed frequency of signals given the true within-host allele frequency) and the likelihood of the true within-host allele frequency (g) as follows:

$$P(S^O_{ij} \mid m_i, p_j) = \int_0^1 f(S^O_{ij} \mid S^T_{ij}) \, g(S^T_{ij} \mid m_i, p_j) \, dS^T_{ij} \tag{3}$$

where STij and SOij represent the true and observed frequency of the major allele at locus j of individual i (0 ≤ STij, SOij ≤ 1). Consistent with other population genetic approaches [35], we assumed that each observation SOij was drawn from a normal distribution with mean equal to the true frequency STij and a variance governed by εest, the overall level of measurement error. The variance decreased with the intensity of the signal. To exclude values outside [0, 1], we assumed point masses at 0 and 1, whose densities were obtained by integrating values from −∞ to 0 and from 1 to ∞, respectively.

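That is,

$$f(S^O_{ij} \mid S^T_{ij}) = \begin{cases} \Phi\!\left(\dfrac{0 - S^T_{ij}}{\sigma_{ij}}\right) & \text{if } S^O_{ij} = 0 \\ \dfrac{1}{\sigma_{ij}} \, \phi\!\left(\dfrac{S^O_{ij} - S^T_{ij}}{\sigma_{ij}}\right) & \text{if } 0 < S^O_{ij} < 1 \\ 1 - \Phi\!\left(\dfrac{1 - S^T_{ij}}{\sigma_{ij}}\right) & \text{if } S^O_{ij} = 1 \end{cases} \tag{4}$$

where σij is the standard deviation of the normal observation model (the square root of the variance described above), and Φ and ϕ are the cumulative distribution function and the probability density function of the standard normal distribution.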
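
The observation model with point masses at 0 and 1 can be sketched as below. The standard deviation `sigma` is passed in directly as a hypothetical parameter, because the exact expression linking the variance to εest and signal intensity is not reproduced in this text. Function names are illustrative.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def obs_density(s_obs, s_true, sigma):
    """Density of the observed major-allele frequency given the true one.
    Normal mass below 0 and above 1 is collapsed onto point masses at
    0 and 1; sigma is an assumed, caller-supplied standard deviation."""
    if s_obs <= 0.0:
        return norm_cdf((0.0 - s_true) / sigma)
    if s_obs >= 1.0:
        return 1.0 - norm_cdf((1.0 - s_true) / sigma)
    return norm_pdf((s_obs - s_true) / sigma) / sigma
```

The two point masses plus the integral of the interior density sum to one, so the censored distribution remains a proper probability distribution on [0, 1].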

The density of the true within-host frequency was composed of a continuous distribution and point masses at 0 and 1 as follows:

$$g(S^T_{ij} \mid m_i, p_j) = \begin{cases} (1 - p_j)^{m_i} & \text{if } S^T_{ij} = 0 \\ \left(1 - p_j^{m_i} - (1 - p_j)^{m_i}\right) \mathrm{Beta}\!\left(S^T_{ij};\, \alpha_{m_i,p_j}, \beta_{m_i,p_j}\right) & \text{if } 0 < S^T_{ij} < 1 \\ p_j^{m_i} & \text{if } S^T_{ij} = 1 \end{cases} \tag{5}$$

where Beta(x; α, β) denotes the probability density function of the Beta distribution evaluated at x. The shape and scale parameters, $\alpha_{m_i,p_j}$ and $\beta_{m_i,p_j}$, respectively, depend on the complexity of infection (mi) and population allele frequency (pj), and were obtained by fitting simulated data. We estimated values for $\alpha_{m_i,p_j}$ and $\beta_{m_i,p_j}$ pre-analysis, using simulated data to fit Beta distributions for a range of values of mi and pj. To do this, we simulated the within-host allele frequency distribution for given values of mi and pj by sampling a single allele for each infection from a Bernoulli distribution with probability pj and mixing these alleles with relative contributions sampled from a uniform distribution as follows: sampling (mi − 1) numbers from a uniform distribution, ordering these numbers to obtain $u_{(1)} \le u_{(2)} \le \dots \le u_{(m_i - 1)}$, and mixing alleles using proportions equal to the differences between them, $u_{(1)},\ u_{(2)} - u_{(1)},\ \dots,\ 1 - u_{(m_i - 1)}$. Biologically, this means the proportion of either lineage can be any value between 0 and 1 with equal probability when mi = 2. We then fit a Beta distribution to the resulting empirical distribution to obtain the fitted values $\alpha_{m_i,p_j}$ and $\beta_{m_i,p_j}$. We performed this for each combination of m and p, where m ranged from 2 to 25 and p ranged from 0.01 to 0.99. Because p is a continuous variable, we rounded its values to the second decimal place to correspond to our discrete simulation range, selecting the appropriate $\alpha_{m_i,p_j}$ and $\beta_{m_i,p_j}$ to calculate the likelihood. Fig A in S1 File shows some examples of the distribution of simulated within-host allele frequencies with the fitted Beta distribution given m and p. While the fitted Beta parameters were obtained by simulating the ratio of mixing from a uniform distribution, the method performed well when the ratio of mixing was sampled from an exponential distribution, and THE REAL McCOIL can incorporate any fitted Beta distributions the users provide.
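
The pre-analysis simulation described above can be sketched as follows. The mixing step follows the text (sorted uniform cut points, proportions equal to their successive differences). The fitting step uses the method of moments, which is an assumption of this sketch; the text does not state how the Beta distributions were fitted. Function names and the number of simulated infections are illustrative.

```python
import numpy as np


def simulate_within_host(m, p, n_sim=100_000, seed=0):
    """Simulate true within-host major-allele frequencies for COI m and
    population frequency p, with uniformly distributed mixing proportions."""
    rng = np.random.default_rng(seed)
    alleles = rng.random((n_sim, m)) < p            # True = major allele
    cuts = np.sort(rng.random((n_sim, m - 1)), axis=1)
    edges = np.hstack([np.zeros((n_sim, 1)), cuts, np.ones((n_sim, 1))])
    props = np.diff(edges, axis=1)                  # m proportions, sum to 1
    freqs = (alleles * props).sum(axis=1)
    freqs[alleles.all(axis=1)] = 1.0                # exact point mass at 1
    freqs[~alleles.any(axis=1)] = 0.0               # exact point mass at 0
    return freqs


def fit_beta_moments(freqs):
    """Method-of-moments Beta fit to the frequencies strictly inside (0, 1).
    This estimator is a stand-in chosen for the sketch."""
    inner = freqs[(freqs > 0.0) & (freqs < 1.0)]
    mean, var = inner.mean(), inner.var()
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common
```

For m = 2 and p = 0.5 the mixed infections have a uniformly distributed frequency, so the fitted parameters should both be close to 1, in line with the remark that either lineage can take any proportion with equal probability.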
We assumed uniform priors and updated P, M, and ST sequentially using a Metropolis-Hastings algorithm over N = 100,000 iterations, discarding the first 1,000 iterations as burn-in, to obtain posterior distributions of P and M. If εest was not pre-specified, THE REAL McCOIL estimated its posterior distribution along with P and M. The details of the sampling procedure are described in Text A in S1 File.

Simulations


We sampled the COI of each individual from a zero-truncated Poisson distribution with a specified mean, and the population allele frequency of each locus from a uniform distribution U(0, 1). For each individual, we independently sampled allele(s) for each locus from Bernoulli(pj). We determined the relative proportions of different lineages within the host by sampling the proportion of each infection from a uniform distribution U(0, 1). For comparison, we additionally tried sampling from a truncated exponential distribution with rate λ = 1. After obtaining the within-host allele frequency (STij), we drew SOij from a normal distribution with mean STij and a variance determined by ε, the level of measurement error. We sampled the intensity of the signal, I, for each locus of each individual as the sum of a Poisson-distributed value and a normally distributed value with mean 0 and variance 0.25. Simulations were designed to represent the type of raw data obtained from Sequenom or similar types of SNP assays, where an intensity value is obtained for each potential allele [34]. If the intensity of the signal was smaller than Imin, we assumed the data were missing. We obtained the intensities of the two alleles, A1 and A2, and determined heterozygous or homozygous calls from their relative intensity, which was characterized by the angle in a polar coordinate system. The SNP was called heterozygous if this angle was within (d1, d2) and homozygous otherwise (Fig B in S1 File). For simulated data with measurement error ε > 0, we used (d1, d2) = (5, 85). For real data, (d1, d2) was determined by expert review of each locus as described below.
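
The first steps of this simulation (COI, allele frequencies, alleles, and the resulting true calls) can be sketched as below. The sketch deliberately stops before measurement error, signal intensity, and angle-based calling, because the corresponding expressions are not reproduced in this text. The parameter `lam` is the rate of the untruncated Poisson, which is an assumption about how the mean was parameterized; function names are illustrative.

```python
import numpy as np


def zero_truncated_poisson(lam, size, rng):
    """Draw from a Poisson(lam) conditioned on being at least 1,
    by redrawing any zeros."""
    draws = rng.poisson(lam, size)
    zeros = draws == 0
    while zeros.any():
        draws[zeros] = rng.poisson(lam, zeros.sum())
        zeros = draws == 0
    return draws


def simulate_true_calls(n, k, lam, seed=0):
    """Simulate COI for n samples, allele frequencies for k loci, and the
    error-free calls (0 = minor, 0.5 = heterozygous, 1 = major)."""
    rng = np.random.default_rng(seed)
    coi = zero_truncated_poisson(lam, n, rng)
    p = rng.random(k)                               # U(0, 1) frequencies
    calls = np.empty((n, k))
    for i, m in enumerate(coi):
        alleles = rng.random((m, k)) < p            # one row per lineage
        n_major = alleles.sum(axis=0)
        calls[i] = np.where(n_major == m, 1.0,
                            np.where(n_major == 0, 0.0, 0.5))
    return coi, p, calls
```

Note that the mean of a zero-truncated Poisson with rate `lam` is `lam / (1 - exp(-lam))`, slightly above `lam` itself.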

We compared the performance of the categorical and proportional versions of our method to COIL, assessing the difference in parameter estimates and variation. We simulated violations of the model assumptions, specifically independence among loci, independence among parasite lineages within the same host, and a single, homogeneous population. Dependence among loci was simulated by linking different proportions of loci (p). We simulated relatedness (r) among lineages within the same host by sampling alleles either from an existing lineage within the same host (with probability r) or from the population (with probability 1 − r). We simulated two equally sized subpopulations with either the same or different average COI and with various levels of difference in allele frequencies, and treated them as one single population to test robustness to violations of the assumption that the population was well-mixed. We also simulated missing data and populations with COI up to 20.
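
The relatedness simulation can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: the copy-or-resample decision is made independently at each locus, and each new lineage copies from a single donor chosen uniformly among the lineages already in the host. The function name is hypothetical.

```python
import numpy as np


def sample_related_lineages(m, p, r, rng):
    """Sample m lineages (rows) over len(p) loci (True = major allele).
    Each allele of a new lineage is copied from an existing lineage with
    probability r, or drawn from the population with probability 1 - r."""
    k = len(p)
    lineages = np.empty((m, k), dtype=bool)
    lineages[0] = rng.random(k) < p                 # first lineage: population
    for h in range(1, m):
        donor = lineages[rng.integers(h)]           # one existing lineage
        copy = rng.random(k) < r                    # per-locus copy decision
        fresh = rng.random(k) < p                   # population draw
        lineages[h] = np.where(copy, donor, fresh)
    return lineages
```

With r = 1 every lineage is identical to the first, so the host looks monogenomic regardless of m, which illustrates why relatedness pushes COI estimates downward.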

Genotyping of field samples

Dried blood spot samples were obtained from representative cross-sectional surveys performed in 2012 and 2013 as part of the East African International Centers of Excellence in Malaria Research (ICEMR) program. Surveys were performed in each of three sub-counties in Uganda: Nagongera in Tororo District, Kihihi in Kanungu District, and Walukuba in Jinja District. Details of these surveys, along with entomological and cohort data from the same sites, have been published [29,31,36,37]. In brief, 200 households from each sub-county were randomly selected from a census population, and all children and an age-stratified sample of adults were enrolled from each household. All samples taken from individuals with evidence of asexual parasitemia by microscopy were selected for Sequenom SNP genotyping, and an age-stratified subset were also selected for merozoite surface protein 2 (msp2) genotyping. The Sequenom assay consisted of 128 SNPs selected to be polymorphic and at intermediate/high frequency in multiple populations. After removing variants with an elevated missing rate, we retained 105 SNPs (see S1 Table for SNP data), three of which are in known drug resistance loci. Samples were genotyped according to the relative intensity of the two alleles, as previously described [21]. Genotyping of msp2 was performed with alleles sized by capillary electrophoresis, as previously described [38]. The number of unique alleles was called by a single, expert reader, with allele counts > 5 grouped into a single category due to difficulties in accurately distinguishing artifacts from true alleles at high complexities of infection.

Data analysis

After excluding samples with more than 25% missing SNP data and loci with more than 20% missing data from the analysis, the numbers of individuals included were 462 (71%) [Nagongera], 48 (51%) [Walukuba], and 74 (59%) [Kihihi], and the numbers of loci were 63 (60%) [Nagongera], 49 (47%) [Walukuba], and 52 (50%) [Kihihi]. After these cutoffs, only the analysis of Nagongera included one drug resistance locus, and the others included none. We used a permutation test with N = 10,000 to compare estimated COI between groups because there were many ties. In the analysis, we assumed that error rates e1 and e2 were both 0.05 and εest = 0.02. FWS was calculated as (1 − HW/HS), where HW and HS are 2pW(1 − pW) and 2pS(1 − pS), respectively, and pW and pS are the within-host allele frequency and population allele frequency, respectively [33]. The HW/HS ratio was estimated by performing linear regression between HW and HS with the intercept fixed at 0.
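
The two calculations in this paragraph can be sketched as below. For FWS, the slope of a least-squares regression of HW on HS through the origin is Σ(HW·HS)/Σ(HS²). For the permutation test, the test statistic (absolute difference in group means) is an assumption of this sketch, since the text does not state which statistic was permuted. Function names are illustrative.

```python
import numpy as np


def fws(p_within, p_pop):
    """F_WS for one sample: 1 minus the slope of H_W regressed on H_S
    through the origin, with per-locus heterozygosities 2p(1 - p)."""
    hw = 2.0 * p_within * (1.0 - p_within)
    hs = 2.0 * p_pop * (1.0 - p_pop)
    slope = (hw * hs).sum() / (hs ** 2).sum()       # zero-intercept fit
    return 1.0 - slope


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means
    (the choice of statistic is an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y]).astype(float)
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                         # relabel the groups
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        count += diff >= observed - 1e-12           # tolerate rounding
    return (count + 1) / (n_perm + 1)
```

A sample whose within-host frequencies are all 0 or 1 has HW = 0 at every locus and therefore FWS = 1, while a sample whose within-host frequencies equal the population frequencies has FWS = 0.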

Results


Simultaneously estimating allele frequencies and the complexity of infection

We simulated data of 100 SNPs from populations with an average COI of 3, 5 and 7 and sample size of 100, and compared estimates of COI and allele frequencies using COIL and THE REAL McCOIL. When average COI was 3, all three methods estimated COI well, although allele frequency estimates from COIL were less precise than those from THE REAL McCOIL (mean absolute deviation [MAD] = 0.077 [COIL], 0.019 [THE REAL McCOIL categorical], 0.019 [THE REAL McCOIL proportional], Mann-Whitney test p-value < 2×10^−16) (Fig 1). When average COI was 5, however, COIL did not estimate COI or allele frequencies accurately (MAD = 1.45 [COI] and 0.15 [allele frequency]), and when COI was 7, it was unable to estimate allele frequencies due to a lack of monogenomic infections. In contrast to COIL, which consistently underestimated or failed to estimate COI in populations with greater numbers of polygenomic infections, THE REAL McCOIL estimated both COI and allele frequencies well even when COI was high (for categorical and proportional methods, respectively: COI = 5, MAD = 0.61, 0.45 [COI] and 0.024, 0.019 [allele frequency]; COI = 7, MAD = 0.86, 0.79 [COI] and 0.025, 0.015 [allele frequency]). Thus, the ability of THE REAL McCOIL to jointly estimate allele frequencies and COI from all available data resulted in considerably improved performance in estimates of both quantities, especially when the average COI was high.

Fig 1. True vs. estimated values of COI (A) and allele frequencies (B) using COIL and THE REAL McCOIL.

Each blue dot represents a sample. The black bar and the grey box show the median and 25% to 75% quantile. THE REAL McCOIL estimated allele frequencies and COI better than COIL, especially when the average COI was high and the majority of infections were polygenomic.

Furthermore, we compared the performance of the categorical and proportional methods when we included measurement error in simulations of observed within-host allele frequency. The categorical method modeled measurement error by incorporating the probability of calling homozygous loci heterozygous (e1) and vice versa (e2) in the likelihood equation, and the proportional method modeled measurement error by assuming that the difference between true and observed within-host allele frequencies decreased with the intensity of the signals, and was proportional to the error parameter (εest). Fig C (A)(C) in S1 File shows that measurement error resulted in a systematic bias in estimates of COI. However, this bias was relatively minor and fairly robust to misspecification of measurement error, especially when the proportional method was used. In addition, allele frequencies were accurately estimated by both methods (Fig C (B)(E) in S1 File). If parameters for measurement error were not specified, THE REAL McCOIL fit them as part of the MCMC. Fig C (D)(F) in S1 File shows that the probability that the 95% credible interval contained the true COI was higher when error parameters were fitted than when they were greatly mis-specified.

Sensitivity analysis

We next simulated specific violations of the model assumptions to test the robustness of our approach. In particular, we examined the impact of linkage disequilibrium between loci, genetic relatedness of parasites within an individual host, and relatedness between subsets of individuals within the overall population (population substructure). When a proportion of loci (p) were completely linked, COI was slightly overestimated (Fig D in S1 File). When different lineages in the same host were not independent, COI was underestimated and the level of underestimation increased with the level of relatedness (r) (Fig E in S1 File). When we treated two subpopulations as one population, COI was underestimated and the difference between true and estimated COI increased with the difference in average COI and the difference in allele frequencies between the two subpopulations (Fig F in S1 File). Of these three violations of model assumptions, only a high degree of relatedness between parasites within an individual host resulted in substantial bias in estimates of COI, and none substantially affected estimates of population allele frequencies. Genotyping of real samples often results in missing data; both methods performed well even when 50% of the data were missing (Fig G in S1 File). Furthermore, we tested how the number of loci influences the estimation of COI. While the probability that the 95% credible interval contained the true COI did not change with the number of loci, the average difference between true and estimated COI decreased as the number of loci increased (Fig H in S1 File). THE REAL McCOIL provided unbiased estimates even when COI was very high (e.g. 15–20), despite the uncertainty of the estimates increasing with true COI (Fig I in S1 File).

Complexity of infection and allele frequencies in three regions of Uganda

We next applied THE REAL McCOIL to data on 105 SNPs generated from smear positive individuals identified in cross-sectional surveys in three regions of Uganda [36,37] and compared results obtained from THE REAL McCOIL to those using COIL. Both categorical and proportional methods were applied and showed consistent results; for simplicity we therefore present only results from the categorical method.

Nagongera, Kihihi, and Walukuba have been shown to have transmission intensities varying by approximately 100 fold, with entomological inoculation rates recently measured at 310, 32, and 2.8 infectious bites per person year, respectively [29]. Using COIL, the estimated COI was relatively low, with little difference between the 3 sites (median COI = 2 [Nagongera], 2 [Walukuba], and 1.5 [Kihihi]) (Fig 2A). In contrast, results from THE REAL McCOIL show that the COI in Nagongera and Walukuba were similar, and much higher than that in Kihihi (median COI = 5 [Nagongera], 4.5 [Walukuba], and 1 [Kihihi])(Fig 2A, Table B in S1 File and S2 Table). These differences between sites were not captured by COIL because of its dependence on monogenomic infections to obtain estimates of allele frequencies, which were rare in these individuals. We also compared our results to COI estimated using another standard method, msp2 typing, which was performed on a subset of the samples (Fig J in S1 File). Unlike THE REAL McCOIL, however, msp2 typing estimated similar COI in Walukuba and Kihihi (p-value = 0.49) (Fig 2A). msp2 encodes an antigen that elicits strong antibody responses, and this discrepancy may be due to complex population structure arising from immune selection. The difference may also result from the resolution of msp2 typing, which is constrained to COI ≤ 5 [39], or the fact that it is a single marker, rather than a collection of genome-wide markers.

Fig 2. Estimates of COI in Nagongera, Walukuba, and Kihihi.

(A) Estimates of COI by COIL, THE REAL McCOIL, and msp2. For THE REAL McCOIL, the point estimates of COI shown are medians from the posterior distributions. The COI estimated by THE REAL McCOIL in Nagongera and Walukuba were similar, and much higher than that in Kihihi (median COI = 5 [Nagongera], 4.5 [Walukuba], and 1 [Kihihi]; permutation test, p-values = 0.158 [Nagongera vs. Walukuba], 0.002 [Nagongera vs. Kihihi], 0.0006 [Walukuba vs. Kihihi]). Allele counts > 5 in msp2 typing were grouped into a single category due to difficulties in accurately distinguishing artifacts from true alleles at high complexities of infection. The dashed red lines represent the medians of COI in three regions. (B) The spatial distribution of estimated COI by THE REAL McCOIL in three regions. Small random noise was added to the location of samples in the map. COI of samples collected from the West of Walukuba was higher than those from the East of Walukuba (medians = 5 [West] and 3 [East], p-value = 0.027).

The high COI observed in the lowest transmission site of Walukuba was unexpected but reflected clear differences in the proportion of heterozygous calls, which was similar between Nagongera and Walukuba and lower in Kihihi (Fig K in S1 File). The distributions of age and parasite density were similar between the sites, and thus unlikely to explain these differences (Fig L and Fig M in S1 File). We calculated FWS, an inverse measure of outcrossing [33,40], and found that it was significantly negatively associated with our COI estimates (Fig 3; Pearson’s correlation test between log(COI) and FWS, ρ = −0.93 [Nagongera], −0.94 [Walukuba], and −0.95 [Kihihi], p-values < 2.2×10^−16 for all). FWS values in Nagongera and Walukuba were similar and lower than in Kihihi, suggesting that the level of outcrossing is smallest in Kihihi, which is consistent with the pattern of COI.

Fig 3. FWS.

(A) COI estimated by THE REAL McCOIL was negatively associated with FWS. (B) FWS in Kihihi was higher than in Nagongera and Walukuba. The FWS values shown were calculated using population allele frequencies estimated from the categorical method of THE REAL McCOIL.

We also examined the relationship between COI and epidemiological and geographical factors within each site. In Nagongera, COI in young children increased with age until peaking at age 7, and then decreased; sample sizes for the other two sites were too small to estimate trends (Fig N in S1 File). Interestingly, parasite density was negatively correlated with COI after adjusting for age (partial correlation r = −0.15 [Nagongera], −0.27 [Walukuba], −0.23 [Kihihi], p-values = 0.0011 [Nagongera], 0.058 [Walukuba], 0.043 [Kihihi]). This negative association was most pronounced in those aged 3–10 years in Nagongera (Fig O in S1 File), and may reflect the dominance of particular clones in acute, high-density infections. No differences in COI were observed between households with or without Insecticide Treated Nets (ITNs), or between sampling years.

In Kihihi, elevation and COI were negatively associated (r = −0.259, p-value = 0.026), consistent with the previously identified negative associations between elevation and mosquito density, the incidence of malaria, and serological evidence of exposure [41]. Interestingly, the unexpectedly high COI observed in Walukuba was largely driven by samples collected from the West of this sub-county (Fig 2B; medians = 5 [West] and 3 [East], p-value = 0.027). We have previously noted that mosquito densities in Walukuba are lower in the West, which is closer to urban centers, as compared to the East, which is a fishing village comprised largely of makeshift wooden housing [42]. One potential explanation for this seemingly paradoxical finding (high COI in the lowest transmission part of the lowest transmission site) is that a substantial proportion of these infections were imported from areas of higher transmission, where parasite populations are more diverse and co-transmission of multiple genetically distinct parasites is more likely.

Finally, we compared allele frequencies from each of the three sites to determine whether there was any evidence of population differentiation. We found little genetic differentiation between sites measured based on our estimated allele frequencies (FST ranged from 0.004 to 0.04; Table C in S1 File and S3 Table), although Kihihi, which is somewhat geographically isolated, had slightly higher FST with respect to the other two sites.

Discussion


Despite the availability of increasingly efficient genotyping technologies for molecular epidemiology, the prevalence of polygenomic infections in malaria-endemic regions hinders the estimation of basic population genetic parameters for Plasmodium falciparum. While COIL can estimate COI using allele frequencies from monogenomic infections or external data, direct estimation of allele frequencies from all samples is a preferable approach, particularly when no relevant frequency data are available and sample size is sufficient to overcome stochastic sampling error. THE REAL McCOIL accomplishes this by incorporating information from polygenomic infections to simultaneously estimate COI and population allele frequencies. We show through detailed simulations that our approach is robust to most model assumptions and can readily handle missing data. In addition, THE REAL McCOIL can utilize raw SNP genotyping data, allowing the method to be robust to errors in allele calling. Analysis of genotyping data from Uganda shows that THE REAL McCOIL is able to identify nuances in field data that previous methods could not. In particular, compared with msp2 genotyping or applying COIL to SNP data, we identified much higher average COI overall and epidemiologically relevant variation between and within study sites.

Through a number of simulations, we show that results obtained from THE REAL McCOIL are robust to assumptions that loci are independent and that the parasite population is homogeneous. As would be expected, a high degree of relatedness between parasites within an individual host resulted in substantial downward bias in estimates of COI. This is not trivial, as parasites in some epidemiological settings may be closely related within a host, e.g. due to co-transmission [43]. Fortunately, we found that this bias follows a clear linear pattern and can be corrected if the level of relatedness is known or can be estimated directly from the data; otherwise, it can at least be given reasonable bounds (Text B in S1 File). While estimating the level of relatedness may be challenging, enough information may be present in the data to do so in some cases, as demonstrated by a recent paper which estimated this parameter from sequence-read data [44]. THE REAL McCOIL can also be applied to read-based SNP data, and in theory can be extended to estimate relatedness. While we note that the most obvious model for measurement error in sequence-read data is a binomial distribution (Text C in S1 File), a normal distribution as applied in our current version offers a reasonable approximation and has computational advantages.
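The claim that a normal distribution can stand in for a binomial one can be checked numerically. The sketch below is a generic illustration of the normal approximation to a binomial distribution, not the measurement-error model described in Text C in S1 File. It compares the binomial probability of observing k reads of one allele out of n with a normal density of matching mean and variance; the function names and read depths are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k reference-allele reads out of n at true fraction p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_pdf(x, mean, var):
    """Normal density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def max_abs_gap(n, p):
    """Largest pointwise gap between the binomial pmf and the normal density
    with the same mean (n*p) and variance (n*p*(1-p))."""
    mean, var = n * p, n * p * (1 - p)
    return max(
        abs(binom_pmf(k, n, p) - normal_pdf(k, mean, var)) for k in range(n + 1)
    )


# Hypothetical read depths: the gap shrinks as depth grows
for depth in (10, 50, 200):
    print(depth, max_abs_gap(depth, 0.5))
```

The approximation degrades when the allele fraction is close to 0 or 1 or when read depth is low, which is where a binomial model would matter most.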

Genotyping of one or a few highly polymorphic antigen markers, such as msp1 and msp2, is currently the most common method for determining COI [45,46]. The use of capillary electrophoresis has improved resolution of alleles, but due to the creation of PCR artifacts it is still difficult to accurately measure COI > 5 [38]. Deep sequencing of antigens such as csp is an alternative approach [47,48]. However, with all of these approaches, immune selection on these genes within individuals and in a population can bias estimates of COI in ways which are difficult to predict [49,50]. Since loci under different types of selection can evolve independently in the presence of recombination, the diversity and geographic distribution of loci under immune selection may not be the same as observed among SNP loci. Both recombination rate and immune selection pressure will vary systematically with transmission intensity, resulting in complex associations between different genetic markers. Therefore, multiple genetic lineages defined by SNP panels may be associated with few msp2 alleles, or vice versa, depending on the transmission setting and selective environment. In addition, if lineages within the host are related, using multiple markers across the genome is more likely to detect multiple lineages than using one region of the genome. FWS, based on the difference between within-host and population heterozygosity, is a related metric used to quantify within-host diversity [33]. While FWS is correlated with COI, the metric is conceptually different because it is influenced by both the relative proportions of lineages within the host and population allele frequencies [21,33,40]. estMOI [51] uses phasing information from sequence reads and the number of unique allelic combinations to estimate COI but requires deep sequencing data and can be biased by sequencing error. Some methods that use SNP data to estimate haplotype frequencies also simultaneously estimate COI [52,53]. However, current haplotype-based methods can only consider a limited number of loci (~7) because the number of possible haplotypes quickly expands with the number of loci. We expect that THE REAL McCOIL is better at estimating COI than these methods because it can incorporate a much larger number of SNPs. Moreover, COI estimated from THE REAL McCOIL could be used as a prior in tools estimating haplotype frequencies.
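To make the FWS metric discussed in this paragraph concrete, the sketch below uses the commonly cited simplified form FWS = 1 − H_W / H_S, where H_W is within-host heterozygosity and H_S is population heterozygosity at the same SNPs [33]. Published implementations typically estimate the ratio across minor allele frequency bins, so this is only a minimal version; the function name and input values are hypothetical.

```python
def fws(within_host_fracs, population_freqs):
    """Simplified FWS = 1 - H_W / H_S over a set of biallelic SNPs.

    within_host_fracs: per-SNP fraction of the reference allele within one
        sample (for example, from read counts).
    population_freqs: per-SNP population frequency of the same allele.
    """
    if len(within_host_fracs) != len(population_freqs):
        raise ValueError("inputs must cover the same SNPs")
    h_w = sum(2 * q * (1 - q) for q in within_host_fracs)
    h_s = sum(2 * p * (1 - p) for p in population_freqs)
    if h_s == 0:
        raise ValueError("population heterozygosity is zero at all SNPs")
    return 1 - h_w / h_s


# Hypothetical population frequencies at four SNPs
population = [0.30, 0.50, 0.20, 0.40]

# A single-lineage sample carries only one allele per SNP, so H_W = 0
print(fws([0.0, 1.0, 0.0, 1.0], population))

# A sample with two lineages in unequal proportions: same COI as an even
# mixture, but a different FWS
print(fws([0.1, 0.9, 0.0, 0.1], population))
```

Because the within-host fractions enter directly, two samples with the same COI but different lineage proportions yield different FWS values, which is the conceptual difference noted above.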

Application of THE REAL McCOIL to genotyping data from Uganda allowed us to calculate allele frequencies and FST, which was not possible to do from the raw data or using COIL due to the high proportion of heterozygous calls. THE REAL McCOIL also provided estimates of COI for all sites, which demonstrated associations with epidemiologic factors not identified using msp2 genotyping. Interestingly, we identified a high COI in the lowest transmission site, potentially indicating importation of parasites from higher transmission areas. Although the possibility remains that recent transmission reduction left complex, chronic infections in its wake, explaining the high COI observed in Walukuba, the simplest explanation is that these infections were imported from high transmission settings nearby. Additionally, our results demonstrated that COI increased with age until age 7, and subsequently decreased, consistent with studies based on msp1 and/or msp2 typing [54–59]. Previous studies reported inconsistent associations between COI and parasite density for children > 2 years old (positive [55,58,60], none [54,61], or negative [62]). We observed a negative association between COI and parasite density in children aged 3–10 in Nagongera. Although higher parasite density may help detect more strains within the host [63–65], the detection of minority strains may be more influenced by relative proportions of the strains [39]. Individuals with high parasite densities may be relatively immunologically naïve and have one or few lineages dominating the infection [66]. Lower parasite densities may be associated with partial immunity and parasite persistence, and consequently the accumulation of parasite lineages [67–71]. Also, parasite lineages are more likely to persist and accumulate in people with low parasite density because they are less likely to have clinical symptoms [70,72] and be treated. The discrepancy between studies may be due to different genetic markers, different transmission settings and immune levels, different contributions of co-transmission vs. superinfection, or some combination of these factors.

In summary, THE REAL McCOIL facilitates population genetic analysis of SNP data from polygenomic infections, which are common in many transmission settings and may predominate even in low transmission settings. Population allele frequency, which was previously difficult to estimate if the majority of samples were polygenomic, can be estimated by THE REAL McCOIL, allowing downstream analyses that require frequencies, such as estimating FST, FWS, and effective population size (Ne) [32,33,73]. THE REAL McCOIL is not limited to P. falciparum; it can also be applied to other parasite species with polygenomic infections [74], including Plasmodium vivax [75]. Code for THE REAL McCOIL is available on GitHub (

Supporting information

S1 File. Supporting information.

Supplementary texts, figures and tables.


S2 Table. The 95% credible intervals of COI of samples from Uganda.


S3 Table. The 95% credible intervals of allele frequencies.


Acknowledgments

We thank Aimee Taylor and Rachel Daniels for helpful discussions.

Author Contributions

  1. Conceptualization: HHC COB BG.
  2. Formal analysis: HHC.
  3. Methodology: HHC CJW DEN.
  5. Supervision: COB BG.
  6. Writing – original draft: HHC COB BG.
  7. Writing – review & editing: CJW GD DEN RA.

References

  1. World Health Organization (2015) World Malaria Report 2015. Geneva, Switzerland: World Health Organization.
  2. Bousema T, Drakeley C, Gesase S, Hashim R, Magesa S, et al. (2010) Identification of hot spots of malaria transmission for targeted malaria control. J Infect Dis 201: 1764–1774. pmid:20415536
  3. Bousema T, Griffin JT, Sauerwein RW, Smith DL, Churcher TS, et al. (2012) Hitting hotspots: spatial targeting of malaria for control and elimination. PLoS Med 9: e1001165. pmid:22303287
  4. Moonen B, Cohen JM, Snow RW, Slutsker L, Drakeley C, et al. (2010) Operational strategies to achieve and maintain malaria elimination. Lancet 376: 1592–1603. pmid:21035841
  5. Carlton JM, Volkman SK, Uplekar S, Hupalo DN, Pereira Alves JM, et al. (2015) Population Genetics, Evolutionary Genomics, and Genome-Wide Studies of Malaria: A View Across the International Centers of Excellence for Malaria Research. Am J Trop Med Hyg 93: 87–98.
  6. Daniels R, Chang HH, Sene PD, Park DC, Neafsey DE, et al. (2013) Genetic surveillance detects both clonal and epidemic transmission of malaria following enhanced intervention in Senegal. PLoS One 8: e60780. pmid:23593309
  7. Daniels RF, Schaffner SF, Wenger EA, Proctor JL, Chang HH, et al. (2015) Modeling malaria genomics reveals transmission decline and rebound in Senegal. Proc Natl Acad Sci U S A 112: 7067–7072. pmid:25941365
  8. Conway DJ (2007) Molecular epidemiology of malaria. Clin Microbiol Rev 20: 188–204. pmid:17223628
  9. Malaria GENPfCP (2016) Genomic epidemiology of artemisinin resistant malaria. Elife 5.
  10. Miotto O, Amato R, Ashley EA, MacInnis B, Almagro-Garcia J, et al. (2015) Genetic architecture of artemisinin-resistant Plasmodium falciparum. Nat Genet 47: 226–234. pmid:25599401
  11. Mobegi VA, Loua KM, Ahouidi AD, Satoguina J, Nwakanma DC, et al. (2012) Population genetic structure of Plasmodium falciparum across a region of diverse endemicity in West Africa. Malar J 11: 223. pmid:22759447
  12. Nkhoma SC, Nair S, Al-Saai S, Ashley E, McGready R, et al. (2013) Population genetic correlates of declining transmission in a human pathogen. Mol Ecol 22: 273–285. pmid:23121253
  13. Obaldia N 3rd, Baro NK, Calzada JE, Santamaria AM, Daniels R, et al. (2015) Clonal outbreak of Plasmodium falciparum infection in eastern Panama. J Infect Dis 211: 1087–1096. pmid:25336725
  14. Patel JC, Taylor SM, Juliao PC, Parobek CM, Janko M, et al. (2014) Genetic Evidence of Importation of Drug-Resistant Plasmodium falciparum to Guatemala from the Democratic Republic of the Congo. Emerg Infect Dis 20: 932–940. pmid:24856348
  15. Wei G, Zhang L, Yan H, Zhao Y, Hu J, et al. (2015) Evaluation of the population structure and genetic diversity of Plasmodium falciparum in southern China. Malar J 14: 283. pmid:26194795
  16. Escalante AA, Ferreira MU, Vinetz JM, Volkman SK, Cui L, et al. (2015) Malaria Molecular Epidemiology: Lessons from the International Centers of Excellence for Malaria Research Network. Am J Trop Med Hyg 93: 79–86. pmid:26259945
  17. Greenhouse B, Smith DL (2015) Malaria genotyping for epidemiologic surveillance. Proc Natl Acad Sci U S A 112: 6782–6783. pmid:26016526
  18. Anderson TJ, Haubold B, Williams JT, Estrada-Franco JG, Richardson L, et al. (2000) Microsatellite markers reveal a spectrum of population structures in the malaria parasite Plasmodium falciparum. Mol Biol Evol 17: 1467–1482. pmid:11018154
  19. Daniels R, Volkman SK, Milner DA, Mahesh N, Neafsey DE, et al. (2008) A general SNP-based molecular barcode for Plasmodium falciparum identification and tracking. Malar J 7: 223. pmid:18959790
  20. Chang HH, Park DJ, Galinsky KJ, Schaffner SF, Ndiaye D, et al. (2012) Genomic sequencing of Plasmodium falciparum malaria parasites from Senegal reveals the demographic history of the population. Mol Biol Evol 29: 3427–3439. pmid:22734050
  21. Manske M, Miotto O, Campino S, Auburn S, Almagro-Garcia J, et al. (2012) Analysis of Plasmodium falciparum diversity in natural infections by deep sequencing. Nature 487: 375–379. pmid:22722859
  22. Mobegi VA, Duffy CW, Amambua-Ngwa A, Loua KM, Laman E, et al. (2014) Genome-wide analysis of selection on the malaria parasite Plasmodium falciparum in West African populations of differing infection endemicity. Mol Biol Evol 31: 1490–1499. pmid:24644299
  23. Haasl RJ, Payseur BA (2011) Multi-locus inference of population structure: a comparison between single nucleotide polymorphisms and microsatellites. Heredity (Edinb) 106: 158–171.
  24. Sisya TJ, Kamn'gona RM, Vareta JA, Fulakeza JM, Mukaka MF, et al. (2015) Subtle changes in Plasmodium falciparum infection complexity following enhanced intervention in Malawi. Acta Trop 142: 108–114. pmid:25460345
  25. Chang HH, Meibalan E, Zelin J, Daniels R, Eziefula AC, et al. (2016) Persistence of Plasmodium falciparum parasitemia after artemisinin combination therapy: evidence from a randomized trial in Uganda. Sci Rep 6: 26330. pmid:27197604
  26. Echeverry DF, Nair S, Osorio L, Menon S, Murillo C, et al. (2013) Long term persistence of clonal malaria parasite Plasmodium falciparum lineages in the Colombian Pacific region. BMC Genet 14: 2. pmid:23294725
  27. Johnston WT, Mutalima N, Sun D, Emmanuel B, Bhatia K, et al. (2014) Relationship between Plasmodium falciparum malaria prevalence, genetic diversity and endemic Burkitt lymphoma in Malawi. Sci Rep 4: 3741. pmid:24434689
  28. Galinsky K, Valim C, Salmier A, de Thoisy B, Musset L, et al. (2015) COIL: a methodology for evaluating malarial complexity of infection using likelihood from single nucleotide polymorphism data. Malar J 14: 4. pmid:25599890
  29. Kilama M, Smith DL, Hutchinson R, Kigozi R, Yeka A, et al. (2014) Estimating the annual entomological inoculation rate for Plasmodium falciparum transmitted by Anopheles gambiae s.l. using three sampling methods in three sites in Uganda. Malar J 13: 111. pmid:24656206
  30. Okello PE, Van Bortel W, Byaruhanga AM, Correwyn A, Roelants P, et al. (2006) Variation in malaria transmission intensity in seven sites throughout Uganda. Am J Trop Med Hyg 75: 219–225. pmid:16896122
  31. Kamya MR, Arinaitwe E, Wanzira H, Katureebe A, Barusya C, et al. (2015) Malaria transmission, infection, and disease at three sites with varied transmission intensity in Uganda: implications for malaria control. Am J Trop Med Hyg 92: 903–912. pmid:25778501
  32. Weir BS, Cockerham CC (1984) Estimating F-Statistics for the Analysis of Population Structure. Evolution 38: 1358–1370.
  33. Auburn S, Campino S, Miotto O, Djimde AA, Zongo I, et al. (2012) Characterization of within-host Plasmodium falciparum diversity using next-generation sequence data. PLoS One 7: e32891. pmid:22393456
  34. Ross P, Hall L, Smirnov I, Haff L (1998) High level multiplex genotyping by MALDI-TOF mass spectrometry. Nat Biotechnol 16: 1347–1351. pmid:9853617
  35. Wen X, Stephens M (2010) Using Linear Predictors to Impute Allele Frequencies from Summary or Pooled Genotype Data. Ann Appl Stat 4: 1158–1182. pmid:21479081
  36. Nankabirwa JI, Yeka A, Arinaitwe E, Kigozi R, Drakeley C, et al. (2015) Estimating malaria parasite prevalence from community surveys in Uganda: a comparison of microscopy, rapid diagnostic tests and polymerase chain reaction. Malar J 14: 528. pmid:26714465
  37. Yeka A, Nankabirwa J, Mpimbaza A, Kigozi R, Arinaitwe E, et al. (2015) Factors associated with malaria parasitemia, anemia and serological responses in a spectrum of epidemiological settings in Uganda. PLoS One 10: e0118901. pmid:25768015
  38. Gupta V, Dorsey G, Hubbard AE, Rosenthal PJ, Greenhouse B (2010) Gel versus capillary electrophoresis genotyping for categorizing treatment outcomes in two anti-malarial trials in Uganda. Malar J 9: 19. pmid:20074380
  39. Greenhouse B, Myrick A, Dokomajilar C, Woo JM, Carlson EJ, et al. (2006) Validation of microsatellite markers for use in genotyping polyclonal Plasmodium falciparum infections. Am J Trop Med Hyg 75: 836–842. pmid:17123974
  40. Murray L, Mobegi VA, Duffy CW, Assefa SA, Kwiatkowski DP, et al. (2016) Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections. Malar J 15: 275. pmid:27176827
  41. Helb DA, Tetteh KK, Felgner PL, Skinner J, Hubbard A, et al. (2015) Novel serologic biomarkers provide accurate estimates of recent Plasmodium falciparum exposure for individuals and communities. Proc Natl Acad Sci U S A 112: E4438–4447. pmid:26216993
  42. Kigozi SP, Pindolia DK, Smith DL, Arinaitwe E, Katureebe A, et al. (2015) Associations between urbanicity and malaria at local scales in Uganda. Malar J 14: 374. pmid:26415959
  43. Nkhoma SC, Nair S, Cheeseman IH, Rohr-Allegrini C, Singlam S, et al. (2012) Close kinship within multiple-genotype malaria parasite infections. Proc Biol Sci 279: 2589–2598. pmid:22398165
  44. O'Brien JD, Iqbal Z, Wendler J, Amenga-Etego L (2016) Inferring Strain Mixture within Clinical Plasmodium falciparum Isolates from Genomic Sequence Data. PLoS Comput Biol 12: e1004824. pmid:27362949
  45. Snounou G, Beck HP (1998) The use of PCR genotyping in the assessment of recrudescence or reinfection after antimalarial drug treatment. Parasitol Today 14: 462–467. pmid:17040849
  46. Viriyakosol S, Siripoon N, Petcharapirat C, Petcharapirat P, Jarra W, et al. (1995) Genotyping of Plasmodium falciparum isolates by the polymerase chain reaction and potential uses in epidemiological studies. Bull World Health Organ 73: 85–95.
  47. Neafsey DE, Juraska M, Bedford T, Benkeser D, Valim C, et al. (2015) Genetic Diversity and Protective Efficacy of the RTS,S/AS01 Malaria Vaccine. N Engl J Med 373: 2025–2037. pmid:26488565
  48. Bailey JA, Mvalo T, Aragam N, Weiser M, Congdon S, et al. (2012) Use of massively parallel pyrosequencing to evaluate the diversity of and selection on Plasmodium falciparum csp T-cell epitopes in Lilongwe, Malawi. J Infect Dis 206: 580–587. pmid:22551816
  49. Ferreira MU, Hartl DL (2007) Plasmodium falciparum: worldwide sequence diversity and evolution of the malaria vaccine candidate merozoite surface protein-2 (MSP-2). Exp Parasitol 115: 32–40. pmid:16797008
  50. Escalante AA, Lal AA, Ayala FJ (1998) Genetic polymorphism and natural selection in the malaria parasite Plasmodium falciparum. Genetics 149: 189–202. pmid:9584096
  51. Assefa SA, Preston MD, Campino S, Ocholla H, Sutherland CJ, et al. (2014) estMOI: estimating multiplicity of infection using parasite deep sequencing data. Bioinformatics 30: 1292–1294. pmid:24443379
  52. Li X, Foulkes AS, Yucel RM, Rich SM (2007) An expectation maximization approach to estimate malaria haplotype frequencies in multiply infected children. Stat Appl Genet Mol Biol 6: Article33. pmid:18052916
  53. Takala SL, Smith DL, Stine OC, Coulibaly D, Thera MA, et al. (2006) A high-throughput method for quantifying alleles and haplotypes of the malaria vaccine candidate Plasmodium falciparum merozoite surface protein-1 19 kDa. Malar J 5: 31. pmid:16626494
  54. Smith T, Beck HP, Kitua A, Mwankusye S, Felger I, et al. (1999) Age dependence of the multiplicity of Plasmodium falciparum infections and of other malariological indices in an area of high endemicity. Trans R Soc Trop Med Hyg 93 Suppl 1: 15–20.
  55. Peyerl-Hoffmann G, Jelinek T, Kilian A, Kabagambe G, Metzger WG, et al. (2001) Genetic diversity of Plasmodium falciparum and its relationship to parasite density in an area with different malaria endemicities in West Uganda. Trop Med Int Health 6: 607–613. pmid:11555426
  56. Ntoumi F, Contamin H, Rogier C, Bonnefoy S, Trape JF, et al. (1995) Age-dependent carriage of multiple Plasmodium falciparum merozoite surface antigen-2 alleles in asymptomatic malaria infections. Am J Trop Med Hyg 52: 81–88. pmid:7856831
  57. Konate L, Zwetyenga J, Rogier C, Bischoff E, Fontenille D, et al. (1999) Variation of Plasmodium falciparum msp1 block 2 and msp2 allele prevalence and of infection complexity in two neighbouring Senegalese villages with different transmission conditions. Trans R Soc Trop Med Hyg 93 Suppl 1: 21–28.
  58. Henning L, Schellenberg D, Smith T, Henning D, Alonso P, et al. (2004) A prospective study of Plasmodium falciparum multiplicity of infection and morbidity in Tanzanian children. Trans R Soc Trop Med Hyg 98: 687–694. pmid:15485698
  59. Branch OH, Takala S, Kariuki S, Nahlen BL, Kolczak M, et al. (2001) Plasmodium falciparum genotypes, low complexity of infection, and resistance to subsequent malaria in participants in the Asembo Bay Cohort Project. Infect Immun 69: 7783–7792. pmid:11705960
  60. Vafa M, Troye-Blomberg M, Anchang J, Garcia A, Migot-Nabias F (2008) Multiplicity of Plasmodium falciparum infection in asymptomatic children in Senegal: relation to transmission, age and erythrocyte variants. Malar J 7: 17. pmid:18215251
  61. Agyeman-Budu A, Brown C, Adjei G, Adams M, Dosoo D, et al. (2013) Trends in multiplicity of Plasmodium falciparum infections among asymptomatic residents in the middle belt of Ghana. Malar J 12: 22. pmid:23327681
  62. Kidima W, Nkwengulila G (2015) Plasmodium falciparum msp2 Genotypes and Multiplicity of Infections among Children under Five Years with Uncomplicated Malaria in Kibaha, Tanzania. J Parasitol Res 2015: 721201. pmid:26770821
  63. Contamin H, Fandeur T, Bonnefoy S, Skouri F, Ntoumi F, et al. (1995) PCR typing of field isolates of Plasmodium falciparum. J Clin Microbiol 33: 944–951. pmid:7790466
  64. Mayor A, Saute F, Aponte JJ, Almeda J, Gomez-Olive FX, et al. (2003) Plasmodium falciparum multiple infections in Mozambique, its relation to other malariological indices and to prospective risk of malaria morbidity. Trop Med Int Health 8: 3–11. pmid:12535242
  65. Guerra-Neira A, Rubio JM, Royo JR, Ortega JC, Aunon AS, et al. (2006) Plasmodium diversity in non-malaria individuals from the Bioko Island in Equatorial Guinea (West Central-Africa). Int J Health Geogr 5: 27. pmid:16784527
  66. Childs LM, Buckee CO (2015) Dissecting the determinants of malaria chronicity: why within-host models struggle to reproduce infection dynamics. J R Soc Interface 12: 20141379. pmid:25673299
  67. Felger I, Smith T, Edoh D, Kitua A, Alonso P, et al. (1999) Multiple Plasmodium falciparum infections in Tanzanian infants. Trans R Soc Trop Med Hyg 93 Suppl 1: 29–34.
  68. al-Yaman F, Genton B, Reeder JC, Anders RF, Smith T, et al. (1997) Reduced risk of clinical malaria in children infected with multiple clones of Plasmodium falciparum in a highly endemic area: a prospective community study. Trans R Soc Trop Med Hyg 91: 602–605. pmid:9463681
  69. Farnert A, Rooth I, Svensson Å, Snounou G, Bjorkman A (1999) Complexity of Plasmodium falciparum infections is consistent over time and protects against clinical disease in Tanzanian children. J Infect Dis 179: 989–995. pmid:10068596
  70. Doolan DL, Dobano C, Baird JK (2009) Acquired immunity to malaria. Clin Microbiol Rev 22: 13–36, Table of Contents. pmid:19136431
  71. Smith T, Felger I, Tanner M, Beck HP (1999) Premunition in Plasmodium falciparum infection: insights from the epidemiology of multiple infections. Trans R Soc Trop Med Hyg 93 Suppl 1: 59–64.
  72. Ali H, Ahsan T, Mahmood T, Bakht SF, Farooq MU, et al. (2008) Parasite density and the spectrum of clinical illness in falciparum malaria. J Coll Physicians Surg Pak 18: 362–368. pmid:18760048
  73. Pamilo P, Varvio-Aho SL (1980) On the estimation of population size from allele frequency changes. Genetics 95: 1055–1057. pmid:17249052
  74. Balmer O, Tanner M (2011) Prevalence and implications of multiple-strain infections. Lancet Infect Dis 11: 868–878. pmid:22035615
  75. Friedrich LR, Popovici J, Kim S, Dysoley L, Zimmerman PA, et al. (2016) Complexity of Infection and Genetic Diversity in Cambodian Plasmodium vivax. PLoS Negl Trop Dis 10: e0004526. pmid:27018585