Mitotic Evolution of Plasmodium falciparum Shows a Stable Core Genome but Recombination in Antigen Families

Malaria parasites elude eradication attempts both within the human host and across nations. At the individual level, parasites evade the host immune responses through antigenic variation. At the global level, parasites escape drug pressure through single nucleotide variants and gene copy amplification events conferring drug resistance. Despite their importance to global health, the rates at which these genomic alterations emerge have not been determined. We studied the complete genomes of different Plasmodium falciparum clones that had been propagated asexually over one year in the presence and absence of drug pressure. A combination of whole-genome microarray analysis and next-generation deep resequencing (totaling 14 terabases) revealed a stable core genome with only 38 novel single nucleotide variants appearing in seventeen evolved clones (avg. 5.4 per clone). In clones exposed to atovaquone, we found cytochrome b mutations as well as an amplification event encompassing the P. falciparum multidrug resistance associated protein (mrp1) on chromosome 1. We observed 18 large-scale (>1 kb on average) deletions of telomere-proximal regions encoding multigene families, involved in immune evasion (9.5×10−6 structural variants per base pair per generation). Six of these deletions were associated with chromosomal crossovers generated during mitosis. We found only minor differences in rates between genetically distinct strains and between parasites cultured in the presence or absence of drug. Using these derived mutation rates for P. falciparum (1.0–9.7×10−9 mutations per base pair per generation), we can now model the frequency at which drug or immune resistance alleles will emerge under a well-defined set of assumptions. Further, the detection of mitotic recombination events in var gene families illustrates how multigene families can arise and change over time in P. falciparum. These results will help improve our understanding of how P. falciparum evolves to evade control efforts within both the individual hosts and large populations.


Introduction
Although the global burden of malaria has declined over the last few years to 216 million cases and 655,000 deaths in 2010 [1], the overall goal of global eradication is still out of reach. Emerging resistance to artemisinin, a frontline chemotherapeutic for which resistance is not widespread, has recently been reported along the Thai-Cambodia border (reviewed in [2]). Furthermore, RTS,S, the most advanced vaccine candidate in development, is only minimally effective and does not induce long-lived sterile immunity [3].
A primary reason why malaria is difficult to control is its genome's ability to recombine and/or mutate away from a protective immune response or drug pressure. For example, the development of an effective vaccine has been hampered by the prevalence of strain-specific immunity, where vaccination with one antigenic haplotype protects for only one specific variant [4]. To date, this has been attributed to pre-existing genetic diversity; however, it may also be that escape mutants emerge in vaccinated individuals. Plasticity of the Plasmodium genome can also contribute to the evolution of resistance against anti-malarial drugs. Single nucleotide variants (SNVs) and copy number variants (CNVs) in target and resistance genes allow the parasites to evade drug pressure. Most notably, the emergence of chloroquine-resistant parasites ultimately caused a huge resurgence in the number of malaria cases in the 1990s. Although these two mechanisms are well described, it is not understood how often variation arises during mitotic asexual growth or how quickly SNVs accumulate in the absence of selection pressure.
In addition to diversity at the population level, there is also variability within the individual parasite. Multigene families, where only one or few members are expressed, provide antigenetic diversity and allow the parasite to persist in a host. Recombination events which occur in meiosis [5,6] as well as mitosis [7] give rise to new variants in these already diverse families. This genetic variability in parasites, both in an individual host and on a population level, allows the parasite to evade the host immune system even in the absence of transmission (i.e. during dry seasons).
Given this remarkable genetic diversity, it is not surprising that naturally-infected patients often carry multiple, genetically distinct parasite clones. The multiplicity of infection (MOI) has traditionally been estimated with a handful of genetic markers, which may encode proteins under strong selection by the host immune system. However, these methods are not comprehensive enough to measure true parasite heterogeneity and many variants are missed within an individual [8][9][10]. It is unclear whether parasite heterogeneity is created through multiple infectious mosquito bites, heterogeneity in a single mosquito inoculation, or evolution of new genetic changes (SNVs and structural variants) within the human host.
Knowing the rate at which genetic changes occur is critical to understanding the emergence of drug resistance, the evolution of antigen polymorphisms and multigene families, and the patterns of malaria transmission. It is not possible to study neutral parasite mutation rates in humans due to the influence of selection pressure of the host immune response and genetic host-to-host variability. Previous quantifications of the mutation rate have focused only on single genes under drug selection [11]. The basal rate at which mutations drive Plasmodium evolution has therefore never been measured at the whole-genome level and much must be inferred.
Using whole-genome sequencing as well as whole-genome tiling arrays, we have determined the rates at which genetic changes occur in clonal parasite populations in the absence of immune and drug pressure in vitro. In addition to the accumulation of SNVs in the core genome, we observed major deletions in the subtelomeric regions and identified seven mitotic recombination events. The rates of these events were not changed by the addition of atovaquone, a commonly used anti-malarial drug.

Generation of clonal P. falciparum lines in long-term in vitro culture
To study the natural genomic plasticity of P. falciparum, we investigated how the parasite genome changes over time. A single clonal parent was split into six lines. To investigate the effect of selection pressure on the mutation rate, five parasite lines were exposed to atovaquone (ATQ), a hydroxy-1,4-naphthoquinone that targets the mitochondria-encoded cytochrome bc 1 (CYTbc 1) complex of the electron transport chain of Plasmodium parasites [12]. ATQ is a component of Malarone, a traveler's medicine drug combination currently used to prevent or treat malaria. Resistance mutations are known to arise quickly [13]. Five lines were exposed to various concentrations of ATQ (R1, R2 and R3: 2 nM, R4: 20 nM, R5: 50 nM) and one line was cultured without drug pressure (S1) for up to a year ( Figure 1A). For each line, two clones were selected (three clones for the drug-free line). The four clones of lines R1 and R2 were retained in culture and cloned again (R1a/b and R2a/b, generation 1 and 2). Growth inhibition doseresponse assays confirmed that ATQ-resistant clones had indeed acquired at least a 9-fold increase in EC 50 values for ATQ compared to the 3D7 parent ( Figure 1B). Hence, we were able to evolve drug-resistant parasite clones that are 10-fold more tolerant to ATQ than their parent. To facilitate the analysis of different parasites lines on a whole-genome level, all lines were cloned by limiting dilution before being expanded to isolate DNA for further analysis and were as genetically homogenous as could be expected after this process.
Long-term in vitro cultivation of P. falciparum reveals few small genomic changes at the whole-genome level Next we identified the number of genetic changes by comparing the genomes of all seventeen clones to the original 3D7 parental clone using comparative genomic hybridization analysis. The whole-genome tiling microarray we used covers 76% of the coding regions and 41% of the non-coding regions and has a SNV discovery rate of 91% and a false discovery rate of 11% [14]. The PfGenominator software [14] was used to analyze the microarray data and predicted 15 polymorphisms. Capillary sequencing revealed that two of these polymorphisms were deletions, while two were false positives and eliminated from further analysis (Table S1).
As the microarray does not cover the whole genome, the parent and sixteen clones were further analyzed by whole-genome sequencing (WGS) using paired-end 60 bp reads with an average 145 bp insert size. On average, 91.8% of the P. falciparum genome was covered by five or more reads, with clones having between 73.5% and 99.9% of the genome covered by fivefold or greater coverage (Table 1). To assess the clonality of our haploid parasite populations, we calculated the number of positions where a significant amount of nucleotides were different from the most prevalent nucleotide. On average, only 235 positions were detected throughout the whole genome and we thus deemed coverage by five or more high quality reads adequate to call SNVs accurately. Areas with less than fivefold coverage included highly repetitive regions such as the telomere repeats and the flanking telomere associated regions (TARs) as well as certain conserved regions within gene families such as the var, rifin, and stevor families. It is therefore possible that some genetic changes in these hard-toalign and poorly annotated regions were not detected.

Author Summary
Malaria is one of the six diseases that together are responsible for 90% of all infectious disease deaths throughout the world. The five species of Plasmodium that cause human malaria take over 655,000 lives each year. Parasites evade the immune response through antigenic variation and develop resistance to anti-malarial drugs through genetic changes in either the drug target or genes conferring resistance. We used whole-genome sequencing and microarray techniques to study evolution in P. falciparum parasites propagated in vitro for up to 180 generations. We determined the mutation rate and found that the core genome of a single clone is stable, while the subtelomeric regions are prone to acquire structural variants. These changes occur mainly in multigene families involved in immune evasion. Our findings indicate that the parasite specifically increases the sequence variability in multigene families through mitotic recombination. This high plasticity of the parasite genome suggests that multiple haplotypes will be present in a natural infection initiated by a single parasite.
The WGS data was analyzed with the PlaTypUS 0.12 software (M. Manary et al., manuscript in preparation), which integrates many community-developed tools into a pipeline to first align the reads to the reference genome and then detect SNVs. To decide on the characteristics of a true SNV, a computer-learning algorithm was trained on a set of 10,500 known SNVs. The WGS data confirmed ten of the fifteen polymorphisms detected by the microarray and identified an additional 25 SNVs across all clones ( Figure 2A and Table S2). To verify that a reasonable cutoff was set to call SNVs in PlaTypUS, six SNVs that did not make the cutoff were analyzed by capillary sequencing (Table S1). Four events located next to a poly A or poly T stretch and sequencing confirmed that all clones, including the parent, had the same sequence. The other two events were in fact small deletions but were discarded by the PlaTypUS software as it is designed to detect SNVs only. In summary, 38 SNVs were detected by microarray and WGS (Table S2) and the construction of a cladogram with all polymorphisms in the genome (38 SNVs, nineteen deletions, and one duplication) confirmed the known evolutionary relationship between the clones ( Figure 2C).
To estimate how closely related our 3D7 parent is to the reference 3D7 genome, we compared the WGS data to the 3D7 reference sequence from PlasmoDB v9.1. While on average only 5.4 SNVs distinguished a single clone from the parent, we detected 58 SNVs in the parental 3D7 clone relative to the 3D7 reference ( Figure 2A and Table S3). It is not clear whether these differences are true SNVs or due to sequencing/assembly errors from the lower coverage reference genome sequencing efforts [15], platform differences, or from the different sources of genomic DNA used in the sequencing projects; however, they highlight the importance of A. Selection schematic. An initial parasite clone (3D7) was split into three lines after 55 days, and 20 nM or 50 nM ATQ pressure was applied to lines R4 and R5, respectively. The parental line was kept drug-free. After 24 days, the parental line was split again into four lines, and 2 nM ATQ pressure was applied to three lines (R1, R2, and R3) while line S1 was kept drug free. R2 was again split after 74 days, and 20 nM ATQ was applied repeatedly to line R2b. After the indicated time in culture, all lines were cloned by limiting dilution. Four ATQ-resistant clones were kept in culture (Generation 1 (G1): R1a and b G1 and R2a and b G1) and recloned, resulting in a second generation of clones (Generation 2 (G2): R1a and b G2 and R2a and b G2). The number of days (d) in culture between splits is indicated above each flask. B. ATQ structure and growth inhibition assay. EC 50 values for 3D7 parent, the sensitive clones, and the ATQ-resistant clones are the means 6 SD of three independent experiments performed in quadruplicate. Statistically significant differences between EC 50 values of the parental 3D7 line and the ATQ-resistant clones were calculated by a one-way ANOVA followed by a Dunnett posttest (*, p,0.0001). doi:10.1371/journal.pgen.1003293.g001 having a recent, isogenic reference genome for such comparative studies.

ATQ pressure directs the selection but does not alter the mutation rate
We compared the genetic changes acquired in the sensitive clones to the ATQ-resistant clones. As expected, all ATQ-resistant clones acquired SNVs in cytochrome b, the target of the drug, while the sensitive clones did not ( Figure 2 and Figure 3). Mutations were observed at amino acid position 133 (M133V in R1 pairs and M133I in R2, R3, and R4 pairs), as well as an L144S change in R4 and an F267V mutation in the clones R5. In addition, we found a single amplification event on chromosome 1 in the same R5 sister clones ( Figure 3B). Interestingly, this ,220 kb amplified region encompasses PFA0590w (PF3D7_0112200), which encodes the P. falciparum multidrug resistance associated protein 1 (PfMRP1), a protein that has been implicated in parasite resistance to chloroquine and quinine but has not been associated with naphthoquinone resistance [16,17].
In order to determine if this amplification leads to crossresistance, we tested our mutants against a series of diverse antimalarial drugs ( Figure 3C). Only the strains that contained the amplification (R5) were cross-resistant to decoquinate, a compound that also targets cytochrome b [18] (,10X, p,0.0001, one way ANOVA followed by a Dunnett posttest). In contrast, the R4 clones, which showed the strongest EC 50 shift for ATQ compared to the parent, were no more resistant to decoquinate than the parent or the other ATQ-resistant clones. While the presence of ATQ selected for SNVs in cytochrome b, it had little influence on the overall number of genetic changes accumulated in ATQ-resistant clones (0.02 genetic changes per days in culture) compared to parasite clones cultured without drug pressure (0.01 genetic changes per day in culture, p = 0.08, unpaired student t test). Our data suggest that external influences such as drug pressure do not appreciably increase the genetic variability of P. falciparum.
Having identified the number of isolated genetic changes present in each clone, we calculated the mutation rate per base pair per generation based on the number of mutations detected  within each clone and the total time in culture for each clone ( Figure 1 and Figure 2) [19,20]. To accurately calculate the mutation rate, we needed to include not only the observed SNVs but also those SNVs that were lethal/deleterious, which were unobserved due to loss of mutants from the population. The selection force against nonsynonymous mutations causes the appearance of fewer mutants than actually occurred; thus, using observed mutations alone would cause a downward bias in the mutation rate estimate. To correct for this selection bias, we calculated the dN/dS ratio of P. falciparum to be 0. 59  tage against nonsynonymous mutations of 40%, this would result in 0.04 mutations per surviving daughter parasite on average. Thus, after 25 generations, every surviving parasite would be expected to have accumulated one mutation relative to its parent.

Antigenic gene families in subtelomeric regions acquire structural variants
Deletions or amplifications of large chromosomal stretches have been observed in long-term in vitro P. falciparum cultures as well as in field isolates [10,[25][26][27][28][29], with amplifications typically being associated with drug pressure [30][31][32]. We therefore examined the data for structural variants using both microarray (manifested as a substantial loss of hybridization for multiple consecutive probes, or an increase in signal for a block of probes) and WGS (a lack of aligned reads or an increase in read pileup, Figure 4). All derived clones were compared to our parental 3D7 clone. In addition to the above mentioned single amplification event in two sister clones exposed to ATQ (R5a and R5b), we detected eighteen independent large-scale (.1000 bp) deletions of subtelomeric regions (within ,60 kb of the telomere) and one chromosome internal deletion in the seventeen clones analyzed by microarray as well as WGS ( Figure 2, Figure 4, and Figure 5). Comparison of our 3D7 parent to the available 3D7 genome from PlasmoDB v9.1 revealed a deletion on chromosome 2 in our parental clone.
To quantify the appearance of these structural variation events, we calculated a structural variation rate in the same manner as the mutation rates but for the total number of nucleotides deleted or amplified, without the correction for the dN/dS ratio. A structural variation rate of 4.7610 26 per base pair per generation (SD: 1.7610 26 ) was determined in the absence of drug and 1.0610 25 per base pair per generation (SD: 9.0610 26 ) in the presence of drug, with the assumption that deletions are fitness neutral. P. falciparum subtelomeric regions encode multigene families that are implicated in antigenic variation. Over 80% of the genes involved in structural variation events were members of multigene families with 11 var, 14 rifin, and 4 stevor genes, showing genetic changes that were so substantial that they could no longer be found by WGS or microarray ( Figure 5). We confirmed that exon 1 from var gene PFB0010c on chromosome 2 was deleted by Southern blot analysis ( Figure 4). The identification of deletions at the telomeres confirmed earlier observations from our laboratory where deletions were also common in field isolates [29].

Deletions are associated with mitotic recombination events
Structural variants can be the result of a variety of events including double-strand breaks and mitotic recombination. To investigate if the observed deletions in multigene families were associated with mitotic recombination events, we further analyzed the boundaries of the detected structural variants in the same paired-end read WGS data set used for variant discovery. Since read pairs are generated on the same fragment of DNA, they normally map to the same chromosome. However, at the edge of a deletion, we found that reads aligning towards the deletion often had read pairs that aligned to a different chromosome, indicating a likely recombination event. The read coverage at the position of the read mate on the other chromosome was often twice the expected number, suggesting a gene conversion event where DNA from one chromosome is added onto another, thereby doubling the donor sequence, while the original sequence in the recipient chromosome is lost.
To test this hypothesis, we extracted the reads that aligned within 1000 bp of each predicted deletion as well as the reads within 1000 bp of their pair mates on a different chromosome. A read from the deletion site was used as a seed to generate a de novo assembly of all these extracted reads. Seven contigs could be assembled that spanned between 1000 and 4000 bp of the sequence next to the deleted region as well as the sequence from a different chromosome indicating a recombination event ( Figure 6). Sanger sequencing of three predicted recombination events confirmed the presence of two new chimeric var genes consisting of PFF0010w (PF3D7_0600200) and MAL13P1.1 (PF3D7_1300100) as well as PFA0005w (PF3D7_0100100) and PF10_0001 (PF3D7_1000100). The third recombination was already present in the parent and involved intergenic regions close to var genes on chromosome 2 and 12.
A previous study also showed gene conversion events between var genes from E5 and 3D7, two different clones present in the NF54 isolate; short, 100 bp stretches of a 3D7 var gene sequence were found in the context of var gene sequences unique to E5 [33]. In contrast, the gene conversions described here are longer and no mosaic sequences were observed. As the parasite is haploid during the asexual blood stage phase of its cycle, the rearrangements of the genome observed here are results of rearrangements during The top two panels show the number of WGS paired-end reads mapping to 16 kb of chromosome 2 for 3D7 and S1a. The third panel shows the same region but with data from the microarray. The log2 ratio of the intensity of each unique probe for S1a relative to 3D7 parent is indicated and colored by the moving average over a 500-base pair window. B. Southern blot analysis. The gDNA of the 3D7 parent and S1a was cut with restriction enzymes HpaI and FspI and analyzed by pulsed field gel electrophoresis using a probe to the rifin gene PFB0015c adjacent to the var gene containing the deletion (schematic on the left, southern blot on the right). Arrows indicate the expected sizes for the fragments of the full-length 3D7 and the truncated S1a var gene (PFB0010w). Stars show nonspecific bands. doi:10.1371/journal.pgen.1003293.g004 mitotic growth, while the gene conversions described by Frank et al. could be either due to meiotic or mitotic recombination [33].
Not all of the observed deletions were associated with translocation of DNA from one chromosome to another. Broken telomere ends can also be repaired by the addition of telomere repeats to the broken chromosome arm [34,35]. Therefore, we wanted to investigate if deletions induce a general increase of the telomere repeat length (TRL) on all chromosomes (general increase in the average TRL) or if the telomere repeat size is only increased on broken telomere arms (presence of two populations of TRL). We measured the average TRL by the terminal restriction fragment Southern blot method, where the peak TRL is determined. Of the tested clones, only S1a (containing three deletions) had two peaks indicative of a second population of extended telomere repeats ( Figure S1). All other clones had only one peak that was shorter than in the parent, and the average TRL varied between clones from different lines. There was no correlation between the number of deletions and the average TRL of a clone ( Figure S1). In addition, clones R1a and R1b from the first generation had longer average TRL than their offspring in generation 2 that had been retained in culture and subcloned again. This indicates that the average TRL fluctuates over time. This also explains the differences observed between the average TRL in this study and an earlier report for 3D7 and Dd2 [34]. These findings show that deletions at the telomeres, in combination with the fluctuation of the average TRL and mitotic recombination events in the asexual erythrocytic cycle can accelerate the evolution of genetic diversity in gene families exposed to the host immune system.
A stable core genome is common in independently cultured 3D7 clones as well as in a geographically different strain Based on our findings of genetic variation arising in long-term culture, we predicted substantial changes in the genomes of different circulating clones of 3D7. Three clones from different laboratories were compared to our parental 3D7 by microarray. As expected, all clones contained at least one deletion in the subtelomeric region and only seven small genomic changes in core chromosomal regions were detected in total ( Figure S2 and Table S4). Comparing our parental 3D7 to the different 3D7 clones by microarray confirmed a deletion on chromosome 2 in the parent detected by WGS when compared to the reference genome.
To determine if these findings were specific to the 3D7 line, which is susceptible to most drugs, we also analyzed the mutation rate in a P. falciparum Dd2 clone and its eight offspring clones that had been selected for resistance to spiroindolones [30]. Dd2 is a multidrug-resistant clone originally derived from W2, a multidrug resistant patient isolate from Southeast Asia, and has been reported to acquire drug resistance at higher rates than other strains (Accelerated Resistance to Multiple Drugs or ARMD [36]). Microarray and WGS analysis detected that the core genome was stable in all Dd2 clones regardless of drug pressure (29 SNVs in six resistant clones and 15 SNVs in two sensitive control clones). Calculation of the mutation rates as described above suggested that Dd2 does not acquire resistance through an intrinsically higher average mutation rate (3.2610 29 and 2.4610 29 , in the absence or presence of drug, respectively) ( Figure S3 and Table S5), although we cannot rule out that some clones in the W2 strain might have this phenotype. We could detect only one deletion in the eight Dd2 clones that involved a var gene, but two clones showed large hybridization differences in subtelomeric regions. The high level of genetic diversity between 3D7 and Dd2 subtelomereric regions makes hybridization analysis with microarray designed for 3D7, or WGS aligned to 3D7, more difficult to interpret; deletions and SNVs could be under-or overestimated in Dd2. Even though the two compounds tested are chemically different and target different pathways, our observations suggest that the mutation rates are similar between different P. falciparum strains with different geographical origins.

Discussion
Using controlled in vitro conditions, we were able to reproduce genetic changes (SNVs, deletions in subtelomeric regions and CNVs) usually observed in field isolates and estimate a mutation as well as a structural variation rate for P. falciparum. The majority of the 38 SNVs (39%) mapped to intergenic regions or uncharacterized conserved proteins (21%). The appearance of deletions in subtelomeric regions has mainly been attributed to the absence of immune pressure and therefore the lack of counter-selection within in vitro cultures. The majority of the genes located in these regions are members of multigene families such as the 60 var genes that encode different versions of the P. falciparum erythrocyte membrane protein 1 (PfEMP1s). Var genes are further sub-grouped into Figure 6. Mitotic recombination events detected by WGS. Paired-end reads next to deletions as well as their read pair mates that mapped to a different chromosome were extracted. De novo assembly of these reads, starting with a single seed that mapped next to the deletion, generated new contigs. Hypothetical scenarios of recombination events that created gene conversions are shown on the left; the sequence alignments of the contigs (center sequences), where the two sequences from different chromosomes joined, are on the right. The chromosomal position (with orientation) and the var gene ID are indicated when applicable. doi:10.1371/journal.pgen.1003293.g006 three major groups (A, B and C) based on chromosomal location, orientation and sequence composition of their coding and noncoding upstream regions [37]. Using our new methodology to identify genetic recombination events, we were able to characterize seven recombination events during mitotic growth. Every recombination event observed here occurred either between two group B var genes or two group B var gene upstream regions, suggesting that recombination events within the same group are more frequent. PfEMP1, an adhesive molecule that is exported to the surface of infected human erythrocytes [38], is a key pathogenicity protein involved in immune evasion. While this adhesive molecule is obsolete in in vitro cultures, previous studies showed similar structural variants in the subtelomeric regions of Peruvian field isolates indicating that these deletions are not an artifact of in vitro cultures [29].
While we were able to detect structural variants with WGS and microarray analysis, the mechanisms by which they occur remain unclear. The recombination events observed here probably resulted from DNA breaks that were repaired either by direct joining of the broken DNA ends (resulting in a deletion) or by homologous recombination with a closely related var gene on a different chromosome (resulting in a gene conversion). While the essential proteins needed for homologous recombination are present in the P. falciparum genome, the key determinants for non-homologous end joining (NHEJ) are not [15]. Alternative end joining mechanisms exist in other eukaryotes (reviewed in [39]), though further studies are needed to identify the full repertoire of DNA repair pathways present in P. falciparum.
Although we observed deletions in individual clones, it may be that no loss occurs due to DNA translocation to another chromosome in a daughter cell on a population level. Because deletions are much easier to detect than duplications, we might have underestimated the number of the latter. The generation of new chimeric variants of var genes could allow the parasite to further evade the immune system, thereby extending its persistence in the patient. This is especially important for the survival of the parasite in semi-endemic areas where there are no meiotic recombination events because no transmission occurs during the dry season. The generation of new variants in polymorphic genes such as the merozoite surface protein 1 (msp1) [6] or multigene families has largely been attributed to meiotic rather than mitotic recombination. However, our data indicate that ectopic recombination and rearrangement during asexual growth is common between members of antigenic gene families located at the telomeres and could extend to other genes bearing repeated sequence motifs.
An interesting feature of var genes is that only one of the several dozen copies is transcribed in any given parasite and transcriptional switching between members occurs [40,41]. Because humans develop antibodies against this surface-exposed protein, the parasite is believed to use this mutually exclusive expression mechanism to evade the immune response and persist in its human host. In vitro switching between members of the var gene family has been estimated to occur at a rate of 2% of the parasites per generation in clones derived from IT 4/25/5 [42,43]. Although this rate is higher than the rate of mitotic change detected here, it may be that genetic change contributes or at least confounds the analysis of var gene switching. For example, recombination events and deletions in the subtelomeric regions might interfere with the tethering of chromosome termini to the nuclear periphery [5,34] or the recruitment of proteins initiating the spread of chromatin, thereby repressing var gene transcription (telomere positioning effect) [34,44,45]. Differences in average TRL between different strains of P. falciparum were reported earlier [34,35]. We observed alterations in the average TRL of individual clones as well as fluctuation of the average TRL over time, suggesting that the telomere ends are highly dynamic. Evidence that the telomere length could have an influence on the expression of nearby genes was found in Trypanosomes, where the telomere adjacent to the active expression site of VSGs was truncated [46]. All of these events could eventually result in a switch of var gene expression. In addition, the changes could further contribute to an evolutionary arms race whereby the host immune response and parasite antigens evolve in parallel to constantly outcompete each other.
The acquisition of new SNVs (most likely due to polymerase errors) and the recombination of the telomeres during mitosis provide the parasite with two mechanisms that generate an enormous amount of genetic diversity. We used a Poisson model to estimate the mutation rate from the number of SNVs accumulated over time in continuous culture (1.7610 29 for 3D7; 4.6610 29 for the ATQ-resistant clones). Our use of the dN/dS-corrected mutation rate from continuous culture provides an estimate that is unaffected by the population bottlenecks caused by dilutions (Materials and Methods). Given that a single human has 10 13 red blood cells, by the time parasites are visible by thin smear (1% parasitemia) more than 85% of the parasites will have acquired at least one novel genetic change, of which over 53% will be in the repetitive regions of the genome. In chronic infections (more than 30 days), most parasites will differ from one another assuming there is no selection for particular variants. This is supported by a PCR product length analysis of var genes in line E5, a clone of a P. falciparum patient isolate, NF54 that is also the parental strain of the 3D7 clone. E5 shows at least 13 var gene differences to 3D7 [33]. Although we did not detect any differences in major antigens such as merozoite surface protein 1 or the circumsporozoite protein, these genes do bear repeat regions and given that they show high levels of genetic diversity in populations [47], they may also have higher rates of mitotic recombination. These high recombination rates may partially explain why people can be continually re-infected with closely related strains.
While it is accepted in the field that most malaria infections are polyclonal, the actual diversity is still vastly underestimated [9,10]. The WHO advises the use of only three markers (merozoite surface proteins 1 and 2, and glutamine rich protein) to establish the MOI in clinical trials on antimalarial drug efficacy [48]. All of the additional genetic changes throughout the parasite genome go unnoticed. Many of these genetic changes might not have an immediate selection advantage to the parasite. However, these unnoticed mutations might become important when a new drug with a new mode of action is introduced and a selection advantage is suddenly introduced. Our results provide the baseline information needed to design diagnostic studies, whole-genome population genetic studies or drug treatment studies.

Materials and Methods
Parasites origin and propagation, DNA preparation, and EC 50 determination 3D7 (MRA-151) was obtained from the Malaria Research and Reference Reagent Resource Center (MR4; American Type Culture Collection deposited by D. Walliker) and cloned in our laboratory at The Scripps Research Institute (TSRI), La Jolla [49]. The San Diego 3D7 clone was derived from the same 3D7 stock as the parental 3D7 propagated at TSRI but was cultured long term at the Genomics Institute of the Novartis Research Foundation before cloning. The St Louis 3D7 clone was obtained from Dan Goldberg and is likely related to MRA-102 (Bei Resources, Plasmodium falciparum 3D7, deposited by D. J. Carucci and obtained from D. Walliker) [49,50] and was used in the genome-sequencing project [51]. The New York strain from David Fidock, Columbia University, was obtained directly from David Walliker. Parasites were propagated in human erythrocytes depleted of white blood cells by filtration using leukocyte filters (Fenwal Inc., Lake Zurich) with medium containing 5% human serum and 1.2% GIBCO AlbuMAX II (Invitrogen, Grand Island) [52]. Parasites were split 1:20 when they reached parasitemias .8%. To obtain single clones, each line was cloned by limiting dilution (0.25, 0.5, 1, 10 and 30 parasites per well) in 96-well plates and two clones from each line (three clones from the sensitive line) were chosen for further analysis. Clones R1a and b and R2a and b were kept in culture for another 118 (R2a and b) or 128 (R1a and b) days and subcloned again. The original clones were termed generation 1 (R1aG1, R1bG1, R2aG1, and R2bG1) and the offspring were termed generation 2 (R1aG2, R1bG2, R2aG2, and R2bG2). Genomic DNA was purified using a standard phenol-chloroform extraction method [53]. EC 50 assays were performed using a 384well format as previously described [54]. Each experiment was repeated three times. The averages of the EC 50 were calculated for each experiment and log transformed. A one-way ANOVA followed by a Dunnet post-test with the 3D7 parent as reference was performed in Graphpad Prism.

Microarray analysis and whole-genome sequencing
Microarray analysis was completed following the protocol in Dharia et al. [14]. In brief, fifteen micrograms of genomic DNA and 2.5 ng each of Bio B, Bio C, Bio D, and Cre Affymetrix control plasmids (Affymetrix Inc., Santa Clara) were fragmented with DNaseI and end-labeled with biotin. The samples were hybridized to the microarrays overnight at 45uC in Affymetrix buffers, washed, and then scanned using a modified protocol with wash temperatures of 23uC to account for the high AT content of P. falciparum. Briefly, for polymorphism detection we scanned for sets of three overlapping probes that had significantly lower hybridization in the sample when compared to the parental 3D7. Polymorphisms termed SFPs (single feature polymorphism) were detected by z-tests using an empirically derived standard deviation of the normal distribution and a p-value cutoff of 1610 28 .
Genomic DNA libraries were prepared for WGS using the standard Illumina protocol of fragmentation, end-polish, and adapter ligation (v. 2011, Illumina, Inc., San Diego). The final PCR enrichment step was conducted with the KAPA HiFi polymerase (KapaBiosystems, Inc., Boston). This enzyme has been shown to more accurately amplify DNA regions with extremely high AT content [55]. DNA libraries were clustered and run on an Illumina Genome Analyzer II, according to manufacturer's instructions. Base calls were made using CASAVA v1.8+ (Illumina, Inc., San Diego).

Microarray analysis
We have found that different amounts of RNA remaining after the DNA extraction process can give rise to hybridization differences in probes to noncoding RNA (ncRNA) genes. These changes are characterized by hybridization loss or gain over a wide range of probes throughout the gene. Altogether, 12.6% of the 1779 SFPs detected between the parental 3D7 and all clones mapped to ncRNA genes such as tRNAs (15), rRNAs (68), snRNAs (4) the signal recognition particle RNA (3), and MAL8P1.310, which showed similar hybridization patterns and was suspected of encoding an ncRNA. These SFPs were excluded from further analysis.
After excluding these ncRNA genes, we identified a total of 46 SFPs suggestive of SNVs that mapped to 23 different locations in the genome. Blemishes on the microarray surface or nonfunctional probes can give rise to false positives, especially for SNVs with only one or two probes. We assumed that a true SNV would not only be detected between hybridizations of the parent and the clone but also when compared to the other clones. Therefore all clones were also compared to one another to confirm or contradict the initial SFPs detected between the parent and the clone. In the end fifteen SNVs were considered real, of which WGS confirmed ten and capillary sequencing detected two false positives that were excluded from further analysis.

Whole-genome sequencing analysis
An internally developed WGS pipeline, dubbed PlaTypUS (M. Manary et al., in preparation), was used to align and analyze all WGS data. The program, compiled as a standalone executable, integrates many community-developed tools into its processing pipeline. In order to generate alignments, PlaTypUS follows a multi-step alignment process to generate and execute qualitycontrol measures on an alignment map. FASTQ files obtained from sequencing were aligned to the Pf3D7 reference (PlasmoDB v9.1) using BWA v0.6.1 with soft clipping of bases with quality score 2 and below [56]. Those reads that did not map to the reference genome were excluded from further analysis using SAMTOOLS v0.1.18 [57]. PCR duplicates were next identified and removed using Picard v1.48 MarkDuplicates (http://picard.sourceforge.net). Aligned reads were then realigned around indels and areas of high entropy using GATK v1.4+ IndelRealigner. The base quality scores of realigned reads were then recalibrated using GATK Table-Recalibration [58]. After realignment and recalibration, the samples were considered ready for use in downstream analysis.
For SNV detection, the PlaTypUS uses a computer-learning algorithm to decide on the characteristics of a true SNV. WGS data for 10,500 known SNVs were obtained from PlasmoDB, and each of these positions was analyzed with regard to 26 metrics. In this way, the profile of a true SNV was discovered. The following criteria were found to be indicative of a true SNV, and were used to filter the set of raw SNVs from our samples. Genetic variants were identified with the GATK UnifiedGenotyper with a minimum base quality score threshold of 20 and subsequently filtered using GATK VariantFiltration with custom filters designed for Plasmodium spp [58,59]. SNVs were excluded if they failed one or more of the following filters: strand bias (p,0.00001, Fisher's exact test), minimum depth of coverage (,5 reads), variant quality (,100 phred scaled), likelihood estimate of genotype being correct (,5 phred scaled), mapping quality bias (p,0.10 using Mann-Whitney Rank Sum test), base quality bias (p,0.10 using Mann-Whitney Rank Sum test), variant quality as a function of depth (,5 per polymorphism), and percentage of non-uniquely mapped reads covering the variant (.10% of total depth with a minimum of 4). Variant annotations to which filters are applied are thoroughly explained on the GATK website (http://www. broadinstitute.org/gsa/). 19,532 raw variant calls across all seventeen clones were filtered down to 38 high quality SNVs. Gene annotations for high quality SNVs were generated using snpEff v.2.0+ (www.snpeff.sourceforge.net).
The PlaTypUS also integrates a novel CNV-calling algorithm, which combines Weierstrass convolution of depth of coverage data with Canny edge detection to identify the break points of copy number events. 18 possible large deletions in the subtelomeric regions were identified for follow-up.

Determination of mutation rates
We assumed that a number of lethal or deleterious mutations might arise during long-term culture that would not be detected, as the corresponding parasites would be lost; therefore, we tested for evidence of selection by calculating the dN/dS ratio and corrected our observed SNV ratio for this assumed loss. To do this we first calculated the codon potential ratio (the likelihood that a single nucleotide replacement results in a nonsynonymous mutation) from the list of all sequences from the 3D7 reference genome. We utilized the Phylogenetic Analysis by Maximum Likelihood (PAML) software suite [60] (yn00, default settings) to generate a nonsynonymous to synonymous codon potential ratio of 4.63. An uncorrected dN/dS ratio was generated by computing the total number of nonsynonymous SNVs over the length of all possible sites containing those SNVs (the exonic genome) and dividing by the total number of non-nonsynonymous SNVs over their entire domain (the length of the genome). This gave an uncorrected dN/dS ratio of 2.73, which corresponds to a true dN/ dS ratio of 0.59, indicating a deficit of nonsynonymous SNVs. In the absence of other information, the assumption was made that the actual ratio should approach 1 in the absence of deleterious mutations, and that only nonsynonymous mutations are deleterious.
We used a generalized linear model with a linear link function and Poisson distributed errors to model the mutation rates for the various lines with and without drug. We assumed that the data followed the model f M 1 :::M m X 1 ::: where M i is the number of mutations scaled by the dN/dS factor; where g i , is the number of generations in culture, l i , is a categorical variable indicating the parasite line, d i , is a categorical variable indicating the presence of drug treatment; m is the number of parasite lines; and h is the vector of coefficients to be determined by the model. We used this generalized linear regression to calculate the mutation rate per clone, using the number of SNVs and time to acquire them for each clone separately. We assumed that the mutation rate regression does not include a constant because there are no mutations at baseline. The expected number of mutations per experiment is thus h'X i~h1 g i zh 2 g i l i zh 3 g i d i zh 4 g i l i d i We applied a Poisson-distributed number of mutations with the assumption that each parasite line accumulated mutations with a constant probability per generation. The probability of a mutant occurring in each generation is approximately binomially distributed B(n,p) with n the number of nucleotides and p the mutation rate. After g generations the distribution of the number of mutations per daughter parasite is B(n,pg), assuming replication is independent and identically distributed. These replications are identically distributed because we assume that the accumulation of mutations neither quickens nor slows the mutation rate. By the Poisson theorem, we can approximate B(n,pg) as a Poisson distribution and thus use the regression model as above.
The reason that we can assume a constant mutation rate (i.e. a constant probability of SNVs occurring) is that we have corrected for the selection bias against deleterious mutations using the dN/ dS ratio. The dN/dS-corrected number of mutations is an unbiased estimator of the number of mutations that would have occurred in the absence of selection. The number of dilutions of parasite cultures during culture maintenance will not affect this quantity, because we assume that mutations are accumulating in both the remaining and disposed culture at the same constant rate. The number of dilutions would affect the mutation rate if we were calculating the rate using the fraction of mutants observed after culturing, because population bottlenecks may increase the variance of the fraction of mutants. Calculating the mutation rate from the proportion of mutants versus wild type organisms after parallel culturing is known as the 'mutation rate problem' [20] and the experiments used to calculate these rates are called fluctuation tests. However, we avoid these issues by calculating rates from continuous cultures after correcting for selection pressure [61,62].
To run these regressions we used the glm command (MATLAB R2012b, The Mathworks) and then divided by the number of base pairs in the P. falciparum genome (2.35610 7 ) to determine the mutation rates with and without drug.

Pulsed-field gel electrophoresis and Southern blot
To confirm the partial deletion of PFB0010w on chromosome 2 in clones R2a and b; G1 and 2, R3a and b and S1a and c, 8610 8 parasites of the 3D7 parent and the S1a clones were prepared, digested and run on a gel according to previous methods [63,64]. The chromosome 2 probe was PCR amplified from 3D7 gDNA (chr2_PFB0015cF2: 59-TACCAACATCGAAAAATACCAAAC-G-39 and chr2_PFB0015cR: 59-TGGCGGAGAGATTTGAT-GATATTG-39) and labeled with [ 32 P]-adCTP (3000Ci/mmol). The membrane was incubated with the probe overnight at 42uC in ECL gold hybridization buffer (GE Healthcare) and washed according to the manufacturer's instructions.
To determine the average TRL, mixed-stage parasites cultures of the parental 3D7, all first generation (G1) clones, second generation (G2) clones R1a and R1b, as well as a clonal Dd2 strain were lysed with saponin and the pellets were resuspended in agarose to generate plugs. Agarose plugs containing 8610 7 parasites were digested with AluI, DdeI, MboII, RsaI. The digested gDNA was run on an agarose gel and transferred to a membrane. The probe for the telomeres (59-GGGTTTAGGGTTTAGG-GTTTA-39) was labeled with [ 32 P]-cATP and hybridized in Church wash (40 mM NaPi pH 7.2, 1 mM EDTA pH 8, 1% SDS) overnight at 55uC and washed. The membranes were placed against a phosphor storage screen for 2 days and then scanned in a phosphor imager.
The average TRL was estimated by quantifying the signal intensities for each lane using QuantityOne 1-D analysis software (BioRad, Hercules). The area under the curve was then calculated using Prism (GraphPad Software, Inc., La Jolla) and the peak was reported.

Sanger sequencing
To confirm mutations predicted by microarray and WGS or to confirm accurate rejection of some mutations that did not make the cut off with the PlaTyPus software, 16 predicted SNVs were PCR-amplified with Phusion polymerase (Finnzymes Inc., Woburn) using genomic DNA in a 100 mL PCR reaction volume for 35-40 reaction cycles. Genomic DNA from the 3D7 parent, R1aG2, R2aG2, R3b, R4b and R5b was used as templates. In addition, three predicted gene conversion events were also analyzed by Sanger sequencing. All PCR products were sequenced directly (Retrogen, Inc., San Diego). The primer sequences and results are summarized in Table S1.

Cladogram analysis
A hierarchical lineage cladogram was constructed from the profile of detected mutations (38 SNVs, 18 deletions, and one duplication) for each clone. Due to the small total number of mutations and the essentially linear process in which they were acquired, an un-weighted pair group method with arithmetic mean was used to generate distances between nodes in the software program Mesquite [65], and a tree diagram was constructed from these calculated distances. Mesquite uses a heuristic algorithm to generate the tree of minimum complexity from the mutation data given as categorical changes (booleans), and then minimizes the number of total number of changes in each tree. Each tree was then re-rooted at the parental strain.

Reconstruction of recombination events
For each suspected recombination event, a de novo assembly of a contig spanning the deletion/recombination event was attempted, using the reads spanning 10,000 bp around the expected site of recombination from both the donor and recipient chromosomes. The PRICE Genome Assembly program (http://derisilab.ucsf. edu/software/price/index.html) was seeded with a single read from the region next to the predicted deletion, whose mate pair aligned to another chromosome, and then run for twenty cycles with otherwise default settings. The largest contig (range 1399 bp to 5405 bp) from each assembly was searched against the entire Plasmodium genome using BLAST and was then aligned using ClustalW2 [66] to the two chromosomal regions with the highest identity score and trimmed. Figure S1 Telomere remodeling adds to the natural genetic plasticity of P. falciparum. A. Mixed-stage parasite cultures of the parental 3D7, all first generation (G1) clones, second generation (G2) clones R1a and R1b, as well as a clonal Dd2 strain were lysed with saponin and the pellets were resuspended in agarose to generate plugs. Agarose plugs containing 8610 7 parasites were digested with restriction enzymes that cut frequently throughout the Plasmodium genome except at the telomeres. The digested gDNA was run on an agarose gel and transferred to a membrane. The telomere repeat length (TRLs) were identified using P. falciparum-specific radioactively labeled telomere probes. B. The signal intensity for each lane was quantified using QuantityOne. The optical densities (OD) for each position (y-axis in kb) in a lane are plotted on the x-axis. The corresponding average TRLs were calculated for each clone and are indicated in parentheses after the clone's name. The number (#) of deletions of each clone is also indicated in parentheses. C. Relationship between the number of deletions and the average TRL of a clone. The linear regression is indicated. na, not applicable. (TIF) Figure S2 Subtelomeric chromosomal differences in independently cultured P. falciparum 3D7 clones. The hybridization patterns of clonal 3D7 genomes from different laboratories (Washington University, St Louis [67]; Columbia University, New York [68] and Genomics Institute of the Novartis Research Foundation, San Diego [18]) were compared to our 3D7 (La Jolla) clone's genome pattern. The y axis shows the log2 ratio of hybridization probe intensities for each strain relative to the 3D7 clone used in these experiments, calculated using probes that are unique in the genome and are colored by the moving average over a 500-base pair window as indicated in the color bar. The left panel shows the hybridization pattern for strains without major genetic changes and the right panel, the pattern at the same chromosomal location for a clone from a different laboratory containing a deletion. (TIF) Figure S3 Low mutation and deletion rate in Dd2 parasites. A. Schematic of drug selection for Dd2 clones. Starting with a single Dd2 parent, resistant parasite lines were established for two different spiroindolones (NITD609 and NITD678) in triplicate while two lines were cultured in parallel without drug pressure (ctr). B. Clones 678_1 and 678_2 acquired duplications around PFL0590c (labeled by a star), which encodes PfATP4, the putative target of spiroindolones. Indicated are the log2 ratios of the intensity of each unique probe of the different Dd2 clones relative to those of the Dd2 parent. The probe log ratios were colored by the moving average over a 500-base pair window. C. Chromosomal locations of mutations detected by microarray and WGS (circles) and small deletions (stars). (TIF)