Rapid antigen diversification through mitotic recombination in the human malaria parasite Plasmodium falciparum

Malaria parasites possess the remarkable ability to maintain chronic infections that fail to elicit a protective immune response, characteristics that have stymied vaccine development and cause people living in endemic regions to remain at risk of malaria despite previous exposure to the disease. These traits stem from the tremendous antigenic diversity displayed by parasites circulating in the field. For Plasmodium falciparum, the most virulent of the human malaria parasites, this diversity is exemplified by the variant gene family called var, which encodes the major surface antigen displayed on infected red blood cells (RBCs). This gene family exhibits virtually limitless diversity when var gene repertoires from different parasite isolates are compared. Previous studies indicated that this remarkable genome plasticity results from extensive ectopic recombination between var genes during mitotic replication; however, the molecular mechanisms that direct this process to antigen-encoding loci while the rest of the genome remains relatively stable were not determined. Using targeted DNA double-strand breaks (DSBs) and long-read whole-genome sequencing, we show that a single break within an antigen-encoding region of the genome can result in a cascade of recombination events leading to the generation of multiple chimeric var genes, a process that can greatly accelerate the generation of diversity within this family. We also found that recombinations did not occur randomly, but rather high-probability, specific recombination products were observed repeatedly. These results provide a molecular basis for previously described structured rearrangements that drive diversification of this highly polymorphic gene family.


Introduction
Despite recent progress, malaria remains an infectious disease that inflicts a tremendous health and economic burden on many regions of the developing world, in particular sub-Saharan Africa [1]. The lack of a truly efficacious vaccine and the ever-present threat of drug resistance represent continuing challenges for public health organizations. Malaria is caused by eukaryotic parasites of the genus Plasmodium, with P. falciparum being the most virulent of the human infectious species. One unique aspect of P. falciparum biology is the tremendous genome plasticity displayed by parasites that are actively circulating within regions of the world endemic for the disease [2]. Much of this diversity is found within the subtelomeric regions of the 14 chromosomes where genes that encode variant surface antigens reside [3]. These chromosomal regions can extend for over 100 kb from the chromosome end and display a unique structure that includes telomere-associated repeat elements (TAREs) and members of several hypervariable gene families, including var, rifin (repetitive interspersed family), stevor (subtelomeric variant open reading frame), and Pfmc-2TM (P. falciparum Maurer's cleft-2 transmembrane domain protein). Each of these families contain multiple members that display great sequence diversity, both within a single parasite's genome and also when comparing the genomes of different geographical isolates [2,[4][5][6]. The best characterized of these gene families is var, with the total number of var genes within the genome of any given parasite varying between approximately 50 and 90 [3].
var genes encode the primary virulence determinant displayed on the surface of infected red blood cells (RBCs), a protein referred to as P. falciparum erythrocyte protein 1 (PfEMP1) [7][8][9]. PfEMP1 is a single-transmembrane-domain-containing protein that crosses the RBC membrane, extending into the extracellular space, where it interacts with molecules found on the endothelial surface of postcapillary venules. The structure of PfEMP1 includes a series of adhesive domains within the extracellular portion of the protein enabling it to anchor the infected cell to the endothelium, thereby removing it from systemic circulation and preventing its movement through the spleen, leading to many of the severe complications of malaria, including cerebral malaria and pregnancy-associated malaria [10]. In both cases, adhesion within specific organs results from the display on the RBC surface of a form of PfEMP1 that binds to endothelial receptors found in these specific tissues.
The placement of PfEMP1 on the infected RBC surface makes it susceptible to the humoral immune response of the infected individual, and people carrying P. falciparum parasites readily develop anti-PfEMP1 antibodies [11,12]. This results in rapid clearance of parasitized cells and a substantial reduction in parasitemia. However, parasites are capable of switching which var gene is expressed, thereby displaying a different form of PfEMP1 that is not recognized by the preceding antibody response. This enables the generation of a new wave of parasitemia and promotes chronic infections that can last over a year. This process is referred to as antigenic variation and underlies many of the difficulties in creating effective vaccines against malaria [13].
Antigenic variation is entirely dependent on extensive sequence diversity within the genes encoding antigenic proteins, both within the genome of an individual parasite, which enables the maintenance of a chronic infection, and also between the gene repertoires of different parasite lines, thereby enabling multiple, sequential infections of a single host. This extensive diversity between parasite isolates explains why long-term immunity against malaria requires multiple infections over several years and provides protection against severe disease rather than preventing new infection [14,15]. Comparison of multiple complete genome assemblies from different geographical isolates indicates that the subtelomeric regions containing the multicopy, variant-antigen-encoding genes undergo much more rapid diversification than the rest of the "core" genome, in which single-copy housekeeping genes reside [3]. Two elegant studies employing the generation of extensive "clone trees" of laboratory-reared parasite lines showed that this process includes frequent and extensive ectopic recombination within chromosomal regions containing variant-antigen-encoding genes during asexual, mitotic replication [16,17]. This results in the creation of chimeric genes that encode mosaic sequences but that maintain open reading frames and basic protein domain structures. These observations provided key insights into how parasites continuously generate extensive sequence diversity and avoid the immune response of their human hosts, including a prominent role for homologous recombination (HR) between var genes. However, how this process is regulated and why subtelomeric regions are hyper-recombinogenic was not understood.
To examine the mechanisms of DNA repair in subtelomeric regions and var gene recombination more closely, we investigated the parasite's response to targeted DNA double-strand breaks (DSBs) within a subtelomeric region of chromosome 12. When provided with a highly homologous template for repair by HR, these breaks were efficiently repaired without major alterations to the structure of the genome. However, when a suitable sequence was not available to serve as template for repair by HR, parasites frequently employed de novo telomere addition or "telomere healing" to stabilize the chromosome end and thus maintain genome integrity. While this process efficiently prevents further degradation of the chromosome and protects the core genome, it also generates a "free" DNA fragment consisting of the subtelomeric domain between the site of the break and the original chromosome end. We observed that this free DNA fragment is highly recombinogenic, resulting in a cascade of ectopic HR events that generated chimeric var genes on multiple chromosomes. Thus, a single DNA break within a subtelomeric region can lead to the creation of multiple new var genes, providing a driving force for diversification of this gene family. The role of telomere healing in creating the hyper-recombinogenic, free DNA fragment also provides a molecular basis for why these regions of the genome display much more rapid diversification than the more central regions of the chromosomes. This type of combinatorial recombination at chromosome ends has parallels in other eukaryotic organisms, including cancer cells, suggesting it is based on evolutionarily conserved mechanisms of DNA repair found throughout the eukaryotic lineage. The discovery of its role in accelerating var gene recombination indicates that it might represent a common mechanism for antigen diversification in a number of pathogens that have a similar arrangement of antigen-encoding genes near their chromosome ends.

Repair of DNA DSBs by HR within a subtelomeric region of chromosome 12
In a previously published study, we utilized several rounds of X-ray irradiation to generate random DNA DSBs within the genome of a recently cloned, cultured line of the NF54 isolate of P. falciparum [18]. This allowed us to observe the products of DNA DSB repair generated by the parasite in response to this damage. We found that within subtelomeric chromosomal regions, parasites used a combination of telomere healing and repair by HR to preserve genome integrity. We also observed the creation of a chimeric var gene; however, given the inexact nature of generating random DNA breaks using repeated rounds of X-ray irradiation, it was not possible to precisely determine the sequence of events that led to the genomic changes that were ultimately defined. To examine the process of DSB repair within subtelomeric regions more accurately, in the current study, we generated precisely defined breaks using Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated protein-9 nuclease (CRISPR/Cas9) within a subtelomeric region at the "left" end of chromosome 12. This region was selected since it appears to be fully intact, containing a large complement of variant-antigen-encoding genes (3 var genes and 3 rifin genes) and an approximately 15-kb array of TAREs extending to the telomeric repeats. Specific guide RNAs were designed to target DSBs to sequences upstream of the coding regions of two of the var genes (Pf3D7_1200400 and Pf3D7_1200600) and a rifin gene (Pf3D7_1200500) within this subtelomeric domain ( Fig 1A). In each case, in addition to the Cas9 expression construct, co-transfection of an episomal plasmid provided a suitable template for repair by HR as well as the guide RNA expression cassette. This repair template included DNA sequences identical to sequences flanking the site of the break and a drug selectable cassette that enabled us to detect repair by HR (Fig 1B).
Each construct was transfected into cultured parasites and several individual subclones obtained. Successful break and repair was determined by polymerase chain reaction (PCR) amplification across the site of the break, followed by sequencing of the amplified product. In all cases, DNA repair appeared to have occurred by HR, yielding the precise products predicted from crossover events within the sequences of shared identity between the plasmid template and the chromosome (Fig 1D and 1E). These repair events are typical of those previously observed in numerous studies that have employed CRISPR/Cas9 for genome editing and reflect the apparent tendency of malaria parasites to almost exclusively use HR for DSB repair [19,20]. Similar HR-mediated recombination was observed for a DSB targeted to Pf3D7_0617400, a var gene within an internal cluster on chromosome 6 ( Fig 1D and 1E). Thus, DSBs within var-gene-containing regions of the genome are readily repaired by HR with an apparent efficiency similar to other regions of the genome. The genome of one of the clones in which the DSB was targeted upstream of Pf3D7_1200600 was also fully sequenced using single-molecule, long-read sequencing technology and displayed no detectable alterations apart from the product of repair at the site of the Cas9-targeted break (all genome sequence data are available at the National Library of Medicine BioProject database, accession PRJNA515738, sample VAR2CAL). This indicates that simply inducing DNA repair by HR within a subtelomeric region does not result in any unusual repair products or cause instability throughout the rest of the genome.

Telomere healing can occur at various positions with respect to a DSB to stabilize a broken chromosome end
During asexual replication in RBCs, parasites are haploid and thus only possess an exact homolog for DSB repair by HR during the relatively brief time after DNA replication but prior to nuclear division. During the remainder of the replicative cycle, no direct homologs for repair are available, and therefore DSB repair typically occurs either through recombination with regions of sequence similarity elsewhere in the genome or, when breaks occur near the ends of chromosomes, through telomere healing. Our previous study on the repair of random breaks generated by exposure to X-rays detected frequent telomere healing events within subtelomeric regions [18]. Analyses of the repair products were consistent with an established model for telomere healing in which after detection of a DSB, an exonuclease resects the chromosome end until a 3 0 single-stranded sequence is revealed that has homology to telomerase RNA. This allows priming by the RNA and initiation of telomere repeat addition by telomerase, resulting in the addition of a telomere and completion of the healing event (Fig 2A). However, given the random nature of breaks caused by X-rays, it was not possible to know the site of the DSB, and thus the process of resection leading to telomere addition could only be inferred. In addition, it was not possible to determine whether there are preferred sites of telomere addition or whether this process can occur at various positions within a stretch of the chromosome.
To more closely examine how telomere healing occurs in P. falciparum, we used the CRISPR/Cas9 system described above to target a DSB to the noncoding sequence between var gene Pf3D7_1200600 and the adjacent acs7 gene (Pf3D7_1200700) (Fig 2A). This site sits near  Rapid var gene recombination the boundary between the subtelomeric region and the core of chromosome 12 and is unique within the genome, thus for HR to be used for repair, we anticipated that a template must be provided. Consistent with this hypothesis, when provided with a template for repair, a break at this site is efficiently repaired by HR (Fig 1). In contrast, when we provided no template for repair, we only detected telomere healing events. Specifically, 9 clones isolated from the transfected population were analyzed by PCR using primers to amplify sequences on both sides of the DSB (Fig 2B). Loss of DNA segments as a result of DNA resection prior to the addition of telomere repeats is a hallmark of telomere healing. These reactions allowed us to map the extent of resection in each clone. Precise mapping of the site of telomere repeat addition was then performed by PCR using a primer specific for the conserved telomere repeat sequence and a primer near the mapped deletion followed by sequencing of the amplified product. From these 9 clones, three different positions of telomere healing were identified as exemplified by clones D2, B10, and H10 (Fig 2C), indicating that during resection, telomerase can initiate the addition of telomere repeats at several different positions within this region of the chromosome. The sequences at the site of telomere repeat addition shared no obvious similarity to each other, therefore it was not possible to derive any conclusions for why healing events occur in any particular position. However, previous work from several groups have identified putative secondary structures, including hairpins and G-quadruplexes, that can influence DNA stability and occur frequently within subtelomeric regions in P. falciparum [21][22][23][24]. It is possible that the existence of such structures could affect the rate of DNA resection or telomerase recruitment, thereby influencing the site of healing even when not found directly at the position where the repeats are added.

Free chromosome fragments generated by telomere healing are highly recombinogenic
Telomere healing is an efficient mechanism for stabilizing a chromosome when a DSB occurs near its terminus and thus is an effective way to maintain genome integrity. However, the chromosome fragment between the site of the break and the original telomere repeats is presumed lost as a result of the healing event (Fig 2A). The typical subtelomeric structure, including variant-antigen-encoding genes and TAREs, can be regenerated through subsequent telomere conversion events; however, the loss of the original subtelomeric region, including any resident variant-antigen-encoding genes, would represent a reduction in overall sequence diversity. Thus, repair of chromosome breaks by telomere healing would lead to homogenization rather than diversification of the parasite's variant antigen repertoire. Alternatively, given that broken DNA ends can be highly recombinogenic, the released DNA fragment could serve as a template for additional recombination events at other sites in the genome. If so, telomere healing could instead act as a driving force for the diversification of genes unlinked to the site of the original DSB. To investigate this possibility, we aimed to determine the fate of the free chromosome fragment between a site of telomere healing and the original chromosome end. The free chromosome fragment liberated by the healing events between var gene Pf3D7_1200600 and acs7 on chromosome 12 described above (Fig 2A) is ideal for this analysis because of its length and the number of genes within this region, thus providing numerous potential sequences for recombination.
As a preliminary assessment of the fate of this chromosome fragment, we used genomic DNA from the 3 clonal lines that displayed independent telomere healing events and performed PCR with primers specific to each gene within this region (Fig 3A). If this chromosome fragment had been lost during the telomere healing event, all the reactions should have failed to amplify a product. However, we instead observed consistent amplification of several genomic segments (Fig 3B), suggesting that these sequences persisted within the genomes of these parasite lines. The var gene Pf3D7_1200400 appeared to have been divided, with its 5 0 end still detectable by PCR while the 3 0 end was apparently lost. Sequencing of the PCR products verified their identity and ruled out the possibility of nonspecific amplification. However, given that our previous experiments indicated that this end of chromosome 12 was truncated, it was not clear where in the genome these gene segments now resided.
To more precisely identify changes in the genomes of these 3 parasite lines, we employed single-molecule, long-read whole-genome sequencing technologies (sequence data available at the National Library of Medicine BioProject database, accession PRJNA515738, samples D2, B10, and H10). These technologies generate individual reads of greater than 10 kb, thereby enabling us to easily identify structural changes in the genome, including recombinations, deletions, duplications, and gene conversion events. For each of the lines analyzed, the genome assembly precisely detected the telomere healing events previously mapped by PCR without ambiguity, providing independent verification of these events. We then examined the entire genome assemblies to determine the fate of the other sequences within the original subtelomeric region adjacent to the site of the break.
Despite having distinct sites of telomere healing, the patterns of PCR amplifications were identical in the 3 lines analyzed (Fig 3B), suggesting that the released DNA fragment had undergone similar or identical recombination events in each of these clones. In these lines, searches of the genome assemblies for the gene fragments that were detectable by PCR but were predicted to have been lost as a result of telomere healing identified two new, chimeric var genes that appeared to be the result of recombination events. In each case, we detected a newly generated chimeric var gene comprised of Pf3D7_1200400 (5 0 end) and Pf3D7_0632500 (3 0 end) as well as a second chimera of Pf3D7_0632800 (5 0 end) and Pf3D7_1300100 (3 0 end) (Fig 4). Both of these recombination events created chimeric var genes with intact open reading frames and thus are presumably functional genes. Gene-specific PCR across the sites of recombination followed by sequencing of the PCR products verified the existence of these chimeric var genes.

A sequential cascade of recombination events is predicted to result in the creation of chimeric var genes
In the examination of the genome sequences from the recombinant parasite lines, not only did we detect recombination events involving genes near the site of the targeted DSB, we also found that recombination had occurred between genes on different chromosomes, completely unlinked to the site of the break. For example, we detected a chimeric var gene resulting from a recombination between Pf3D7_0632800 on chromosome 6 and Pf3D7_1300100 on chromosome 13. How does repair of a DSB on chromosome 12 lead to this distant recombination event? By considering the different chimeric sequences and presuming that HR is the dominant mechanism of repair, we can deduce the likely sequence of events that resulted in the generation of the chimeric var genes in these lines.
Both targeted PCR and whole-genome sequencing indicate that the initial CRISPR/Cas9 induced DSB was repaired by a telomere healing event between var gene Pf3D7_1200600 and acs7. This stabilized the end of chromosome 12 but liberated a free DNA fragment that included 3 var genes and 3 rifin genes (Fig 5A). This DNA fragment then underwent a recombination between one of these var genes (Pf3D7_1200400) and a var gene on chromosome 6 (Pf3D7_0632500), creating the observed Pf3D7_1200400/Pf3D7_0632500 chimera and leading to the transposition of the original end of chromosome 12 into chromosome 6 ( Fig 5B). The genome assemblies place the var gene Pf3D7_1200100 and rifin genes Pf3D7_1200200 and Pf3D7_1200300 on chromosome 6, verifying this event. While this event stabilized the original free DNA fragment, it also liberated the end of chromosome 6, thus generating a new free DNA fragment that includes a var gene (Pf3D7_0632800) and 2 rifin genes (Pf3D7_0632600 and Pf3D7_0632700). This DNA fragment underwent a recombination event between Pf3D7_0632800 and a var gene on chromosome 13 (Pf3D7_1300100), creating the observed chimera and replacing the end of 13 ( Fig 5C). The free DNA fragment generated by this event consists primarily of the highly repetitive TAREs and telomere repeats that are found at most chromosome ends; therefore, it is not possible to determine whether it was subject to additional recombination events.

Recombination events between var genes are not random
Previous analysis of ectopic recombination between var genes indicated that these events are not random but rather display a structured pattern [3]. We similarly observed that certain recombination events appear to be favored. For example, all 3 clonal lines analyzed displayed identical recombination patterns, suggesting that these events are favored when a DSB occurs within this region of the genome. The sites of telomere healing varied in these clones (Fig 2B), indicating that these lines are derived from truly independent initial break and repair events. To confirm this conclusion, we introduced a DSB into the same chromosomal region using the CRISPR/Cas9 construct shown in Fig 2 in an alternative subclone of NF54 called DC-J [25]. Analysis of the resulting parasite population by both PCR and genome sequencing identified the same recombination cascade described in Fig 5 (sequence data available at National Library of Medicine BioProject database, accession PRJNA515738, sample RIF-FLP-DCJ).
Examination of the sites of recombination that resulted in the chimeric var genes identified regions of approximately 95% sequence identity of approximately 50 bp (Fig 4), consistent with the now well-established conclusion that HR is the dominant repair pathway used by these parasites. Thus, it seems likely that DSBs introduced into other regions of the genome would result in alternative recombination cascades.  Given the dominant role of HR in repairing DSBs in malaria parasites and its dependence on stretches of sequence identity for recombination, it is possible that the cascade shown in Fig 5 is the only sequence of events that can result from a DSB at this position of chromosome 12. To determine whether alternative, low-probability recombinations might have also occurred, we isolated genomic DNA from an uncloned population of transfected parasites in which a DSB was targeted to the subtelomeric region of chromosome 12 and subjected it to long-read, whole-genome sequencing. Analysis of the genome sequences from this population identified the expected telomeric healing event and the same recombination cascade observed in D2, B10, and H10; however, we also detected an additional chimeric var gene distant from the site of the DSB, a chimera of var genes Pf3D7_1300100 and Pf3D7_0900100 (sequence data available at the National Library of Medicine BioProject database, accession PRJNA 515738, sample 700CAL-WT). We did not detect this chimeric sequence from genome assemblies obtained from parasites in which we did not induce a DSB, suggesting that it similarly is a product of recombination events induced by the initial Cas9-targeted DSB. However, we were unable to identify additional events linking the creation of this chimeric gene to the original break, and thus it is possible that this chimeric var gene was generated independently.

Evidence for recombination cascades within other chromosomal regions
Our discovery of a sequential cascade of recombination leading to the creation of chimeric var genes resulted from the generation of targeted DSBs within a subtelomeric region of chromosome 12. This recombination cascade appears to be highly reproducible, leading us to question how common similar events are at other regions of the genome and, perhaps more importantly, whether such recombinatorial cascades can be detected in naturally replicating parasites. Upon re-examination of the telomere healing events we previously described [18], we find evidence for a similar recombination that was likely initiated by the release of a free DNA fragment during repair of the chromosome end. Specifically, a telomere healing event resulted in the truncation of approximately 90 kb from the end of chromosome 1. However, rather than being lost, a 13,886 bp fragment of the truncated region was found integrated into a subtelomeric region of chromosome 12 (Fig 6A and 6B), indicative of the type of recombinations linked to telomere healing described in Fig 5. This event did not involve var genes, suggesting that this mechanism is not exclusive to var genes in particular or coding regions in general and instead simply reflects the DNA repair mechanisms that are active in the parasites.
The event described above likely resulted from a DSB induced by exposure to X-ray irradiation. We were curious if similar recombination events could result from spontaneous breaks that occur during normal parasite replication. Several recent papers have described the analysis of clonal parasite lines isolated after extensive growth in culture. Some of these lines displayed multiple recombination events resulting in the creation of several chimeric var genes [17,23,26], suggesting the possibility of recombination cascades. No detailed mechanism was identified for the origins of the chimeric genes in these clones; however, Claessens and colleagues [23] noted that they tended to observe the occurrence of ectopic recombination events in clusters; specifically, that a translocation between 2 subtelomeric var genes highly increased the chance of detecting additional recombination events nearby. Given that such clustered recombination events could result from the type of recombination cascade we observed on chromosome 12, we re-examined the detailed data set provided by Claessens and colleagues to determine whether any of the recombination patterns they reported are consistent with a cascade mechanism. While several clones displayed multiple recombination events that could have occurred sequentially, 3 examples are of particular interest and shown schematically in Fig 6C-6E. Each shows a two-step cascade involving 3 var genes and leading to the creation of  [18] identified a telomere healing event on chromosome 1 (A) that resulted in the translocation of the released subtelomeric fragment into a position near the end of chromosome 12 (B). Spontaneous sequential recombination events were also detected in subcloned parasite lines isolated by Claessens and colleagues [23] (C-D). Whole-genome sequencing of subcloned lines of 3D7 parasites identified multiple recombination events within var genes. For 3 subclones (named BLM-4k, WRN-1h, and BLM-P) closely 2 var gene chimeras. In each case, the central var gene within the recombination cascade has undergone two closely spaced recombinations, with the positions of the recombinations providing the likely order of events as the cascade moves toward the chromosome end. The genome sequence data sets for these clones were generated from 100-bp paired-end reads, from which it is difficult to assemble the highly repetitive sequences near and within telomeres; thus, telomere healing events would have been difficult to detect with confidence. It is therefore not possible to know if the putative recombination cascades shown in Fig 6C-6E were initiated by telomere healing; nonetheless, a sequential cascade of recombinations similar to what we describe in Fig 5 seems highly likely. The recombination events detected in these clones did not involve any of the var genes that underwent recombination after we induced a DSB on chromosome 12, suggesting that this recombinatorial mechanism can likely occur at any subtelomeric region of the parasite's genome.

Discussion
The data presented here describe how a single DSB within the subtelomeric regions of P. falciparum chromosomes can stimulate a sequence of recombination events between var genes, thus accelerating the generation of diversity within this gene family. Two key aspects of Plasmodium DNA repair contribute to this phenomenon: 1) near-complete dependence on HR to repair DSBs and 2) a propensity to use telomere healing when breaks occur with subtelomeric regions. The dependence on HR results from the parasite's inability to use classical nonhomologous end joining (NHEJ) to repair DSBs because of the evolutionary loss of the enzymes required for this pathway [27]. This ensures that all DSB repair events will involve recombination between similar sequences, and thus when breaks occur near or within var genes, chimeric var genes will be the dominant products of repair. Such chimeric genes are likely to maintain intact open reading frames as well as the basic domain structures of the encoded proteins, characteristics observed previously during the generation of chimeric var genes [16,17,21].
The evolutionary loss of the NHEJ pathway resulting in near-complete dependence on HR for DSB repair was initially puzzling [28], particularly given that these parasites exist primarily as haploid organisms and thus generally do not possess a template for repair. However, extensive use of NHEJ to repair DSBs within subtelomeric regions would not only reduce the frequency of the generation of chimeric var genes, it would also lead to the creation of pseudogenes through the frequent introduction of frame shifts and nonsense mutations. Thus, the loss of the NHEJ pathway ensures the use of HR when breaks occur near var genes, thereby driving the diversification of this gene family. Given the importance of antigenic variation in the lifestyle of malaria parasites, this likely provided the selective advantage that led to the evolutionary loss of NHEJ. This is consistent with the dominance of NHEJ for DSB repair in the closely related parasite Toxoplasma gondii [29,30], which does not undergo antigenic variation during infection and thus would not benefit from frequent ectopic recombination events.
The role of telomere healing in driving var gene diversification results from its generation of highly recombinogenic free chromosome fragments, the first step in a sequential recombination cascade that can ultimately result in the creation of multiple chimeric var genes. spaced recombination events were suggestive of sequential cascades of recombination leading to multiple chimeric var genes. The putative order of events was inferred based on the position of the recombinations with reference to the central var gene within each cascade (pink). The initial recombination event for each clone occurred between 2 var genes near the chromosome ends (C), creating a chimeric var gene and releasing a free DNA fragment (D). Recombination of the free DNA fragment with a var gene at another chromosome site resulted in the creation of an additional chimeric var gene (E). Detection of these chimeric var genes was originally reported in supplemental data table S1 of reference [23]. Chrom, chromosome. https://doi.org/10.1371/journal.pbio.3000271.g006 Telomere healing both stabilizes the broken chromosome end and generates a free chromosome fragment that can be resected to reveal sequences that share stretches of identity with subtelomeric regions on other chromosomes. Recombination between such regions not only results in chimeric sequences, it also generates a new free DNA chromosome fragment that can undergo an additional round of resection and recombination. Thus, telomere healing has the potential to initiate a sequential cascade of recombination events that can result in the creation of chimeric genes completely unlinked to the original DSB. Telomere healing is frequently observed in cultured lines of P. falciparum [18,[31][32][33], and evidence indicates that it is a common occurrence in the field as well [34][35][36][37], suggesting that it likely contributes significantly to the generation of sequence diversity in natural parasite populations.
In contrast, one of the first ectopic recombination events described that created a chimeric var gene involved the transposition of a subtelomeric region from one chromosome to another in the apparent absence of healing [38]. This recombination event led to the duplication of one subtelomeric region and the loss of another. It is possible that the displaced subtelomeric region could have initiated a recombination cascade similar to what we describe here; however, this was not investigated. Thus, it is possible that sequential recombination in the absence of telomere healing could also contribute to var gene diversification. Work in human cancer cells has similarly shown that displacement or transposition of chromosome fragments containing telomeres can result in cascades of recombination involving several chromosome ends [39]. In principle, any chromosome break or product of repair, including products of nonreplicative transposition or break-induced repair, that generates an unstable free DNA fragment can initiate additional recombination events throughout the genome [40,41]. These examples from cancer cell lines are consistent with the model we have proposed for cascades of var gene recombination in malaria parasites, suggesting that this process is evolutionarily conserved.
Left unaddressed by the work presented here is the source of the initial DSB that can lead to telomere healing and a cascade of recombination. It is possible that the subtelomeric regions have a rate of DSB formation that is similar to the rest of the genome but that such breaks are often lethal when they occur in the central chromosomal regions and thus are not observed. Alternatively, several studies have suggested that subtelomeric regions are uniquely susceptible to DSB formation, thus driving the process of recombination and generating diversity at an accelerated rate. These studies have described structural elements found within var genes and subtelomeric regions that could serve as a source of DSBs, including secondary structures [21] and G-quadruplexes [22,23]. The concept of hyper-recombinogenic subtelomeric regions is becoming appreciated as a major source of genetic variation in organisms ranging from other protozoans [42] to yeast [43] to plants [44], and it has been proposed that "mild telomere dysfunction" can serve as a significant source of genetic variation driving adaptive evolution [45]. Interestingly, for yeast, it was proposed that sources of telomere instability that favor repair by HR rather than NHEJ would be more likely to drive adaptive evolution [43], a concept consistent with the proposal that loss of classical NHEJ in malaria parasites provides a selective advantage by facilitating increased rates of subtelomeric recombination.
Previous work described structured rearrangements between var genes resulting from recombination events that were "error-free" and maintained the domain architecture of the encoded PfEMP1 [16,17]. Comparisons of var gene sequences and their chromosomal location also found that var genes preferentially recombine with other var genes located within the same chromosomal position and oriented in the same direction with respect to the chromosome end [46]. This recombinational preference is thought to have led to the divergence of type A, B, and C var genes [5,6]. Additional conditions-for example, the alignment of subtelomeric regions within clusters near the nuclear periphery [47], thus bringing specific subsets of var genes into close proximity-might also influence potential recombination events, thereby leading to the structured nature of the recombinations that are observed. These influences appear to be significant because in our study, we observed identical recombination events occurring multiple times independently. Further, BLAST searches identified the stretch of sequence of homology surrounding the site of recombination shown in Fig 4B within var genes on chromosomes 6, 7, 8, 12, and 13; however, we only ever detected recombination between the genes on chromosomes 6 and 13, suggesting the influence of additional constraints.
The cumulative data provided by multiple studies therefore describe a system in which the loss of NHEJ combined with the concentration of variant-antigen-encoding genes within subtelomeric regions leads to the continuous generation of sequence diversity within these gene families. The reliance on HR in the absence of NHEJ limits the number of possible recombination events, thereby preserving the basic architecture of the encoded proteins. This system therefore achieves a balance between the generation of antigenic diversity with constraints on protein structure and function, enabling parasites to maintain chronic infections and efficient transmission between hosts with varying degrees of pre-existing immunity.

Ethics statement
This study uses no human or animal subjects. Human RBCs are used solely to propagate parasites in culture and are obtained from surplus supplies from a commercial source. Weill Cornell Medical College and the NIH have concluded that this does not constitute human subject research and therefore does not require IRB approval.
PCR conditions used throughout the study were initial denaturation at 95˚C 5 min, followed by 32 cycles of 95˚C for 15 seconds, 50˚C for 15 seconds, and 60˚C for 1 min.

Sequencing of parasite genomic DNA and detection of recombination events
Parasite genomic DNA from clones D2, B10, and H10 was extracted using phenol-chloroform extraction, and sequencing libraries were prepared from 1 μg of unsheared gDNA (as measured using the Qubit dsDNA BR assay [Life Technologies, Carlsbad, CA, USA]) using the Oxford Nanopore Technologies ligation sequencing kit version 1D LSK 108 (Oxford, UK). This approach minimizes potential shearing of DNA template fragments and entirely avoids PCR artifacts by ligation of a manufacturer-supplied adapter nucleoprotein complex to native gDNA to facilitate loading of long library molecules into the protein nanopores. Nanopore sequencing was conducted with an Oxford Nanopore Technologies GridION X5 instrument using MIN-106/R9.4.1 flowcells. All samples were sequenced using a standard 48 hr run controlled by Oxford Nanopore Technologies MinKnow software, and raw data were basecalled with Albacore version 2.2.4. FASTQ files were extracted from basecalled FAST5 files with Poretools [52]. Alignments against the reference 3D7 genome sequence (v9.0) were performed using NGLMR [53] and visualized with IGV (version 2.4.16) for manual inspection. The left arm of chromosome 12 served as the initial site of observation, and recombination break points or telomere healing events were identified using the NCBI Blastn tool. All recombination and telomere healing events identified by BLAST were validated by PCR, followed by next-generation sequencing.
Parasite genomic DNA from clones VAR2CAL, RIF-FLIP-DCJ, and 700CAL-WT was extracted from parasites using phenol-chloroform extraction prior to library preparation. 5 μg of gDNA per library was processed using the PacBio 20-kb HMW gDNA ligation-based SMRTbell template prep kit (Pacific Biosciences, Menlo Park, CA, USA) with BluePippin (Sage Science, Beverly, MA, USA) size selection. Genomic DNA was sheared and ligated to sequencing adapters. Libraries were sequenced using v3SMRT Cells, and P6-C5 chemistry using an RSII instrument. SMRT Analysis software (Pacific Biosciences) was used to extract filtered fastq files from raw bax.h5 files produced by the RSII instrument using the subread filtering module in SMRT Analysis. Recombination break points and telomere healing events were validated using the strategy described above.