High genome plasticity and frequent genetic exchange in Leishmania tropica isolates from Afghanistan, Iran and Syria

Background The kinetoplastid protozoan Leishmania tropica mainly causes cutaneous leishmaniasis in humans in the Middle East, and relapse or treatment failure after treatment are common in this area. L. tropica’s digenic life cycle includes distinct stages in the vector sandfly and the mammalian host. Sexual reproduction and genetic exchange appear to occur more frequently than in other Leishmania species. Understanding these processes is complicated by chromosome instability during cell division that yields aneuploidy, recombination and heterozygosity. This combination of rare recombination and aneuploid permits may reveal signs of hypothetical parasexual mating, where diploid cells fuse to form a transient tetraploid that undergoes chromosomal recombination and gradual chromosomal loss. Methodology/principal findings The genome-wide SNP diversity from 22 L. tropica isolates showed chromosome-specific runs of patchy heterozygosity and extensive chromosome copy number variation. All these isolates were collected during 2007–2017 in Sweden from patients infected in the Middle East and included isolates from a patient possessing two genetically distinct leishmaniasis infections three years apart with no evidence of re-infection. We found differing ancestries on the same chromosome (chr36) across multiple samples: matching the reference genome with few derived alleles, followed by blocks of heterozygous SNPs, and then by clusters of homozygous SNPs with specific recombination breakpoints at an inferred origin of replication. Other chromosomes had similar marked changes in heterozygosity at strand-switch regions separating polycistronic transcriptional units. Conclusion/significance These large-scale intra- and inter-chromosomal changes in diversity driven by recombination and aneuploidy suggest multiple mechanisms of cell reproduction and diversification in L. tropica, including mitotic, meiotic and parasexual processes. It underpins the need for more genomic surveillance of Leishmania, to detect emerging hybrids that could spread more widely and to better understand the association between genetic variation and treatment outcome. Furthering our understanding of Leishmania genome evolution and ancestry will aid better diagnostics and treatment for cutaneous leishmaniasis caused by L.tropica in the Middle East.


Introduction
Leishmaniasis is a vector-borne parasitic disease transmitted by sand flies. At least twenty Leishmania species are pathogenic to humans and can cause a spectrum of clinical manifestations, from chronic local ulcers to cutaneous leishmaniasis (CL), and to infection of internal organs in visceral leishmaniasis [1][2][3]. Host genetic variability, host immune responses, sand fly feeding behaviour, environmental factors and parasite species and strain variation all influence the clinical manifestation and the outcome of the infection [4]. Genome sequencing within Leishmania species has mostly shown a lack of a clear association between genetic differences and the clinical pathology [5].
L. tropica is a heterogeneous species complex with a broad geographical distribution across Africa and Eurasia [3,6,7] that causes mainly CL, though visceral leishmaniases has been reported [8,9]. L. tropica is typically an anthroponotic disease, though zoonotic transmission may be possible as well [10][11][12]. Although the burden of leishmaniasis is decreasing globally, regions with L. tropica have increasing rates due to local conflicts and associated population displacement, migration and relocation [3], which exacerbates incorrect CL diagnoses and inadequate access to appropriate healthcare and treatments. CL caused by L. tropica in the Middle East can be difficult to treat: lesions may relapse (called leishmaniasis recidivans) and multiple or prolonged treatments are often necessary [13,14].
Leishmania have specific Phlebotomine sandfly vector compatibilities that affect transmission [15,16]. In most geographic areas, L. tropica is transmitted by its most common vector, the female Phlebotomus sergenti sandfly [17]. Although other vectors like P. arabicus in northern Israel [18] and P. guggisbergi in Kenya [19] also are known. Phlebotomus spp. can be infected by different Leishmania spp., including P. perniciosus [20], P. tobbi by L. tropica and L. infantum [20,21] and P. guggisbergi by L. tropica and L. major [19]. Hybrids capable of transmitting have formed in all these vectors [22]. L. tropica's digenic life cycle has an extracellular promastigote stage where they occupy different regions of the alimentary tract of the vector, and subsequently as amastigotes within mammalian macrophages [23]. During the extracellular growth and development in the vector, Leishmania is capable of non-obligatory meiotic genetic exchange [24].
Evidence of hybridisation has been observed in diverse natural isolates of Leishmania [25-28] and in culture [29]. It is possible but still uncertain that reproduction in Leishmania could be facilitated by genes with high similarity to ones involved in meiosis [24] to allow crossingover, resulting in recombinant chromosomes [30]. As more Leishmania are genetically profiled, the precise mechanisms involved and frequencies of Leishmania meiotic or quasi-sexual events are beginning to become clearer. Classical meiosis like that of the related kinetoplastid Trypanosoma brucei [31] has not been observed in Leishmania-infected sand flies, but there is indirect evidence that it does occur [32]. Although the universal mosaic aneuploidy of Leishmania is inconsistent with classical meiosis, experimental backcrosses can produce hybrids in sand flies with varying degrees of genetic relatedness among parental lines [32]. Possible alternative mechanisms like parasexual reproduction have been considered for Leishmania based on observations in Saccharomyces [32], such as a tetraploid meiotic cycle in which diploid parental cells fuse followed by a meiosis, resulting in diploid F1 [32]. Another model is parasexual reproduction in which a fusion of parental cells is followed by karyogamy, re-shuffling of DNA regions, and gradual loss of chromosome copies during mitoses [24,29,30,33]. There is evidence for this in experimental L. major and L. infantum crosses [34] and Trypanosoma cruzi [31,35]. These mechanisms are not mutually exclusive, and different types of genetic exchange are possible [35,36].
Leishmania hybrids generated in vitro and in sand flies are mostly diploid at the time of culturing and contain approximately equal amounts of genetic material from both parental strains [32,37], suggesting meiosis-like processes. Triploid and tetraploid hybrid offspring have also been observed, which could indicate other mechanisms [24,29,37]. Genome sequencing of triploid hybrids indicated that a triploid F1 (3n) may be a product of fusion between a diploid cell that failed to undergo meiosis (2n) and a haploid cell (1n) [37]. Hybrid strains have been generated experimentally with different Leishmania spp. in sand flies, both within and between species [27,32,[38][39][40]. Natural hybrid isolates have been found in a range of geographical settings, suggesting abundant natural genetic exchange within and between different species that is important for the evolution of the parasite [25-27, [41][42][43] to enhance transmission potential and increase the fitness of the parasites [29,32,34,44]. Leishmania genomes are unstable and lack promoter-dependent gene regulation [45]. Together with post-transcriptional regulations, Leishmania exploits gene dosage by chromosome and gene copy number variation [46][47][48] which are driven by variable environments that maintain expression levels and genetic diversity [49]. Thus, inter-species hybrids may have facilitated the spread of Leishmania to new geographic regions through changes in vector specificity [16]. Although new interspecies hybrids are usually not able to carry out genetic exchange, intra-species hybrids could produce new progeny. The ability to produce hybrids in vitro and in vivo varies between species: it is lower in L. major [24,37] than L. tropica, where a higher frequency of hybrid formation in co-infected sand flies has been observed [32].
Leishmania possesses universal aneuploidy in natural isolates, laboratory strains, in experimental hybrids and during culture [32, [50][51][52]. This mosaic aneuploidy is primarily a result of mitotic asymmetric chromosome allotments [30]. This enhances the genomic variation through differences in gene copy numbers and chromosomal duplication events [47,53]. Aneuploidy may facilitate eliminating deleterious mutations during chromosome loss, and the persistence of an intra-strain genetic heterogeneity may be beneficial for the cell population [54].
In Leishmania, there may be only one single region per chromosome where DNA replication initiation occurs, detected during the S phase [55]. This may lead to incomplete duplication of the larger chromosomes during S phase and an inadequate duplication of the genome prior to cell division [55]. Leishmania employs sub-telomeric DNA replication beyond S phase, which may be less effective in maintaining genome integrity, and in this way contributes to variability and aneuploidy [56].
Gene expression in Leishmania is carried out through polycistronic transcriptional units (PTUs) [57] separated by strand-switch regions (SSRs) where RNA polymerase II can transcribe bidirectionally [58]. Genes are transcribed as multigene pre-mRNAs with a single constitutive transcription start site (TSS) [58,59] and are trans-spliced to form mature mRNAs. This means gene dosage is crucial [46,60] since gene regulation is mostly post-transcriptional [61,62], leading to both intra-and extrachromosomal genome-wide gene copy number variation and mosaic aneuploidy. L. tropica has high levels of allelic diversity and heterozygosity, consistent with frequent full genome-hybridization, most likely due to natural outcrossing [32,63].
Despite previous research in the field, much remains unclear regarding the diversity, evolution, and genetic exchange of L. tropica. In this study, we have investigated 22 isolates of L. tropica from 21 patients, focusing on the genetic diversity within and between the isolates and correlating the results to their geographic sources in Afghanistan, Iran and Syria. The study revealed large-scale intra-and inter-chromosomal changes in diversity driven by recombination and aneuploidy that enhance our understanding of Leishmania genome evolution and ancestry.

Ethics statement
Ethical approval was obtained from the Central Ethical Review Board in Stockholm (2015/ 2162-31). Informed consent was not found to be necessary, as the samples, included in the genome study, and the clinical data were anonymized, and only isolated parasites were studied.

Culturing, DNA extraction, library preparation and short read sequencing
The isolates were stored at -156˚C at the Public Health Agency of Sweden. Promastigotes were cultivated in RMPI 1640 medium with L-Glutamin, HEPES, Penicillin-Streptomycin and fetal calf serum at 23˚C [64]. All isolates were grown for the minimum time necessary to produce adequate parasite numbers to isolate enough genomic DNA (gDNA) for library preparation. A QIAamp Mini Kit (QIAGEN) was used to extract DNA, and Qubit dsDNA BR Assay Kit (Invitrogen, Thermo Fisher Scientific, USA) was used to measure DNA concentrations-all according to manufacturer's protocols. 200 ng gDNA was used for library construction using the Ion Xpress Plus Library Kit following the manufacturer's instruction. The libraries were quantified by an in-house qPCR, MGB (minor groove binder) assay [65]. For template preparation, gDNA libraries were pooled to a final concentration of 30 or 50 pM and clonally amplified using the Ion Torrent Chef system using the Ion 510&520&530 Kit-chef or Ion 540 Kit-chef and then loaded onto an Ion 530 or 540 chip, respectively. Massively parallel sequencing was then performed on an IonTorrent S5 XL instrument (Thermo Fisher Scientific, USA) according to the manufacturer's protocol. The template preparation and sequencing were repeated for each sample until the initial obtained sequencing data corresponded to more than >30x genome-wide coverage.

Read processing, mapping and SNP screening
Sequencing of the libraries resulted in an average of 8.73±4.40 (mean±SD) million single-end raw reads of up to 557 bp in length, with at least 4.14 million raw reads per sample. The quality of the initial DNA read libraries varied considerably, and careful quality control was conducted Table 1. 21 patients with 22 isolates of L. tropica ( � from the same patients) infected in the Middle East (three from Afghanistan, two from Iran, 17 from Syria). Ten had received treatment prior sampling (nine SSG, one cryotherapy) and twelve were cured on first-line treatment after sampling. Six were cured on second-line treatment (3 LA, 3 SSG il). One healed by itself and two were lost after sampling. One patient continued to relapse regarding treatment. �� healed by itself. SSG stands for sodium stibogluconate, LA for liposomal amphotericin, MA for meglumine antimoniate, il for intralesional, im for intramuscular, iv for intravenous, cured for absence of clinical relapse for 6 months after treatment, relapse for recurrence of a lesion after the lesion had healed without any known new exposure to the parasite, treatment failure for absence of clinical signs of re-epithelialisation in the lesion during or within two months after treatment. to ensure only high-quality reads were retained and low-quality reads were discarded (S1 Table). The reads were trimmed including removal of adaptor sequence using the Torrent Suite software with standard settings (Torrent suite v5.2.2) (SRA accession PRJEB45563) resulting in 22 valid short read libraries. Remaining low-quality bases with a base quality <30 were removed with the FASTX-Toolkit v0.0.14 (http://hannonlab.cshl.edu/fastx_toolkit/), and the absence of remaining low-quality reads was verified with FastQC v0.11.5 (www. bioinformatics.babraham.ac.uk/projects/fastqc/). These high-quality reads had lengths of 70-557 bases after screening and an average of 6.98±2.58 million reads per sample (S1 Table). 'The reference genome used for analyses was the L. tropica LRC-L590 (MHOM/IL/1990/ P283) assembly v2.0.2 with annotation from Companion [66] accessed through the Sanger Institute. It was masked using Tantan v13 [67] to exclude repetitive sequences, short tandem repeats, homopolymers and low-quality regions. The screened reads were mapped to this reference with Smalt v7.6 (http://www.sanger.ac.uk/science/tools/smalt-0) with a k-mer length of 19 so that candidate variants could be determined. Initial candidate SNPs (278,616±25,792) were extracted using BCFtools v1.10.2 [68] from which valid SNPs were selected that had base quality >25, mapping quality >30, >4 forward derived alleles, >4 reverse derived alleles, and read coverage >10-fold. This was visualised for each parameter with RStudio v4.0 across the collection (S1 Fig).

Sample ID
Regions with 100 bases of the chromosome edges were excluded. This resulted in a total of 301,659 valid SNPs across all samples, with 170,183±63,717 SNPs per library across the genome, with at least 69,843 SNPs per sample (S1 Table). Excluding contigs not assigned to chromosomes, this meant 169,753±63,717 valid SNPs per sample, with an average of 430±186 valid SNPs on contigs. The association between the numbers of initial reads and candidate SNPs was high (r 2 = 0.243) compared to the association between the numbers of valid reads and valid SNPs (r 2 = 0.028), suggesting that quality-control improved the accuracy of the true variation present. After careful checking, there were 17 mutations at 11 SNPs that were triallelic SNPs where both derived alleles were observed, based on derived allele frequencies <0.9 and read depth >14. SNP density measurements assumed a quasi-Normal distribution across the chromosomes based on Shapiro-Wilk normality tests excluding outlying chromosomes with extreme SNP rates.

De novo assembly
Each read library (except 13_01024) was assembled de novo for all k-mer lengths from 33 to 253 inclusive with a step size of 10 bases with Megahit v1.2.9 [69], retaining contigs of >1 Kb. This optimised the assembly using de Bruijn graph information from different k-mers and can cope well with heterogeneous data whose local copy numbers and read depths varied [69]. Here, it was used to verify large structural rearrangements and has been previously used on Leishmania genomes [70]. These initial de novo assemblies had an average of 8,152±3,025 contigs with an average N50 of 7,816±3,218. They were contiguated using the L. tropica reference with ABACAS [71] to produce assemblies whose chromosomes spanned 32.85±0.94 Mb excluding N bases, which was similar to the reference's length (32.93 Mb). A comparison of the Z-normalised reference chromosome lengths to the Z-normalised de novo ones for all samples indicated that few chromosomes had extreme differences in lengths after Benjamini-Hochberg (BH) p value correction. However, lengths were longer than expected for chromosome 7 for 14_00771 (p = 0.022) and 15_0215 (p = 0.038), and several other chromosomes noted when discussed below.

Long read sequencing, assembly and SNP screening
PacBio long read sequencing of 13_00550 produced 1.05 billion bases across 68,806 reads with a mean length of 15,225 bases and a N50 of 33,952. After quality control and mapping, 64,437 high-quality reads resulted in an average of 40-fold coverage across the L. tropica reference genome. After assembly and polishing by Hierarchical Genome Assembly Process 3 (HGAP3, [72]), the 13_00550 assembly had 2,655 contigs with a contig N50 of 19,171 spanning 32.7 Mb. This assembly masked with Tantan v13 as above and contiguated using the L. tropica reference with ABACAS [71] to produce an assembly whose chromosomes spanned 34,152,472 bp: 29,861,533 bp excluding N bases. This was indexed using Smalt v7.6 and reads for the 13_00550 short read library were mapped to it. After quality control and SNP calling as above using BCFtools v1.10.2, this found 21,663 heterozygous and zero homozygous SNPs across the chromosomes, supporting our SNP calling approach.

Inference of population structure and ancestry
Population structure was evaluated across the 301,659 SNPs by constructing a nuclear DNA maximum-likelihood phylogeny based on 168,327 alignment patterns using RAxML v8.2.11 with a GTR+gamma substitution model [73]. This was visualised with the L. tropica reference as an unrooted phylogeny using R packages ape v5.5 and phytools v0.7-70. To further resolve ancestries within discrete genomic chromosomes and regions, phylogenies per chromosome were constructed as above using RAxML v8.2.11 were visualised with phytools v0.7-70 and dendextend v1.14.0 [74].

Kinetoplast DNA variation
The kinetoplast DNA (kDNA) is a high copy number network of circular minicircle and maxicircle DNA molecules. The 22 libraries were mapped to the masked L. tarentolae reference maxicircle kDNA (20,992 bp, accession M10126.1, [77] using Smalt v7.6. The L. tropica reference maxicircle was tested, but the assembly quality of that contig limited any inferences. Valid SNPs were determined where the mapping quality > 30, the read-depth allele frequency (RDAF) > 0.8, and the read depth of high-quality reads > 10, excluding sites within 100 bp of the maxicircle reference ends. This found 1,594 SNPs distinguishing all of the 22 isolates from the L. tarentolae reference, which phylogenetically clustered most closely with isolate 13_01390. Within the 22 L. tropica isolates, there were 220 variable sites, with a frequency per sample ranging from 9 to 154 SNPs with an average of 82±39 SNPs. The variation was visualised with IGV v2.8.12 with the L. tarentolae annotation. A phylogeny was constructed as above using RAxML v8.2.11 with a GTR+CAT substitution model [73] and using R packages ape v5.5 and phytools v0.7-70. As above, genetically distinct groups were assigned using FastBAPS v1.0.

Aneuploidy and structural variation
Chromosomal somy levels were determined using the read depth coverage per base normalised by the median genomic coverage for each library [78] scaled as the haploid depth. This excluded sites at the first 7 Kb or last 2 Kb of chromosomes [79]. All libraries had at least 21-fold median genome-wide coverage except samples 13_01233 and 13_00550 whose median coverage was 13-fold, so an alternative somy estimation using binning was not essential here. Read depth was determined from the mapping patterns for chromosomal regions using 5 Kb windows to account for local variability. This yielded power to examine large structural changes of >10 Kb: small structural variants and indels were not examined in detail here due to potential artefacts associated with the sequence quality.

Results
We isolated, sequenced and analysed 22 L. tropica isolates from infected patients originally from Afghanistan (n = 3), Iran (n = 2) and Syria (n = 17) to decipher processes driving genome evolution in this unique collection.

Genomic diversity and population structure in L. tropica
The analysis of the sequence data identified 301,659 SNPs across all sample including multiallelic sites and 1,396 unique SNPs in contigs including multi-allelic sites. The isolates had differences in chromosomal diversity, with a range of 69,662 to 277,195 SNPs per isolate (SNPs on contigs were not examined in this study). Most SNPs were heterozygous: the number of homozygous SNPs per sample was 4,886±2,855, corresponding to 0.15 SNPs/Kb, with a range of 2,346 to 13,371. The number of heterozygous SNPs was much higher at 164,886±64,763 per sample, corresponding to 5.0 SNPs/Kb, with a range of 64,391 to 273,463. The numbers of homozygous and heterozygous chromosomal SNPs per sample had a small negative correlation (r 2 = 0.07, S2 Fig), suggesting processes other than genetic drift were contributing to the variation. We validated our careful sequencing, processing and SNP ascertainment approach by long PacBio read sequencing of isolate 13_00550. This resulted in zero genome-wide homozygous SNPs after self-mapping its own short reads to its de novo assembly, and high heterozygosity compared to when the reads were mapped to the L. tropica LRC-L590 reference genome.
Two genetically distinct groups were evident based on the patterns of a chromosome-wide phylogeny constructed with RAxML v8.2.11 [73] and population structure from FastBAPS v1.0 [75] (Fig 1). The first genetically distinct group (n = 7) was related to the L. tropica reference genome and hence is referred to as the reference group. The second (n = 15) was divergent from the reference genome and is here called the non-reference group. Isolates from Syria, Afghanistan and Iran were found in both groups. The two isolates from the same patient in Syria (14_01223 and 17_01604) were both from the non-reference group. Certain isolates with long external phylogenetic branches that were phylogenetically close to being between the two groups had instances of inconsistent group assignment by FastBAPS. For example, 12 out of the 36 chromosomes of 16_00964 were allocated to the reference group by FastBAPS and 24 to the non-reference genome group.

Independent infections in a single patient
The patient sampled twice, in 2014 (14_01223) and 2017 (17_01604), had both isolates taken from a large lesion (diameter 80 mm) at two different locations on the lesion. The patient received treatment between the sampling times but continued to relapse (leishmaniasis recidivans). These two isolates were genetically different from each other with no evidence of a mixed infection (Fig 1): 14_01223 and 17_01604 had 159,535 genome-wide SNPs between them, including 1,616 on chromosome 2, and 15,606 on chromosome 36 (Fig 2).
14_01223 and 17_01604 were both allocated to the same non-reference genetic group, but when examined individually 14_01223's chromosomes 2 and 36 were genetically related to the reference group. 14_01223 had long runs of homozygosity (LROHs) across all of chromosome 2 and at chromosome 36 >1. 63 Mb to the end at 2.715 Mb (Fig 2). In contrast, 17_01604 was heterozygous throughout. This difference in variation was confirmed by the SNPs' RDAF distributions which were skewed for 14_01223 but normal for 17_01604 (S3 and S4 Figs). The chromosome 36 of 14_01223 displayed heterozygosity at <1. 63 Mb, then a LROH at 1.63-1.78 Mb, followed by near homozygosity >1. 80 Mb. Based on inferred patterns from L. major, the 1.63 Mb breakpoint coincided with an acH3 mark, and the 1.78-1.80 Mb region corresponded with a putative transcription start site (TSS) near an inferred SSR. The patient had a mixed infection with two unique isolates, allocated to the non-reference genetic group, but with clear genomic differences.

Varied ancestry and heterozygosity at chromosome 36
Further examination of chromosome 36 in 14_01223 with 14_00642, 16_00075, 16_00964 and 07_00242 presented sharp homozygosity-heterozygosity switches (Fig 3). 07_00242 and 16_00964 had similar patterns like a change at an inferred SSR and putative acH3 mark at 1.42 Mb, where the pattern of heterozygosity changed to a LROH until 1.8 Mb, before moving to homozygous similarity (Fig 2). 16_00075 had a jump in the RDAF at 1.28 Mb (at acH3 and baseJ marks) before a RDAF drop at 1.75 Mb that was also in 14_00642-neither of these samples had LROHs. All these five samples were disomic for this chromosome, except for trisomy in 14_00642 (S4 Fig).
The heterozygous and homozygous SNP densities, phylogenetic patterns, FastBAPS population assignment (S5 Fig), and mean RDAF consistent with somy levels at 0-1.28 Mb of chromosome 36 matched the genome-wide patterns and main trends in other chromosomes (Fig  3). The region at 1.28-1.60 Mb was heterogeneous due to the higher number of potential recombination breakpoints (Fig 3). At 1.60-1.78 Mb, 14_01223, 07_00242 and 16_00964 had This deviation in genetic ancestry continued for the rest of the chromosome, and all four strains were found to be related to the reference group, which was surprising for 07_00242, 16_00964 and 16_00075 had been assigned to the non-reference group at the whole-genome level. 16_00075 was unambiguously disomic for this chromosome, implying that the change in the main RDAF peak of~0.5 at 0-1.28 Mb to a peak of~0. 75

Consistent recombination breakpoints at tetrasomic chromosome 31
The Leishmania chromosome 31 often shows a pattern of tetrasomy and high heterozygosity and this was also observed here (S4 Fig), indicative of likely homologous recombination across all chromosome copies and possible accelerated mutation rates [49]. Consistent with this faster chromosomal substitution rate, the haploid L. tropica reference had high divergence at this chromosome (average across 22 samples 8.19±2.15 SNPs/Kb) compared to the other 35 chromosomes (5.01±0.66 SNPs/Kb, t = -34.8, p = 4.4x10 -16 ). This was due to a high rate of heterozygous SNPs (average 8.13±2.17) relative to the other 35 chromosomes (5.85±-0.55 SNPs/Kb, t = -36.6, p = 4.4x10 -16 ), but this was not evident for homozygous SNPs (0.06±0.06 vs 0.15±0.17 SNPs/Kb). The RDAF distribution of chromosome 31 of heterozygous SNPs was second lowest of the chromosomes (S7 Fig), suggesting more recombination resulting in consistent heterozygosity. The FastBAPS genetic group allocation was consistent with differing heterozygous SNP rates between the reference and non-reference groups, potentially indicating a lack of genetic exchange between these groups for this chromosome (Fig 4).
Chromosome 31 had sharp changes in heterozygosity at 220 Kb and 950 Kb in samples 15_02015 and 15_01620, and at 950 Kb in 15_02597 (for which chromosome 31 was pentasomic) (Fig 4).
In 15_01620, the major RDAF switched from a mode of 0.75 at <220 Kb to 0.5 at 220-950 Kb, before increasing to 0.75 at >950 Kb (Fig 4). In 15-02576, the major RDAF modes were 0.4 and 0.6 <950 Kb, followed by 0.6 at >950 Kb (Fig 4). 15_02597's RDAF had a major peak of 0.5 <950 Kb and a jump to 0.8 at >950 Kb (Fig 4). In 16_00075, the major RDAF peak was between 0.5 and 0.6 throughout (Fig 4). 15_02015's RDAF peaks were at 0.25 and 0.5 <220 Kb, then 0.5 at 220-950 Kb, before 0.75 at >950 Kb (Fig 4). 220 Kb and 950 Kb are at TSSs containing acH3 marks, potentially matching TSSs inferred from experiments in L. major. These results demonstrate the high level of recombination at discrete breakpoints on this chromosome.

A geographic basis for unique ancestry and variation at chromosome 2
Sample 16_00964 from Iran, which also had mixed heterozygosity with putative novel ancestry at chromosome 36, also showed the same pattern on chromosome 29 at >440 Kb, and this was shared with sample 07_00242 (at >490 Kb) from Iran as well. Both had heterozygosity up to 440 (16_00964) or 490 (07_00242) Kb as did the other 20 samples, followed by a LROH from this point to the end of the chromosome at 1,520 Kb (Fig 5). This 440-490 Kb region was near a TSS and a SSR. The genetic variation at >490 Kb in 16_00964 or 07_00242 was unlike the other samples and the reference genome, suggesting ancestry from an unsampled lineage ( Fig  5). A similar picture of genetic exchange in another L. donovani lineage has been described near the region [40].
All three samples from Afghanistan (13_01233, 15_02480 and 16_00075) had 1.7-fold nor- This heterozygous deletion or contraction was present in the Iran-and Syria-linked samples from both the reference and non-reference genetic groups, and not the Afghanistan-linked samples, again from both reference (13_01233) and non-reference (15_02480 and 16_00075) groups (Fig 1).

Geographic structure associated with heterozygosity loss on chromosome 23
Like chromosomes 29 and 36, the region <250 Kb at chromosome 23 had a segment of ancestry that was distinct from the reference and non-reference patterns with high heterozygosity in five samples (all from Iran or Afghanistan) that formed their own genetic group in the Fas-tBAPS population assignment (S10 Fig). This difference in ancestry was due to the high level of homozygous differentiation from the reference genome of the other 17 samples from Syria at <250 Kb, unlike the high heterozygosity seen in 07_00242 and 16_00964 (both Iran), and 13_01233, 15_02480 and 16_00075 (all Afghanistan) (Fig 6). At >250 Kb, all samples conformed to their genome-wide pattern, such that 07_00242 and 13_01233 reverted to membership of the reference group, and 15_02480, 16_00075 and 16_00964 clustered with the nonreference group (Fig 6).
The near-uniform disomy of this chromosome in our isolates (average 2.15±0.27) and evidence of genetic drift from the high positive correlation of heterozygous and homozygous SNP densities in the 17 samples with LROHs suggested that recombination and somy changes were rare at this chromosome. We confirmed these results by read mapping the Ion Torrent library of 13_00550 to its own PacBio assembly to verify an absence of heterozygous SNPs before the 260 Kb position, followed by the heterozygosity >260Kb observed when mapping to the reference genome. This contrasted sharply with previous work on the L. donovani complex where chromosome 23 was typically trisomic with extensive diversity and encoded an amplification of the H-locus drug-resistance region [80].

Monosomy and trisomy on chromosome 10
Chromosome 10 was disomic in all isolates except 14_00642, which was monosomic at 20-250 Kb and approximately trisomic at <20 and >250 Kb (Fig 7). These results based on the normalised haploid depth from read mapping to the L. tropica reference genome were replicated in mapping to the contigs of a de novo Ion Torrent assembly of 14_00642 (Fig 7), and again when mapping to the PacBio 13_00550 assembly, and these unusual patterns were confirmed visually in IGV (S11 Fig). 14_00642 had a much higher rate of homozygous SNPs at chromosome 10 than the other samples (S12 Fig). The sub-telomeric regions of this chromosome could be amplified, coinciding with potential SSRs at 20 Kb and 520 Kb based on L. major experiments (Fig 7). Its RDAF distribution had peaks at~0.67 at >250 Kb and at~0.33 at >520 Kb, consistent with trisomy or possibly a mixed cell population. The switch from monoto tri-somy was near a putative SSR at~270 Kb, 5' of the inferred centromere at~295-301 Kb [81] (Fig 7). 14_00642's pattern of low heterozygosity and LROHs was not in the other isolates except 15_02480, which had a LROH at >520 Kb, but had uniform disomy, like the other 20 samples (S12 Fig). The 230-250 Kb region contained numerous reference genome gaps and regions of low read mapping coverage, preventing clear delineation of a hypothetical break between the putative 5' chromosome 10.1 and 3' chromosome 10.2. Despite this, it is possible that this centromeric region delineates the formation of an acrocentric chromosome.

Aneuploidy as a mechanism for more allelic variation, recombination and adaptive paths
We found 17 mutations at 11 SNPs that were triallelic, such that two derived alleles were present. These were in 14 of the 22 samples, which had between four and eleven such multi-allelic SNPs each. The somy level of chromosomes with triallelic SNPs (4.32±0.39) was higher than that of chromosomes with biallelic SNPs (2.11±0.33, p = 4.5x10 -7 ). In addition to these unusual LROHs at chromosomes 10, 23, 29, and 36 above, we observed rare switches to homozygosity for chromosomes 2 in 14_01223 (entire chromosome, S3 Fig), 4 in 07_00242 (entire

Discussion
Here, we presented the genetic diversity and high degree of genomic variation within 22 isolates of L. tropica and found 301,659 SNPs across all samples, a number comparable to previous work [46]. We detected high heterozygosity and diverse LROHs, resulting in no clear correlation between the number of homozygous and heterozygous chromosomal SNPs per sample, in line with prior studies. This implied recombination and aneuploidy as key factors driving patchy genome-wide heterozygosity arising from older hybridisation events. In particular, chromosome 36 had shared patterns of heterozygosity followed by homozygous differentiation and then homozygous similarity, all on the same chromosome in 14_01223, 07_00242 and 16_00964. Our pattern of recombination breakpoints was similar to or exceeded rates of one cross-over per 1.5 Mb [32] or per 2.3 Mb [40], though an exact rate within L. tropica remains unclear. Across the genome, the extensive nature of the LROHs were indicative of regular recombination, and this pattern was present in both the main reference and non-reference genetic groups. These patterns suggested an increase somy allowed more recombination between chromosomes, a higher rate of derived allele accumulation, and hypothetically more diverse options following reversion to disomy associated with random loss of one chromosome copy.
Chromosome fission and fusion arise due to errors in chromosome replication and are rare in Leishmania. They tend to correlate with SSRs [82] that may be detected from histone H3 acetylation marks in L. major [55,83] and may preserve transcription of the PTUs. All except 14_00642's chromosome 10 were normal, and the 230-250 Kb region contained numerous reference genome gaps and regions of low read mapping coverage, preventing clear delineation of a hypothetical break between a putative 5' chromosome 10.1 and a 3' chromosome 10.2. Consequently, if a model of transient monosomy is excluded, then 50% of 14_00642's cells may have an intact disomic chromosome 10 and 50% may have a tetrasomic chromosome 10.2. Alternatively, all cells may have a disomic chromosome 10.2, and 50% may have an intact disomic chromosome 10 as well. Due to the effects of PCR during library preparation and the single-ended nature of these read libraries, they were insufficient to address subsequent questions on the location and sizes of the telomeric and centromeric DNA regions at new hypothetical chromosomes, though a variety of mechanisms exist to ensure cells can mitigate these chromosomal changes [84].
Telomeric amplification helps stabilise the genome during replication [85]. In Leishmania, the maintenance of telomeres occurs through rolling circle replication via extrachromosomal amplification, as in Kluyveromyces lactis [86,87] rather than telomeric loop formation [78]. And outside of the S cell cycle phase, DNA replication has been detected proximal to the chromosome telomeres, resembling a subtelomeric DNA synthesis activity [56]. Our isolates had extensive aneuploidy and higher heterozygosity than previously reported [46]. Recombination breakpoints coinciding with TSSs or SSRs were observed on numerous other chromosomes, which might affect DNA replication.
L. tropica likely has a higher capacity for interspecies mixing compared to other Leishmania and can create hybrids in vivo and in vitro [24,32,37,40,88]. L. tropica has a high efficiency of hybrid formation in co-infected sand flies and potentially greater capacity to generate matingcompetent hybrids [32]. Genetic exchange occurs during reproduction by promastigotes in sand flies, and patterns of high and varied haplotype diversity were apparent in our results. This raises questions on the role of vector-species compatibility as a trigger for a sexual reproduction, and the potential for different sand flies to have varying microenvironments that differentially facilitate genetic exchange and hybrid generation. An important emerging question thus is the molecular basis of L. tropica's broad vector compatibility, and how local environments mediate this [80,89].
Mosaic aneuploidy may originate through multiple mechanisms, not limited to meiotic nondisjunction, gene conversion [29], mitotic nondisjunction resulting in misallocating chromosomes [47], recombination-directed replication inferred from work on related parasites [55], and chromosomal replication followed by genome erosion [46]. To maintain the mosaic aneuploidy, Sterkers et al [30] argued for a parasexual reproduction, rather than a classical meiosis-like cycle, based on asymmetric chromosome allotments during mitosis. The distinctive heterozygosity blocks and aneuploidy among our isolates could be explained by either meiosis followed by selfing and chromosome copy number variation, or alternatively parasexual reproduction involving a tetraploid intermediate, or indeed both [88]. Recombination involving mobile genetic elements have also been suggested as a possible mechanism for the creation of mosaic aneuploidy [90].
Effect of long-term culturing of Leishmania and the environmental condition in vitro have previously been studied [91,92]. Structural variation, changes in ploidy [93] and altered gene dosage [47,94] has been observed in culture. Our isolates were cultured when diagnosed, before being stored at -156˚C and then re-cultured as promastigotes in axenic culture in vitro for two to six weeks before being DNA sequenced. Consequently, polymorphisms may have arisen during these steps and indeed some may have been lost. Moreover, although our study had the advantage of longer reads than typical Illumina runs, it lacked the information to resolve structural changes using paired read information and higher read depth would yield more precision in assessing copy number variation across chromosomes and subtle changes in heterozygosity. Notwithstanding our long read sequencing of one isolate, improving the quality of our reference genomes can spur new insights into regions with high structural variability and complex ancestry that may have been missed in this study.

Conclusion
The high diversity, frequent changes in heterozygosity and abundant aneuploidy in these 22 L. tropica isolates were consistent with genetic exchange, possibly by sexual or parasexual mechanisms. Different frequencies of recombination have been observed between different Leishmania populations [80]. L. tropica-L. donovani hybrids that have spread to geographically distant regions though diverse Phlebotomomus spp. by genetic exchanges to adapt to the vector have been described [16,28,89]. It remains unclear if this is due to intrinsic differences in Leishmania or due to differences in vector species. What is clear is that L. tropica has adapted to several vector species, which seems to increase hybridization, and can produce new progeny with genetically highly similar or extremely distinct Leishmania. The diverse ancestries, mixing and existence of hybrids may help explain the challenges in treating CL and leishmaniaisis revidans caused by L. tropica in the Middle East, but further research needs to be carried out to better understand the correlations between genetic diversity in Leishmania, the differences among the vectors and treatment outcome.
Supporting information S1 Table. The number