Genome wide comparison of Ethiopian Leishmania donovani strains reveals differences potentially related to parasite survival

Leishmania donovani is the main cause of visceral leishmaniasis (VL) in East Africa. Differences between northern Ethiopia/Sudan (NE) and southern Ethiopia (SE) in ecology, vectors, and patient sensitivity to drug treatment have been described, however the relationship between differences in parasite genotype between these two foci and phenotype is unknown. Whole genomic sequencing (WGS) was carried out for 41 L. donovani strains and clones from VL and VL/HIV co-infected patients in NE (n = 28) and SE (n = 13). Chromosome aneuploidy was observed in all parasites examined with each isolate exhibiting a unique karyotype. Differences in chromosome ploidy or karyotype were not correlated with the geographic origin of the parasites. However, correlation between single nucleotide polymorphism (SNP) and geographic origin was seen for 38/41 isolates, separating the NE and SE parasites into two large groups. SNP restricted to NE and SE groups were associated with genes involved in viability and parasite resistance to drugs. Unique copy number variation (CNV) were also associated with NE and SE parasites, respectively. One striking example is the folate transporter (FT) family genes (LdBPK_100390, LdBPK_100400 and LdBPK_100410) on chromosome 10 that are single copy in all 13 SE isolates, but either double copy or higher in 39/41 NE isolates (copy number 2–4). High copy number (= 4) was also found for one Sudanese strain examined. This was confirmed by quantitative polymerase chain reaction for LdBPK_100400, the L. donovani FT1 transporter homolog. Good correlation (p = 0.005) between FT copy number and resistance to methotrexate (0.5 mg/ml MTX) was also observed with the haploid SE strains examined showing higher viability than the NE strains at this concentration. Our results emphasize the advantages of whole genome analysis to shed light on vital parasite processes in Leishmania.


Introduction
Leishmania donovani, together with L. infantum, are the main causative agents of visceral leishmaniasis (VL). The World Health Organization (WHO) estimates that this disease causes an estimated 200,000 to 400,000 new VL cases worldwide, and >40,000 deaths yearly. The majority of VL cases occur on the Indian subcontinent, Brazil, and East Africa with most cases in the latter region found in Sudan, South Sudan and Ethiopia [1]. While treatment regimens for VL, including combination therapy based on existing drugs, have improved safety and prognosis, they are still suboptimal, and new drugs are urgently needed. High parasite resistance on the Indian subcontinent to the pentavalent antimonial sodium stibogluconate (SSG) has led to its discontinuation, however SSG is still part of the primary treatment regimen for VL in Ethiopia. The situation in Ethiopia is further complicated by the presence of AIDS, a leading cause of adult illness and death in this country [2,3]. Between 20-40% of VL patients are co-infected with HIV and relapses, up to 50% at a year post-treatment, are frequently observed [2,4]. Treatment following relapse normally utilizes alternative drugs like liposomal amphotericin B, pentamidine and paromomycin. Paromomycin is an antibiotic recently approved to treat VL in India and is in clinical trials in Africa [5]. Differences in the dose of paromomycin required to treat VL patients from northern Ethiopia (NE) and Sudan, as compared to southern Ethiopia (SE) and Kenya have been reported [5,6]. Interestingly, other differences in the ecology, sand fly vectors and parasites have been described between NE and SE. Endemic regions of NE are typically semi-arid, with commercial monoculture fields and scattered Acacia-Balanite forests [7][8][9]. Phlebotomus orientalis is the primary vector in this region. On the other hand, transmission in SE occurs in areas of savannah and forest where termite mounds abound; and Phlebotomus martini and Phlebotomus celiae have been implicated as vectors [7,8,10,11]. Molecular characterization including multilocus enzyme electrophoresis (MLEE), multilocus microsatellite typing (MLMT), protein and DNA sequence analysis of individual genes, and k26-PCR targeting the hydrophilic acylated surface protein B (HASPB) repeat region have been used to characterize the L. donovani complex in East Africa [7,[12][13][14][15]. MLMT analysis indicated that different genetic populations and subpopulations are present in NE and SE [13].
Recent developments in whole genome sequencing (WGS) and computational analysis allow the in-depth exploration and comparison of leishmanial genomes at a high level of resolution and accuracy [16][17][18]. Genomes of individual Leishmania species [18]; of L. infantum or L. donovani strains in limited geographic regions of Turkey or India/Nepal respectively [16,19,20], and strains showing differences in drug resistance and tropism [16,[20][21][22][23][24] have been analyzed by WGS. In this report 41 patient strains and clones from NE and SE are analyzed and compared by WGS. This study provides insights into the population structure, and genetic differences of parasites circulating in the distinct ecologies of Ethiopia. The evidence of genomic variation between the two L. donovani populations (NE and SE) may provide an additional insight about parasite virulence, development of drug resistance and give new directions for new treatment strategies.

Results
WGS of 41 L. donovani parasites, both clones and isolates, from 15 Ethiopian patients was carried out and analyzed (S1 Table). All 18 patient strains were isolated in the years 2009 and 2010 in NE (n = 9) or SE (n = 10). The parasites analyzed were isolated from spleens, bone marrows and skin lesions. Five of the patients had AIDS at the time the parasites were isolated; and in three cases, two strains obtained from different organs (either spleen and skin, or spleen and bone marrow), were analyzed. None of the patients received treatment for VL prior to parasite isolation. A Sudanese strain, isolated in 1998, was also included in some analysis (S1 Table).

Aneuploidy
Chromosome copy number was predicted based on whole chromosome median read coverage as described in Material and Methods. The predicted values for each chromosome in the patient isolates and their clones are given in Fig 1 and S2 Table. Normalized read depth, showed that 68.5% of the chromosomes had a predicted chromosome (PCHR) copy number of 2 ± 0.5, i.e. disomic, with smaller percentages showing higher copy numbers; i.e., trisomic (3 ± 0.5) in 23.4%, tetrasomic (4 ± 0.5) in 6.7%, pentasomic (5 ± 0.5) in 1% and hexasomic (6 ± 0.5) in 0.4%, similar to what was previously reported for Nepalese L. donovani strains [16]. Several strains, mostly patient isolates and clones belonging to the SE population; show intermediate ploidy (mixoploidy) for specific chromosomes, e.g. chromosomes 1, 6 and 23 (S2 Table), suggestive of parasite populations with mixed polysomic diversity, perhaps due to variation in chromosomal amplification between individual cells in culture. Differences in chromosome aneuploidy between SE patient strain and its clones was not significant (unpaired two samples student's t-test, p = 0.2), suggesting that aneuploidy in the clonal population is not due to selection of clones exhibiting different ploidy for identical chromosomes. On the other hand, there is a significant difference in chromosome aneuploidy (unpaired two samples student's t-test, p = 0.028) when SE parasites, patient and clones, were compared to NE parasites, patient and clones.
Cluster analysis based on chromosome copy number provides insight on three levels: first, that similarities and differences exist between all parasites examined; second, validation that clones derived from the same patient strain are highly similar; and third, that differences in chromosome copy are not correlated with the parasite geographic origin. Similar to earlier studies [16,19,20,25] where aneuploidy was examined in clinical isolates of Leishmania from India and Nepal (L. donovani), in sand flies from Turkey (L. infantum), and in laboratory strains (L. major, L. braziliensis, L. donovani, L. infantum, L. mexicana and L. panamensis); all the Ethiopian L. donovani lines examined showed aneuploidy, and each has a different karyotype. Seven chromosomes (17, 25, 27, 30, 32, 34 and 36) were disomic in all lines examined, as was chromosome 3 with the exception of one strain AM554. Four of these chromosomes (30, 32, 34 and 36) were used for normalization of average chromosome read coverage as described by Rodgers et al [18]. Interestingly, disomy was also observed for five of these chromosomes, 17, 25, 30, 34 and 36, in all the 17 Indian and Nepalese L. donovani lines originally studied [16], and in almost all 206 strains from the Indian subcontinent recently examined [20]. Several other chromosomes (18, 19, 21 and 28) that were disomic in all the Indian and Nepalese lines showed somewhat more diverse ploidy (2 to 3 copies) in the Ethiopian lines. Chromosome 31 was polysomic in all the lines used in this study. Even GR363sk/cl.I and GR363sk/cl. II, which showed the least aneuploidy of all the lines examined, were still trisomic for chromosome 31. Chromosome 31 is polysomic in every Old World Leishmania strain examined to date [16,19,25]. Chromosome copy number was compared using clones from several strains (AM560, n = 4; GR356, n = 9; GR363sp, n = 3; GR363sk, n = 3; GR364sp, n = 3; GR383, n = 5). In one case (GR364sk, n = 3) two clones and the original patient strain were examined. Validation of aneuploidy similarities was carried out using clustering based on ploidy patterns taking into account all 36 chromosomes. What is readily apparent, from both the heat map and dendrogram (Fig 1), that in most cases karyotypes of clones isolated from the same strain are highly similar. For instance, the karyotypes of the five GR383 clones (I, II, X, XII and XIII) are almost identical (unpaired student's t-test, p > 0.73), and fall in a tight cluster. This is also seen for all the clones of GR363sp, GR363sk, GR364sk and AM560, and 8/9 clones of GR356. Only two clones, GR356/cl.I and GR364sk/cl.II, show karyotypes significantly different from their sister lines or the patient strain from which they were derived, and fail to cluster with the former (Fig 1). Interestingly, clone GR356/cl.1 groups with its sister lines based on SNP analysis (Figs 2 and 3), and exhibits a k26-PCR amplicon identical to the other NE strains (290 bp). Of note, clone GR364sk/cl.II had a k26-PCR amplicon (450 bp) similar in size to SE strains, even though it was derived from GR364sk, a NE strain with a 290 bp amplicon typical of the NE region [7]. This strain was isolated from a HIV-VL patient, and the sister clone, GR364sk/cl.I which is very similar to the patient strain also has a k26-PCR product of 290 bp. Clone GR364sk/cl.II also groups with the SE strains by SNP analysis (Figs 2 and 3), but is distinct from them suggesting that this patient might have had a mixed infection.
Cluster analysis of the karyotype data doesn't separate the SE and NE strains by geographic region instead they are mixed together, interspersed among each other. Despite this, some differences in chromosome copy number between the two populations are apparent. Chromosomes 13, 15, 16, 23, 28, and 31 show significantly higher average ploidy for SE strains compared to NE strains, while chromosome 4 is the only chromosome where the NE strains show a significantly higher average ploidy than the SE population (Table 1 and S2 Table). In addition, we found that SE strains show a significantly higher average total chromosome somy (2.52 ± 0.09) than the NE strains, (2.35 ± 0.06; paired t-test, p = 0.0016), perhaps indicating that the SE strains tend towards higher polyploidy than the NE strains (S2 Table).
Finally, parasites from different organs of three HIV-VL co-infected patients were examined. Parasites isolated from the skin or spleen of the same patient form separate groups, and the karyotypes of parasites isolated from the respective sites show greater differences, in most cases, than clones generated from the same site, either skin or spleen. GR364 spleen and skin strains, with the exception of GR364sk/cl.II, form separate clusters (Fig 1). Overall these five lines (skin-original patient strain and clone I, versus spleen-all three clones) show no significant difference in chromosome ploidy distribution, i.e. karyotype. However, when each GR364 spleen and skin chromosome was compared on an individual basis, 6 out of 36 chromosomes (Ld4, 5, 8, 14, 20 and 31) show significant differences in ploidy (p = 0.0004, 0.04, 0.01, 0.05, 0.01 and 0.001, respectively). This probably accounts for the fact that the skin and spleen strains form separate subgroups (Fig 1). Likewise, GR363sk and GR363sp taken from the skin and spleen, respectively, of the same patient cluster in separate branches of the dendrogram (Fig 1). Significant overall differences in ploidy between the GR363 skin and the spleen clones (two tailed paired t-test, p = 0.0067) was observed. Interestingly, the spleen clones from this patient demonstrate on average higher ploidy than the skin clones. Finally, strains isolated from the spleen and bone marrow of one patient, LDS373sp and LDS373bm respectively, also show different karyotypes (7/36 chromosomes differ), even though they cluster together on the dendrogram. These parasites also show different k26-PCR fragment sizes, gene CNV and SNP profiles. An overall difference in ploidy was found in chromosomes 1, 4, 6, 20 and 35 by the comparison of all spleen against skin clones. The differences in Each colored square in the matrix indicates the percent SNP similarity for a strain/clone listed on the right compared to strain/clone listed along the bottom of the matrix. The heat map was generated with R packages; stats and gplots. The green range indicates on identity level ! 70% and the pink < 70%. Darker green/pink point to higher or lower identity, respectively. . Two NE parasites, GR364sk/cl.II and the atypical LDS373bm, fall with the SE population. Despite the short distance between typical SE (red circles) and NE (black circles) population, these two populations are physically well defined. Panel B. PCA of 39 NE and SE lines without atypical SE strains. The typical SE population (red circles) is very homogenous and falls in a tight cluster unlike the NE population, which is very heterogeneous but still distinct from the SE population. LDS373bm falls equidistance between the SE population and LDS373sp both isolated from the same patient. Line GR364sk/cl.II still clusters together with the SE population (black circle on the middle of the SE cluster). Interestingly lines from spleen and skin taken from the same patients (GR363sp and sk; and GR364 sp and sk) are clustered together and create a well-defined population. chromosome ploidy of strains isolated from different organs may result from clonal selection of the parasites in the host due to specific selective pressures at the infection site.

Single nucleotide polymorphism analysis
SNP calling, compared to the L. donovani reference strain (BPK282A1), for each of the strains and clones was carried out as described in material and methods. The two parasite populations, SE and NE, show significantly different number of SNPs on average,~153K and~168K (p < 0.043) respectively, compared to the Indian reference strain (R). In addition, the percentage of homozygous and heterozygous SNPs within each geographic population (S3 Table), represented by a single alternate allele (A), show significant differences i.e., for the SE (mean homozygous AA = 87.2% and heterozygous RA = 12.7%, respectively, p<0.05) and the NE populations (mean homozygous AA = 83.3% and heterozygous RA = 16.6%, respectively, p<0.05). The relationship between SE and NE strains and clones based on whole genome SNP pair-wise analysis is shown in Fig 2. Each colored square in the matrix indicates the percent SNP similarity for a strain/clone listed on the left compared to strain/clone listed along the bottom of the matrix. Unlike the chromosomal aneuploidy profiles (Fig 1), SNP population analysis divides the L. donovani population into large clades or groups based on geographical distribution (Fig 2). This is easily seen both in the hierarchical cluster tree where SE and NE parasites each form separate groups with clones from each strain showing highest similarity to each other (Fig 2A) and the heat map ( Fig 2B). The only exceptions are three atypical strains/ clones (AM422, AM553, and LDS373bm) which fall outside the main clades, and clone GR364sk/cl.II, as mentioned above, which groups with the SE rather than the NE parasites. The two atypical SE strains; AM422 and AM553 fall closest to the NE clade, yet are distinct from the NE strains. Interestingly, LDS373bm does not cluster with the spleen strain, LDS373sp, isolated from the same patient, even though the karyotypes are similar. AM422, AM553 and LDS373bm seem to have SNP profiles falling between the NE and SE populations.
Principal component analysis on the SNPs was also used to examine the population structure. SNPs showing high linkage disequilibrium were removed prior to analysis by SNP pruning [26] (S4 Table). As can be seen in Fig 3A, two clusters or near-clades containing most the NE (black circles) or SE (red circles) strains and clones are observed. The two atypical SE strains, AM422 and AM553, are outliers falling far outside both clusters (EV1 = 0.229, EV2 = -0.954 and EV1 = 0.945 EV2 = 0.265 respectively), while the two atypical NE strains, GR364sk/ cl.II and LDS373bm, group together with the rest of the SE population confirming the results shown in Fig 2. PCA was repeated excluding the two atypical SE strains to allow better separation of the remaining 39 isolates (Fig 3B). All the SE strains still group together in one dense cluster that includes GR364sk/cl.II suggesting that these strains are more homogenous, Whole genome sequence analysis of Ethiopian Leishmania donovani however the NE strains show more diverse distribution with each strain and its clones forming a separate cluster. LDS373bm no longer groups with the SE strains. SNPs in protein coding regions were examined, and SNPs unique to the SE and/or NE populations identified. No significant difference in the percentage of synonymous, nonsynonymous or nonsense mutations was found between SE and NE parasite populations: 47%, 52.6% and 0.4% versus 50.0%, 49.8% and 0.2%, respectively. Altogether 683 common genes containing at least one SNP causing either a nonsynonymous or a nonsense mutation are present in both NE and SE parasites. The remaining SNPs resulting in nonsynonymous or nonsense mutations are only found in either SE (412 genes) or NE (595 genes) parasite populations (S5 Table). As such, SNPs in these genes are unique markers for parasites in each geographic region.
Gene Ontology enrichment analysis of the "unique" SNP containing genes found in each population indicates that the proteins are involved in different biological processes (S6 Table). A web based semantic cluster algorithm, REVIGO, was used to remove redundant GO terms [27]. After removal of redundant GO terms the remaining terms were graphed as scatterplots in two-dimensional space according to semantic similarity. Semantically similar GO terms should remain close together in the plot, and size of the circle indicates the frequency of GO term. Unique SNPs in NE parasites (Fig 4A and S7 Table) are associated with biological processes such as tRNA aminoacylation for protein translation, glutamine family metabolism, regulation of transferase activity e.g. protein kinases, and phosphate ion transport, while those in SE parasites are primarily associated with cation transmembrane transport, purine nucleoside triphosphate and nucleobase metabolism and DNA conformation ( Fig 4B). Similar differences are also noted when the molecular functions of the genes with unique SNPs are analyzed. Unique SNPs in the NE population are concentrated mainly in genes involved in glutamine family biosynthesis and metabolism, tRNA aminoacetylation, pyrimidine metabolism, and cyclins involved in protein kinase regulation during cell division. On the other hand, unique SNPs associated with the SE population are found in genes such as glutathione metabolism, protein translation initiation and elongation factors, transport and oxidoreductase activity (S5-S7 Tables).
Several genes associated with the development of leishmanial drug resistance also contain nonsynonymous SNPs and/or nonsense mutations. A unique SNP, only present in the NE population, was identified in the aquaglyceroporin (LdBPK_310030) gene, a protein that plays a role in trivalent antimony (Sb III ) uptake, located on chromosome 31 [20]. This unique heterozygote nonsynonymous mutation g.7444A>T causes an amino acid exchange (Ser251Thr) in TML-5 of aquaglyceroporin, and is only found in the NE population (S5 Table). The MRPA gene (PGPA) encodes an ABC-thiol transporter (LdBPK_230290.1) that sequesters thiol-Sb conjugates and is also involved in antimony resistance [28]. This gene contains several unique nonsynonymous SNPs unique to the NE (four homozygote and one heterozygote), and SE (two homozygote) populations (S5 Table). It is not clear how these unique SNPs affect transporter function, as no difference in response to antimonial chemotherapy between L. donovani isolates from NE and SE has been reported.

Gene copy number variation
Comparative read coverage was examined for 41 SE and NE isolates using a sliding window (5000 bp) in order to detect genomic copy number variation (CNV) as described in Material and Methods. Chromosome somy was not taken into account at this stage. While both increases, and decreases in CN were observed (S8 Table), increases in CN (>2) were more prevalent occurring 83% of the time. In addition, a significant overall difference (p < 0.00001) in average CNV between the NE and SE populations was noted ( Table 2). Sixty-two different genes showed significant differences in CN between the SE and NE populations (S9 Table and S1 Fig). In the SE strains, genes with an average CN < 1.5 are primarily found on chromosomes 10, 11, 12, 22, 27, 31, 34, and 36; and include the folate-biopterin transporters, ABC transporters, ATP-binding cassette protein, D-lactate dehydrogenase, branched-chain amino acid aminotransferase, amastin-like proteins, phosphoglycerate mutase, tartrate-sensitive acid phosphatase, mitogen activated protein kinase homolog, as well as numerous uncharacterized proteins. Genomic CNV of the atypical NE strains was very similar to that observed for the SE strains. On the other hand, only one gene, an amastin-like protein (LdBPK_341700) on chromosome 34, shows an average CN <1.5 in a majority of NE strains. Interestingly, low copy number genes were more prevalent in the SE population with 84% of all strains and clones exhibiting an average CN < 1.5 over all genes.

Fig 4. Scatterplots after reduction of semantic redundancy in enriched Gene Ontology (GO) terms for unique SNP containing genes of northern and southern Leishmania donovani populations.
Removal of redundant GO terms was carried out using the web program REVIGO with C set at 0.7 [27]. Enriched GO terms are graphed in twodimensional semantic space with terms that semantically similar closer together. The semantic space units have no intrinsic meaning. Enrichment p-values are shown by circle color as indicated in the key to the right of each panel, while circle diameter indicates the frequency of the GO term i.e. general terms are larger. Numbers in each circle refer to the GO terms listed in S7 One striking difference between the NE and SE strains and clones is in the number of folate/biopterin transporter (FBT) gene(s) on chromosome 10 ( Table 3, S10 Table and S2 Fig).
Leishmania are auxotrophs for folic acid, and 14 different members of FBT family have been identified in L. infantum [29]. Several of these genes are known to play roles in parasite drug resistance and viability. Eight of the 13 L. donovani FBT homologs are located on chromosome 10, of which 7/8 are present in a tandem array. This chromosome is disomic in most leishmanial strains examined to date [16,[18][19][20]25]. Interestingly, three of the FBT genes present in this tandem array on chromosome 10 (LdBPK_100390, LdBPK_100400 and LdBPK_100410) show gene amplification in several NE (GR364sk, GR364sp and GR356) and Sudanese parasites (S10 Table). In addition, LdBPK_355160 on chromosome 35, an ortholog of L. infantum biopterin transporter 1, also a member of the FBT family, is amplified in the NE strain GR383 (CN = 5), but not the other NE isolates.
SE parasites exhibit the opposite trend for these three genes (LdBPK_100390, LdBPK_100400 and LdBPK_100410) on chromosome 10 showing loss of heterozygosity in eight, ten, and ten out of eleven SE strains, respectively (S10 Table). Loss of heterozygosity was also observed for one additional FBT gene on chromosome 10 (LdBPK_100380) in the SE strain AM553. Interestingly, two atypical NE strains, LDS373bm and GR364sk/cl.II, which group with the SE strains by SNP analysis also show loss of heterozygosity for the same three FBT genes on chromosome 10. The average haploid CN taking into account chromosome somy for these three genes is 2.75 in the NE strains versus 1.05, 1.25 and 1.25 respectively in the SE strains (Table 3).
Genomic CN for LdBPK_100400 (L. infantum FT1 homologue) in the NE and SE leishmanial strains was also determined by qPCR (Fig 5) in 20 strains and clones using a novel dual priming oligonucleotide system [30]. qPCR tended to give higher CNs for this gene than found by computational analysis (cn.mops), however there was a good correlation overall between FT1 CN based on computational estimation with cn.mops [31] and qPCR (ρ = 0.91). Strain LDS373bm also showed loss of heterozygosity by qPCR, similar to what was found above by computational analysis.
In addition, both methods show significant difference in FT1 CN between the SE and NE populations. The mean FT1 CN for the two different methods and populations is as follows: qPCR; SE mean = 0.79, NE mean = 2.44, p = 0.00034; cn.mops; SE mean = 1, NE mean = 2.6, p = 0.00009 confirming the trend toward loss of heterozygosity in the SE and amplification in the NE strains / clones examined. FT1 is the main transporter for folate. Resistance to methotrexate (MTX) is correlated with reduced folate uptake [32,33], and CN for this gene was reduced in some resistant parasites [29,34]. Therefore, the effect of MTX on the viability of eight SE and NE strains/clones that vary in FT1 CN was examined (Fig 6). Several of the SE and NE strains examined also show CNV for other FBT genes on chromosome 10 flanking FT1 (S10 Table). All of the SE strains/ clones tested are single copy for FT1 and are significantly less sensitive (6-30% growth inhibition) to MTX (0.5 mg/ml) than the NE strains/clones (42-78% growth inhibition), p = 0.005; and a linear correlation (r 2 = 0.937) between CN, for the genes demonstrating CNV on chromosomes 10 (LdBPK_100380, LdBPK_100390, LdBPK_100400 and LdBPK_100410) and 35 (LdBPK_355160), and sensitivity to MTX was observed (Fig 6 and S11 Table). NE parasites are significantly more sensitive (p = 0.02 to 0.0009) to inhibition by MTX over a wide range of concentrations (33 to 900 μg/ml) when grown at limiting folate concentrations (S3 Fig). A high correlation between FT1 CN and sensitivity to MTX was found (Pearson's correlation coefficient ρ = 0.85, p = 0.007).
Plasticity in gene organization has been reported for several Leishmania species with the number of gene copies varying between isolates from the same species [35][36][37] and changes in gene dosage may be correlated with differences in protein expression [18]. For instance, the region on chromosome 10 containing LdBPK_100480, LdBPK_100510, LdBPK_100520 and LdBPK_100521 encodes a Zn-binding protein whose function is unknown, two tandem copies of gp63 and an uncharacterized protein, respectively (S8 and S9 Tables). This region is amplified (CN = 3) in 3/10 SE strains and one SE-like NE clone, GR364sk/cl.II that clusters by SNP analysis with the SE strains. All other strains are diploid for this region. Gp63 is a protease involved in parasite virulence and survival [38,39], and is frequently present on chromosome 10 in other species as a multicopy gene family e.g., L. infantum (LinJ.10.0490, 10.0500, 10.0510, 10.0520 and 10.0530) or L. major (LmjF. 10.0460, 10.0465, 10.0470 and 10.0480). Likewise, on chromosome 19 there are two glycerol uptake proteins (LdBPK_191340 and LdBPK_191350) that have an additional gene copy (CN = 3) in 9/10 SE strains and the SE-like clone (GR364sk/ cl.II). Interestingly, in other species these genes are part of a tandem multicopy gene family (L. infantum 7 genes, L. major 6 genes, L. braziliensis 8 genes) that may be involved in the remodeling of lipids on glycerol phosphoinositol lipid anchors. Amplification of the 48 kb H-region on chromosome 23 has been associated with drug resistance in vitro [40,41]. Part of this region is also amplified in some wild-type strains [40,41]. Deletions (CN = 1) or duplications (CN = 4) of part of the H-region (9 kb) were seen in several SE and NE parasites, respectively. The deleted region was found in 3/10 SE strains and contains the genes coding for the ABCthiol transporter (MRPA) (LdBPK_230290), involved in resistance to antimony [42], and argininosuccinate synthase (LdBPK_230300), involved in arginine synthesis [43]. Interestingly a similar region is amplified in 8 clones from 2 different NE patient strains, and contains the genes coding for argininosuccinate synthase (LdBPK_230300), a putative uncharacterized protein (LdBPK_230270), the Terbinafine resistance locus protein (Yip1) (LdBPK_230280), and the PTR1 gene (LdBPK_230310). These genes are present in the H-region and frequently amplified in some drug resistant cell lines [44], however they were unchanged, diploid, in all the other parasites belonging to NE and SE populations. This is similar to CN analysis of antimony resistant and sensitive L. donovani strains from Nepal where amplification was not observed for the H-region genes [16], even though MRPA, and in some cases the PTR1 gene, were shown to be amplified in naturally resistant parasites examined by other techniques [45,46]. Finally, CNV was also found in part of a 15.8 kb region located on chromosome 36 known as the MAPK locus (LdBPK_366740, LdBPK_366750, LdBPK_366760 and LdBPK_366770). Amplification of this region was found in antimony resistant L. donovani from Nepal and associated with higher gene dosage in drug resistant lines [16] [47]. Interestingly, CN of 3/4 genes found in this locus were significantly lower (p 0.5 x 10 −7 ) in the SE and SE-like NE strains/clones than the NE strains/clones (S8 and S12 Tables). No significant different in CNV between the NE and SE parasites, CN = 2 in 40/41 strains and clones, was observed in the case of the histidine secretory acid phosphatase (LDBPK_366770) which is considered part of the MAPK-locus and amplified in antimony resistant parasites [16]. The only exception was seen with the SE-like NE strain (LDS373bm) that show a complete deletion of LDBPK_366770, as well as LDBPK_366780 (CN = 0). When haploid gene CN is calculated, taking into account both gene CNV and chromosome ploidy, most of the differences between the NE and SE parasite populations (Table 3 and S13 Table) still remain even though chromosome polyploidy is statistically more common in SE parasites.

Discussion
Genome wide sequencing (WGS) and analysis of pathogens has proven widely useful for investigations on molecular epidemiology and evolution; genotype-phenotype associations; identification of genes involved in various biological processes such as drug resistance and virulence; as well as new targets for drug and vaccine development [16-18, 21, 48, 49]. Developments in next generation sequencing over the last two decades have provided a relatively low cost, fast pipeline for the exploration and comparison of Leishmania genomes. This study focused on comparison and analysis of WGS data from a large number of L. donovani strains and clones (n = 41) originating from fifteen VL patients in southern and northern Ethiopia. Previous studies on population genetics using multilocus microsatellite typing (MLMT) or individual gene sequences suggest that L. donovani is comprised of distinct populations associated with specific geographic regions in East Africa [7,15,50], and that African parasites are in large part distinct from those found on the Indian subcontinent [13,17]. Differences have also been documented in the parasites, sand fly vectors and host responses between these geographic regions e.g., sensitivity of antigen based serodiagnostic assays [51], clinical response to paromomycin [6], incidence of PKDL [52,53]. Our whole-genome sequence data confirms the presence of two very different populations of L. donovani in the region, exemplified by an absolute difference of~15,000 SNPs between the NE and SE populations. These parasite populations likely arose due to unique evolutionary pressures associated with local sand fly species, hosts, reservoirs, ecology, and other factors. A unique advantage of whole-genome data is that it gives us a comprehensive catalog of genetic variation that could underpin these adaptations.
When chromosome aneuploidy is analyzed, a picture appears suggesting great diversity among the Ethiopian strains within each population. This picture is unlike that shown for the Ethiopian reference strain LV9 where all 35 chromosomes, except for chromosome 31, were disomic [18]. This reference strain was extensively passaged in numerous laboratories since first isolated from a VL patient in 1967. Unlike the reference strain, all the isolates used in this study were rapidly cryopreserved and only briefly cultured under identical conditions prior to DNA purification for WGS, yet they still show unique, highly variable karyotypes compared to strains from the same geographic population or strains isolated from different organs of an identical patient. This indicates that chromosome aneuploidy, unlike SNPs, cannot be used to map leishmanial population genetics. Interestingly, MLMT analysis of L. donovani strains from Libo Kemkem, a previously non-endemic region in NE where an outbreak of VL occurred in 2004-2005, also demonstrated high genetic diversity among parasites isolated from patients, including a unique genetic group that shared several alleles with strains from SE [50].
Leishmania chromosome ploidy in individual cells can change rapidly in response to environmental conditions, even routine culture, resulting in mosaic aneuploidy [54], however in this study individual clones generated from patient strains generally showed similar karyotypes clustering together, and when examined, were highly similar to the parental patient isolate. These results suggest that the average karyotype for each strain is relatively stable i.e., minimally affected by culturing and/or cloning prior to DNA extraction. Downing et al [16] also reported that chromosome aneuploidy was stable in culture for 17 Nepalese L. donovani strains even though they also showed diverse karyotypes. This suggests that the karyotypes observed for the parasites in this study are probably very similar to the original patient isolate, assuming changes don't take place upon differentiation from intracellular amastigote to extracellular promastigote. However, this can only be confirmed by direct measurement of aneuploidy in parasites taken directly from patients without prior culturing, something not currently possible.
Similar to previous reports, chromosome 31 was supernumerary in all 41 Ethiopian strains and clones studied. This appears to be a defining characteristic in all Leishmania species examined so far [16,18,19,25,55]. As genes involved in iron metabolism and related functions are highly enriched on chromosome 31, it has been suggested the chromosome polyploidy arose to expedite iron uptake, and that expression of Iron-Sulfur proteins that are important in oxidation-reduction reactions, and synthesis of metabolites essential for parasite survival and growth [55].
One interesting finding is that strains concurrently isolated from different organs of identical patients, in the two cases examined, have significantly different karyotypes. Thus, while clones and/or parental strain from spleens of each patient clearly grouped together, they clustered separately from clones originating from the skin of the same patient. While changes in specific chromosome ploidy associated with parasite tropism were not identified, these results suggest that the aneuploidy patterns observed are a result of parasite origin (spleen or skin), and the differing conditions, perhaps temperature or host immune responses, to which the parasite is exposed. SNP analysis, similar to MLMT [15], clearly shows that the Ethiopian L. donovani strains group, in large part, into two main populations, NE and SE, delineated by geography rather than clinical history (VL or HIV-VL co-infection; spleen or skin). Interestingly, the NE population appears to be more polymorphic than the SE population (Figs 2 and 3), reflecting the finding from MLMT data that inbreeding is higher for SE strains than NE strains [15]. In most cases, clones generated from an individual strain (GR363sp/sk, GR364sp/sk, GR383, GR356, AM560, etc), show more limited genetic polymorphism than that observed between strains, generally clustering together regardless of the method used for analysis (chromosome aneuploidy, SNPs or gene CNV). This was the case even when the patient strain(s) were isolated from different organs, such as skin and spleen, of the same HIV-VL co-infected patient, showing that the genotypes present in visceral organs can spread systemically in immunosuppressed patients to the skin where they get transmitted to sand flies. Only in one case, GR364sk/cl.II, did a cloned line fail to cluster with other clones generated from the patient strain. Instead, this clone grouped with SE strains both by SNP and CNV analysis. While contamination during cloning can't be ruled out, parasites isolated from HIV co-infected patients have been shown to be more polymorphic than those isolated from patients with VL, and differences following patient treatment have been noted [56][57][58]. The chromosome karyotype and SNPs for this clone are distinct from all the SE strains suggesting that contamination, if it occurred, did not take place during generation of the cloned line.
SNP analysis also identified three strains, AM422, AM553 and LDS373bm, that didn't fall, as expected, together with their respective geographic genotypes, SE and NE. LDS373bm and LDS373sp were isolated in parallel from different organs, bone marrow and spleen respectively, of the same HIV co-infected patient. The latter parasite (sp) is genetically similar to other parasite strains belonging to the NE population, while the former (bm) belongs to another genotype. These parasites are also different by k26 PCR typing of the HASP B repeat region [7]. This patient was apparently infected by at least two genotypes, with the genotype present in the bone marrow perhaps less virulent and only surviving in immune suppressed hosts. While several reports using MLMT, multilocus sequence typing and kDNA RFLP show that HIV-VL patients can be sequentially infected with genetically different parasites [58][59][60], to our knowledge this is the first time that an HIV-VL patient was shown to be simultaneously infected with two genetically different parasites. The amplicon (290 bp) seen for LDS373sp was typical of most NE strains examined (37/41), while for LDS373bm it was larger (410 bp), and observed in 4/41 NE strains, all HIV-VL co-infected patients. WGS of other parasites exhibiting the 410 bp amplicon was not carried out. It would be interesting to analyze more of these parasites and see if they form a separate genetic group. AM422 and AM553, both isolated in SE, fell outside the main NE and SE populations when SNPs were analyzed by two methods (Figs 2 and 3). Neither of these isolates was from patients co-infected with HIV. AM422 showed a k26 PCR amplicon typical of NE strains (290 bp) and originated in the Omo valley near the border with South Sudan, while AM553 had a unique k26 amplicon (360 bp). The unique SNP profiles for these two isolates suggest that additional genotypes are circulating in SE, perhaps a result of the more varied ecology in this region.
Gene CNV analysis also identifies differences that are typical of each geographic parasite population, NE and SE. Candidate genes that can be attributed to essential biological processes like drug resistance, virulence and parasite viability demonstrate differential CN among SE and NE strains and clones. While it is not clear which environmental and host factors resulted in the selective amplification of different genes in the NE and SE populations, many of them are essential genes important for parasite survival. Amplification or deletion of specific genes may give the parasites a growth advantage in the sand fly vector or human host. In the NE parasites, there are three times more copies of folate/biopterin transporter (FBT) genes on chromosome 10. Leishmania are folic acid auxotrophs, and LDBPK_100400 is a homologue to the Leishmania infantum FT1 transporter that was defined as the main folate transporter in this Leishmania species. It is also known that FT1 transporter expression is upregulated in log phase of promastigote stage [34,44], the stage found in sand fly midgut. Therefore, increased folate concentration in sand fly midgut may result in better parasite growth in the vector and provide the parasite with a better chance for survival and infection.

Ethics statement
This study was conducted according to the Helsinki declaration, and was reviewed and approved by the Institutional Review Board (IRB), Medical Faculty, Addis Ababa University. Written informed consent was obtained from each adult study participant.

Parasites origin and patient pathology
For this work 18 L. donovani strains isolated from 15 patients with VL in Ethiopia during the years 2009-2010 were selected for WGS (S1 Table). The selection was based on three criteria: 1) geographical origin (northern or southern Ethiopia); 2) Patient's pathology such as HIV/VL co-infection versus VL; and 3) source of parasites, skin versus spleen. Parasites were cultivated in M199 medium with supplements and rapidly frozen [7]. Additional Ethiopian (GR373 [7]) and Sudanese (LEMS3570, kindly provided by Prof. Patrick Bastein, National Reference Center of Leishmania, University Hospital Centre of Montpellier, France) strains were included in analysis of FT1 copy number, and are also listed in S1 Table. All parasites used in this study were characterized by ITS1 -, cpb-and k26-PCR (

Cloning
Eight patient strains were cloned prior to DNA extraction for WGS. The cloning procedure was carried out essentially as described [61].

Next generation sequencing and sequence analysis
DNA was purified from Leishmania promastigotes that were harvested in their stationary growth stage in~20ml M199 medium. DNA extraction was carried out as described by [7] Genomic DNA was sheared into 400-600-base pair fragments by focused ultrasonication (Covaris Adaptive Focused Acoustics technology (AFA Inc., Woburn, USA)) and standard Illumina libraries were prepared. 100 base pair paired end reads were generated on the HiSeq 2000 v3 according to the manufacturer's standard sequencing protocol [62]. Raw sequence data was deposited in the European Nucleotide Archive with the accession number ERP016010.
Sequence reads were mapped against the reference genome Leishmania donova-ni_21Apr2011 [16] using SMALT (version 0.7.4 https://sourceforge.net/projects/smalt/) to produce bam files. SMALT was used to index the reference using a kmer size of 13 and a step size of 2 (-k 13 -s 2) and the reads aligned. Reads were mapped if they had an identity of at least 90% to the reference genome and mapped uniquely to the genome. Reads in pairs were mapped independently, and marked as properly paired if they mapped in the correct orientation no more than 1.5 kb apart. PCR duplicate reads were identified using Picard v1.92(1464) and flagged as duplicates in the bam files.

Prediction of chromosome aneuploidy
Aneuploidy was predicted based on whole chromosome median read coverage. For the normalization of median read coverage over all 36 chromosomes for a given strain, the average median coverage of four stable diploid chromosomes (chromosome 30, 32, 34 and 36) was calculated and taken as the mean read coverage for a diploid chromosome (DC mean ). These four chromosomes have been previously shown to be diploid in almost all L. donovani isolates examined, and have been used for prediction of chromosome copy number [18,20]. The predicted chromosome copy number is calculated as the fold change compared to DC mean . This prediction was applied to all 41 L. donovani sequences over 36 chromosomes, and saved in a 41 Ã 36 chromosome copy number matrix. We used R (version 3.1.3) [63] to evaluate of similarities and differences between strains and clones, and their chromosome aneuploidy patterns and computed a heat map over the chromosome copy number matrix with R heatmap.2() function from the gplots (version 2.17.0) R package. The numerical data for the creation of the heat map (Fig 1) is given in the supporting information (S2 Table). Median chromosome ploidy was used to compare average ploidy between NE and SE strains in Table 1. For isolates with multiple clones, average chromosome ploidy was calculated by determining the median somy of all the clones from an individual strain separately for each of the 36 chromosomes (S2 Table).

SNP analysis
For the identification of population typical SNPs a procedure was implemented based on SNP and indel calling with the Genome Analysis Toolkit (GATK version 3.1) [64], VCFtools (version 0.1.12b) [65] and a custom R script based on Bioconductor packages (www.bioconductor. org) (Release (3.2)). The parameters for the first filtering procedure with GATK were set as follows: Emission confidence threshold > 10, Calling confidence threshold >50, readdepth > 500. A calculation of level of similarity between all 41 samples based on SNP profiles was computed with VCFtools, and processed with R based procedures into a similarity matrix. Finally, this matrix was visualized with heatmap.2() function from the gplots R package. Principal component analysis (PCA) following pruning of SNPs with high linkage disequilibrium was run using the Bioconductor package SNPRelate [26]. For the creation of unique SE or NE SNP profiles, only SNPs that were identified in >4 or >7 of the SE or NE clones and strains, respectively, were taken for final analysis. This protocol created a consensus SNP profile for each parasite population, as repetition of SNPs in more than 1/3 of the strains and clones for each population supported the accuracy of SNP calling and served as an additional quality control step. As a final step, unique SNP profiles were detected with VCFtool for each population. Unique SNPs present in coding regions that affect protein translation, namely non-synonymous or nonsense mutation(s) that change the amino acid or cause a stop codon were identified by a self-implemented R procedure based on Bioconductor packages. The mean absolute number of SNPs was compared between the SE and NE parasite populations, as well as to the Indian reference strain. Mean absolute SNPs for an individual patient isolate was calculated by averaging the SNPs from all clones of the respective strain. Variation in the number of SNPs between clones of an individual isolate was small, and % standard error varied from = 0.34-6.0 x 10 −3 , except for GR364sk (% s.e. = 0.064) when the atypical clone GR364sk/cl.II was included. In the absence of the atypical clone the % s.e. = 0.001.

Prediction of gene copy number variation
The detection of copy number variations (CNV) and aberrations was done using the R-bioconductor [66] (www.bioconductor.org) package cn.mops (version 1.16.1) (Copy Number estimation by a Mixture Of PoissonS). cn.mops detected copy number variation as the normalized read depth variation at a certain genomic position over all 41 strains and clones. For that cn.mops calculated the read count matrices across all BAM files. For this analysis, the genomic window length was set to 5 Kbp and used as a sliding window for the prediction of genomic CNV. Therefore, a genomic position of certain strain or clone is considered as one with "CNV" if it shows a significant change in normalized read depth (> or < two i.e., diploidy) in a window length of 5 Kbp compared to other strains or clones at the same given genomic position. Further analysis genomic CNV that is localized in coding regions was carried out.

Drug sensitivity
Parasites 2 x 10 6 / ml were added in triplicate to 96-well plates and incubated with 0.5 mg/ml MTX (Sigma catalog number M8407) and in non-drug treated medium in final volume of 200 μl for 66 hours at 26˚C. AlamarBlue (25 μl/well; AbD Serotec) was added and the viability was measured after four hours (λ ex = 544; λ em = 590; Fluroscan Ascent FL, Thermo) [67]. The percentage of killing was calculated as the fraction of fluorescence level of MTX treated wells compared to the non-drug treated wells.

Quantitation of folate transporter 1 (FT1) gene copies by qPCR
The polycistronic folate/biopterin transporters (FT) on chromosome 10 have high DNA sequence similarity [29]. FT1 (LDBPK_100400) dual priming oligonucleotides (FT1-DPO) were used to specifically evaluate genomic CN for this gene by qPCR. FT1-DPO primers, forward FT1-DPO 5'-CGCCAGAACCCGAAGCCTGIIIIIGCACTGG-3' and reverse FT1-DPO 5'-GTTCATCACAGTCGCGATGAGTIIIIIAATCATTATG-3', were designed to include a polydeoxyinosine linker (IIIII) [30] that allows the specific primer annealing to the LDBPK_10 0400 (FT1 ortholog) gene, and not to the other LD FT homologues. Specificity of the FT1-DPO PCR product was confirmed by cloning and sequencing of the amplicon. The L. donovani housekeeping gene alpha-tubulin was used for normalization. The qPCR conditions were the same for both FT1-DPO and housekeeping genes, and was carried out as follows: DNA (~10-20 ng) or no DNA control was added to HRM PCR Kit reaction mix (10 μl, QIAGEN GmbH, Germany) containing the FT1-DPO primers (1 μM each final concentration), and ultra-pure PCR-grade water (final volume 25 μl/PCR). Amplification conditions were as follows: 3 min denaturation at 95˚C, followed by 40 cycles of denaturation 1 s at 95˚C; annealing 20 s at 55˚C; and extension 1 s at 65˚C. HRM Ramping was carried out at 0.2˚C/s from 65 to 95˚C. HRM PCR and analysis were performed using a Rotor-Gene 6000 real-time PCR analyzer (Corbett Life Science, Australia). All reactions were carried out in duplicate and a negative-control reaction without parasite DNA was included in each experiment. For the calculation of the FT1 relative copy number (CN rel ) the threshold (Ct) for all FT1 amplified samples were compared with their corresponding alpha-tubulin amplified samples as followed: CN rel = 2 Ct(alphatubulin)-Ct(FT1) . The CN rel was further normalized based on the mean of the six lowest predicted relative CN. The mean value was considered as GCN = 1. A crosshatch titration on two parasite clones (SE-AM560/cl.IV and NE-GR383/cl. XIII) is shown below. Parasites were cultured in medium containing increasing concentrations of folic acid (FA) from 0 to 28.5 μg/ml (x-axis). To each of the culture conditions, increasing concentrations of methotrexate (MTX) from 0 to 1800 μg/ml (y-axis) was added, and parasite growth measured after 70 hours. The measurement of live promastigotes was carried out with alamarBlue 1 assay as described by Shimony and Jaffe [67].