SOLiD™ Sequencing of Genomes of Clinical Isolates of Leishmania donovani from India Confirm Leptomonas Co-Infection and Raise Some Key Questions

Background
Known as a ‘neglected disease’ because relatively little effort has been applied to finding cures, leishmaniasis kills more than 150,000 people every year and debilitates millions more. Visceral leishmaniasis (VL), also called Kala Azar (KA) or black fever in India, claims around 20,000 lives every year. Whole genome analysis presents an excellent means to identify new targets for drug, vaccine and diagnostics development, and also provides an avenue into the biological basis of parasite virulence in the L. donovani complex prevalent in India.
Methodology/Principal Findings
In the present study, the next generation SOLiD™ platform was successfully used for the first time to carry out whole genome sequencing of L. donovani clinical isolates from India. We report the exceptional occurrence of insect trypanosomatids in clinical cases of visceral leishmaniasis (Kala Azar) in India. Whole genome sequencing data confirm that the isolates sequenced from Kala Azar cases were genetically related to Leptomonas. The co-infection of the splenic aspirates of these patients with a species of Leptomonas, and the likelihood that this infection is pathogenic, are key questions that need to be investigated. We discuss our results in the context of some important probable hypotheses.
Conclusions/Significance
Our intriguing finding that unusual cases of Kala Azar are most similar to Leptomonas species has important clinical implications for the treatment of Kala Azar in India. Leptomonas have been shown to be highly susceptible to several standard leishmanicides in vitro. There is very little divergence between the two species, viz. Leishmania sp. and L. seymouri, in terms of genomic sequence and organization. The phenomenon of co-infection needs to be examined more extensively from the molecular pathogenesis and eco-epidemiological standpoints.


Introduction
The family Trypanosomatidae belongs to the order Kinetoplastida. These are the most primitive organisms in eukaryotic evolution to have mitochondria and peroxisomes [1]. These parasites have life cycles that involve either one host (monoxenous), e.g. Leptomonas, or two hosts (heteroxenous), e.g. Leishmania. The latter involves an invertebrate that acts as a vector between vertebrates or plants. During differentiation in the insect gut and in culture, these kinetoplastid protozoans appear as promastigotes (Leptomonas and Leishmania); Leishmania also forms amastigotes, which develop in the mammalian host macrophage and cause disease. The ancestral form of Leishmania was Leptomonas, an organism living solely in the invertebrate host and transmitted by the ingestion of resistant forms (cysts) expelled with the excreta of the host [2]. Unlike Leptomonas, Leishmania produces no resistant cysts capable of development in the invertebrate host and has adapted to a life cycle alternating between invertebrate and vertebrate hosts [3,4].
Leishmania, a trypanosomatid protozoan parasite of humans, causes a wide spectrum of clinical disease referred to as leishmaniasis. Leishmaniasis represents a global health problem and is prevalent in Europe, Africa, Asia and the Americas; up to twenty million people are infected and half a million are affected by the lethal VL (www.dndi.org). In the Indian subcontinent, visceral leishmaniasis, or Kala Azar (KA) as it is popularly known, is caused by Leishmania donovani and transmitted by the sandfly Phlebotomus argentipes. One hundred and fifty million people live at risk of VL in the Indian subcontinent (India, Nepal, and Bangladesh) [5].
Indian Kala-azar (VL) has the unique epidemiological feature of being anthroponotic; humans are the only known reservoir of infection [6]. Localized cutaneous leishmaniasis (LCL) in India is mostly due to Leishmania tropica and is endemic in the deserts of Rajasthan [7,8]. A few cases of LCL among travelers have been documented in other Indian states such as Kerala [9], Assam, and Haryana, which are not disease-endemic areas [8]. A recognized endemic focus of leishmaniasis has been reported in the Satluj river valley in Himachal Pradesh [10]. This focus appears peculiar in that LCL co-exists with VL, and Leishmania donovani is the predominant pathogen for LCL, whereas only a few cases have been due to Leishmania tropica. P. longiductus, a known vector for L. infantum, is the main vector in this endemic focus. L. donovani infantum causing both cutaneous and visceral leishmaniasis, and K39 seroprevalence in dogs (a known reservoir for L. infantum), have been reported for this region [11].
During 1993-1994, scientists from developing and developed countries planned and initiated a number of parasite genome projects, and several consortia were established for the mapping and sequencing of these medium-sized genomes. The genomes of four laboratory-cultivated Leishmania species (L. major, L. infantum, L. braziliensis and L. mexicana) have been sequenced [12]. The New World parasite L. braziliensis is the causative agent of mucocutaneous leishmaniasis, whereas the Old World species L. major and L. infantum, which are present in Africa, Europe and Asia, cause cutaneous and visceral leishmaniasis, respectively [4]. It has been reported that L. donovani is genetically distinct from L. infantum [13]. To add to this body of knowledge, we undertook whole genome sequencing of clinical isolates of L. donovani believed to be causing Kala Azar in India. Genome analysis can provide insights into the functional characteristics of the visceral manifestation of the disease and also provide an avenue into the biological basis of parasite virulence in the L. donovani complex prevalent in India, in comparison with the other Leishmania species sequenced. The genome of L. donovani from India has so far not been sequenced.

Clinical Isolates
During 1998-2000, resistance to the widely used antimonial drug sodium antimony gluconate (SAG) reached alarming heights in India [14]. At that time we had cultivated many isolates collected from the eastern region of India. Clinical isolates were collected from confirmed Kala Azar patients in the endemic zones of Bihar and Uttar Pradesh by splenic aspiration, performed by our authorized clinical collaborator and co-author Dr Shyam Sundar with the prior written consent of the patients. The Institutional Review Board (Banaras Hindu University, Varanasi) approved the study. The diagnostic criterion for visceral leishmaniasis (VL) was the presence of Leishman-Donovan (LD) bodies in splenic aspirates, graded as per standard criteria [15]. Isolate 39, used for whole genome sequencing in this study, was isolated on 28.05.2000 in Muzaffarpur, Bihar, from the splenic aspirate of a patient who did not respond to SAG therapy, whereas isolate 2001, isolated on 01.02.2000 in Balia, Uttar Pradesh, came from a patient who responded to SAG therapy. Isolate Ld BHU 1095, responsive to amphotericin B, was collected more recently, on 31.07.2010, in Muzaffarpur, Bihar. Splenic aspirates were collected and adapted to culture as described [16]. The virulence and the level of susceptibility or resistance of these isolates were confirmed in vitro and in vivo, by infection of experimental animals as described [17]. The species identity of these promastigotes was confirmed to be similar to L. donovani by sequence-based RFLP of the single-copy protein-coding gene N-acetylglucosamine-1-phosphate transferase [18]. The isolates used in our study have also been the subject of various studies in leishmaniasis by many groups worldwide.
In our hands [16,17], and in those of others who have worked and published on these two isolates, they served as an excellent model of visceral leishmaniasis, consistently producing the typical clinical outcome, including invariable death and splenic LD loads of up to 10^10 per heavily infected spleen upon infection of hamsters.

SOLiD DNA Sequencing
The genomic DNA of the parasites was isolated using the QIAamp DNA isolation kit (Qiagen, Catalog No. 51104). Sequencing runs were done using cycled ligation sequencing on a SOLiD™ Next Generation Sequencer (Applied Biosystems, India). A mate-pair library approach was used, generating data for two tags denoted F3 and R3. Two segments of the Quad slide format were used to generate the data for each sample. Approximately 5 µg of purified genomic DNA was sheared for a mate-paired library with an insert size of 1.5-2 kb. End repair of the sheared DNA was carried out to convert DNA with damaged or protruding ends to phosphorylated, blunt-ended DNA. LMP CAP ligation was then performed to add the LMP CAP adaptors to the sheared, end-repaired DNA. Size selection was performed after CAP adaptor ligation to remove unbound CAP adaptors. The sheared DNA was

Mapping and Assembly
The mapping module of the standard Resequencing workflow of BioScope v1.3 software, whose algorithm is based on a seed-and-extend approach, was used for mapping. A quality value is associated with each alignment; it estimates the probability that the alignment is correct. The output of pairing was a BAM file (Binary format for Sequence Alignment Map), which stores the read alignments in coordinate order. The colorspace reads from SOLiD sequencing were aligned to the reference genomes, with the reads from each isolate mapped separately, and unmapped reads were identified. Reads with low-complexity characteristics (a homopolymer tract, or at least four repeats of the same dinucleotide or trinucleotide in a row) were removed from the data set before further analysis. Although these reads may represent true genomic regions, the inherent difficulty in assigning them to a particular genomic region limits their value. This is an inherent problem with the short-read data of the SOLiD sequencing system. A summary of the pairing results and the sequencing quality parameters was obtained with Samtools v0.1.6. De Novo Analysis V2.0 software was used, with assembly performed by the Velvet assembler. Genome size was estimated by k-mer frequency distribution analysis.
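The low-complexity filter described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: it operates on base-space strings rather than SOLiD colorspace reads, the function names are hypothetical, and the minimum homopolymer length is an assumed parameter because the text does not state one. Only the "at least four repeats" rule for di- and trinucleotides is taken from the description.

```python
import re

# Assumption: the text gives no minimum homopolymer length, so 6 is a
# hypothetical default chosen only for illustration.
HOMOPOLYMER_MIN = 6


def is_low_complexity(read: str, homopolymer_min: int = HOMOPOLYMER_MIN) -> bool:
    """Return True if a base-space read looks low-complexity."""
    seq = read.upper()
    # Homopolymer tract: one base repeated homopolymer_min or more times.
    if re.search(r"([ACGT])\1{%d,}" % (homopolymer_min - 1), seq):
        return True
    # A 2- or 3-base unit repeated at least four times in a row.
    # (A unit such as "AA" also matches, so runs of 8+ identical bases
    # are caught here regardless of homopolymer_min.)
    if re.search(r"([ACGT]{2,3})\1{3,}", seq):
        return True
    return False


def filter_reads(reads):
    """Keep only the reads that are not flagged as low-complexity."""
    return [r for r in reads if not is_low_complexity(r)]
```

For example, `filter_reads(["ACGTTGCAAGCTTAGGCTA", "GGACACACACTT"])` keeps only the first read, because the second contains the dinucleotide AC four times in a row.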
As a first step, the tags (F3 and R3) of each of the three genomes were mapped to the reference genome of Leishmania infantum (LinJ), which was downloaded from the following FTP site: ftp://ftp.sanger.ac.uk/pub4/pathogens/Leishmania/infantum/V52010/artemis/EMBL/Linfantum/1/ (total genome length including gaps: 32,126,170 bp).
Phylogenetic Analysis on the Basis of GP63 Gene
GP63 gene sequences from Ld39, BHU1095 and Ld2001 were obtained from annotated assembled contigs. The GP63 gene sequences of L. infantum and the L. donovani Nepal strain Ld_BPK282A1 were obtained from Ensembl, and the Leptomonas sequence was identified using a homology-based method. A phylogeny of these strains was constructed from the GP63 gene sequences by multiple sequence alignment using CLC Genomics Workbench v5.1 with a gap open cost of 10 and a gap penalty score of 1. The phylogenetic tree was evaluated using the bootstrap method with 1,000 replicates; for the more computationally intensive ML trees, we used 100 bootstrap replicates by UPGMA methods.
Phylogenetic Analysis on the Basis of ITS Gene
ITS gene sequences from Ld39, BHU1095 and Ld2001 were obtained from annotated assembled contigs. The ITS gene sequences of L. infantum and the L. donovani Nepal strain Ld_BPK282A1 were obtained from Ensembl, and the Leptomonas sequence was identified using a homology-based method. A phylogeny of these strains was constructed from the ITS gene sequences by multiple sequence alignment using CLC Genomics Workbench v5.1 with a gap open cost of 10 and a gap penalty score of 1. The phylogenetic tree was evaluated using the bootstrap method with 1,000 replicates; for the more computationally intensive ML trees, we used 100 bootstrap replicates by UPGMA methods.
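To make the distance and bootstrap steps concrete, the following Python sketch shows two generic building blocks of a distance-based analysis: a pairwise p-distance between aligned sequences and a bootstrap replicate obtained by resampling alignment columns with replacement. It is an illustration only and does not reproduce the CLC Genomics Workbench implementation; the function names, the choice of p-distance, and the treatment of gaps (columns with a gap in either sequence are skipped) are assumptions.

```python
import random


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances keyed by (name, name) for every pair of taxa."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, same columns for all taxa."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}
```

In a full analysis, each replicate alignment would be turned into a distance matrix and a tree (for example by UPGMA), and the fraction of replicate trees containing a given clade would be reported as its bootstrap support.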

Accession Codes
This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accessions ANFN00000000 (Leishmania donovani Ld 39), ALJU00000000 (Leishmania donovani Ld 2001) and ANAF00000000 (Leishmania donovani BHU 1095). The versions described in this paper are the first versions: ANFN01000000, ALJU01000000 and ANAF01000000, respectively.

SOLiD Sequencing Reads
The total numbers of 35-bp reads for each isolate were Ld 2001, 219×10^7; Ld 39, 220.4×10^7; and BHU 1095, 229.7×10^7, yielding approximately 200-fold coverage for each of the three genomes on the assumption that all the data were usable (Table 1). Using De Novo Analysis V2.0 software, assembly was performed with the Velvet assembler at a k-mer of 31, using the full data set. The final assembly was obtained as contig and scaffold files in base space (Table 2).
[...] fold coverage based on the raw number of reads obtained for each isolate genome, some unmapped reads were also obtained, which is a limitation of short-read technology. It is also possible that these are derived from regions of low complexity in the sequenced genomes. The remaining unmapped reads, which are not of low complexity, could be truly unique sequences or errors of the sequencing system. To distinguish between these two possibilities, the reads were mapped in SOLiD color space using MAQ to identify orthologs. The largest number of matches was found with the genomic sequence of Leptomonas seymouri.
From Tables 3 and 4, it is clear that the raw reads of BHU1095, Ld_39 and Ld_2001 cover more than 90% of the Leptomonas genome but less than 1% of the Ld_BPK282A1 genome (L. donovani Nepal strain), indicating that both Leptomonas and Leishmania are present and that Leptomonas is predominant.
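The percentage of a reference genome covered by reads (breadth of coverage) can be computed from the aligned read intervals. The sketch below is a generic illustration, not the procedure used to produce Tables 3 and 4; the function name is hypothetical, and the intervals are assumed to be 0-based, half-open (start, end) positions on a single reference sequence.

```python
def breadth_of_coverage(intervals, genome_length: int) -> float:
    """Fraction of reference positions covered by at least one aligned read.

    intervals: iterable of (start, end) pairs, 0-based and half-open.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        # Clip each interval to the reference boundaries.
        start, end = max(start, 0), min(end, genome_length)
        if start >= end:
            continue
        if current_end is None or start > current_end:
            # Close the previous merged block and open a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent read: extend the current block.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length
```

With three hypothetical 35-bp reads at positions 0, 20 and 100 on a 200-bp reference, the first two overlap and merge into a 55-bp block, so 90 of 200 positions are covered.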
It has been noticed that with very high coverage such as that used here, the data can actually become harder to interpret, as the number of chimeric clones and errors starts to generate a large amount of noise. We therefore used a subset of the data: 10 million reads were selected from the total data set of sample BHU 1095, this being the most recent of the clinical isolates collected from a patient in the endemic region, and de novo assembly was performed using Velvet 1.1.04 at k-mer 25 and k-mer 31. The contigs from both k-mers were merged and the final assembly was performed using CAP3. The assembly contained 127,259 contigs (N50 contig size of 159 bp) with a total length of 19.3 Mb. For downstream analysis we proceeded with these 127,259 contigs. A length cut-off of 500 bp was applied to filter the contigs, leaving 1,485 contigs. BLASTN against the nr database was carried out for these 1,485 contigs, of which 73 had a BLAST result; 44 of the 73 had hits against Leishmania.
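For readers unfamiliar with the assembly statistics quoted above, the following Python sketch shows how an N50 contig size and a minimum-length filter can be computed. It is a generic illustration rather than the software used in the study; the function names are hypothetical, and treating the 500 bp cut-off as inclusive (length of at least 500 bp) is an assumption.

```python
def n50(lengths) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs supplied


def filter_by_length(contigs: dict, min_length: int = 500) -> dict:
    """Keep contigs (name -> sequence) with at least min_length bases.

    Assumption: the cut-off is inclusive.
    """
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_length}
```

An N50 far below the filtering threshold, as reported here, means that most of the assembled bases lie in short contigs, which is why only a small fraction of contigs survives a 500 bp cut-off.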
The data from the different platforms indicated that all three samples contained a major portion of Leptomonas genome and a small percentage of Leishmania genome. We then carried out Hsp70 PCR-RFLP [20] and confirmed mixed infection: signals of both Leptomonas spp. and Leishmania donovani were obtained in the two isolates Ld 2001 and Ld 39, whereas Ld BHU 1095 showed the L. donovani pattern (Fig. 1). It is clear from this gel that Ld 2001 and Ld 39 contained a mixture of Leptomonas and Leishmania DNA (with considerably more of the former). Neither of these isolates appeared to be a perfect match for either Leptomonas or Leishmania [...] traditional multilocus typing methodology is not indicative of revealing complete genetic structure. Presently, we are also undertaking sequencing of amastigote proteins of our samples. Preliminary experiments using LC MALDI of Ld 2001 identified 100 proteins with the L. major database (unpublished results). Possibly, these are Leptomonas proteins with peptides similar enough to those of L. major. When the search was done using Leptomonas proteins (predicted from our assembled genome of Ld 2001, which we have confirmed through whole genome sequencing to be Leptomonas seymouri-like), we obtained 236 peptide hits, far more than we obtained with L. major. This information will subsequently prove helpful for validating the Leptomonas gene prediction/annotation in our samples. Contamination of our cultures is highly unlikely: there has never been any Leptomonas culture in our laboratory or in our clinical collaborator's laboratory. The original isolates themselves must have been co-infections. Using PCR-RFLP, co-infection with Leptomonas in splenic aspirates of Kala Azar patients in India has also been reported by other groups [21,22]; however, as our results for isolate BHU 1095 show, PCR-RFLP may not be foolproof in differentiating between the genera Leptomonas and Leishmania.
Using 18S rRNA gene sequencing, it has been reported [20] that 7% of Kala Azar isolates in the Bihar region were similar to Leptomonas sp. However, we are of the opinion that 18S rRNA gene sequencing is not an absolute indicator for the correct detection of Leptomonas sp. parasites in clinical isolates from Kala Azar patients, as it is well established that the overall structure of the maxicircle control region, also termed the divergent region (DR), is quite conserved among species of the Leishmania-Leptomonas group (the slow-evolving 18S rRNA sequences). In this respect our study is of prime importance in unequivocally establishing, by whole genome sequencing, the presence of Leptomonas in clinical isolates of Kala Azar in India. Our study has made an important contribution by generating whole genome sequence data that researchers can develop into interesting evolutionary biology analyses.
The genomes of various Leishmania parasites contain tandemly arrayed genes encoding an abundant 63-kDa surface glycoprotein called GP63, which is present in all insect and plant trypanosomatids. Even though the three clinical isolates of Leishmania donovani used in the present study carry a major portion of the Leptomonas genome, the taxonomic status of L. donovani and L. infantum as discrete species has been established using these isolates by phylogenetic analysis (Fig. 2; Table 5). The phylogenetic analysis of GP63 genes shows that Ld39, Ld2001 and BHU1095 are closely related to Leptomonas. Leishmania infantum and the L. donovani Nepal strain Ld_BPK282A1 fall in the same clade.
We also analyzed genetic diversity based on the amplification of the internal transcribed spacers (ITS), located within the rRNA gene array. ITS sequences are used to generate information useful for phylogenetic reconstruction and molecular evolution studies. It is clear from the ITS phylogenetic tree (Fig. 3; Table 6) that Ld 2001, Ld 39 and Ld BHU 1095 are closely related. L. infantum is the most distantly related to the three. The phylogenetic analysis of ITS genes based on distance (<0.5) shows that Ld39, Ld2001 and BHU1095 are closely related to Leptomonas. Leishmania infantum and the L. donovani Nepal strain Ld_BPK282A1 belong to a different clade of Leishmania.

Discussion
The occurrence of Leptomonas along with Leishmania amastigotes in splenic aspirates of Kala Azar patients is a unique phenomenon worth exploring. Srivastava et al. [20] have attributed this opportunistic parasitism by the trypanosomatid to depression of the patient's immune system. We put forth certain questions which need investigation.
Flagellates of the family Trypanosomatidae fall into two natural groups. The primitive genus is Leptomonas, which is confined to invertebrates, and the more advanced genus is Leishmania, which uses both vertebrate and invertebrate hosts [23]. The monogenetic Leptomonas species are also known as "lower trypanosomatids" because the digenetic genus Leishmania is thought to have arisen from a monogenetic ancestor [24]; they are parasitic in arthropods, mainly in Insecta and Diptera [1]. The parasites are found in various sections of the alimentary tracts of infected insects, and transmission is assumed to largely follow contaminative pathways [2]. The morphology of the insect-inhabiting stages of the pathogenic digenetic species resembles that of the monogenetic species. Leptomonas shares a promastigote stage of development with Leishmania. These lower trypanosomatids are characterized by ease of cultivation and less fastidious nutritional requirements than Leishmania. Promastigotes of the clinical isolates in this study transformed into amastigotes and survived in cultured macrophages as well as in experimental hamsters [16,17]. Promastigotes of Leptomonas costoris, a kinetoplastid parasite of water striders, transformed into amastigotes but did not survive in cultured macrophages [25]. So does our study indicate that Leptomonas is influencing the pathogenesis of leishmaniasis? Whether the distribution of Leishmania strains containing Leptomonas is limited to a specific eastern region of India or extends elsewhere also needs to be ascertained.
The results of our study put forth certain debatable issues. For VL in the Bihar region of India, anthroponotic transmission with no intermediate host has been observed [26]. Our study, however, raises the question of whether VL (KA) in India is in fact not anthroponotic, as has been believed until now. The source could be zoonotic, and dogs could be the reservoir. The theory of a canine origin of human Kala Azar was postulated by Nicolle as far back as 1908 [27]. At that time dogs were found infected in nearly every endemic centre of human Kala Azar except India. Human and canine Kala Azar of the Mediterranean region is transmitted by the dog flea (Ctenocephalus canis) and perhaps also by the human flea (Pulex irritans) [28]. A small proportion of the dog fleas in many regions harbor a natural parasite, Herpetomonas ctenocephali [29]. A possible case of human infection by Herpetomonas has been reported [30]. The sites of natural leptomonad infections in dog fleas are typically the hindgut and rectum, but in many insect groups the salivary glands and hemocoel have been reported to be infected [31]. Opportunistic infection with Leptomonas pulexsimulantis, an insect trypanosomatid found in the dog flea, was diagnosed in an HIV-positive patient presenting a clinical picture of visceral leishmaniasis co-infection [32]. The presence of a Leptomonas of the dog flea in an HIV-positive patient reinforces the idea that immunosuppressed humans may be vulnerable to other insect trypanosomatids, giving rise to clinical manifestations similar to leishmaniasis. Feces deposited by infected adult fleas are usually well supplied with amastigotes, which retain their infectiousness after drying [30]. To facilitate transmission, these flagellar cysts, known as straphangers, are capable of long-term survival in adverse conditions. Dedet et al. [33] reported the first human case of Leptomonas infection in an HIV-infected patient.
Amastigote forms were found in the bone marrow aspirate of the HIV-positive patient, and these parasites grew in culture as promastigotes. However, infection in laboratory animals could not be established [34]. Monoxenous trypanosomatids can be pathogenic for human beings. Similar cases have also been reported [35,36] in which Leptomonas parasites were detected in HIV-infected patients, with symptoms sometimes resembling those of visceral or cutaneous leishmaniasis. The present article also supports the possibility that lower trypanosomatids infect humans, which should be considered by attending physicians. Leptomonas are opportunistic parasites, with human infection possibly occurring per os, and so through our study we raise the question of whether Kala Azar is leptomoniasis in some clinical cases in India. Owing to morphological similarity and cross-reactivity with Leishmania species, human cases of infection with these lower trypanosomatids may have been underestimated. Alternatively, these clinical cases may indeed represent Leishmania-Leptomonas co-infection. A Leptomonas of insect origin was highly susceptible to several standard trypanocides and leishmanicides in vitro and was easily grown in defined media [37].
The important link in the transmission of infection, i.e. the sandfly vector, also needs to be questioned. Natural infection of Phlebotomus longipalpis by Leptomonas in a focus of Kala Azar has been reported [38]. Co-infection of Leishmania and Leptomonas in some sandflies in Nepal was confirmed by their rDNA signature [39]. Several studies have shown that Phlebotomus argentipes, the only known vector for Leishmania donovani in the Indian subcontinent, feeds on both bovine and human blood [40]. Although cattle are a preferred host for P. argentipes, their role remained undecided in several epidemiological studies in the Indian subcontinent [41]. Chakravarty et al. [42] surveyed 64 cows along with dogs, but could not find any amastigotes by direct observation of smears from peripheral blood, liver, spleen, and bone marrow. On the other hand, Leishmania DNA was detected in several domestic animals, including cattle, from an endemic area in Nepal [43]. Studies conducted in Bangladesh to investigate the role of domestic animals in VL transmission [44] show that cattle are not a reservoir host for L. donovani despite being preferred by P. argentipes as a blood source. Ecological conditions should also be considered: changes in habitat associated with human development might create conditions suitable for the establishment of anthroponotic cycles of infection with parasites which have so far been regarded only as monogenetic parasites of invertebrates.
Considering that Leptomonas was present in the splenic aspirate at the time of collection, the question arises whether the isolates could show a hybrid profile (through genetic exchange by recombination between the two species). Might the sand fly species P. argentipes be transmitting both parasites? Co-infection of Leishmania and Leptomonas in some sandflies in Nepal has been established [39]. Leishmania parasites are capable of a sexual cycle consistent with meiotic processes inside the insect vector [45]. Hybrid genotypes have been observed in field isolates involving most Leishmania species [46][47][48][49]. With the formation of genetic hybrids [50], new foci of disease may emerge as the hybrid progeny are transmitted to the mammalian host by sandfly bites. Hybrid progeny of Leishmania major within the vector host have been established [51]. Is leishmaniasis in India caused by insect parasites?
For better disease management and healthcare, monitoring of Leishmania infection in sandflies is important to define precisely the eco-epidemiology of Kala Azar in India. Domestic cattle in the VL-endemic area of Bihar, India, need to be examined for serological and molecular evidence of Leishmania infection. Parasites isolated from VL cases in India are not routinely typed, on the assumption that they are all L. donovani, in contrast to other countries where typing is done more systematically. Investment in infrastructure to set up good typing centers and parasite banks is needed. Ongoing clinical drug trials in India are prone to exert dynamic selective pressures which may mould the genome of the parasites; therefore, profiling of VL in India using deep sequencing, as a means of continuous surveillance of pathogenic parasites and their threat to public health, should be strongly supported and encouraged by the Indian government. We have established through this study the success of second-generation sequencing technologies for building parasite whole genomes. This approach can now be adapted to studying the local population genetics of kinetoplastid parasites in India. Work is underway to assess and quantify the presence of L. donovani and Leptomonas directly in human splenic aspirates using PCR.