The Dynamics of Lateral Gene Transfer in Genus Leishmania - A Route for Adaptation and Species Diversification

Background The genome of Leishmania major harbours a comparably high proportion of genes of prokaryote origin, acquired by lateral gene transfer (LGT). Some of these are present in closely related trypanosomatids, while some are detected in Leishmania only. We have evaluated the impact and destiny of LGT in genus Leishmania. Methodology/Principal Findings To study the dynamics and fate of LGTs we have performed phylogenetic, as well as nucleotide and amino acid composition analyses within orthologous groups of LGTs detected in Leishmania. A set of universal trypanosomatid LGTs was added as a reference group. Both groups of LGTs have, to some extent, ameliorated to resemble the recipient genomes. However, while virtually all of the universal trypanosomatid LGTs are distributed and conserved in the entire genus Leishmania, the LGTs uniquely present in genus Leishmania are more prone to gene loss and display faster rates of evolution. Furthermore, a PCR based approach has been employed to ascertain the presence of a set of twenty LGTs uniquely present in genus Leishmania, and three universal trypanosomatid LGTs, in ten additional strains of Leishmania. Evolutionary rates and predicted expression levels of these LGTs have also been estimated. Ten of the twenty LGTs are distributed and conserved in all species investigated, while the remainder have been subjected to modifications, or undergone pseudogenization, degradation or loss in one or more species. Conclusions/Significance LGTs unique to the genus Leishmania have been acquired after the divergence of Leishmania from the other trypanosomatids, and are evolving faster than their recipient genomes. This implies that LGT in genus Leishmania is a continuous and dynamic process contributing to species differentiation and speciation. This study also highlights the importance of carefully evaluating these dynamic genes, e.g. as LGTs have been suggested as potential drug targets.


Methodology/Principal Findings
To study the dynamics and fate of LGTs we have performed phylogenetic, as well as nucleotide and amino acid composition analyses within orthologous groups of LGTs detected in Leishmania. A set of universal trypanosomatid LGTs was added as a reference group. Both groups of LGTs have, to some extent, ameliorated to resemble the recipient genomes. However, while virtually all of the universal trypanosomatid LGTs are distributed and conserved in the entire genus Leishmania, the LGTs uniquely present in genus Leishmania are more prone to gene loss and display faster rates of evolution. Furthermore, a PCR based approach has been employed to ascertain the presence of a set of twenty LGTs uniquely present in genus Leishmania, and three universal trypanosomatid LGTs, in ten additional strains of Leishmania. Evolutionary rates and predicted expression levels of these LGTs have also been estimated. Ten of the twenty LGTs are distributed and conserved in all species investigated, while the remainder have been subjected to modifications, or undergone pseudogenization, degradation or loss in one or more species.

Conclusions/Significance
LGTs unique to the genus Leishmania have been acquired after the divergence of Leishmania from the other trypanosomatids, and are evolving faster than their recipient genomes. This implies that LGT in genus Leishmania is a continuous and dynamic process contributing to species differentiation and speciation. This study also highlights the importance of carefully evaluating these dynamic genes, e.g. as LGTs have been suggested as potential drug targets.

Introduction
Trypanosomatids are single flagellated, kinetoplastid protozoa with parasitic lifestyles. Primary hosts are often insects [1], but secondary hosts may be vertebrates and some species, such as the sister genera Leishmania and Trypanosoma, cause major diseases in humans [2]. The kinetoplastids uniquely harbour the kinetoplast, a tubular network of interlocked circular mitochondrial DNA. Based on several molecular markers [3][4][5][6][7], genus Leishmania is divided into the three subgenera Leishmania, Viannia and Sauroleishmania. Fraga and co-workers (2010) further sub-divides these three subgenera into several informal complexes, in general congruence with most previous studies. The main advancements in the most recent classifications is the separation of the major-and tropica-complexes, and the erection of a guyanensis-complex encompassing strains previously included in the braziliensis-complex [3,4].
Species of genus Leishmania causes leishmaniasis, a neglected and largely tropical disease, threatening the health of 350 million people in 88 countries worldwide according to WHO estimates [8]. The zoonotic parasite is transferred to humans and animals via the bite of several species of sand flies of subfamily Phlebotominae [9]. From the complete sequencing of the genome of Leishmania major [10] it became evident that this genome includes a number of genes acquired by lateral gene transfer (LGT) from prokaryote donors [11][12][13].
LGT implies the uptake and fixation in a genome of DNA originating from another, distantly related, organism.
LGT has been recognized to contribute to genome evolution in prokaryotes [14,15] and eukaryotes alike [16][17][18], and offers a rapid route to the acquisition of new capacities, such as utilization of alternative metabolites or adaptation to new environmental niches [19,20]. The mechanism(s) of LGT in eukaryotes is not known, although phagosytosis by means of protozoan grazing [21], uptake via transposable elements [22][23][24] and donations from symbionts [25] have all been suggested. The broad collection of donor organisms within our dataset [11,12] speaks against one, single event of acquisition from one single symbiotic donor. The majority of donors inferred by phylogeny are proteobacteria, and these are abundant in the gut of the insect host of Leishmania. However, since we predict that most transfer events preceded the radiation of genus Leishmania, assumptions on putative hosts of this ancestral Leishmania are uncertain. LGT between prokaryotes have been shown to be transient, and although some LGTs are fixed in the host genomes and passed on after divergences into new species, these fixed LGTs are outnumbered by genes rapidly gained and lost [18,26,27]. Whether or not this is also the case for LGTs in eukaryote genomes, has not been completely elucidated [18], but recent studies have shown that eukaryote genomes can acquire long stretches of prokaryotic genes, from which beneficial genes are conserved and superfluous genes rapidly discarded [24]. With the aim to determine the dynamics of LGT generally and the depth of the transfer within genus Leishmania specifically, we have analysed orthologous groups of LGTs, previously confirmed with phylogenomic analyses in L. major only [28][29][30], and also in the genomes of the close relatives Trypanosoma brucei and Trypanosoma cruzi [11,12]. Furthermore, we have used a PCR-based approach to investigate ten additional strains of Leishmania parasites with unpublished genomes, for the presence of a subset of twenty LGTs that are unique to genus Leishmania. For these genes we have evaluated features of amino acid and nucleotide composition, as well as phylogeny in order to gain increased understanding of the fate of transferred prokaryotic genes in their new eukaryotic genomic environments.
LGTs have been suggested as potential drug targets in trypanosomatids [12] and careful evaluation of the stability of potential drug targets is an essential initial step towards the proposal of future drugs to treat infection.

Data mining and sequence retrieval
Orthologous sequences of the LGTs previously confirmed with phylogenomic analyses in L. major [11,12] were retrieved from the completely sequenced genomes of L. braziliensis L. infantum, L. mexicana, L. tarentolae, T. brucei and T. cruzi using homology searches with the BLAST [31] tool in TriTrypDB v 6.0 [32].

Sequence similarity and G+C content
The average amino acid and nucleotide identities were calculated between each LGT and orthologs in L. braziliensis, L. infantum and L. tarentolae respectively. Identity scores of LGTs were compared to the calculated average amino acid and nucleotide identity between the coding content of each complete genome [28,29].
The G+C content was calculated for each LGT and orthologs in L. braziliensis, L. infantum and L. tarentolae respectively. The G+C content of LGTs was compared to the coding G+C content of each complete genome [28,29].

DNA preparation
Frozen cell cultures of Leishmania promastigotes from ten different strains of Leishmania were kindly provided by the Public Health Agency of Sweden (Folkhälsomyndigheten) as listed in

Primer design
Oligonucleotide primers for amplification of orthologs to the LGTs were primarily designed based on the sequences and flanking regions of LGTs in the published genome of L. major. In some cases, oligonucleotide primers where also designed for the orthologs of the LGTs in the other Leishmania strains for which there are published genomes. Where needed multiple primer pairs were designed to amplify LGTs. The primers were designed in Geneious v.5/v.6 (Biomatters, Auckland, NZ) using the Primer3 algorithm [33,34]. Detailed information about the oligonucleotide primers can be found in S1 Table. PCR Polymerase chain reactions were performed in a total reaction volume of 50 μl, containing Taq DNA Polymerase Master Mix (75 mM Tris-HCl pH 8.5, 20 mM (NH 4 ) 2 SO 4 , 2.0 mM MgCl 2 , 0.2 mM dNTPs, 0.1% Tween 20 and 1.25 U Ampliqon Taq polymerase (VWR International, Radnor, PE), 0.2 μM of each primer (Sigma Aldrich, St. Louis, MO), 5-20 ng genomic DNA and double distilled, sterile filtered milliQ water (Merck Millipore, Billerica, MA). The DNA was amplified using an Eppendorf Mastercycler personal (Eppendorf, Hamburg, D) and a Bio-Rad C1000TM thermal cycler (Bio-Rad, Hercules, CA) under the following conditions: initial denaturation at 95°C for three minutes, followed by 35 cycles with denaturation at 95°C for 30 seconds, annealing at the annealing temperatures, Ta, optimized for the different oligonucleotide primer pairs (S1 Table) for 60-90 seconds, elongation at 72°C for 60 seconds and a final extension step at 72°C for 10 minutes. Positive amplifications were confirmed on a MCE-202 MultiNA instrument, a microchip electrophoresis system for DNA and RNA analysis (Shimadzu Biotech, Kyoto, J), in accordance with the manufacturer's instructions.

Phylogenetic analysis
The sequencing results were assembled using Geneious v.5/v.6 (Biomatters, Auckland, NZ) and the obtained sequences were named with the strain abbreviation (Table 1) followed by the chromosome and gene location of the LGTs in L. major F. All obtained sequences were uploaded to GenBank and allocated the following accession numbers; KM411704-KM411876. The sequences where aligned using MUSCLE [35] and the alignments were manually inspected and refined.
All phylogenetic analyses were performed using the software PAUP Ã 4.0b10 for UNIX [36,37] on an Apple MacBook Pro. Under a Fitch parsimony optimality criterion 1000 random addition sequences were followed by "tree bisection-reconnection" branch-swapping saving all optimal trees [38]. Support analyses were performed with 10000 bootstrap replicates, each consisting of three random addition sequences and subsequent "subtree pruning and regrafting" branch-swapping [39].

Synteny analysis-LGTs remnants identification
The flanking genes of the LGTs that lack orthologs in one or several of the already published genomes of L. braziliensis, L. infantum, L. mexicana and L. tarentolae where identified using TriTryp DB v 6.0 [32]. Any orthologs to these LGT flanking genes in L. braziliensis, L. infantum, L. mexicana and L. tarentolae where compared in terms of synteny. If the LGT flanking genes in L. major and their orthologs in above mentioned strains were syntenic, the intergenic region between the genes were aligned with the LGT and possible orthologs from other strains using MUSCLE. The alignments were manually inspected and refined in order to find remnants of LGT orthologs in strains where no orthologs are annotated.

Rate of sequence substitution
The ratio of non-synonymous mutations (K a ) to synonymous mutations (K s ) were calculated for pairwise alignments of all orthologs obtained for each of the studied LGTs, using the modified Yang-Nielson method (MYN) [40] incorporated in the software package K a /K s _Calculator [41]. Protein coding genes in the Leishmania genomes have been shown to have a strong codon bias [42][43][44]. MYN adopts the Tamura-Nei model, which takes codon frequency bias and differences among transitional and transversional substitution rates into consideration when calculating K a and K s . The K a /K s ratios between the different strains and means were compared for the different genes. The K a /K s -ratios of the LGTs were compared to the housekeeping genes heat-shock protein 70 (HSP70) and elongation factor Tu (EF-Tu) (S2 Table). The setting "standard" was chosen for the genetic code.

Functional annotation of LGTs
The predicted functions of LGTs were retrieved from annotations of L. major genes in Tri-TrypDB v 6.0 [32]. All orthologues to L. major LGTs within our dataset are assumed to have the same function.

ID numbers of genes included in this study
LmxM. 32

Data mining and sequence retrieval
Fifty-nine of the LGTs previously confirmed in L. major [11,12] were found to have orthologs also in close relatives, belonging to genus Trypanosoma, indicating that their transfer preceded the radiation of the trypanosomatids. We found that these universal trypanosomatid LGTs are, with a few exceptions, present in all published genomes of genus Leishmania-four are lost in L. braziliensis, two in L. tarentolae, and one each is lost in L. mexicana and L. infantum. In contrast, thirty-one LGTs detected in L. major only, and not found in any of the other trypanosomatids, exhibit a more heterogeneous distribution in genus Leishmania-five are lost in L. braziliensis, six in L. tarentolae, four in L. mexicana and one is missing from the genome of L. infantum (Fig 1A and S3 Table).

Sequence similarity and G+C content
We have compared the nucleotide and amino acid identities, as well as the G+C contents, of the LGTs in L. braziliensis, L. infantum and L. tarentolae with corresponding values of the coding content of the complete genomes. The nucleotide and amino acid identities between the universal trypanosomatid LGTs ranges from 76.7% to 94.0% (nucleotide level), and from 76.6% to 93.5% (amino acid level), respectively. This resembles the nucleotide and amino acid identity scores between the published genomes of L. major, L. braziliensis, L. infantum and L. tarentolae ranging from 79.2% to 94.0% and from 74.8% to 92.0% respectively. In contrast, nucleotide and amino acid identities in orthologs to the LGTs unique to genus Leishmania are significantly lower (p-value <0.05) than those of the genomes, ranging from 68.7% to 92.1% and from 66.8% to 88.8% respectively ( Table 2).
The average G+C-content of orthologous universal trypanosomatid LGTs, as well as the average G+C-contents of the LGTs unique to genus Leishmania, is lower than the coding G+C-content of the L. major, L. braziliensis, L. infantum and L. tarentolae complete genomes. However, the differences are at the most only 1.6% (L. major: 62.5%-60.9%) and 3.4% (L. infantum: 62.5%-59.1% and L. major 62.5%-59.1%) respectively, and not statistically significant (p<0.05) ( Table 3).

Distribution of LGTs in genus Leishmania
To deepen our study of LGT in genus Leishmania we used a PCR based approach to determine the presence of orthologs to twenty of the LGTs initially detected in Leishmania major but missing from the genomes of Trypanosoma brucei and Trypanosoma cruzi. We investigated ten additional strains of Leishmania, all with unpublished genomes (Table 1), and three universal trypanosomatid LGTs were added for comparison (S4 Table). Since the divergence of completely sequenced genomes defines the outer borders of our dataset, primers for PCR to detect the LGTs in species not yet sequenced, can be designed within conserved regions, and are thus likely to amplify orthologous regions within our dataset. The presence of orthologs in different strains is defined by sequence homology, with a minimum of 90% nucleotide identity to a sequence in a published Leishmania genome. This is well above the so called twilight zone for identification based on sequence homology [46]. We found that eleven of the twenty LGTs have orthologs in all fifteen strains investigated in this study, and that further six LGTS have LGTs. (B) Consensus tree summarizing the agreement between the phylogenetic trees for the twenty-three LGTs, investigated in detail in this study. Genus Leishmania is divided into the three subgenera Viannia, Sauroleishamania (here represented by only one species and one strain-L. tarentolae) and Leishmania. Subgenera Viannia and Leishmania are further divided into several complexes, and the topology of the consensus tree is completely congruent with the systematics proposed by Fraga and co-workers (2010). The fate of each LGT is illustrated with coloured boxes. "Gene acquisition" (green boxes) and "Gene loss" (red boxes) are based on presence of orthologs in published genomes or amplification of orthologs with PCR. Insertions/deletions (indels) (blue and orange boxes) and presence of stop codons (brown boxes) have been determined by manual inspections of alignments of obtained sequences and are relative to the gene sequence in L. major F. "In-frame Indels" (blue boxes) are insertions or deletions of more than 50 nucleotides not causing any frame-shift. "Frame-shift Indels" (orange boxes) are insertions or deletions that causes frame-shifts. Remnants (black boxes) of orthologs have been identified with synteny analyses (see Material and Methods).  Table).
Phylogenetic analysis of available data for each separate gene was performed, and all most parsimonious trees obtained from the separate analyses were used to construct a reference tree, shown in Fig 1B. The reference tree is completely congruent with the systematics proposed by Fraga and co-workers (2010). We also identified remnants of orthologs to the LGTs in already published genomes by analysing intergenic regions between syntenic orthologs of the flanking genes of the LGTs in L. major. Our aim was to be able to date the uptake of these LGTs to before, or after, the divergence of genus Leishmania into different subgenera. Based on presence in different strains of Leishmania (S4 Table), seventeen of the twenty LGTs appear to have been transferred after the divergence of genus Leishmania from other trypanosomatids, but before the division into the three subgenera present to day (Fig 1B). Three LGTs; LmjF03.0390, LmjF31.1810 and LmjF. 33.0990, appears to have been transferred after the division of the Leishmania genus into the three subgenera, since LmjF.03.0390 and LmjF.33.0990 lack orthologs in subgenus Viannia, and LmjF.31.1810 only have orthologs in the donovani, major and tropica complexes (Fig 1B and S4 Table). However, there are remnants of orthologs to LmjF.03.0390 and LmjF.33.0990 in L. tarentolae strain P, with nucleotide identities of 34% and 63% respectively, suggesting that these two LGTs have been transferred after the divergence of subgenera Leishmania and Sauroleishmania from subgenus Viannia.  Fig 1B).
We have also analysed the orthologs of the LGTs with respect to presence of stop codons, frame shift nucleotide insertion or deletions (indels) and major (comprising more than fifty nucleotides) nucleotide indels. Ten of the twenty LGTs are intact and of similar length in all orthologs investigated. Five LGTs have orthologues affected by major in-frame indels, while five LGTs are also affected by internal stop codons and/or major framshift indels (Fig 1B).

Rate of sequence substitution
In order to determine if the acquired LGTs are under neutral, positive, or negative selection in the different strains of Leishmania, the evolutionary rate of LGTs was estimated by calculating the non-synonymous over synonymous substitutions (K a /K s ratio) for pairwise alignments of all orthologs of LGTs. To put the rates into perspective we also calculated the K a /K s ratio for a set of reference genes (S2 Table). As such, K a /K s ratios of reference genes were all determined to be less than one. In three LGTs; LmjF.15.0740, LmjF.23.0260 and LmjF.36.0260, the K a /K s ratios are greater than one, and these three LGTs are also interrupted by stop codons in one or more species. In addition, one LGT, LmjF.03.0390 detected only in subgenus Leishmania, display K a /K s ratios close to one, and all detected orthologs to LmjF.03.0390 are degraded to some extent (Fig 2). In the remaining LGTs, K a /K s ratios are less than one but greater than the K a /K s ratios of the reference genes. Detailed information can be found in S5 Table.

Codon bias calculation
To elucidate whether the LGTs share patterns similar to those of genes determined to be functional in Leishmania [43,44,47] the codon bias was determined using Codon Adaptation Index (CAI) scores [48]. The mean CAI scores for the majority of orthologous LGTs show no significant variations indicating that these genes are expressed. However, three cases; LmjF.15.0740, LmjF.23.0260 and LmjF.36.0260, display extensive codon bias variations (high coefficient of variance), indicated in bold text in Table 4. A closer inspection reveals that these three LGTs have a subset of orthologs with significantly lower average CAI-scores than the remaining orthologs (pooled two-sample t-test, p<0,001). Additionally, LmjL. 15

Discussion
In this study we contribute data to increase the understanding of the fate of genes acquired by LGT in eukaryote genomes. We have combined a PCR based screen and sequence determination of orthologues to previously detected LGTs in un-sequenced Leishmania genomes with thorough bioinformatics analyses of our unique dataset. Our aim was to determine the depth of the transfer of LGTs previously reported in L. major within genus Leishmania and to evaluate the fate of the Leishmania-specific transferome in comparison to the LGTs shared by all trypanosomatids.
The L. major genome contains about 8300 genes [10] and approximately 6200 of these are shared with the closely related trypanosomatids; T. brucei and T. cruzi. Less than 1000 genes are unique to L. major [12]. The number of universal trypanosomatid LGTs has been estimated to fifty-nine while thirty-one LGTs appeared to be unique to genus Leishmania (S3 Table). Hence less than 1% of the universal trypanosomatid genes are of prokaryote origin and these are, with a few exceptions, present in all published Leishmania genomes. However, 3% of the genes unique to L. major are LGTs. The LGTs uniquely detected in genus Leishmania, display a more heterogeneous distribution, with a relatively high degree of gene loss, compared to the shared trypanosomatid set of LGTs. Comparisons of the published Leishmania genomes reveal that the number of species specific genes are surprisingly low, considering the early diversification of genus Leishmania [28][29][30]. Our results hence indicate that LGTs contributes to speciation as well as species diversification within genus Leishmania.
Comparative studies of the published genomes of Leishmania also unambiguously show that they are similar with respect to G+C-content, nucleotide composition and amino acid content [28][29][30]. Our analyses show that the G+C content of universal trypanosomatid LGTs and  the LGTs uniquely present in genus Leishmania resemble that of the recipient genomes. Anomalous G+C content of genes have been used as tools to identify LGTs [49,50]. However, previous surveys have concluded that comparisons of G+C-content alone are poor markers for LGT, due to the effects of amelioration [51,52]-the process by which foreign DNA adapts to resemble the nucleotide composition of the host genomes [49]. Our results indicate that LGTs detected in genus Leishmania have ameliorated so that the G+C content resemble that of the recipient genomes. Notably, comparisons of nucleotide and amino acid identities between the orthologs to LGTs detected in published Leishmania genomes reveal that the universal trypanosomatid LGTs are more conserved than the LGTs present in genus Leishmania only. A likely consequence of have resided in their host genomes a longer time period than the species specific LGTs. Consequently the LGTs detected in L. major, but not found in any other trypanosomatids, were most likely transferred after the divergence of genus Leishmania, rather than before speciation of the trypanosomatides and then subsequently lost in the trypanosomes. These observations also support the indication that LGTs contribute to speciation and species diversification in genus Leishmania.
To further determine the depth and dynamics of LGTs in genus Leishmania, we have explored and analysed the gene sequences of orthologs to twenty LGTs unique to genus Leishmania in ten additional strains from eight species of Leishmania. Three universal trypanosomatid LGTs were also included, for comparison. Presence of LGTs in each genome as well as stop codons and intragenic indels were mapped onto a phylogenetic tree (Fig 1B), showing that seventeen of the LGTs were acquired prior to the division of the Leishmania genus. Eleven of these have intact orthologs in all strains investigated, and are thus assumed to be fixed in each respective genome. Two genes appear to have been acquired after the divergence of subgenus Viannia from subgenera Leishmania and Sauroleishmania, and one single gene even later than that-after the divergence of the mexicana complex from the donovani, tropica and major complexes. Two out of the three universal trypanosomatid LGTs are distributed throughout the entire genus Leishmania, while one is absent in subgenus Viannia (Fig 1B). Altogether our results indicate that LGT in genus Leishmania is an ongoing process, and they also support the conclusion that LGT contributes to successive species-specific diversifications.
LGTs contribute to genome evolution in prokaryotes [14,15] and eukaryotes alike [16][17][18]. Recent studies have shown that the fate of LGTs in eukaryote genomes are dependent on the new capacities that they are providing to the organism-beneficial genes are conserved and superfluous genes are discarded [24].
To evaluate the rate of evolution we have determined whether the LGTs are under neutral, positive or negative selection in the genomes of the different strains of Leishmania by comparison of synonymous and non-synonymous substitutions in orthologous LGTs. We also analysed a set of so called housekeeping genes, chosen as references. K a /K s ratios between orthologs of reference genes confirm that they are under purifying selection. In a similar fashion, sixteen Leishmania specific LGTs, and all three of the universal trypanosomatid LGTs, display K a /K s ratios indicative of conservation. However, the remaining four LGTs evolve under a low selective pressure (Fig 2). Among these four genes we detected orthologs interrupted by stop codons, a direct indication of relaxed pressure. The variation in conservation among LGTs strengthens the conclusion that LGT in genus Leishmania is indeed a dynamic process, facilitating diversification.
Comparative studies of gene expression in Leishmania indicate that the majority of genes are constitutively expressed by means of polycistronic transcription [43,53,54]. Differentiation of expression is thus hypothesized to be post-transcriptional, and partly facilitated via codon usage bias [44]. Synonymous codon usage bias can be estimated by calculation of CAI scores [48], which has been used to predict expression levels of all genes in the close relative Trypanosoma cruzi [44]. These predictions were confirmed by comparisons to the complete proteome of T. cruzi [55], and it has also been concluded that relative codon bias is conserved among the trypanosomatids [45]. We have calculated the CAI scores of orthologous LGTs, with the purpose to evaluate whether LGTs may be expressed by the parasite or not. The mean CAI scores for the majority of LGTs show no significant variations and fall in the range of the expression levels of genes in tandem-i.e. highly expressed genes [10,30]. However, three LGTs (LmjF.15.0740, LmjF.23.0260 and LmjF.36.0260) display large variations in average CAI scores (Table 4). Our further comparisons show that LGTs displaying large average CAI score variations evolve with high evolutionary rates and show signs of either degradation or gene innovation (Fig 2). In conclusion, the dynamics of LGT in genus Leishmania correlates with predicted expression levels.
The impact of LGTs on prokaryotes has been extensively studied [14,15], and it is now clear that LGTs from prokaryotes have an influence on metabolic pathways in eukaryotic microorganisms, either by homologue replacement or by providing new capacities to the organism [11,[16][17][18]. It is also clear that gene transfer that are not beneficial for the recipient genomes will be either lost or modified i.e. even though the uptake of LGTs is assumed to be a random process, the fixation of foreign genes depend on selective evolutionary pressure. [21,24,56]. It has also been suggested that LGTs is a paramount mechanism for gene innovation [57]. Our data set contains examples of conserved LGTs as well as LGTs that are evolving fast. Among fast evolving LGTs we have observed genes potentially developing new functions, but also genes that are being degraded. Among our data set of 20 LGTs, eleven LGTs have been assigned a predicted function (S4 Table). However, the remaining nine LGTs encode hypothetical proteins, and eight of these are both conserved and predicted as highly expressed. This highlights the importance of further investigations on neglected pathogens to elucidate the function of uncharacterized proteins and their impact on organism adaptation and diversity.
In conclusion, when a gene is acquired there are three possible scenarios; conservation, degradation or innovation. If orthologues of the same ancestral gene takes on different routes of evolution, these differences in the metabolic repertoire may fuel speciation. In our dataset we detected genes that show features indicative of all these three different scenarios.
For example, LGT LmjF. 16.0950 encodes a sucrose-phosphate synthase-like protein, which converts sucrose to fructose-a primary energy sources for promastigotes residing in the sandfly [58]. The sucrose-phosphate synthase-like protein is, according to our survey, widely distributed in the genus Leishmania and conserved as well as predicted as highly expressed. Hence a result of a gene transfer facilitating adaptation to the life stages in the insect vector and thus generally beneficial for the parasites. Macrophages have many functions in the human body, one being the removal of erythrocytes from the blood. This causes a release of heme into the phagolysosome, where it is degraded and proposed to be utilized by Leishmania amastigotes in synthesis of heme-proteins. All trypanosomatides are auxotrophs for heme, but Leishmania spp. only harbours genes encoding the last three steps of the hemeproduction. The first two steps are catalysed by coproporphyrinogen-III oxidase followed by protoporphyrinogen oxidase (PPOX), both proteins encoded by LGTs [58]. PPOX (LmjF.06.1280 and orthologs) is distributed and conserved in the entire genus Leishmania, however, the average CAI scores are the lowest (0.52) among the average CAI scores of all the studied LGTs. This predicted, limited expression of LmjF.06.1280 correlates with the function of the gene, since it is suggested to be important only during a short period in the life cycle of Leishmania parasite.
Another example is LmjF.36.0260 that encodes xyuolokinase, an enzyme that together with ribulokinase and ribokinase is involved in the metabolism of C5-sugars. This acquisition allows Leishmania parasites to assimilate nutrients from other types of sugars than the ones present in the diet of the sand fly [58] and is thus important for the adaptation to the life stages in the human host. The divergence between the LmjF.36.0260 orthologs in subgenus Viannia and the orthologs in the other subgenera together with the K a /K s ratios for these orthologs, (Fig 2) indicate that either these LGTs are being degraded, or the groups are diverging from each other with higher rates than expected. The latter of these two alternatives is supported by the fact that none of the orthologs except LamH.36.0260 contain stop codons. The CAI scores of the orthologs in subgenus Leishmania and subgenus Sauroleishmania indicates that these genes are highly expressed, hence the possible degradation of LamH.36.0260 may be in an early stage. The comparably low CAI scores for orthologs in subgenus Viannia (Fig 3C) suggest that the function of LmjF.36.0260 in the three different subgenera may not be the same. It seems as if this additional enzyme for metabolizing C5-sugars is still beneficial for the majority of the strains in subgenus Leishmania and in subgenus Sauroleishmania, but maybe less so in the strains of subgenus Viannia. Thus, this may be an example of an LGT going down a route for gene innovation. In addition, LmjF.26.2710 encodes a glutamate-5-kinase, which is involved in the metabolism of amino groups. The relaxed constraints of evolution inferred in orthologs of subgenus Viannia might be indicative of diversifying selection possibly leading to novel protein functions among orthologues in subgenus Viannia.
LmjF.23.0260 encodes argininosuccinate synthetase, which is active in the urea cycle. Leishmania parasites secretes urea in the phagolysosome to sustain the low pH that is required for the differentiation of promastigotes into amastigotes [59][60][61]. The LmjF.23.0260 orthologs in subgenera Leishmania and Sauroleishmania lack stop codons, however the high K a /K s ratios (Fig 2) and predicted high expression levels (Fig 3B), indicate that these orthologs are possibly diverging to acquire new functions. The orthologs to LmjF.23.0260 in strains from subgenus Viannia harbour stop codons and exhibits K a /K s ratios >1 (Fig 2). The predicted levels of expression are also lower than those of other LmjF.23.0260 orthologs (Fig 3B), indicating that all of the subgenus Viannia orthologs may be degrading. In addition, LmjF.15.0740 (a hypothetical protein) shows signs of degradation by means of stop codons and low CAI values in some species.
LGTs have previously been suggested as potential drug targets [62] and our study highlights the importance of paying attention to the dynamics of LGTs while selecting such genes as targets for new drugs to hamper infection. In conclusion, our results show that universal trypanosomatid LGTs are widely distributed and conserved in genus Leishmania, which is a likely consequence of the long time period that the universal trypanosomatid LGTs have resided in their respective recipient genomes. The LGTs that are uniquely present in genus Leishmania are more divergent than the rest of the genomes, which either indicates that many of these LGTs and their orthologs have not yet ameliorated to resemble the host genomes, or that they are evolving at a faster rate than the recipient genomes. The latter scenario is supported by our observations of high variations in nucleotide and amino acid composition, high rates of evolution and pronounced codon bias. In addition several examples of differential pseudogenization among these LGTs can be inferred; i.e. genes are acquired, initially found beneficial for one specific lineage (and thus fixated in closely related strains), but later rendered superfluous, or modified in some strains, yet conserved in others.
LGT in genus Leishmania is thus an ongoing continuous and dynamic process where gains and losses of genes occur as the lineage evolves and diverges. We thus suggest that LGTs contribute to the shaping and speciation of genus Leishmania.
Supporting Information S1 Table. Primers used. Information about primers used to amplify the orthologs to the LGTs investigated in this study. The information includes primer sequences, annealing temperatures and expected size for primer pairs. (XLSX) S2 Table. Reference genes. Information about the reference genes used for comparisons of K a / K s ratios. (XLSX) S3 Table. Presence of LGTs in different strains of Leishmania. A compilation of orthologs to LGTs previously confirmed with phylogenomic analyses in L. major. Fifty-nine have orthologs also in close relatives belonging to genus Trypanosoma, and thirty-one have only orthologs in genus Leishmania. (XLSX) S4 Table. Presence of LGTs in different strains of Leishmania. A compilation of the twentythree LGTs deeply investigated in this study, and the presence of orthologs to these genes in the fifteen strains included in this study. (XLSX) S5 Table. Results from K a /K s calculations. Results from calculations for pairwise alignments of all orthologs obtained for each of the studied LGTs, using the modified Yang-Nielson method (MYN) [40] incorporated in the software package Ka/Ks_Calculator [41]. (XLSX)