Parallel Evolution of a Type IV Secretion System in Radiating Lineages of the Host-Restricted Bacterial Pathogen Bartonella

Adaptive radiation is the rapid origination of multiple species from a single ancestor as the result of concurrent adaptation to disparate environments. This fundamental evolutionary process is considered to be responsible for the genesis of a great portion of the diversity of life. Bacteria have evolved enormous biological diversity by exploiting an exceptional range of environments, yet diversification of bacteria via adaptive radiation has been documented in a few cases only and the underlying molecular mechanisms are largely unknown. Here we show a compelling example of adaptive radiation in pathogenic bacteria and reveal their genetic basis. Our evolutionary genomic analyses of the α-proteobacterial genus Bartonella uncover two parallel adaptive radiations within these host-restricted mammalian pathogens. We identify a horizontally-acquired protein secretion system, which has evolved to target specific bacterial effector proteins into host cells as the evolutionary key innovation triggering these parallel adaptive radiations. We show that the functional versatility and adaptive potential of the VirB type IV secretion system (T4SS), and thereby translocated Bartonella effector proteins (Beps), evolved in parallel in the two lineages prior to their radiations. Independent chromosomal fixation of the virB operon and consecutive rounds of lineage-specific bep gene duplications followed by their functional diversification characterize these parallel evolutionary trajectories. Whereas most Beps maintained their ancestral domain constitution, strikingly, a novel type of effector protein emerged convergently in both lineages. This resulted in similar arrays of host cell-targeted effector proteins in the two lineages of Bartonella as the basis of their independent radiation. The parallel molecular evolution of the VirB/Bep system displays a striking example of a key innovation involved in independent adaptive processes and the emergence of bacterial pathogens. Furthermore, our study highlights the remarkable evolvability of T4SSs and their effector proteins, explaining their broad application in bacterial interactions with the environment.


Introduction
Adaptation to different ecological niches can lead to rapid diversification of a single ancestor into an array of distinct species or ecotypes. This process, called adaptive radiation, typically occurs after the arrival of a founding population in a novel environment with unoccupied ecological niches ('ecological opportunity') and/or by the acquisition of a novel trait ('evolutionary key innovation') allowing the exploitation of so far unapproachable niches [1]. Spectacular examples of adaptive radiation come from different metazoan lineages with the cichlid fishes of the East African Great Lakes and the Darwin finches on Galapagos Islands representing the most prominent examples [2,3]. Although known from a few cases only, bacterial lineages also underwent adaptive radiation -as documented in natural settings as well as in evolution experiments [4][5][6]. It remains a fundamental problem to biology to understand why and how certain lineages diversified; adaptive radiations, and in particular the genetic and genomic basis thereof, provide an ideal set-up to address this question [3,7].
One of the most fascinating aspects of adaptive radiation is the frequent occurrence of evolutionary parallelism resulting in independent adaptation to same ecological niches [8][9][10]. Such evolutionary parallelisms are excellent examples for the action of similar, yet independent, selective forces and, hence, for the key role of natural selection in evolution [11]. Furthermore, parallel adaptive radiations in a single group indicate the existence of traits conferring a high degree of adaptability allowing the group members to efficiently occupy distinct environments. Therefore, lineages that radiate in parallel are of great value to study the molecular basis of adaptation and their independent evolutionary trajectories [1,9]. This is of particular interest in case of hostadapted bacteria differentiating into divergent ecological niches and potentially resulting in the emergence of new pathogens. Species of the a-proteobacterial genus Bartonella are specifically adapted to distinct mammalian reservoir hosts where they cause intra-erythrocytic infections [12]. Different animal models revealed that Bartonella upon reservoir host infection colonizes a primary cellular niche from where the bacteria get seeded into the bloodstream adhering to and invading erythrocytes [13][14][15]. In most cases, infections of the reservoir host do not lead to disease symptoms suggesting a highly specific adaptation to the corresponding host niche. The transmission between host individuals is mediated by blood sucking arthropods.
An integrative genome-wide analysis showed that most factors essential for Bartonella to colonize their mammalian reservoir hosts are found within the core genome of this genus [16]. This is not surprising as it reflects the common strategy used by divergently adapted species to colonize their hosts. However, this study also revealed that two type IV secretion systems (T4SS), Trw and VirB, which are essential for host interaction at different stages of the infection cycle represent the few colonization factors exclusively found in the most species rich sub-lineages of bartonellae. It was assumed that the horizontal acquisition of these T4SS substantially refined the infection strategy of Bartonella facilitating concurrent adaptation to a wide range of different hosts [16]. The VirB T4SS translocates a cocktail of evolutionarily related effector proteins into host cells of the primary infection niche where they modulate various cellular processes [17][18][19][20][21]. The Trw T4SS is involved in the erythrocyte invasion by binding to the erythrocytic surface with its manifold variants of pilus subunits [22][23][24].
Here, we study the evolutionary relationship of Bartonella species adapted to distinct reservoir hosts and investigate the genetic mechanisms underlying adaptive radiation in different lineages. We uncover two parallel adaptive radiations in the genus Bartonella. Our genome-wide analysis revealed a remarkable evolutionary parallelism in the horizontally acquired VirB T4SS in the two radiating lineages. This parallelism is characterized at the molecular level by the lineage-specific chromosomal integration of the virB loci and the independent origination of versatile sets of effector proteins for the interaction with host cells. Providing an arsenal of host-subverting functions that can be efficiently modulated, the VirB T4SS thus seems to represent an evolutionary key innovation triggering the independent radiations of the two lineages. Our study provides detailed insights into the molecular mechanisms underlying parallel adaptive radiations in a bacterial pathogen. Furthermore, many of the diversified T4SS effector proteins carry a FIC domain recently shown to mediate 'AMPylation', a lately recognized post-translational modification [25,26]. FIC domains are highly conserved in evolution and the diversified variants of the Bartonella effector proteins may display a suitable model to study their activity spectrum in the future.

Results/Discussion
Genome sequencing of ecologically distinct Bartonella strains To study the adaptive evolution of Bartonella on a genomic level, we aimed for a set of genome sequences from species adapted to distinct mammalian reservoir hosts. To this end, we included in our analysis the published genome sequences of Bartonella bacilliformis (Bb), Bartonella grahamii (Bg), Bartonella henselae (Bh), Bartonella quintana (Bq), and Bartonella tribocorum (Bt) [16,27,28]. These five species are adapted to human (Bb and Bq), cat (Bh), mouse (Bg), and rat (Bt). Further, we sequenced the complete genome of Bartonella clarridgeiae (Bc) and generated draft sequences of Bartonella schoenbuchensis (Bs), Bartonella rochalimae (Br), Bartonella sp. AR 15-3 (BAR15), and Bartonella sp. 1-1C (B1-1C). Bs was selected as representative of a solely ruminant-infecting clade [16]. Bc, Br, BAR15, and B1-1C were previously shown to be closely related [29][30][31]. However, they were isolated from different mammalian reservoir hosts and therefore display a suitable set of species to study adaptive processes on the genomic level. BAR15 and B1-1C were recently isolated from American red squirrel and rat, respectively [30,31], whereas Br was predominantly recovered from canidae like dogs or foxes, and Bc from cats [32][33][34].
Genome sequencing by 454-pyrosequencing resulted in an average sequence coverage of .35x. The single chromosome of the completely assembled genome of Bc was found to be 1,522,743 bp in size and thus belongs to the smaller genomes of Bartonella (Table 1). The draft genomes of Br, BAR15, B1-1C, and Bs consist of 13 to 19 contigs with total genome sizes similar to the one of Bc. On average, 99% of all 454-sequencing reads were assembled into the analyzed 13 to 19 contigs indicating that our draft genomes did not miss essential sequence data for subsequent analysis. Genomic features of the strains used in this study are summarized in Table 1. A genome-wide phylogeny reveals parallel adaptive radiations in two sister clades of Bartonella We inferred a robust species tree of the genus Bartonella based on 478 core genome genes of the ten available Bartonella genomes sequenced ( Figure S1). To exclude that recombination or horizontal gene transfer within this set of core genome genes was affecting our phylogenetic analysis, we reconstructed single gene trees of the entire data set and performed a recombination analysis using the GARD algorithm [35]. 471 of the 478 genes revealed the same overall topology as our genome-wide phylogeny with the two monophyletic clades of lineage 3 and lineage 4 (see Figure 1). Further, the GARD analysis detected significant recombination breakpoints with a p-value,0.01 in only two out

Author Summary
Adaptive radiation is the rapid origination of an array of species by the divergent colonization of disparate ecological niches. In the case of pathogenic bacteria, radiations can lead to the emergence of novel human pathogens. Being divergently adapted to a range of different mammalian hosts, including humans as reservoir or incidental hosts, the genus Bartonella represents a suitable model to study genomic mechanisms underpinning divergent adaptation of pathogens. Here we show that two distinct lineages of Bartonella have radiated in parallel, resulting in two arrays of evolutionary distinct species adapted to overlapping sets of mammalian hosts. Such parallelisms display excellent models to reveal insights into the genetic mechanisms underlying these independent evolutionary processes. Our genome-wide analysis identifies a striking evolutionary parallelism in a horizontally-acquired protein secretion system in the two lineages. The parallel evolutionary trajectory of this system in the two lineages is characterized by the convergent origination of a wide array of adaptive functions dedicated to the cellular interaction within the mammalian hosts. The parallel evolution of the two radiating lineages on the ecological as well as on the molecular level suggests that the horizontal acquisition and the functional diversification of the secretion system display an evolutionary key innovation underlying adaptive evolution. Table 1. Genomic features of analyzed Bartonella species and information about their reservoir host and phylogenetic lineage (see also Figure 1).  Figure 1. Phylogeny of Bartonella based on a genome-wide dataset. Maximum likelihood analysis using an alignment of 478 genes (515,751 nt) of ten sequenced Bartonella species (indicated by bold and color type) and Brucella abortus. Based on sequence data from rpoB, gltA, ribC, and groEL genes, additional Bartonella species were included in the analysis. Numbers above the branches represent maximum likelihood bootstraps (100 replicates); numbers below represent Bayesian posterior probabilities. Values $80% are shown. Lineages harboring the VirB T4SS [16] are shaded in gray; the primary mammalian hosts are indicated for each species. Phylogenetic trees inferred from either the genomic data set excluding non-sequenced bartonellae or the sequences from only rpoB, gltA, ribC, and groEL genes revealed the same four lineages for the Bartonella ingroup ( Figure S1). Estimates of average evolutionary divergence over sequence pairs within lineages and between lineages are presented in Table  S1. l1, lineage 1. doi:10.1371/journal.pgen.1001296.g001 of the 478 core genome genes. Together these analyses show that our genomic data set is suitable for inferring a consistent species tree. Based on available sequence information for the housekeeping genes rpoB, gltA, ribC, and groEL, we included most other Bartonella species in the analysis resulting in a so-called supertree phylogeny ( Figure 1) [36]. Just as the analysis based on the 478 core genome genes of the ten sequences species alone, this supertree revealed four major clades in the monophyletic bartonellae: ancestral lineage 1 represented by the highly virulent human pathogen Bb [37]; lineage 2 comprising of Bs and three other ruminantinfecting species; lineage 3 consisting of the closely related Bc, Br, BAR15, and B1-1C; and the most species-rich lineage 4 with 13 species including Bg, Bh, Bq, and Bt ( Figure 1). A phylogeny based on only the four housekeeping genes resulted in the same clustering of these taxa into the four different Bartonella lineages ( Figure S1).
In contrast to the ancestral lineage 1, lineages 2, 3, and 4 are ramifying to different degrees comprising species isolated from various hosts. While the species of lineage 2 are limited to infect ruminants and have overlapping host range [38], the diversification of lineage 3 and 4 seems to result from the specific adaptation to distinct mammalian hosts [12]. To substantiate the ecological divergence within these two lineages, we analyzed the genotypehost correlation of Bartonella isolates sampled from diverse mammals. Based on gltA sequences, this analysis revealed clustering of strains isolated from same or similar hosts in lineage 3 and 4 ( Figure S2). Further support for the host specific adaptation of different Bartonella species comes from recently published laboratory infections [39,40] and from our own rat infection experiments with the strains of lineage 3 ( Figure S3). It is to mention that some Bartonella species can incidentally be transmitted to other hosts like humans [12]. These so-called zoonotic Bartonella species do not cause intraerythrocytic bacteremia in the accidental human host reflecting the lack of specific adaptation. However, such accidental transmissions might facilitate the emergence of new specificity resulting in host switches and the origination of new species. In particular, several species of lineage 3 and 4 are known to display such zoonotic pathogens, whereas for lineage 1 or 2 to our knowledge no such case has been reported so far [12,29].
In summary, our genome-wide phylogenetic analysis shows that the sister lineages 3 and 4 have evolved by adaptive radiations into same or similar ecological niches (i.e. hosts). Long internal branches separating the two lineages from each other and preceding the radiations are evidence for their independent occurrence ( Figure 1). Due to the lack of calibration time points, the exact timing of these independent radiations cannot be deduced. However, the phylogenetic tree in Figure 1 might suggest that lineage 3 diversified more recently compared to lineage 4. This is supported by the mean p-distances inferred for the sequenced taxa of these two lineages: lineage 3 = 0.0760.0002, lineage 4 = 0.1260.0003 (see also Table S1). Two alternative explanations for the observed differences in lineage diversification could be (i) a sampling bias, i.e. the full diversity of lineage 3 was not captured or (ii) smaller population sizes for species of lineage 4 over lineage 3 leading to faster evolution at purifying sites. Significant differences in population size might be rather unlikely as Bartonella species are thought to share a common life style in their respective reservoir host. Whether sampling of Bartonella species in animal populations was exhaustive enough is difficult to assess. However, a newly discovered species would only change the coalescent point of a lineage if it would hold a more ancestral position than the already known species of this lineage. In contrast to these alternative hypothesis, epidemiological studies rather seem to support the scenario of a more ancient onset of the radiation in lineage 4: (i) lineage 4 comprises a much wider range of divergently adapted species and (ii) they represent the most frequently found Bartonella species in natural host populations [30,41]. In contrast, except for Bc, taxa of lineage 3 were only recently detected and so far only sampled at low prevalence [29][30][31][32].
The VirB T4SS displays an adaptive trait specific to the radiating lineages of Bartonella The evolutionary parallelism of the radiating Bartonella lineages provides an ideal setting to study independent evolutionary processes linked to the adaptation to divergent niches. An important driving force for these adaptive radiations might be the presence of ecological opportunities. The niche of Bartonella in the mammalian reservoir host, the bloodstream, displays a privileged environment in which other resource-competing microbes are typically absent. Thus, the adoption of the characteristic intra-erythrocytic infection strategy together with the vector-borne transmission route might have enabled adaptive radiations of Bartonella by the specialization to different hosts. However, not all Bartonella lineages appear to have diversified to the same extent (see Figure 1) suggesting that the availability of such an ecological opportunity alone is not sufficient to explain the pronounced radiation of lineages 3 and 4. Supposedly, key innovations, i.e. lineage-specific traits underlying the adaptation to the mammalian host niches are responsible for the adaptive radiations. Potential adaptive traits would have to be involved in species-environment interactions, such as molecular factors responsible for causing bacteremia. Further, in analogy to the modulation of the adaptive traits in metazoan radiations [2,3], any molecular factor used to exploit distinct environments in a specific manner should be divergent among niche-specialized species [1]. Molecular evolutionary analyses provide the means to identify divergent adaptive traits as the genes encoding them are expected to show signs of adaptive evolution, i.e. an excess of nonsynonymous (dn) over synonymous (ds) substitutions as the result of positive selection. We performed a genome-wide natural selection analysis in the two radiating lineages to detect genes (and therefore traits) with divergent evolution. To this end, we analyzed all orthologous genes from the available genomes of the radiating lineage 3 (Bc, Br, BAR15, and B1-1C) and lineage 4 (Bh, Bg, Bq, and Bt) for signs of adaptive sequence evolution by inferring the natural selection of orthologs by estimation of v, the ratio of nonsynonymous (dn, amino acid change) to synonymous (ds, amino acid conservation) substitution rates (v = dn/ds). Generally, v,1, v = 1, v.1 represent purifying, neutral, and positive selection (adaptive evolution), respectively [42].
We first calculated ''gene-wide'' dn/ds for all orthologous genes of the two lineages (lineage 3: 1,097 genes, lineage 4: 1,091 genes). We excluded one gene from this analysis for each of the two lineages, because the GARD analysis detected statistically significant recombination breakpoints. As adaptive evolution is typically affecting only a few sites of a gene rather than the entire gene sequence [42], we first looked for genes exhibiting an elevated value of v$0.25 over the entire gene length. This analysis revealed 133 (12%) and 86 (8%) genes in lineage 3 and lineage 4, respectively, under relaxed purifying selection indicating signs of adaptive evolution ( Figure 2). To have an additional measurement for adaptive evolution, we subjected our genomic data sets to a maximum likelihood analysis for the detection of site-specific positive selection. To this end, we used the CodeML module implemented in the PAML package. CodeML compares the likelihoods of different evolutionary models for each analyzed gene alignment. When comparing model M2a (PositiveSelection) vs. model M1a (NearlyNeutral), we detected 62 or 34 genes for lineage 3 and 26 or 14 genes for lineage 4 harboring sites under positive selection with a p-value of ,0.05 or ,0.01, respectively. A large fraction of these genes (29 for lineage 3 and 12 for lineage 4) exhibited also gene-wide dn/ds values $0.25 indicating them as good candidates for encoding adaptive traits ( Table 2). Comprehensive lists of the genes identified to have dn/ds values $0.25 and/or exhibiting site-specific positive selection in the CodeML analysis are provided in Table S2 and Table S3 for lineage 3 and lineage 4, respectively. As non-synonymous mutations accumulate over time, the higher number of genes identified for lineage 3 could be a further indication for its more recent radiation, or alternatively, the effect of larger population sizes compared to lineage 4. Irrespectively, these findings render the dataset derived from lineage 3 to be more sensitive to the detection of adaptive sequence evolution ( Figure 2, Table 2).
Interestingly, there was a marked number of genes with v$0.25 over the entire gene length that overlapped in both lineages (Table 3). Among those were genes encoding autotransporters, hemin-binding proteins, and different components of the VirB T4SS. They all constitute important host colonization factors [43][44][45] and thus are likely to display adaptive traits of Bartonella. For many of these genes, also our analysis of site-specific natural selection detected positive selection (Table 3). Most remarkably, all analyzed Bartonella effector protein (bep) genes of the VirB T4SS were among the genes with v$0.25. Particularly in lineage 3, they showed strong signs of adaptive evolution by exhibiting v.0.4 over the entire gene length. Further, in eight out of nine analyzed bep genes of lineage 3, we detected site-specific positive selection (Table 3). Being exclusively found in the radiating lineages and showing strong signs of adaptive evolution, the VirB system and its effector proteins, thus, fulfill the criteria of an evolutionary key innovation likely contributing to the parallel adaptive radiations of Bartonella. Autotransporters and hemin-binding proteins could represent further adaptive traits important for radiations in Bartonella. They exhibited strong positive selection in our analysis and are known to be important factors for host colonization. Further, their conservation throughout the genus Bartonella indicates an important role for the life style and infection strategy of this pathogen (Table S2, Table S3). However, factors conserved in radiating and non-radiating lineages appear unlikely to represent specific key innovations, unless other factors, such as the absence of ecological opportunities or ecological separation, prevented certain lineages to radiate and to colonize more divergent niches.
Importantly, our analysis revealed some lineage-specific colonization factors to carry signs of adaptive evolution. Among others, surface-exposed pilus-components of the Trw T4SS exclusively (marked by crosses). This is explained by the fact that between Br and B1-1C, only non-synonymous mutations but no synonymous mutations have been detected in these three genes. As all three genes revealed v,1 in the other pair-wise comparisons, they were treated as outliers and excluded from further analysis. doi:10.1371/journal.pgen.1001296.g002  Table 3.  (Table S3). This is in agreement with previous studies and appears to reflect the adaptation of this putative adhesion factor to the erythrocytic surface of different host species [22,24]. As the Trw T4SS is only present in lineage 4, it might have specifically contributed to the radiation of this most species-rich clade of Bartonella. Factors known to be important for colonization and exclusively present in lineage 3 were not identified by our analysis. We cannot exclude that the selective pressure imposed by the immune-system might have contributed to the adaptive evolution detected in our genome-wide analysis. It was previously reported that the arms race between host and pathogen can drive the diversification of secretion system-and effector protein-encoding genes [46,47]. However, in case of the Trw T4SS, recently published in vitro infections with erythrocytes isolated from different mammals demonstrated that the Trw-dependent binding and invasion of Bartonella is host-specific [22]. Although experimental data is not yet available, our data suggest that the VirB T4SS and its effector proteins evolved by similar mechanisms. Together with the previous finding that the VirB T4SSs belong to the few colonization factors specific to the radiating lineages [16], this analysis reveals these horizontally acquired host interacting systems as potential key innovations facilitating adaptation to new hosts and therefore driving the radiations of Bartonella.

Independent evolutionary history of the VirB T4SS in the two parallel adaptive radiations of Bartonella
To further assess the role of the VirB T4SS for the independent adaptive radiations of Bartonella, we compared the chromosomal organization of the VirB and effector protein-encoding genes of the two lineages. Remarkably, our analysis uncovered independent evolutionary scenarios for the chromosomal incorporation of this horizontally acquired trait. In the genomes of lineage 4 (Bg, Bh, Bq, and Bt), the virB T4SS genes, virB2-virB11 and the coupling protein gene virD4, are encoded at the same chromosomal location (Figure 3, Figure S4). Also, the bep genes are encoded in this region. In contrast, the genome sequences of lineage 3 (Bc, Br, BAR15, and B1-1C) revealed marked differences in organization, copy number, and chromosomal localization of the genes encoding the VirB T4SS. In the completely assembled genome of Bc, we found three copies of the virB2-virB10 genes encoded at two different chromosomal locations (Figure 3, Figure S4). Two copies are encoded at the same locus and belong to inverted repeats of ,10kb. They are separated by several bep genes and the gene virD4. A third copy of the virB2-virB10 cluster including an additional bep gene is encoded in another genomic region highly conserved across different Bartonella lineages (Figure 3).
The same chromosomal integration and amplification of the virB T4SS genes was found in the other three genomes of lineage 3. However, in a common ancestor of Br and B1-1C, one of the three copies must have been partially deleted, as only virB2, virB3, and a remnant of the virB4 gene were found in the corresponding region of these two genomes (Figure 3, Figure S4). Interestingly, the different copies of virB2-virB10 are identical to each other within one species, but divergent across different species indicating the presence of an intra-chromosomal homogenization process. The fact that duplicated components of another T4SS, Trw, also evolved in concert, and the finding of several other identical genes or gene clusters in different Bartonella genomes [24] suggests that sequence homogenization is a common mechanism in Bartonella to conserve paralogous gene copies. The inverted organization of the two virB T4SS gene clusters seems to result from a duplication event subsequent to the integration of a first copy. Evidence comes from a remnant of the glutamine synthetase I gene (glnA) flanking the entire locus at its upstream end. The full-length copy of this vertically inherited housekeeping gene is located directly downstream of the integration site ( Figure 3, Figure S4).
In addition to the effector genes adjacently located to the virB genes (as in lineage 4), we found six additional loci encoding bep genes in lineage 3 ( Figure S4). These effector genes are not entirely conserved throughout lineage 3, and the existence of gene remnants provides evidence of their deterioration in certain species. Altogether, we identified 12 to 16 bep genes in lineage 3, whereas only five to seven bep genes are present in lineage 4.
Incomplete synteny in the corresponding regions may hinder comparison between the two different lineages, however, no gene remnants could be found at the different integration sites across the two lineages. We cannot fully exclude that massive genomic recombination events resulted in the different chromosomal locations and the lineage-specific dissemination of the virB and bep genes. Yet, such a scenario appears unlikely, as the overall genomic backbone is largely conserved (Figure 3) and the flanking regions of the virB T4SS integration sites do not encode verticallyinherited orthologs across the two lineages. Furthermore, the absence of mobile elements adjacent to the virB T4SS genes such as recombinases, transposases, or integrases is not supportive of an intra-chromosomal mobilization of this genomic locus. T4SS are   ancestrally related to conjugation machineries [48]. Thus, the virB genes might have been transferred from a conjugative plasmid into the chromosome by independent events after the divergence of lineage 3 and lineage 4. In Bg, a closely related T4SS, the Vbh, is encoded on a plasmid in addition to a chromosomally integrated copy [28]. This indicates that these horizontally acquired elements can be maintained on extra-chromosomal replicons within Bartonella from where they are integrated into the chromosome. Similarly, pathogenic Escherichia coli strains from different phylogenetic clades were shown to have evolved in parallel by the independent incorporation of virulence traits from mobile genetic elements [49].
Bartonella effector proteins of the two radiating lineages evolved by independent gene amplification and diversification processes As the chromosomal organization implicates different evolutionary histories of the VirB T4SS in the two radiating lineages, we investigated the relation among the effector proteins translocated by this secretion system. It was previously shown that the bep genes have evolved from a single ancestor by duplication, diversification, and reshuffling of domains resulting in modular gene architectures [17]. The C-terminal BID (Bartonella intracellular delivery) domain is shared by all Beps as it constitutes the secretion signal for the transport via the VirB T4SS. In their N-terminal part, Beps either harbor a FIC (filamentation-induced by cAMP) domain or repeats of additional BID domains ( Figure S5). We assessed the evolutionary relationship among the bep genes by inferring phylogenetic trees on the basis of either the BID or the FIC domain, or the entire gene sequence. This revealed that the bep genes of lineage 3 and lineage 4 form two separate clades (Figure 4, Figure S6). Apparently, consecutive rounds of lineage-specific duplications of an ancestral effector gene resulted in the parallel emergence of two distinct arsenals of bep genes. These duplication events preceded the adaptive radiation in both lineages as phylogenetic clusters of effector genes (Bep clades in Figure 4) comprise positional orthologs present in all or a subset of the analyzed genomes of the corresponding lineage ( Figure S4). Gene duplications frequently display the primary adaptive response after the acquisition of beneficial factors, because they occur at much higher frequency than other adaptive mutations [50]. This might have been the initial selective pressure for the independent amplification processes. However, in both lineages, the duplicated bep genes subsequently diversified by accumulating mutations as indicated by different branch lengths separating bep genes in Figure 4.
To analyze the sequence evolution during the parallel amplification and diversification processes, we used a branch test for positive selection. Since positive selection is not continuously acting during evolution, this analysis allows the detection of episodic adaptive evolution on single phylogenetic branches. We detected positive selection on many of the internal branches suggesting that subsequent to their duplication different Bep clades have undergone adaptive sequence evolution in both lineages (Figure 4, Figure S6). For Bartonella, experimental studies showed that effector proteins exhibit distinct phenotypic properties on host cells indicating that the evolutionary diversification of the duplicated effectors was substantially driven by the acquisition of novel functions [18][19][20][21]. Not all branches exhibit dn/ds values .1, though, suggesting episodic changes in the selection pressure acting on different effector gene copies. For example, functional redundancy of paralogous effector copies could have resulted in neutral drift, whereas conservation of an advantageous function might have led to purifying selection on certain branches. The basis for the functional versatility seems to lie in the adaptability of the domains encoded by bep genes. In case of the FIC domain, recently published work showed that this domain mediates a new post-translational modification by transferring an AMP moiety onto a target protein [25]. Proteins 'AMPylated' by FIC domains belong to the family of GTPases. The diversity of these targets and their numerous functions in cellular processes might allow the diversified Beps to target and subvert a variety of host cell functions by target-specific 'AMPylation'. The high degree of conservation of the FIC domain in different kingdoms of life provides further evidence for the remarkable versatility of this domain [51]. Interestingly, also the BID domain, constituting part of the translocation signal of the effector proteins, seems to be capable of adopting various functions in the host cell [17]. In case of BepA, it was shown that the BID domain is sufficient to mediate the anti-apoptotic property of this effector protein [18]. BID domains of other Beps with the same domain constitution as BepA do not exhibit this phenotype indicating specific adaptive modulation of this domain for BepA. This functional adaptability might also explain why certain effector genes carry more than one BID domain.
At last, tandem-repeated tyrosine-phosphorylation motifs found in a subset of effector proteins confer another multifaceted molecular mechanism to modulate cellular processes. Phosphorylated effector proteins are thought to recruit cellular binding partners resulting in the formation of signaling scaffolds that interfere with specific host cell signaling pathways [21]. For several effector proteins of Bh (lineage 4), tyrosine phosphorylation by host cells has been reported and the targeted host interaction partners studied [17,21]. Beside Bartonella, a number of other pathogens, as E. coli (EPEC), Helicobacter pylori, or Chlamydia trachomatis are using tyrosine-phosphorylation of effector proteins to modulate their hosts in very distinct ways demonstrating the versatility of this type of host subversion [21]. In Bartonella, the tyrosine-phosphorylated effector proteins seem to display an important functionality of the VirB-mediated host modulation as we found effector proteins of this type in both radiating lineages (see below).

Parallel evolution of Bartonella effector proteins harboring tandem-repeated tyrosine-phosphorylation motifs
Our analyses suggest that the domain structure of the ancestral effector gene consisted of an N-terminal FIC and a C-terminal BID domain (FIC-BID). In both lineages, the FIC-BID domain structure displays the most abundant effector protein type. In lineage 3, only effector genes of Bep clade 9 consist of domain architectures different than FIC-BID ( Figure S5). The gene tree in Figure 4 shows that bep genes with the shortest evolutionary distance across the two lineages are the ones harboring the FIC-BID structure (BepA clade and Bep clade 1). bep genes with different domain architecture constitute more distantly related clades across the two lineages indicating that they derived by independent recombination from the ancestral domain structure. Furthermore, the distantly related Vbh T4SS of Bg and Bs encodes an effector protein consisting of the FIC-BID domain structure [28].
As mentioned above, it was shown for lineage 4 that some of the derived bep genes become phosphorylated by host cell kinases at conserved tandem-repeated tyrosine-phosphorylation motifs leading to the interference with specific host cell pathways [21]. Strikingly, we found that bep genes with derived domain architecture in lineage 3 also harbour regions with tandemrepeated tyrosine motifs ( Figure 5, Figure S5). In silico predictions of tyrosine-phosphorylation sites with three different programs  [52][53][54] consistently revealed a high number of potentially phosphorylated motifs within these repetitive regions ( Figure 5, Table S4). We ectopically expressed these effector proteins in HEK293T cells and showed that they are indeed phosphorylated within eukaryotic cells by tyrosine kinases implicating their functional importance for host interaction ( Figure 5). Interestingly, the motifs found in lineage 3 are clearly different from the ones present in lineage 4 and are also less conserved as depicted by their consensus sequences in Figure 5. This suggests that the motifs in lineage 3 may generally be under weaker purifying selection than in lineage 4, because they target either less conserved or even different pathways in their hosts. Further, the lower degree of conservation and the higher number of motifs per effector found in lineage 3 could also indicate that these proteins and particularly their motifs are under positive selection and evolved more recently than their equivalents in lineage 4.
Together with the fact that tandem-repeated phosphorylation motifs are only found in bep genes with derived domain architecture, our findings, thus, suggest parallel evolution of this class of effector proteins within the two radiating lineages. Whether similar pathways are targeted by these effectors in the two lineages remains unknown. Yet, the striking parallelism in the molecular evolution of this class of effector proteins indicates their central role in the VirB T4SS mediated host modulation by Bartonella.

Conclusions
Emerging infectious diseases are frequently caused by zoonotic pathogens which are incidentally transmitted to humans from their reservoir niche (e.g. other animal hosts). Therefore, the understanding of the mechanisms driving diversification of host-adapted bacteria in nature is of relevance for human health. In the present study, we explored the adaptive diversification of host-restricted bartonellae. Our genome-wide phylogeny revealed that two sister clades of this a-proteobacterial pathogen have evolved by parallel adaptive radiations (lineage 3 and lineage 4 in Figure 1). Both lineages comprise species adapted to same or similar reservoir hosts including zoonotic (e.g. Bh and Br) or human specific (Bq) pathogens. The more recent diversification of lineage 3 including the recently recognized incidental human pathogen Br [29] underlines the importance to study the molecular basis of such lineage diversifications.
In line with the 'ecological' parallelism of their radiations, our comparative genomic analyses between lineage 3 and lineage 4 uncover striking evolutionary parallelisms at the molecular level of a likely key innovation -the VirB T4SS -essentially involved in the infection of the mammalian hosts. Chromosomal fixation of this horizontally transferred trait occurred by independent evolutionary events. In both lineages, the arsenal of effector proteins translocated via the VirB T4SS was shaped independently by gene duplications and positive selection of diversified gene copies. This amplification process mostly occurred before the onset of the radiations. Strikingly, beside the diversification of effector proteins encoding the evolutionary conserved 'AMPylase' domain (FIC), both lineages have convergently evolved a novel effector class with derived domain structure and tandem-repeated tyrosinephosphorylation motifs. By these evolutionary processes, large reservoirs of distinct biological functions were invented from a single ancestral effector gene. This functional versatility provides the framework for the adaptive potential of the VirB T4SS. Apparently, the plasticity of the underlying genomic loci seems to have favored the parallel occurrence of these adaptive processes in two distinct lineages, thereby essentially contributing to the parallel radiations of Bartonella.

Ethics statement
Animals were handled in strict accordance with good animal practice as defined by the relevant European (European standards of welfare for animals in research), national (Information and guidelines for animal experiments and alternative methods, Federal Veterinary Office of Switzerland) and/or local animal welfare bodies. Animal work was approved by the Veterinary Office of the Canton Basel City on June 2003 (licence no. 1741).

Genome sequencing, assembly, and annotation
Using the QIAGEN Genomic DNA Isolation kit (Qiagen), DNA was isolated from bacteria grown from single colonies. For 454-sequencing, the DNA was prepared with an appropriate kit supplied by Roche Applied Science and sequenced on a Roche GS-FLX [56]. To assemble the reads, Newbler standard running parameters with ace file output were used. Newbler assemblies were considerably improved by linking overlapping contigs on the basis of the ''_to'' and ''_from'' information appended to the read name in the ace files. For the assemblies of Bc, BAR15, B1-1C, Br, and Bs, we obtained a 454-sequence coverage of 35x, 37x, 39x, 39x, and 29x, respectively (for details on 454-sequencing see Table  S5). Repeats were identified by analyzing the coverage of each Newbler contig. If the link between two contigs was ambiguous, PCR and long-range PCR were used to confirm contig joins. For the complete assembly of the Bc genome, a library of 35 kb inserts was generated using the CopyControl Fosmid Kit (Epicentre). By end-sequencing of library clones with Sanger technology, 983 high-quality reads were obtained and mapped onto the 454sequencing-based assembly. Remaining sequence gaps were closed by PCR. The final singular contig was fully covered by staggered fosmid clones indicating a correct assembly of the circular chromosome of Bc. Gene predictions of the genome of Bc and the draft genomes of Bs, Br, BAR15, and B1-1 were performed using AMIGene software [57]. Automated functional gene annotation was conducted with the genome annotation system MaGe [58]. For orthologous genes, the annotation was adopted from the manually annotated genome of Bt [16]. Manual validation of the annotation was performed for the virB and bep genes. By using the ''FusionFission'' tool of MaGe [58] fragmented genes were identified and the corresponding sequences subsequently examined for 454-sequencing errors. After correcting these errors, the updated sequences were re-annotated as described above. The sequence data of the genome of Bc and the contigs of the draft genomes of Br, BAR15, B1-1C, and Bs is stored on the web-based interface MaGe (Bartonella2Scope, https://www.genoscope.cns.fr/agc/mage/bartonella2Scope) and has been deposited in the EMBL Nucleotide Sequence Database under accession numbers FN645454-FN645524.

Phylogenetic analyses
Phylogenetic trees were based on nucleotide sequence data. Alignments were generated on protein sequences with ClustalW [59] and back-translated into aligned DNA sequences using MEGA4 [60]. Tree topologies were calculated with maximum likelihood and Bayesian inference methods as implemented in the programs PAUP* [61] and MrBayes [62], respectively. The genome-wide phylogeny of Bartonella was calculated on the basis of 478 orthologous genes of the ten sequenced Bartonella genomes and the genome of Brucella abortus (bv. 1 str. 9-941). Orthologs were determined by using the ''PhyloProfile Synteny'' tool of MaGe [58] with a threshold of 60% protein identity over at least 80% of the length of proteins being directional best hits of each other. The alignments of the 478 identified genes were concatenated resulting in a total of 515,751 aligned nucleotide sites. Tree topology and branch lengths were obtained by maximum likelihood analysis using the HKY85 model. Bootstrap support values were calculated for 100 replicates. For Bayesian inference, the program MrBayes [62] was run for one million iterations with standard parameters (two runs with four heated Monte-Carlo Markov chains in parallel; number of substitutions = 6; burnin = 25%). For the Bartonella ingroup, single gene trees were calculated with maximum likelihood and tree topology congruency assessed with PAUP*. 471 of the 478 single gene trees revealed the same monophyletic clustering of the eight taxa into lineage 3 and lineage 4 as the genome-wide phylogeny. Further, we performed a recombination analysis for each of the 478 single gene alignments using the GARD algorithm as implemented in the HYPHY package [35]. The GARD analysis was run with the GTR model using a general discrete distribution with three rate classes. To identify statistical significant recombination breakpoints in our alignments, we used the Kishino-Hasegawa test as implemented in the GARDProcess.bf algorithm of the HYPHY package. To include nonsequenced Bartonella species in the genome-wide phylogeny, we used available sequence data from the gltA, groEL, ribC, and rpoB genes (7731 aligned sites). Trees were obtained as described above. MrBayes [62] was run for five million iterations. Branch lengths for tip branches of non-sequenced taxa are calculated on the basis of the four housekeeping genes. Branch lengths for tip branches of sequenced taxa and internal branches separating sequenced and non-sequenced taxa are based on the genomic data set. The maximum likelihood tree only based on the gltA, groEL, ribC, and rpoB genes was inferred as described for the genome-wide phylogeny. Bep gene trees were inferred from nucleotide alignments of either the most C-terminal BID domain including the C-terminus (948 sites), the FIC domain including the Nterminal extension (1,305 sites), or the entire bep sequence of genes harboring FIC domains (3,972 sites). To select an appropriate substitution model, the Akaike information criterion of Modeltest 3.7 [63] and MrModeltest 2.0 [64] was used for the maximum likelihood and Bayesian inference analysis, respectively. For the alignments based on the BID domain or the entire bep gene sequence, we obtained the GTR+G+I model with both programs. For the alignments based on the FIC domain, the TVM+I+G model (Modeltest 3.7) and GTR+G+I model (MrModeltest 2.0) were selected. Trees were inferred with the parameters provided by these models as described above. MrBayes [62] was run for one million iterations. The Neighbor-joining phylogeny of different Bartonella isolates in Figure S2 was inferred from a 242 nt segment of the gltA gene with the program MEGA4 [60]. Bootstrap values were calculated for 1,000 replicates.

Natural selection analyses
Based on the four available genomes, orthologous genes for each of the two lineages 3 and 4 were determined by using the ''PhyloProfile Synteny'' tool of MaGe [58]. The threshold was set to 30% protein identity over at least 60% of the length of proteins being directional best hits of each other. The same tool was used to detect genes without orthologs. By comparing these automatically identified orthologs and non-orthologs, genes present in neither of the two lists were detected and manually assigned to one of the two lists. Alignments were generated and a GARD recombination analysis conducted as described above. To obtain the average dn/ ds value (v) of each ortholog, the arithmetic mean of pair-wise dn/ ds values (calculated by the method of Yang and Nielsen implemented in PAML 4.1 [65]) was used. Site tests of positive selection were performed with PAML 4.1 using the CodeML module [65]. To detect positive selection model M1a (Nearly-Neutral) vs. model M2a (PositiveSelection) and model M7 (beta) vs. model M8 (beta+v) were analyzed. PAUP* [61] was used to infer maximum likelihood trees for each set of orthologs. For the CodeML control file, standard parameters were used. The relative significance of model M2a (PositiveSelection) vs. model M1a (NearlyNeutral) and model M8 (beta+v) vs. model M7 (beta) was assessed using likelihood-ratio-tests (two degrees of freedom). Genes for which significant positive selection was detected were inspected for alignment errors potentially affecting the results of this analysis. If necessary, the alignments were manually modified and the CodeML analysis repeated. Phylogenetic branches were tested for positive selection by using the TestBranchDNDS.bf module implemented as standard analysis tool in HyPhy [66].

Rat infections
Ten weeks old female WISTAR rats obtained from RCC-Füllinsdorf were housed in an BSL2-animal facility for two weeks prior to infection allowing acclimatization. For inoculation, bacterial strains were grown as described above, harvested in phosphate-buffered saline (PBS), and diluted to OD 595 = 1. Rats were anesthetized with a 2-3% Isuflurane/O 2 mixture and infected with 10 ml of the bacterial suspension in the dermis of the right ear. Blood samples were taken at the tail vein and immediately mixed with PBS containing 3.8% sodium-citrate to avoid coagulation. After freezing to 270uC and subsequent thawing, undiluted and diluted blood samples were plated on tryptic soy agar and heart infusion agar containing 5% defibrinated sheep-blood. CFUs were counted after 8-12 days of growth.

Nucleotide distances
Nucleotide distances were calculated with the program MEGA4 [60] for the alignments based on the genome-wide dataset and the four housekeeping genes. The numbers of base substitutions per site from averaging over all sequence pairs within and between groups were calculated. Codon positions included were 1 st , 2 nd , and 3 rd . All positions containing gaps and missing data were eliminated from the dataset (Complete deletion option).

Cloning of plasmids for expression of HA-GFP-Bep fusion proteins
To construct the plasmids pPE2002 and pPE2004, bep genes BARCL_1034 (Bc) and BARRO_80017 (Br) were amplified from genomic DNA with primer pairs containing flanking BamHI/NotI sites: prPE453 (ATAAGAATGCGGCCGCGATGAAAAC-CC-ATAACACTCCTG)/prPE454 (CGGG-ATCCTTAATGTGT-TATAACCATCGTTC) and prPE455 (ATAAGAATGCGGC-CGCG-ATGAATTTTGGAGAAAAGAAAAAAATG)/prPE456 (CGGGATCCTTAAATAGC-TACAGCTAACGATTTTTTC), respectively. PCR products were digested with the enzymes BamHI and NotI and ligated into the BamHI/NotI sites of the backbone of plasmid pAP013 (kindly provided by Arto Pulliainen). The resulting constructs pPE2001 (BARCL_1034) and pPE2003 (BARRO_80017) were cut with NotI and ligated with a GFP fragment obtained from NotI digested pAP013. The plasmid pPE2007 was constructed by cutting bepE of B. henselae from plasmid pRO1100 (kindly provided by Rusudan Okujava) with NotI and BamHI and ligating it into pAP013. All plasmid DNA isolations and PCR purifications were performed with Macherey-Nagel and Promega columns according to manufacturer's instructions.

Immunoprecipitation and Western blot analysis of HA-GFP-Bep fusion proteins
The protocol for growth and transfection of HEK293T was performed as described previously [18]. 36 h after transfection, cells were incubated for 10 minutes with 10 ml Pervanadate medium (5 ml PBS containing 100 mM orthovanadate and 200 mM H 2 O 2 , incubated for 10 min with 500 ml Catalase [2 mg/ml in PBS] before 45 ml M199 medium were added). After washing three times with 7 ml of PBS at room temperature, cells were scraped off and resuspended in 1 ml of ice-cold PBS containing 1 mM EDTA, 0.5 mM phenylmethylsulfonyl fluoride (PMSF), 1 mM orthovanadate, 1 mM leupeptin, and 1 mM pepstatin and collected by centrifugation (3,000g at 4uC for 60 sec). The resulting pellet was lysed in 300 ml of ice cold modified RIPA buffer (50 mM Tris-HCl [pH 7.4], 75 mM NaCl, 1 mM EDTA, 1 mM orthovanadate, 1 mM leupeptin, 1 mM pepstatin) for 1 hour at 4uC. The lysate was centrifuged (16,000g at 4uC for 15 min) and 12 ml of anti-HA-agarose (Sigma) added to the supernatant. After 150 min of incubation at 4uC on a slowly turning rotation shaker, the agarose was washed three times with 300 ml of modified RIPA buffer (3,000g for 10 sec). The affinitygel pellet was then resuspended in 20 ml of modified RIPA buffer, 20 ml of SDS-sample buffer (26) were added, and the sample was heated for 5 min at 95uC. Proteins were separated on a 10% SDSpolyacrylamide gel, blotted on a nitrocellulose membrane (Hybond-C Extra, Amersham Pharmacia), and examined for tyrosine phosphorylation by using monoclonal antibody 4G10 (Millipore) and anti-mouse IgG-horseradish peroxidase (HRP) afterwards. The HRP-conjugated antibody was visualized by enhanced chemiluminescence (PerkinElmer). For visualization of the signal from GFP-fusion proteins, the membrane was subsequently incubated in 4% PBS-Tween containing 0.02% NaN 3 and anti-GFP antibody (Invitrogen), followed by incubation with anti-mouse IgG-HRP and visualization by enhanced chemiluminescence. Figure S1 Phylogenetic trees inferred from (A) the nucleotide alignment of 478 genes of ten sequenced Bartonella species and (B) the nucleotide alignment of the four housekeeping genes gltA, rpoB, ribC, and groEL for sequenced and non-sequenced Bartonella species. Brucella abortus (bv. 1 str. 9-941) was used as outgroup. The tree topology and branch lengths were obtained by maximum likelihood analysis (as implemented in PAUP* [61]). Numbers above the branches represent maximum likelihood bootstrap values (100 replicates); numbers below the branches represent posterior probabilities obtained with MrBayes (one million iterations) [62]. Values $80% are shown. The two monophyletic clades harboring the VirB T4SS are marked by the shaded area. The four different lineages of Bartonella are indicated by red, orange, blue, and green color (l1, lineage 1, l2, lineage 2). Found at: doi:10.1371/journal.pgen.1001296.s001 (0.49 MB PDF) Figure S2 Genotype-host correlation of Bartonella isolates. Neighbor-joining tree of 220 Bartonella isolates based on a 242 nt sequence of the gltA gene. Bootstrap values $50% are shown (1000 replicates). Clusters of strains isolated from the same mammalian family, subfamily, or genus are indicated by coloring. For each strain the name, accession number, and host (the strain was isolated from) are given. The typing strains and their described reservoir hosts are depicted by arrows. For Bartonella alsatica and Bq, only one and two isolates, respectively, are included in the tree, as sequence data for additional isolates is not available for the analyzed fragment of the gltA gene. However, for both species, a number of different isolates originating from their corresponding reservoir hosts (Bq, human and B. alsatica, rabbit) were described [68,69]. For B1-1C and BAR15, only one strain was described so far [30,31]. All strains included in the analysis have been isolated and cultured from the bloodstream of their corresponding host [55,. Exceptions display some typing strains which were isolated from incidentally infected humans. Strains sampled from other tissue than blood or only detected by PCR were not included in the analysis. Found at: doi:10.1371/journal.pgen.1001296.s002 (2.62 MB PDF) Figure S3 Experimental infections of rats with Bartonella species of lineage 3. Groups of rats (n = 5) were inoculated intradermally with one of the four different species Bc, Br, BAR15, or B1-1C. At the time points indicated in the graphs, blood was drawn, diluted, and plated for counting of colony-forming units (CFUs). Plotted graphs represent single animals from which bacteria could be recovered at a given time point post-infection. In all five animals infected with the rat-specific strain B1-1C, bacteria were detected between day 8 and 28 post-infection (green graphs). In only three of five animals infected with the red squirrel-specific strain BAR15, bacteria could be detected (red graphs). In these three animals, bacteria came up much later and did not reach the same blood titer as B1-1C. From none of the animals infected with either Br or Bc bacteria could be recovered (threshold of detection #10 2 CFUs/ml). Although Br is closer related to B1-1C than BAR15 (Figure 1), the adaptation to Canoidae hosts seems to have abolished the potential to colonize rodents. In contrast, the closer relationship between the two rodent host species, rat and red squirrel, might explain the limited potential of BAR15 to infect rats. Additionally, a recent study showed that Br is reliably infecting dogs, its natural reservoir host, but neither cats nor guinea pigs [39]. These independent studies confirm the host specificity of Bartonella species.  Figure 4 (Bep clades). BARCL_0629 and BARCL_0635 are excluded from groups, because their phylogenetic positions are not well supported in the inferred gene trees (Figure 4 and Figure S6). Tandem-repeated tyrosine-phosphorylation motifs depicted for Bep clade 9, BepD, BepE, BepF, and BepH clade (green bars) were identified with the program NetPhos2.0 [52] using a threshold of 0.8. For other bep genes, no tandem-repeated motifs were identified. bep genes from different species of lineage 3 are only indicated by their locus_tags (BARCL_, Bc, BARRO_, Br, B11C_, B1-1C, BAR15_, BAR15).    [53], and KinasePhos [54]. Except for BARCL_1035, tandem-repeated motifs were detected in all genes of Bep clade 9. Found at: doi:10.1371/journal.pgen.1001296.s010 (0.14 MB PDF)