Terminal Reassortment Drives the Quantum Evolution of Type III Effectors in Bacterial Pathogens

Many bacterial pathogens employ a type III secretion system to deliver type III secreted effectors (T3SEs) into host cells, where they interact directly with host substrates to modulate defense pathways and promote disease. This interaction creates intense selective pressures on these secreted effectors, necessitating rapid evolution to overcome host surveillance systems and defenses. Using computational and evolutionary approaches, we have identified numerous mosaic and truncated T3SEs among animal and plant pathogens. We propose that these secreted virulence genes have evolved through a shuffling process we have called “terminal reassortment.” In terminal reassortment, existing T3SE termini are mobilized within the genome, creating random genetic fusions that result in chimeric genes. Up to 32% of T3SE families in species with relatively large and well-characterized T3SE repertoires show evidence of terminal reassortment, as compared to only 7% of non-T3SE families. Terminal reassortment may permit the near instantaneous evolution of new T3SEs and appears responsible for major modifications to effector activity and function. Because this process plays a more significant role in the evolution of T3SEs than non-effectors, it provides insight into the evolutionary origins of T3SEs and may also help explain the rapid emergence of new infectious agents.


Introduction
The type III secretion system (T3SS) and the proteins that traverse it are essential components of the virulence arsenal of many destructive bacterial pathogens. Pathogens utilize the T3SS to inject type III secreted effectors (T3SEs) into the host cell cytosol, where they promote disease by facilitating cell attachment and entry, suppressing the host defense response, and modulating vesicular traffic, the host cytoskeleton, and hormones [1][2][3]. Consequently, T3SEs play a prominent role in bacterial pathogenesis and host association [4].
T3SEs are modular proteins, with the signals required to direct secretion from the bacterial cell and translocation into the host cell generally localized to the N terminus of the protein, and the functional domains typically localized to the central and C-terminal portions [5,6]. This modularity has been exploited in genetic screens designed to identify new T3SEs [7][8][9] and for creating reporters to monitor T3SS-dependent secretion [6,10]. Another feature common to the vast majority of T3SEs is that they are co-regulated with the assembly of the T3SS. This is achieved through T3SS-specific transcription factors that bind regulatory motifs found immediately upstream of nearly every T3SE and T3SS structural gene [11]. Motifs include the Pseudomonas hrp box [12], the Salmonella ssrAB box [13], and the Xanthomonas pip box [14]. An important functional consequence of the respective locations of these two T3SE features is that the regulatory motifs required for transcriptional activation of T3SEs are tightly linked to the signals required for secretion and translocation.
Despite several commonalities, T3SEs are evolutionarily diverse and highly variable in their distribution, both within and among species [9]. Their intimate interactions with host factors expose them to very strong selective pressures [15,16], resulting in their rapid evolutionary turnover [17,18]. Given their high degree of genetic diversity, it is perhaps not surprising that a number of studies have identified T3SEs that are truncated or chimeras of other sequences [10,19-22]; nevertheless, no study has recognized this particular form of variation as being significant from the perspective of either evolution or function. Here, we report on an evolutionary process that plays an extremely important role in the evolution of new T3SEs. Unlike previous studies that have reported the introduction of genetic variation through homologous intragenic recombination [23][24][25][26][27][28], we find that the evolution of T3SEs is strongly influenced by a non-homologous recombinational process that is analogous to the exon shuffling seen in eukaryotes. This process not only explains the high frequency of truncated and chimeric T3SEs, but also provides insight into how the particular structure of these genes contributes to dramatic evolutionary and functional changes that may play a key role in the ongoing arms race between pathogen and host.

Results/Discussion
The complete nucleotide and protein sequences of all experimentally confirmed T3SEs were collected from the 23 species having at least one characterized T3SE (Table S1). Using a combination of BLASTP, TBLASTX, and pair-wise BLAST (BL2SEQ), we identified two common and interrelated features among T3SEs from all species. First, the N or C terminus of many T3SEs is homologous (E value < 10⁻⁵) to other loci or open reading frames. We will refer to these homologs as orphan effector termini (ORPHETs). Second, many T3SEs are chimeras of either two known T3SEs, or a T3SE and another gene.
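The E-value screen described above can be sketched as a simple filter over tabular BLAST hits. This is a minimal illustration rather than the pipeline used in the study: it assumes the 12-column tabular layout written by BLAST+ (`-outfmt 6`), and the function name `significant_hits`, the identifiers, and the example rows are all invented for the example.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text

# ASSUMPTION: default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Return hits with an E value below the cutoff, skipping self-hits."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # ignore blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if hit["qseqid"] == hit["sseqid"]:
            continue  # a sequence always matches itself
        if float(hit["evalue"]) < cutoff:
            hits.append(hit)
    return hits


# Hypothetical rows: identifiers and numbers are invented for illustration.
example = io.StringIO(
    "effA\teffA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600\n"
    "effA\torf17\t62.4\t85\t32\t0\t1\t85\t1\t85\t2e-20\t98.6\n"
    "effA\torf99\t31.0\t40\t27\t1\t10\t49\t200\t239\t0.5\t22.1\n"
)
hits = significant_hits(example)
```

Hits that pass this filter would still need the positional checks described in Materials and Methods before being called ORPHETs or chimeras.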
ORPHETs and chimeric T3SEs were found in nine species, representing plant pathogens, animal pathogens, and mutualists. The largest numbers of chimeric T3SEs and ORPHETs were identified in the species with the largest T3SE repertoires, which included Pseudomonas syringae (12 of 56 T3SE families, 21%), Salmonella enterica (5/28, 18%), and Xanthomonas campestris (7/22, 32%) (Table 1). The high frequency of ORPHETs and chimeric effectors has led us to propose that the evolution of new T3SEs frequently occurs by the reassortment of termini from preexisting T3SEs. This stochastic process, which we will refer to as “terminal reassortment,” involves the fusion of an existing T3SE N terminus to another T3SE, or to an unrelated coding or non-coding sequence (Figure 1). Alternatively, terminal reassortment may occur via a large deletion that brings the N terminus of a T3SE into contact with a region downstream of the effector (Figure 1). Terminal reassortment does not describe or rely on a specific recombinational mechanism, but rather describes an evolutionary process that results in the rapid formation of new T3SEs via a single quantum evolutionary step in which the regulatory and secretion/translocation signals are coupled to new sequences.
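The per-species frequencies quoted above follow directly from the family counts. The short sketch below recomputes them from the numbers in the text; the variable names are illustrative, and the pooled value covers only these three species, so it need not equal the weighted mean reported for all species.

```python
# Counts quoted in the text:
# (T3SE families with an ORPHET or chimera, total T3SE families).
counts = {
    "Pseudomonas syringae": (12, 56),
    "Salmonella enterica": (5, 28),
    "Xanthomonas campestris": (7, 22),
}


def percent(part, whole):
    """Express part/whole as a percentage."""
    return 100.0 * part / whole


# Per-species percentage of T3SE families showing terminal reassortment.
fractions = {species: percent(hit, total)
             for species, (hit, total) in counts.items()}

# Pooled percentage across the three species only.
pooled = percent(sum(hit for hit, _ in counts.values()),
                 sum(total for _, total in counts.values()))

for species, value in fractions.items():
    print(f"{species}: {value:.1f}%")
print(f"pooled (three species): {pooled:.1f}%")
```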
P. syringae, which has approximately 190 T3SE homologs and derivatives distributed among 56 T3SE families [10], provides the most interesting examples of ORPHETs and ORPHET-derived T3SEs. For example, one group of related T3SEs in P. syringae contains two ORPHETs, HopW1-2 PmaES4326 and HopW1-2 Pph1448A, and two larger T3SEs, HopW1-1 PmaES4326 and HopAE1 PsyB728a, all of which are homologous along their first 85 amino acids (Figure 2A). The two ORPHETs are clearly related to the N terminus of the larger T3SEs, while the two larger T3SEs are chimeras themselves, with HopW1-1 PmaES4326 sharing C-terminal homology to a prophage-related sequence from Escherichia coli O157:H7 EDL933, and HopAE1 PsyB728a [29] sharing C-terminal homology to several putative Xanthomonas virulence genes. These data suggest that all four of these loci derived from a common N-terminal ORPHET. Indeed, hopW1-1 PmaES4326 is situated near its ORPHET on plasmid pPMA4326B and is flanked by numerous IS elements [30]. All of these related T3SEs except hopW1-1 PmaES4326 also have a highly conserved ribosome-binding site and hrp box. Surprisingly, the upstream region of hopW1-1 PmaES4326 is closely related to the upstream region of a completely unrelated P. syringae pv. tomato DC3000 T3SE, hopD1 (64% nucleotide identity over 300 bp) (Figure 2A). This exchange of promoters is functionally significant, since changes to the hrp box sequence and its relative distance from the start codon have been shown to alter T3SE expression and bacterial virulence [31].
An analysis of the flanking regions of ORPHETs indicates that they are not merely misannotations due to frameshift mutations in “full-length” T3SEs, nor are they incomplete sequence submissions. ORPHETs are homologous to their respective T3SEs and exhibit both conservation in either their 5′ or 3′ flanking regions (Table S2), and variation at the end of their protein-coding region, which is consistent with random mobilization and insertion, or large-scale deletion downstream of the T3SE secretion signal. Furthermore, because a single ORPHET can constitute the N terminus of multiple effectors, as with the HopW1/HopAE1 (Figure 2A) and HopAB3 effector families (Table 1), ORPHETs likely serve as precursors for the formation of new T3SEs.
Xanthomonas provides an excellent example of chimeric T3SEs. T3SEs XopJ [32], AvrXccE1, and hypothetical protein XAC3230 [33] share a common N terminus, but show absolutely no similarity throughout their C terminus (Figure 2B). The 100-bp region immediately upstream of all three effectors shares 70% nucleotide identity and contains a highly conserved ribosome-binding site and pip promoter box.
Seven S. enterica effectors, srfH/sseI, sseJ, sspH1, sspH2, slrP, sifA, and sifB, share a common N terminus [21,34,35] and provide some of the most complex and interesting examples of terminal reassortment. This group comprises four distinct homology groups, with group I containing the full-length homologs SspH1, SspH2, and SlrP; group II containing SseJ; group III containing the relatively divergent full-length homologs SifA and SifB; and group IV containing only SrfH/SseI (Figure 3A). There is no sequence similarity among the C termini of those T3SEs from the different groups. One example of terminal reassortment within this group involves SspH2 and SrfH, which share 88% amino acid identity in their N terminus and 98% identity in the 257 bp immediately upstream of their coding region that includes the ssrAB promoter box (Figure 3B). A second, more complex example was identified within the full-length group I homologs, using the phylogenetic-based method of bootscanning on a nucleotide alignment of sspH1, sspH2, slrP, and the group IV effector sseI/srfH. Bootscanning is a general method for identifying recombination breakpoints based on incongruence among genealogies constructed from subregions within a locus. The bootscan revealed that SspH1 is a recombinant protein composed of an SlrP N terminus and an SspH2 C terminus, with a clear breakpoint near base 400 (Figure 3C). Interestingly, SlrP and SspH1, which share the most similar N terminus, are translocated by both S. enterica T3SSs, SPI-1 and SPI-2, while all others are translocated by only SPI-2 [36]. This suggests the presence of a common chaperone binding domain or motif within the N terminus of these two functionally distinct proteins, in addition to the already identified WEK(I/M)XXFF motif [35], which permits translocation by both SPI-1 and SPI-2.

Synopsis
Many pathogenic bacteria rely on specialized virulence proteins to cause disease. These proteins, known as type III secreted effectors (T3SEs), are directly injected into the host's cells and facilitate the disease process by interacting with host proteins and interfering with the defense response. Although most T3SEs lack any sequence similarity, several T3SEs share a common terminus, suggesting that part of these proteins was derived from the same sequence. The authors propose an evolutionary mechanism, called “terminal reassortment,” in which the termini of T3SEs reassort with other genetic information to create new chimeric proteins. This study shows that this process has given rise to T3SEs with new virulence functions and that it may influence bacterial host specificity. Chimeric T3SEs are present in eight different genera and in some cases are present in as many as 32% of known T3SE families. This is significantly more than what is found in other protein families, suggesting that terminal reassortment plays a disproportionately important role in the evolution of T3SEs. Terminal reassortment may lead to the very rapid evolution of new T3SEs, thereby contributing to the emergence of new infectious diseases.

Terminal reassortment has been demonstrated experimentally in genetic screens designed to identify T3SS-specific substrates [7][8][9]. In these screens, an effector C terminus (reporter) is linked to a transposon, which integrates randomly in the bacterial genome, occasionally creating a functional fusion with an appropriate T3SS regulatory and secretion signal. Using these and related techniques, secretion and translocation into the host have been experimentally demonstrated for many ORPHETs, including HopAT1 [42] and HopS1 [43] from P. syringae, and OspD1 [44] and OspG [45] from Shigella flexneri. As with these contrived screens, the natural evolutionary process of terminal reassortment likely involves mobile genetic elements such as insertion sequences, integrative conjugative elements, phage, and plasmids. Indeed, over 50% of ORPHETs or chimeric T3SEs are associated with mobile elements (Table 1). Many effectors are plasmid-borne, placing them on a genome that is typically in genetic flux and susceptible to frequent rearrangements. Prophage are also implicated in terminal reassortment, illustrated by C-terminal homology of effectors EspFu and HopW1-1 ES4326 to prophage-related genes. Phage may serve as mixing vessels for T3SEs, or provide the novel genetic variation and homologous sequence for recombination. Both phage and plasmids, as well as transposable elements, ultimately provide multiple copies of ORPHETs or T3SE precursors. Consequently, terminal reassortment can occur without loss of the original effector or ORPHET.
Table legend: The ORPHET/Chimera columns indicate the name, origin, size, and linkage to mobile elements for the observed N- or C-terminal ORPHET or chimeric T3SE. Homolog columns indicate the same information for the full-length T3SE related to the specified ORPHET or chimera. Similarity columns indicate the degree of similarity between the specific ORPHET or chimera and the related T3SE. Further information on each ORPHET and chimera is presented in …
An analysis of the relationship between the number of ORPHETs and chimeras, and the number of T3SE homology families identified for each species (Figure 4A) reveals a very strong linear relationship (R² = 0.91, p < 0.0001; Figure 4B), reinforcing the significance of terminal reassortment in the evolution of T3SEs. Approximately one out of every 4.2 T3SE families (weighted mean = 24% of T3SE families) carries an ORPHET or chimeric T3SE. This is significant because it indicates that terminal reassortment is strongly influencing the evolution of T3SEs. Additional support for the importance of terminal reassortment in the evolution of T3SEs is apparent when the frequency of ORPHETs and chimeras in T3SE homology families is compared to the frequency of truncated and chimeric loci among families of non-T3SE proteins. To reduce bias introduced by incomplete T3SE inventories, we conducted this analysis on P. syringae, X. campestris, and S. enterica, which have the largest, best-characterized effector complements. While, on average, 24% of T3SE families have at least one ORPHET or chimera, only 7% of non-T3SE protein families were found to carry a truncated or chimeric locus among the 2,943 protein families of P. syringae (204 gene families, p = 3 × 10⁻⁵, two-tailed chi-squared test), the 2,760 non-T3SE protein families of S. enterica (198 gene families, p = 0.03), and the 2,688 non-T3SE protein families of X. campestris (195 genes, p = 1 × 10⁻⁵). Transporters (18%) and regulators (10%) are among the most highly represented functional groups among the 204 non-T3SE truncated or chimeric loci in P. syringae. Interestingly, proteins that are expected to be under strong selective pressures (e.g., outer membrane proteins, alginate biosynthetic genes, type IV pilus subunits) only represent 4% of these chimeras.
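The comparison between T3SE and non-T3SE families can be illustrated with a 2 × 2 chi-squared test. The sketch below is one plausible reading of the analysis, not a record of the exact procedure: it assumes a Pearson test without continuity correction, takes the T3SE family counts from the per-species figures given earlier (12/56, 5/28, 7/22), and treats the quoted totals (2,943, 2,760, 2,688) as the number of non-T3SE families in each species. Under these assumptions the p-values fall in the same range as those reported in the text.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability of a
    chi-squared distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# ASSUMED layout: (chimeric T3SE families, all T3SE families,
#                  chimeric non-T3SE families, all non-T3SE families)
species_counts = {
    "P. syringae": (12, 56, 204, 2943),
    "S. enterica": (5, 28, 198, 2760),
    "X. campestris": (7, 22, 195, 2688),
}

results = {}
for name, (t_hit, t_all, n_hit, n_all) in species_counts.items():
    results[name] = chi_squared_2x2(t_hit, t_all - t_hit,
                                    n_hit, n_all - n_hit)

for name, (chi2, p) in results.items():
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.2g}")
```

With expected counts as small as those in the T3SE rows, an exact test would be a reasonable alternative; the closed-form version is used here only because it needs nothing beyond the standard library.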
T3SEs are assumed to be predominantly associated with the flexible genome, the part of the genetic complement that varies among strains within a species. Given this, it is possible that terminal reassortment simply occurs more frequently in the flexible genome, and therefore, our observations are a consequence of the genomic context of T3SEs, rather than something inherent to T3SEs themselves. To address this, we determined that 28.3% of all P. syringae coding sequences are absent from one or two of the three sequenced P. syringae genomes (Nahal and Guttman, unpublished data) and are therefore putatively components of the flexible genome. In comparison, only 9.3% of the 204 non-T3SE truncated and chimeric loci were absent from one or two of the three sequenced P. syringae strains. One would expect a larger fraction of the non-T3SE truncated and chimeric loci to be in the flexible genome if terminal reassortment were simply a byproduct of those processes acting on the flexible genome. Furthermore, 69.7% of the 204 non-T3SE truncated or chimeric loci have homologs in Pseudomonas putida, Pseudomonas aeruginosa, and Pseudomonas fluorescens, further supporting the contention that these loci are not components of the flexible genome. Clearly, the processes that are responsible for the evolutionary dynamics of the flexible genome are not strictly responsible for the formation of chimeras and ORPHETs.
Figure 3 legend: Each homology group has been encapsulated in a shade of blue and contains proteins that share full-length homology. The interconnecting lines indicate the degree of N-terminal relatedness, with dashed lines indicating <40% similarity, thin, solid lines 40%-60% similarity, and thick, solid lines >60% similarity. (B) S. enterica effectors sspH2 and srfH share a highly conserved upstream region, ssrAB promoter box, and N-terminal region, while the C terminus of these proteins is non-homologous. (C) Bootscan of the homologous S. enterica effectors sspH1, sspH2, slrP, and N-terminal homolog sseI/srfH to detect recombination, with the nucleotide sequence of sspH1 as the query sequence. sseI/srfH had <35% permuted trees and was omitted from the figure. Bootscanning was performed with Simplot v 3.5.1 using neighbor joining with the F84 model, gapstrip off, and window and step sizes of 200 and 20, respectively. Other members of this effector family were excluded to facilitate optimal nucleotide alignment. DOI: 10.1371/journal.ppat.0020104.g003
We believe that the high frequency of ORPHETs and chimeras among T3SEs is due to three key features that are collectively unique to T3SEs. First and foremost, T3SEs are modular [5,6], with all of the signals required for expression, secretion, and translocation located in close proximity. Second, T3SEs are under intense selective pressures due to their central roles in pathogenesis and host adaptation [16]. Third, T3SEs are commonly associated with mobile elements [40,41], which can act as catalysts for the reassortment of linked loci.
Horizontal gene transfer also likely plays an important role at all levels of the terminal reassortment process. Horizontal gene transfer may be responsible for the supply of precursor ORPHETs and T3SEs and the introduction of novel genetic variation that will eventually serve as the target. Horizontal gene transfer may even lead to the intra- or inter-specific dissemination of the new chimera following terminal reassortment. The association of ORPHETs and T3SEs with multi-copy mobile elements provides a substantial pool of these elements, and as a result, terminal reassortment does not necessarily lead to loss of the original T3SE or ORPHET precursor.
Terminal reassortment may fundamentally affect bacterial virulence and host specificity. For example, the P. syringae T3SEs HopD1 and HopAO1 are N-terminal chimeras that have different biological functions due to their unique C termini. Unlike HopD1, the C terminus of HopAO1 has tyrosine phosphatase activity, which is responsible for the suppression of the innate immune response of Nicotiana benthamiana [22,37]. Since only one of these two related T3SEs has tyrosine phosphatase activity, terminal reassortment has in this case given rise to T3SEs that may act in a host-specific manner. A similar situation is found with the Xanthomonas chimeras XopJ and AvrXccE1. XopJ and other members of the YopJ family have SUMO isopeptidase activity governed by a catalytic triad of histidine, glutamic acid, and cysteine residues [3,38]. This catalytic triad is present in XopJ, but absent from its N-terminal homolog AvrXccE1, making it very likely that these two effectors have different biological functions. Another example comes from the chimeric T3SEs SspH2, SspH1, and SlrP, which have been shown to differentially modulate Salmonella host specificity. SspH1 and SspH2 contribute to Salmonella virulence in calves, while SlrP plays a similar role during infection of mice [35,39]. Since these chimeric T3SEs have differential effects in different hosts, they can be considered host-specificity factors.
Because ORPHETs are often mobilized with their upstream regulatory and binding motifs, virtually any coding or noncoding gene segment can be instantly recruited by an N-terminal secretion domain to serve as part of a new T3SE with a potentially new virulence function, effectively making a novel virulence protein in a single quantum step. The random shuffling of any one ORPHET or T3SE has only a small probability of giving rise to a functional protein that is of use to the bacterium. Nevertheless, over evolutionary time, terminal reassortment will produce a wide spectrum of novel proteins, some of which will confer a selective advantage to their bacterial host. These proteins may persist and even become fixed in the population. This evolutionary process, a bacterial version of Goldschmidt's hopeful monster, has significant implications for pathogenicity and virulence as it enables the near instantaneous evolution of novel genes or altered expression patterns that can permit the bacterium to evade host defenses or exploit new eukaryotic targets. It will be interesting to determine if this remarkably simple and efficient evolutionary process is responsible for some of the dramatic host-specificity shifts that underlie the emergence of new infectious agents.

Materials and Methods
Bioinformatic identification of chimeric effectors. Nucleotide and protein sequences of all known and experimentally confirmed T3SEs were collected from NCBI following extensive literature and database searches. Genera analyzed include Aeromonas, Bartonella, Bordetella, Bradyrhizobium, Brucella, Burkholderia, Citrobacter, Chromobacterium, Chlamydia, Escherichia, Edwardsiella, Haemophilus, Mesorhizobium, Pantoea, Photorhabdus, Pseudomonas, Ralstonia, Rhizobium, Salmonella, Shewanella, Shigella, Sinorhizobium, Sodalis, Vibrio, Xanthomonas, and Yersinia. The NCBI nr database was searched with BLASTN, BLASTP, and TBLASTX to identify candidate chimeras and ORPHETs, with subsequent verification achieved with BL2SEQ and DIALIGN2. A protein was considered an ORPHET if its full-length sequence had >45% similarity to the N or C terminus of a different protein. Similarly, two proteins were considered chimeras if they shared >45% similarity in either terminus. In both cases, similarity was required to begin within 30 amino acids of their start position for N-terminal homologs and 30 amino acids of their stop position for C-terminal homologs. Sequences flanking effectors were extracted from NCBI manually and compared using BL2SEQ and ClustalX 1.83 [46]. Upstream promoter sequences and ribosome-binding sites were identified and extracted manually from available sequence data, except for hopAB3-2 PmaES4326, whose upstream sequence was obtained by inverse PCR (see below). Hrp box promoter sequences for Pseudomonas syringae will be deposited at the PPI site: http://www.pseudomonas-syringae.org.
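The ORPHET and chimera criteria above can be expressed as a small decision rule over a pair-wise alignment. The following is a minimal sketch under stated assumptions, not the procedure actually used: the 45% similarity threshold and the 30-amino-acid edge window come from the text, whereas the 90% coverage cutoff used to decide that an alignment spans the "full-length" sequence, the `Hit` record, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass

MIN_SIMILARITY = 45.0        # percent, from the Methods
EDGE_WINDOW = 30             # amino acids, from the Methods
FULL_LENGTH_COVERAGE = 0.9   # ASSUMPTION: not specified in the text


@dataclass
class Hit:
    """A pair-wise protein alignment with 1-based inclusive coordinates."""
    query_len: int
    subject_len: int
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    similarity: float  # percent similarity over the aligned region


def terminal_side(start, end, length):
    """Return 'N' or 'C' if the aligned interval reaches a terminus, else None."""
    if start <= EDGE_WINDOW:
        return "N"
    if length - end < EDGE_WINDOW:
        return "C"
    return None


def classify(hit):
    """Classify the query relative to the subject as ORPHET, chimera, or neither."""
    if hit.similarity <= MIN_SIMILARITY:
        return ("none", None)
    q_cov = (hit.q_end - hit.q_start + 1) / hit.query_len
    s_cov = (hit.s_end - hit.s_start + 1) / hit.subject_len
    q_side = terminal_side(hit.q_start, hit.q_end, hit.query_len)
    s_side = terminal_side(hit.s_start, hit.s_end, hit.subject_len)
    if q_cov >= FULL_LENGTH_COVERAGE and s_cov >= FULL_LENGTH_COVERAGE:
        return ("full-length homolog", None)
    if q_cov >= FULL_LENGTH_COVERAGE and s_side is not None:
        # The whole query matches one terminus of a longer protein.
        return ("ORPHET", s_side)
    if q_cov < FULL_LENGTH_COVERAGE and q_side is not None and q_side == s_side:
        # Two longer proteins share only one terminus.
        return ("chimera", q_side)
    return ("none", None)


# Hypothetical alignments (lengths and coordinates are invented).
orphet_call = classify(Hit(94, 774, 1, 85, 1, 85, 70.0))
chimera_call = classify(Hit(500, 600, 1, 100, 1, 100, 60.0))
weak_call = classify(Hit(500, 600, 1, 100, 1, 100, 40.0))
```

In practice the coverage cutoff would need tuning, because the text defines an ORPHET by its full-length sequence without stating how much unaligned sequence is tolerated.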
Phylogenetic analyses and bootscanning. Phylogenetic trees were constructed in MEGA3 [47] from alignments generated with ClustalX 1.83. The JTT amino acid substitution model was used with pair-wise deletion and 1,000 bootstrap replicates. Bootscanning was performed with Simplot v 3.5.1 [48] using neighbor joining with the F84 (maximum likelihood) model, gapstrip off, and window and step sizes of 200 and 20, respectively. The inclusion of the group IV Salmonella effectors was required for this analysis as bootscanning requires a minimum of four sequences. Because the other effector groups were highly divergent, they were excluded from the analysis to facilitate optimal nucleotide alignment. A significant recombination event is usually indicated by a percentage of permuted trees over 70%.
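The windowing and thresholding logic of a bootscan can be sketched without the tree-building step. The code below is not Simplot and does not compute bootstrap support; it only shows how windows of 200 positions advanced in steps of 20 tile an alignment, and how a switch in the best-supported parent above the 70% threshold would be read as a candidate breakpoint. The function names and the per-window support values are invented for illustration.

```python
def sliding_windows(alignment_length, window=200, step=20):
    """Yield (start, end) 0-based half-open windows along an alignment."""
    for start in range(0, alignment_length - window + 1, step):
        yield start, start + window


def supported_parent(support, threshold=70.0):
    """Return the best-supported parent if its support exceeds the threshold."""
    parent, value = max(support.items(), key=lambda item: item[1])
    return parent if value > threshold else None


def parent_switches(windows, supports, threshold=70.0):
    """Return (left_midpoint, right_midpoint, parent_a, parent_b) tuples
    wherever consecutive confident windows favor different parents."""
    calls = []
    for (start, end), support in zip(windows, supports):
        parent = supported_parent(support, threshold)
        if parent is not None:
            calls.append(((start + end) // 2, parent))
    switches = []
    for (pos_a, par_a), (pos_b, par_b) in zip(calls, calls[1:]):
        if par_a != par_b:
            switches.append((pos_a, pos_b, par_a, par_b))
    return switches


# Toy input: a 1,000-position alignment with INVENTED support values
# (percentage of permuted trees grouping the query with each parent).
windows = list(sliding_windows(1000))
supports = [
    {"slrP": 90.0, "sspH2": 10.0} if (start + end) // 2 < 400
    else {"slrP": 5.0, "sspH2": 95.0}
    for start, end in windows
]
switches = parent_switches(windows, supports)
print(switches)
```

A real analysis would replace the toy `supports` list with the support values reported by the bootscanning software for each window.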
Inverse PCR and DNA sequencing. Genomic DNA of P. syringae pv. maculicola ES4326 was extracted using the Puregene DNA isolation kit (Gentra Systems, Minneapolis, Minnesota, United States). Approximately 8 µg of genomic DNA was completely digested with BsaAI (New England Biolabs Incorporated, Beverly, Massachusetts, United States) and self-ligated at 16 °C overnight using T4 DNA ligase. The upstream region of hopAB3-2 PmaES4326 was obtained by inverse PCR with 20 ng of self-ligated DNA as template, using the primers 5′-GGCTCTGTTGATACTACCCACCATG-3′ and 5′-GTGCCGCTACCGCCGTGCC-3′ at an annealing temperature of 57 °C. The DNA sequence of the single 420-bp PCR product was obtained with the same primers using a CEQ8000 sequencer (Beckman Coulter Incorporated, Fullerton, California, United States).
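As a quick sanity check on the primers listed above, their length, GC content, and reverse complements can be computed directly from the sequences. This is an illustrative sketch only: the labels `primer_1` and `primer_2` are arbitrary, and no melting-temperature model is applied.

```python
# Primer sequences as given in the text (5' to 3'); the labels are arbitrary.
PRIMERS = {
    "primer_1": "GGCTCTGTTGATACTACCCACCATG",
    "primer_2": "GTGCCGCTACCGCCGTGCC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_count(seq):
    """Count G and C bases in an uppercase DNA sequence."""
    return sum(1 for base in seq if base in "GC")


summary = {}
for name, seq in PRIMERS.items():
    summary[name] = {
        "length": len(seq),
        "gc": gc_count(seq),
        "gc_percent": 100.0 * gc_count(seq) / len(seq),
        "reverse_complement": reverse_complement(seq),
    }

for name, info in summary.items():
    print(name, info["length"], f"{info['gc_percent']:.1f}% GC")
```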