Comparative Genomic and Phenotypic Characterization of Pathogenic and Non-Pathogenic Strains of Xanthomonas arboricola Reveals Insights into the Infection Process of Bacterial Spot Disease of Stone Fruits

Xanthomonas arboricola pv. pruni is the causal agent of bacterial spot disease of stone fruits, a quarantinable pathogen in several areas worldwide, including the European Union. In order to develop efficient control methods for this disease, it is necessary to improve the understanding of the key determinants associated with host restriction, colonization and the development of pathogenesis. After an initial characterization, by multilocus sequence analysis, of 15 strains of X. arboricola isolated from Prunus, one strain did not group into the pathovar pruni or into other pathovars of this species and was therefore identified and defined as an X. arboricola pv. pruni look-alike. This non-pathogenic strain and two typical strains of X. arboricola pv. pruni were selected for a whole genome and phenotype comparative analysis of features associated with the pathogenesis process in Xanthomonas. Comparative analysis among these bacterial strains isolated from Prunus spp., together with 15 publicly available genome sequences from other pathogenic and non-pathogenic strains of X. arboricola, revealed variations in the phenotype associated with variations in the profiles of TonB-dependent transporters, sensors of the two-component regulatory system, methyl accepting chemotaxis proteins, components of the flagella and the type IV pilus, as well as in the repertoire of cell-wall degrading enzymes and the components of the type III secretion system and related effectors. These variations provide a global overview of those mechanisms that could be associated with the development of bacterial spot disease. Additionally, they point out some features that might influence the host specificity and the variable virulence observed in X. arboricola.


Introduction
Xanthomonas arboricola [1] is a species of Gram-negative, rod-shaped bacteria exclusively associated with plants. Most of the strains of this species cause diseases on several herbaceous and woody plants of agricultural interest. Besides these, some other strains have been identified as non-pathogenic, saprophytic or opportunistic pathogens. Based on the host specialization of the pathogenic strains, nine pathovars have been recently proposed [2].
X. arboricola pv. pruni, causal agent of bacterial spot disease on at least 13 species of the genus Prunus, is considered one of the most economically important pathovars within X. arboricola. This pathogen, which is classified as a quarantinable organism in the European Union, causes damage on leaves, fruits, twigs, branches and trunks of the trees [3]. Damage to 25-75% of the peach fruits in orchards in the USA has been reported [4].
Previous molecular typing analysis of a selection of strains isolated from Prunus has demonstrated the low diversity of X. arboricola pv. pruni, which forms a monophyletic group comprising a unique clonal complex regardless of the host, the continent or the year of isolation, corresponding to a pandemic lineage that has remained a monomorphic group during evolution [2,5].
Next generation sequencing approaches have provided valuable information related to the genomics of the genus Xanthomonas, allowing understanding of the role of several pathogenic factors as well as the key determinants of bacterial adaptation and host restriction [6]. Since 2012, genome sequencing projects have started for X. arboricola, generating at least one complete genome and 15 draft genome sequences, comprising meaningful information for understanding pathogenesis in this species and for the improvement of diagnostic tools addressed to disease prevention. Comparative genomic studies in X. arboricola are attracting increased interest in this species and have started to bring important results. For instance, in a study on X. arboricola pv. juglandis, causal agent of bacterial blight on walnut (Juglans spp.), a group of non-pathogenic strains isolated from J. regia were identified and classified as phylogenetically distant from the pathogenic strains. Complete genome analysis of these atypical strains revealed differences from pathogenic strains in features related to the initial steps of bacterial infection, such as those connected to structural components of the flagellar system, non-fimbrial adhesins and chemosensors. Furthermore, the profile of type III effectors was correlated with the capacity to produce disease on walnut [7,8].
Previous studies of plant-pathogen interaction in other models such as X. citri [9,10], X. campestris [11-13] and X. oryzae [14,15] have demonstrated that the study of phenotypic and genotypic features associated with bacterial sensing and attachment, chemotaxis, motility, xanthan production, biofilm organization, the metabolism of carbon sources, as well as secretion of virulence factors, is fundamental for improving the knowledge of the bacterial ability to penetrate the plant tissue and to cause disease in a specific host range of plants.
The general genome features of two X. arboricola pv. pruni pathogenic strains, isolated from almond (Prunus amygdalus, syn. P. dulcis) and Japanese plum (Prunus salicina), as well as those of a non-pathogenic strain isolated from Santa Lucía SL-64 rootstock (Prunus mahaleb), were described in a previous work [16,17]. Herein, our goal was to associate the genomic content of these xanthomonads with the phenotypic features of X. arboricola pv. pruni, which causes bacterial spot of stone fruits. Differences in key aspects associated with bacterial sensing, motility, attachment and secretion of cell-wall degrading enzymes as well as type III effectors were determined in pathogenic and non-pathogenic strains of X. arboricola.

Molecular characterization using multilocus sequence analysis (MLSA), genome based phylogeny and genome comparison
As an initial approach to characterize a group of strains phenotypically similar to Xanthomonas isolated from Prunus in Spain, one Xanthomonas strain isolated from asymptomatic leaves of P. mahaleb (strain CITA 44), and 14 Xanthomonas strains isolated from Prunus spp. with symptoms of bacterial spot disease, were typed at four loci (dnaK, fyuA, gyrB and rpoD) in order to determine their taxonomic position [18] (Table 1).
Analysis of the concatenated sequence (2,858 nucleotide positions) revealed a similarity of 96.86 to 100% within the group encompassed by six X. arboricola reference strains and the 15 strains isolated from Prunus spp. Despite this, strain CITA 44, according to the maximum likelihood analysis, could not be consistently associated with any of the reference strains (Fig 1). The remaining 14 strains isolated from symptomatic hosts were clustered into a group with all the reference strains of X. arboricola pv. pruni. Comparative analysis between these two intra-specific groups revealed a total of 43 nucleotide variations in the concatenated sequence of CITA 44 (11 variable nucleotide sites for dnaK, nine for fyuA, 16 for gyrB and seven for rpoD genes), or a sequence similarity of 98.49% between this strain and all the other X. arboricola isolated from Prunus. These nucleotide changes could be associated with silent mutations, since the translated amino acid sequence of each locus showed 100% sequence coverage and 99-100% sequence identity to several strains of X. arboricola according to the Blastp results. Phylogenetic analysis of the 18 X. arboricola strains, based on the core genome sequence (S1 and S2 Tables), showed that the three strains that cause disease on stone fruits and almond (CITA 33, IVIA 2626.1 and MAFF 301420) clustered according to the pathovar classification, as observed previously using the multilocus sequence analysis (MLSA) approach. In the same manner, strain CITA 44 did not cluster with the other xanthomonads isolated from Prunus or with any of the well-established pathovars described for X. arboricola. Instead, CITA 44 was included in a clade along with the strain 3004 of X. arboricola, recently described as the causal agent of disease on barley (Hordeum vulgare) [19] (Fig 2).
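The similarity value reported for CITA 44 can be checked from the per-locus counts of variable sites. The following is a minimal sketch of that arithmetic, assuming that similarity is the fraction of identical positions over the 2,858-nucleotide concatenated alignment; the variable names are illustrative and are not taken from the study.

```python
# Variable nucleotide sites per locus for CITA 44, as reported in the text.
variable_sites = {"dnaK": 11, "fyuA": 9, "gyrB": 16, "rpoD": 7}
alignment_length = 2858  # concatenated nucleotide positions

# Total number of variable sites across the four loci.
total_variable = sum(variable_sites.values())

# Assumed definition: percent of alignment positions that are identical.
similarity = 100 * (alignment_length - total_variable) / alignment_length

print(total_variable)
print(f"{similarity:.3f}%")
```

Under this assumption the per-locus counts add up to the 43 variations stated above, and the resulting similarity is close to 98.5%, in line with the reported 98.49%.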
Fig 1. Maximum likelihood tree of concatenated nucleotide sequences for partial sequences of the genes dnaK, fyuA, gyrB and rpoD of selected strains of X. arboricola isolated from Prunus. Target strains are in bold. For comparative purposes other X. arboricola strains are included. X. citri subsp. citri strain ICMP 24 was considered as outgroup. Bootstrap values (1,000 replicates) are indicated below and above the branches.
After automatic annotation of the 18 publicly available genome sequences of X. arboricola [7,16,17,19-23], 3,927 protein coding sequences (CDS) were predicted for CITA 44, whereas in the draft genome sequences of the other X. arboricola strains that did not represent a well-established pathovar (strains 3004, CFBP 7634 and CFBP 7651), a total of 5,221 different CDS were determined. For the strains of the pathovars celebensis, juglandis and pruni, 4,485; 5,700 and 5,048 CDS were predicted, respectively (S1 Table). The core genome sequence of the analyzed strains comprised at least 2,525 CDS (S2 Table). CITA 44 shared 3,387 CDS with the three strains of the pathovar pruni (Fig 3A, S1 Fig) and 3,720 CDS with all the strains of X. arboricola (Fig 3B). This strain shared the highest number of CDS with strain 3004 of X. arboricola (3,485 CDS), not classified in any pathovar, whereas the lowest number of CDS was shared with the strain MAFF 301420 of the pathovar pruni (3,181 CDS). Unique CDS for strains CITA 44, CITA 33 and IVIA 2626.1 were predicted. CITA 44 showed 206 exclusive CDS, 55 of which were classified into 18 clusters of orthologous groups (COG) functional categories, predominantly those related to cell motility and carbohydrate transport and metabolism. Signal peptide cleavage sites and transmembrane helices were predicted only in 19 and 33 CDS, respectively. In the case of the X. arboricola pv. pruni strains, 149 unique CDS were found, 54 of which were classified into 20 COG functional categories, with a predominant presence of CDS related to cell wall/membrane biogenesis and amino acid transport and metabolism. Additionally, 17 CDS showed signal peptide cleavage sites and transmembrane helices were predicted in 33 CDS (Fig 3C, S3 Table).
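Core, shared and strain-exclusive CDS counts of this kind can be derived from ortholog assignments with plain set operations. The sketch below illustrates the idea on toy data; the cluster identifiers (`og1`, `og2`, ...) and the helper names `core_genome` and `unique_to` are hypothetical, and the sketch does not reproduce the annotation or orthology pipeline used in the study.

```python
# Hypothetical ortholog-cluster IDs per strain (toy data, not from the study).
cds_by_strain = {
    "CITA 44": {"og1", "og2", "og3", "og7"},
    "CITA 33": {"og1", "og2", "og4", "og5"},
    "IVIA 2626.1": {"og1", "og2", "og4", "og6"},
}


def core_genome(cds_sets):
    """Return the clusters present in every strain."""
    return set.intersection(*cds_sets.values())


def unique_to(strain, cds_sets):
    """Return the clusters found in `strain` and in no other strain."""
    others = set().union(
        *(cds for name, cds in cds_sets.items() if name != strain)
    )
    return cds_sets[strain] - others


core = core_genome(cds_by_strain)
exclusive = unique_to("CITA 44", cds_by_strain)
shared_with_pruni = (
    cds_by_strain["CITA 44"]
    & cds_by_strain["CITA 33"]
    & cds_by_strain["IVIA 2626.1"]
)

print(sorted(core))
print(sorted(exclusive))
print(len(shared_with_pruni))
```

With real data, each set would hold the ortholog clusters assigned to one genome, and the sizes of these sets would correspond to the counts plotted in the Venn diagrams.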

Carbon sources utilization and chemotaxis profile
The profile of carbon sources metabolized by CITA 44 was determined using the BIOLOG GN2 microplate system and compared to that obtained from the strains of the pathovars corylina, juglandis, populi and pruni (Table 1). Seventeen of the 95 carbon sources were utilized by CITA 44, whereas 45 carbon sources were not utilized and 33 showed variable reactions and, consequently, were considered not informative. On the other hand, the profile observed in 20 strains isolated from Prunus and identified as X. arboricola pv. pruni, as well as the patterns observed in the remaining strains of X. arboricola, differed from that of CITA 44, as represented in the dendrogram obtained from the similarity analysis (Fig 4). All the strains of the pathovar pruni, unlike CITA 44, were able to metabolize dextrin and proline and unable to metabolize D-saccharic acid.
In addition to the phenotypic variants observed in the features mentioned above, genome comparison also revealed variants in the profile of the environmental receptors studied within this species. X. arboricola strains harbored 14 of the 28 TonB-dependent transporters (TBDTs) analyzed [25] (S4 Table), and of these, only two homologs to the TBDTs encoded by the loci XAC3620 (outer membrane receptor FepA) and XCC3595 (ferric pseudobactin receptor), described in X. citri and X. campestris, respectively, were shared by all the X. arboricola strains. All strains isolated from Prunus spp., both pathogenic and non-pathogenic, harbored eight TBDTs. The presence of homologous sequences to the TonB-dependent receptor loci XCC1719 and XAC3077, and the absence of XCC0304 and XCC2867 in the strains of X. arboricola pv. pruni, differentiated them from the non-pathogenic strain CITA 44. The distribution of the TBDTs revealed that none of these variations was unique to those strains that inhabit the Prunus hosts. Nevertheless, the TBDT cirA (XAC3077) was found only in those strains that caused disease on stone fruits and Turkish hazel (Corylus colurna).
Regarding the distribution of sensors of the two-component regulatory system (STCRS) [25], 58 orthologous CDS of 86 genes previously described in Xanthomonas were found in X. arboricola. All the analyzed strains shared 44 STCRS (S4 Table), and CITA 33, CITA 44 and IVIA 2626.1 had 53 STCRS in common. A CDS homologous to a diguanylate cyclase of X. citri (XAC2804) was only present in the pathogenic strains, whereas homologues of four STCRS (XAC1819, XAC2804, XAC0136 and XAC1345) were only found in CITA 44. Additionally, as compared to other X. arboricola strains, CITA 33 and IVIA 2626.1 presented two unique STCRS homologous to the loci XAC0136 and XAC1345 of X. citri, respectively (Fig 5).
Out of 26 methyl-accepting chemotaxis proteins (MCPs) previously described [25], 22 were found in the genome sequences of X. arboricola (S4 Table). The number of MCPs varied from 16 in strain CFBP 7634 to 22 in strains 3004, CFBP 7651 and NCPPB 1630; all the X. arboricola strains shared an MCP pattern composed of 13 genes (S4 Table). Prunus-associated strains shared 20 MCPs; pathogenic strains differed from CITA 44 in the presence of a CDS orthologous to the MCP XCV1933 and the absence of the MCP XCV1938, both described in X. campestris pv. vesicatoria. No MCPs could be detected as specific for a particular host plant; however, orthologous CDS for three MCPs described in X. citri and X. campestris pv. vesicatoria (XAC3768, XCV1952 and XCV1938) were absent only in the strain that causes disease on Turkish hazel (S4 Table).
X. arboricola genome sequences harbored 15 to 17 genes homologous to the chemotactic related che genes [25] (S4 Table). Pathogenic strains of Prunus had 17 homologous CDS for these genes, and CITA 44 had 16 CDS. CITA 44 did not show homologous CDS to the locus XAC2447 (cheW) described in X. citri, which was present in all the remaining genome sequences of X. arboricola (Fig 5).
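The receptor differences between the pathovar pruni strains and CITA 44 amount to a presence/absence comparison per locus. The sketch below encodes only the TBDT, MCP and che loci named in the text for these two groups; the dictionary layout and variable names are illustrative assumptions and do not represent the structure of S4 Table.

```python
# Presence (True) or absence (False) of homologous CDS, as stated in the text,
# for pathovar pruni strains versus the non-pathogenic strain CITA 44.
presence = {
    "XCC1719": {"family": "TBDT", "pruni": True, "CITA 44": False},
    "XAC3077": {"family": "TBDT", "pruni": True, "CITA 44": False},
    "XCC0304": {"family": "TBDT", "pruni": False, "CITA 44": True},
    "XCC2867": {"family": "TBDT", "pruni": False, "CITA 44": True},
    "XCV1933": {"family": "MCP", "pruni": True, "CITA 44": False},
    "XCV1938": {"family": "MCP", "pruni": False, "CITA 44": True},
    "XAC2447": {"family": "che", "pruni": True, "CITA 44": False},
}

# Loci found in the pathogenic strains but not in CITA 44, and vice versa.
pruni_only = sorted(
    locus for locus, p in presence.items() if p["pruni"] and not p["CITA 44"]
)
cita44_only = sorted(
    locus for locus, p in presence.items() if p["CITA 44"] and not p["pruni"]
)

print(pruni_only)
print(cita44_only)
```

The same comparison extends to more strains by adding one column per genome and filtering on any group of interest.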
Flagella, fimbrial and non-fimbrial adhesins associated with motility and attachment in X. arboricola
Swarming motility was evaluated in all the 21 X. arboricola strains isolated from Prunus (Table 3). Two kinds of swarmer colony phenotype were observed. A dendritic pattern, encompassed by several tendrils that extended away from a central colony, was observed in CITA 44 as well as in 13 other strains of the pathovar pruni after 12-15 hours post inoculation (hpi). Despite this, CITA 44 showed some unique characteristics, such as the presence of shorter and wider tendrils in the swarming colony (Fig 6A). The second colony morphology was characterized by a circular shape, shown by seven strains of X. arboricola pv. pruni (Fig 6A). Light and electron microscopy observations ruled out the existence of hyper-flagellated bacterial cells in the swarming colony for all strains, including CITA 44 (S2 Fig).
Figure caption (beginning missing): [26], using the genome sequence of X. arboricola pv. juglandis Xaj 417 as the reference. The circular map was constructed using CGview. From outside to center: CDS on forward strand, CDS on reverse strand, GC content and GC skew.
Surfactant activity was evaluated for the 21 strains mentioned above according to the atomized oil assay [27]. A bright halo, associated with a change in the surface tension, was observed around the colonies of all the strains tested. Surfactant production, estimated as the ratio between the radius of the bright halo and the area of the colony, was variable among strains (Table 3). CITA 44 showed a mean ratio of 3.69 ± 0.54 (mean ± standard deviation), whereas strains of the pathovar pruni showed a mean ratio of 3.94 ± 2.62. Strains IVIA 3162 and IVIA 3847-1 showed mean ratios of 0.52 ± 0.10 and 11.98 ± 0.89, the lowest and the highest activity identified, respectively (Table 3).
The swimming ability of these strains was also assayed on 0.3% agar MMA plates. After inoculation, all the strains showed a circular and turbid growth around the inoculating point (Fig 6B). This activity, recorded as the radius of the bacterial colony, was variable among the strains (Table 3). CITA 44 presented the highest activity with an average of the radius of the halo of 2.42 ± 0.10, meanwhile the mean for pathogenic strains was 0.65 ± 0.46, with a maximum value of 1.25 ± 0.05 for CFBP 5724 and a minimum value of 0.30 ± 0.05 for IVIA 2626.1.
Table 3 footnotes: α Mean ± SD of the area (cm²) covered by the swarmer colony on PYM 0.5% agar plates after 24 hpi. * Dendritic swarming phenotypes shown by the analyzed strains. ° Circular swarming phenotypes shown by the analyzed strains. β Mean ± SD of the ratio between the radius of the bright halo and the area of the colony observed 24 hpi after atomizing mineral oil over the bacterial colony.
Twitching motility, which is carried out by the type IV pilus [28], was observed in all the strains on the plastic surface of the culture plate, after crystal violet staining (Fig 6C). Microscopic observation at the edge of the colony revealed a twitching zone composed of bacterial rafts moving away from the colony, creating a concentric pattern preceded by a lattice-like network and microcolonies.
Genome analysis performed to characterize the presence of the major structural components of the two motility and adherence structures described above [25,29] revealed that 33 (X. arboricola 3004 and X. arboricola pv. corylina) to 35 (X. arboricola pv. pruni and X. arboricola CFBP 7634 and CFBP 7651) flagellar components were found in X. arboricola. The genomes in this species shared 29 CDS (S4 Table). The strains isolated from Prunus presented 34 (CITA 44) and 35 (CITA 33 and IVIA 2626.1) structural components; CITA 44 did not show a homologous sequence to fliD (XAC1974), which is present in all the other analyzed strains (Fig 5, S4 Table). Despite the similar pattern of flagellar components observed among the X. arboricola strains, variants in the amino acid sequence of the flagellin protein were observed between pathogenic strains (CFBP 7179, CITA 33, IVIA 2626.1, MAFF 301420 and NCCB 100457), which showed the identical protein WP_039814449.1, and the non-pathogenic or low-pathogenic strains (3004, CFBP 7634, CFBP 7651, NCPPB 1630 and CITA 44), which showed the identical protein WP_024939608.1. Both proteins showed 354 identical amino acids of a total of 399 (pairwise identity of 88.7%). A remarkable difference, associated with the potential to infect in other xanthomonads, is the substitution of aspartic acid by valine at amino acid position 43 of the N-terminal region of the flagellin protein in the pathovars corylina, juglandis and pruni (S3 Fig).
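The pairwise identity quoted for the two flagellin variants follows from the counts of identical residues. A minimal sketch is given below, assuming identity is computed as identical positions over the aligned length; the function name `pairwise_identity` and the short peptide fragments are hypothetical and are not the real flagellin sequences.

```python
def pairwise_identity(seq_a, seq_b):
    """Percent of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)


# Reported counts for the two flagellin variants: 354 identical residues of 399.
reported_identity = 100 * 354 / 399
print(round(reported_identity, 1))

# Toy aligned fragments differing at a single position (hypothetical sequences).
toy_identity = pairwise_identity("MAQVINTNV", "MAQVINTND")
print(round(toy_identity, 1))
```

The reported counts give a value that rounds to the 88.7% stated above.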
Regarding the structural and regulatory components of the type IV pilus, 22 out of 31 CDS homologous to the components described in X. citri subsp. citri strain 306 [30] were found in the nine genome sequences of X. arboricola; 16 of these CDS were shared by all the strains (S4 Table). The other Prunus-associated strains presented the same components, including CITA 44. The remaining strains of X. arboricola showed a similar pattern, with the exception of NCPPB 1630, CFBP 7634 and CFBP 7651, which did not have a homologous sequence to the locus XAC0259 of X. citri. No sequence associated with pilF was observed in strain CFBP 7634 and no homologue of pilQ was found in strain 3004 (S4 Table).
Regarding the presence of the genes associated with the type IV pilus that are present in most of the Xanthomonas species, the cluster pilB, C, D, R, S was found in all the strains, as well as the cluster that encodes the minor pilins, pilE, V, W, X, Y1 and fimT, which showed a sequence identity lower than 80.0%. Presence of pilE, pilV, pilY1 and fimT was variable among the strains, whereas pilW and pilX were found in all the genome sequences. Prepilin pilA was found in all the strains of X. arboricola, although its identity to the locus XAC3505 of X. citri was lower than 80.0% (S4 Table).
In addition to the fimbrial adhesins described above, the repertoire of non-fimbrial adhesins [25] of X. arboricola comprised five genes (S4 Table). All the strains shared homologous CDS to panB (XAC1816), yapH (XAC2151) and xadA (XCV3670) of X. citri and X. campestris pv. vesicatoria. Prunus-pathogenic strains harbored an adhesin homologous to the locus XAC3672 of X. citri which was absent in CITA 44 (Fig 5; S4 Table). Finally, the hemagglutinin encoded by the locus XAC444 of X. citri was found only in those strains that were able to colonize hosts of the genera Corylus, Juglans and Prunus.

Pathogenicity tests and genomic components associated with late stages of infection in Prunus-associated strains
A detached leaf assay was carried out by inoculating 21 X. arboricola strains, isolated from Prunus spp., on almond (cv. Ferraduel), apricot (cv. Canino), peach (cv. Calanda) and European plum (cv. Golden Japan). CITA 44 did not cause bacterial spot disease symptoms on almond, apricot, peach or plum 28 days post inoculation (dpi). Significant differences (p < 0.05) in virulence among strains of the pathovar pruni were shown (Table 4). These strains were able to induce disease symptoms on almond, peach and European plum but none of them caused clear disease symptoms on apricot. Representative results of this assay are shown in Fig 7.
The secretory pathway components and plant cell wall-degrading enzymes were also analyzed in X. arboricola genomes (S4 Table). Regarding the encoding elements for the two type II secretory systems (T2SS) described in Xanthomonas [31,32], all strains harbored the 11 genes of the xps cluster but, in the case of the xcs cluster, only CDS homologous to xcsD, xcsE, xcsF, xcsG and xcsJ were found. The remaining seven components of this secretory system were automatically annotated, but amino acid sequence analysis showed for all of them an identity lower than 80% when compared with the xcs cluster described in X. campestris pv. campestris [32].
Variability in the profile of type II secreted virulence factors was found in X. arboricola (S4 Table). A total of 11 pectolytic enzymes [13,33] were found in this species and only the polygalacturonase, encoded by the locus XCC3459, and the rhamnogalacturonan acetylesterase, encoded by XCC0154 in X. campestris, were shared by all the strains. The strains isolated from Prunus shared, in addition, one pectate lyase (XCC2815), one polygalacturonase (XCC2266) and one rhamnogalacturonase (XAC3505). CITA 44 had homologous sequences to a pectate lyase (XCC0112), a pectin methylesterase (XCC0121) and a pectin methylesterase (XCC2265) that were absent in strains CITA 33 and IVIA 2626.1, whereas these two pathogenic strains had a homologous sequence to the degenerated pectate lyase of X. citri (XAC2373) which was absent in CITA 44 (Fig 5).
Variation in the profile of cellulolytic enzymes [13] was also found in X. arboricola; 14 of these degrading enzymes were present in this species but only nine of them were shared by all the strains. None of the enzymes was unique to those strains isolated from Prunus spp. Pathogenic strains differed from CITA 44 in the presence of a homologous CDS to a beta-glucosidase (XCC1775), and in the absence of the cellulases encoded by the loci XAC3516 and XCC2387, which were present only in CITA 44 (Fig 5, S4 Table).
Moreover, 11 CDS homologous to hemicellulolytic enzymes, previously reported for Xanthomonas [13], were found in X. arboricola and eight of these were shared by all the strains (S4 Table). CITA 33 and IVIA 2626.1 presented a xylosidase which was absent in CITA 44 (Fig 5, S4 Table). Those strains of X. arboricola which belong to the pathovars corylina, juglandis and pruni harbored a xylosidase/arabinosidase enzyme which was absent in all the non-pathogenic strains of X. arboricola and in those strains with a lower pathogenic ability, such as the one described from the pathovar celebensis. Finally, the lipase virulence factor, LipA [34], was found in all the genomes analyzed. The gum gene cluster, associated with the biosynthesis of xanthan in Xanthomonas [35], was analyzed. None of the strains presented homologous sequences to gumG, and CITA 44 and 3004 did not show homologous sequences to gumF (S4 Table).

Discussion
MLSA has been shown to be useful to characterize strains of X. arboricola [2,5,18,45,46]. In this work, we utilized an MLSA scheme proposed by Young and collaborators [18], which is based on the analysis of partial sequences of the genes dnaK, fyuA, gyrB and rpoD. Here, we have concluded that for 14 X. arboricola strains isolated from Prunus, the selected MLSA scheme was a good approach to characterize and discriminate the members of the pathovar pruni from atypical or commensal strains of X. arboricola.
Classification of CITA 44 as an atypical strain of X. arboricola was corroborated by the phylogenetic analysis conducted with 2,525 genes identified as the components of the core genome of this species. This atypical strain isolated from Prunus did not present the same phylogenetic origin as the pathogenic strains classified as pathovar pruni, but it was more similar to strain 3004 of X. arboricola, which did not cluster within any of the well-established pathovars of this taxon. Even though CITA 44 was closely related to 3004, this strain did not produce symptoms on inoculated barley in assays performed in our group. This result is similar to a previous work on X. arboricola strains isolated from walnut that were more similar to strains isolated from Musa sp. than to those of the pathovar juglandis isolated from walnut [7].
Moreover, the study of these atypical strains, found on barley, walnut and here in P. mahaleb, opens the discussion about the origin and evolution of pathogenicity in this species, as has been proposed previously for the genus Xanthomonas [43]. Based on the current data, it is not possible to determine whether these non-pathogenic strains were predecessors of the pathogenic groups or the result of the loss of their pathogenicity. Further exploratory studies regarding other Prunus-associated strains with other phenotypic and genotypic variants are needed to elucidate the evolutionary process of pathogenicity in this bacterial species.
In addition, this study contributes to our understanding of the diversity of X. arboricola associated with stone fruit trees and almond. Moreover, it also provides information to develop tools to identify and discriminate pathogenic and non-pathogenic Xanthomonas strains found in Prunus spp. This precise characterization of atypical strains of X. arboricola will avoid bacterial identification mistakes, as occurred sometimes in other bacterial models which implied unnecessary control measures and resulted in high economic losses [47,48].
The phylogenetic variation among strains of X. arboricola isolated from Prunus concurred with the existence of differences in several phenotypic features including pathogenicity. Additionally, genomic information generated from xanthomonads isolated from Prunus [16,17] and from other X. arboricola [7,19-23] has been used to reveal those features which could be involved in the disease process.
Carbon source utilization profiles, as expected [1,2,5], showed high homogeneity among strains of the pathovar pruni. Nevertheless, the profile of carbon sources utilization shown by CITA 44 was different to the one presented by the other strains as well as the one provided in the description of X. arboricola species [1]. This disparity from the original metabolic description of the species has been also described for some strains of other Xanthomonas species, such as X. vesicatoria [49] and X. campestris pv. campestris [50]. The discordant profile of CITA 44 could be due to the fact that the original description of the species X. arboricola was based on strains of seven pathogenic pathovars which presented a high intra-pathovar homogeneity [1,2] and did not consider a wider ecological diversity including the atypical non-pathogenic strains currently of great research interest.
Initial stages of bacterial adaptation and host colonization encompass a series of sensors and receptors that detect stimuli, providing the cells with information about their biotic and abiotic environment, which triggers a series of processes such as cell motility, chemotaxis, quorum sensing, biofilm formation and many other cellular events [15,51]. At this stage, a group of protein complexes denominated TBDTs, STCRS and MCPs play a crucial role [25].
The repertoire of TBDTs in X. arboricola, which are bacterial outer membrane proteins associated with the transport of different substrates including carbohydrates [52], was extensive compared to other species of this genus [25,33], and was similar to the one observed in X. campestris and other epiphytic xanthomonads [25]. Additionally, this repertoire is in accordance with the one previously observed in two not publicly available genome sequences of X. arboricola pv. fragariae (strains LMG 19145 and LMG 19146) and one of X. arboricola pv. pruni (strain LMG 25862) [33]. This high number of TBDTs has been associated in other xanthomonads with the need for carbohydrate scavenging in the variable conditions encountered in epiphytic niches [53].
Bacterial sensing and chemotaxis are essential components of the initial infection processes [54]. Regarding STCRS proteins, which are part of the dominant molecular mechanism by which unicellular organisms respond to environmental stimuli, the large repertoire observed in Xanthomonas arboricola was similar to that observed in X. campestris [25], and it was in accordance with the number of STCRS observed in most of the complete genome sequences of Xanthomonas, with the exception of those species which inhabit restricted niches such as X. albilineans [15,51]. Evaluation of TBDTs and STCRS content revealed differences between CITA 44 and X. arboricola pv. pruni strains, despite the general low variability observed among the different X. arboricola strains evaluated.
Besides this, CITA 44, compared to strains of the pathovar pruni, also showed a distinct chemotactic pattern, being influenced by just a few compounds. This kind of intraspecific chemotactic variability has also been observed in other xanthomonad models such as X. citri subsp. citri strains with dissimilar pathogenic ability [24].
In order to explain these chemotactic behaviors, MCP content was evaluated. MCPs sense beneficial or toxic environmental compounds and transduce the signal to the cytoplasm via the CheW protein, causing changes in the flagella direction and rotation speed, which finally guides bacterial cells to favorable environments [54,55]. Here, we have found that the core repertoire of the MCPs in nine genome sequences of X. arboricola included at least 13 chemoreceptors. Differences in the MCPs content were observed among the genome-sequenced strains and could be associated with their host range. Differences in MCPs content were found between CITA 44 and other xanthomonads from Prunus spp. Moreover, a variation in the cheW locus was also identified.
Although functional analyses are necessary to confirm the role of these sensors at the initial stages of the infection process, our results indicate that these processes and the genes involved may underlie the diverse behavior of the different strains. Moreover, the very initial stages of the bacterial-host interaction described above could trigger other molecular routes which involve components such as the flagella and the type IV pilus, which are related not only to motility in liquid or on solid environments, but also to attachment to the host and the development of biofilm structures [30,56,57]. Motility on solid and semisolid surfaces, which is controlled by flagella or the type IV pilus, was also variable among the assayed strains of X. arboricola isolated from Prunus. This variability among strains in swimming, swarming and twitching motility has been shown in other xanthomonads such as X. citri [58], as well as in X. arboricola strains isolated from walnut [7]. Strain CITA 44 showed the highest ability to swim, which may be connected to a more restricted niche for survival and higher requirements to locate it. Similarly, an enhanced ability to swim has been described in other bacterial models such as X. citri subsp. citri, in which less virulent strains were described as having a higher swimming ability [58]. In relation to surface motility in X. arboricola, two different colony phenotypes were observed; the circular one matched the one observed for other species such as X. oryzae and X. citri [59,60], described previously as independent of flagella and defined as sliding type motility instead of swarming [57]. In contrast, some other strains showed dendritic swarmer colonies, which had not been previously described in Xanthomonas but have been confirmed in other bacterial species such as Pseudomonas aeruginosa as a true swarming type motility [61].
In addition, these strains showed other swarming-related features such as the production of surfactants and a rapid outward migration [62]. Again, CITA 44 showed a different pattern to that of X. arboricola pv. pruni, presenting a colony with an intermediate phenotype between dendritic and circular and also an intermediate surfactant production.
Genome analysis of the flagellum components in CITA 44 revealed an interesting amino acid substitution in the N-terminal region of the flagellin that was also found in other non-pathogenic xanthomonads. Previously, Cesbron and collaborators [7] reported an amino acid polymorphism (Asp-43/Val-43) in the flagellin domain flg22 among pathogenic and non-pathogenic strains of X. arboricola isolated from walnut. Here we report that this variation is not only present in X. arboricola pv. juglandis strains but is also found in the other two major pathogenic pathovars of this species, corylina and pruni. In X. campestris, strains with Val-43 in flg22 were not detected by the flagellin sensing 2 kinase (FLS2) of Arabidopsis and therefore did not elicit pathogen-associated molecular pattern (PAMP)-triggered immunity; additionally, these strains were more virulent than those with an Asp-43 residue in flg22 [63]. The genomic analysis also revealed the absence of fliD in CITA 44. In several bacterial species the absence of fliD has been associated with non-flagellated and non-motile cells [64,65]. This is not the case for CITA 44, which showed a single polar flagellum like X. arboricola pv. pruni strains [17].
The type IV pilus is an important structure related to movement across surfaces, adhesion, microcolony formation, and secretion of proteases and colonization factors, and is a key pathogenesis factor [66]. In general, strains isolated from Prunus, regardless of their pathogenicity, harbor all four subcomplexes that permit the biogenesis and function of this structure [30]; additionally, these components were shown to be functional in the twitching motility assay.
Like the type IV pilus, non-fimbrial adhesins are involved in bacterial attachment to host surfaces; in X. oryzae, they play an important role in pathogenesis and are particularly involved at the initial stage of leaf attachment and penetration into the host [67,68]. X. arboricola shows various combinations of adhesins among the different strains studied. This variability could be associated with the different compatible bacteria-plant interactions of the different strains and hosts [69]. Once more, CITA 44 lacked one of the genes involved in adhesin synthesis that was present in X. arboricola pv. pruni. Further functional studies along these lines could help to define the role of these adhesins in the pathogenicity and host specificity observed in the pathovars of X. arboricola.
After the very initial stages of pathogenesis described above, some virulence genes related to host colonization, multiplication and development of symptoms are expressed. The T2SS permits the export of proteins from the bacterial cell and is involved in the translocation of degradative enzymes that cause damage to host cells and tissues [70]. Most species of Xanthomonas encode the components of two T2SSs. One of them is Xps, which contributes to bacterial virulence through the secretion of xylanases and proteases in X. campestris and X. oryzae [32]. In X. arboricola, all the components of the xps gene cluster were conserved. As in all Xanthomonas species, X. arboricola showed a core repertoire of cell wall degrading enzymes, formed by at least 30 CDS. Besides this, high variation in the repertoire of these enzymes was also found among the analyzed strains. These variations could be related to the different cell wall composition of the hosts or tissues and to the requirements for producing symptoms during infection [69]. Differences in the profile of degrading enzymes were identified between the Prunus-pathogenic strains and other strains of X. arboricola, including CITA 44.
Another gene cluster associated with pathogenesis in Xanthomonas is the xanthan-producing gum gene cluster, which is composed of 12 genes. Deletion of this group of genes inhibits disease development by X. campestris on Arabidopsis and Nicotiana [71]. In X. arboricola, this cluster was conserved, with the exception of gumG, which was absent in all the sequenced strains, and of gumF, which encodes an O-acetyltransferase and was lacking in strains CITA 44 and 3004. Nevertheless, the absence of these two genes has been demonstrated not to affect xanthan polymerization [72].
The T3SS, as well as the T3Es, plays an essential role in pathogenicity once bacteria have penetrated the host tissue and could be related to host specificity in Xanthomonas [37]. As in other Xanthomonas species, all the X. arboricola strains harbored CDS orthologous to the genes that encode HpaR2, HpaS, HrpG and HrpX, which are involved in the regulation of the T3SS and T3Es as well as other pathogenicity factors such as those encoded by the pehA and pehD genes [43]. This was also previously reported for the strains isolated from walnut as well as for strains 3004, CITA 44 and IVIA 2626.1 [7,17,43].
It is important to note the localization in X. arboricola of an ortholog of the T3E called xopAQ in the plasmid pXap41, next to two other virulence-associated proteins, xopE3 and mltB [44]. This effector has previously been found in the chromosome of X. citri and X. gardneri, and a similar sequence has been found in Ralstonia solanacearum [40,73]. Nucleotide sequence analysis revealed a high pairwise nucleotide identity (99.7%) to the xopAQ sequence of X. citri subsp. citri Aw 12879, as well as a putative plant-inducible promoter box (PIP-box) sequence 67 bp upstream of the start codon, although this conserved sequence associated with the HrpX regulon showed a variant in one nucleotide (S4 Fig). The presence of at least three T3Es in pXap41 reinforces the hypothesis that this plasmid may contribute to the virulence of this pathovar as well as to the specialization of the pathovar pruni towards hosts of the genus Prunus [44]. CITA 44 did not show xopAQ or any of the T3Es found, which is consistent with the absence of pXap41 and its non-pathogenic character. Currently, functional studies on the effect of pXap41 and its related T3Es are being conducted in our group to confirm their role in the pathogenicity and host specificity of the pathovar pruni strains.
In conclusion, this study of strains with a diverse virulence range, and particularly of strain CITA 44, has provided an opportunity to elucidate essential mechanisms in the host-bacteria interaction in Xanthomonas arboricola pv. pruni. In addition, it is useful for understanding the evolution of pathogenicity in this species. Furthermore, the comparative analysis reported here highlights several differences in key genotypic and phenotypic features of the initial and late stages of the infection process. The different TBDT, STCRS and MCP profiles, the genomic variation in the flagellar components, the type IV pilus and the repertoire of cell-wall degrading enzymes, and the alteration in the main virulence factors provide information that helps to explain the phenotypic differences observed between pathogenic and non-pathogenic strains. Although the work presented here gives a global overview of the mechanisms involved in virulence, further functional studies are needed to test and improve the understanding of the role of all the virulence-related features mentioned here.

Multi Locus Sequence Analysis (MLSA)
Bacterial DNA was extracted from cultures in LB broth obtained after 24 h incubation at 27°C, using a QIAamp DNA Mini Kit according to the manufacturer's instructions (Qiagen, Barcelona, Spain). DNA was used for PCR or stored at -20°C until further use. Degenerate primers, previously shown to be useful in MLSA of Xanthomonas [18,46], were used for PCR amplification of partial sequences of the housekeeping genes dnaK, fyuA, gyrB and rpoD. PCR amplifications were carried out in a 50 μL volume containing 1X PCR buffer (10 mM Tris-HCl, 50 mM KCl, 0.1% Triton X-100 [pH 9.0]); 0.2 μM of each primer; 1.25 U Taq DNA polymerase (Biotools, Madrid, Spain); 0.2 mM each dNTP (Biotools, Madrid, Spain); 1.5 mM MgCl2 and 1.0 μg/μL of DNA template. All PCR reactions were performed in an ABI 2720 thermal cycler (Applied Biosystems, Foster City, CA, USA) with an initial denaturation at 94°C for 5 min, 40 cycles of denaturation at 94°C for 1 min, annealing at 55°C for 1 min and extension at 72°C for 2 min, and a final extension step at 72°C for 10 min. PCR products were visualized under UV light in 2% agarose gels stained with ethidium bromide and purified with the Wizard SV Gel and PCR Clean-up System Kit (Promega Corporation, Madison, USA). PCR products were sequenced at STAB VIDA (Lisbon, Portugal), and edited using BioEdit Sequence Alignment Editor [74]. Additionally, sequences of the housekeeping genes used for MLSA from X. arboricola pathovars celebensis (ICMP 1488), corylina (ICMP 5726) and juglandis (ICMP 35), as well as X. axonopodis pv. citri strain ICMP 24, included as an outgroup, were obtained from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov).
Nucleotide sequences were aligned with ClustalW version 1.83 [75] using default parameters. Both ends of each alignment were trimmed to the following sizes: dnaK, 864 positions; fyuA, 607 positions; gyrB, 631 positions and rpoD, 756 positions. Nucleotide sequences from all strains analyzed here and from those Xanthomonas spp. available in databases [18] were aligned and concatenated to give a total length of 2,858 nucleotide positions. The programs jModelTest 0.1.1 [76] and MEGA 5.05 [77] were used to determine the best model of evolution for Maximum Likelihood (ML) analysis based on the Akaike information criterion (AIC) [78]. For the concatenated gene dataset the model selected was TN93+G. Maximum likelihood trees, using 1,000 bootstrap re-samplings, were generated in MEGA 5.05.
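The concatenation step can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the gene order, the function name and the placeholder sequences are assumptions made for illustration, and only the trimmed lengths are taken from the text. The sketch checks that every gene alignment has the reported length before joining the genes, so that the concatenated alignment has the stated 2,858 positions.

```python
# Trimmed alignment lengths reported in the text (positions per gene).
TRIMMED_LENGTHS = {"dnaK": 864, "fyuA": 607, "gyrB": 631, "rpoD": 756}
# Assumed concatenation order; the text does not state the order used.
GENE_ORDER = ["dnaK", "fyuA", "gyrB", "rpoD"]


def concatenate(alignments):
    """Join per-gene alignments into one sequence per strain.

    `alignments` maps gene name -> {strain name -> aligned sequence}.
    Raises ValueError if a strain is missing from any gene or if an
    aligned sequence does not have the expected trimmed length.
    """
    strains = set(alignments[GENE_ORDER[0]])
    for gene in GENE_ORDER:
        if set(alignments[gene]) != strains:
            raise ValueError(f"strain sets differ for {gene}")
        for strain, seq in alignments[gene].items():
            if len(seq) != TRIMMED_LENGTHS[gene]:
                raise ValueError(f"{gene}/{strain}: unexpected length {len(seq)}")
    return {
        strain: "".join(alignments[gene][strain] for gene in GENE_ORDER)
        for strain in sorted(strains)
    }


# Toy input: placeholder sequences of the right length, not real data.
toy = {
    gene: {"strain_A": "A" * n, "strain_B": "C" * n}
    for gene, n in TRIMMED_LENGTHS.items()
}
concatenated = concatenate(toy)
total_length = sum(TRIMMED_LENGTHS.values())
```

With real data, the per-gene dictionaries would be filled from the trimmed ClustalW alignments, and the concatenated sequences would be written out for model selection and tree building.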
Nucleotide sequences were deposited in GenBank. Accession numbers for the partial sequences of the genes used in this study are: KR054426 to KR054449 for dnaK; KR054450 to KR054473 for fyuA; KR054474 to KR054497 for gyrB and KR054498 to KR054521 for rpoD.
For the purpose of homogeneity in further comparisons, all the genome sequences were annotated using Prokka [79]. The GFF3 files generated by Prokka were used as the input to determine the core and the dispensable accessory genes in the analyzed genomes using Roary [80]. The online tool available at bioinformatics.psb.ugent.be was used to generate the Venn diagram comparing the genome composition of the Prunus-isolated strains with the remaining subspecific groups of X. arboricola. Signal peptides and transmembrane domains for the unique protein coding sequences (CDS) of X. arboricola CITA 44 and X. arboricola pv. pruni CITA 33 or IVIA 2626.1 were determined using the SignalP 4.1 server and the TMHMM server version 2.0 [81,82]. Besides this, the assignment of these genes to the clusters of orthologous groups (COG) database [83] was performed with the NCBI's conserved domain database using an expected value of 0.001 [84].
The core genome sequence obtained for each strain was aligned using MAFFT [85] for further phylogenetic analysis. Subsequently, a maximum likelihood tree, using 1,000 bootstrap resamplings, was constructed to accurately determine the phylogenetic position of the atypical strain isolated from Prunus, CITA 44, within X. arboricola. The maximum likelihood tree was built using RAxML [86] and the dendrogram obtained was visualized using Dendroscope [87]. X. campestris pv. campestris strain ATCC 33913 was used as an outgroup in the analysis.

Carbon sources utilization analysis and environmental sensors profile
To prepare the inocula, bacterial strains used in the MLSA (Table 1) were cultured on LB plates and resuspended in sterile phosphate-buffered saline (PBS) (OD600 = 0.3). Suspensions (150 μL) were inoculated into each well of the Biolog GN2 microplates (Biolog Inc., USA). Microplates were read at 570 nm using a Labsystems Multiskan RC spectrophotometer (Fisher Scientific, Waltham, USA) after 24 h incubation at 27°C [1,88]. Three independent assays were performed, each including two microplates per strain and three reads per well. Means of all the reads were calculated and used in further analysis.
The carbon source utilization profile of strain CITA 44 was determined and compared with those obtained for strains of the pathovars corylina, juglandis, populi and pruni; results were converted to binary form by scoring the metabolism observed for each compound as 0 (no catabolism of the carbon source) or 1 (catabolism of the carbon source). Pairwise similarity was calculated according to Jaccard's coefficient and subjected to Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis. Finally, the reliability of the similarity trees was determined using the cophenetic correlation index. All analyses were computed in NTSYS 2.11T (Exeter Software, Setauket, NY).
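As an illustration of the pairwise similarity step, the following Python sketch computes Jaccard's coefficient between binary utilization profiles. The analysis in this study was run in NTSYS; the function names, strain labels and profile values below are hypothetical and serve only to show the calculation.

```python
def jaccard(profile_a, profile_b):
    """Jaccard coefficient between two binary (0/1) utilization profiles.

    Counts the carbon sources catabolized by both strains and divides by
    the number catabolized by at least one of them; sources scored 0 in
    both profiles do not contribute.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same carbon sources")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a == 1 or b == 1)
    # If neither strain uses any source the coefficient is undefined;
    # returning 1.0 is a convention chosen for this sketch only.
    return both / either if either else 1.0


def similarity_matrix(profiles):
    """Return {(strain_x, strain_y): Jaccard similarity} for all pairs."""
    names = sorted(profiles)
    return {(x, y): jaccard(profiles[x], profiles[y]) for x in names for y in names}


# Hypothetical profiles over six carbon sources (not data from this study).
profiles = {
    "strain_1": [1, 1, 0, 1, 0, 0],
    "strain_2": [1, 1, 0, 0, 0, 1],
    "strain_3": [0, 0, 1, 0, 0, 0],
}
sim = similarity_matrix(profiles)
```

The corresponding distances (1 minus similarity) could then be passed to an average-linkage routine, which is equivalent to UPGMA; SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` and `scipy.cluster.hierarchy.cophenet` are one possible choice for the clustering and the cophenetic correlation, although they were not the tools used here.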
Profiles of sensors of two-component regulatory systems (STCRS) and TonB-dependent transporters (TBDTs) for X. arboricola strains isolated from Prunus (CITA 33, CITA 44 and IVIA 2626.1) and six other Xanthomonas strains (3004, CFBP 7634, CFBP 7651, NCPPB 1630, NCCB 100457 and CFBP 7179) (S1 Table) were determined based on the analysis of the complete genomes and the search for sequences homologous to 86 STCRS and 28 TBDTs previously described in other Xanthomonas species [25]. Sequence homology searches were conducted using the BLAST tool according to the protocol proposed by the NCBI [90], using the nucleotide sequence encoding each protein as the input. Those CDS with an identity and a query coverage over 80% were considered orthologous genes in the target genomes of X. arboricola.
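The orthology criterion can be expressed as a simple filter over search hits. The Python sketch below is an illustration under stated assumptions: the `Hit` record, the identifiers and the example values are hypothetical, and keeping only the best-scoring hit per query is a choice made for this sketch that the text does not specify. The only rule taken from the text is that both identity and query coverage must exceed 80%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search hit; field names are hypothetical, not a BLAST format."""
    query: str        # reference gene used as the query
    subject: str      # CDS identifier in the target genome
    identity: float   # percent identity
    coverage: float   # percent query coverage


def best_orthologs(hits, min_identity=80.0, min_coverage=80.0):
    """Keep hits strictly above both thresholds, one per query.

    When several hits pass for the same query, the one with the highest
    (identity, coverage) pair is kept; this tie-break is an assumption.
    """
    best = {}
    for hit in hits:
        if hit.identity > min_identity and hit.coverage > min_coverage:
            current = best.get(hit.query)
            if current is None or (hit.identity, hit.coverage) > (
                current.identity,
                current.coverage,
            ):
                best[hit.query] = hit
    return best


# Hypothetical hits for three query sensors (not data from this study).
hits = [
    Hit("sensor_1", "cds_0012", 95.2, 99.0),
    Hit("sensor_1", "cds_0440", 81.0, 85.0),
    Hit("sensor_2", "cds_0100", 79.9, 100.0),  # fails the identity threshold
    Hit("sensor_3", "cds_0205", 92.0, 80.0),   # coverage is not over 80%
]
orthologs = best_orthologs(hits)
```

Queries absent from the returned dictionary would be scored as lacking an ortholog in that genome, which gives the presence/absence profile compared across strains.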
The chemotactic effect of the carbon compounds was evaluated according to a previously developed microtiter plate chemotaxis assay [24]. Briefly, 10 μL tips containing 5 μL of the carbon source tested were inserted into 48 wells of a 96-well plate previously inoculated with 200 μL of bacterial suspension in 10 mM MgCl2 (10⁸ CFU/mL). The number of bacteria that moved into the tip during one hour was estimated by plating serial dilutions of the tip content on 1.5% agar LB plates. Significant differences (p < 0.05) between the means for each carbon source tested and the negative control (10 mM MgCl2) were determined by analysis of variance (ANOVA) according to the Student-Newman-Keuls method. Statistical analyses were performed using STATGRAPHICS Plus v.5.1 (Manugistics Inc., Rockville, Maryland, USA). A carbon source was considered a chemoattractant when the average number of bacteria contained in the tip (6 replicates in 2 independent assays) was significantly higher than that of the blank control (10 mM MgCl2), and a chemorepellent when the average was significantly lower (p < 0.05).
Orthologous genes for the 26 methyl accepting chemotaxis proteins (MCPs) as well as for 18 chemotaxis-specific che genes [25], described in X. campestris pv. campestris, X. campestris pv. vesicatoria, X. citri subsp. citri and X. oryzae, were searched in the genome sequences of the nine X. arboricola strains mentioned above. Those CDS with more than 80% identity and more than 80% sequence coverage that contained MCP periplasmic domains were considered orthologous genes.
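The decision rule used to label each compound can be written compactly. The Python sketch below covers only the final classification; the p-value is assumed to come from the ANOVA and Student-Newman-Keuls comparison performed elsewhere, and the function name, label strings and example numbers are hypothetical.

```python
def classify_compound(mean_tip_count, mean_control_count, p_value, alpha=0.05):
    """Label a carbon source from its mean tip count versus the blank control.

    `p_value` is supplied by the caller; the statistical test itself is
    not implemented in this sketch.
    """
    if p_value < alpha:
        if mean_tip_count > mean_control_count:
            return "chemoattractant"
        if mean_tip_count < mean_control_count:
            return "chemorepellent"
    return "no significant effect"


# Hypothetical mean counts and p-values (not data from this study).
examples = {
    "compound_A": classify_compound(250.0, 100.0, 0.01),
    "compound_B": classify_compound(40.0, 100.0, 0.03),
    "compound_C": classify_compound(120.0, 100.0, 0.20),
}
```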

Motility in solid and semisolid surfaces and surfactant activity
Swarming, swimming and twitching motility assays were conducted for strain CITA 44 as well as for 20 strains classified as X. arboricola pv. pruni by MLSA. Bacterial cultures in exponential growth phase were centrifuged at 6,350 g for 15 min, then washed and resuspended in 10 mM MgCl2 (OD600 = 1.0). To analyze swarming motility, 0.5% agar PYM swarming plates (peptone 0.5%, yeast extract 0.3%, malt extract 0.3%, glucose 1.0%) were inoculated with 10 μL of bacterial suspension. For the swimming motility assay, cultures resuspended in 10 mM MgCl2 were centrifuged at 6,350 g for 15 min and the pellets were inoculated using a sterile toothpick in the center of semisolid 0.3% agar minimal medium A (MMA) plates (K2HPO4 0.7%, KH2PO4 0.3%, MgSO4·7H2O 0.01%, (NH4)2SO4 0.1%, sodium citrate 0.005%, glycerol 0.2%). For the twitching assay, bacterial cultures were prepared as mentioned above and then inoculated with a sterile toothpick through a 5 mm layer of 1.5% agar PYM to the bottom of the plate. After 72 h, the culture medium was removed and the bottom of the plate was stained with 0.3% crystal violet for 15 min and washed with sterile distilled water.
Surfactant production, which is closely related to bacterial motility on solid surfaces, was also assessed according to the atomized oil assay [27]. Briefly, bacteria were inoculated using a sterile toothpick on 1.5% agar LB plates and, at 24 h post-inoculation (hpi), a fine mist of mineral oil droplets was sprayed on the colony. Bacterial strains that instantaneously showed a bright halo around the colony were recorded as surfactant producers.
Plates from swarming motility and surfactant activity assays were incubated at 27°C for 24 h, whilst swimming and twitching plates were incubated for 72 h. Images of all the plates were recorded. All assays were performed in three independent experiments with three replicates each.
CDS homologous to the major structural components associated with the flagellum [25,29,92], as well as to the fimbrial adhesin type IV pilus [30] and a group of non-fimbrial adhesins [25], were identified in the three genomes of X. arboricola isolated from Prunus and in six other X. arboricola strains as described above.

Pathogenicity tests on Prunus spp.
Bacteria were grown on 1.5% agar LB plates and incubated at 27°C for 48 h, then a single bacterial colony was resuspended in 30 mL of LB broth and incubated at 27°C on a rotary shaker for 24 h. After incubation, cultures were centrifuged at 6,350 g for 15 min and washed three times with 10 mM MgCl2. Finally, cultures were adjusted to 10⁸ CFU/mL (OD600 = 0.1) and used to inoculate detached leaves of different Prunus species.
Young, fully expanded leaves from greenhouse-grown plants of almond (cv. Ferraduel), apricot (cv. Canino), peach (cv. Calanda) and European plum (cv. Golden Japan) were collected, brought to the laboratory and washed three times in sterile distilled water. After surface sterilization with 70% ethanol and 0.05% sodium hypochlorite, the leaves were rinsed three times with sterile distilled water and dried in a hood. Three leaves per host were selected randomly for inoculation with each strain in three independent assays.
Surface-sterilized leaves were placed on 0.5% water agar plates and the abaxial surfaces were gently inoculated using a sterile cotton swab dampened with the bacterial inoculum. Sealed plates were incubated in a growth chamber set at 27°C, 80-90% relative humidity and a 12 h photoperiod. Inoculated leaves were digitally recorded at 28 dpi and the percentage of symptomatic area per host and strain was quantified using the ImageJ 1.45s software. Leaves inoculated with sterile 10 mM MgCl2 were used as blank controls for comparison.
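The quantity measured in ImageJ, the percentage of symptomatic leaf area, can be illustrated with a short Python sketch. It does not reproduce the ImageJ workflow, which is not detailed in the text; it assumes that each pixel of a leaf image has already been assigned to one of three hypothetical classes (background, healthy tissue, symptomatic tissue) and computes the percentage from the pixel counts.

```python
# Hypothetical pixel classes for an already segmented leaf image.
BACKGROUND, HEALTHY, SYMPTOMATIC = 0, 1, 2


def symptomatic_percentage(label_grid):
    """Percentage of leaf pixels labelled as symptomatic.

    `label_grid` is a list of rows of class labels; background pixels
    are excluded from the leaf area.
    """
    leaf_pixels = 0
    lesion_pixels = 0
    for row in label_grid:
        for label in row:
            if label == BACKGROUND:
                continue
            leaf_pixels += 1
            if label == SYMPTOMATIC:
                lesion_pixels += 1
    if leaf_pixels == 0:
        raise ValueError("no leaf pixels found")
    return 100.0 * lesion_pixels / leaf_pixels


# Toy 4 x 4 segmentation: 12 leaf pixels, 3 of them symptomatic.
toy_leaf = [
    [0, 1, 1, 0],
    [1, 2, 2, 1],
    [1, 1, 2, 1],
    [0, 1, 1, 0],
]
percent = symptomatic_percentage(toy_leaf)
```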
In addition to the genomic analysis related to the early events in the bacteria-host interaction described below, CDS homologous to several genes associated with later stages of pathogenesis, such as the type III secretion system (T3SS) and the type III effectors (T3Es), were searched in the genomes of the nine strains of X. arboricola according to the percentage of sequence length and identity shared with the amino acid sequences of the genes previously associated with these features [36-41,93,94]. In the same manner, CDS orthologous to genes for other remarkable processes and features associated with virulence, such as quorum sensing [95], xanthan biosynthesis [35], the type II secretion system [31,32] and the cellulolytic, hemicellulolytic, pectolytic and lipase enzymes previously described in other species of Xanthomonas [13,33,34,96], were searched in the genome sequences of the nine strains of X. arboricola. Only those CDS with a percentage of coverage and identity over 80.0% were considered orthologous sequences.
Finally, the presence of the plasmid pXap41 [44], putatively involved in virulence in X. arboricola pv. pruni, was searched in the analyzed genomes based on nucleotide sequence similarity and graphically represented using the BLAST Ring Image Generator (BRIG) tool [97]; blastn was used for the comparative sequence analysis with an expected value threshold of 0.001. All the CDS associated with pathogenesis mentioned above were represented in a circular genome map using CGView [98]. For this purpose, contigs of the draft genome sequences of CITA 33, CITA 44 and IVIA 2626.1 were arranged with Mauve [26] using the circularized genome sequence of X. arboricola pv. juglandis strain Xaj 417 as the reference [22].