Sequencing and Analysis of the Pseudomonas fluorescens GcM5-1A Genome: A Pathogen Living in the Surface Coat of Bursaphelenchus xylophilus

It is known that several bacteria are adherent to the surface coat of pine wood nematode (Bursaphelenchus xylophilus), but their function and role in the pathogenesis of pine wilt disease remains debatable. The Pseudomonas fluorescens GcM5-1A is a bacterium isolated from the surface coat of pine wood nematodes. In previous studies, GcM5-1A was evident in connection with the pathogenicity of pine wilt disease. In this study, we report the de novo sequencing of the GcM5-1A genome. A 600-Mb collection of high-quality reads was obtained and assembled into sequence contigs spanning a 6.01-Mb length. Sequence annotation predicted 5,413 open reading frames, of which 2,988 were homologous to genes in the other four sequenced P. fluorescens isolates (SBW25, WH6, Pf0-1 and Pf-5) and 1,137 were unique to GcM5-1A. Phylogenetic studies and genome comparison revealed that GcM5-1A is more closely related to SBW25 and WH6 isolates than to Pf0-1 and Pf-5 isolates. Towards study of pathogenesis, we identified 79 candidate virulence factors in the genome of GcM5-1A, including the Alg, Fl, Waa gene families, and genes coding the major pathogenic protein fliC. In addition, genes for a complete T3SS system were identified in the genome of GcM5-1A. Such systems have proved to play a critical role in subverting and colonizing the host organisms of many gram-negative pathogenic bacteria. Although the functions of the candidate virulence factors need yet to be deciphered experimentally, the availability of this genome provides a basic platform to obtain informative clues to be addressed in future studies by the pine wilt disease research community.


Introduction
Pseudomonas fluorescens are Gram-negative bacteria that have diverse lifestyles and versatile metabolisms. These bacteria are found in decaying and living plants, soil, and water. Some P. fluorescens isolates benefit plants by suppressing pathogens, aiding in nutrient absorption, and degrading environmental pollutants [1]. Other isolates produce compounds that negatively affect the plant's growth [2].
To date, the genomes of four isolates of P. fluorescens have been sequenced and are publicly available, including SBW25, WH6, Pf0-1, and Pf-5 [3][4][5]. The published genomes revealed that P. fluorescens has an open pan-genome of approximately 6 to 7 Mb. In addition to the core genes, each isolate possesses 1000 to 1500 unique genes. Previous phylogenetic studies categorized the four sequenced P. fluorescens isolates into two distinct clusters: SBW25 and WH6 formed one cluster, and Pf0-1 and Pf-5 formed a second cluster. In these sequenced isolates, Pf-5 and SBW25 are rhizosphere bacteria that promote plant growth and suppress plant pathogens [3] [6]. WH6 produces a chemical compound called the germination-arrest factor (GAF) that specifically and irreversibly blocks the germination of the seeds of a large number of grassy weed species without significantly affecting the growth of established seedlings or mature plants [7]. And Pf0-1 is a bacterium that is well adapted to the soil environment and contributes significantly to the turnover of organic matter [5].
Another isolate of P. fluorescens, GcM5-1A, was reported to increase egg production, developmental rate, body length and diameter of both male and female pine wood nematodes (PWN) [8]. GcM5-1A is one of the main bacteria isolated from the surface coat of Bursaphelenchus xylophilus (the nematode causing pine wilt disease), which has been implicated in connection with the pathogenesis of pine wilt disease (PWD) in several previous studies [9][10][11][12]. The devastating pine wilt disease spreads among pine trees in Asian countries, especially in Japan, China and Korea. This disease is caused by PWN (whose genome has been sequenced in 2011 [13]), and is transferred by pine sawyer beetles (Monochamus alternatus) [14]. PWN can reproduce quickly in the sapwood of the susceptible pine species, which causes wilting and death of the host in a short period of time. Recently, this disease has also been observed in European countries [15], and is gradually becoming a threat to pine forests worldwide.
The pathogenesis of PWD remains debatable. PWN was previously recognized as the only pathogen that caused this disease [16]. However, it was reported that PWN lost pathogenicity after surface sterilization [17], which led to the hypothesis that the pathogenicity might be a combined effect of the PWN and the microbes adherent to their surface coat. Various evidence supported this hypothesis. For instance, Shinya et al.'s study indicated that the surface coat of PWN could protect the microbes and was essential for the infection of pine trees by PWN [9]. In addition, Guo et al. reported that P. fluorescens GcM5-1A, isolated from the surface coat of PWN, could produce a flagellin, namely fliC, that was able to increase populations of pine wood nematodes and their associated bacteria [10]. It was also found that fliC could induce the death of Pinus thunbergii suspension cells in 24 h [11], and Xu et al. proved that fliC increased damage to the host pine through enhancing the oxidative stress [12].
Although several studies have been carried out testing the connection of the PWN surface coat microbes with the pathogenicity of PWD, their relationship remain debatable [18], and the underlying mechanisms need to be deciphered at the molecular level. Many virulence factors may play a role in the infection process. For instance, it was found that many gram-negative pathogenic bacteria employed a type III secretion system (T3SS) to subvert and colonize their host organisms. The T3SS injects effector proteins directly into the cytosol of eukaryotic cells, and thus allows the manipulation of host cellular activities to the benefit of the pathogen [19]. T3SS provides a unique virulence mechanism to infect host cells [20]. Among the published genomes of the P. fluorescens isolates, a complete T3SS was only identified in WH6 [4], which produces the GAF herbicide that helps to control grassy weeds [21]. Whether GcM5-1A possesses a complete T3SS remains unknown. Besides the T3SS system, other candidate virulence factors may exist and need to be addressed in understanding the role of PWN surface coat microbes in connection with the pathogenicity of PWD at the molecular level.
Taken together, the perspectives of this study are to: 1) sequence the genome of GcM5-1A to provide a basic platform for the pine wilt disease research community; 2) compare the genome of GcM5-1A and study the phylogenetic relationship of GcM5-1A with that of the other four sequenced P. fluorescens isolates at genome-wide level; and 3) detect the candidate virulence factors, including the T3SS system, based on sequence annotation, to provide clues to be addressed in future studies.

Results and Discussion
Genome sequencing and assembling The GcM5-1A genome was sequenced using a 454 GS-FLX sequencer with a GS titanium XLR70 kit (Roche Inc.), which resulted in 1,534,199 sequencing reads with an average length of 409 bp. The total throughput exceeded 600 Mb. The genome sizes of the four sequenced P. fluorescens isolates ranged from 6.27 to 7.07 Mb. Using this range as a baseline, the sequencing depth of GcM5-1A was estimated to be approximately 100X. After filtering with the Newbler de novo Assembler (Roche Inc.), 98.23% of the total reads were selected and used in the final assembly. As a result, the genome assembly generated 263 contigs greater than 100 bp, among which 79 contigs were greater than 500 bp. The final assembly covered 6.01 Mb. Subsequently, all of the contigs were submitted to BLAST against the nr database using BLASTX [22]. No contigs exhibited high similarity with genes from organisms outside the bacterial kingdom, except for contig24, which shared a highly homologous DNA fragment with the Pseudomonas phage phiPsa374, with 73% sequence identity. The homologous fragment of contig24 covered 34% of the genome of phiPsa374. Therefore, we placed contig24 at the terminus of chromosome replication of the established circular graph (Fig 1). The sequence alignment revealed that the contig shared almost no sequence similarity with any of the other P. fluorescens isolates. Further analysis revealed that contig24 had a size and GC content similar to those of several Pseudomonas phages. We hypothesized that contig24 might represent a horizontally transferred phage sequence that diverged significantly from the archetype during the evolutionary process or an exogenous DNA segment from an unknown phage.
To order the obtained contigs, a high-quality reference genome was required. As revealed by BLASTX, GcM5-1A exhibited higher similarity with SBW25 and WH6 compared with Pf0-1 and Pf-5. Compared with SBW25, WH6 was incompletely sequenced [4]. Therefore, the genome assembly of SBW25 was selected as a reference to order the obtained sequence contigs. All contigs greater than 1 kb were ordered in line with the reference genome by the Mauve Aligner [23]. There were 188 contigs smaller than 1 kb. Together with contig24, these contigs only accounted for 2.1% of the GcM5-1A genome. Among the ordered contigs, contig4 contained the gene which is responsible for initiation of replication; thus, it was marked as the starting point of the GcM5-1A genome.
Genome comparison of the sequenced P. fluorescens isolates A total of 5,413 coding sequences (CDSs) were predicted in the genome of GcM5-1A. The sequences were annotated based on similarity with the nr database. According to the features of the GcM5-1A genome ( Table 1), there were fewer predicted CDSs noted in GcM5-1A than in any of the other four sequenced P. fluorescens isolates. In contrast, the GcM5-1A genome contained an increased number of tRNA compared with any other P. fluorescens isolates. We used OrthoMCL [24] to search for the orthologous groups, and 2,988 orthologous groups were found (Fig 2). These orthologous groups formed the core gene set of P. fluorescens; the Clusters of Orthologous Groups (COG) database was then used for functional annotation of the core gene set and unique genes of GcM5-1A [25]. The annotation results (Fig 3) were similar to From the outer to the innermost circle, circle 1 presents the physical coordinates, in 100,000 bp per interval, as defined by the black sticks. The red sticks on circle 1 indicate the sites of the physical gaps. Circles 2 and 3 depict the CDS on the positive and negative strands. Circles 4, 5, 6 and 7 depict the orthologous similarity between GcM5-1A and WH6, between GcM5-1A and SBW25, between GcM5-1A and Pf-5, and between GcM5-1A and Pf0-1, respectively. Different colors indicate different similarity levels. Blue indicates similarity greater than 80%. Green indicates similarity within the range of 60 to 80%. Pink indicates similarity within the range of 20 to 60%. White reflects similarity less than 20%. Circle 8 indicates the GC content (red > 60.5%, blue < 60.5%). Circle 9 depicts the GC-skew (purple > 0, cyan < 0).

Phylogenetic and syntenic analyses
To investigate the phylogenetic relationship among the five sequenced P. fluorescens genomes, 2,935 single-copy orthologous groups were used to generate a multiple alignment using MUS-CLE [28] with default parameters. A phylogenetic tree was then constructed using MEGA5 [29] with the neighborhood-joining method and the bootstrap parameter set to 500. Previous studies categorized the P. fluorescens isolates into two distinct clades. SBW25 and WH6 formed one clade, and Pf0-1 and Pf-5 formed the second clade. In this study, GcM5-1A was evident in the SBW25 and Wh6 clade (Fig 4A).
Long-range synteny of GcM5-1A with other P. fluorescens isolates appeared more often at the origin and terminus of chromosome replication of the genome. Compared with Pf0-1 and Pf-5, fewer outliers were noted between the genome of GcM5-1A and those of SBW25 and WH6, indicating that the genome of GcM5-1A shares higher colinearity with that of the latter than the former (Fig 5). Thus, GcM5-1A was more closely related to SBW25 and WH6 than to Pf0-1 and Pf-5, which was consistent with the results obtained from the phylogenetic tree. Given that the genome assembly of GcM5-1A was ordered using SBW25 as reference, the synteny between GcM5-1A and SBW25 was expected to be better than for any other pair-wise comparison. In the genome of a bacterium, the leading strand often has an excess G content compared with C, whereas the lagging strand has excess C compared with G [30,31]. From the circular graph of GcM5-1A (Fig 1), the GC skew index (GC Skew = (G-C)/(G + C)) clearly demonstrated a G bias, with most of the GC skew indexes above 0 in the leading strand. C bias was observed in the lagging strand, which suggests that the established synteny was reliable and properly reflected the relationship between these P. fluorescens isolates.

Candidate virulence factors
A previous study of the P. fluorescens GcM5-1A indicated that the isolate secreted fliC flagellin, which is deleterious to black pine seedlings [10]. When GcM5-1A and PWN were cultured together, the fecundity, egg-hatch rate and the development of PWN were significantly improved [32]. Xu et al. reported that treating the black pine cell suspension with the flagellin of GcM5-1A results in an oxidative burst of the pine cells and subsequent cell death, this indicating the host's response to the attack of the pathogen [12]. However, the cell death of pine callus induced by flagellin promoted the proliferation of PWN and the associated GcM5-1A, which was feeding on the callus cells [11]. Li et al. further revealed that the flagellin of GcM5-1A would attach to the pine cell membrane and lead to shrinkage of the cell membrane, a concentration of the cytoplasm, formation of micronuclei, degradation of cytoplasmic RNA, and breakage of genomic DNA. However, DNA ladder formation was not observed, which indicated that an unusual form of apoptosis occurred in the flagellin-treated pine cells [33]. The above evidence suggested that GcM5-1A potentially plays a role in the pathogenesis of pine wilt disease. Alignment of the N-terminal sequences of the fliC genes demonstrated that this gene shared high similarity between GcM5-1A and Pf-5 [12]. The expression the fliC genes of these two isolates in Escherichia coli produces recombinant flagellin that exhibit similar toxicity to pine suspension cells [10]. In a previous study, researchers observed that pine seedlings inoculated with the axenic PWN and transformed E. coli that produced secretive flagellin of Pf-5 resulted in symptoms of wilt. In contrast, pine seedlings inoculated with axenic PWN and untransformed E. coli did not cause symptoms of wilt [34]. Thus, the fliC gene is a crucial pathogenic factor in the genome of GcM5-1A and is located on contig57 in the genome assembly. Based on sequence similarity, we constructed a gene tree for the fliC genes in the five sequenced isolates of P. fluorescens (Fig 4B). The tree demonstrated that the fliC genes from Pf0-1 and Pf-5 were closely related, as were those of SBW25 and WH6. Whereas the fliC gene in GcM5-1A was more phylogenetically diverged from others, but it was more close to Pf0-1 and Pf-5 than SBW25 and WH6. In addition to the fliC gene, we searched for additional virulence factors that might relate to pathogenesis in the genome of GcM5-1A. For this purpose, all of the predicted CDSs were submitted to BLAST against the VFDB, which was built for bacterial pathogens [35]. This analysis identified 79 CDSs as candidate virulence factors (S1 Table). Among these factors, the Alg genes encoding the polysaccharide alginate participate in the formation of biofilms, which play an important role in the plant-pathogen interaction [36]. The Fl genes encode the bacterial flagella that affect the bacterial motility. These genes are also essential for the formation of biofilms in P. syringae [37], whereas the Waa genes are involved in the synthesis and transformation of lipopolysaccharides in Gram-negative bacteria. Lipopolysaccharides play direct roles in the plant-pathogen interaction [38,39]. The Rhl genes encode rhamno lipids that are crucial for P. syringae swarming, which is an important factor in pathogen infection [40]. All of the candidate virulence factors that were identified in the genome of GcM5-1A suggested that this microbe plays a role in the pathogenesis of pine wilt disease.

The flagellum secretion system T3SS
In Pseudomonas spp., the complete T3SS was first identified in P. syringae pv. tomato DC3000. This microbe is the model system used to study T3SS, and it is pathogenic to tomato and Arabidopsis [41]. In pathogenic bacteria, T3SS uses its needle-like structure, known as the needle complex [42], to detect the presence of eukaryotic organisms. When eukaryotic organisms are detected, the T3SS injects bacterial proteins into the host cell and helps the bacteria colonize and multiply in the host cells [43,44]. In the previously sequenced isolates of P. fluorescens, the complete T3SS was only identified in WH6. Using the T3SS genes from DC3000 and WH6 as queries, the complete T3SS was identified in the genome of GcM5-1A; thus, GcM5-1A appears to be the second P. fluorescens isolate that possesses the complete T3SS. In the GcM5-1A genome, CDS00691~CDS00712 were identified as T3SS genes (Fig 6). Similarity analysis of the T3SS candidates of GcM5-1A and DC3000 was performed by using BLASTP with 30% sequence identity and 80% sequence overlap. GcM5-1A was found to possess all the T3SS genes except for the D, Z, A, and S genes when compared with DC3000. GcM5-1A lacked D and A genes compared with WH6. In T3SS, the A and F genes were homologous to each other [45,46]. The S and R genes shared high similarity and were functionally redundant [47]. Deletion of the Z gene did not affect T3SS function [48][49][50]. The function of the D gene remains unclear. Although upon consideration of the standard T3SS in DC3000, the genes that are lacking in GcM5-1A should have little effect on T3SS function. Based on the above analyses, we propose that the GcM5-1A genome possesses a fully functional T3SS.

Conclusions
Pine wilt disease is a disastrous for pine forests, and the pathogenesis of this disease remains debatable. Some pathologists have proposed that the surface coat of PWN shelters microbes that might play a role in the onset of this disease. To provide insight into the pathogenicity of such microbes, we sequenced a Gram-negative bacterial isolate of P. fluorescens, GcM5-1A, which inhabits the surface coat of PWN. We obtained 1.5 million high-quality reads, which were assembled into sequence contigs 6.01 Mb in length. Sequence annotation predicated 5,413 CDSs. Among the five sequenced P. fluorescens isolates, GcM5-1A was more closely related to isolates SBW25 and WH6 than to Pf0-1 and Pf-5. We detected 79 virulence factors in the genome of GcM5-1A, including gene coding the major pathogenic protein fliC. GcM5-1A also possessed a fully functional T3SS. The detected candidate virulence factors provide informative clues to be addressed on in elucidating the role of GcM5-1A in the pathogenesis of pine wilt disease at molecular level in future studies. The availability of this genome sequence alongside that of PWN provides essential platforms for further study of their interaction and their roles in the pathogenesis of pine wilt disease.

Materials and Methods
Bacterial culture and genomic DNA extraction P. fluorescens GcM5-1A (CCTCC No: M 204065) was isolated from PWN obtained from wilted Pinus thunbergii in Nanjing Forestry University, China. The isolation of GcM5-1A was performed as described by Zhao et al. [51]. Field studies were conduct under the permission of Nanjing Forestry University China. This isolate was shake-cultured in Luria-Bertani (LB) medium at 28°C for 24 h, and bacterial cells were harvested by centrifugation at 5,000×g for 10 min at 4°C. Genomic DNA of P. fluorescens GcM5-1A was isolated from the bacterial pellet according to the method described by Chen et al. [52]. DNA purity was examined by agarose gel electrophoresis, and the concentration was measured using a Nanodrop ND2000 (Thermo Scientific, Waltham, MA, USA).

Sequencing and assembly
The GcM5-1A genome was sequenced on a 454 GS-FLX sequencer (Roche, Inc, Basel, Switzerland.) using the FLX Titanium Sequencing Kit (Roche, Inc, Basel, Switzerland.). Sequence assembly was conducted with a Newbler de novo Assembler (version 2.5.3) with default parameters. In the final assembly, contigs less than 100 bp in length were discarded. Sequence data from this study were deposited at DDBJ/EMBL/GenBank under accession number JJOE00000000.

Gene annotation
The coding DNA sequences (CDSs) of GcM5-1A were predicted by Glimmer3 [53], which uses interpolated Markov models (IMMs) to identify the coding regions and distinguish them from noncoding DNA. Glimmer3 used the long-CDS program to find the long and non-overlapping CDSs, and then set the long-CDS training system for the subsequent predictions. The predicted genes were annotated against the nr database and the COG database, with an e-value cut-off of 1e-5.

Non-coding RNA prediction
Transfer RNAs (tRNAs) were predicted using tRNAscan-SE [54]. This method first pre-filters the input sequence to identify candidate tRNAs; then, a highly selective tRNA "covariance model" is implemented for tRNA predictions and allows for the identification of 99 to 100% of tRNA genes in the DNA sequence with less than one false positive.
Ribosomal RNA (rRNA) was predicted using RNAmmer [55], which utilized two levels of hidden Markov models. An initial spotter mode searched both strands for detecting the approximate position of an rRNA gene. Flanking regions were then extracted and parsed to the full model, which matched the entire rRNA gene. By enabling this two-level approach, the program avoided running the entire genome sequence through the model and allowed faster predictions.

Phylogenetic tree construction
Orthologous groups among the 5 isolates of P. fluorescens were distinguished using OrthoMCL, which utilized a Markov clustering algorithm to cluster the highly homologous genes from different isolates. By selecting the single-copy orthologous group, we first performed multiple alignment of each orthologous group on an amino acid level using MUSCLE (MUltiple Sequence Comparison by Log-Expectation). All multiple alignments were then combined into a multiple alignment or a so-called super-alignment file. By using the super-alignment file, the phylogenetic tree was constructed and visualized by MEGA5 using the neighbor-joining algorithm with a bootstrap support node set as r = 500. The construction of the phylogenetic tree of fliC genes was performed following the same method described above.

Synteny and the circular graph
Synteny between GcM5-1A and the P. fluorescens isolates was performed by MUMmer [56] and visualized by ad hoc Perl scripts. The circular graph of GcM5-1A, which was consistent with the other four isolates of P. fluorescens, was constructed using DNAplotter [57].

Identifying the T3SS and candidate virulence factors
To identify the T3SS in GcM5-1A, we submitted the T3SS genes of P. syringae pv. tomato DC3000 to BLAST against the entire gene set of GcM5-1A using BLASTP with parameters as follows: an e-value of 1e-5, 30% sequence similarity, 80% sequence length overlap and max_-target_seqs of 1.
All of the predicted CDSs of GcM5-1A were submitted to BLAST against the Virulence Factor Database (VFDB) using BLASTP with an e-value of 1e-5 and max_target_seqs of 1. The BLAST results were then filtered using the criteria of sequence identity greater than 60% and sequence overlap greater than 90% between the query and the template.
Supporting Information S1 Table. Candidate virulence factors identified in the genome of GcM5-1A. (XLS)