Bioinformatics characterization of BcsA-like orphan proteins suggest they form a novel family of pseudomonad cyclic-β-glucan synthases

doi:10.1371/journal.pone.0286540

Fig 1.

Orphan proteins are not recent duplications of BcsA proteins.

Shown here is the bacterial cellulose synthase (bcs) operon found in Pseudomonas fluorescens SBW25, known as the Wrinkly Spreader Structural (wss) operon [41] (A). WssB/BcsA (dark blue) and WssC–E (light blue) are the core cellulose synthase subunits, WssF–I (green) are involved in the partial acetylation of the cellulose polymer [41, 42], and WssA and WssJ (rose) are likely to be involved in the positioning of the cellulose synthase and are required for cellulose production. A second BcsA homolog known as the Orphan is located upstream of dapE (yellow) in a different region of the chromosome. The dapE gene and the other genes (grey) indicated here are not involved in cellulose production. P. fluorescens SBW25 locus tag (PFLU) numbers are shown below the genes. Heatmaps of the amino acid sequence identity (left panel) and similarity (right panel) (B), determined from Water pairwise comparisons [70] of the BcsA and Orphan proteins from P. fluorescens SBW25, P. putida KT2440 and P. syringae DC3000, together with the Escherichia coli MG1655 BcsA (EcBcsA) reference protein, suggest that the Orphan genes are not recent duplications of the bcsA genes in these model pseudomonads (see S1 Table for protein locus tags & UniProtKB accessions and S1 File for protein sequences). These comparisons also show that the PfSBW25, PpKT2440 and PsDC3000 Orphan proteins and EcBcsA share a central overlapping region (C) of ~240 residues in the multiple sequence alignment (this region is indicated in blue, with the residue numbers provided for the PfSBW25 Orphan and EcBcsA proteins).
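The identity values in heatmaps of this kind come from counting matching columns in each pairwise alignment. The following is a minimal sketch of that calculation, assuming identity is reported as matches divided by the full alignment length (gapped columns included). The function name and the toy fragments are hypothetical and are not taken from S1 File. Similarity is not shown because it additionally needs a substitution matrix.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over all alignment columns.

    Assumption: gapped columns count towards the alignment length and
    are treated as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column is a match when both residues are identical and not a gap
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b)
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Toy aligned fragments (hypothetical, for illustration only)
toy_a = "MKT-LLVAG"
toy_b = "MRTALLV-G"
print(round(percent_identity(toy_a, toy_b), 1))
```

Running this function over every pair of proteins would give the symmetric matrix that a heatmap displays.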


Fig 2.

Pseudomonas spp. Orphan and BcsA proteins originate from two different but related groups of proteins.

Shown here is a simple un-rooted UPGMA phylogenetic tree produced by Clustal Omega Simple Phylogeny [70] (A) of BcsA and Orphan proteins from eleven pseudomonads plus the Escherichia coli MG1655 BcsA reference protein (see S1 File for protein sequences). The tree is drawn with the real (relative) genetic distances and as a cladogram with a uniform distance between the root, indicated by the small grey circle, and the terminal nodes, shown as large circles for the Orphans and squares for the BcsA proteins. The dashed lines separate the two main clades of the tree, with the Orphan and BcsA proteins in different branches. The real and uniform distance (horizontal) scales are shown at the bottom-left of each cladogram. The Orphan and BcsA proteins can also be differentiated by hierarchical cluster analysis (HCA) based on amino acid profiles (B). The same symbols and colours are used to indicate the Orphan and BcsA proteins and the arbitrary root, which is located at the mid-point of the longest branch. The dashed arc separates the branch containing most of the Orphans from the rest of the cladogram, which includes the BcsA proteins and the remaining Orphan. The x-y scale is indicated at the bottom-left of the cladogram.
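UPGMA builds a tree by repeatedly merging the two closest clusters and averaging their distances to all remaining clusters, weighted by cluster size. The sketch below illustrates that generic procedure only. It is not the Clustal Omega Simple Phylogeny implementation, and the labels and distances are hypothetical toy values rather than data from this study.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a nested-tuple tree by UPGMA (size-weighted average linkage).

    names: list of unique leaf labels
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
    """
    # Each active cluster label maps to (subtree, number_of_leaves)
    clusters = {name: (name, 1) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of active clusters
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        tree_a, n_a = clusters.pop(a)
        tree_b, n_b = clusters.pop(b)
        merged = f"({a},{b})"
        # Size-weighted average distance from the new cluster to the others
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[merged] = ((tree_a, tree_b), n_a + n_b)
    tree, _ = next(iter(clusters.values()))
    return tree


# Toy distances (hypothetical): two tight groups that are far from each other
labels = ["BcsA_1", "BcsA_2", "Orphan_1", "Orphan_2"]
toy = {
    frozenset(("BcsA_1", "BcsA_2")): 0.2,
    frozenset(("Orphan_1", "Orphan_2")): 0.3,
    frozenset(("BcsA_1", "Orphan_1")): 0.8,
    frozenset(("BcsA_1", "Orphan_2")): 0.8,
    frozenset(("BcsA_2", "Orphan_1")): 0.8,
    frozenset(("BcsA_2", "Orphan_2")): 0.8,
}
print(upgma(labels, toy))
```

With these toy distances the two groups end up on opposite sides of the final merge, which is the pattern the dashed lines mark in panel (A).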


Fig 3.

Orphan proteins are predominantly found within the Pseudomonas genus.

Shown here are two schematics of an un-rooted UPGMA phylogenetic tree produced by Clustal Omega Simple Phylogeny [70] showing the seven main clades (A) and the five subclades within Clade 6, which contains all Pseudomonas spp. Orphan proteins (B). These are simplifications of the original UPGMA phylogenetic tree of 190 Orphan protein homologs (see S2 Fig for the full tree; see S1 File for protein sequences) and are drawn with real (relative) genetic distances. Clades and Subclades: Clade 1, This clade includes the fungal Rhizomucor miehei CAU432 β-(1,3)-Glucanosyltransferase, Rm Bgt17A and contains a total of 20 proteins from fungi, plants, and Gammaproteobacteria with glucosidase, glucanosyltransferase, glycogen synthase, and mannosyltransferase annotations. Clade 2, This clade includes the Escherichia coli MG1655 and Rhodobacter sphaeroides 2.4.1 BcsA reference proteins and contains a total of 14 BcsA cellulose synthase proteins from the Alphaproteobacteria and Gammaproteobacteria. Clade 3, This clade contains 11 proteins from the Alphaproteobacteria and Epsilonproteobacteria with glucosyl/glycosyl transferase and glucanase annotations. Clade 4, This clade contains 10 proteins from the Betaproteobacteria and Gammaproteobacteria with benzoate transporter, glucanase, glucosyl transferase, and glycosyl hydrolase family annotations. Clade 5, This clade contains 17 proteins from the Alphaproteobacteria, Deltaproteobacteria, and Gammaproteobacteria, with glucanase, glycosyltransferase, and cellulose synthase annotations. Clade 6, This clade contains 112 representative Pseudomonas spp. Orphan proteins and four non-pseudomonad homologs in the following five subclades. Subclade 6.1, This subclade contains six P. syringae strain Orphan proteins including PsDC3000, with glucosyl/glycosyl transferase annotations. Subclade 6.2, This subclade contains two non-Pseudomonas spp. Orphan proteins from the Betaproteobacteria and Gammaproteobacteria with glucosyl/glycosyl transferase annotations. Subclade 6.3, This subclade contains two non-Pseudomonas spp. Orphan proteins from the Deltaproteobacteria and Gammaproteobacteria with glucanase annotations. Subclade 6.4, This subclade contains a total of 94 Pseudomonas spp. Orphan proteins, including P. fluorescens SBW25 and P. putida KT2440 and excluding all P. aeruginosa and P. syringae Orphans, with glucanase, glucan biosynthesis protein, glucosyl/glycosyl transferase, Glyco trans 2-like domain-containing protein, and cellulose synthase annotations. Subclade 6.5, This subclade contains eight P. aeruginosa strain Orphan proteins including PaPA01 and one additional Pseudomonas spp. Orphan, with glucanase, glycosyl transferase, and synthases of periplasmic glucan annotations. The unmarked nodes in (B) represent single Pseudomonas spp. Orphan proteins that are not included in the subclades. Clade 7, This clade includes the NdvB protein of the alphaproteobacterium Rhizobium meliloti 1021 and the Ags1 protein of the fungus Schizosaccharomyces pombe 972, chosen as outliers for this tree.


Table 1.

Clade and subclade characteristics.


Fig 4.

Functional domains identified in the Orphan protein sequence by HMMER.

Shown here are schematics of the fungal Rhizomucor miehei CAU432 β-(1,3)-Glucanosyltransferase, the Pseudomonas fluorescens SBW25 Orphan protein, and the Escherichia coli MG1655 cellulose synthase catalytic BcsA subunit, aligned to show the positioning of the (first) GH17 (trans)glycosidase and (second) GT2 nucleotide-diphospho-sugar transferase domains identified by HMMSCAN [75]. The PfSBW25 Orphan protein includes a peptide signal sequence (yellow) and a series of transmembrane helices (dark grey), but not the regulatory PilZ domain (Pfam PF07238) present in BcsA. HMMSCAN also identified catalytic residues (red marker), but not in both homologous domains. The first domain of the PfSBW25, P. putida KT2440 and P. syringae DC3000 Orphan proteins shares significant homology with the (Trans)glycosidase superfamily / β-Glucanases family (Superfam 51445 and 51487; Conditional E-values of 1.3e-45–5.4e-57) and Glycosyl hydrolase family 17 (Pfam PF00332; 2.0e-07–2.7e-08). The second domain shares significant homology with the NDP-sugar-transferase superfamily (Superfam 53448; 2.4e-50–1.2e-52) and Glycosyltransferase-like family 2 (Pfam PF13641; 1.1e-27–1.0e-34).


Table 2.

Pairwise comparisons of protein structures and models.


Table 3.

Conserved motifs and residues identified in the Orphan proteins.


Fig 5.

The Orphan proteins share conserved features found in GH17 glucanosyltransferases and GT2 cellulose synthases.

Shown here is a map of amino acid conservation scores (A) determined by Shannon entropy [73] from a Clustal Omega multiple sequence alignment [70] of 26 Orphan proteins found in Pseudomonas aeruginosa AZPAE12140, BL14, PAK, PA01, PA14, LESB58, 19BR and 3573, P. fluorescens ICMP 11288, ICMP 3512, KF1, LMG 5329, SBW25, SS101, WH6 and WS 5037, P. putida KT2440, S610, W619 and YKD221, and P. syringae B728a, DC3000, ICMP 9617, NCPPB 4273, UMAF0158 and 41a strains (see S1 File for protein sequences and S3 File for an annotated multiple sequence alignment of these proteins). A value of 1 indicates complete conservation while lower values indicate less conservation of that residue. Overlaid onto this map are conserved domains, motifs, and residues found in fungal GH17 β-(1,3)-glucanosyltransferases and Rhizomucor miehei CAU432 Bgt17A, as well as in bacterial GT2 synthases such as Rhodobacter sphaeroides 2.4.1 BcsA; these are indicated by vertical blue lines and coloured rectangles. Below this is a simplified schematic of the Orphan protein (B) in which the proposed functional GH17 and GT2 domains are indicated along with the signal peptide sequence (yellow) (not identified in P. syringae strain Orphans) and transmembrane domains (grey). Note that the x-axis and Orphan schematic shown here are longer than individual Orphan proteins; the additional length results from the indels (mainly insertions) introduced by the multiple sequence alignment. Fungal GH17 / RmBgt17A conserved domains, motifs, and residues: 1, Conserved D. 2, Conserved R. 3, Conserved Y. 4, Conserved E. 5, Conserved G. 6, Conserved W. 7, VGNE motif. SD1 & SD2, Sub-domains. 8, Putative catalytic E. 9, GWP catalytic site. 10, Conserved G. 11, Conserved G. 12, Conserved WK. 13, Conserved WG. Bacterial GT2 / RsBcsA conserved domains, motifs, and residues: TM1–TM3, Transmembrane helices. IF1, Amphipathic interface helix 1. BcsA Active Site, Active site of GT2 synthases. 14, DDG motif (but only D). 15, HAKAG motif. 16, DAD motif. 17, QTPH motif. 18, FFCGS motif (but only G). 19, TED motif (but only ED). 20, Conserved E not seen in the Orphans. IF2, Amphipathic interface helix 2. 21, QRxRW motif. TM4–TM6, Transmembrane helices. 22, Conserved T not seen in the Orphan proteins. TM7 & TM8, Transmembrane helices. 23, RxxxR motif associated with c-di-GMP binding not seen in the Orphan proteins. Note that some conserved motifs / residues not identified in the Orphan proteins are also indicated for reference.
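A Shannon-entropy conservation score on a 0 to 1 scale, where 1 means complete conservation, can be sketched as follows. This is an illustration only: the normalisation (1 minus the entropy divided by log2 of a 20-letter alphabet) and the handling of gaps are assumptions, since the caption does not state how the scores from [73] were scaled, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def conservation_scores(alignment, alphabet_size=20, gap="-"):
    """Per-column conservation: 1 - H / log2(alphabet_size).

    Assumptions: gaps are ignored when counting residues, and a column
    made up entirely of gaps is scored 0.0 (unconserved).
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")
    max_entropy = math.log2(alphabet_size)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        # Shannon entropy of the residue distribution in this column (bits)
        entropy = -sum(
            (n / total) * math.log2(n / total) for n in counts.values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment (hypothetical): columns 1 and 3 are invariant, column 2 varies
toy_alignment = ["DDG", "DEG", "DAG", "D-G"]
print([round(s, 3) for s in conservation_scores(toy_alignment)])
```

Invariant columns score exactly 1, and the score drops as more distinct residues share a column, which matches how the map in panel (A) is read.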


Fig 6.

Two-domain structural prediction of the PfSBW25 Orphan protein.

Shown here is the AlphaFold predicted structure of the Pseudomonas fluorescens SBW25 Orphan protein showing the relative positioning of the GH17 domain, transmembrane (TM) region and GT2 domain. The transmembrane ovoid-like structure of the protein (A) is dominated by α-helices (magenta) in the TM region with some β-sheets (gold) found in the GH17 TIM-barrel and GT2 Rossmann-like fold. Loops (green) (sections with poor certainty are in light green and white) are also indicated. Surface hydrophobicity (B) is represented by cold colours with hydrophilic surfaces indicated by warmer colours. The model was produced by AlphaFold Colab Notebook [82, 83] and the PDB file is available (see S2 File). The model was visualised with Mol* 3D Viewer [92] using molecular surface and membrane orientation representations and colouring residues according to secondary structure or hydrophobicity.


Fig 7.

Common features are seen in PfSBW25 Orphan protein structural predictions produced by different servers.

Shown here are predicted structures of the Pseudomonas fluorescens SBW25 Orphan protein produced by AlphaFold (A), TrRosetta (B), RoseTTAFold (C), and IntFOLD6 (D). The relative positioning of the GH17 domain, transmembrane (TM) region and GT2 domain is indicated along with the position of a predicted lipid bilayer (grey ovals). Note that relative sizes vary from image to image and volumes may be hard to assess. Surface hydrophobicity is represented by cold colours with hydrophilic surfaces indicated by warmer colours. The models were produced by AlphaFold Colab Notebook [82, 83], IntFOLD6 [84], RoseTTAFold [86], and TrRosetta [89–91] and PDB files are available (see S2 File). Models were visualised with Mol* 3D Viewer [92] using molecular surface and membrane orientation representations and colouring residues according to hydrophobicity.


Fig 8.

The Orphan protein two-domain structure is conserved across clades.

Shown here are the AlphaFold predicted structures of Orphan proteins from other model Pseudomonas spp. and representatives of sister clades identified in the UPGMA analysis of homologs. Clade 3 representative Arcobacter butzleri ED-1 (A); Clade 4 representative Azoarcus strain DN11 (B); Clade 5 representative Methylomonas methanica MC09 (C); Clade 6 representative P. viridiflava LMCA8 (D); Clade 6 member P. aeruginosa PA01 (E); Clade 6 member P. fluorescens SBW25 (F); Clade 6 member P. putida KT2440 (G); Clade 6 member P. syringae DC3000 (H). The relative positioning of the GH17 domain, transmembrane (TM) region and GT2 domain is indicated along with the position of a predicted lipid bilayer (grey ovals). Note that relative sizes vary from image to image and volumes may be hard to assess. Surface hydrophobicity is represented by cold colours with hydrophilic surfaces indicated by warmer colours. The models were produced by AlphaFold Colab Notebook [82, 83] and the PDB files are available (see S2 File). Models were visualised with Mol* 3D Viewer [92] using molecular surface and membrane orientation representations and colouring residues according to hydrophobicity.
