Building blocks and blueprints for bacterial autolysins

doi:10.1371/journal.pcbi.1008889

Fig 1.

LEDGOs workflow.

For each organism of interest, the user provides as input a set of “seed” proteins, here based on GO terms indicative of peptidoglycan recognition and catalysis. The LEDGOs data collection pipeline then gathers organism-specific homologs of the seed proteins by repeated PSI-BLAST searches. The LEDGOs pipeline further annotates the catalytic and cell-wall binding domains within the collected sequences according to Pfam families, and catalogs the domain architectures of the proteins. Note that the identified homologs can extend (past the marked “||”) to include additional domains beyond those in the seeds. There can also be uncharacterized sequence regions (marked “?”) between the annotated domains. The domain sequences, annotations, and architectures within full enzymes are stored in the LEDGOs database. The LEDGOs data analysis tools then query this database to characterize and compare/contrast the lysin domain building blocks and architectures employed by the different organisms.
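The architecture cataloging step described above can be sketched in a few lines. This is only an illustration, not the LEDGOs implementation: the function name `architecture`, the `(domain, start, end)` hit format, and the `min_gap` threshold for reporting an uncharacterized region are all assumptions made for the example.

```python
def architecture(hits, min_gap=30):
    """Order domain hits along a protein and mark long unannotated gaps.

    hits: iterable of (domain_name, start, end), 1-based inclusive coordinates.
    min_gap: hypothetical minimum number of unannotated residues between two
             domains before a "?" region is reported (an assumption).
    """
    arch = []
    prev_end = None
    for name, start, end in sorted(hits, key=lambda h: h[1]):
        # Residues strictly between the previous domain and this one.
        if prev_end is not None and start - prev_end - 1 >= min_gap:
            arch.append("?")
        arch.append(name)
        prev_end = end if prev_end is None else max(prev_end, end)
    return arch


# Hypothetical hits for a single protein (toy coordinates, not real data).
toy_hits = [("CHAP", 200, 320), ("LysM", 30, 75), ("LysM", 90, 135)]
toy_arch = architecture(toy_hits)
```

With these toy hits, the two LysM repeats are close enough to be adjacent, while the longer stretch before CHAP is reported as a "?" region.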

Table 1.

LEDGOs database construction and composition.

A breakdown, by organism, of the counts of the initial sets of sequences with relevant annotation for peptidoglycan binding and catalysis function; representative seed sequences; identified sequence homologs; those homologs with catalytic domains; unique architectures; domain sequences; domain types; RUF sequences; and clustered RUFs.

Fig 2.

Domain usage frequency by organism.

Bar charts indicate percentages of representative proteins containing each of the domain types. Waffle plots indicate percentages of each domain type among the set of domains comprising the representative proteins, separately counting duplicates of a domain type within a protein. With the entire set of domain sequences totaling 100%, each block in a waffle indicates that 1% of the domains are of a given domain type. Only domain types that appear in at least 2% of some organism’s representative proteins are shown. Bars and waffle cells are ordered and colored by domain type as summarized in the legend.
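The two percentages in this figure use different denominators, which a small sketch may make concrete. The function name `domain_usage` and the toy proteins below are hypothetical and are not drawn from the LEDGOs database.

```python
from collections import Counter


def domain_usage(proteins):
    """proteins: list of domain-type lists, one list per representative protein."""
    n_proteins = len(proteins)
    # Bar-chart quantity: a protein counts once for each domain type it contains.
    containing = Counter(d for p in proteins for d in set(p))
    # Waffle quantity: every domain instance counts, including repeats in a protein.
    instances = Counter(d for p in proteins for d in p)
    n_domains = sum(instances.values())
    pct_proteins = {d: 100.0 * c / n_proteins for d, c in containing.items()}
    pct_domains = {d: 100.0 * c / n_domains for d, c in instances.items()}
    return pct_proteins, pct_domains


# Hypothetical toy data: four proteins, eight domain instances in total.
toy = [
    ["LysM", "LysM", "Amidase_2"],
    ["Amidase_2"],
    ["LysM", "CHAP"],
    ["CHAP", "SH3_5"],
]
pct_proteins, pct_domains = domain_usage(toy)
```

In this toy set, LysM occurs in 2 of 4 proteins (the bar-chart value) but accounts for 3 of 8 domain instances (the waffle value), because the repeat within the first protein is counted separately.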

Fig 3.

Common gram negative lytic enzyme architectures.

In each graph, the nodes indicate domains (along with the N terminus and C terminus), with size reflecting frequency within the organism’s clustered proteins and empty circles for domains with no representation in that species. The edges represent connections within a single protein, with edge shading and thickness representing relative frequency. Note that there are some self-edges (e.g., LysM loops back to itself), indicating a repeated domain. A path from Nterm through one or more domains to Cterm thus represents a protein, though not all such proteins have been observed in LEDGOs (see text).
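One way to read these graphs is as edge counts accumulated over architectures. The sketch below is an assumed reconstruction for illustration only: `architecture_graph`, `is_graph_path`, and the toy architectures are hypothetical names and data, with only the `Nterm`/`Cterm` node labels and the LysM self-edge taken from the caption.

```python
from collections import Counter


def architecture_graph(architectures):
    """Count domain nodes and directed edges over a list of architectures."""
    node_counts = Counter()
    edge_counts = Counter()
    for arch in architectures:
        path = ["Nterm", *arch, "Cterm"]
        node_counts.update(arch)
        # Consecutive pairs along the protein; a repeat yields a self-edge.
        edge_counts.update(zip(path, path[1:]))
    return node_counts, edge_counts


def is_graph_path(arch, edge_counts):
    """True if every step of the architecture follows an existing edge."""
    path = ["Nterm", *arch, "Cterm"]
    return all(edge in edge_counts for edge in zip(path, path[1:]))


# Hypothetical toy architectures for one organism.
toy = [["LysM", "LysM", "CHAP"], ["LysM", "CHAP"], ["CHAP"]]
nodes, edges = architecture_graph(toy)

# A valid Nterm-to-Cterm path that does not occur in the toy input.
unobserved = ["LysM", "LysM", "LysM", "CHAP"]
```

Here `edges[("LysM", "LysM")]` is the self-edge for the repeated domain, and `unobserved` follows existing edges without being one of the input architectures, which is the sense in which a path need not correspond to an observed protein.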

Fig 4.

Common gram positive lytic enzyme architectures.

In each graph, the nodes indicate domains (along with the N terminus and C terminus), with size reflecting frequency within the organism’s clustered proteins and empty circles for domains with no representation in that species. The edges represent connections within a single protein, with edge shading and thickness representing relative frequency. Note that there are some self-edges (e.g., LysM loops back to itself), indicating a repeated domain. A path from Nterm through one or more domains to Cterm thus represents a protein, though not all such proteins have been observed in LEDGOs (see text).

Fig 5.

Domain sequence diversity.

In each heatmap, each row and column represent a single non-redundant domain sequence in the LEDGOs database, and the cell for a pair of sequences is colored to indicate sequence identity (darker blue, higher). Cells are grouped by organism and clustered within an organism based on sequence identity patterns, so that similar sequences within an organism appear together as “blocks” on the diagonal, and blocks of similar sequences across organisms as off-diagonal blocks.
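The matrix behind such a heatmap can be sketched as follows. The caption does not say how sequence identity is computed, so this example assumes the sequences are already aligned to equal length and defines identity as the percentage of matching columns, skipping columns where both sequences have a gap. The function names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns where both sequences have a gap character.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def identity_matrix(seqs):
    """Symmetric matrix of pairwise identities, one row and column per sequence."""
    n = len(seqs)
    return [[percent_identity(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]


# Hypothetical aligned toy sequences.
toy = ["ACDEFG", "ACDEYG", "AC-EFG", "WWWWWW"]
m = identity_matrix(toy)
```

The first three toy sequences would form a dark block on the diagonal, while the fourth shares no positions with them and would appear as a light row and column.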

Fig 6.

Domain sequence diversity by architecture.

Heatmaps as in Fig 5, except that only non-redundant domain sequences appearing in architectures with a frequency of at least 2% are shown, and with row/column colors indicating the organism and architecture rather than the organism and gram status.

Fig 7.

Domain sequence diversity by repeat position.

Heatmaps as in Fig 5, except with the colors above the columns indicating the organism and architecture, and the colors beside the rows indicating the organism and repeat number. Thus cells are grouped by organism and repeat number, and clustered within those based on sequence identity patterns.

Fig 8.

Regions of Unknown Function, RUFs.

Entries give each RUF’s number of sequences, median sequence length, median pairwise sequence identity, organism in which it appears, and architecture graph (as in Figs 3 and 4) constructed from all architectures occurring at least 5 times.
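The per-RUF summary statistics listed here are straightforward to compute once a pairwise identity measure is chosen. In the sketch below, `summarize_ruf` and `toy_identity` are hypothetical names; `toy_identity` is a crude position-wise placeholder with no alignment, standing in for whatever identity measure is actually used, and the sequences are invented.

```python
from itertools import combinations
from statistics import median


def summarize_ruf(seqs, identity_fn):
    """Summarize one RUF: sequence count, median length, median pairwise identity."""
    if not seqs:
        raise ValueError("a RUF needs at least one sequence")
    identities = [identity_fn(a, b) for a, b in combinations(seqs, 2)]
    return {
        "n_sequences": len(seqs),
        "median_length": median(len(s) for s in seqs),
        # Undefined for a single sequence, since there are no pairs.
        "median_identity": median(identities) if identities else None,
    }


def toy_identity(a, b):
    """Placeholder: position-wise matches over the longer length (assumption)."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


# Hypothetical toy RUF sequences.
toy_ruf = ["MKTAYI", "MKTAYL", "MKSAYI", "MKTA"]
summary = summarize_ruf(toy_ruf, toy_identity)
```

Passing the identity function as a parameter keeps the summary independent of how identity is defined, so an alignment-based measure could be swapped in without changing `summarize_ruf`.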
