A Taxonomy of Bacterial Microcompartment Loci Constructed by a Novel Scoring Method

Simplified workflow of LoClass for locus similarity network generation.

(A) After genes encoding BMC shell proteins (PF00936, dark blue; PF03319, yellow) are identified using hmmsearch, their positions on the chromosome are determined. The region 10 kb upstream and downstream of each PF00936 and PF03319 domain is considered a Prospective BMC Locus (pale blue). The envelope (blue) is defined as the maximal portion of the Prospective BMC Locus bounded by BMC shell protein genes. (B) Where Prospective BMC Loci overlap, they are merged into one Prospective Locus. (C) All non-shell protein genes in the Prospective Locus are searched against Pfam [12]. Pfam hits are represented by colored regions of the genes. Genes without Pfam hits (white) are not considered. (D) Loci are represented by their pfam set, excluding genes containing PF00936 and PF03319 domains. Pfams, represented by colored rectangles, are weighted based on their relative distance from the envelope. This distance weight is represented by the darkness of the background behind the rectangles: a black background corresponds to a pfam found inside the envelope, with a weight of 1, and a light grey background corresponds to a pfam separated from the envelope by at least four open reading frames, with a weight of 0.6. P_I is the set of pfams found in Locus I, while P_J is the set of pfams found in a different Locus J (not shown). (E) By comparing the sets P_I and P_J, we determine the set C_{I,J} of pfams common to both loci and the two sets D_{I,J} and D_{J,I} of pfams unique to Locus I and Locus J, respectively. These three sets, along with the distance weight and the other weights (see Materials and Methods), are then used to calculate the locus similarity score between the two loci.
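
The bookkeeping in panels (A), (B) and (E) can be illustrated with a minimal Python sketch. It is not the LoClass implementation: the function names, the (start, end) coordinate tuples, the clamping at position 0 and the treatment of the chromosome as linear are all assumptions made for illustration. The distance weights and the similarity score itself are left out, because their formulas are given in Materials and Methods rather than in this caption.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # (start, end) in base pairs; hypothetical representation

FLANK_BP = 10_000  # 10 kb flank on each side, as stated in panel (A)


def prospective_windows(shell_genes: List[Interval], flank: int = FLANK_BP) -> List[Interval]:
    """Panel (A): extend each shell protein gene by the flank on both sides.

    Assumption: linear coordinates, clamped at 0 (circular chromosomes ignored).
    """
    return [(max(0, start - flank), end + flank) for start, end in shell_genes]


def merge_overlapping(windows: List[Interval]) -> List[Interval]:
    """Panel (B): merge overlapping windows into single Prospective Loci."""
    merged: List[Interval] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def envelope(locus: Interval, shell_genes: List[Interval]) -> Interval:
    """Panel (A): the maximal span of the locus bounded by shell protein genes."""
    inside = [g for g in shell_genes if locus[0] <= g[0] and g[1] <= locus[1]]
    return (min(s for s, _ in inside), max(e for _, e in inside))


def compare_pfam_sets(p_i: Set[str], p_j: Set[str]) -> Tuple[Set[str], Set[str], Set[str]]:
    """Panel (E): return (C_{I,J}, D_{I,J}, D_{J,I}) for two pfam sets."""
    return p_i & p_j, p_i - p_j, p_j - p_i


# Illustrative input only: made-up gene coordinates and arbitrary pfam accessions.
genes = [(12_000, 12_300), (25_000, 25_300), (80_000, 80_300)]
loci = merge_overlapping(prospective_windows(genes))
common, only_i, only_j = compare_pfam_sets({"PF00171", "PF06130"}, {"PF00171", "PF00465"})
```

With these made-up coordinates, the first two shell genes lie within 20 kb of each other, so their windows overlap and merge into one Prospective Locus, while the third gene forms its own locus.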

