Detecting Key Structural Features within Highly Recombined Genes

doi:10.1371/journal.pcbi.0030014

Table 1.

Data Filtering of pbp2x Using BLAST Miner

Figure 1.

Frequency and Distribution of Modules within Partial pbp2x Alleles

The y-axis indicates the number of modules assigned to each property indicated by the x-axis.

(A) Number of iterations of the module correlation algorithm used to define each module.

(B) Number of module occurrences in the entire dataset of pbp2x alleles.

Figure 2.

Module Maps of pbp2x Alleles

The module content (column Module) and nt position of the module start site (column Location) are shown for three pbp2x alleles. Accession numbers for the pbp2x alleles are X16367, AY0950541, and AY0950557 for the leftmost, middle, and rightmost module maps, respectively. The penicillin-resistance phenotypes of the corresponding gene products are susceptible, intermediate-resistance, and resistant, respectively [2].

Figure 3.

Module Networks of pbp2x

(A) Shows the complete pbp2x module network for 41 pbp2x alleles harboring 38 modules. Modules are represented by nodes (depicted as circles), whereby the diameter of the node is proportional to the percent of alleles that harbor it. Node colors represent the number of correlation iterations used to define each node, as follows: blue (fewer than four iterations), and green (four iterations). Arrows represent the connections between contiguous modules. Arrow thickness is proportional to the frequency with which each connection is observed in the dataset. The module maps of Figure 2 can be traced via arrows through the network graph.

(B) Depicts a simplified pbp2x network, showing only those modules present in >10% of pbp2x alleles (35 modules), and connections that occurred more than two times. Three connection pathways within the central portion of the module network are labeled I, II, and III (see Figure S4 for explanation).
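The construction described in (A) and the filtering described in (B) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the input layout (one ordered list of module IDs per allele), and the toy data are hypothetical. Skipping self-connections between tandem copies of a module is an assumption borrowed from the sof network caption (Figure 7).

```python
from collections import Counter


def build_module_network(module_maps):
    """Return (node_pct, edges) for a dict of allele -> ordered module IDs.

    node_pct maps each module to the percent of alleles harboring it
    (node diameter); edges counts each directed connection between
    contiguous modules (arrow thickness).
    """
    alleles_with_module = Counter()
    edges = Counter()
    for modules in module_maps.values():
        # Count each module once per allele, however often it recurs.
        alleles_with_module.update(set(modules))
        for left, right in zip(modules, modules[1:]):
            if left != right:  # assumption: tandem repeats are not drawn
                edges[(left, right)] += 1
    total = len(module_maps)
    node_pct = {m: 100.0 * n / total for m, n in alleles_with_module.items()}
    return node_pct, edges


def simplify_network(node_pct, edges, min_pct=10.0, min_count=2):
    """Keep modules present in >min_pct of alleles and connections seen >min_count times."""
    kept_nodes = {m for m, pct in node_pct.items() if pct > min_pct}
    kept_edges = {
        (a, b): n
        for (a, b), n in edges.items()
        if n > min_count and a in kept_nodes and b in kept_nodes
    }
    return kept_nodes, kept_edges


# Hypothetical module maps for four alleles (not data from the study).
toy_maps = {
    "allele1": [1, 2, 3],
    "allele2": [1, 2, 4],
    "allele3": [1, 2, 3],
    "allele4": [1, 1, 2],
}
node_pct, edges = build_module_network(toy_maps)
kept_nodes, kept_edges = simplify_network(node_pct, edges)
```

Counting a module once per allele keeps node size tied to how widespread the module is, while the edge counter captures how often two modules sit next to each other.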

Table 2.

Data Filtering of sof Using BLAST Miner

Figure 4.

Frequency and Distribution of Modules within Partial sof Alleles

The y-axis indicates the number of modules assigned to each property indicated by the x-axis.

(A) Number of iterations of the module correlation algorithm used to define each module.

(B) Number of module occurrences in the entire dataset of sof alleles.

(C) Number of sof alleles harboring each module.

Figure 5.

Clustal W Alignments of nt Sequence Segments

Segments of sof sequences were extracted at the module start site and included 4 extra nt upstream of the start site to facilitate alignment. All extracted sequences assigned to a module were aligned using the Clustal W algorithm. The aligned sequences were trimmed to remove the extra 4 nt at the 5′ end, and the 3′ ends were trimmed to yield a total length of 24 nt, allowing for indels. Duplicates of the trimmed sequences were removed from the data shown. Aligned and trimmed sequences are displayed for four modules (A–D). Indicated in each panel are the number of iterations used to define each module and the number of unique trimmed sequence segments. Module 7 provides an example of a 3-bp insertion within the 24-mer backbone. Module 12 (D) provides an example of the module start site slippage that is observed in some high-iteration modules.
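The trimming and de-duplication steps applied after alignment can be sketched as below. This is a simplified reading, not the authors' procedure. It assumes the alignment has already been produced externally (for example, by Clustal W), that the 4 upstream nt occupy the first four alignment columns, and that a fixed window of 24 alignment columns is kept, with gap characters standing in for indels. The function name and toy rows are hypothetical.

```python
def trim_aligned_segments(aligned_rows, upstream=4, length=24):
    """Trim aligned rows to a fixed column window and drop duplicates.

    aligned_rows: equal-length strings from a multiple alignment,
    with '-' marking gaps. Returns unique trimmed rows in input order.
    """
    # Drop the upstream columns at the 5' end, then cut the 3' end.
    trimmed = [row[upstream:upstream + length] for row in aligned_rows]
    # dict.fromkeys preserves first-seen order while removing duplicates.
    return list(dict.fromkeys(trimmed))


# Hypothetical aligned rows: 4 upstream nt, a 24-column core, 2 trailing nt.
rows = [
    "ACGT" + "A" * 24 + "GG",
    "TTTT" + "A" * 24 + "CC",                    # same core, different flanks
    "ACGT" + "A" * 10 + "-" + "C" * 13 + "GG",   # core containing a gap
]
unique_segments = trim_aligned_segments(rows)
```

Here the first two rows collapse to a single trimmed segment because they differ only in the flanking columns that are removed.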

Figure 6.

Module Maps and Module Rearrangements

Shown (right panel) is the module content (column Module) and nt position of the module start site (column Location) of two sof sequences (AF138799 and AF139751); this diagram constitutes a module map. Two major blocks of modules are boxed: short dashes for Modules 41, 63, 58, and 12, and long dashes for Modules 107 and 15. Arrows connecting boxes indicate their relative position within each sof allele; the corresponding aligned sequence segment is also shown (left panel). Multiple occurrences of the highly repetitive Module 13 are boxed (thin lines) to highlight their positions relative to the two major blocks of modules. The module slip threshold parameter (set to 4 nt), which is used in the iterative module correlation process, leads to the identification of sequence segments that are offset by up to ±4 nt; when the offset exceeds twice the module slip threshold parameter, an additional occurrence of that module is declared, even though the start sites of the two modules may be positioned only 8 nt apart. Module start site slippage and discrete blocks comprising multiple copies of the same module are depicted in the module maps.
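One way to read the slip-threshold rule is sketched below. It is an interpretation, not BLAST Miner's actual logic. It assumes that hits for a single module within one allele are processed in positional order, and that a hit counts as a new occurrence only when its start site lies more than twice the slip threshold from the start of the current occurrence. The function name and positions are hypothetical.

```python
def declare_occurrences(start_sites, slip_threshold=4):
    """Collapse nearby start sites of one module into distinct occurrences.

    A start site more than 2 * slip_threshold nt past the start of the
    current occurrence opens a new occurrence; closer sites are treated
    as start-site slippage of the same occurrence.
    """
    occurrences = []
    for pos in sorted(start_sites):
        if not occurrences or pos - occurrences[-1] > 2 * slip_threshold:
            occurrences.append(pos)
    return occurrences


# Hypothetical start sites (nt positions) for one module in one allele.
sites = [100, 103, 109, 130]
distinct = declare_occurrences(sites)
```

With a 4-nt threshold, the site at 103 is absorbed into the occurrence starting at 100, while 109 lies 9 nt away and is declared separately, which is how closely spaced copies of the same module can appear in a module map.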

Figure 7.

Module Networks of sof

(A) Shows the complete sof module network for 139 sof alleles harboring 97 modules. Modules are represented by nodes (depicted as circles), whereby the diameter of the node is proportional to the percent of alleles that harbor it. Node colors represent the number of correlation iterations used to define each node, as follows: blue (fewer than four iterations), green (four to eight iterations), and red (more than eight iterations). Arrows represent the connections between contiguous modules. Arrow thickness is proportional to the frequency with which each connection is observed in the dataset; for this diagram, one pixel is equivalent to ten connections. Tandem repeats of modules are not displayed in the network, but are shown in the module maps of each allele. The module maps of Figure 6 can be traced via arrows through the network graph.

(B) Depicts a simplified sof network, showing only those modules present in >10% of sof alleles (49 modules), and connections that occurred more than two times. The boxed area highlights a region of reduced recombination, and illustrates two discrete pathways of connections within the network.

Figure 8.

Key Features of Module 6 and 13 Sequences

The sof sequence AF139751 is presented as an example to show nontandem direct repeats in Module 13 (top strands), tandem direct repeats in Module 13 (middle strand), and complementary inverted repeats in Module 6 (bottom strands). The predicted amino acid sequence of nt positions 148 to 210 demonstrates the Ser-, Thr-, and Ala-codon–rich quality of this region.
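A complementary inverted repeat is a segment whose reverse complement recurs elsewhere on the same strand. The sketch below shows one simple way to flag such pairs. It is illustrative only: the function names, the word length `k`, and the toy sequence are hypothetical, and it does not reproduce the method used to compile Table 3.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, k=8):
    """Return (i, j) pairs where seq[i:i+k] has its reverse complement
    starting at j, with j downstream of the first segment (no overlap)."""
    hits = []
    for i in range(len(seq) - k + 1):
        j = seq.find(reverse_complement(seq[i:i + k]), i + k)
        if j != -1:
            hits.append((i, j))
    return hits


# Hypothetical sequence: an 8-mer, a 4-nt spacer, then its reverse complement.
toy_seq = "GGATCCAA" + "TTTT" + "TTGGATCC"
hits = find_inverted_repeats(toy_seq)
```

Only the first downstream match is reported for each start position, which keeps the sketch short; a fuller search would also extend matches and tolerate mismatches.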

Table 3.

Inverted Repeats in a Representative sof Allele (AF139751)
