A Degenerate Primer MOB Typing (DPMT) Method to Classify Gamma-Proteobacterial Plasmids in Clinical and Environmental Settings

Transmissible plasmids are responsible for the spread of genetic determinants, such as antibiotic resistance or virulence traits, causing a large ecological and epidemiological impact. Transmissible plasmids, either conjugative or mobilizable, have in common the presence of a relaxase gene. Relaxases were previously classified in six protein families according to their phylogeny. Degenerate primers hybridizing to coding sequences of conserved amino acid motifs were designed to amplify related relaxase genes from γ-Proteobacterial plasmids. Specificity and sensitivity of a selected set of 19 primer pairs were first tested using a collection of 33 reference relaxases, representing the diversity of γ-Proteobacterial plasmids. The validated set was then applied to the analysis of two plasmid collections obtained from clinical isolates. The relaxase screening method, which we call “Degenerate Primer MOB Typing” or DPMT, detected not only most known Inc/Rep groups, but also a plethora of plasmids not previously assigned to any Inc group or Rep-type.


Introduction
Plasmids exert a great evolutionary impact on their bacterial hosts, allowing them to colonize new niches, gain advantages over natural competitors, or overcome artificial selective pressures. These beneficial characteristics spread easily between bacterial populations through horizontal gene transfer. Among the clinically important disseminated traits are determinants for antibiotic resistance (AbR) and virulence [1,2].
Basic physiological functions of plasmids are autonomous replication, stability and propagation (conjugation and establishment in new hosts) [3]. Differences in replication and stability constituted the basis for classifying plasmids, first by incompatibility (Inc) and later by replicon typing. Incompatibility (the inability of two plasmids to coexist within the same cell) is a phenotypic expression of the interactions in plasmid replication [4] or partition [5]. By Inc testing [6], enterobacterial plasmids were divided into 27 groups, with some further subdivisions [7]. Inc groups include historical R-plasmids, which largely contributed to AbR dissemination, together with xenobiotic biodegradation and virulence plasmids. The Inc classification did not always reflect true evolutionary divergence: highly similar plasmids can be compatible [8,9,10,11,12,13,14], while largely non-homologous plasmids can be incompatible (e.g. IncX1 and IncX2 plasmids [15,16,17], some IncQ1 and IncQ2 plasmids [13]). As a consequence of the technical drawbacks of Inc testing, plasmid classification turned to molecular comparison of replication regions, leading to the development of two replicon typing methods. The first was based on DNA hybridization with specific plasmid probes (Inc/Rep-HYB) that contained either copy number control or partition DNA sequences of 19 Inc groups [18]. The second and presently most widely used method is called PCR-based replicon typing (PBRT). It was first used to identify five Inc groups of broad-host-range plasmids in environmental samples (IncW, IncP1, IncQ1, IncN [19,20,21] and IncP-9 [22,23]) and later to detect replicons predominant in Enterobacteriaceae [24,25,26,27,28] as well as 19 groups of resistance plasmids of Acinetobacter baumannii [29]. Plasmid multilocus/double sequence type methods [27,30,31,32,33] and PCRs detecting plasmid genes other than replication/partition modules [19,34] were also developed to detect some plasmid backbones.
PBRT and these other methods allowed plasmid identification and circumvented the technical problems associated with Inc testing. As a drawback, they confined plasmid classification to the boundaries of Inc groups or small clusters of highly similar backbones. Thus, PBRT left a significant fraction of plasmid groups unclassified.
Mobilizable plasmids code only for oriT, relaxase and nicking-accessory protein(s) (and only rarely for T4CP), requiring the help of a conjugative plasmid to be transferred. Thus, the only component common to all transmissible (conjugative and mobilizable) plasmids is the relaxase. Relaxases are multidomain proteins, the relaxase activity residing in their N-terminal domain [36]. The 3D structures of four relaxase domains have been solved: the MOB F relaxases TrwC_R388 [37] and TraI_F [38], the MOB Q relaxase MobA_R1162/RSF1010 [39] and the MOB V relaxase MobM_pMV158 (M. Espinosa, personal communication). In these proteins, the architecture of the active centre is highly similar in spite of the fact that they belong to three different MOB families [35]. Homology at the sequence level resides in three conserved motifs: motif I, which contains the catalytic Tyr residue(s) involved in DNA cleavage-joining reactions; motif II, which contains an Asp or Glu residue involved in activation of the nucleophilic hydroxyl of the catalytic Tyr; and the most conspicuous motif III, which contains a His triad that coordinates a divalent cation directly involved in the catalytic reactions [37,40]. The evolutionary relationships among relaxase sequences were traced and transmissible plasmids distributed in six relaxase MOB families [35,36]. Here, we developed a set of oligonucleotide primers for relaxase identification based on the relaxase protein phylogenies. The method is called "Degenerate Primer MOB Typing" (DPMT). As an application, we used DPMT to identify new relaxases and to classify plasmids isolated from clinical isolates of γ-Proteobacteria.

Design and Validation of the DPMT Oligonucleotide Set
Phylogenetic trees of the five plasmid relaxase families which contained suitably populated and well supported subfamilies in γ-Proteobacteria were traced as shown in Figures 1, 2, 3, 4, 5, 6, 7. They served as guides for designing oligonucleotide primer pairs able to amplify relaxases clustered in those subfamilies. Each primer was partially degenerate, with up to 24-fold degeneracy in its 3′ sequence, to accommodate a relaxed codon usage. Primers whose design resulted in a degeneracy greater than 24 were reduced to 24-fold degeneracy by considering only the sequences present in the respective relaxase DNA alignment. Each primer pair was tested on a reference collection of 33 relaxases encoded by transmissible plasmids originally isolated from γ-Proteobacteria (Table 1). Once their specificity was validated, the set of validated primers was used to identify relaxases in plasmid collections from clinical isolates, leading to the identification of both known and previously unreported relaxase sequences. Details for the design and range of substrates of the primer pairs selected for each MOB family follow.
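The degeneracy limit mentioned above can be made concrete with a short sketch. It assumes that primers are written with standard IUPAC nucleotide symbols and that degeneracy is the number of distinct sequences a primer represents; the function names and the example primer are hypothetical and are not taken from Table 2.

```python
from math import prod

# Number of nucleotides each IUPAC symbol stands for.
IUPAC_COUNTS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

MAX_DEGENERACY = 24  # cap stated in the text


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    return prod(IUPAC_COUNTS[base] for base in primer.upper())


def within_cap(primer: str, cap: int = MAX_DEGENERACY) -> bool:
    """True when the primer does not exceed the degeneracy cap."""
    return degeneracy(primer) <= cap


# Hypothetical primer for illustration only, NOT one of the Table 2 primers.
example = "AGCGATTACGGNTAYCAYGA"
print(degeneracy(example), within_cap(example))
```

For the hypothetical primer, the single N (4 options) and the two Y positions (2 options each) give a degeneracy of 16, which is under the cap of 24.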
MOB F family. Figure 1A shows the phylogenetic reconstruction of MOB F relaxases from γ-proteobacterial plasmids. Two subfamilies contain most MOB F relaxases found in clinically relevant plasmids. Subfamily MOB F11 includes, among others, relaxases of AbR plasmids from Inc groups W and N, as well as metal-resistance and xenobiotic-biodegradation plasmids of Pseudomonas group IncP-9. Subfamily MOB F12 contains relaxases of AbR and virulence plasmids of the IncF complex (IncFI, IncFII, IncFIII and IncFV) and Inc9 (also known as com9), widely distributed among different genera of Enterobacteriaceae. Specific amplification of MOB F11 and MOB F12 plasmids was obtained with two forward primers (F11-f and F12-f) and one reverse primer (F1-r) (Table 2, Figure 1B-D). Since the two forward primers differ by only a single nucleotide, cross-amplification was occasionally observed between MOB F11 and MOB F12 relaxases. Thus, the two amplification reactions identified the most relevant MOB F plasmids but did not discriminate among them.
MOB P family. Within γ-Proteobacteria, MOB P contains relaxases of AbR plasmids belonging to the IncP1 complex (IncP1α, IncP1β, IncP1δ, IncP1γ, IncP1ε, and IncP1ζ), many of them recovered from soil and manure isolates [21]; virulence and AbR plasmids of the IncI complex (IncI1α, IncI1γ, IncK, IncB/O); AbR plasmids of the IncL/M, IncQ2 (IncQ2α, IncQ2β), IncG/IncP-6, IncX1, IncX2, IncU and IncQ3 groups; plus several other branches that contained no Inc prototype. The ample diversity of this family was reflected in the MOB P phylogeny, which showed several well-resolved monophyletic groups, as well as additional, poorly-defined deep branches [35]. Thus, to construct the set of MOB P primers we had to handle each subfamily separately.
Relaxases of IncP1α, IncP1β, IncP1δ, IncP1γ, IncP1ε, IncP1ζ, IncI1α, IncI1γ, IncK, IncB/O, IncL/M, IncQ2α, IncQ2β and IncG/IncP-6 plasmids, among others without Inc assignment, are grouped in the MOB P1 subgroup (Figure 2A); those of IncX1 and IncX2 plasmids are in group MOB P3 (Figure 3A); IncU plasmid relaxases are in group MOB P4 (Figure 3A), and relaxases of ColE1-related plasmids in MOB P5 (Figure 4A). Neither subfamily MOB P6, which contains few γ-Proteobacteria relaxases (including those in IncI2 plasmids), nor other poorly resolved clades (such as the one containing IncQ3 plasmids) were considered in this study.
MOB P1 subfamily. One reverse and four forward primers were needed for amplification of MOB P1 relaxases (Figure 2B, Table 2). The P11-f forward primer led to amplification of MOB P11 plasmids (including IncP1). Similarly, the P12-f forward primer identified MOB P12 plasmids (including IncI1, IncK, and IncB/O), the P131-f forward primer identified MOB P13 plasmids (including IncL/M), and the P14-f forward primer identified MOB P14 plasmids (including IncQ2 and IncG). Results are shown in Figure 2C-F. No cross-amplification was observed, except for P131-f + P1-r when using plasmid p9555 as template (Figure 2E). The non-specific amplicon was larger than that obtained from the reference MOB P131 relaxase gene nikB_pCTX-M3, so the interpretation of the data was unambiguous.
MOB P3 and MOB P4 subfamilies. MOB P3 relaxases correspond to IncX1 and IncX2 plasmids while MOB P4 contains relaxases of IncU plasmids (Figure 3A and Table S1). One primer pair was designed for each subfamily. No cross-amplification was observed (Figure 3C-D), except for the fortuitous amplification of some Salmonella chromosomes described in Methods, subsection "Validation and Methodologies Comparison".
MOB Q family. Phylogenetic reconstruction of γ-proteobacterial MOB Q relaxases showed two distinguishable MOB Q clades, MOB Q1 and MOB Qu (Figure 5A). For amplifying the first broad clade, two primer pairs were designed, Q11 and Q12, and one primer pair, Qu, for the MOB Qu cluster (Figure 5B, Table 2). Some phylogenetic overlap between the MOB Q and MOB P families has been reported [35]. Nevertheless, primers that hit each relaxase branch did not cross-amplify (Figure 5C-E).
MOB C family. All MOB C relaxases encoded in γ-proteobacterial plasmids cluster in a single clade, MOB C1, when outgrouping with Firmicutes/Tenericutes MOB C relaxases (Figure 7A, Table S1). MOB C relaxases present in ICEs, such as ICEKp1 and ICEEc1, also cluster in clade C1. MOB C is a peculiar relaxase family that does not contain the three classical signature motifs present in all other MOB families. Two primer pairs, C11 and C12, were designed to amplify the MOB C1 subclades (Figure 7B-D, Table 2).

Analysis of Clinical Plasmid Collections Using DPMT
Once validated by testing the reference collection of relaxases (Table 1), the set of 19 primer pairs was used to screen two plasmid collections from clinical samples as test cases (Table 3).
Test collection 1 consisted of 135 isolates of Enterobacteriaceae, recovered in different countries (Canada, Portugal, Spain, France and Kuwait) from 1989 to 2008, and producing extended spectrum beta-lactamases (ESBL). 104 of them were E. coli transconjugants harbouring ESBL-coding plasmids from different Enterobacteriaceae donors, while the remaining 31 were original donors unable to conjugate the ESBL determinant. The collection mainly included plasmid-encoded ESBLs from class A: SHV (4/135; [45]), TEM (18/135; [46,47,48]) and CTX types (91/135; [45,46,47,49,50,51,52]). A total of 237 relaxases were identified in the 135 strains, distributed among the five MOB families targeted by the primer set. The resulting amplicons were sequenced. Out of 237 sequenced amplicons, only five corresponded to relaxase sequences not previously reported (we consider a relaxase new when it shows less than 95% amino acid sequence identity with the closest hit in the NCBI nr database). Two of them, corresponding to plasmids pAA-TC1-69 and pAA-TC1-30a (GenBank Accession numbers JN167247 and JN167248), exhibited 62% and 64% amino acid identity, respectively, to the MOB F11 relaxase of plasmid pCT14 (nearest hit). Two others, those of plasmids pAA-TC1-79a and pAA-TC2-33a, were 78% identical to the R46 relaxase (details in Information S1), suggesting overall more diversity within the MOB F11 relaxase branch than anticipated from the analysis of present genome databases. Complete sequencing of the relaxase domain of these plasmid genes and the ensuing phylogenetic analysis classified them as well-defined new branches in the MOB F11 phylogeny (incorporated into Figure 1 in red). Similarly, a fifth relaxase, that of plasmid pAA-TC1-14a, was 87% identical to the pKPN4 relaxase and was classified as MOB F12 (see Information S1). The finding of these five new relaxase sequences underscores the power of DPMT to detect and classify plasmids unidentifiable by PBRT.
The most represented MOB subfamilies in Test Collection 1 were MOB P5 (71 relaxases), MOB F12 (60), and MOB P12 (39), followed by MOB H (23), MOB Q (16) and MOB F11 (14). Finally, 7 out of 135 isolates, corresponding to transconjugants, did not render any relaxase amplicon. Since they probably code for relaxases of MOB subfamilies not considered in this work or for new deviant relaxases, they were selected for complete sequencing and further investigation (work in progress).
Test collection 2 comprised E. coli isolates from urine cultures of Swedish women who suffered from uncomplicated, community-acquired urinary tract infections treated with pivmecillinam [53]. The isolates were grouped according to their PFGE profiles (Ellen Zechner, personal communication). We analyzed 49 representative isolates for the presence of relaxases using the same set of 19 MOB primer pairs. 30 out of the 49 primary strains gave positive amplification with at least one primer pair. The 19 isolates without positive DPMT results were used as donors in mating experiments. Transconjugants were obtained for 18 of them by using a battery of antibiotic resistances matching the donor AbR profiles. Selected transconjugants were tested again with the same set of primers. 13 out of 18 rendered amplicons with at least one primer pair, while five transconjugants remained unidentifiable. A total of 77 relaxase amplicons were obtained from the collection. 50 of them were sequenced, of which two corresponded to previously unreported relaxase sequences: one MOB P12, pAA-A3201, was 80% identical to the pO113 relaxase, and one MOB Qu, pAA-A3488, was 72% identical to the pSMS35_4 relaxase (see Information S1). Finally, a third relaxase, pAA-A3180 (Accession number JN167246), showed 97% amino acid identity to the MOB F12 plasmid R1 relaxase. In summary, the analysis of this second collection identified two new relaxase sequences, representing in turn new branches in the MOB family trees. The most abundant MOB family was MOB P with 31 relaxases (18 belonging to subfamily MOB P5, 7 to MOB P3 and 6 to MOB P12), followed by MOB F, with 30 amplicons, all members of subfamily MOB F12. It is worth mentioning that the identification of 4 MOB Qu and 9 MOB C plasmids of this collection would not have been possible using the available PBRT or Inc/Rep-HYB probes.
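The novelty criterion used here (less than 95% amino acid identity to the closest database hit) can be sketched as follows. This is only an illustration: the identity function assumes two pre-aligned sequences and counts only columns where both carry a residue, which may differ from the way the identities reported in the text were computed, and the function names are hypothetical.

```python
NOVELTY_THRESHOLD = 95.0  # percent amino acid identity, from the text


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumption: only columns where both sequences carry a residue are compared.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned residues to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def is_new_relaxase(identity_to_best_hit: float) -> bool:
    """Apply the <95% identity rule to the closest database hit."""
    return identity_to_best_hit < NOVELTY_THRESHOLD


# Identities to the nearest hit as reported in the text for three amplicons.
reported = {"pAA-TC1-69": 62.0, "pAA-TC1-14a": 87.0, "pAA-A3180": 97.0}
for plasmid, identity in reported.items():
    print(plasmid, "new" if is_new_relaxase(identity) else "known")
```

Under this rule the relaxases at 62% and 87% identity count as new, whereas the one at 97% identity to the R1 relaxase does not.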

Discussion
PBRT typing methods significantly improved the assignment of plasmids to Inc groups without the need to test for plasmid incompatibility. They nevertheless have some drawbacks, such as cross-hybridization between members of closely related Inc groups (such as IncI, IncK and IncB/O [18,24]), false negative PCR results obtained when classifying more divergent plasmid groups (e.g. IncL/M [24]), and poor coverage of some groups (e.g. IncA/C [54], and ColE1-like [25]). PBRT identifies plasmids that belong to well-defined Inc groups. Nevertheless, a relevant part of the existing plasmid diversity, found in different ecological niches [55,56,57,58,59] that include clinical settings [60,61], remains elusive to PBRT classification (see Figure 8). In order to capture a broader range of plasmids, we considered groups of evolutionarily related plasmid sequences instead of focussing on single sequences as PBRT usually does. Therefore, our set of primer pairs was not mainly designed to be used for screening purposes, but for the discovery of new relaxases and thus to expand and better delimit the known MOB subfamilies.
A computational protocol to search for conjugative and mobilizable genetic modules, applied to a set of 1,730 completely sequenced plasmids recorded in the NCBI database, detected a relaxase in 260 out of the 503 plasmids hosted in γ-Proteobacteria [62]. We used that plasmid set to compare the detection capabilities of the available PBRT and DPMT probes (Table S1). Our set of 19 degenerate primer pairs was potentially able to detect 193 out of the 271 relaxases contained in the 260 transmissible γ-proteobacterial plasmids, that is, it would allow the classification of 186 of these plasmids. Available PBRT probes (58 primer pairs) could potentially detect 153 plasmids in the total set, of which 98 were contained in the transmissible plasmid set. 87 out of 260 transmissible plasmids could be potentially detected by both PBRT and DPMT probes. This comparison suggests that DPMT is a powerful tool to detect and phylogenetically classify γ-proteobacterial transmissible plasmids.
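The counts in this comparison can be partitioned by simple inclusion-exclusion. The sketch below uses only the numbers given above for the 260 transmissible plasmids (186 detectable by DPMT, 98 by PBRT, 87 by both); the function name and dictionary labels are illustrative.

```python
def coverage(total: int, dpmt: int, pbrt: int, both: int) -> dict[str, int]:
    """Partition a plasmid set by which typing method could detect each plasmid."""
    either = dpmt + pbrt - both  # inclusion-exclusion
    return {
        "dpmt_only": dpmt - both,
        "pbrt_only": pbrt - both,
        "both": both,
        "either": either,
        "neither": total - either,
    }


TOTAL = 260  # transmissible plasmids hosted in gamma-Proteobacteria
counts = coverage(total=TOTAL, dpmt=186, pbrt=98, both=87)
for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / TOTAL:.1f}%)")
```

With these figures, 99 plasmids would be detectable by DPMT only and 11 by PBRT only, and the two methods together would cover 197 of the 260 transmissible plasmids (about 76%), leaving 63 outside the reach of either probe set.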
A reference collection of 33 relaxases, containing representatives of the main MOB subfamilies, was used to test for specific amplification of the chosen primer pairs (Table 1). With few exceptions (see sections MOB F family and MOB P1 subfamily), no cross-amplification between MOB subfamilies was observed. Several DPMT primer pairs have already been successfully used conjointly with PBRT for identifying plasmids from clinical strains [47,64,65]. In this work we analyzed two enterobacterial plasmid collections by DPMT, capturing not only the known Inc plasmid groups but also a number of others undetected by PBRT, some of which contained new relaxase sequences. The DPMT method only failed to identify a MOB relaxase in 12 out of 122 transconjugants from these collections. Failure to find a relaxase in an experimentally verified transconjugant could be attributed to: i) the sequence bias introduced in some primers to avoid high degeneracy (see Table 2), ii) the presence of relaxases belonging to subfamilies not included as targets by our primer set, or iii) the existence of relaxases whose sequences could be largely deviant from the subfamily consensus. In any case, the results presented in this work suggest that the present implementation of the DPMT method identifies more than 90% of the transmissible R-plasmids in transconjugants of clinical isolates. Once less-populated or poorly-resolved relaxase phylogenetic clades become more robust by accretion of further data, our method could be expanded to allow the identification of a higher proportion of relaxases. Our ongoing work aims to do so, with the collaboration of a number of clinical research groups in Spain and Europe.
Detection of transmissible plasmids by PBRT and DPMT underscores their complementarity in focus and scope. While PBRT focuses on replication or partition regions shared by clusters of highly-related plasmids (>95% nucleotide identity), DPMT targets relaxase motifs conserved in large groups of plasmids with deep phylogenetic diversity. As shown in Results, we can detect relaxases with as little as 60% amino acid sequence identity to the nearest known hit in the databases. Thus, PBRT is useful for detecting blooms of redundant backbones that carry different cargo genes ("zoom in" strategy), while DPMT finds and classifies backbones that share a common relaxase ancestor ("zoom out" strategy). Most PBRT primers were designed for detecting plasmids from Enterobacteriaceae [24,27], although a few are available for detection of plasmids from other taxonomic families of γ-Proteobacteria, such as IncP-1 [19,20,21], IncP-9 [22,23], or Acinetobacter baumannii replicons [29]. Given the vast diversity of the plasmid world, designing probes that target small groups of highly-related plasmids is a strategy of limited practical use outside specific purposes: it is suitable neither for studying global diversity nor for finding plasmids that deviate from well-studied backbones. The DPMT strategy is more inclusive, allowing the detection of plasmids hosted by a larger number of taxonomic families. Nevertheless, it should be emphasized that it still recovers a higher proportion of plasmids from Enterobacteriaceae (85%) than from other γ-Proteobacterial families (51.4%). This is mostly due to the lack of a suitable number of related relaxase sequences to construct robust phylogenetic trees, as exemplified, for instance, by the Moraxellaceae, Vibrionaceae, Pseudomonadaceae and Aeromonadaceae plasmids [35,62].
Investigators in public health surveillance, veterinary or environmental science may wish to consider developing sets of oligonucleotide pairs more specifically adapted to their needs. Most clinically relevant transmissible plasmids detected by PBRT probes are also uncovered by DPMT, as shown in this work. Conversely, no PBRT probes are available for many plasmids detected by DPMT, such as the virulence plasmids IncFIII/IV (MOB F12), IncQ2 (MOB P14), IncP-7 (MOB H12), and a number of others without Inc assignment. Results obtained by DPMT can in turn guide the design of PBRT primers for the assessment of newly discovered plasmid groups. As an example, the classification of virulence plasmids in the IncF and IncI1 complexes, reviewed in [26], should benefit from a joint PBRT+DPMT analysis.
An added advantage of the DPMT method is its applicability in the identification of ICEs (see Figures 6 and 7). ICEs are also vehicles that disseminate virulence and AbR genes [66,67]. They are known to constitute an integral part of most bacterial genomes, outnumbering plasmids by 2 to 1 in sequence databases [63]. ICEs are beginning to be closely linked to some of the more powerful AbR mechanisms such as ESBL, metallo- and AmpC-type β-lactamases. For instance, chromosomal MOB H121 (R391-like) elements putatively involved in blaCMY-2 mobilization were detected by DPMT in enterobacterial isolates [64]. The MOB families considered in our primer set are also abundant in ICEs of γ-Proteobacteria [63]. The expanded diversity that DPMT discovered in γ-proteobacterial plasmids (and ICEs) will help to populate poorly resolved branches of the existing phylogenetic trees and, therefore, lead to better consensus sequences to improve the design of new primer sets and, eventually, to design a multiplex set of non-degenerate oligonucleotides for faster plasmid screening and identification procedures (work in progress). Additionally, owing to its broad amplification capabilities, the DPMT method could be used in the analysis of plasmids and ICEs in total community DNA. In this case, the DNA fragments obtained from amplification with the 19 DPMT primer pairs could be combined and subjected to deep sequencing. As a result, all amplified sequences could be identified and quantified, resulting in a quantitative description of the plasmid and ICE composition of the analyzed populations under given environmental conditions.
The analysis of relaxases and replicons of γ-proteobacterial plasmids carried out in this and previous works strongly suggests that there is a high correlation between the MOB and the Inc/Rep group. That is, relaxases from different Inc plasmids can be grouped in a single MOB subfamily, but plasmids of a given Inc group do not contain relaxases dispersed in different MOB subfamilies. Some exceptions are observed, which can usually be explained by plasmid cointegration and secondary deletions. Thus, DPMT provides not only the relaxase identity but also a quick inference of the phylogenetic relationships with other plasmids, as well as an idea of the constitution of the plasmid backbone. In summary, the combination of both methods, DPMT and PBRT, could better serve in the identification and characterization of plasmid species which are relevant in human and animal medicine. We hope they will help to inspire more effective clinical and environmental policies to manage the alarming increase of more virulent and multi-antibiotic resistant human pathogens.
Figure 6 (caption fragment). ... (H2-f and H2-r). Symbols, colour codes and lanes as in Figure 1. doi:10.1371/journal.pone.0040438.g006
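The MOB-to-Inc relationship described in the Discussion (several Inc groups per MOB subfamily, but a single MOB subfamily per Inc group) can be checked mechanically once the assignments are tabulated. The sketch below uses only the subset of assignments stated in the Results text; it is not the full content of Figure 8 or Table S1, and the variable and function names are illustrative.

```python
# Subset of MOB subfamily -> Inc group assignments transcribed from the
# Results text (not the complete mapping of Figure 8 / Table S1).
MOB_TO_INC = {
    "MOB F11": {"IncW", "IncN", "IncP-9"},
    "MOB F12": {"IncFI", "IncFII", "IncFIII", "IncFV", "Inc9"},
    "MOB P11": {"IncP1"},
    "MOB P12": {"IncI1", "IncK", "IncB/O"},
    "MOB P13": {"IncL/M"},
    "MOB P14": {"IncQ2", "IncG"},
    "MOB P3": {"IncX1", "IncX2"},
    "MOB P4": {"IncU"},
}


def invert(mob_to_inc: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each Inc group to the set of MOB subfamilies that contain it."""
    inc_to_mob: dict[str, set[str]] = {}
    for mob, inc_groups in mob_to_inc.items():
        for inc in inc_groups:
            inc_to_mob.setdefault(inc, set()).add(mob)
    return inc_to_mob


INC_TO_MOB = invert(MOB_TO_INC)
# Inc groups whose relaxases would be dispersed over several MOB subfamilies.
dispersed = {inc: mobs for inc, mobs in INC_TO_MOB.items() if len(mobs) > 1}
print(dispersed)
print(INC_TO_MOB["IncK"])
```

For this subset no Inc group is dispersed, so the dictionary of exceptions is empty. Applied to a complete table, the same check would flag the exceptions attributed above to cointegration and secondary deletions.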

Conclusions
The Degenerate Primer MOB Typing (DPMT) method allows rapid and accurate identification of transmissible plasmids based on their relaxase sequences. It detects a broader range of plasmids than the PCR-based replicon typing (PBRT) method and highlights a significant plasmid diversity that was underestimated. The DPMT method can be useful in the analysis of plasmids from both clinical and environmental isolates. The philosophy that guided the development of the γ-Proteobacteria MOB primer set can be easily extended to encompass relaxases of other taxonomical groups of bacteria.

Plasmids, Bacterial Strains, Growth Conditions and DNA Extraction
Relaxases representing five out of six MOB families described in Garcillán-Barcia, 2009 (MOB F, MOB P, MOB Q, MOB H, and MOB C) were used as standards for DPMT validation. MOB V relaxases were not included since they are barely represented in γ-Proteobacteria. The resulting reference collection included six conjugative or mobilizable plasmids and 27 recombinant plasmids containing cloned relaxase genes (Table 1). For their construction, relaxase domains were delimited by using PSIpred (http://bioinf.cs.ucl.ac.uk/psipred/) [68,69] and GOR (http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=npsa_gor4.html) [70]. Relaxase domains contained approximately the 300 N-terminal amino acids of these large multidomain proteins. Gene segments amplified by PCR were cloned either in the NdeI or NdeI/BamHI sites of vector pET3a (Novagen) or in the NcoI/BamHI sites of vector pET3d (Novagen), and introduced in E. coli DH5α by electroporation. Host strains were grown in Luria-Bertani broth (LB) in the presence of suitable antibiotics for plasmid selection. Total DNA was obtained using InstaGene Matrix (BioRad Laboratories), according to the manufacturer's recommendations and starting from 100 µl saturated cultures.

Bacterial Matings
Donors (E. coli primary isolates) and recipients (either DH5α [71] or HMS174 [72]) were grown to saturation, mixed in a 1:1 ratio and mated overnight on LB-agar plates at either 30°C or 37°C. Cells were resuspended in LB and dilutions plated on appropriate antibiotics (recipient marker + plasmid marker) to select for transconjugants. Nalidixic acid (20 µg/ml) was used to select for DH5α and rifampicin (50 µg/ml) for HMS174.

Database Search
PSI-Blast [73] searches for relaxases were carried out using the N-terminal 300 amino acids of each MOB family prototype, following the method described in [36] and [35], but querying

PCR Primer Design
For each MOB family, relaxase domains were aligned and their phylogenetic relationships traced as previously described [35]. For each well-populated and well-resolved subfamily, the corresponding protein alignment was used to find blocks of at least four contiguous, usually invariant amino acids located within or close to the conserved relaxase motifs. Among them, two blocks were finally chosen to design forward and reverse primers for each subfamily. Oligonucleotide pairs were selected that detected most subfamily members while minimizing codon degeneracy and resulting in amplicons smaller than 400 bp. When a single primer pair did not encompass all members of a subfamily, the subfamily was further subdivided (e.g., MOB C1 into C11 and C12). The primer pairs for amplifying each MOB family were designed using CODEHOP [74] (Table 2). This strategy had already been applied to the identification of DNA sequences of distantly related members of several gene families [75,76,77]. In CODEHOP, oligonucleotides derived from the selected blocks contain a 3′ partially degenerate sequence, called CORE, comprising different codon variants of the highly conserved residues (11 nucleotides); and a 5′ non-degenerate sequence of variable size (around 14 nucleotides, to give a hybridization temperature of 55 to 60°C), called CLAMP, composed of the upstream contiguous nucleotides most conserved in the relaxase DNA alignment.
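The CLAMP/CORE layout described above can be illustrated with a minimal sketch. It is not the CODEHOP algorithm itself: it assumes a gap-free DNA alignment of the primer region, takes the most frequent nucleotide per column for the 5′ clamp, and uses the IUPAC symbol covering only the nucleotides observed in the alignment for the 3′ core. The function name and the three toy sequences are hypothetical.

```python
from collections import Counter

# IUPAC symbol for each set of observed nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

CORE_LENGTH = 11  # length of the 3' degenerate core given in the text


def build_primer(aligned: list[str], core_length: int = CORE_LENGTH) -> str:
    """Return a 5' consensus clamp followed by a 3' degenerate core."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    columns = list(zip(*(seq.upper() for seq in aligned)))
    split = len(columns) - core_length
    # Clamp: most frequent nucleotide in each column (non-degenerate).
    clamp = "".join(Counter(col).most_common(1)[0][0] for col in columns[:split])
    # Core: IUPAC symbol covering every nucleotide seen in the column.
    core = "".join(IUPAC[frozenset(col)] for col in columns[split:])
    return clamp + core


# Hypothetical aligned primer regions (14-nt clamp + 11-nt core), toy data.
toy = [
    "GCTGAACGTATCGCGGCTATCATGA",
    "GCTGAACGCATCGCGGTTACCATGA",
    "GCTGAACGTATCGCGGATATCACGA",
]
print(build_primer(toy))
```

For the toy alignment the result is GCTGAACGTATCGCGGHTAYCAYGA: the clamp keeps the majority base at the one variable upstream position, and the core carries one H and two Y positions (12-fold degeneracy). Restricting the core to observed nucleotides mirrors the way degeneracy was limited by considering only sequences present in the relaxase DNA alignment.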

Validation and Methodologies Comparison
Each primer pair was tested for amplification of the collection of 33 reference plasmids in standard PCR reactions. Each reaction contained PCR buffer (50 mM KCl, 10 mM Tris-HCl (pH 8.8), 0.1% Triton X-100), 1.5 mM MgCl2, 0.2 mM dNTP, 1 µM of the corresponding pair of degenerate oligonucleotides, 2-5 µl (0.4-1 µg) of total DNA, and 1 U of BioTaq polymerase (Bioline) in a final volume of 50 µl. Details of amplification conditions for each primer pair are described in Table 2. Generally, the standard PCR protocol involved a 4 min step at 94°C, 25-30 cycles of 30 sec at 94°C, 30 sec at the annealing temperature and 30 sec at 72°C (the extension time had to be varied to adapt to the expected size of some amplicons; see Table 2 for details), and a final extension step for 10 min at 72°C. A touchdown PCR protocol [78] was used for amplification of MOB H11 and MOB C11 groups, to avoid the appearance of aberrant amplification products. It should be noted that the P4 primer pair (Table 2) fortuitously amplified a segment of some Salmonella chromosomes (corresponding to gene fucO, for instance in S. typhimurium DT104), thus impeding relaxase identification in this genomic background. No additional fortuitous amplicons were obtained when using clinical samples from Escherichia, Salmonella or Klebsiella. Amplicons were visualized after 2% agarose gel electrophoresis, using a GelDoc (BioRad Laboratories) and, when appropriate, sequenced by Macrogen Laboratories (Seoul, South Korea).
Figure 8 (caption fragment). ... (Table S1). B) The Inc groups contained within each MOB subfamily are indicated at the right, boxed in the same colour. When no Inc group is contained, the name of a prototype plasmid is given. doi:10.1371/journal.pone.0040438.g008
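The standard cycling protocol described above can be written down as a small data structure, which makes it easy to list the steps and estimate the programmed hold time. This is a sketch under stated assumptions: the class and method names are hypothetical, the 55°C annealing temperature in the example is a placeholder (actual values are primer-pair specific and given in Table 2), and instrument ramp times are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcrProgram:
    cycles: int                        # 25-30 in the standard protocol
    annealing_temp_c: float            # primer-pair specific (see Table 2)
    extension_s: int = 30              # varied for larger amplicons
    initial_denaturation_s: int = 240  # 4 min at 94 C
    denaturation_s: int = 30
    annealing_s: int = 30
    final_extension_s: int = 600       # 10 min at 72 C

    def steps(self) -> list[tuple[str, float, int]]:
        """Return (name, temperature in C, seconds) for every step in order."""
        cycle = [
            ("denaturation", 94.0, self.denaturation_s),
            ("annealing", self.annealing_temp_c, self.annealing_s),
            ("extension", 72.0, self.extension_s),
        ]
        return (
            [("initial denaturation", 94.0, self.initial_denaturation_s)]
            + cycle * self.cycles
            + [("final extension", 72.0, self.final_extension_s)]
        )

    def hold_time_s(self) -> int:
        """Total programmed hold time, excluding instrument ramping."""
        return sum(seconds for _, _, seconds in self.steps())


# Placeholder annealing temperature; use the Table 2 value for a real primer pair.
program = PcrProgram(cycles=30, annealing_temp_c=55.0)
print(program.hold_time_s() / 60, "min")
```

With 30 cycles and the default 30 sec steps, the programmed hold time is 240 + 30 × 90 + 600 = 3540 sec, that is, 59 min before ramping is taken into account. The touchdown protocol used for MOB H11 and MOB C11 is not modelled here.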

Supporting Information
Table S1 Plasmids from γ-Proteobacteria contained in the NCBI database. (DOC)
Information S1 Nucleotide sequences and their translated amino acid sequences of relevant relaxases obtained by DPMT from different test collections. (DOC)