A Comprehensive Evaluation of PCR Primers to Amplify the nifH Gene of Nitrogenase

The nifH gene is the most widely sequenced marker gene used to identify nitrogen-fixing Bacteria and Archaea. Numerous PCR primers have been designed to amplify nifH, but a comprehensive evaluation of nifH PCR primers has not been performed. We performed an in silico analysis of the specificity and coverage of 51 universal and 35 group-specific nifH primers by using an aligned database of 23,847 nifH sequences. We found that there are 15 universal nifH primers that target 90% or more of nitrogen fixers, but that there are also 23 nifH primers that target less than 50% of nifH sequences. The nifH primers we evaluated vary in their phylogenetic bias and their ability to recover sequences from commonly sampled environments. In addition, many of these primers will amplify genes that do not mediate nitrogen fixation, and thus it would be advisable for researchers to screen their sequencing results for the presence of non-target genes before analysis. Universal primers that performed well in silico were tested empirically with soil samples and with genomic DNA from a phylogenetically diverse set of nitrogen-fixing strains. This analysis will be of great utility to those engaged in molecular analysis of nifH genes from isolates and environmental samples.


Introduction
Nitrogen-fixing microorganisms are globally significant in that they provide the only natural biological source of fixed nitrogen in the biosphere. These organisms enzymatically transform dinitrogen gas from the atmosphere into ammonium equivalents needed for biosynthesis of essential cellular macromolecules. Nitrogen-fixing bacteria are diverse, and most of the known taxa have not yet been cultivated in the laboratory [1]. Nitrogen fixation is carried out by the nitrogenase enzyme whose multiple subunits are encoded by the genes nifH, nifD, and nifK (as reviewed in [2]). Of the three, nifH (encoding the nitrogenase reductase subunit) is the most sequenced and has become the marker gene of choice for researchers studying the phylogeny, diversity, and abundance of nitrogen-fixing microorganisms. Thus, many PCR primers have been developed to target the nifH gene with the purpose of amplifying this gene sequence from environmental samples.
Through use of nifH as a marker gene, researchers have been able to characterize aspects of the diversity and ecology of nitrogen-fixing Bacteria and Archaea. A wide range of environments has been sampled for nifH gene diversity, including marine [3], terrestrial [4], extreme [5], anthropogenic [6], host-associated [7], and agricultural [8] environments. Analysis of these data indicates that the distribution of diazotrophs in the environment varies as a function of habitat type [1]. While more than 3,358 OTU 0.05 nifH sequence types have been determined, the global census of diazotroph diversity remains far from complete [9]. Rates of nitrogen fixation have been associated with both nifH abundance [10] and nifH diversity [11], and thus knowledge of diazotroph community structure and dynamics is required to understand the ecological constraints on nitrogen fixation in microbial communities.
Phylogenetic analyses of nifH gene sequences have revealed five primary clusters of genes homologous to nifH [12][13][14][15]. Cluster I consists of aerobic nitrogen fixers including Proteobacteria, Cyanobacteria, Frankia, and Paenibacillus. Cluster II is generally thought of as the alternative nitrogenase cluster because it contains sequences from FeFe and FeV nitrogenases, which differ from the conventional FeMo cofactor-containing nitrogenase. Cluster III consists of anaerobic nitrogen fixers from Bacteria and Archaea including, for instance, the Desulfovibrionaceae, Clostridia, Spirochaetes, and Methanobacteria. Cluster IV and cluster V contain sequences that are paralogs of nifH and that are not involved in nitrogen fixation [13].
We set out to provide a comprehensive evaluation of primer coverage for researchers wishing to use the nifH gene as a molecular marker for the study of nitrogen-fixing Bacteria and Archaea. Primers that target diverse nifH sequences must be degenerate to encompass the sequence variability of the nifH gene, and Zehr and McReynolds were the first to design such degenerate primers [16,17]. There have since been numerous efforts to design both universal and group-specific nifH primer sets. In a survey of the literature, we have found 51 universal and 35 group-specific primers that have been paired to make 42 universal and 19 group-specific primer sets. We have performed an in silico evaluation of all of these nifH primers using an aligned database of all publicly available nifH sequences which we constructed previously [9]. We then performed empirical tests of the best of these primers using genomic DNA from a phylogenetically diverse set of nitrogen fixers and DNA from soil.

Results
Any effort to assess PCR primer coverage in silico must account for variation in sequence depth along the gene alignment of the database being queried. We observe that nucleotide positions near the beginning and end of the nifH gene alignment are underrepresented in sequence databases relative to nucleotide positions in the middle of the gene alignment (Figure 1). This problem occurs because a majority of nifH sequences have been generated using PCR primers that bind to conserved nucleotide positions found within the nifH gene. A majority of the 393 full-length nifH sequences currently present in the nifH database are derived from sequenced genomes. The two dips in nucleotide coverage (at positions 199 and 350 in Figure 1) result from insertions in the Azotobacter vinelandii nifH reference sequence relative to other genes in the alignment. In addition, some sequences in the alignment have insertions relative to A. vinelandii (data not shown). Because sequence depth varies along the alignment, all estimates of primer coverage were calculated with respect to the total number of sequences available at the alignment positions where each primer binds.
We mapped the 51 universal primers to their complementary binding positions along the A. vinelandii nifH gene (Figure 1, Figure S1). Many primers bind to the same region (Figure 1, Figure S1), and thus may vary only slightly in binding position, oligonucleotide length, or degeneracy.
The quality and characteristics of universal nifH PCR primers vary widely (Table 1 and Table 2). Of the 51 universal primers, 15 were found to hit 90% or more of all nifH sequences, while 23 hit less than 50% of these sequences and 9 hit 10% or fewer sequences (Table 1 and Table 2). In general, those universal primers that had >90% coverage for clusters I and III did not demonstrate systematic bias against individual phylogenetic groups within these clusters (Table 1 and Table 2). The primer KAD3 is notable, however, because it misses much of cluster III relative to cluster I (Table 1 and Table 2). Those primers with the highest coverage also tended to recognize a number of non-target sequences from cluster IV (Table 1 and Table 2).
The group-specific primers we evaluated generally show poor coverage of the phylogenetic groups they have been designed to target, except for the Frankia-specific primers nifH-f1-forA, nifH-f1-forB, nifH-269, and nifH-f1-rev (Table 3). The primer cyanoR targets Cyanobacteria, but has coverage of only 25%, and its intended pair, primer cyanoF, has a coverage of only 1% of cyanobacterial sequences (Table 3).
Given that PCR requires two primers used in combination, a useful indication of specificity must account for the coverage obtained when using specific primer pairs (Tables 4 and 5). We evaluated both primer combinations that have been reported in the literature and new primer combinations. As expected, the coverage obtained with primer pairs is always lower than the coverage obtained for each individual primer. We evaluated 42 universal primer pair combinations, of which 7 hit >90% of nifH sequences in the database, 24 hit >50%, and 6 hit 10% or less. Those primer sets which had >90% coverage are 19F/nifH3, Nh21F/nifH1, Nh21F/nifH3, IGK/nifH3, F2/R6, nifH2/R6, and nifH1/nifH2 (i.e., the Zehr and McReynolds primers). The 6 primer sets which hit 10% or less of cluster I and III are Primer-f/Primer-r, FGPH19/FGPH2739, FGPH19/PolR, IGK/FGPH2739, nifHF/nifHRb, and nifHF/nifHRc. While we evaluated 19 group-specific primer combinations, very few primer sets had high coverage of the designated target groups (Table 5). The primer set ChenBR1/ChenBR2 is designed to target β-Rhizobia but also hits 35% of the sequences within the Alpha-, Beta-, and Gammaproteobacteria and 75% of Frankia sequences. The Frankia-specific primer sets nifH-f1-forA/nifH-f1-rev and nifH-f1-forB/nifH-f1-rev hit 92% and 87% of Frankia, respectively.
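The relationship between single-primer and primer-pair coverage can be illustrated with set intersection: a sequence counts as a pair hit only if both primers match it, so the number of pair hits is at most the number of hits of either primer. The following minimal Python sketch assumes that both primers are scored against the same set of sequences; the function name and the sequence identifiers are hypothetical and are not data from this study.

```python
def pair_hits(forward_hits, reverse_hits):
    """Return the sequences matched by both primers of a pair."""
    return set(forward_hits) & set(reverse_hits)


# Hypothetical sequence identifiers, for illustration only.
forward = {"seq1", "seq2", "seq3", "seq4"}
reverse = {"seq2", "seq3", "seq5"}

both = pair_hits(forward, reverse)
print(sorted(both))
# The pair can never hit more sequences than its weaker primer.
print(len(both) <= min(len(forward), len(reverse)))
```

In practice the denominator also changes for a pair, because only sequences with data in both binding regions can be scored.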
Primer sets with high in silico coverage were used for empirical tests. When tested with DNA from soil, the primer combinations nifH2/R6, nH21f/nifH1, nifH1/nifH2, Ueda19f/univ463r, and nifH3/nH21f all produced PCR products of indiscriminate size, which appeared as smeared bands in gel electrophoresis, and also produced an amplified product from E. coli, indicating a lack of specificity for nifH under the amplification conditions tested (Table 6, Figures S9, S10, S11, S12, S13, S14, S15). The primer combinations F2/R6, IGK3/DVV, and Ueda 19F/388R produced a band of the expected size for a diverse range of genomic and soil DNA templates (Table 6, Figures S3, S4, S5, S6, S7, S8), though Ueda 19F/388R was observed to produce an amplified product from E. coli, indicating a lack of specificity for nifH under the amplification conditions tested. Overall, the primer pair IGK3/DVV produced the best performance in our empirical analysis, producing PCR products of the expected size from all nitrogen-fixing strains and soil DNA samples tested, while not generating PCR product from the negative controls or producing non-specific PCR products (Table 6, Figures S5 and S6).
Table 1. Properties of universal primers and their coverage for phylogenetic and environmental groupings in the nifH database; continued in Table 2.

Discussion
We report a comprehensive evaluation of nifH PCR primers. Our analysis of nifH primers reveals disparities in their sequence coverage. Variation in coverage is especially notable for primers designed to be universal, where 23 out of 51 target fewer than 50% of known nifH sequences and only 15 target 90% or more of sequences (Table 1). There could be several reasons for the disparity in primer coverage and specificity. Adequate primer design requires use of a sequence database representing the entire sequence diversity to be targeted by the primer. The number of sequences available in public databases has grown dramatically in recent years, and earlier efforts at primer design were constrained by the limited number and diversity of nifH sequences available. There is also a reasonable tendency to seek minimally degenerate primers due to undesirable effects that high levels of primer degeneracy can have on PCR performance. Decisions to lower degeneracy, however, could come at the cost of adequate coverage of target sequences.
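To make the degeneracy trade-off concrete, the degeneracy of a primer can be computed as the number of distinct oligonucleotides it represents, which is the product of the number of bases encoded at each position. The following Python sketch is an illustration only: it handles standard IUPAC nucleotide codes, does not handle inosine or other base analogs, and uses an example sequence chosen for illustration rather than taken from the tables.

```python
from math import prod

# Number of bases represented by each standard IUPAC nucleotide code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    return prod(IUPAC_SIZE[base] for base in primer.upper())


# Example degenerate sequence, for illustration only.
example = "TGYGAYCCNAARGCNGA"
print(degeneracy(example))  # 2 * 2 * 4 * 2 * 4 = 128
```

Each fully degenerate position (N) multiplies the number of oligonucleotides in the mixture by four, which is why small gains in coverage can require large increases in degeneracy.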
Our efforts to evaluate universal nifH primers expand upon previous work to design universal primers for this gene. Marusina et al. designed nifH primers based upon a diverse set of nifH sequences and tested the resulting primers against DNA from cultivated strains [18]. The F2/R6 primer set they designed was one of the best performing in our comparison (Tables 4 and 6, Figures S3 and S4). Fedorov et al. later reexamined some of the primers of Marusina et al. because they found that primer R6 contained mismatches to certain methylobacterial nifH sequences, and they sought to design primers that included this group [19]. The coverage of their new primer, nifH-3r, however, is considerably lower than that of the original R6 primer matching 48% and 96% of nifH sequences respectively (Table 1). Poly et al. also designed a universal primer set, PolF/PolR, and showed that it amplified 19 of 19 test strains and worked well in soils [20]. However, the test strains they used consisted of Alpha-, Beta-, and Gammaproteobacteria, Firmicutes and Actinobacteria and did not include cluster IA, Cyanobacteria, or cluster III sequences. We found that the PolF/PolR primer set only encompassed 25% of nifH diversity in our database (Table 4).
By mapping the 51 universal primers to their complementary binding positions along the A. vinelandii nifH gene (Figure 1), it is evident that the majority of the primers correspond to conserved regions of the nifH gene that encode essential functions like the P-loop, Switch I, and Switch II (Figure 1; [22]). Sequence coverage is high in regions of universal primer binding (Figure 1), and the shape of the coverage profile suggests that primer sequences have not been trimmed from a large number of sequences. If this is indeed the case, then there could be some bias in our results since the sequence fidelity between primer and target can vary as a function of the specificity of PCR conditions. If primer sequences have replaced existing nifH polymorphism in database sequences, then the net result would be a bias towards overestimating primer coverage. This is a common problem in public sequence databases and illustrates the need for depositors to remove primer sequences prior to sequence deposition.
Some of the primer sequences we evaluated have unusually low coverage, perhaps indicating that the published sequences contain errors, a phenomenon that is not uncommon and has been noted in another review of primer sequences [23]. In particular, there appear to be errors in the sequences published for the primers YAA-poly, nifHRb, and röschF-1b [20,24,25]. In the case of primer YAA-poly it appears that the first part of the primer name "YAA" was appended to the 5′ end of the primer sequence in [20] because the original YAA primer sequence does not have these nucleotides [26]. The coverage values for the original YAA primer (the one without the 5′ YAA nucleotides) are actually those of the primer nifH3 (Table 2). For primers nifHRb and röschF-1b there appear to be single base pair errors in the primer sequences. If a single base pair mismatch is allowed for these primers, coverage increases substantially (Table 1, Table 2). The primer röschF-1b [25] differs from the primer nifHF-Rösch [24] in that a G rather than a C is present at the 13th nucleotide from the 5′ terminus. In addition, the primer AMR-R, though reported as a nifH primer [27], does not match nifH and thus appears to be erroneous.
We evaluate primer coverage in silico, but it is important to point out that universal nifH PCR primers have been used under a wide range of reaction conditions, and variation in annealing temperatures and cycle parameters will have dramatic impacts on actual primer performance. Lowering of PCR annealing temperature, for example, lowers reaction specificity and may permit amplification of templates with mismatches in the primer binding region. Notably, for many primer sets either a nested, touchdown, or stepdown PCR approach was needed to achieve amplification of nifH genes from environmental samples (e.g. [28,29]). In Tables 1-3 we indicate primer coverage with up to two mismatches to provide an indication of the potential effects that reducing reaction stringency may have on primer performance. In addition, there are several other factors which could impact the specificity and coverage realized using PCR primers at the bench relative to predictions made using sequence databases. These factors include primer dimerization [30], hairpin formation [31], GC content [32], the location of mismatches [33], and the thermodynamics of primer binding to template [34]. For example, mismatches at the 3′ end of a primer may have a greater impact on specificity than those at the 5′ end [33], and some methods of primer design exploit this tendency in order to increase primer coverage [35]. Thus, the real test of primer performance comes at the bench.
We performed empirical assessment of coverage for primers which we found targeted 90% or more of sequences in the nifH database. The primer combinations F2/R6, IGK3/DVV, and Ueda 19F/388R performed well with DNA from a diversity of phylogenetic groups and from soil, with IGK3/DVV performing best of all. In contrast, the primer sets Ueda19f/univ463r and nifH1/nifH2 (i.e., the Zehr-McReynolds primers) had mediocre performance with soils, producing smeared bands indicative of non-specific amplification, and producing a PCR product from negative controls (Table 6, Figures S13 and S14). All other primer combinations tested had drawbacks such as poor or no soil amplification and amplification of negative controls (Table 6, Figures S9, S10, S11, S12 and S15).
There are several limitations to our approach which must be considered. First, only a few full-length nifH sequences are currently available, and this lowers the sequence diversity represented along the termini of the nifH gene (Figure 1). Hence, evaluation of primers that bind near the beginning or end of the alignment must be interpreted with care, especially for phylogenetic groups that are underrepresented in sequence databases. Likewise, nifH diversity remains poorly characterized in some environments, and thus estimates of primer performance in specific environments must also be interpreted with care when the number of sequences from those environments is small. We refer the reader to the supplementary material (Dataset S1), which provides the number of sequences currently available for each phylogenetic group and for each environment queried. As the number of sequenced genomes increases, full-length nifH sequences from more diverse nitrogen fixers will become available, aiding future efforts at primer design and analysis. Second, we have made no effort to assess coverage for nested and semi-nested reactions, which are common approaches. Nested amplification strategies, when coupled with low stringency reaction conditions, can allow investigators to amplify a wider diversity of templates than would be predicted through in silico analysis. Logically, however, in silico results from nested designs would always produce a reduction in coverage relative to a single primer set design.
Table 2 footnotes (doi:10.1371/journal.pone.0042149.t002): Position is relative to A. vinelandii nifH (Genbank ACCN# M20568). (f) Degeneracy is given as the number of oligonucleotides that comprise the primer. (g) References in which the primers are described. (h) We altered these primer names in order to distinguish them from primers with similar name and sequence composition that originate from other sources. NA: Data not available as described in Methods.
Table 6. Empirical results of PCR using different nifH primer sets with DNA from isolates and soils.
Some of the universal nifH primers amplify paralogous genes not involved in nitrogen fixation, for example cluster IV genes (Table 1 and Table 2). The nifH gene shares conserved regions with genes of cluster IV and cluster V, which is involved in bacteriochlorophyll synthesis [13,22]. We find that a substantial number of nifH universal primers will amplify cluster IV sequences (Table 1 and Table 2). It would therefore be wise for researchers interested in assessing the diversity and phylogeny of nitrogen-fixation genes from the environment to screen their sequences for the presence of cluster IV and cluster V genes prior to OTU clustering.
Our work outlines a comprehensive approach to primer evaluation. Molecular-based studies are dependent on the effectiveness of the primer sets used to generate the sequence data which serves as our window to the microbial world. These results show that many supposedly universal primer sets miss significant portions of known nifH diversity. Several of the primers that performed well in silico were tested empirically against genomic DNA from a phylogenetically diverse set of strains. The primers that performed well both in silico and empirically should have the greatest utility in further studies of the nifH gene diversity in environmental samples.

Materials and Methods
Primer coverage analyses were performed using an updated version of our previously described nifH database [9]. The current version of the database contains 23,847 sequences, representing all nifH sequences available in Genbank as of July 14, 2010. The database was constructed using the ARB software package [36] as described in [9]. Alignment positions are numbered relative to the Azotobacter vinelandii gene sequence (Genbank ACCN# M20568). The environmental origins of sequences (Tables 1-5) were determined by keyword searches of the sequence records in the nifH database using ARB as described in [9]. The phylogenetic trees and sequence configurations for the environmental groups may be examined as part of the ARB nifH database used for this work, which is available at http://www.css.cornell.edu/faculty/buckley/nifH_database_2010_07_14.arb. The phylogenetic groups evaluated (Tables 1-5) are labeled on the phylogenetic tree of Figure S2, which corresponds to the tree in the ARB database.
We visualized the nucleotide representation of nifH sequence fragments within our nifH database relative to the A. vinelandii nifH sequence (Figure 1) by first exporting in FASTA format all nifH sequences from the ARB database using the A. vinelandii nifH sequence as a filter so that only positions in the alignment where A. vinelandii nifH had a nucleotide were exported. The FASTA file was then opened in BioEdit [37], where we calculated a positional nucleotide numerical summary, and the total number of sequences containing sequence information was then plotted for each position in the alignment (Figure 1).
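The per-position count described above can be sketched in a few lines of Python. This is not the BioEdit procedure we used; it is a minimal illustration that assumes the alignment has already been read into equal-length strings and that gaps are written as "-" or "." (both assumptions of the sketch). The function name and the toy alignment are hypothetical.

```python
def column_depth(aligned_seqs, gap_chars="-."):
    """Count, for each alignment column, the sequences with a nucleotide there."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    depth = [0] * length
    for seq in aligned_seqs:
        if len(seq) != length:
            raise ValueError("all aligned sequences must have the same length")
        for i, ch in enumerate(seq):
            if ch not in gap_chars:
                depth[i] += 1
    return depth


# Toy alignment: two partial fragments and one full-length sequence.
toy = ["--ACGT--", "AAACGTTT", "---CGT--"]
print(column_depth(toy))  # [1, 1, 2, 3, 3, 3, 1, 1]
```

The toy output shows the same pattern as Figure 1 in miniature: depth is highest in the middle of the alignment and falls off toward the termini.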
Primer coverage calculations were performed using the EMBOSS programs fuzznuc, dreg, and primersearch [38] to analyze sequence alignment data exported in FASTA format from our nifH database. The program fuzznuc calculates the number of sequences in a given alignment hit by a given primer. Mismatches, or fuzzy searches, are allowed by the program and were performed with the nifH evaluations (Tables 1-3). The program primersearch was used for the evaluation of primer pairs (Tables 4 and 5). The program dreg was used to determine the number of records in an alignment that contained sequence data in the alignment region targeted by each primer or primer pair (Tables 1-5). However, because dreg eliminates the gap characters from the FASTA alignment file from ARB, the flanking gap characters were converted to the IUPAC character S, which is preserved by dreg, and the intervening gap characters were subsequently converted to the IUPAC character N. This allowed the original column positions from the ARB alignment to be maintained and reported as output from dreg. To calculate primer and primer pair coverage, the number of hits obtained from fuzznuc or primersearch was divided by the total number of sequences with nucleotide representation in the target region(s) as indicated by dreg.
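The gap-character conversion described above can be sketched as follows. This Python function is an illustration, not one of the scripts used in this work; it assumes gaps are written as "-" or "." in the exported alignment, and the function name is hypothetical. Flanking gaps become S and gaps between the first and last nucleotide become N, so the string length, and therefore the column numbering, is preserved.

```python
def mask_gaps(aligned_seq, gap_chars="-."):
    """Replace flanking gaps with 'S' and internal gaps with 'N'."""
    n = len(aligned_seq)
    # Index of the first and last non-gap character (n and -1 if all gaps).
    first = next((i for i, ch in enumerate(aligned_seq) if ch not in gap_chars), n)
    last = next((i for i in range(n - 1, -1, -1) if aligned_seq[i] not in gap_chars), -1)
    out = []
    for i, ch in enumerate(aligned_seq):
        if ch in gap_chars:
            out.append("S" if i < first or i > last else "N")
        else:
            out.append(ch)
    return "".join(out)


print(mask_gaps("--AC-GT."))  # SSACNGTS
```

One caveat of this encoding is that S and N are also valid IUPAC ambiguity codes, so a sequence that already contains them cannot be distinguished from a masked gap after conversion.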
Unix bash shell scripts were employed to increase the throughput of the in silico primer evaluations by automating the input of multiple primer sequences and other evaluation parameters into the EMBOSS programs. The scripts were also used to parse the output files and organize the data into tables. These scripts, which would be useful for similar evaluations using databases for other functional genes, are available as supplementary material online (Text S1, S2, S3).
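The coverage calculation that the scripts automate can be illustrated with a short, self-contained Python sketch. It does not call or reproduce the EMBOSS programs; it is a simplified stand-in that matches a degenerate primer against ungapped sequences on the given strand only, counts a hit when the number of mismatches is within a chosen limit, and divides by the number of sequences supplied (assumed here to be those with data in the target region). All function names and the toy sequences are hypothetical.

```python
# Bases matched by each standard IUPAC code in the primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer, site):
    """Count positions where the site base is not allowed by the primer code."""
    return sum(1 for p, s in zip(primer, site) if s not in IUPAC[p])


def primer_hits(primer, seq, max_mismatches=0):
    """True if any window of seq matches the primer within the mismatch limit."""
    k = len(primer)
    return any(
        mismatches(primer, seq[i:i + k]) <= max_mismatches
        for i in range(len(seq) - k + 1)
    )


def coverage(primer, seqs_with_data, max_mismatches=0):
    """Fraction of the supplied sequences hit by the primer (None if empty)."""
    if not seqs_with_data:
        return None
    hits = sum(primer_hits(primer, s, max_mismatches) for s in seqs_with_data)
    return hits / len(seqs_with_data)


# Toy sequences, for illustration only.
toy = ["GGACATGG", "GGACGTGG", "GGACCTGG", "GGTTTTGG"]
print(coverage("ACRT", toy))                    # 0.5
print(coverage("ACRT", toy, max_mismatches=1))  # 0.75
```

A reverse primer would need to be reverse-complemented before being searched in this way, and ambiguous bases in the target sequence are counted as mismatches in this sketch.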
Primer annealing temperatures were calculated with SciTools Oligoanalyzer version 3.1, which calculates oligonucleotide melting temperatures based on nearest neighbor thermodynamics [39]. Oligoanalyzer can account for inosine but not for P or K bases, and thus melting temperatures were not calculated for PicenoF44 and PicenoR436 (Table 1). The parameters used for the calculations were 0.25 µM oligonucleotides, 50 mM Na+, 1.5 mM Mg++, and 0 mM dNTPs.
Genomic DNA was extracted from cultures of the bacterial strains listed in Table 6 according to a standard enzymatic, phenol-chloroform extraction protocol [40]. DNA concentration was determined with a Nanodrop model 1000 (Thermo Fisher Scientific, Wilmington, DE), and DNA was diluted to 1 ng µl−1 prior to PCR. Soil DNA was obtained from a long-term agricultural site at the William H. Miner Institute, Chazy, NY, described previously [41]. The agricultural soil sample comes from a tilled site used to grow corn for more than 30 years, while the lawn soil sample is from a non-cultivated control site that is adjacent to the agricultural site and contains a mixed community of perennial grasses (Table 6). Soil samples were obtained by coring at 0-5 cm depth. Soil samples were sieved to 2 mm, frozen in the field using liquid nitrogen, and stored at −80 °C. DNA was extracted from soils using the PowerSoil DNA Isolation Kit (MoBio, Carlsbad, CA).
Primers were synthesized and desalted by Integrated DNA Technologies. All PCR reaction volumes were 50 µl with the following final reagent concentrations: 1X PCR Gold Buffer (ABI, Foster City, CA), 2.5 mM MgCl2 solution (ABI, Foster City, CA), 0.05% BSA (NEB, Ipswich, MA), 0.2 mM dNTPs, 1 µM each primer, 2.5 U Amplitaq Gold DNA polymerase (ABI, Foster City, CA). As template, 1 ng of genomic DNA or 1 µl of soil DNA extract was added. To visualize the PCR products, 10 µl of each reaction was loaded onto a 50 ml, 1% agarose gel with 1 µl of SYBR Safe dye (Molecular Probes, Eugene, OR). 5 µl of Hyperladder I (Bioline, Taunton, MA) was loaded onto each gel as a molecular weight marker. Gels were run for 45 minutes at 100 volts and 500 milliamps and were then visualized and photographed. Photos of the electrophoresis gels are available as supplementary material online (Figures S3, S4, S5, S6, S7, S8, S9, S10, S11, S12, S13, S14, S15).
Figures S3-S15. Each figure is a gel image of PCR products generated using the primers indicated with a range of different DNA templates. Results are summarized and full strain names are reported in Table 6. The gel images have been inverted from black to white. (TIF)
Figure S3 F2/R6 primer pair at 51 °C annealing temperature.
Figure S4 F2/R6 primer pair at 51 °C annealing temperature.
Figure S5 IGK3/DVV primer pair at 58 °C annealing temperature.
Figure S6 IGK3/DVV primer pair at 58 °C annealing temperature.
Figure S7 Ueda19F/388R primer pair at 51 °C annealing temperature.
Figure S8 Ueda19F/388R primer pair at 51 °C annealing temperature.
Figure S9 nifH2/R6 primer pair at 44 °C annealing temperature.
Figure S10 nifH2/R6 primer pair at 44 °C annealing temperature.
Figure S11 nH21f/nifH1 primer pair at 46 °C annealing temperature.
Figure S12 nifH1/nifH2 primer pair at 46 °C annealing temperature.
Figure S13 Ueda19f/univ463r primer pair at 46 °C annealing temperature.
Figure S14 Ueda19f/univ463r primer pair at 46 °C annealing temperature.
Figure S15 nifH3/nH21f primer pair at 41 °C annealing temperature.
Dataset S1