Inference of Antibiotic Resistance and Virulence among Diverse Group A Streptococcus Strains Using emm Sequencing and Multilocus Genotyping Methods

Background Group A Streptococcus pyogenes (GAS) exhibits a high degree of clinically relevant phenotypic diversity. Strains vary widely in terms of antibiotic resistance (AbR), clinical severity, and transmission rate. Currently, strain identification is achieved by emm typing (direct sequencing of the genomic segment coding for the antigenic portion of the M protein) or by multilocus genotyping methods. Phenotype analysis, including critical AbR typing, is generally achieved by much slower and more laborious direct culture-based methods. Methodology/Principal Findings We compare genotype identification (by emm typing and PCR/ESI-MS) with directly measured phenotypes (AbR and outbreak associations) for 802 clinical isolates of GAS collected from symptomatic patients over a period of 6 years at 10 military facilities in the United States. All independent strain characterization methods are highly correlated. This shows that recombination, horizontal transfer, and other forms of reassortment are rare in GAS insofar as housekeeping genes, primary virulence and antibiotic resistance determinants, and the emm gene are concerned. Therefore, genotyping methods offer an efficient way to predict emm type and the associated AbR and virulence phenotypes. Conclusions/Significance The data presented here, combined with much historical data, suggest that emm typing assays and faster molecular methods that infer emm type from genomic signatures could be used to efficiently infer critical phenotypic characteristics based on robust genotype:phenotype correlations. This, in turn, would enable faster and better-targeted responses during identified outbreaks of constitutively resistant or particularly virulent emm types.


Introduction
Group A Streptococcus pyogenes (GAS) is a common agent of pharyngitis, febrile respiratory infection, and pneumonia. Less frequently, it can cause severe symptoms such as toxic shock syndrome, necrotizing fasciitis, and sterile rheumatic sequelae [1]. Penicillin is effective in treatment and prevention of GAS [2][3][4], but other antibiotics are often required for individuals who are allergic to penicillin [5]. Different strains of GAS differ widely in terms of both AbR profiles and virulence [1].
GAS strains are commonly discriminated by identity of the M protein, a primary antigen. Traditionally this has been done by GAS culture and serological analysis of purified M protein, and can distinguish 81 recognized M types [6]. Currently, this is usually done by comparing the sequence of a portion of the emm gene with a database of sequences from previously identified strains ([6]; http://www.cdc.gov/ncidod/biotech/strep/protocol_emm-type.htm). The two methods are highly concordant [6]. Sequence analysis revealed new types, yielding over 200 currently recognized emm types [7]. Multilocus sequence typing (MLST) has also been explored, and found to be tightly correlated with emm typing [8].
The correlation between emm gene identity and the sequence of several unrelated MLST loci suggests a high degree of genomic stability within this bacterial species. This is also supported by previously reported associations between AbR profile and emm type (eg, [9]) and specific correlations between virulence and emm type (eg, [10]). This supports exploration of rapid genotyping methods as tools for point-of-care identification of strains and inference of antibiotic susceptibility, outbreak potential, and virulence phenotypes.
One rapid, high-throughput GAS genotyping method has been described. The method involves PCR of 6 housekeeping loci, electrospray ionization mass spectrometry analysis (PCR/ESI-MS) of the resulting amplicons to yield nucleotide content data (deduced from mass), and finally, comparison with data from known emm types. This method has been used to identify GAS in uncultured specimens from an outbreak in the United States [7]. That study also presented validation/control data collected on 132 previously isolated and cultured GAS strains of 17 different emm sequence types from 5 recruit training centers collected in 2002 and 2003 [7]. PCR/ESI-MS yielded single emm type identifications in approximately two thirds of the cases, and lists of two or more equally likely emm types in the remaining cases (these groups of equally likely emm types are hereafter referred to as emm groups). Accuracy was extremely high, with all calls either matching (in the case of single emm type calls) or including (in the case of emm group calls) the sequence-derived emm type.
PCR/ESI-MS can be completed in less than 4 hours, and several hundred specimens can be analyzed in a day using a single instrument. Direct emm sequencing is significantly more difficult and time-consuming, since it is usually done on cultured isolates to decrease background and yield clean sequencing results, procedures that take 1 or more days to complete. Direct sequencing from original specimens is possible, but has not been thoroughly explored.
We present AbR and PCR/ESI-MS emm group data from 802 clinical GAS isolates collected from patients with pharyngitis or other apparent GAS-associated symptoms over a period of 6 years at 10 US military facilities. The majority of these samples are from military recruits in training, a group that is highly susceptible to GAS outbreaks and regularly prophylaxed with long-acting penicillin G (bicillin).
Accuracy of PCR/ESI-MS emm grouping identification is addressed by comparison with emm sequence data for 387 isolates. We analyze several temporal/spatial clusters of specific emm groups in their entirety to judge the potential for inference of specific emm type from PCR/ESI-MS emm groups using emm sequence data from a representative subset of samples. The associations between specific AbR phenotypes and emm type (or emm group) are statistically analyzed to quantify the degree of nonrandom association between emm type and specific AbR phenotypes. Observed correlations are compared to previously reported associations. The relative predictive values of emm sequencing and PCR/ESI-MS are discussed.
Outbreaks, rheumatic sequelae, and invasive GAS disease generally involve only a few of the more than 200 known emm types of GAS, and specific strains have been recognized as being more virulent, with greater epidemic potential than other strains [10][11][12][13][14][15]. M1 was associated with severe GAS infections and streptococcal toxic shock syndrome during the mid-to-late 1980s. The relative dominance of M1 among strains isolated from patients with uncomplicated pharyngitis was correlated with changes in the rate of severe infections over time; that is, severe GAS complications are more common when M1 is common [10]. The opposite was found to be true for M4 and M12, which are associated with mild and uncomplicated pharyngitis [10]. More recently, M3 was implicated in outbreaks of severe disease in both the United States [7,13,16] and the United Kingdom [17]. Serotypes M1, 3, 5, 6, 18 and 24 have all been identified as rheumatogenic, or capable of causing acute rheumatic fever (ARF), and it has been suggested that the cyclic nature of severe infection and ARF rates is a reflection of the cyclic circulation of specific rheumatogenic serotypes [1].
While rheumatogenic strains are often distinguished by serotype, it has been noted that the presence of the speA scarlet fever toxin gene seems to be closely linked to rheumatogenic characteristics. However, while many isolates of the most notoriously virulent serotypes (M1 and M3) carry speA, others do not [18]. Similarly, mucoid colony morphology has been linked to rheumatogenicity, and can be variable within serotypes [10,13].
During the collection period represented in this study, there were 7 distinct outbreaks of GAS disease, as measured by sudden increases in apparent rate. There were also 3 GAS-associated fatalities. We specifically typed representatives of all outbreaks, as well as all fatal case isolates, to determine if they were generated by strains previously identified as having a high potential for virulence. We also analyzed a representative subset of the identified emm types from the isolate collection for speA by PCR [19].

PCR/ESI-MS correlates well with emm sequencing
Lists comparing the results of PCR/ESI-MS and emm sequencing are shown in Tables 1 and 2. Of 387 sequenced isolates, PCR/ESI-MS emm groups included the correct emm type in 384 cases (99.2% accuracy). All three discrepancies involved emm type 81 in some way, but each was a unique combination of emm type and emm group. This suggests there may be an error for type 81 in either the PCR/ESI-MS database or the GenBank sequences used to determine emm type by the described sequencing method. The precision of PCR/ESI-MS was significantly less than that of direct sequence analysis, since the average emm group included several equally likely emm types. Imprecision (multiplicity of equally possible emm types in each emm group) ranged from 1 to 9, with a weighted average of 2.26 emm types per emm group across the set of 387 sequenced isolates. However, not all of the possible emm types were identified among the isolates: only 29 of the 200+ recognized emm types were identified in the sample set. Many of these maintained a stable presence at specific sites, demonstrating that specific emm types could be inferred across temporally and geographically clustered sets using paired results from a subset that had been tested by both methods. At least one isolate of each emm group per year from each site was sequenced. An additional 188 isolates from the same geotemporal clusters were also sequenced, of which 187 confirmed the initial correspondence, yielding a within-cluster consistency of 99.5%. In the single instance of inconsistency, all isolates from the mixed emm group were sequenced, and all remaining isolates shared the initially identified emm type. In some cases, specific emm sequence types were identified across multiple emm groups, all of which contained the correct emm type (for example, note the three emm groups that were identified for emm 75 in Table 2). In these cases, no phenotypic, geographic, or temporal correlates were identified, suggesting that the differences resulted from variation in the precision of the PCR/ESI-MS method.
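The two summary measures used above (accuracy as group inclusion of the sequenced type, and imprecision as the isolate-weighted mean number of emm types per group) can be sketched as follows. This is a minimal illustration only; the `Call` record, the function names, and the toy data are assumptions made for the example and are not taken from the study's data set or software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One isolate tested by both methods (hypothetical record layout)."""
    sequenced_type: str   # emm type from direct sequencing
    group: frozenset      # emm types listed in the PCR/ESI-MS emm group


def accuracy(calls):
    """Fraction of isolates whose emm group includes the sequenced emm type."""
    hits = sum(1 for c in calls if c.sequenced_type in c.group)
    return hits / len(calls)


def mean_multiplicity(calls):
    """Mean number of emm types per emm group, weighted by isolate count."""
    return sum(len(c.group) for c in calls) / len(calls)


# Toy data, invented for illustration (not the study's isolates).
calls = [
    Call("3", frozenset({"3"})),
    Call("75", frozenset({"25", "75"})),
    Call("75", frozenset({"25", "75"})),
    Call("81", frozenset({"5", "44"})),   # a made-up discrepant call
]

print(accuracy(calls))           # 0.75
print(mean_multiplicity(calls))  # 1.75
```

Applied to the counts reported above, the same accuracy definition gives 384/387, or about 99.2%.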
Rapid genotyping methods can be used to confidently predict both virulence and antibiotic resistance
Tables 3 and 4 show the distribution of AbR phenotypes among the identified emm types (Table 3) and PCR/ESI-MS emm groups (Table 4). These tables clearly show strong associations between specific resistance phenotypes and specific genotypes. The overall rates of AbR seen in the collection as a whole are shown in Tables 5 and 6, along with the chi-square probability of the observed distribution among emm types being random. Statistical significance of individual emm type associations with specific antibiotics is also indicated in Tables 3 and 4, for both positive and negative associations. Both unadjusted and alpha adjusted (Bonferroni adjusted) significance levels are indicated to represent relative levels of confidence. The increased ambiguity yielded by PCR/ESI-MS does not significantly decrease predictive values, though it should be noted that this would not necessarily hold up over longer periods of time, when serotypes sharing PCR/ESI-MS emm groups could shift in relative dominance. This is because multiple emm types represented by each emm group do not always share phenotypes. For example, M25 and M75 share a PCR/ESI-MS emm group. M25 is susceptible to erythromycin, and M75 is generally resistant. The predictive value of this emm group, 25/75, is high in this study, but only because M75 was far more common than the phenotypically opposite M25. PCR/ESI-MS is readily elaborated, and the designers have noted that it would take 12 primer pairs (as opposed to 6, as used here) to completely distinguish all known emm types [7]. Furthermore, this technology is amenable to a higher density multiplex format that could allow many of these assays to be combined, thus enabling elaboration without decreased throughput or additive increases in cost.
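The dependence of an emm group's predictive value on the relative dominance of its member types can be illustrated with a short sketch. The per-type resistance rates and isolate counts below are hypothetical values chosen only to mirror the M25/M75 example; they are not figures from Tables 3 and 4, and `group_resistance_rate` is an illustrative helper name.

```python
def group_resistance_rate(type_counts, resistance_rate):
    """Expected fraction of resistant isolates within one emm group.

    type_counts:     isolates per member emm type, e.g. {"M25": 5, "M75": 95}
    resistance_rate: assumed per-type probability of resistance
    """
    total = sum(type_counts.values())
    if total == 0:
        raise ValueError("emm group contains no isolates")
    resistant = sum(n * resistance_rate[t] for t, n in type_counts.items())
    return resistant / total


# Hypothetical per-type erythromycin resistance rates (assumed, not measured).
rates = {"M25": 0.0, "M75": 0.9}

# M75 dominant, as in the situation described above.
m75_dominant = group_resistance_rate({"M25": 5, "M75": 95}, rates)

# Same emm group after a hypothetical shift toward M25.
m25_dominant = group_resistance_rate({"M25": 95, "M75": 5}, rates)

print(round(m75_dominant, 3))  # 0.855
print(round(m25_dominant, 3))  # 0.045
```

Under these assumed numbers the same group call would predict resistance well in one period and poorly in another, which is why the paired sequencing subset needs to be refreshed over time.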
The data collected here correlate with past observations. For example, we observed erythromycin resistance among isolates of M4, M5, M6, M11, M12, M22, M44, M58, M75, and M94, with the majority of resistance coming from M75. Other surveys of GAS isolates in which both emm type and AbR type were analyzed have suggested that macrolide (erythromycin) resistance in GAS is strongly correlated with M4, M6, M12, and M75. M75 was associated with over 50% of the macrolide resistance among diverse US isolates collected between 1994 and 1997 [20], while M6 was found to be associated with macrolide resistance among isolates collected from Pittsburgh schoolchildren [21]. Among isolates collected from patients in Belgium, M6 was also the most common source of macrolide resistance, but M2, M12, and M4 were also common [22]. M4, M12, and M75 were all commonly associated with macrolide resistance in Greek isolates [23]. M4 and M75 were associated with 79% of all erythromycin resistance seen among isolates collected from several cities over a decade in Spain, followed closely by M12 [9]. Collectively, this demonstrates a global and temporally robust association between erythromycin resistance and a few specific strains.
Of the 7 noted outbreaks during the surveillance period, 1 (the only one with high rates of pneumonia) was associated with M3 [7,16], 1 was associated with M118, and the remaining 5 were associated with M5. The listed strains represented the majority of isolations in the respective outbreaks, though other strains (including M18, M77, M101, and M4) were also isolated during outbreak-related collection activities. Two of the M5 outbreaks were associated with some invasive disease, and two of the three noted fatalities during the surveillance period were associated with M5. The remaining fatality was associated with M77. This agrees with previous identifications of M3 and M5 as particularly virulent emm types, and demonstrates a strong correlation between emm type and virulence. All M1 and M3 isolates tested in this study were speA positive, and both types were very common. M5 was speA negative.
The speA gene has been implicated in invasive phenotypes. It is interesting that the strain of M5 that became common during the second half of the study period was negative by PCR for speA, yet was strongly associated with both invasive and epidemic characteristics. This behavior is similar to that of M5 in outbreaks in 1989 [13], and similar to what is expected of the common speA-positive strains like M3 and M18 [11,13,16]. It is possible that this strain derives its invasive potential from genes other than speA, or it is possible that the gene has mutated such that it yields a false negative result for speA by PCR but still retains functional expression of the SpeA protein. Correlational analysis of the data suggests the former since most speA-negative emm types, including M5, are speC positive (see Table 3), while most speA-positive types are speC negative.
Rapid genotyping methods yield GAS strain identifications that are tightly associated with critical clinical characteristics, including antibiotic resistance, virulence and outbreak potential (transmissibility). Such methods, if applied directly to patient specimens at the point of care, have the potential to offer clinically relevant information useful for both determination of treatment and identification of appropriate public health responses. We hope that others will continue to contribute to the associative data sets that will allow inferential use of such technologies, both for this organism and for other important human pathogens.
A methodological shift from culture-based analyses toward less laborious rapid antigen tests is happening quickly. However, these tests do not offer any AbR data for treatment decisions or for tracking resistance, nor do they offer genotype data useful for tracking transmission and virulence patterns. The replacement of traditional culture-based methods with on-site rapid diagnostic methods should be made contingent upon development of equally informative rapid technologies, or combined with archiving programs that retain enough of the original specimen to allow further characterization when necessary. Rapid multilocus genotyping methods preserve the value and power of culture-based methods, while offering much greater speed and throughput.

Ethics statement
This research has been conducted in compliance with all applicable federal and international regulations governing the protection of human subjects in research (DoD IRB protocol NHRC.2001.0008). This study was based entirely on the use of deidentified isolates provided with minimal demographic data. These isolates were already-existing, cultured from throat swabs during the process of routine clinical diagnostic procedures at recruit training sites. This protocol was reviewed and approved by the NHRC institutional review board and found to be exempt from consent requirements.

Sample collection
As part of the US military's ongoing GAS surveillance program, 802 samples were collected at 10 military facilities from January 2002 through December 2007. Collection was irregular, based on prevalence of and access to cases of GAS disease. Some sites ceased submission after switching to rapid antigen tests and abandoning culture methods.

Strain typing
PCR/ESI-MS emm groups were generated for all 802 samples. A subset of 387 of these was tested for emm type by sequencing. We initially sequenced one isolate of each PCR/ESI-MS emm group from each site during each year it appeared at that site. Correlation data were used to generate a translation matrix for inference of specific emm types from the PCR/ESI-MS data. One hundred and eighty-eight further isolates were sequenced to estimate consistency within clusters.
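One way to picture the translation matrix is as a lookup keyed by site, year, and PCR/ESI-MS emm group, populated from the sequenced representatives. The sketch below is an illustration under stated assumptions: the key structure follows the sampling scheme described above, but the majority-vote rule, the function names, and the site labels are hypothetical and do not describe the software actually used.

```python
from collections import Counter


def build_translation(sequenced):
    """Tally sequenced emm types per (site, year, emm group) cluster.

    sequenced: iterable of (site, year, emm_group, emm_type) tuples
    """
    table = {}
    for site, year, group, emm_type in sequenced:
        key = (site, year, group)
        table.setdefault(key, Counter())[emm_type] += 1
    return table


def infer_type(table, site, year, group):
    """Return the most frequently sequenced emm type for a cluster.

    Returns None when no representative of that cluster was sequenced.
    The majority rule is an assumption made for this sketch.
    """
    counts = table.get((site, year, group))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented example records (site names and counts are hypothetical).
sequenced = [
    ("site_A", 2004, "25/75", "75"),
    ("site_A", 2004, "25/75", "75"),
    ("site_A", 2004, "25/75", "25"),
    ("site_B", 2005, "3", "3"),
]
table = build_translation(sequenced)

print(infer_type(table, "site_A", 2004, "25/75"))  # 75
print(infer_type(table, "site_B", 2004, "3"))      # None
```

Returning `None` for an unsampled cluster, rather than borrowing a type from another site or year, keeps the inference restricted to clusters with paired data.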

emm sequencing
The sequencing procedure was adapted, using optimized cycling conditions and PCR reagents, from a Centers for Disease Control and Prevention protocol (http://www.cdc.gov/ncidod/biotech/strep/protocol_emm-type.htm). Extraction was performed with the QIAamp 96 DNA blood kit (QIAGEN, Valencia, CA). Products were processed with an ABI Prism 3100 sequencer (Applied Biosystems, Foster City, CA). Sequences were analyzed with Lasergene (DNASTAR, Inc., Madison, WI) and submitted to the CDC BLAST-emm server (http://www.cdc.gov/ncidod/biotech/strep/strepblast.htm) to obtain emm type match information.

PCR/ESI-MS
Grown isolates were diluted 10-fold in the DNA Dilution Buffer provided in the GAS Genotyping Kit (Ibis Biosciences, Inc., Carlsbad, CA) and boiled at 100°C for 10 min. Automated PCR setup was done on the Janus Liquid Handling Station (PerkinElmer, Inc., Waltham, MA). Five microliters of enzyme mix, containing GAS Kit PCR Enzyme Dilution Buffer (Ibis Biosciences), 2.4 U/rxn FastStart Taq DNA Polymerase (Roche, Indianapolis, IN), and 5 µl of template were added to each well of a GAS Genotyping plate (Ibis Biosciences). PCR cycling conditions were as follows: 95°C for 10 min; followed by 8 cycles of 95°C for 30 sec, 48°C for 30 sec with an increase of 0.9°C per cycle, and 72°C for 30 sec; then 37 cycles of 95°C for 15 sec, 56°C for 20 sec, and 72°C for 20 sec. Final extension was performed at 72°C for 2 min followed by a 4°C hold. PCR was performed on a Bio-Rad DNA Engine (Bio-Rad Laboratories, Hercules, CA). Primer design and sequence information is available elsewhere [7]. Post-PCR desalting, mass spectrometer analysis and emm type analysis were performed on the Ibis T5000 platform (Ibis Biosciences), as previously described [7].
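The cycling program above can be written out step by step to make the ramped annealing temperatures explicit. This is a sketch under one stated assumption: the first of the 8 ramped cycles anneals at 48°C and the 0.9°C increment is applied after each cycle. The function name and variable names are illustrative, and the total counts programmed hold times only, excluding instrument ramp times and the final 4°C hold.

```python
def annealing_ramp(start_c, step_c, cycles):
    """Annealing temperature (deg C) for each cycle of the ramped stage.

    Assumes the first cycle uses start_c and each later cycle adds step_c.
    """
    return [round(start_c + step_c * i, 1) for i in range(cycles)]


ramp = annealing_ramp(48.0, 0.9, 8)
print(ramp)  # [48.0, 48.9, 49.8, 50.7, 51.6, 52.5, 53.4, 54.3]

# Programmed hold times in seconds, taken from the conditions in the text.
initial_denaturation = 10 * 60
ramped_stage = 8 * (30 + 30 + 30)
fixed_stage = 37 * (15 + 20 + 20)
final_extension = 2 * 60

hold_seconds = (initial_denaturation + ramped_stage
                + fixed_stage + final_extension)
print(hold_seconds)  # 3475
```

Under this reading the ramped stage ends at 54.3°C, just below the 56°C annealing temperature of the 37-cycle stage, and the programmed holds sum to roughly 58 minutes.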

Statistical analysis
Accuracy of PCR/ESI-MS inference was measured in terms of the proportion of PCR/ESI-MS emm group identities that included the correct emm sequence type. Both PCR/ESI-MS emm groups and directly sequenced emm types were independently analyzed for correlation with AbR measures over the course of study. The nonrandomness of the distribution of each type of AbR across all emm types (and groups) was measured by chi-square analysis. The significance of specific genotype:phenotype associations was measured by binomial probability analysis of the observed distributions of resistance for specific genotypes for each type of resistance, with appropriate alpha adjustment for multiple measures across all strain types.
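The analysis described above can be sketched with the standard library alone. The sketch makes several assumptions that the text does not specify: the chi-square statistic is computed on a 2 x k table of resistant versus susceptible isolates per genotype, the binomial test is a one-sided tail against the collection-wide resistance rate, and the alpha adjustment is a plain Bonferroni division. All counts are invented for illustration.

```python
from math import comb


def chi_square_statistic(resistant, totals):
    """Pearson chi-square statistic for a 2 x k resistant/susceptible table.

    Requires an overall resistance rate strictly between 0 and 1.
    """
    overall = sum(resistant) / sum(totals)
    stat = 0.0
    for r, n in zip(resistant, totals):
        expected_r = n * overall
        expected_s = n * (1 - overall)
        stat += (r - expected_r) ** 2 / expected_r
        stat += ((n - r) - expected_s) ** 2 / expected_s
    return stat


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): test for excess resistance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def binom_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): test for a deficit of resistance."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(0, k + 1))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni adjustment."""
    return alpha / n_tests


# Invented counts for three genotypes (not taken from Tables 3-6).
resistant = [18, 1, 1]
totals = [20, 40, 40]
overall_rate = sum(resistant) / sum(totals)

stat = chi_square_statistic(resistant, totals)
p_excess = binom_upper_tail(resistant[0], totals[0], overall_rate)
threshold = bonferroni_alpha(0.05, len(totals))

print(stat, p_excess, p_excess < threshold)
```

The p-value for the chi-square statistic is deliberately omitted because it requires a chi-square distribution function that the standard library does not provide; a statistics package would normally supply it.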

spe testing
Representatives of all identified emm types were tested for speA and speC by nested PCR [19]. DNA amplification was performed as referenced [19], except for the following changes: the reaction volume was reduced from 50 µl to 25 µl, and the annealing temperature was reduced from 57°C for 1 min to 55°C for 45 sec. The reaction volume and cycling conditions were adjusted and optimized for use with the GoTaq Flexi DNA Polymerase and buffer system (Promega BioSciences, San Luis Obispo, CA, USA). Thermal cycling was performed on a Bio-Rad DNA Engine (Bio-Rad).