In silico characterisation of the two-component system regulators of Streptococcus pyogenes

  • Sean J. Buckley,

    Roles Conceptualization, Data curation, Methodology, Writing – original draft

    Affiliation Inflammation and Healing Biomedical Research Cluster, and School of Health and Sports Sciences, Faculty of Science, University of the Sunshine Coast, Sippy Downs, Queensland, Australia

  • Peter Timms,

    Roles Supervision, Writing – review & editing

    Affiliation Inflammation and Healing Biomedical Research Cluster, and School of Health and Sports Sciences, Faculty of Science, University of the Sunshine Coast, Sippy Downs, Queensland, Australia

  • Mark R. Davies,

    Roles Conceptualization, Writing – review & editing

    Affiliation Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Victoria, Australia

  • David J. McMillan

    Roles Conceptualization, Funding acquisition, Methodology, Supervision, Writing – review & editing

    Affiliation Inflammation and Healing Biomedical Research Cluster, and School of Health and Sports Sciences, Faculty of Science, University of the Sunshine Coast, Sippy Downs, Queensland, Australia


Bacteria respond to environmental changes through the co-ordinated regulation of gene expression, often mediated by two-component regulatory systems (TCS). Group A Streptococcus (GAS), a bacterium which infects multiple human body sites and causes multiple diseases, possesses up to 14 TCS. In this study we examined genetic variation in the coding sequences and non-coding DNA upstream of these TCS as a method for evaluating relationships between different GAS emm-types, and potential associations with GAS disease. Twelve of the 14 TCS were present in 90% of the genomes examined. The length of the intergenic regions (IGRs) upstream of TCS coding regions varied from 39 to 345 nucleotides, with an average nucleotide diversity of 0.0064. IGR allelic variation was generally conserved within an emm-type. Subsequent phylogenetic analysis of concatenated sequences based on all TCS IGR sequences grouped genomes of the same emm-type together. However, grouping with emm-pattern and emm-cluster-types was much weaker, suggesting that the epidemiological and functional properties associated with the latter are not due to evolutionary relatedness of emm-types. All emm5, emm6 and most of the emm18 genomes, all historically considered rheumatogenic emm-types, clustered together, suggesting a shared evolutionary history. However, emm1, emm3 and several emm18 genomes did not cluster within this group. These latter emm18 isolates were epidemiologically distinct from the other emm18 genomes in the study, providing evidence for local variation. emm-types associated with invasive disease or nephritogenicity also did not cluster together. Considering the TCS coding sequences (cds), correlation with emm-type was weaker than for the IGRs, and no strong correlation with disease was observed. A deletion of the malate transporter gene, maeP, was identified that serves as a putative marker for the emm89.0 subtype, which has been implicated in invasive outbreaks. A recombination-related, subclade-forming DNA motif was identified in the putative receiver domain of the Spy1556 response regulator that correlated with throat-associated emm-pattern-type A-C strains.


Streptococcus pyogenes (Group A Streptococcus, GAS) is a human pathogen responsible for a suite of human diseases that vary in both symptom and severity [1]. Colonisation of the throat and skin by this organism can result in the common but self-limiting pharyngitis and impetigo. Potential sequelae of GAS infection include post streptococcal glomerulonephritis (PSGN), acute rheumatic fever (ARF), and rheumatic heart disease (RHD) [2]. Moreover, the dissemination of GAS to normally sterile body sites can result in streptococcal toxic shock syndrome (STSS) and necrotising fasciitis (NF) [3]. Combined, mortality due to the various GAS diseases exceeds half a million people each year [1].

Nucleotide sequence variation in the 5’ region of the emm gene is the basis for emm-typing, the most commonly employed molecular typing system used to classify GAS at the subspecies level [4]. More than 200 different emm-types have now been described [5]. A subset of these emm-types (for example emm1, emm3, and more recently emm89) are strongly associated with invasive disease outbreaks in Europe and North America, suggesting that genetic determinants that differ between emm-types, and even within emm-types, have a role in determining the relative pathogenicity of an emm-type [6, 7]. ARF and RHD are autoimmune sequelae that can follow untreated pharyngeal GAS infection [8], and are the leading cause of GAS-related mortality. Historically, a subset of GAS M-serotypes (for example, emm1, emm5 and emm6) have been considered ‘rheumatogenic’ GAS emm-types [9, 10], having a stronger association with ARF/RHD than other strains. However, as epidemiological studies of streptococcal disease in developing nations routinely fail to report the presence of traditional ‘rheumatogenic’ emm-types, this concept is being re-examined [11–14].

Several alternate GAS typing systems have been described [5, 15]. Based on amino acid variation, emm-cluster-typing groups emm-types into 48 emm-clusters, with proteins in each cluster displaying similar functional properties [5]. emm-pattern typing, based on the chromosomal arrangement of the emm-gene and flanking emm-like genes, groups GAS into five pattern types. In comparison to other emm-pattern types, emm-pattern A-C type isolates are associated with both throat colonisation and pharyngitis. In contrast, emm-pattern D isolates are associated with skin colonisation and impetigo [16]. In conjunction with other epidemiological studies, such findings provide additional evidence that genetic variation in GAS can be predictive of niche colonisation potential, and possibly disease propensity of individual GAS isolates [17–19]. Despite numerous epidemiological and pathogenesis studies, no definitive causal association between any one GAS gene and disease or colonisation site has been made [4, 20].

emm-typing is used as a surrogate for clonal types; however, evidence of recombination involving the emm-gene, and the full mga locus, has been reported [21]. As such, these loci may not be appropriate for inferring evolutionary relationships between GAS strains of different emm-type or pattern-type. A typing system that targets multiple loci in a genome is one strategy that can overcome these limitations. Here we have characterised variation in the coding and upstream intergenic regions (IGRs) of the 14 two-component systems (TCS) in GAS [22]. Along with stand-alone regulators and small non-coding RNAs, TCS co-ordinately control the expression of multiple virulence and house-keeping genes in GAS [20, 23]. A complex network of regulation of regulatory genes also exists. Consequently, differences in TCS expression and regulation between strains are likely to result in distinct downstream expression profiles that may impact on pathogenic outcomes.

Our rationale for targeting the TCS IGRs in addition to the TCS cds was two-fold. Firstly, while TCS IGRs are under selection pressure [24], that pressure is indirect. That is, the IGRs interact with their cognate DNA-binding proteins, but do not directly interact with their surrounding microenvironment. As such, changes in the IGRs only indirectly affect the interaction of the pathogen with the environment, through changes in the expression of other genes controlled by the TCS. In contrast, direct selection pressure encompasses selection based on mutations that cause non-synonymous changes in the protein being translated, which may in turn positively or negatively affect the manner in which a protein interacts with its surrounding environment. Secondly, IGRs upstream of operons are likely to contain promoters and regulatory elements. Because these regulatory elements have not been defined for the majority of GAS TCS operons [25–27], direct bioinformatic analysis of these elements alone was not possible.

Here we analysed genetic variability in TCS cds and IGR sequences. Our results show that, individually and collectively, these regions correlate with emm-type and other emm-gene associated classification schemes. A deletion of the malate transporter gene, maeP, was identified that serves as a putative marker for the emm89.0 subtype, which has been implicated in invasive outbreaks. A subclade-forming recombination event was observed in the receiver domain locus of the spy1556 response regulator gene that correlates with the throat-associated emm-pattern A-C strains.


Bacterial genomes and extraction of nucleotide sequence data

DNA analysed in this study was extracted from two sources. The first source was the 64 complete GAS genomes representing 27 emm-type sequences present in the NCBI reference genomic sequence database as of 01 January 2018 (Table 1). An additional 879 draft genomes representing 123 different emm-types, collected from five geographically disparate countries over the time period 1987 to 2013, were also used [28–33] (S1 Table). Where available, the clinical data (disease association, year of isolation, country of isolation) were also collected for all genomes.

Bioinformatic analyses

The 14 GAS TCS loci and corresponding upstream IGRs (Table 2 and Fig 1) were extracted from each genome. Sequences were aligned using Muscle as implemented in Geneious 8.1.9 [34, 35]. SNPs in both the IGR and coding regions were identified and independently quantified using Geneious. Individual alleles within a TCS and TCS IGR, defined on the basis of differing from all other alleles by a minimum of one SNP [15], were subsequently assigned a unique allele number.
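The allele numbering described above can be sketched as follows. This is a minimal illustration rather than the Geneious workflow used in the study: the function name `assign_alleles`, the genome labels, and the toy sequences are all hypothetical, and the sketch treats any difference between aligned sequences (including gaps) as defining a new allele.

```python
def assign_alleles(sequences):
    """Give each distinct aligned sequence its own allele number.

    `sequences` maps a genome label to its aligned sequence at one locus.
    Genomes with identical sequences share an allele number; numbers are
    issued in order of first appearance (an arbitrary choice for this sketch).
    """
    allele_of_sequence = {}
    assignment = {}
    for genome, seq in sequences.items():
        if seq not in allele_of_sequence:
            allele_of_sequence[seq] = len(allele_of_sequence) + 1
        assignment[genome] = allele_of_sequence[seq]
    return assignment


# Hypothetical toy data: genomeC differs from the others by a single SNP.
toy_locus = {"genomeA": "ACGT", "genomeB": "ACGT", "genomeC": "ACGA"}
toy_alleles = assign_alleles(toy_locus)
```

Applying the same function to each locus in turn, and joining the resulting allele numbers per genome, would give the kind of sequence type profile referred to later in the text.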

Fig 1. Schematic drawing of the Group A Streptococcus two-component system operons.

The SF370 locus tags are used where established names are not available, and the orientation depicted represents the relative orientation of the operon in the genomes.

Table 2. Distribution of Group A Streptococcus two-component systems in 64 GAS NCBI genomes.

Nucleotide diversity was calculated using DnaSP version 5.10.01 [52]. The coding regions and IGRs were typed, and overall allele diversity was calculated using Simpson's Index of Diversity [53] and the Wallace coefficient [54]. The aligned nucleotide sequences were subjected to the DnaSP algorithm [52] to determine the nucleotide diversity (π), and πAS, which is the ratio of non-synonymous (πA) to synonymous (πS) nucleotide polymorphisms. MEGA7 was used to calculate the ratio of non-synonymous (causing amino acid replacement, KA) to synonymous (silent, KS) nucleotide substitution, KA/KS. Both πAS and KA/KS are indirect measures of the selection pressure exerted on a cds. The nucleotide sequences were translated into amino acid sequences and analysed for polymorphic variants. Phylogenetic relationships were inferred based on nucleotide sequences of individual TCS IGRs and concatenated IGR sequences using the maximum likelihood algorithm, with 1000 bootstrap replicates [55]. Phylogenetic relationships were also inferred using concatenated sequences of the variable nucleotides of all 14 TCS IGRs using the GAS genomes archived in the NCBI reference genomic sequences database, and from available published draft genomes.
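To make two of these measures concrete, the sketch below computes nucleotide diversity (π) as the average number of pairwise differences per site, and Simpson's Index of Diversity from a list of allele assignments. It is an illustrative approximation only: DnaSP's treatment of gaps and missing data is not reproduced, and the function names and toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site across equal-length aligned sequences.

    Gap and ambiguity handling is deliberately ignored in this sketch.
    """
    length = len(aligned[0])
    pairs = list(combinations(aligned, 2))
    differences = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return differences / (len(pairs) * length)


def simpsons_diversity(types):
    """Probability that two items drawn without replacement have different types."""
    n = len(types)
    same_type_pairs = sum(c * (c - 1) for c in Counter(types).values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical toy data: three aligned sequences and their allele numbers.
toy_alignment = ["ACGT", "ACGA", "ACGT"]
toy_pi = nucleotide_diversity(toy_alignment)
toy_sid = simpsons_diversity([1, 2, 1])
```

Both functions need at least two sequences or typed isolates; with fewer, the pair count is zero and the division fails.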

Analysis of recombination

Recombination and mutation were initially examined manually using the method of Feil et al [15, 56] with modifications described by McMillan et al [57]. In these analyses, the presence of a single mutation within a set of related bacteria was scored as a mutation event. Nucleotide variation at two or more locations, which were also present in more distantly related bacteria, was used as evidence of recombination. In this study different emm-types were used to define unrelated bacteria. Subsequent analysis of both ‘recent’ and ‘ancestral’ recombination events was performed using fastGEAR [58]. In this algorithm recent recombination is defined as recombination that occurs within sequences represented in the current dataset, and ancestral recombination refers to recombinatorial acquisition of DNA not present in the main dataset, where donor-recipient relationship cannot be inferred [59].
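The manual scoring rule described above can be expressed as a small decision function. This is a simplified sketch under stated assumptions, not the published method of Feil et al. or the fastGEAR algorithm: the function name, the "unresolved" category, and the toy sequences are hypothetical, and "unrelated" sequences stand for alleles drawn from a different emm-type.

```python
def classify_variant(reference, variant, unrelated):
    """Score a variant allele against the reference allele of its emm-type.

    reference, variant: aligned sequences from related isolates (same emm-type).
    unrelated: aligned sequences of the same locus from other emm-types.
    """
    sites = [i for i, (r, v) in enumerate(zip(reference, variant)) if r != v]
    if not sites:
        return "identical"
    if len(sites) == 1:
        # A single changed site within related isolates counts as a mutation.
        return "mutation"
    # Two or more changed sites: look for at least one of the variant
    # nucleotides at the same position in an unrelated isolate.
    shared = any(other[i] == variant[i] for other in unrelated for i in sites)
    return "recombination" if shared else "unresolved"


# Hypothetical toy sequences.
single_change = classify_variant("ACGTAC", "ACGTAT", [])
imported_change = classify_variant("ACGTAC", "TCGTAT", ["TCGAAT"])
```

Variants with several changes that are not seen elsewhere are left as "unresolved" here, since the text does not state how such cases were scored.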


Distribution of TCS genes in complete NCBI genomes

Within the 64 complete genomes, only one, emm87 NGAS747, possessed full-length versions of all fourteen TCS (Table 2). Twenty-eight possessed 13 TCS, 18 contained 12 TCS, 15 possessed 11 TCS, and two contained fewer than 10 TCS. The six most conserved TCS (ciaRH, irr/ihk, liaFSR, sptRS, spy1061/2, and vicRK), displaying greater than 95% amino acid identity, were also the TCS present in all genomes. TCS genes completely absent in one or more genomes included salKR, srtRS, and silAB, the latter of which was absent from the majority of genomes. Six of the TCS loci (covRS, fasBCA, maeK, salRK, trxS, and spy1556) possessed alleles containing non-sense mutations or deletions resulting in truncations in the open reading frames encoding the corresponding proteins. However, these mutations were not conserved amongst all genomes within an emm-type. As an example, non-sense mutations in covS were observed in 3 of the 10 emm1 isolates, including the invasive MGAS5005. The covS mutation in this strain has been associated with its pathogenesis and invasive potential [60]. The same deletion was also present in covS in a single emm89 isolate (MGAS27061), which also shares greater than 99% identity with the MGAS5005 emm1 allelic sequence [60].

Three separate emm89.0 clades have been described. The clades were first described on the basis of invasive disease outbreaks occurring in Europe [61]. Subsequent genetic analyses showed differences in both gene content and promoter regions of key virulence factors, including the SLO/NADase and hasABC loci [61, 62], of emm89.0 clade 3 isolates when compared to other clades. Here we found that all three clades of emm89.0 isolates possessed an identical deletion in maeP. The deletion encompassed all of the malate transporter (maeP) gene, and the 5’ end of the malic enzyme (maeE) gene [63]. Subsequent analysis of additional draft genomic sequences (see below) also revealed this deletion to be present in all other emm89.0 subtype genomes (n = 21), but full-length genes were present in the emm89.14 (n = 10) and emm89.8 (n = 4) subtypes.

Diversity in TCS IGRs in whole genomes

The TCS IGRs ranged from 39 bp for the salRK locus to 394 bp for the fasBCA locus (Table 3). In the case of sptRS, only 2 nucleotides separated sptR from spy0873. As this length is not sufficient for meaningful analysis in the context of IGRs, this TCS IGR region was not used in subsequent analyses. spy0873 encodes a putative transcriptional regulator that responds to amino acid deficiency and induces the stringent transcriptomic response [64, 65]. Accordingly, the intergenic DNA upstream of spy0873 was used in these analyses. Within the TCS IGRs, overall nucleotide diversity ranged from 0.00066 for the silA IGR locus to 0.02233 for the spy0873 IGR locus (Table 3). Single nucleotide polymorphisms (SNPs) accounted for most of the allelic variation observed. However, multi-nucleotide deletions were present in six of the IGR alleles, including a 15 base pair deletion at the 5’ end of spy1556 alleles from emm2, emm3, emm75, and 4 of the 6 emm28 genomes. A deletion was also present in the IfasB5 allele seen in three of the four emm6 genomes, each of which was associated with ARF [66]. Together the 14 TCS loci could be used to identify 38 unique sequence type profiles within the 64 NCBI genomes.

Table 3. Variation in two-component system untranslated intergenic regions.

Association between IGR allelic profile, emm-type and emm-pattern type

In order to assess relationships between emm-type and TCS IGRs, phylogenetic trees of each TCS IGR were constructed (S1 Fig). In most cases TCS IGR allelic variation was conserved within an emm-type; only 14 examples of multiple TCS IGR allelic variation within an emm-type were present across the NCBI dataset (S1 Fig and Table 4). However the same TCS IGR alleles were often present in more than one emm-type demonstrating that individual TCS IGR alleles do not possess sufficient specificity to resolve individual emm-types in all instances (Table 4). Nevertheless, the concordance between emm-type and IGR alleles was high, with adjusted Wallace coefficients ranging from 0.81 to 1.0 (Table 5).
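The directional nature of this concordance can be illustrated with a short sketch of the Wallace coefficient and a chance-corrected (adjusted) version. The adjustment shown subtracts the agreement expected by chance, taken as one minus Simpson's Index of Diversity of the second typing scheme; that this is the exact form used for Table 5 is an assumption, and the function names and toy labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def wallace(a_labels, b_labels):
    """Probability that two isolates sharing a type under scheme A also share one under B."""
    same_a = same_both = 0
    for i, j in combinations(range(len(a_labels)), 2):
        if a_labels[i] == a_labels[j]:
            same_a += 1
            if b_labels[i] == b_labels[j]:
                same_both += 1
    if same_a == 0:
        raise ValueError("no pair of isolates shares a type under scheme A")
    return same_both / same_a


def adjusted_wallace(a_labels, b_labels):
    """Wallace coefficient corrected for agreement expected by chance alone."""
    n = len(b_labels)
    # Chance that two random isolates share a B type (1 - Simpson's index of B).
    expected = sum(c * (c - 1) for c in Counter(b_labels).values()) / (n * (n - 1))
    return (wallace(a_labels, b_labels) - expected) / (1 - expected)


# Hypothetical toy data: one IGR allele (1) is shared by two emm-types.
toy_emm = ["emm1", "emm1", "emm5", "emm5", "emm6"]
toy_igr = [3, 3, 1, 1, 1]
emm_predicts_igr = adjusted_wallace(toy_emm, toy_igr)
igr_predicts_emm = wallace(toy_igr, toy_emm)
```

In the toy data, emm-type fully predicts the IGR allele, whereas the IGR allele predicts emm-type only half the time, mirroring the observation that the same allele can occur in more than one emm-type.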

Table 4. Variant allelic-types observed in Group A Streptococcus two-component system intergenic untranslated regions.

Table 5. Wallace coefficients1 of Group A Streptococcus emm-types, emm-clusters, emm-patterns, and intergenic alleles upstream of two-component system operons.

emm-pattern typing [67] and emm-clustering [5] group individual emm-types on the basis of genetic variation across the mga locus [68] or emm gene variation, respectively. emm-pattern type is also a surrogate for GAS niche colonisation preferences. emm-pattern A-C types are typically throat isolates, emm-pattern D types are skin isolates, and emm-pattern E isolates are generalists with no tissue tropism. When compared to the individual TCS IGR phylogenetic trees, emm-pattern type did not segregate strongly with clades for any TCS IGR loci, with adjusted Wallace coefficients ranging from 0 to 0.57 (Table 5). There was also only a weak association between TCS IGR alleles and emm-cluster type. As an example, the IfasB5 allele was present in some, but not all, D4 emm-cluster genomes, but was also present in emm-cluster E1 and E6 isolates. With regard to absent IGRs, the majority of genomes lacking the srtR IGR belonged to emm-types from emm-cluster clade Y.

Associations between TCS IGRs, rheumatogenicity and nephritogenicity

To test whether single TCS IGRs could be used as markers of classic rheumatogenic or nephritogenic emm-types, the locations of emm-types representing these disease groups were mapped to each phylogenetic tree (S1 Fig). Across all the TCS IGRs, the only association observed for rheumatogenic genomes occurred within covR IGR alleles; one of these alleles (IcovR1) was present in the three rheumatogenic emm-types (emm5, 6 and 18) (Fig 2). However, the same allele (IcovR1) was also present in other emm-types, and was in fact the most abundant covR IGR allele. In contrast, all emm1 isolates possessed IcovR3. There was also no direct association between any one allele and nephritogenic isolates. Interestingly, the ciaR IGR alleles (IciaR4 and IciaR5) from nephritogenic emm-types (emm12, 49, and 59) grouped on a branch separated from the rheumatogenic emm-types (S1 Fig).

Fig 2. Dendrogram of the intergenic region of the covR gene identified within 64 GAS genome sequences.

Bootstrap values (percentage from 1000 replicates) of greater than 40% are shown at the bifurcating nodes. emm-pattern (A-C, D, and E) and disease associations are also shown. ARF/RHD = acute rheumatic fever/ rheumatic heart disease, PSGN = post streptococcal glomerulonephritis, SF = scarlet fever, PH = pharyngitis, COM = asymptomatic community, O = other, and U = unknown.

To analyse associations between disease and evolutionary relationships more closely, the phylogeny of genomes was inferred using a concatenation of all 14 TCS IGRs, representing 2659 base pairs. Across these sequences 137 polymorphic nucleotide sites were present, and these were used to infer phylogeny (Fig 3). As expected, genomes of the same emm-type clustered together. Conversely, phylogeny did not correlate with emm-pattern type. Again, with the exception of emm1, rheumatogenic emm-type genomes (that is, emm5, 6 and 18) clustered together. No evidence for the clustering of nephritogenic (emm12, 49, and 59) or invasive emm-types was apparent.
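The preparation of such an input can be sketched as two steps: joining the per-locus alignments for each genome, then keeping only the alignment columns that vary. The sketch below is illustrative and assumes every genome has an aligned sequence of equal length at every locus; the function names and toy data are hypothetical, and tree inference itself is left to dedicated software.

```python
def concatenate_loci(loci):
    """Join per-locus aligned sequences into one sequence per genome.

    `loci` is a list of dicts, each mapping genome label to its aligned
    sequence at one locus; all dicts are assumed to share the same keys.
    """
    genomes = sorted(loci[0])
    return {g: "".join(locus[g] for locus in loci) for g in genomes}


def polymorphic_sites(alignment):
    """Return the variable columns of an alignment and their positions."""
    sequences = list(alignment.values())
    positions = [
        i for i in range(len(sequences[0]))
        if len({seq[i] for seq in sequences}) > 1
    ]
    reduced = {
        genome: "".join(seq[i] for i in positions)
        for genome, seq in alignment.items()
    }
    return reduced, positions


# Hypothetical toy data: two loci across three genomes.
toy_loci = [
    {"g1": "ACG", "g2": "ACG", "g3": "ATG"},
    {"g1": "TT", "g2": "TA", "g3": "TT"},
]
toy_concat = concatenate_loci(toy_loci)
toy_variable, toy_positions = polymorphic_sites(toy_concat)
```

Restricting the alignment to variable columns shortens the input without changing which sites distinguish the genomes.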

Fig 3. Dendrogram of concatenated variation in the upstream intergenic region of two-component systems within 64 reference GAS genomes.

The tree was constructed using concatenated sequences of the polymorphic nucleotides of all 14 TCS IGRs. Bootstrap values (percentage from 1000 replicates) of greater than 40% are shown at the bifurcating nodes. Disease and emm-pattern associations are also shown. ARF = acute rheumatic fever, PSGN = post streptococcal glomerulonephritis, Scarlet = scarlet fever, Community = asymptomatic carriage.

TCS IGR variation of draft genomic sequences

To assess whether the relationships observed above held across a larger number of emm-types, the IGR allelic variants, sequence type profiles and phylogenetic tree based on concatenated sequences were reconstructed using TCS IGR sequences drawn from 879 draft genomes representing 123 emm-types. With the inclusion of these sequences, 397 nucleotide sites were found to be variable, and 289 unique sequence type profiles were identified. Analysis of this expanded dataset revealed that IcovR1 was again the most prevalent covR IGR allele, being found in 226 of the 879 genomes (26%), compared with 50% of the reference genomes. Again, emm5, emm6, and the majority of emm18 genomes clustered together with emm23 genomes, but separately from other emm-types. An exception to this was observed in the emm18 isolates (S2 Fig), where phylogenetic clustering corresponded directly to the geographic location and time of sampling. Specifically, M18a, M18f, and M18g were sampled in the United States of America and Canada; M18c and M18j in Kenya (2010–2011); M18b in Kenya (2002); and M18h in Fiji. Additionally, across all 943 genomes, seven of the TCS loci (covRS, irr/ihk, liaFSR, sptRS, maeKR, trxTSR and vicRK) were found full and intact in greater than 95% of the isolates.

Evidence for recombination in IGR regions

Extensive recombination has been described in both GAS virulence genes and genes used for multi-locus sequence typing (MLST) in GAS [4, 69, 70]. Within this study, a recombination event was defined as the presence of two or more variant nucleotide positions within an allele of related isolates (as defined by emm-type), one of which is also found in an unrelated isolate (that is, an unrelated emm-type). Using this definition, recombination was not observed in the TCS IGRs of the NCBI dataset. However, variability in the length of the spy1556 IGR of the emm-type 2, 3, and 28 isolates correlated with the integration of the phage-like elements Φ10270.3, Φ315.4, and Φ6180.2, respectively [16, 71]. Variability was also observed in the length of the spy1556 IGR of the emm-type 4, 9, 22, 55, 75, 89, and 102 isolates, suggesting similar integration of mobile genetic elements at this locus in these emm-types (all of which are emm-pattern-type E, except emm55). However, when the same data were interrogated using fastGEAR, no recent or ancestral recombination events within IGRs were identified.

Variation in TCS coding regions

When the variability of the cds of the 14 TCS operons in the 943 genomes was analysed, the majority were intact. SNPs were the most commonly observed variation, but multi-nucleotide insertions and deletions were also observed. Table 6 summarises the key measures of nucleotide diversity including allele-types, polymorphic nucleotide sites, nucleotide diversity, and Simpson diversity of the TCS alleles. The concordance between emm-type and coding sequence alleles was not as high as for IGR alleles, with adjusted Wallace coefficients ranging from 0.453 to 0.765 (S1 Table). An example of the lower intra-strain concordance of GAS TCS genes is represented graphically in the phylogeny of spy1556 variants (S3 Fig). Only ihk, liaS, spy1553, and srtS (all histidine kinases), and spy1062 (a response regulator) were inferred to be under positive selection pressure. FastGEAR output inferred that, of the TCS operons tested, trxTSR and spy1556/3 had the greatest number of predicted ancestral and recent recombination loci, with 14 and 12, respectively (Table 6). A distinctive recombination-related polymorphism was observed in the putative receiver domain of the spy1556 cds (196–276 nt locus) of the Subclade X isolates (S3 Fig). When translated, this locus displayed polymorphism in amino acid residues 72 to 74, which were either ‘EHA’ or ‘QES’. All A-C emm-pattern-types were observed to have the ‘QES’ variant. The spy1556 and spy1553 cds also had the highest values of nucleotide diversity. At the amino acid level, non-sense mutations were most frequently observed in the histidine kinases CovS and SilB. All the genes lacking non-sense mutations were response regulators (that is, covR, irr, liaR, sptR, trxR, and vicR). The deletion of maeP observed in the analysis of the 64 genomes was also reproduced here. Twenty-one emm89.0 and two emm73 genomes displayed complete or partial deletion of maeP, respectively.

Table 6. Variation in the nucleotide sequences of the two-component system coding regions.


The identification of common genetic variants that discriminate rheumatogenic and non-rheumatogenic emm-types at both the TCS IGR level, and indeed across entire GAS genomes will result in new insights into the molecular mechanisms underpinning ARF/RHD, or assist in developing molecular tools for predicting the rheumatogenic potential of specific GAS isolates. This is particularly relevant given the paucity of GAS RHD models available to study this specific GAS disease [72]. Here we characterised the TCS cds and IGR regions as a novel approach for the identification of GAS emm-types or strains associated with GAS disease. Unlike virulence genes, the TCS IGRs are not under direct selection pressure. Nevertheless, the fact that these IGR regions modulate TCS expression levels, which in turn alters expression of downstream genes suggests that variation in these regions may be indirectly implicated in the evolution of the pathogenesis of GAS.

Most GAS typing systems are based on the emm gene and its surrounding regions (mga locus). These schemes are used as tools to predict disease and/or niche colonisation propensity, or functional attributes of isolates of specific emm-types. However, due to the high levels of recombination in these regions [73, 74], as well as in the loci used for T-typing GAS [75], these systems have not been used to infer evolutionary relationships between emm-types. Within our dataset, variation in the polymorphic nucleotide sites across all 14 GAS TCS IGRs was sufficiently powerful to enable discrimination of individual emm-types. By inference, sequence variation in these regions was also predictive of emm-pattern and emm-cluster-type. Subsequent phylogenetic analysis of concatenated sequences revealed that three historical ARF-associated emm-types (emm5, emm6 and emm18), all belonging to ‘single protein emm-cluster clade Y’ in the emm-cluster-type system, grouped together [66, 76, 77]. As our results reflect variability across 14 loci, they suggest these three emm-types have a shared evolutionary history, and support recent whole genome comparisons [78]. They also suggest that the genomic sequences of these emm-types may contain conserved nucleotide and/or functional protein attributes that increase their propensity to cause ARF, and which were vertically inherited by these emm-types. However, several of the emm18 concatenated sequences, as well as all emm1 and emm3 sequences, did not group within this cluster. Closer analysis of the clinical and epidemiological data of these emm18 isolates indicated differences in both time and geography of sampling when compared to other emm18 isolates in the study, highlighting that genetic drift, which can cloud potential relationships, occurs within emm-types. Regarding emm1 and emm3, emm-cluster typing also groups these emm-types separately from emm5, emm6 and emm18, placing them in the A-C3 and A-C5 emm-clusters [5].
These findings suggest that evolutionary relatedness of GAS emm-types, as predicted using conserved sequences, will be insufficient in itself to predict propensity of a GAS emm-type to cause disease. More broadly, as epidemiological studies of streptococcal disease in developing nations routinely fail to report the presence of traditional ‘rheumatogenic’ emm-types, the concept of ‘rheumatogenic’ emm-types is currently in flux [79]. In this context, historical rheumatogenic emm-types may have reflected the epidemiology of disease in North America and Europe at the time the studies were conducted [80–82], but may not be representative of ARF/RHD at a global scale.

The current study did not attempt to define a minimum set of TCS IGRs that can be used to identify emm-types. The data here suggest that several of the TCS IGRs will be less useful in this regard. Notably, silAB was present in only ~24% of GAS strains tested. Moreover, the locus is variably present within different genomes of the same emm-type. The proximity of silAB to a transposase in silAB-containing genomes also suggests the locus is part of a mobile genetic element [25, 45], and subject to horizontal gene transfer. Secondly, as fewer allelic variants were recovered for the salKR and srtRS IGRs, these sequences will provide lower discriminatory power. At the single locus level, the only correlation between IGR allelic variation and rheumatogenicity or nephritogenicity occurred with the covRS and ciaR loci. Each of the genomes from rheumatogenic emm-types possessed the IcovR1 allele. However, this allele was also common in non-ARF associated isolates, demonstrating that its presence is not by itself predictive of an ARF emm-type. In contrast, ciaR IGR alleles from nephritogenic emm-types segregated separately from the rheumatogenic ciaR IGR alleles. All these alleles, recovered from 12 of the 27 emm-types of the NCBI dataset, possessed the same distinctive consecutive four base pair feature, likely caused by two proximal insertion and deletion events. The ciaRH locus regulates the expression of metabolism and stress response genes, including the acid stress response [36, 37]. In GAS emm49 (a nephritogenic emm-type), ciaRH has previously been shown to regulate expression of proteins involved in transport of various molecules across the bacterial membrane, as well as virulence factors including hemolysin [36].

Considering now the coding regions, the TCS genes were generally present and conserved intact, consistent with the important functions of the corresponding proteins. No strong correlations were observed between the cds and emm-type or clinical outcomes. However, a recombination-related DNA motif was observed in the spy1556 response regulator gene of the Subclade X isolates (see S3 Fig) among the genomes tested. When translated, all of the A-C pattern (throat-associated) types possessed the derived ‘QES’ amino acid variant in the putative receiver domain of Spy1556. Spy1556 is a member of the yesN/araC family of response regulators. In previous transcriptomic studies, M5005 GAS mutants of the cognate Spy1553 histidine kinase suggested a role in the regulation of up to 40% of the genome, particularly in the stationary phase [22]. Recently, the transcription of the MGAS8232 Spy1556 homologue was found to be downregulated in the presence of full-length functional RocA [83].

The only IGR with a higher nucleotide diversity than spy1556/3 was that of the sptRS/spy0873 operon. The sptRS TCS is reported to be a crucial regulator of complex carbohydrate metabolism [46]. spy0873 is a relA homologue which is implicated in the synthesis of alarmone in the ‘stringent’ response to amino acid deprivation [65], thus suggesting a complex role for this putative functional group in metabolic adaptation to changing nutritional abundance. Here the nucleotide diversity values of sptR, sptS, and spy0873 were similar to the mean value for the other genes tested; collectively they possessed only four truncated allele-types, and there was only one inference of recombination. Selection pressure analysis inferred negative selection pressure on this operon. The allelic types of sptR, sptS, and spy0873 did not correlate strongly with emm-type, niche or disease, detracting from their utility as biological markers.

GAS emm89 has recently emerged as an invasive epidemic pathogen of global significance [6, 61, 84]. Isolates of this emm-type are divided into three clades [78, 85], one of which, emm89 clade 3, is responsible for outbreaks in Canada and in European geographic locations [86]. Despite lacking the hyaluronic acid-producing hasABC operon, emm89 clade 3 has outcompeted both clades 1 and 2 in its rise to pandemic prominence in multiple northern hemisphere countries [61, 84, 87]. However, controversy exists around the evolutionary history of isolates within this emm-type [6]. It has recently been suggested that neither emm-typing nor MLST-typing is capable of resolving the exact evolutionary history, and that examination of other genetic features is required [6, 84]. Friães et al. noted that pre-epidemic emm89 isolates were ST101 in the United Kingdom, whereas in other countries they were ST407 and ST408, which are single locus variants of ST101 [87]. They contend that epidemic emm89 clade 3 has emerged from pre-epidemic isolates of type ST101 in the United Kingdom, and from types ST407 and ST408 in other geographic locations by independent evolutionary events [6, 87]. In contrast, Beres et al. arrived at the divergent conclusion that clade 1 is the common ancestral clade from which clades 2 and 3 have evolved [6, 78]. Our study found that all emm89.0 subtype isolates lacked maeP, a transporter gene of the malic enzyme (ME) pathway. In contrast, emm89.14 (Fiji and Australia) and emm89.8 (Kenya) did not. The reference genomes for the three clades, clade 1 (ST407: USA), clade 2 (ST101: Italy) and clade 3 (ST101: USA), and the pre-epidemic UK strain H293 (ST101: UK), are all emm89.0 and possess the maeP deletion. Thus, the absence of maeP is a putative marker that may be used for identification of the emm89.0 subtype.

The ME pathway facilitates utilisation of malate as a supplemental carbon source [49]. The genes of the GAS malic enzyme pathway are highly conserved and arranged as two diverging operons. maeKR encodes the MaeKR TCS, while maePE encodes a putative L-malate transporter (MaeP) and malic enzyme (MaeE) [88]. MaeKR is required for the expression of maePE in vitro. Expression of maeP and maeE is increased in the presence of malate and acid environments [49, 63], but repressed by glucose [49]. Notwithstanding this, the role of maeKR and maePE in virulence of GAS is not clear. While recombinant GAS isolates harbouring deletions in maeP, maeK and maeR have reduced virulence in murine models, the maeE deficient mutants in the same study displayed increased virulence [49]. Despite a previous study stating all GAS isolates possess the genes of the malic enzyme pathway [49], twenty-one emm89.0 and two emm73 genomes displayed complete or partial deletion of maeP, respectively. Of the three MaeP amino acid variants observed, two were present in emm89. Both emm89 and emm73 are of the E4 pattern-type. Together these data suggest that the maeKR locus, involved in malate transport and pH response [49], is not essential in these emm-types.

A finding of this study was that the same single nucleotide deletion was present in covS of emm89.0 MGAS27061 and of emm1 MGAS5005, resulting in a frameshift nonsense mutation with likely loss of function of the protein [89]. Functional covRS regulates up to 15% of the GAS genome, primarily via the repression of gene transcription [90]. In this manner, intact covS mediates a general stress response by transducing multiple environmental cues including elevated Mg2+ signal, temperature, acidic pH, and high salinity [7]. Key neutrophil resistance virulence factor genes including hasA, sic, ideS, sda1, speA, ska, and scp are upregulated in covRS mutants [60]. As such, covRS mutants are more resistant to phagocytosis and killing by human neutrophils [60] and display hypervirulence in murine models of systemic GAS infection [60, 91, 92]. Inactivation of covS also abrogates the acidic-stress-dependent repression of the genes, significantly increasing bacterial virulence during infection [93]. GAS covRS mutants have been reported to be less able to establish infection due to increased hyaluronic acid capsule expression [7]. Furthermore, the key virulence factor speB is strongly downregulated in covRS mutants, and speB expression is a prerequisite for virulence in murine models of invasive GAS disease [7]. Mutations in covRS, the global GAS response regulator, are found more frequently in GAS recovered from invasive infections than from pharyngeal infections, demonstrating a link between TCS polymorphisms and disease outcome [60, 91, 94]. Nonsense mutation of MGAS27061 covS is consistent with the epidemiology of this clade 3 emm89.0 isolate.
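To illustrate how a single nucleotide deletion can produce a frameshift and a premature stop codon, as inferred above for covS, the following minimal Python sketch translates a short open reading frame before and after removing one base. The sequence is a hypothetical toy example and is not the covS sequence; the helper names `translate` and `delete_base` are assumptions introduced for illustration, and the standard genetic code is used.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends translation
            break
        protein.append(residue)
    return "".join(protein)


def delete_base(dna, index):
    """Return the sequence with the base at the given 0-based index removed."""
    return dna[:index] + dna[index + 1:]


# Hypothetical toy open reading frame (not covS).
wild_type = "ATGGCATTGACCGGTTAA"
mutant = delete_base(wild_type, 3)  # single nucleotide deletion

print(translate(wild_type))  # full-length toy peptide
print(translate(mutant))     # truncated peptide after the frameshift
```

In this toy case the deletion shifts the reading frame so that a TGA stop codon appears at the third codon, truncating the product. A truncation of this kind in a sensor kinase would be expected to remove downstream functional domains, consistent with the likely loss of function noted above.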


Here we have demonstrated that the polymorphic nucleotides of all 14 GAS TCS IGRs were sufficiently different to discriminate emm-type. Phylogenetic analysis of this variability revealed that the ARF-associated emm-types 5, 6, and 18 (‘single protein emm-cluster clade Y’ cluster-types) grouped together, and separately from the majority of other GAS emm-types. These findings suggest that the genetic factors that increase the propensity for these emm-types to cause ARF/RHD are likely found in the core genome of these emm-types, and may have been vertically inherited from a common ancestor. Further complete analysis of full genomic sequences comparing these strains with strains not as strongly associated with ARF/RHD may bring new insights into the molecular mechanisms underpinning ARF/RHD, and assist the development of molecular tools for predicting the rheumatogenic potential of specific GAS isolates. However, the fact that there was very little association between individual two-component IGRs and ARF/RHD underscores the complexity of GAS disease, and indicates that while transcription regulatory networks of GAS and the virulence genes they control contribute to ARF/RHD pathogenesis, the presence and/or absence of individual genes or genetic markers is not sufficient to predict the disease-causing potential of individual GAS isolates. The TCS coding regions did not correlate as strongly with emm-type as the TCS IGRs, and no strong correlation was observed between the coding regions and disease. We identified a recombination-related, subclade-forming DNA motif in the nucleotide sequence encoding the putative receiver domain of the Spy1556 response regulator, of which the same variant was observed in all of the A-C emm-pattern throat-associated isolates. Finally, we identified the deletion of the malate transporter gene, maeP, which serves as a putative marker for the emm89.0 subtype, which has been implicated in invasive outbreaks.

Supporting information

S1 Table. Published draft genomes (Sheet 1) and Wallace coefficients of coding regions (Sheet 2).


S1 Fig. Dendrograms of the polymorphic nucleotides in the intergenic regions upstream of individual Group A Streptococcus two-component systems, annotated with emm-type, emm-pattern and disease association.


S2 Fig. Dendrogram of concatenated polymorphic nucleotides in the intergenic regions upstream of 14 Group A Streptococcus two-component systems.

Annotations of emm-type, emm-cluster, emm-pattern and autoimmune disease association included (unique sequences n = 289 of 943 genomes). Acute rheumatic fever- (ARF), and post-streptococcal glomerulonephritis (PSGN)–related genomes are also shown.


S3 Fig. Maximum likelihood phylogenetic trees based on the allelic variants of the GAS TCS genes.



  1. Carapetis JR, Steer AC, Mulholland EK, Weber M. The global burden of Group A Streptococcal diseases. Lancet Infect Dis. 2005;5:685–94. pmid:16253886
  2. Cunningham MW. Pathogenesis of Group A Streptococcal infections. Clin Microbiol Rev. 2000;13:470–511. pmid:10885988
  3. Tyrrell GJ, Lovgren M, St Jean T, Hoang L, Patrick DM, Horsman G, et al. Epidemic of Group A Streptococcus M/emm 59 Causing Invasive Disease in Canada. Clin Infect Dis. 2010;51:1290–7. pmid:21034198
  4. Bessen DE. Molecular basis of serotyping and the underlying genetic organization of Streptococcus pyogenes. Streptococcus pyogenes: Basic biology to clinical manifestations. Oklahoma City (OK): University of Oklahoma Health Sciences Center; 2016.
  5. Sanderson-Smith M, De Oliveira DM, Guglielmini J, McMillan DJ, Vu T, Holien JK, et al. A systematic and functional classification of Streptococcus pyogenes that serves as a new tool for molecular typing and vaccine development. J Infect Dis. 2014;210:1325–38. pmid:24799598
  6. Wilkening RV, Federle MJ. Evolutionary constraints shaping Streptococcus pyogenes–host interactions. Trends Microbiol. 2017.
  7. Walker MJ, Barnett TC, McArthur JD, Cole JN, Gillen CM, Henningham A, et al. Disease manifestations and pathogenic mechanisms of Group A streptococcus. Clinical microbiology reviews. 2014;27(2):264–301. pmid:24696436
  8. Cunningham MW. Pathogenesis of group A streptococcal infections and their sequelae. Hot Topics in Infection and Immunity in Children IV: Springer; 2008. p. 29–42.
  9. Bisno AL. The concept of rheumatogenic and non-rheumatogenic group A streptococci. Streptococcal diseases and the immune response. 1980:789–803.
  10. Stollerman GH. Rheumatic fever in the 21st century. Clin Infect Dis. 2001;33:806–14. pmid:11512086
  11. Sharma A, Nitsche-Schmitz D. Challenges to developing effective streptococcal vaccines to prevent rheumatic fever and rheumatic heart disease. Vaccine (Auckl). 2014;4:39–54.
  12. McDonald MI, Towers RJ, Fagan P, Carapetis JR, Currie BJ. Molecular typing of Streptococcus pyogenes from remote Aboriginal communities where rheumatic fever is common and pyoderma is the predominant streptococcal infection. Epidemiology & Infection. 2007;135(8):1398–405.
  13. Smeesters PR, Mardulyn P, Vergison A, Leplae R, Van Melderen L. Genetic diversity of Group A Streptococcus M protein: implications for typing and vaccine development. Vaccine. 2008;26(46):5835–42. pmid:18789365
  14. Baroux N, D'ortenzio E, Amédéo N, Baker C, Ali Alsuwayyid B, Dupont-Rouzeyrol M, et al. The emm-cluster typing system for group A Streptococcus identifies epidemiologic similarities across the Pacific region. Clinical infectious diseases. 2014;59(7):e84–e92. pmid:24965347
  15. McGregor KF, Spratt BG, Kalia A, Bennett A, Bilek N, Beall B, et al. Multilocus sequence typing of Streptococcus pyogenes representing most known emm types and distinctions among subpopulation genetic structures. J Bacteriol. 2004;186:4285–94. pmid:15205431
  16. Bessen DE, McShan WM, Nguyen SV, Shetty A, Agrawal S, Tettelin H. Molecular epidemiology and genomics of group A Streptococcus. Infection, Genetics and Evolution. 2015;33:393–418. pmid:25460818
  17. Steer AC, Law I, Matatolu L, Beall BW, Carapetis JR. Global emm type distribution of Group A Streptococci: Systematic review and implications for vaccine development. Lancet Infect Dis. 2009;9:611–6. pmid:19778763
  18. Smeesters PR, McMillan DJ, Sriprakash KS, Georgousakis MM. Differences among Group A Streptococcus epidemiological landscapes: consequences for M protein-based vaccines? Expert Rev Vaccines. 2009;8:1705–20. pmid:19905872
  19. Sumby P, Porcella SF, Madrigal AG, Barbian KD, Virtaneva K, Ricklefs SM, et al. Evolutionary origin and emergence of a highly successful clone of serotype M1 group A Streptococcus involved multiple horizontal gene transfer events. The Journal of infectious diseases. 2005;192(5):771–82. pmid:16088826
  20. McMillan DJ, Beiko R, Geffers R, Buer J, Schouls LM, Vlaminckx B, et al. Genes for the majority of Group A Streptococcal virulence factors and extracellular surface proteins do not confer an increased propensity to cause invasive disease. Clin Infect Dis. 2006;43:884–91. pmid:16941370
  21. Bessen DE. Population biology of the human restricted pathogen, Streptococcus pyogenes. Infection, Genetics and Evolution. 2009;9(4):581–93. pmid:19460325
  22. Sitkiewicz I, Musser JM. Expression microarray and mouse virulence analysis of four conserved two-component gene regulatory systems in Group A Streptococcus. Infect Immun. 2006;74:1339–51. pmid:16428783
  23. Tesorero RA, Yu N, Wright JO, Svencionis JP, Cheng Q, Kim J-H, et al. Novel regulatory small RNAs in Streptococcus pyogenes. PloS one. 2013;8(6):e64021. pmid:23762235
  24. Thorpe HA, Bayliss SC, Hurst LD, Feil EJ. Comparative Analyses of Selection Operating on Nontranslated Intergenic Regions of Diverse Bacterial Species. Genetics. 2017;206(1):363–76. pmid:28280056
  25. Vega LA, Malke H, McIver KS. Virulence-related transcriptional regulators of Streptococcus pyogenes. 2016.
  26. Leday TV, Gold KM, Kinkel TL, Roberts SA, Scott JR, McIver KS. TrxR, a new CovR-repressed response regulator that activates the Mga virulence regulon in Group A Streptococcus. Infection and immunity. 2008;76(10):4659–68. pmid:18678666
  27. Churchward G. The two faces of Janus: virulence gene regulation by CovR/S in Group A Streptococci. Molecular microbiology. 2007;64(1):34–41. pmid:17376070
  28. Athey TB, Teatero S, Li A, Marchand-Austin A, Beall BW, Fittipaldi N. Deriving Group A Streptococcus typing information from short-read whole-genome sequencing data. Journal of clinical microbiology. 2014;52(6):1871–6. pmid:24648555
  29. Zakour NLB, Davies MR, You Y, Chen JH, Forde BM, Stanton-Cook M, et al. Transfer of scarlet fever-associated elements into the Group A Streptococcus M1T1 clone. Scientific reports. 2015;5.
  30. Davies MR, Holden MT, Coupland P, Chen JH, Venturini C, Barnett TC, et al. Emergence of scarlet fever Streptococcus pyogenes emm12 clones in Hong Kong is associated with toxin acquisition and multidrug resistance. Nature genetics. 2015;47(1):84–7. pmid:25401300
  31. Lees JA, Vehkala M, Välimäki N, Harris SR, Chewapreecha C, Croucher NJ, et al. Sequence element enrichment analysis to determine the genetic basis of bacterial phenotypes. Nature communications. 2016;7:12797. pmid:27633831
  32. Seale AC, Davies MR, Anampiu K, Morpeth SC, Nyongesa S, Mwarumba S, et al. Invasive Group A Streptococcus infection among children, rural Kenya. Emerging infectious diseases. 2016;22(2):224. pmid:26811918
  33. Tokajian S, Eisen JA, Jospin G, Coil DA. Draft genome sequences of Streptococcus pyogenes strains associated with throat and skin infections in Lebanon. Genome announcements. 2014;2(3):e00358–14. pmid:24831139
  34. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinform. 2012;28:1647–49.
  35. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32:1792–97. pmid:15034147
  36. Riani C, Standar K, Srimuang S, Lembke C, Kreikemeyer B, Podbielski A. Transcriptome analyses extend understanding of Streptococcus pyogenes regulatory mechanisms and behavior toward immunomodulatory substances. International Journal of Medical Microbiology. 2007;297(7):513–23.
  37. Tatsuno I, Isaka M, Okada R, Zhang Y, Hasegawa T. Relevance of the two-component sensor protein CiaH to acid and oxidative stress responses in Streptococcus pyogenes. BMC research notes. 2014;7(1):189.
  38. Kreikemeyer B, Boyle MD, Buttaro BAL, Heinemann M, Podbielski A. Group A Streptococcal growth phase‐associated virulence factor regulation by a novel operon (fas) with homologies to two‐component‐type regulators requires a small RNA molecule. Molecular microbiology. 2001;39(2):392–406. pmid:11136460
  39. Voyich JM, Sturdevant DE, Braughton KR, Kobayashi SD, Lei B, Virtaneva K, et al. Genome-wide protective response used by Group A Streptococcus to evade destruction by human polymorphonuclear leukocytes. Proceedings of the National Academy of Sciences. 2003;100(4):1996–2001.
  40. Ichikawa M, Minami M, Isaka M, Tatsuno I, Hasegawa T. Analysis of two-component sensor proteins involved in the response to acid stimuli in Streptococcus pyogenes. Microbiology. 2011;157(11):3187–94.
  41. Sitkiewicz I, Green NM, Guo N, Bongiovanni AM, Witkin SS, Musser JM. Adaptation of Group A Streptococcus to human amniotic fluid. PLoS One. 2010;5(3):e9785. pmid:20352104
  42. Namprachan-Frantz P, Rowe HM, Runft DL, Neely MN. Transcriptional analysis of the Streptococcus pyogenes salivaricin locus. Journal of bacteriology. 2014;196(3):604–13. pmid:24244008
  43. Okada R, Matsumoto M, Zhang Y, Isaka M, Tatsuno I, Hasegawa T. Emergence of type I restriction modification system‐negative emm1 type Streptococcus pyogenes clinical isolates in Japan. Apmis. 2014;122(10):914–21. pmid:25356467
  44. Plainvert C, Dinis M, Ravins M, Hanski E, Touak G, Dmytruk N, et al. Molecular epidemiology of sil locus in clinical Streptococcus pyogenes strains. Journal of clinical microbiology. 2014:JCM. 00290–14.
  45. Belotserkovsky I, Baruch M, Peer A, Dov E, Ravins M, Mishalian I, et al. Functional analysis of the quorum-sensing streptococcal invasion locus (sil). PLoS pathogens. 2009;5(11):e1000651. pmid:19893632
  46. Shelburne SA, Sumby P, Sitkiewicz I, Granville C, DeLeo FR, Musser JM. Central role of a bacterial two-component gene regulatory system of previously unknown function in pathogen persistence in human saliva. Proceedings of the National Academy of Sciences of the United States of America. 2005;102(44):16037–42. pmid:16249338
  47. Dmitriev AV, McDowell EJ, Kappeler KV, Rieck LD. The Rgg regulator of Streptococcus pyogenes influences utilization of nonglucose carbohydrates, prophage induction, and expression of the NAD-glycohydrolase virulence operon. Journal of bacteriology. 2006;188(20):7230–41. pmid:17015662
  48. Kawada-Matsuo M, Tatsuno I, Arii K, Zendo T, Oogai Y, Noguchi K, et al. Two-Component Systems Involved in Susceptibility to Nisin A in Streptococcus pyogenes. Applied and environmental microbiology. 2016;82(19):5930–9. pmid:27474716
  49. Paluscio E, Caparon MG. Streptococcus pyogenes malate degradation pathway links pH regulation and virulence. Infect Immun. 2015;83:1162–71. pmid:25583521
  50. Hasegawa T, Okamoto A, Kamimura T, Tatsuno I, Hashikawa SN, Yabutani M, et al. Detection of invasive protein profile of Streptococcus pyogenes M1 isolates from pharyngitis patients. Apmis. 2010;118(3):167–78. pmid:20132182
  51. Liu M, Hanks TS, Zhang J, McClure MJ, Siemsen DW, Elser JL, et al. Defects in ex vivo and in vivo growth and sensitivity to osmotic stress of Group A Streptococcus caused by interruption of response regulator gene vicR. Microbiology. 2006;152(4):967–78.
  52. Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinform. 2009;25:1451–52.
  53. Carrico J, Silva-Costa C, Melo-Cristino J, Pinto F, De Lencastre H, Almeida J, et al. Illustration of a common framework for relating multiple typing methods by application to macrolide-resistant Streptococcus pyogenes. J Clin Microbiol. 2006;44:2524–32. pmid:16825375
  54. Severiano A, Pinto FR, Ramirez M, Carriço JA. Adjusted Wallace coefficient as a measure of congruence between typing methods. J Clin Microbiol. 2011;49:3997–4000. pmid:21918028
  55. Kumar S, Stecher G, Tamura K. MEGA7: Molecular Evolutionary Genetics Analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870–74. pmid:27004904
  56. Feil EJ, Holmes EC, Bessen DE, Chan M-S, Day NP, Enright MC, et al. Recombination within natural populations of pathogenic bacteria: short-term empirical estimates and long-term phylogenetic consequences. Proc Natl Acad Sci. 2001;98:182–7. pmid:11136255
  57. McMillan DJ, Kaul SY, Bramhachari P, Smeesters PR, Vu T, Karmarkar M, et al. Recombination drives genetic diversification of Streptococcus dysgalactiae subspecies equisimilis in a region of streptococcal endemicity. PloS one. 2011;6:21346.
  58. Mostowy R, Croucher NJ, Andam CP, Corander J, Hanage WP, Marttinen P. Efficient inference of recent and ancestral recombination within bacterial populations. Molecular biology and evolution. 2017;34(5):1167–82. pmid:28199698
  59. Yano H, Iwamoto T, Nishiuchi Y, Nakajima C, Starkova DA, Mokrousov I, et al. Population structure and local adaptation of MAC lung disease agent Mycobacterium avium subsp. hominissuis. Genome biology and evolution. 2017;9(9):2403–17. pmid:28957464
  60. Sumby P, Whitney AR, Graviss EA, DeLeo FR, Musser JM. Genome-wide analysis of Group A Streptococci reveals a mutation that modulates global phenotype and disease specificity. PLoS Pathog. 2006;2:5.
  61. Turner CE, Abbott J, Lamagni T, Holden MT, David S, Jones MD, et al. Emergence of a new highly successful acapsular Group A Streptococcus clade of genotype emm89 in the United Kingdom. MBio. 2015;6:00622–15.
  62. Zhu L, Olsen RJ, Nasser W, de la Riva Morales I, Musser JM. Trading capsule for increased cytotoxin production: contribution to virulence of a newly emerged clade of emm89 Streptococcus pyogenes. MBio. 2015;6:e01378–15. pmid:26443457
  63. Pancholi V, Caparon M. Streptococcus pyogenes metabolism. 2016.
  64. Nanamiya H, Kasai K, Nozawa A, Yun CS, Narisawa T, Murakami K, et al. Identification and functional analysis of novel (p) ppGpp synthetase genes in Bacillus subtilis. Molecular microbiology. 2008;67(2):291–304. pmid:18067544
  65. Steiner K, Malke H. Life in protein‐rich environments: the relA‐independent response of Streptococcus pyogenes to amino acid starvation. Molecular microbiology. 2000;38(5):1004–16. pmid:11123674
  66. Port GC, Paluscio E, Caparon MG. Complete genome sequences of emm6 Streptococcus pyogenes JRS4 and parental strain D471. Genome announcements. 2015;3(4):e00725–15. pmid:26139722
  67. Bessen DE, Lizano S. Tissue tropisms in Group A Streptococcal infections. Future Microbiol. 2010;5:623–38. pmid:20353302
  68. Hollingshead SK, Readdy TL, Yung D, Bessen DE. Structural heterogeneity of the emm gene cluster in Group A Streptococci. Mol Microbiol. 1993;8:707–17. pmid:8332063
  69. Hanage WP, Fraser C, Spratt BG. The impact of homologous recombination on the generation of diversity in bacteria. J Theor Biol. 2006;239:210–19. pmid:16236325
  70. Sumby P, Porcella SF, Madrigal AG, Barbian KD, Virtaneva K, Ricklefs SM, et al. Evolutionary origin and emergence of a highly successful clone of serotype M1 Group A Streptococcus involved multiple horizontal gene transfer events. J Infect Dis. 2005;192:771–82. pmid:16088826
  71. Beres SB, Musser JM. Contribution of exogenous genetic elements to the group A Streptococcus metagenome. PloS one. 2007;2(8):e800. pmid:17726530
  72. Rush CM, Govan BL, Sikder S, Williams NL, Ketheesan N. Animal models to investigate the pathogenesis of rheumatic heart disease. Front Pediatr. 2014;2.
  73. Bessen DE, McGregor KF, Whatmore AM. Relationships between emm and multilocus sequence types within a global collection of Streptococcus pyogenes. BMC microbiology. 2008;8(1):59.
  74. McMillan DJ, Sanderson-Smith ML, Smeesters PR, Sriprakash KS. Molecular markers for the study of streptococcal epidemiology. Host-Pathogen Interactions in Streptococcal Diseases: Springer; 2012. p. 29–48.
  75. Falugi F, Zingaretti C, Pinto V, Mariani M, Amodeo L, Manetti AG, et al. Sequence variation in Group A Streptococcus pili and association of pilus backbone types with lancefield T serotypes. The Journal of infectious diseases. 2008;198(12):1834–41. pmid:18928376
  76. Smoot JC, Barbian KD, Van Gompel JJ, Smoot LM, Sylva GL, Sturdevant DE, et al. Genome sequence and comparative microarray analysis of serotype M18 Group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proceedings of the National Academy of Sciences. 2002;99(7):4668–73.
  77. Holden MT, Scott A, Cherevach I, Chillingworth T, Churcher C, Cronin A, et al. Complete genome of acute rheumatic fever-associated serotype M5 Streptococcus pyogenes strain Manfredo. Journal of bacteriology. 2007;189(4):1473–7. pmid:17012393
  78. Beres SB, Kachroo P, Nasser W, Olsen RJ, Zhu L, Flores AR, et al. Transcriptome remodeling contributes to epidemic disease caused by the human pathogen Streptococcus pyogenes. MBio. 2016;7:403–16.
  79. Remenyi B, Carapetis J. Acute rheumatic fever and chronic rheumatic disease. Pediatric and Congenital Cardiology, Cardiac Surgery and Intensive Care: Springer; 2014. p. 2329–50.
  80. Steer AC, Kado J, Jenney AW, Batzloff M, Waqatakirewa L, Mulholland EK, et al. Acute rheumatic fever and rheumatic heart disease in Fiji: prospective surveillance, 2005–2007. Medical Journal of Australia. 2009;190(3):133–5. pmid:19203310
  81. Bessen DE, Carapetis JR, Beall B, Katz R, Hibble M, Currie BJ, et al. Contrasting molecular epidemiology of group A streptococci causing tropical and nontropical infections of the skin and throat. The Journal of infectious diseases. 2000;182(4):1109–16. pmid:10979907
  82. Pruksakorn S, Sittisombut N, Phornphutkul C, Pruksachatkunakorn C, Good MF, Brandt E. Epidemiological analysis of non-M-typeable group A Streptococcus isolates from a Thai population in northern Thailand. Journal of clinical microbiology. 2000;38(3):1250–4. pmid:10699034
  83. Lynskey NN, Goulding D, Gierula M, Turner CE, Dougan G, Edwards RJ, et al. RocA truncation underpins hyper-encapsulation, carriage longevity and transmissibility of serotype M18 group A streptococci. PLoS pathogens. 2013;9(12):e1003842. pmid:24367267
  84. Beres SB, Olsen RJ, Saavedra MO, Ure R, Reynolds A, Lindsay DS, et al. Genome sequence analysis of emm89 Streptococcus pyogenes strains causing infections in Scotland, 2010–2016. Journal of medical microbiology. 2017;66(12):1765–73. pmid:29099690
  85. Zhu L, Olsen RJ, Nasser W, Beres SB, Vuopio J, Kristinsson KG, et al. A molecular trigger for intercontinental epidemics of Group A Streptococcus. The Journal of clinical investigation. 2015;125(9):3545–59. pmid:26258415
  86. Teatero S, Coleman BL, Beres SB, Olsen RJ, Kandel C, Reynolds O, et al., editors. Rapid emergence of a new clone impacts the population at risk and increases the incidence of type emm89 Group A Streptococcus invasive disease. Open Forum Infect Dis; 2017: Oxford University Press US.
  87. Friães A, Machado MP, Pato C, Carriço J, Melo-Cristino J, Ramirez M. Emergence of the same successful clade among distinct populations of emm89 Streptococcus pyogenes in multiple geographic regions. MBio. 2015;6(6):e01780–15. pmid:26628724
  88. Miguel-Romero L, Casino P, Landete J, Monedero V, Zúñiga M, Marina A. The malate sensing two-component system MaeKR is a non-canonical class of sensory complex for C4-dicarboxylates. Scientific Reports. 2017;7(1):2708. pmid:28577341
  89. Friães A, Pato C, Melo-Cristino J, Ramirez M. Consequences of the variability of the CovRS and RopB regulators among Streptococcus pyogenes causing human infections. Scientific reports. 2015;5:12057. pmid:26174161
  90. Dalton TL, Scott JR. CovS inactivates CovR and is required for growth under conditions of general stress in Streptococcus pyogenes. Journal of bacteriology. 2004;186(12):3928–37. pmid:15175307
  91. Walker MJ, Hollands A, Sanderson-Smith ML, Cole JN, Kirk JK, Henningham A, et al. DNase Sda1 provides selection pressure for a switch to invasive Group A Streptococcal infection. Nat Med. 2007;13:981–85. pmid:17632528
  92. Engleberg NC, Heath A, Miller A, Rivera C, DiRita VJ. Spontaneous mutations in the CsrRS two-component regulatory system of Streptococcus pyogenes result in enhanced virulence in a murine model of skin and soft tissue infection. The Journal of infectious diseases. 2001;183(7):1043–54. pmid:11237829
  93. Chiang-Ni C, Tseng H-C, Hung C-H, Chiu C-H. Acidic stress enhances CovR/S-dependent gene repression through activation of the covR/S promoter in emm1-type group A Streptococcus. International Journal of Medical Microbiology. 2017;307(6):329–39. pmid:28648357
  94. Kansal RG, Datta V, Aziz RK, Abdeltawab NF, Rowe S, Kotb M. Dissection of the molecular basis for hypervirulence of an in vivo—Selected phenotype of the widely disseminated M1T1 strain of Group A Streptococcus bacteria. J Infect Dis. 2010;201:855–65. pmid:20151844