Complete Genome Sequence of a Panton-Valentine Leukocidin-Negative Community-Associated Methicillin-Resistant Staphylococcus aureus Strain of Sequence Type 72 from Korea

In the past decade, community-associated (CA-) infections with methicillin-resistant Staphylococcus aureus (MRSA) have emerged throughout the world. Different CA-MRSA strains dominate in different geographical locations. Many CA-MRSA lineages contain genes coding for the Panton-Valentine leukocidin (PVL). However, the role of this leukotoxin in CA-MRSA pathogenesis is still controversial. The genome sequences of two key PVL-positive CA-MRSA strains (USA300, USA400) have been reported, but we lack information on the more recently found PVL-negative CA-MRSA strains. One such strain is the PVL-negative ST72, the main cause of CA-MRSA infections in Korea. Here, we report the entire genome sequence of CA-MRSA ST72 and analyze its gene content with a focus on virulence factors. Our results show that this strain does not differ considerably in virulence factor content from other CA-MRSA strains (USA300, USA400), indicating that other toxins do not substitute for the lack of PVL in ST72. This finding is in accordance with the notion that differential expression of widespread virulence determinants, rather than the acquisition of additional virulence factors on mobile genetic elements, such as PVL, is responsible for the increased virulence of CA- compared to hospital-associated MRSA.


Introduction
Staphylococcus aureus is a dangerous human pathogen, and S. aureus infections are among the most frequent causes of death in hospitals around the globe [1]. Antibiotic resistance severely complicates the treatment of such infections [2]. After the worldwide spread of penicillin-resistant strains in the middle of the last century, methicillin became the treatment option of choice for S. aureus infections. However, methicillin resistance developed quickly, and nowadays methicillin-resistant S. aureus (MRSA) is pandemic, with many countries reporting methicillin resistance rates among hospital-associated S. aureus isolates that exceed 50% [3].
In the 1990s, MRSA infections, previously limited to predisposed patients in hospitals, started occurring in otherwise healthy people in the community without connections to the hospital setting [4]. These community-associated (CA-) MRSA infections are on a worldwide surge, with the United States so far seeing the most pronounced CA-MRSA epidemic. Most CA-MRSA infections are moderately severe infections of the skin and soft tissues, but more severe and sometimes fatal infections, such as necrotizing pneumonia, are also seen with CA-MRSA. The rise of CA-MRSA is due to the development of strains that combine methicillin resistance with a high level of virulence not commonly present in hospital-associated (HA-) MRSA [5]. Globally, CA-MRSA infections are caused by different lineages that are not genetically related [6]. In addition to pronounced virulence, which they all share, some strains may express specific additional factors that further promote pathogenic success. For example, the epidemic U.S. strain, USA300, harbors a mobile genetic element (MGE), called the arginine catabolic mobile element (ACME), containing a gene, speG, which abrogates the unique hypersensitivity of S. aureus to host-produced polyamines, thereby increasing survival on human skin and in skin abscesses [7].
Soon after the first cases of CA-MRSA infections, researchers started determining the genetic composition of CA-MRSA isolates, including by whole-genome sequencing [8,9]. The most important initial finding was that genes encoding a specific leukotoxin, the Panton-Valentine leukocidin (PVL), were present in virtually all CA-MRSA isolates, while these genes, called lukS and lukF, are much less frequent among HA-MRSA [10]. However, further investigation using animal infection models indicated that the extraordinary virulence of CA-MRSA is due to PVL only in part, and only in comparatively rare types of infection such as severe lung infection [5,11,12]. Rather, high expression of core genome-encoded virulence determinants, such as phenol-soluble modulins (PSMs) and α-toxin, appears to have played a preeminent role in the evolution of CA-MRSA virulence, especially as it relates to skin infections [12,13,14].
In addition to animal experiments casting doubt on the key role of PVL and the acquisition of the prophage ΦSLT containing the lukSF genes during the evolution of CA-MRSA virulence, several CA-MRSA strains have been isolated in the meantime that do not harbor lukSF and therefore do not produce PVL [5]. One such strain is the CA-MRSA clone of sequence type (ST) 72, which is the leading cause of CA-MRSA infections in Korea [15]. Here, to gain insight into the genetic composition as a basis for the extraordinary, PVL-independent virulence of that strain, we determined the whole genome sequence of a CA-MRSA ST72 isolate and analyzed its composition with a focus on virulence factors.

DNA extraction
Total bacterial DNA was isolated from an overnight culture of HL1 (ST72) using lysostaphin digestion and the method of Marmur [16]. The pellet of a 1-ml overnight culture was resuspended with 400 µl buffer P1 (Qiagen), to which 20 µl lysostaphin were added. The sample was incubated at 37°C for

Genome sequencing, annotation and comparative analysis
DNA sequencing was performed using fragment and paired-end libraries on a Roche 454 FLX genome sequencer using Titanium chemistry (454 Life Sciences [a Roche company], Branford, CT). Coverage of 40-60× or higher was obtained according to the manufacturer's recommendations. Reads were assembled using the GS Assembler Version 2.5 software program. The 454 reads were re-aligned to the contigs to check for assembly accuracy, and misassembled portions were corrected. All gaps between contigs were closed by oligonucleotide primer design, PCR fragment generation, and Sanger sequencing of the PCR products on an Applied Biosystems 3730XL DNA sequencer (Applied Biosystems, Foster City, CA). Primer walking of large gaps and correction of ambiguous base calls were performed by PCR and Sanger sequencing. Open reading frame (ORF) calling was performed using public and proprietary algorithms, with a minimum length cutoff of 40 amino acids, as previously described [17,18]. The genome sequence and annotation of HL1 (ST72) are deposited in DDBJ/EMBL/GenBank under the NCBI accession numbers CP003979 (chromosome), CP003980 (pHL1), and CP003981 (pHL2). ORFs displaying evidence of frameshifts or mutations leading to premature stop codons were identified by proprietary algorithms and were manually verified. Genome comparisons were performed using ClustalW alignments.
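To make the 40-amino-acid minimum length cutoff concrete, the following is a minimal Python sketch of a naive six-frame ORF scan. It is not the public or proprietary ORF-calling software used in this study; the start codon set, the function names, and the coordinate convention are assumptions made only for illustration.

```python
# Naive six-frame ORF scan (illustrative only; NOT the ORF caller used in the study).
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}   # assumed set of bacterial start codons
MIN_AA = 40                      # minimum length cutoff stated in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _orfs_on_strand(seq: str, min_aa: int):
    """Yield (start, end) pairs, 0-based half-open, end includes the stop codon."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3        # codons before the stop
                if n_aa >= min_aa:
                    yield (start, i + 3)
                start = None                   # look for the next ORF in this frame


def find_orfs(seq: str, min_aa: int = MIN_AA):
    """Return [(strand, start, end)] in forward-strand coordinates."""
    seq = seq.upper()
    hits = [("+", s, e) for s, e in _orfs_on_strand(seq, min_aa)]
    n = len(seq)
    for s, e in _orfs_on_strand(reverse_complement(seq), min_aa):
        hits.append(("-", n - e, n - s))       # map back to forward coordinates
    return hits
```

Real ORF callers additionally score coding potential and ribosome binding sites; this sketch only shows how a length cutoff filters candidate ORFs.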

Genome analysis
To compare genome contents, we used a proprietary algorithm (Integrated Genomics) and compared every ORF from the HL1 genome with the ORFs from the two selected S. aureus genomes (FPR3757, MW2). Each ORF was compared using three metrics: similarity score (P-score cutoff of 1e-10), functional annotation, and protein length. For an ORF to be considered for inclusion, the following criteria had to be satisfied. First, the ORF had to have a P-score similarity to an ORF from the HL1 genome of 1e-10 or less. Second, the ORF had to either have the same functional annotation as the corresponding HL1 ORF or match at least 80% of the protein length of the corresponding HL1 ORF. Further genome analyses were performed using ERGO (Integrated Genomics).
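The inclusion criteria above can be sketched as a simple predicate. This is a hedged illustration, not the proprietary Integrated Genomics algorithm: the `OrfHit` record, its field names, and the interpretation of "80% matching protein length" as the ratio of the shorter to the longer protein are assumptions.

```python
from dataclasses import dataclass

P_SCORE_CUTOFF = 1e-10        # similarity threshold stated in the text
MIN_LENGTH_FRACTION = 0.80    # length-match threshold stated in the text


@dataclass(frozen=True)
class OrfHit:
    """Hypothetical record for one HL1 ORF paired with a comparison-genome ORF."""
    p_score: float
    hl1_annotation: str
    other_annotation: str
    hl1_length_aa: int
    other_length_aa: int


def length_match_fraction(hit: OrfHit) -> float:
    """Assumed definition: shorter protein length divided by longer one."""
    longer = max(hit.hl1_length_aa, hit.other_length_aa)
    if longer == 0:
        return 0.0
    return min(hit.hl1_length_aa, hit.other_length_aa) / longer


def is_shared_orf(hit: OrfHit) -> bool:
    """Apply the two-step criteria: P-score first, then annotation OR length."""
    if hit.p_score > P_SCORE_CUTOFF:
        return False
    same_annotation = (
        hit.hl1_annotation.strip().lower() == hit.other_annotation.strip().lower()
    )
    return same_annotation or length_match_fraction(hit) >= MIN_LENGTH_FRACTION
```

Under these assumptions, a hit with a P-score of 1e-20 and differing annotations is accepted only if the two proteins are within 80% of each other in length.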

Overview
The CA-MRSA isolate HL1, previously also termed CN1 [13], was obtained from the pus of a necrotizing fasciitis infection in an 80-year-old patient in the Seoul area of South Korea [15]. The isolate was then determined to be PVL-negative, to be resistant to clindamycin and erythromycin, to belong to ST72 and spa type t324, and to harbor SCCmec type IVa. Furthermore, we previously showed that it has high virulence in a rabbit model of skin infection and promotes neutrophil lysis to an extent almost as pronounced as seen with strain USA300 and within the range of other global CA-MRSA isolates [13].
The isolate has a genome of 2,757,070 base pairs (bp) with 2,726 assigned ORFs, of which 1,970 have assigned functions, as well as 53 tRNAs and 9 rRNAs. The overall GC content is 32.79%. The HL1 genome is thus somewhat shorter than those of the prominent CA-MRSA strains USA300 (strain FPR3757, 2,917,469 bp, 2,672 ORFs) and USA400 (strain MW2, 2,820,462 bp, 2,644 ORFs) [4,8], which serve as comparison strains in this study. HL1 also harbors two plasmids, which we named pHL1 and pHL2, of 3,332 and 2,472 bp, respectively.
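For readers who wish to reproduce summary statistics such as the GC content from the deposited sequence, a minimal Python sketch is given below. The function name is hypothetical, and excluding ambiguous bases (such as N) from the denominator is an assumed convention that may differ from the one used for the value reported here.

```python
def gc_content(seq: str) -> float:
    """Return the GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted
```

Applied to the chromosome sequence deposited under accession CP003979, a function of this kind would be expected to return a value close to the reported GC content, subject to the convention used for ambiguous bases.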

Virulence factors and pathogenicity islands
HL1 harbors the vSaα, vSaβ, and vSaγ genomic islands and the ΦSa3 prophage (Tab. 1). The vSaα, vSaβ, and vSaγ genomic islands occur in most S. aureus strains and harbor a series of virulence factors. For example, vSaα encodes a series of exotoxins and a restriction/modification system. vSaβ encodes another restriction/modification system, four serine proteases, and the bicomponent leukocidin LukDE. vSaγ encodes exotoxins, fibrinogen-binding proteins, a formyl peptide receptor 1 inhibitory protein, α-toxin, and the phenol-soluble modulins (PSMs) PSMβ1 and PSMβ2. The ΦSa3 prophage is not always present in S. aureus; for example, it is absent from the HA-MRSA strain COL, representing the archaic MRSA lineage. It contains genes encoding immune evasion factors, namely the chemotaxis inhibitory protein CHIPS, the complement inhibitor SCIN, and staphylokinase. Of note, ΦSa3 splits the gene encoding the sphingomyelinase β-toxin into two non-functional parts.
These islands and the ΦSa3 prophage are also present in the USA300 and USA400 CA-MRSA strains. Although the overall genetic composition differs, there are no considerable differences in known virulence determinants (Figs. 1, 2). As a notable exception, HL1 vSaβ does not contain the bsa operon coding for the biosynthesis of the epidermin-like lantibiotic aureodermin. It has been shown that this operon encodes a functional lantibiotic [19]. However, production levels are very low under all conditions tested so far, and the role of aureodermin in S. aureus physiology is unclear [20]. Importantly, HL1 does not contain virulence factors encoded on genomic islands or prophages that are absent from USA300 and USA400. Conversely, both USA300 and USA400 contain the ΦSa2 (ΦSLT) prophage harboring the lukSF genes coding for PVL. USA300 and USA400 also contain the vSa3 pathogenicity island, and USA400 additionally contains vSa4; both islands are absent from HL1. vSa3 contains two enterotoxin genes with unknown function in virulence, and vSa4 does not comprise known virulence factors. Finally, USA300 contains ACME, harboring the recently described virulence and colonization factor speG [21].
A genome-wide analysis of 103 virulence genes, performed to determine whether virulence factors present in HL1 are also present in USA300 and USA400, further confirmed that the overall composition of virulence genes in HL1, USA300, and USA400 is almost identical (Tab. 2). USA300 is missing two [...] 1); and USA400 lacks the chemotaxis-inhibiting protein (CHIPS, gene chs), present in HL1 and USA300 on the ΦSa3 prophage (Fig. 2). The gene coding for the staphylococcal complement inhibitor SCIN is truncated in HL1, but not in USA300 or USA400. The gene crtN, involved in the biosynthesis of the carotenoid immune evasion factor staphyloxanthin, is annotated as truncated in HL1, because it appears to be split into a very short gene and a larger gene. However, the larger gene likely codes for the functional, previously described CrtN protein [22], while the shorter ORF may be a pseudogene. Similarly, HL1 and USA400 have a split coagulase gene, whereas it is not split in USA300. Finally, 28 surface proteins, identified by the sortase substrate LPXTG motif [23], did not show differences in composition between the three analyzed strains (Tab. 3).
PSMs, which are short, amphipathic, α-helical peptides, have recently been recognized as key determinants of CA-MRSA virulence [24]. They are produced by all S. aureus strains, except naturally occurring Agr-defective mutants, in which no PSMs are detectable. HL1, USA300, and USA400 produce comparably high amounts of PSMs, while HA-MRSA often lack pronounced production of PSMs owing to low activity or mutation of the Agr system [13]. Accordingly, all three strains harbor the genetic loci encoding the PSMα peptides PSMα1 through PSMα4, PSMβ1 and PSMβ2, and the δ-toxin, which is encoded within RNAIII of the Agr quorum-sensing virulence regulator. Notably, the psmα genes are often not annotated in S. aureus genomes owing to their short length, but we annotated them in the HL1 genome and ascertained their presence in USA300 and USA400.
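The LPXTG-based identification of surface proteins mentioned above can be illustrated with a simple pattern search. This is a minimal Python sketch and not the procedure used for Tab. 3: the function name and the optional C-terminal window are assumptions, and a real analysis would also check for the hydrophobic domain and positively charged tail that follow the motif in sortase substrates.

```python
import re

# L-P-any residue-T-G, the sortase substrate motif named in the text.
LPXTG = re.compile(r"LP[A-Z]TG")


def lpxtg_positions(protein: str, c_terminal_window=None):
    """Return 0-based start positions of LPXTG motifs in a protein sequence.

    If c_terminal_window is given (hypothetical parameter), only the last
    c_terminal_window residues are searched, since the sorting signal sits
    near the C-terminus.
    """
    seq = protein.upper()
    offset = 0
    if c_terminal_window is not None:
        offset = max(0, len(seq) - c_terminal_window)
    return [offset + m.start() for m in LPXTG.finditer(seq[offset:])]
```

Running such a search over all predicted proteins of each strain and comparing the resulting sets is one simple way to check whether the surface protein repertoire differs between genomes.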
Altogether, these findings are in good accordance with the comparable virulence of HL1 and USA400 in the rabbit model of skin infection that we performed previously [13]. The slightly higher virulence of USA300 in that model may be due to ACME-encoded speG, which a recent report indicates promotes virulence during skin infection [7]. Furthermore, they are in line with previous epidemiological studies and infection models performed in the PVL-sensitive rabbit indicating that PVL, as the only major virulence determinant that is absent from HL1 and present in USA300 and USA400, does not have a significant impact on virulence during CA-MRSA skin infection [12,13,25].

SCCmec
The comparison of the HL1 genome with those of USA300 and USA400 revealed pronounced differences, encompassing a series of genes, in only a very limited number of locations. The strongest difference was seen in and surrounding the SCCmec IVa element (Fig. 3). As previously described by Park et al., the class B mec cassette of the ST72 SCCmec IVa element contains a tnp20 IS element and a pUB110 region that are not present in the SCCmec IVa element of USA300 and USA400 [26]. The tnp20 IS element is also found at another location in the genome of USA300, but not USA400. The pUB110 region comprises genes involved in kanamycin and bleomycin resistance, in addition to plasmid replication and recombination enzymes. In the left extremity (L-C) region of HL1 SCCmec, there are four genes with high similarity to genes found in the S. epidermidis ATCC12228 genome, indicating that they may have been acquired from S. epidermidis, similar to SCCmec IV in general, which is believed to have originated from S. epidermidis at an earlier time [27]. One of the genes in this region has a homologue at another location in the USA400 genome, but not in USA300, while the others lack homologues in those strains. The tnp20 IS element and the USA400 homologue are in the vicinity of the nifR3 gene, encoding a putative nitrogen regulatory protein, in the respective USA300 and USA400 genomes, while the nifR3 gene is found close to the J1 region of HL1 SCCmec, indicating that recombination between these two genomic sites occurred.

Concluding remarks
Our analysis of a CA-MRSA genome of a previously unanalyzed ST advances our understanding of the CA-MRSA pandemic, especially as it represents the first sequenced genome of a PVL-negative CA-MRSA strain. It further strengthens the hypothesis that the success of CA-MRSA as pathogens is multi-factorial rather than dependent on the acquisition of specific, CA-MRSA-characteristic virulence determinants. In particular, we did not find virulence determinants in ST72, such as other leukotoxins, that could apparently substitute for the absence of PVL. While further detailed functional and gene expression analyses will be necessary, these findings suggest that the virulence of ST72 CA-MRSA is mainly dependent, as previously suggested for CA-MRSA in general [5,14], on gene regulatory adaptations enhancing the expression of core genome-encoded virulence determinants.