Identification of Gene Clusters Associated with Host Adaptation and Antibiotic Resistance in Chinese Staphylococcus aureus Isolates by Microarray-Based Comparative Genomics

A comparative genomic microarray comprising 2,457 genes from two whole genomes of S. aureus was used for comparative genome hybridization analysis of 50 strains of divergent clonal lineages, including methicillin-resistant S. aureus (MRSA), methicillin-susceptible S. aureus (MSSA), and swine strains in China. The results were validated on a large scale by polymerase chain reaction in 160 representative clinical strains. All 50 strains were clustered into seven complexes by phylogenetic tree analysis. Thirteen gene clusters were specific to different S. aureus clones. Ten gene clusters, including seven known clusters (located in vSa3, vSa4, vSaα, vSaβ, Tn5801, and phage ϕSa3) and three novel clusters (C8, C9, and C10), were host-specific. Notably, two global regulators, sarH2 and sarH3, at cluster C9 were specific to human MRSA, and plasmid pUB110 at cluster C10 was specific to swine MRSA. Three clusters known to be part of SCCmec, vSa4 or Tn5801, and vSaα, as well as one novel gene cluster, C12, with homology to Tn554 of S. epidermidis, were identified as MRSA-specific gene clusters. The replacement of ST239-spa t037 with ST239-spa t030 in Beijing may be a result of its acquisition of vSa4, phage ϕSa1, and phage ϕSa3. In summary, thirteen critical gene clusters were identified as contributors to the evolution of host specificity and antibiotic resistance in Chinese S. aureus.


Introduction
Staphylococcus aureus is an opportunistic pathogen and the major causative agent of numerous hospital- and community-acquired infections in humans. It is also a common causative agent of animal infections. The major MRSA clones that cause infectious diseases worldwide are reported to belong to only a few pandemic lineages. In China, the most common human MRSA lineages belong to ST239 and ST5 [1]. Meanwhile, ST9 was identified as the dominant swine MRSA lineage in China [2]. S. aureus contains many types of genomic islands including plasmids, transposons (Tn), insertion sequences (IS), bacteriophages, pathogenicity islands, and staphylococcal cassette chromosomes. These elements play a central role in the pathogen's adaptation to different stresses, and are a means of transferring genetic information among and within bacterial species [3]. Each S. aureus lineage carries a unique combination of genomic islands. In the genome of S. aureus Mu50, nine genomic islands have been identified, including vSa3, vSa4, vSaα, vSaβ, vSaγ, SCCmec, phage ϕSa1, phage ϕSa3, and Tn5801 [4]. The carriage of genomic islands in S. aureus can alter the pathogenic and resistance potential of the strains. The dissemination of particular clones in a specific environment or host in favor of other strains, or the replacement of clones in a single environment, suggests a genetic basis for epidemics related to genomic islands. This has fuelled efforts to identify novel genomic islands associated with the evolution of antibiotic resistance and host adaptation in Chinese S. aureus.
Comparative genome hybridization (CGH) is an efficient method to identify critical gene clusters. When applied to pathogenic S. aureus, CGH unveils the variability in terms of gene content in regions related to pathogenicity and gives new insights into the evolutionary aspects of S. aureus. The high discriminatory power of this technique has been used to distinguish major MRSA lineages, community-associated MRSA strains, and predominant S. aureus lineages [5,6,7,8].
This study aimed to compare the genetic repertoire of different S. aureus clones through microarray-based comparative genomics to identify gene clusters that may explain three open questions in the evolution of S. aureus: (i) Many articles reported that human MRSA may originate in animals [9], but host-specific genes or gene clusters were rarely found. (ii) MSSA showed more diverse patterns compared with the relative preponderance of a few MRSA clones. (iii) ST239 and ST5 were the most predominant MRSA clones in China [1]. From 1994 to 2000 in Beijing, ST239-spa t030 rapidly replaced t037 and became the major MRSA clone [10]. In this study, we identified 13 gene clusters in the S. aureus genome associated with the evolution of antibiotic resistance and host specificity using a CGH microarray. The gene clusters were confirmed by large-scale validation via polymerase chain reaction (PCR) in 160 clinical strains. Among these clusters, several critical genes and four novel gene clusters related to the evolution of resistance and host specificity in Chinese S. aureus have not been reported previously.

Overall Genome Diversity in S. aureus
The microarray comprised all the genetic information found in only two S. aureus genomes, Mu50 and CN79. CGH microarray analysis revealed extensive genome diversity within the S. aureus species. Within the 2,457 genes present on the S. aureus microarray, all of the 50 strains shared 1,738 genes (70.7%) and 719 (29.3%) genes were absent in at least one strain. An average of 260 (10.6%) genes were absent per strain compared to the genes present on the microarray.
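The shared and variable gene counts above follow directly from a gene-by-strain presence/absence matrix. The sketch below illustrates that bookkeeping on toy data; the function name `summarize_gene_content`, the input layout (a dict mapping each gene to a list of 0/1 calls), and the toy gene names are assumptions made for illustration, not part of the study's pipeline.

```python
def summarize_gene_content(presence):
    """Summarize a presence/absence matrix.

    presence: dict mapping gene name -> list of 0/1 calls, one per strain,
    with strains in the same order for every gene (hypothetical layout).
    """
    genes = list(presence)
    n_strains = len(presence[genes[0]])
    # Genes called present in every strain.
    core = [g for g in genes if all(presence[g])]
    # Genes called absent in at least one strain.
    variable = [g for g in genes if not all(presence[g])]
    # Number of array genes called absent in each strain.
    absent_per_strain = [
        sum(1 for g in genes if presence[g][s] == 0) for s in range(n_strains)
    ]
    return {
        "core": core,
        "variable": variable,
        "core_fraction": len(core) / len(genes),
        "mean_absent_per_strain": sum(absent_per_strain) / n_strains,
    }


# Toy data: four genes scored in three strains (not data from the study).
toy = {
    "geneA": [1, 1, 1],
    "geneB": [1, 0, 1],
    "geneC": [0, 0, 1],
    "geneD": [1, 1, 1],
}
summary = summarize_gene_content(toy)

# The same arithmetic applied to the counts reported in the text
# (2,457 genes on the array).
shared_pct = round(100 * 1738 / 2457, 1)
variable_pct = round(100 * 719 / 2457, 1)
```

On the reported counts, this arithmetic reproduces the stated 70.7% shared and 29.3% variable genes.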
Cluster analysis indicated that the 50 strains were clustered into seven complexes (Fig. 1). Strains in the same complex shared similar backgrounds, such as isolation time, location, host species, and lineage, whereas different complexes represented different backgrounds. Complex 1 included 11 MRSA (ST239-spa t037) isolated in Beijing before 2000. Complex 2 included 12 MRSA (ST239-spa t030) isolated from 2000 to 2006. Complex 3 included 7 MRSA, mostly ST5. All of the strains in Complex 4 were MSSA. Complex 5 included 3 ST59-MRSA and 2 ST59-MSSA. Swine MRSA were clustered in Complex 6. The Australian strains were scattered, and 3 out of 6 were in Complex 7.

Comparative Genomics of Human and Swine S. aureus Strains
CGH microarray was used to study human- and swine-derived MRSA at the genomic level. A total of 1,851 genes were present in both human and swine strains. A total of 102 genes were associated with host specificity, being found specifically in human or swine MRSA (Fig. 2). Among these genes, 96 were present in more than 80% of human MRSA, while 6 were present in all swine MRSA. Host-specific genes comprised 56 pathogenicity island genes (3 in vSa3, 5 in vSa4, 2 in Tn5801, 6 in vSaα, 10 in vSaβ, 2 in vSaγ, and 28 in phage ϕSa3), 10 phage-related genes, 4 resistance-related genes (fmhC, mecR1, mecI, and lytN) [11], 2 global regulators (sarH2 and sarH3), 5 transposases, 2 helicases, and 24 hypothetical proteins.
Among these genes, several contiguous genes formed 10 clusters (i.e. more than three contiguous genes) (Table 1 and Fig. 3). Seven clusters belonged to the known genomic islands vSa3, vSa4, vSaα, vSaβ, Tn5801, and phage ϕSa3. The human-specific genomic island phage ϕSa3 contained immune evasion complex genes, including the gene that encodes staphylokinase (sak). This prophage, integrated into the β-hemolysin locus, has been found in most isolates infecting humans but not animals [12]. The human-specific genomic island vSaβ included six virulence genes, namely, splA, splB, splC, splD, splF, and lukD, which may enhance the virulence of MRSA and facilitate human infection. In addition, the type I restriction-modification (R-M) system hsdS genes (SAV0432 and SAV1807) were identified among the human-specific genes, consistent with a role in regulating horizontal gene transfer. Three gene clusters (C8, C9, and C10) were distinct from any known genomic islands. Cluster C8 (SAV1312-SAV1314) contained three genes of unknown function. Cluster C9 (SAV2481-SAV2499) carried two global regulators, sarH2 and sarH3, indicating a potential regulatory function in host specificity. Swine-specific cluster C10 (SAV0028-SAV0035), belonging to plasmid pUB110, contained the resistance gene aadD.
The representative genes of the 10 clusters associated with host specificity were further analyzed in 76 human MRSA and 20 swine MRSA strains. The presence of the representative genes differed significantly between human and swine MRSA strains (Table 2, P<0.005). Interestingly, not all human MRSA carried the human-specific gene clusters identified by our CGH microarray. ST59 MRSA did not carry human-specific clusters C3, C4, and C6, and showed a pattern similar to that of swine MRSA.
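The host-specificity screen described in this section can be sketched as a frequency filter over presence/absence calls. The text states the thresholds on the carrying host (more than 80% of human MRSA, or all swine MRSA) but not the cutoff applied to the other host, so the `max_other` parameter below (gene absent from every strain of the other host) is an assumption, as are the function name, the input layout, and the toy data.

```python
def host_specific_genes(presence, hosts, min_human=0.8, max_other=0.0):
    """Split genes into human-specific and swine-specific candidates.

    presence: dict gene -> list of 0/1 calls, one per strain (hypothetical layout).
    hosts: list of "human" / "swine" labels, in the same strain order.
    max_other: highest frequency tolerated in the other host (assumed cutoff).
    """
    human_idx = [i for i, h in enumerate(hosts) if h == "human"]
    swine_idx = [i for i, h in enumerate(hosts) if h == "swine"]
    result = {"human": [], "swine": []}
    for gene, calls in presence.items():
        human_freq = sum(calls[i] for i in human_idx) / len(human_idx)
        swine_freq = sum(calls[i] for i in swine_idx) / len(swine_idx)
        if human_freq > min_human and swine_freq <= max_other:
            # Present in more than 80% of human strains, absent in swine.
            result["human"].append(gene)
        elif swine_freq == 1.0 and human_freq <= max_other:
            # Present in all swine strains, absent in human strains.
            result["swine"].append(gene)
    return result


# Toy data: five human and two swine strains (not data from the study).
hosts = ["human"] * 5 + ["swine"] * 2
toy = {
    "gene_h": [1, 1, 1, 1, 1, 0, 0],        # human-specific
    "gene_s": [0, 0, 0, 0, 0, 1, 1],        # swine-specific
    "gene_shared": [1, 1, 1, 1, 1, 1, 1],   # present in both hosts
    "gene_partial": [1, 1, 1, 1, 0, 0, 0],  # exactly 80% of human strains
}
specific = host_specific_genes(toy, hosts)
```

Because the human threshold is strict (more than 80%), `gene_partial` is not reported as human-specific in this toy example.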

Comparative Genomics of MRSA and MSSA Strains
MSSA showed more diverse patterns compared with the relative preponderance of a few MRSA clones. The differences between 6 MSSA and 18 MRSA isolated in China were analyzed, and 75 genes were found to be more frequently present in MRSA. Among these genes, 38 are located in known genomic islands, including SCCmec, vSa4, vSaα, and Tn5801 (Table 1 and Fig. 3), which contained several resistance genes such as mecA, ermA, tetM, and ant(9). We also identified a novel MRSA-specific gene cluster, C12, which contained 2 resistance genes [ermA and ant(9)], 3 transposase genes, and 1 gene of unknown function. Sequence alignment indicated that C12 had homology with Tn554 of S. epidermidis and may represent a novel resistance island.
To determine whether the resistance genes in MRSA isolates differed between lineages, 10 resistance genes were compared across all of the MRSA strains analyzed by microarray (Table 3). The mecA, aphA, dfrA, isaA, msrA, and msrB genes were detected in all of the analyzed isolates. The ermA and tetM genes were absent in human ST59, swine ST9, and several ST5 strains. The ant(9) gene was detected in various lineages, namely ST239 (n = 20), ST59 (n = 2), and swine ST9 (n = 1), but was absent in all of the ST5 strains. The aadD gene was present in all of the swine ST9 strains and a single ST5 isolate, but absent in the ST239 and ST59 strains. In summary, the ST239 and ST5 MRSA isolates display considerable antimicrobial resistance genotype diversity, with ST239 and ST5 being the most predominant clones in China [1].
To further validate these gene clusters, 48 clinical MRSA and 48 MSSA strains were analyzed by detecting representative genes in these clusters. PCR validation showed that the four representative genes of these clusters were detected in most MRSA strains but were absent in most MSSA strains (Table 4, P<0.0001).

Comparative Genomics of Predominant MRSA Clones in China
From 1994 to 2000 in Beijing, the most predominant MRSA clone was ST239-spa t037. Since 2000, ST239-spa t030 has rapidly replaced t037 and has become the major clone [10]. A comparison of the genomic information of 11 spa t037 and 5 spa t030 strains showed that 309 genes were variable among the 16 strains. Ninety-eight genes were more frequent in spa t030, and 2 genes (SAV0059 and SAV0864) were more frequent in spa t037. These genes included 54 pathogenicity island genes (10 in vSa4, 8 in vSa3, 4 in vSaα, and 30 in phage ϕSa3), 12 phage-related genes, and 1 transcription-related gene. Among the ST239-spa t030-specific genes, four gene clusters (Table 1) may contribute to the MRSA evolution from ST239-spa t037 to ST239-spa t030 (Fig. 3). These gene clusters belonged to the previously characterized genomic islands vSa4, phage ϕSa1, and phage ϕSa3. Notably, phage ϕSa3 was unique to ST239-spa t030 MRSA and carried two toxin genes, sak (staphylokinase) and sep (enterotoxin P), which may contribute to its increased virulence and epidemic spread [13]. In addition, most variable genes found in these islands have unknown functions.
After ST239, ST5 was the second most predominant clone in China. We analyzed the presence of antibiotic resistance clusters via large-scale PCR validation in 43 ST239 or ST5 MRSA strains and 5 ST59 MRSA strains. More than 60% of the strains of the predominant clones ST239 and ST5 carried these antibiotic resistance clusters, but none of the ST59 strains did. The carriage of multiple antibiotic resistance gene clusters probably enhanced the adaptability and competitiveness of ST239 and ST5 and contributed to their prevalence in China.

Discussion
Extensive genetic variations were identified among 50 strains representing the major dominant lineages of S. aureus from humans or swine in China by microarray-based comparative genomics. Within the 2,457 genes present on the S. aureus microarray, 1,738 genes (70.7%) were present in all of the S. aureus strains studied, suggesting that these genes are essential for S. aureus maintenance. Conversely, 29.3% of the genes were absent in at least one strain. Some of these genes were located in genomic islands that facilitate the colonization of specific hosts or antibiotic resistance.
The carriage of genomic islands in S. aureus can alter the pathogenic and resistance potential of strains [3]. Overall, each S. aureus lineage carried a unique combination of genomic islands. Genomic comparison of the different complexes revealed 13 gene clusters (Table 1). Among these clusters, vSa3, vSa4, vSaα, vSaβ, phage ϕSa1, phage ϕSa3, SCCmec, and Tn5801 have been identified previously [4]. These genomic islands carry approximately one-half of the S. aureus toxins or virulence factors, and the variation of these genes contributes to the pathogenic potential of this species [14]. Meanwhile, four novel gene clusters that had not been reported before were revealed.
Previous studies found that phage ϕSa3 was more common in human isolates than in animal isolates [6]. Phage ϕSa3 encodes scin, chip, and/or sak, which are involved in host immune evasion and have been shown to interact specifically with the human immune system [15]. In our research, genomic islands vSa3, vSa4, vSaα, and vSaβ, as well as two novel gene clusters (C8 and C9), were also associated with human specificity [16]. In particular, the type I R-M system gene hsdS was located at vSaα and vSaβ, and the global regulators sarH2 and sarH3 were located at C9. SarH2, also known as sarU, is a sarA homolog that is repressed by sarH3 (also known as sarT) and regulates virulence genes in S. aureus [17]. The two global regulators possibly enhance the regulatory efficiency of MRSA in human infection. Further investigation of these regulators is necessary.
SCCmec, Tn5801, vSaα, vSa4, and a novel gene cluster were more frequently present in MRSA than in MSSA. These gene clusters contained abundant resistance genes [mecA, tetM, ermA, and ant(9)] that increased the virulence and resistance of MRSA [18]. The novel resistance-associated gene cluster C12 was similar to Tn554 of S. epidermidis by sequence alignment, suggesting that it may have been transferred from S. epidermidis. Tn554, which contains the ermA gene, is associated with macrolide-lincosamide-streptogramin B resistance [19].
ST239 and ST5 were the most predominant MRSA clones in China. From 1994 to 2008 in Beijing, ST239-spa t030 rapidly replaced t037 and became the major MRSA clone [10]. In this study, vSa4, phage ϕSa1, and phage ϕSa3 were found to be unique to ST239-spa t030 and carried two toxin genes, sak and sep, that may contribute to its increased virulence and rapid replacement of ST239-spa t037 [13]. Meanwhile, large-scale validation indicated that the two major epidemic clones, ST239 and ST5 MRSA, display considerable antimicrobial resistance genotype diversity that contributes to their prevalence in China.
Comparative analysis of S. aureus suggested variations in the evolutionary history of genomic islands [20]. The movement of these genomic islands may enable S. aureus to evolve and grow through the acquisition of virulence and resistance genes. Clearly, horizontal gene transfer has played a fundamental role in the evolution of pathogenic S. aureus, particularly by the assortative recombination of genomic islands containing virulence and antibiotic resistance genes. Several genomic islands were distributed within certain lineages at a higher frequency than in others, suggesting some barriers to the successful horizontal transfer of genomic islands. A general barrier to horizontal gene transfer in S. aureus is the R-M system [21]. The role of the R-M system is to prevent the uptake of potentially harmful or lethal DNA, such as that of a bacteriophage that lyses and kills bacteria, or to prevent the acquisition of superfluous genes that may compromise fitness due to the metabolic demand associated with their expression [22]. In our study, the type I R-M system gene hsdS varied significantly between the strains of different complexes. Therefore, differences in horizontal gene transfer ability among strains may have contributed to the epidemic spread of specific clones.
CGH is an efficient method to identify novel genomic islands. However, the microarray comprised only the genetic information found in two S. aureus genomes, Mu50 and CN79. Consequently, if isolates with specific biological characteristics are analyzed with this microarray, genetic information absent from these two genomes will most likely not be detected. The only way to exclude this problem entirely would be to sequence all of the strains, which would also provide detailed information on the genomic islands identified in this study. In brief, our study provided an overview of the genome diversity present in S. aureus in China, an important human and animal pathogen worldwide. The microarray-based comparative genomic analysis clarified the roles of known genomic islands and four novel gene clusters in the evolution of antibiotic resistance and host adaptation in Chinese S. aureus. Further investigation of the gene cluster functions is necessary.

Microarray Design
A total of 2,457 open reading frames (genes) were amplified from the whole-genome-sequenced S. aureus strains Mu50 and CN79 by PCR using gene-specific primers [23]. S. aureus CN79, isolated from blood, was determined to be heterogeneous vancomycin-intermediate S. aureus by the population analysis profile (PAP)-area under the curve (AUC) method; it belonged to the Chinese predominant clone ST239-spa t030 MRSA [24]. The purified PCR products were spotted in duplicate on CSS-1000 silylated glass slides (CEL) using a SpotArray72 microarray printing system (Perkin-Elmer Life Sciences, Massachusetts, USA) to construct the DNA microarrays.

Microarray Labeling, Hybridizations, and Scanning
Genomic DNAs were extracted using the conventional sodium dodecyl sulfate lysis and phenol-chloroform extraction method. A mixture of equal quantities of Mu50 and CN79 genomic DNAs was used as reference DNA. Purified PCR products were referred to as the tested DNA. Cy3- or Cy5-labeled probes were generated by priming the reference or test DNA with random hexamers and extension with Klenow polymerase. The labeled reference and test DNAs were combined to hybridize with the microarrays by dual-fluorescence hybridization. The hybridized slides were scanned using a GenePix 4100A personal microarray scanner (Axon Instruments, Foster City, California, USA). The scanning images were processed, and the data were further analyzed using GenePix Pro 5.0 software (Axon Instruments, Foster City, California, USA) combined with Microsoft Excel software.

Microarray Data Analysis
Spots whose median signal intensity in the reference DNA channel was less than twofold the median local background intensity were rejected from further analysis. Spots with bad data because of slide abnormalities were discarded as well. Data normalization was performed on the remaining spots using the total intensity normalization method. A ratio of intensity (test DNA normalized intensity/reference DNA normalized intensity) was recorded for each spot and then converted to log2. Genes with fewer than three data points were considered unreliable and were removed accordingly. The averaged log2 ratio for each remaining gene on the two replicate slides was then calculated. If a gene had missing data in 20% of the strains, the gene was removed. A total of 2,457 genes were included in the final dataset. A log2 value equal to or lower than -1 was used to define the absence of a gene in a given strain. The microarray data have been deposited in the public database ArrayExpress (accession no. A-MEXP-2250).
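The spot filtering and presence/absence calling for a single strain can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the spot records (`gene`, `test`, `ref`, `ref_bg`) and the function name `call_presence` are hypothetical, total intensity normalization is taken to mean rescaling the test channel so that both channels have the same summed intensity, the absence cutoff is taken as a log2 ratio of -1, and the across-strain missing-data filter is omitted. The study itself processed the data with GenePix Pro 5.0 and Microsoft Excel.

```python
import math


def call_presence(spots, min_fold_over_bg=2.0, absent_cutoff=-1.0, min_points=3):
    """Return gene -> 1 (present) or 0 (absent) for one test strain.

    spots: list of dicts with median intensities for one strain, pooled over
    duplicate spots and replicate slides (hypothetical record layout):
    gene, test, ref, ref_bg.
    """
    # Reject spots whose reference signal is below twofold the local background.
    kept = [s for s in spots if s["ref"] >= min_fold_over_bg * s["ref_bg"]]
    # Total intensity normalization (assumed form): rescale the test channel
    # so that the summed test and reference intensities are equal.
    scale = sum(s["ref"] for s in kept) / sum(s["test"] for s in kept)
    log_ratios = {}
    for s in kept:
        ratio = (s["test"] * scale) / s["ref"]
        log_ratios.setdefault(s["gene"], []).append(math.log2(ratio))
    calls = {}
    for gene, values in log_ratios.items():
        if len(values) < min_points:
            continue  # fewer than three data points: treated as unreliable
        mean_log2 = sum(values) / len(values)
        calls[gene] = 0 if mean_log2 <= absent_cutoff else 1
    return calls


def make_spots(gene, test, ref, ref_bg, n):
    """Build n identical toy spot records for one gene."""
    return [{"gene": gene, "test": test, "ref": ref, "ref_bg": ref_bg}] * n


# Toy intensities (not data from the study).
toy_spots = (
    make_spots("present_gene", 1000, 1000, 100, 4)
    + make_spots("absent_gene", 100, 1000, 100, 4)
    + make_spots("low_ref_gene", 500, 150, 100, 4)   # fails the background filter
    + make_spots("sparse_gene", 1000, 1000, 100, 2)  # too few data points
)
calls = call_presence(toy_spots)
```

Genes that fail the background filter or have too few data points are left out of the result rather than called absent, which keeps missing data distinct from true absence.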

Clustering and Phylogenetic Analysis
A final call of absent (0) or present (1) was assigned to each gene for each strain in the CGH data. Hierarchical clustering of the resulting gene profiles across strains was performed with Cluster 3.0 using the uncentered Pearson correlation as the distance metric [25]. The clustered microarray data were displayed with the TreeView tool.
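For readers unfamiliar with the metric, the uncentered Pearson correlation of two profiles is their cosine similarity, and the corresponding distance is one minus that value. The sketch below computes this distance on binary profiles and groups strains by a simple agglomerative procedure. It is not the Cluster 3.0 implementation: the linkage rule is not stated in the text, so average linkage is an assumption, and the function names and toy profiles are illustrative only.

```python
import math


def uncentered_distance(x, y):
    """1 - uncentered Pearson correlation (cosine similarity) of two profiles.

    Each profile must contain at least one non-zero entry.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm


def average_linkage(profiles, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    profiles: dict strain name -> list of 0/1 gene calls.
    Average linkage is an assumed choice, not taken from the study.
    """
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(uncentered_distance(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy presence/absence profiles for four strains (not data from the study).
toy_profiles = {
    "s1": [1, 1, 1, 0, 0],
    "s2": [1, 1, 1, 0, 1],
    "s3": [0, 0, 1, 1, 1],
    "s4": [0, 0, 0, 1, 1],
}
groups = average_linkage(toy_profiles, 2)
```

In the toy example, strains with overlapping gene content (s1 with s2, s3 with s4) end up in the same group, which mirrors how strains of the same lineage fall into the same complex.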

PCR Validation
The representative genes selected by CGH analysis were confirmed via gene-specific PCR. The primers used, listed in Table S1, were the same as those used to generate the PCR products spotted on our microarray. PCR was performed under the following conditions: 94 °C for 5 min, followed by 30 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 60 s, and a final elongation step of 72 °C for 5 min. A 4 µL aliquot of each reaction was run on a 1% agarose gel. A positive reaction was recorded if a single clear band of the correct size was present.

Statistical Analysis
Statistical analysis was carried out using Statistical Package for Social Sciences 14.0 for Windows (SPSS). The χ² test or Fisher's exact test was used to analyze the results. A P value of <0.05 was considered statistically significant.
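As an illustration of the kind of comparison involved, the sketch below computes a two-sided Fisher's exact P value for a 2x2 table of gene carriage by strain group. It is a plain-Python sketch, not the SPSS procedure used in the study; the function name is hypothetical and the counts in the example are invented for demonstration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are strain groups; columns are gene present / gene absent.
    Sums the probabilities of all tables, with the same margins, that are
    no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x positives in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small tolerance so tables tied with the observed one are included.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= observed * (1 + 1e-9)
    )


# Invented counts: gene detected in 70 of 76 strains of one group
# and in 1 of 20 strains of the other.
p_value = fisher_exact_two_sided(70, 6, 1, 19)
significant = p_value < 0.05
```

The exact test is the usual choice when expected cell counts are small, as with the 20 swine strains; for larger, balanced tables the χ² test gives a close approximation.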