Molecular investigations on a chimeric strain of Staphylococcus aureus sequence type 80

A PVL-positive, methicillin-susceptible Staphylococcus aureus was cultured from pus from cervical lymphadenitis of a patient of East-African origin. Microarray hybridisation assigned the isolate to clonal complex (CC) 80 but revealed unusual features, including the presence of the ORF-CM14 enterotoxin homologue and of an ACME-III element as well as the absence of etD and edinB. The isolate was subjected to both Illumina and Nanopore sequencing, allowing characterisation of deviating regions within the strain's genome. Atypical features of this strain were attributable to the presence of two genomic regions that originated from other S. aureus lineages and that comprised, respectively, 3% and 1.4% of the genome. One deviating region extended from walJ to sirB. It comprised ORF-CM14 and the ACME-III element. A homologous but larger fragment was also found in an atypical S. aureus CC1/ST567 strain whose lineage might have served as donor of this genomic region. This region itself is a chimera comprising fragments from CC1 as well as fragments of unknown origin. The other deviating region extended from htsB to ecfA2, i.e., another 1.4% of the genome. It was very similar to CC1 sequences. This suggests either an incorporation of CC1 DNA into the study strain or, alternatively, a recombination event affecting "canonical" CC80. Thus, the study strain bears witness to several recombination events affecting supposedly core genomic genes. Although the exact mechanism is not yet clear, such chimerism seems to be an additional pathway in the evolution of S. aureus. It could also facilitate the transmission of virulence and resistance factors and therefore offer an additional evolutionary advantage.


Introduction
Staphylococcus aureus (S. aureus) is a versatile pathogen that colonises or infects a large fraction of the world's human population as well as several species of animals. It can asymptomatically colonise its carriers, or alternatively cause various infections ranging from superficial skin and soft tissue infections to severe bloodstream infections. Many of its virulence factors are variably present and their genes are localized on mobile genetic elements such as plasmids, phages, transposons or on pathogenicity islands. In recent decades, some strains of S. aureus acquired resistance to many or most antibiotics. Again, resistance genes are localized on mobile, or potentially mobile, genetic elements such as staphylococcal cassette chromosome mec (SCCmec) elements. Despite a vast variety of variable, mobile elements, and despite some incremental, mutation-driven variation, the overall structure of the S. aureus genome is conservative, with all core genomic elements being present in all strains in one uniform sequential arrangement. Multilocus sequence typing (MLST) enables the assignment of isolates to taxonomic units, sequence types (ST) and clonal complexes (CC), based on numbered alleles of seven housekeeping genes. It assumes that these genes cannot be lost or truncated because of their crucial function and that the accumulation of mutations in their sequences is purely a function of time. This led to a model of clonal evolution of the S. aureus core genome that is driven by a time-dependent accumulation of single point mutations. Under this model, a few marker genes suffice to classify isolates into a limited number of clonal complexes, each comprising a number of sequence types that differ only by random mutations in these marker genes as well as in others.
However, there are observations that certain non-mobile markers (among them, alleles of agr genes, genes determining capsule types or MLST markers) occur in non-related lineages [1][2][3][4]. This suggests that multiple recombination events affected at least most lineages that subsequently evolved and expanded clonally [4] and that large-scale recombination events played a role in driving the evolution of S. aureus alongside a frequent exchange of mobile genetic elements [3].
Some S. aureus strains show evidence of large-scale recombination events, with large fragments of their genomes clearly originating from other lineages and being inserted at the appropriate position of the recipient strain. Such a phenomenon was first postulated for ST239, where a CC30 DNA fragment of approximately 635,000 base pairs (ca. 20% of the genome) is integrated into a CC8 recipient with the integration site being localised around oriC [5,6]. Another sequence type, ST2249, harbours ST239 DNA comprising fragments of both CC8 and CC30 that in turn are integrated into a CC45 genome [7]. Further examples of chimeric strains are ST34 and ST42 (where CC10 fragments are integrated into CC30 genomes) [5], CC398 strains that harbour fragments of CC9 origin [8,9], and ST71, which carries a large insert of unknown provenance in a CC97 backbone genome [10,11].
The isolate described herein was initially subjected to microarray hybridisation, primarily for typing and detection of resistance and toxin genes. The procedure revealed unusual features for a CC80 isolate (presence of ORF-CM14 and absence of edinB and etD) that could be explained by a large-scale horizontal gene transfer. This observation prompted further investigations including Illumina and Nanopore sequencing of its entire genome and a search for the donor strain of regions assumed to be introduced by horizontal gene transfer.

Clinical background and isolates
A patient of East African background was admitted to the Dresden University Hospital (Dresden, Saxony, Germany) with a cervical SSTI that initially was suspected to be suppurative tuberculous lymphadenitis. While no mycobacteria were detected (neither immediately by microscopy after staining for acid-fast bacilli nor subsequently in MGIT and Ogawa cultures), culture of pus yielded a PVL-positive, methicillin-susceptible S. aureus (isolate Dresden-275757).
A second isolate (Oerebro-086360) was further characterised because of certain similarities with the study isolate (see below). It was isolated in Oerebro, Sweden, from an approximately 50-year-old female patient with lobar pneumonia, probably secondary to an influenza B infection. She was a Swedish citizen and denied any travel outside Sweden.

Microarray-based typing
The S. aureus isolates were initially characterized using different DNA microarray-based assays. Probes, primers as well as amplification and hybridization protocols have previously been described in detail [38][39][40].

Illumina sequencing
Sequencing of the two strains was performed at two geographically distant facilities and on different dates (Jena, Germany, and Örebro, Sweden, in spring and autumn 2018, respectively), ruling out any possibility of carry-over contamination.
Genomic DNA of Dresden-275757 was prepared using a Qiagen kit (Qiagen, Hilden, Germany) after an enzymatic lysis step with lysostaphin, lysozyme and RNAse as previously described [38][39][40]. Afterwards, whole-genome sequencing was carried out using the Illumina HiSeq2500 genome analyser and the Illumina Experiment Manager 1. ). An average coverage of 139 was achieved. The reads were assembled to contigs using SPAdes. Sequencing of Oerebro-086360 was performed as previously described [41]. DNA was automatically extracted using the QIAsymphony DSP Virus/Pathogen kit (Qiagen, Hilden, Germany) following manufacturer's instructions. Sequencing was done with the Nextera XT kit (Illumina Inc, San Diego, CA, USA) on an Illumina MiSeq. The reads with a coverage of 120 were de novo assembled with version 1.1.04 of Velvet26.

Nanopore sequencing
The Oxford Nanopore MinION platform was used for whole-genome sequencing of Dresden-275757. A detailed protocol is given in the S1 File. Briefly, size selection was performed using 0.5 v/v AMPure XP beads (Beckman Coulter GmbH, Krefeld, Germany) to avoid DNA fragments smaller than 1,000 bp. The DNA library was generated using the native barcoding expansion kit EXP-NBD103 and the sequencing kit SQK-LSK109 (Oxford Nanopore Technologies, Oxford, UK) according to manufacturer's instructions. The flow cell FLO-MIN106 (R9-Version) was primed by the flow cell priming kit EXP-FLP001 (Oxford Nanopore). The protocol "1D Native barcoding genomic DNA" was used in version NBE_9065_v109_-revB_23May2018 (Last update: 03/09/2018). The albacore basecaller (Oxford Nanopore) translated the MinION raw data (FAST5) into quality-tagged sequence reads (FASTQ).

Bioinformatic analysis
Iterated BLAST searches were used for analysis of individual contigs in this genome (https://blast.ncbi.nlm.nih.gov/Blast.cgi). This analysis was conducted using automated scripts for full text comparison and BLAST analysis that compared the query sequence to an in-house database of known, annotated and previously identified S. aureus genomes, genes and fragments. This enables the determination of identity, gene content, clonal parentage and position within the genome of each contig, given the constant order of core genomic genes in S. aureus. Finally, Nanopore and Illumina sequences were aligned manually for reasons discussed below.
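The positioning step described above relies on the constant order of core genomic genes. A minimal sketch of that idea, not the authors' in-house scripts: the short reference gene order is taken from locus tags mentioned in this article, while the contig names and their gene contents are hypothetical. Contig orientation is ignored for brevity.

```python
# Core genes in reference order (subset named in this article; illustration only).
REFERENCE_ORDER = ["walJ", "sirB", "htsB", "ecfA2", "rplQ", "rpsJ"]
RANK = {gene: i for i, gene in enumerate(REFERENCE_ORDER)}


def order_contigs(contig_genes):
    """Order contigs by the earliest core gene they carry.

    contig_genes maps a contig name to the list of genes detected on it.
    Contigs without any known core gene (e.g. plasmid contigs) are left out.
    """
    placed = {}
    for name, genes in contig_genes.items():
        ranks = [RANK[g] for g in genes if g in RANK]
        if ranks:
            placed[name] = min(ranks)
    return sorted(placed, key=placed.get)


# Hypothetical contigs; contig_C carries only plasmid-borne markers.
example = {
    "contig_A": ["rplQ", "rpsJ"],
    "contig_B": ["walJ", "sirB"],
    "contig_C": ["blaZ", "cadX"],
    "contig_D": ["htsB", "ecfA2"],
}
print(order_contigs(example))
```

A contig that carries no core gene stays unplaced, which is one way to flag presumably plasmid-borne sequences.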
The sequence was compared to the representative CC80 strain 11819-97, GenBank CP003194.1/SAMN02603886. This is a PVL-positive CC80-MRSA-IVc that, like essentially all canonical CC80 strains, carries an etD/edinB pathogenicity island. Its genome has a size of 2,846,546 nt and an average G/C content of 32.9%. Other sequences analysed included the CC8 strain COL, GenBank CP000046, and MW2, BA000033, as reference sequence for CC1.
For comparison, two genomes were provisionally assembled from raw sequencing reads from the Short Read Archive (https://www.ncbi.nlm.nih.gov/sra/) and analysed using the program blastn (Camacho et al., 2009) from the NCBI blast+ suite. All sites were identified that matched the probe sequences. A probe without mismatches was assigned a signal intensity of 0.9; with one mismatch, a signal of 0.6; with two mismatches, a signal of 0.3; with three mismatches, a signal of 0.1. The results were analysed in the same way as real array hybridisation experiments, allowing identification of similar/related strains among published genome data [42].
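The mismatch-to-signal mapping can be illustrated with a short sketch. It is not the authors' pipeline, which used blastn. Here an ungapped, forward-strand scan stands in for the search, and the value 0.0 for more than three mismatches is an assumption, since the text does not state it. The function names and test sequences are hypothetical.

```python
# Signal per mismatch count, as given in the text.
SIGNAL_BY_MISMATCHES = {0: 0.9, 1: 0.6, 2: 0.3, 3: 0.1}
NO_SIGNAL = 0.0  # assumption: more than three mismatches counts as no signal


def count_mismatches(a, b):
    """Number of differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_probe_signal(probe, genome):
    """Best in-silico signal of a probe over all ungapped forward-strand sites."""
    best = NO_SIGNAL
    width = len(probe)
    for start in range(len(genome) - width + 1):
        mismatches = count_mismatches(probe, genome[start:start + width])
        best = max(best, SIGNAL_BY_MISMATCHES.get(mismatches, NO_SIGNAL))
    return best


print(best_probe_signal("ACGT", "TTACGTTT"))  # perfect match present
print(best_probe_signal("ACGT", "TTACGATT"))  # best site has one mismatch
```

A real implementation would also have to search the reverse complement and tolerate gaps, which blastn handles.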

Comparison of sequencing methods
A total of 36 Illumina contigs was considered to be chromosomal. Another one contained typically plasmid-borne sequences (including blaZ and cadX; see below). The average fragment size of the library was 220 nt. Visual inspection and comparison of these contigs to the Nanopore sequences revealed faulty assemblies of four contigs that needed to be split into two "subcontigs" each in order to allow alignment to the Nanopore sequence. Most significantly, Illumina failed to resolve a ca. 5,000 nt region within the ACME-III element that consisted of repetitive sequences. On the other hand, Nanopore showed a poor resolution of poly-A and poly-T sequence fragments resulting in the loss of approximately 15,800 nucleotides across the entire chromosome.

Characterisation of the clinical isolate
Array hybridization revealed the presence of the enterotoxin homologue ORF-CM14 and of an ACME-III element, as well as the absence of edinB and etD. Otherwise, the isolate matched previously characterized CC80 strains (see S2 File). In order to explain these discrepancies, it was sequenced using both Illumina and Nanopore methods. Alignment of the resulting sequences yielded a continuous chromosome with a total length of 2,789,663 nt and an overall G/C content of 32.98%. MLST was performed based on the consensus genome sequence and yielded ST-80 (arcC-1, aroE-3, glpF-1, gmk-14, pta-11, tpi-51, yqiL-10).
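For readers unfamiliar with the G/C figure, it is the share of G and C among the unambiguous bases of the sequence. A minimal sketch follows; the function name and the toy sequences are illustrative and not taken from the study.

```python
def gc_content_percent(sequence):
    """Percentage of G and C among the A, C, G and T bases of a sequence.

    Ambiguous bases such as N are ignored; input is case-insensitive.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


print(gc_content_percent("ATGC"))      # half of the bases are G or C
print(gc_content_percent("aattgNNN"))  # one G among five unambiguous bases
```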
A comparison of core chromosomal genes revealed that two separate regions in Dresden-275757 differed from CP003194 confirming and explaining the differences to canonical CC80 as observed in the array experiment (Fig 1).

Deviating Region 1
Deviating Region 1 extends from walJ (locus tag MS7_0024 in CP003194, SACOL0023 in CP000046) with a putative recombination site located in the intergenic region between walL and walJ. It extends to certainly include sirB (MS7_0106, SACOL0098), possibly even to sbnE (MS7_0112, SACOL0104), although the differences to canonical CC80 sequences are not large enough to clearly determine a recombination site. Its size can be estimated at 84,363 nt (based on a consensus of the Illumina and the Nanopore sequences, and including walJ to sirB). This corresponds to roughly 3% of the genome and includes ca. 34,000 nt belonging to the ACME element.
The gene content of Deviating Region 1 is described in S5 File/ Table 1 where it is also compared to a CC80 reference sequence CP003194 as well as to Oerebro-086360.
Deviating Region 1 consists of four different fragments. The first comprises the genes between walJ (MS7_0024; SACOL0023) and orfX.
The second one is an ACME-III element including the opp operon. This is a potentially mobile element and thus it is not necessarily connected to the genomic replacement in this strain. It will be discussed below.

The SCC element as part of Deviating Region 1
Deviating Region 1 also comprises orfX, i.e., the integration site of SCC elements. While most published isolates and sequences of CC80 harbour SCCmec IVa or IVc elements, Dresden-275757 carries an SCC element without mecA/C genes.
The gene content of the SCC element is summarized in S5 File/ Table 2. Sequences are provided in S6 File. In short, the element consists of
• a type II restriction-modification system,
• ccrA/B1 recombinase genes and adjacent genes showing some similarity or relationship to SCCmec IX sequences (strain JCSC6690, GenBank AB705452.1),
• a large gene with repetitive sequences that is very similar to the gene encoding a hypothetical protein DLJ55_14705 in the chromosomal DNA of strain MOK042 (GenBank CP029627.1) as well as on a plasmid of a ST508/CC45 strain, AR_0471 (chromosome CP029652.1, plasmid CP029650.1),
• and an oligopeptide permease operon, i.e., opp genes or ACME-III, as well as some genes for "putative proteins" as known from the ST42 strain C427, GenBank ACSQ.
A search of the short read archive of GenBank revealed two near-identical sequences of deviant CC80 strains. One of them (SAMEA3671725) harbours ACME-III while it is absent from the other one (SAMEA48342418). When performing a BLAST search with the NCBI GenBank, no significant hits over the entire length of the SCC element were obtained indicating that this element has not yet been observed, although most of its genes have already been found in other SCC elements.

Identification and characterisation of the ST567 isolate Oerebro-086360 as a potential donor of Deviating Region 1
The observation of the enterotoxin homologue ORF-CM14 rather than of the enterotoxin H gene seh normally present in canonical CC1 strains, followed by a set of CC1-like genes, strongly suggests that Deviating Region 1 is of chimeric origin itself. Our database of ca. 25,000 microarray hybridization profiles was searched for potential donors of Deviating Region 1, i.e., for strains that harbour ORF-CM14 in a CC1-like core genomic backbone. Only one isolate, Oerebro-086360, a deviant CC1 strain (ST567, MLST profile 10-1-1-1-1-1-1, spa type t1242; 07-23-12-34-34-16-34-33-13), matched these criteria. Thus it was also sequenced using Illumina MiSeq.
Oerebro-086360 is a PVL-positive CC1 MSSA that differs from canonical CC1 in several features including a presence of the ORF-CM14 enterotoxin homologue and an absence of seh.
Other differences compared to canonical CC1 strains are the presence of deviant alleles of aur and isaB as well as an absence of cna and Q2G1R6/cstB (BA000033.2: 66419-67753). It also harboured an ACME-III element (opp genes and ccrAB1). The MLST marker arcC was different compared to ST1 (arcC 10 instead of arcC 1) but this difference is due to a single nucleotide polymorphism indicating mutation rather than recombination.
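The allelic comparison behind this statement can be sketched in a few lines. The ST567 and ST80 profiles are those given in this article; the ST1 profile 1-1-1-1-1-1-1 is inferred from the statement that only arcC differs (arcC 10 instead of arcC 1). Function names are illustrative.

```python
# The seven MLST loci in the order used for S. aureus profiles in this article.
MLST_LOCI = ("arcC", "aroE", "glpF", "gmk", "pta", "tpi", "yqiL")


def parse_profile(text):
    """Turn a profile string such as '10-1-1-1-1-1-1' into a locus -> allele map."""
    alleles = [int(x) for x in text.split("-")]
    if len(alleles) != len(MLST_LOCI):
        raise ValueError("expected seven allele numbers")
    return dict(zip(MLST_LOCI, alleles))


def differing_loci(profile_a, profile_b):
    """Loci at which two allelic profiles carry different allele numbers."""
    return [locus for locus in MLST_LOCI if profile_a[locus] != profile_b[locus]]


st1 = parse_profile("1-1-1-1-1-1-1")      # inferred from the text
st567 = parse_profile("10-1-1-1-1-1-1")   # Oerebro-086360
st80 = parse_profile("1-3-1-14-11-51-10") # Dresden-275757

print(differing_loci(st1, st567))  # single-locus difference
print(differing_loci(st1, st80))
```

Note that allele numbers only say that two sequences differ, not by how much. Deciding between mutation and recombination, as done above for arcC, requires comparing the allele sequences themselves.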
These observations are consistent with integration of a large alien insert around oriC. Excluding the ACME-III element, this insert can therefore be estimated to comprise roughly 150,000 nt, ranging from between arcC and aur across oriC and orfX to Q7A890/Q2YUT2 (see Fig 1).
Deviating Region 1 of Dresden-275757 and the corresponding region in the ST567 isolate Oerebro-086360 can be considered identical. This includes the gene content and the gene sequences, the presence and sequence of an ACME-III element and the fault line between Q7A890 and Q2YUT2 separating a region of unknown origin from CC1-like sequences.
The ACME-III elements of Dresden-275757 and Oerebro-086360 were largely identical to each other in both gene content and gene sequences (see S6 File).
Therefore, we assume the lineage or strain represented by isolate Oerebro-086360 to be the donor of Deviating Region 1 in the lineage of Dresden-275757. However, the lineage of Oerebro-086360 is itself of chimeric nature comprising a large insert of DNA from a yet unidentified donor into a CC1 genome.

Deviating Region 2
Deviating Region 2 (S5 File/ Table 3, Fig 1) extends from htsB (MS7_2199, SACOL2166) to Q8NVB9 (MS7_2323, SACOL2297) or to ecfA2 (MS7_2242; SACOL2211), having a size of 33,939 to 38,645 nt (1.2 to 1.4% of the genome). This is smaller than the corresponding fragment of the CC80 reference sequence, which encompasses 115,604 nt. The reason is that the region spans the integration site that in canonical CC80 harbours a mobile genomic element comprising hsdS, hsdM, etD, F3TKB7, edinB and F5W4X2 (MS7_2226 to MS7_2231). This element is absent from Dresden-275757. It is also absent from all CC1 sequences.
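The genome fractions quoted for the two deviating regions follow directly from the region sizes and the chromosome length of 2,789,663 nt reported above. A small worked check, using only numbers from the text (the function name is illustrative):

```python
GENOME_LENGTH_NT = 2_789_663  # consensus chromosome of Dresden-275757


def percent_of_genome(region_nt, genome_nt=GENOME_LENGTH_NT):
    """Size of a region as a percentage of the chromosome length."""
    return 100.0 * region_nt / genome_nt


region_1 = percent_of_genome(84_363)       # walJ to sirB
region_2_min = percent_of_genome(33_939)   # lower size estimate
region_2_max = percent_of_genome(38_645)   # upper size estimate

print(round(region_1, 1), round(region_2_min, 1), round(region_2_max, 1))
```

Rounded to one decimal place, this gives 3.0%, 1.2% and 1.4%, matching the values stated for Deviating Regions 1 and 2.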
Deviating Region 2 also comprises a gene cluster from rplQ (MS7_2243; SACOL2212) to rpsJ (MS7_2271; SACOL2240) encoding several ribosomal proteins. These genes are highly conserved among all S. aureus sequences. However, when BLASTing (https://blast.ncbi.nlm.nih.gov/Blast.cgi) contig 16 (which lies entirely within Deviating Region 2; the remainder of the region is on contigs 21 and 19), the five highest scoring matches over the entire length of the query sequence (68,165 nt) are CC1 genomes (with, e.g., 23 nt mismatches and 2 nt gaps for BX571857.1). In general, this region in Dresden-275757 is more closely related to CC1 than to canonical CC80 sequences. It also appears to be closer to Oerebro-086360 than to MW2 but, given the overall similarity of all sequences concerned, this is hard to assess. The adjacent regions, up- and downstream of Deviating Region 2, are very similar in Dresden-275757, Oerebro-086360 as well as reference CC1 and CC80 sequences.
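To put the mismatch and gap counts for BX571857.1 into perspective, they can be converted into an approximate percent identity. This sketch assumes that the alignment length equals the query length of 68,165 nt; BLAST itself reports identity relative to the alignment length, so the exact figure may differ slightly.

```python
def approx_percent_identity(query_nt, mismatches, gaps):
    """Approximate identity, treating mismatched and gapped positions as non-identical.

    Assumes the alignment spans the whole query and nothing more.
    """
    identical = query_nt - mismatches - gaps
    return 100.0 * identical / query_nt


identity = approx_percent_identity(68_165, mismatches=23, gaps=2)
print(round(identity, 2))
```

Under this assumption, 25 non-identical positions in 68,165 nt correspond to an identity of about 99.96%.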

The hla gene and its neighbouring genes
When comparing the sequence as well as the hybridization profile of Dresden-275757 to the CC80 reference sequence, the absence of the hla gene and its neighbouring genes (A5IS45, Q6GHS5, A5IS47, A6U0Y3, Q2FZB4, i.e., MS7_1116 to MS7_1120 or SACOL1171 to SACOL1175) can be detected. The presence of hla appears to be variable in the deviant CC80 lineage; SAMEA48342418 also lacks hla while it is present in SAMEA3671725.

Prophages
When excising the phage sequence (Contig-0007:RC, positions 133,678 to end and Contig-0012 positions 1 to 42,938) and performing a NCBI BLAST search, the four best matches, with identities of 99.97%, are all PVL phages from CC80 strains, phiSa2wa_st80 (MG029515.1), NCTC13435 (LN831036.1), GR2 (CP010402.1) and 11819-97 (CP003194.1). The PVL prophage in Dresden-275757 is integrated into the same site of the chromosome as the one in CP003194.1. The prophage sequences from both strains are co-linear and they comprise the same set of genes.

Resistance genes
Dresden-275757 carried the blaZ/I/R operon and a cadmium resistance operon (cadD/cadX) together on one contig without any known chromosomal markers, thus presumably on a plasmid.
Other resistance genes that frequently can be encountered in canonical CC80, namely aphA3, aadE and sat (neomycin, streptomycin and streptothricin resistance) as well as far1/ fusB and tet(K) (fusidic acid and tetracycline resistance), were absent.

Discussion
We identified a virulent, PVL-positive CC80 MSSA that differed in key features from canonical CC80 strains. Analysis was performed using array hybridisation, Illumina and Nanopore sequencing. While array hybridisation yields less information than sequencing, it can be routinely performed fast, automatically and economically on high numbers of clinical isolates. In the present case, this allowed the identification of the initial isolate Dresden-275757 as being of special interest as well as of Oerebro-086360 as putative donor. Illumina sequencing provided short reads of high quality, but it has difficulties with repetitive sequences. As the most relevant problem in the current project, this led to a virtual miss of DLJ55_14705 within the ACME-III element. Nanopore sequencing proved unreliable especially with regard to poly-T and poly-A sequences, but it can handle repetitive sequences much better, which in S. aureus also include MSCRAMM genes such as spa. With two large core genomic replacements being present in one single isolate, we assume that such large-scale horizontal gene transfers might be more common in S. aureus than previously perceived, and that the resolution of MLST with seven markers is not high enough to identify all chimeric strains. However, the combination and interaction of microarray-based assays as screening tool and NGS allows reliable identification and detailed analysis of such strains [45].
Deviating Region 1 is located close to oriC which appears to be a hotspot for chromosomal replacements (see Introduction). It comprises sequences identical to the ones from the atypical CC1/ST567 strain Oerebro-086360. This includes an ACME-III element. It also includes a stretch of DNA upstream and downstream of ACME-III with the latter part including ORF-CM14. Theoretically, this might give a hint on the putative donor of Deviating Region 1.
Possible donors for ORF-CM14 to both isolates obviously must include strains from ORF-CM14 positive lineages that are ST12, ST71, ST93, ST121, ST509, ST567, CC772, CC705, ST707, ST760, ST816, ST848, ST1094, ST1643, ST2272, ST2425, ST2616 and ST2972 (based on published sequences and the authors' own microarray data). Unfortunately, genome sequences of several of these STs are not available and those that are available do not fully match the sequence of Deviating Region 1. In both isolates, a fault line can be observed between Q7A890 and Q2YUT2 separating downstream sequences of unknown origin from those upstream that are related to CC1 (i.e., the right border between "red" and "blue" sectors in Fig 1 and the last two columns of S5 File). This means Deviating Region 1 of Dresden-275757 includes the fault line separating the alien insert in Oerebro-086360 from the canonical CC1 core genome of that strain. This makes it very likely that an Oerebro-086360-like strain was indeed the donor of Deviating Region 1, and that this region itself is of chimeric nature, spanning CC1 and non-CC1 sequences (see Fig 1). The upstream fault line in Dresden-275757 separating CC1 from CC80 sequences (between sirB and spa or sbnE) cannot be determined exactly because of the general relatedness of CC1 and CC80 sequences.
Deviating Region 1 of Dresden-275757 and Oerebro-086360 also comprises an ACME-III element. The presence of opp genes and ccrA/B-1 recombinase genes is reminiscent of the CC34 strain 21342 (GenBank AHKU), although the sequence of ccrB-1 appears to be more closely related to the one from SCCmec IX. It also includes, as revealed primarily by Nanopore sequencing, a gene with repetitive sequences that is very similar to the gene encoding a hypothetical protein DLJ55_14705 in strain MOK042. This strain belongs to ST71, a lineage that comprises a large insert of unknown origin in a CC97 genome [10,11]. In strain MOK042, the gene encoding DLJ55_14705 is localised on that insert but it is not a part of a SCC element.
In addition, there is a second Deviating Region elsewhere in the genome of Dresden-275757. It is localised at a position distant from known "recombination hotspots" around oriC and the conjugative transposon ICE6013 [3] (with the latter being identical to canonical CC80). Gene content and sequences of Deviating Region 2 are highly similar to CC1 sequences including Oerebro-086360 but clearly differ from canonical CC80. Differences include, but are not limited to, an absence of edinB and etD. The region in question includes genes whose origin cannot be determined because of their high degree of conservation. For this reason, the exact boundaries of Deviating Region 2 cannot be identified. Interestingly, the regions adjacent to Deviating Region 2 are very similar in all sequences analysed, i.e., Dresden-275757, Oerebro-086360 as well as the reference CC1 and CC80 sequences (with differences being less than 0.5%). This could suggest that the region corresponding to Deviating Region 2 was "deviant" not in Dresden-275757 but, compared to the other three sequences, in the CC80 reference sequence. This might indicate that Deviating Region 2 in Dresden-275757 was not an alien insert of CC1 origin but that its sequence represents shared, ancestral CC1/CC80 stock and that the corresponding region in canonical CC80 (including edinB and etD) itself was an insert from another, yet unidentified, lineage.
In conclusion, the core genome of Dresden-275757 bears evidence of at least two, possibly three large-scale recombination events. First, ORF-CM14, among other genes, was introduced into a CC1 strain and, second, the resulting ORF-CM14/CC1 composite fragment was introduced into CC80. In addition, another recombination event introduced either Deviating Region 2 from CC1 into the ancestor of Dresden-275757 or the corresponding region, possibly together with edinB and etD, from an unknown donor into canonical CC80.
Thus, such complex and large-scale recombination events are unlikely to be rare and exceptional, despite a distinct clonal nature of S. aureus [46]. Although the exact mechanism is not clear, chimerism, or horizontal gene transfer of core genomic fragments not associated with mobile genetic elements, seems to be an additional pathway in the evolution of S. aureus, possibly being responsible for a transmission of virulence factors (such as ORF-CM14 in the case described herein) or of resistance genes [7]. From a more theoretical point of view, large-scale genomic substitutions, chimerism or hybridisation facilitate evolutionary leaps that cannot be achieved by accumulation of single point mutations or that would require immeasurably more time to be achieved by mutations. If one considers the ability to evolve and adapt as an evolutionary advantage, an organism that can shuffle, swap or exchange major parts of its genome by whatever unknown mechanism should be in a better position than a strictly clonal organism.

Acknowledgments
We thank Byrgit Hofmann and Anja Hackbart (Friedrich-Loeffler-Institut, Jena, Germany) and Bianca Stenmark (Faculty of Medicine and Health, Örebro University, Örebro) for excellent technical assistance, as well as Peter Slickers, Jena, for help and advice regarding sequence analyses.