The Genome Sequence of ‘Mycobacterium massiliense’ Strain CIP 108297 Suggests the Independent Taxonomic Status of the Mycobacterium abscessus Complex at the Subspecies Level

Members of the Mycobacterium abscessus complex are rapidly growing mycobacteria that are emerging as human pathogens. The M. abscessus complex was previously composed of three species, namely M. abscessus sensu stricto, ‘M. massiliense’, and ‘M. bolletii’. In 2011, ‘M. massiliense’ and ‘M. bolletii’ were united and reclassified as a single subspecies within M. abscessus: M. abscessus subsp. bolletii. However, the placement of ‘M. massiliense’ within the boundary of M. abscessus subsp. bolletii remains highly controversial with regard to clinical aspects. In this study, we revisited the taxonomic status of members of the M. abscessus complex based on comparative analysis of the whole-genome sequences of 53 strains. The genome sequence of the previous type strain of ‘Mycobacterium massiliense’ (CIP 108297) was determined using next-generation sequencing. The genome tree based on average nucleotide identity (ANI) values supported the differentiation of ‘M. bolletii’ and ‘M. massiliense’ at the subspecies level. The genome tree also clearly illustrated that ‘M. bolletii’ and ‘M. massiliense’ form a distinct phylogenetic clade within the radiation of the M. abscessus complex. The genomic distances observed in this study suggest that the current M. abscessus subsp. bolletii taxon should be divided into two subspecies, M. abscessus subsp. massiliense subsp. nov. and M. abscessus subsp. bolletii, to correspondingly accommodate the previously known ‘M. massiliense’ and ‘M. bolletii’ strains.


Introduction
Rapidly growing mycobacteria (RGM) are nontuberculous mycobacteria that cause a wide spectrum of human infections; the Mycobacterium abscessus complex is the most frequent group associated with pulmonary, skin, soft tissue, and bone diseases [1,2]. The M. abscessus complex is the most drugresistant mycobacterial species known, and isolates are typically only susceptible in vitro to some parenteral agents (amikacin, cefoxitin, and imipenem) and macrolides (clarithromycin and azithromycin) [1,2], resulting in less than satisfactory treatment response rates [3,4]. Moreover, the number of cases of M. abscessus complex diseases has been reported to be increasing and have become emerging infectious diseases [3][4][5][6][7][8][9].
The M. abscessus complex has undergone many taxonomic changes since its first description in 1953 [10,11]. Because of the heterogeneity within the group, the M. abscessus complex (M. abscessus sensu lato) was divided into three species in 2006, namely M. abscessus sensu stricto, 'M. massiliense' [12], and 'M. bolletii' [13]. The three independent species were proposed primarily based on differences in the rpoB sequences and phenotypic patterns of the type strains [12,14]. However, the intraspecies-level phenotypic variation subsequently identified in clinical isolates rendered the three species phenotypically indistinguishable [15]. Thus, the differentiation of these three species has relied upon the sequencing of one or more housekeeping genes, such as rpoB, hsp65, and secA, although the emergence of isolates with interspecific composite patterns has led to conflicting identification results [16]. For example, isolates from recent Brazilian outbreaks display genetic characteristics consistent with either 'M. massiliense' or 'M. bolletii', depending on the housekeeping genes selected for the identification [15].
In 2011, M. abscessus, 'M. massiliense', and 'M. bolletii' were eventually united as a single species, M. abscessus, based on their overlapping phenotypic patterns, 100% 16S rRNA gene sequence identity, and >70% DNA-DNA hybridization values [17]. Simultaneously, two subspecies were proposed within this taxon based on the internal variability of the genotypes. The name M. abscessus subsp. abscessus was proposed to accommodate the M. abscessus sensu stricto strains, whereas the name M. abscessus subsp. bolletii was proposed to accommodate the previously known 'M. massiliense' and 'M. bolletii' strains [17]. The two subspecies can be distinguished genotypically by a single HaeIII band difference in a PCR restriction analysis (PRA)-hsp65 pattern [15]. In addition, the two subspecies share less than 96.6 and 98.7% sequence similarities in the 711-bp rpoB and 401-bp hsp65 gene sequences, respectively [15].
However, the suitability of combining the 'M. massiliense' and 'M. bolletii' strains as a single subspecies, M. abscessus subsp. bolletii, remains a matter of debate [18], mainly because they are clinically different in terms of antibiotic susceptibility and treatment [14,[19][20][21][22]. In fact, M. abscessus subsp. abscessus harbors an erythromycin ribosomal methylase gene, erm (41), and the presence of this gene is associated with inducible resistance to macrolides [23]. Indeed, inducible resistance to clarithromycin has been suggested as an explanation for the lack of efficacy of clarithromycin-based treatments against M. abscessus subsp. abscessus infection. By contrast, the 'M. massiliense' group exhibits marked susceptibility to clarithromycin, and inducible resistance to clarithromycin has not been observed during prolonged incubation [20]. Therefore, treatment response rates to clarithromycin-based antibiotic therapy are higher in patients with 'M. massiliense' group lung disease than those infected with M. abscessus subsp. abscessus [20]. However, unlike other RGM, the 'M. bolletii' group, a relatively rare pathogen at present [5,[24][25][26], has proven to be highly resistant to clarithromycin [5,13,14,19].
Differential diagnosis of the subtypes within the M. abscessus complex has been strongly recommended to determine clinical significance and assist patient management [1]. However, the traditional molecular methods (e.g. PRA or rpoB and hsp65 sequence analysis) occasionally results in the inaccurate discrimination of the complex [15,16]. In addition, the M. abscessus complex has undergone the taxonomic changes [11,15,17] and there appears to be a lack of consensus for the nomenclature of these taxa. Thus, the valid and acceptable taxonomic-designation of the M. abscessus complex is needed to be proposed.
With the advent of next-generation sequencing, which permits the analysis of high-quality genome sequences and their comparison with other genomes in public databases, taxonomic studies are increasingly using average nucleotide identity (ANI) as a method of re-classification [27]. Accordingly, in this study, we sequenced the genome of 'Mycobacterium massiliense' strain CIP 108297 and revisited the taxonomic status of 'M. massiliense' and 'M. bolletii' based on the analysis of their whole-genome sequences to ascertain whether the two tested strains comprise a single subspecies status at the genomic level.

Genome sequencing
For the genome sequencing, strain CIP 108297, the previous type strain of 'M. massiliense', was obtained from CIP (Collection of Institut Pasteur, Paris, France), and genomic DNA was extracted with a Wizard genomic DNA purification kit (Promega, WI). The draft genome sequence of strain CIP 108297 was determined by 100 base-paired end sequencing reads using the Genome Analyzer IIx (Illumina, CA), which generated 20,301,763 reads that have Q30 over 79.9%. The mapped reads covered whole genome over 364-fold and 1.6% of total reads remained only as unmapped to the genome. The sequencing reads were assembled using the CLC genomics workbench 4.5 (CLCbio, Denmark) and CodonCode Aligner (CodonCode Co., MA).

Annotation and comparative genomics
Gene identification and annotation were achieved using the RAST server [28]. Orthologous gene prediction and comparative genomic analyses were conducted at the nucleotide and amino acid levels, as described previously [29]. In brief, a segment on a target contig homologous to a query ORF was identified using the BLASTN program, and this potentially homologous region was expanded in both directions by 1,000 bp. The nucleotide sequences of the query ORF and the selected target homologous region were then aligned using a pairwise global alignment algorithm [30], and the resulting matched region in the subject contig was extracted and saved as a homolog. For amino acid comparison, BLASTP comparison of query ORFs to subject ORFs was performed using only annotated genomes. Orthologs and paralogs were differentiated by reciprocal comparison. All genome sequences belonging to the genus Mycobacterium which compared each other in this study were obtained from NCBI microbial genome resources (http://www.ncbi.nlm.nih.gov/genome).

ANI calculation & tree construction
The inter-genomic distances between two genome sequences were determined from fully or partially sequenced genomes using the average nucleotide identity (ANI) [31]. For a given pair of genomes, the query genome was cut into small pieces in silico (1020 bp), and the high-scoring segment pairs (HSPs) between the two genome comparison were determined using the BLAST algorithm [32,33]. The ANI was then calculated from the sets of HSPs, and its complement to 1 was considered for converting ANI into a distance. From this pairwise distance matrix, an ANI tree was constructed using the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) clustering method.

Orthologous gene tree construction
Using the calculated gene-by-gene similarity, orthologous genes above a defined level of similarity were selected and then aligned using CLUSTALW [34]. The resulting multiple alignments were concatenated and used to construct a genome tree by the neighbor-joining method [35], as implemented in the MEGA program [36]. An evolutionary distance matrix for the neighbor-joining tree was generated according to the model of Jukes & Cantor [37]. The resultant tree topologies were evaluated by bootstrap analyses [38].

Genome sequence
The genome sequence of strain CIP 108297 was assembled into 18 contigs (>1 kb long) and deposited into GenBank under the project number PRJNA80701. The genome size was 4.97 Mb (excluding the gaps), with a G+C content of 64.1% and 4,828 predicted CDSs. M. abscessus subsp. abscessus ATCC 19977 T (PRJNA61613) [39] and M. abscessus subsp. bolletii BD T (PRJNA73695) [40] were selected as representative strains, and their genomes were compared with the newly sequenced genome. The overall genomic characteristics of the representative strains were well positioned within the range of the genomes of the M. abscessus complex ( Figure 1 and Table  1).

Genome tree
The ANI genome tree, a dendrogram based on ANI values showing evolutionary relationships, was generated using 53 M. abscessus genomes available in NCBI microbial genome resources (http://www.ncbi.nlm.nih.gov/genome) (Figure 2   subsp. abscessus ATCC 19977 T (PRJNA61613) was selected to represent highly conserved proteins of the genus Mycobacterium [43]. An orthologous gene tree was built using aligned amino acid sequences, demonstrating that the three representative strains clustered among the other Mycobacterium strains (Figure 3). In addition, 11 housekeeping genes (argH, cya, gdhA, glpK, gnd, murC, pgm, pknA, pta, purH, and rpoB, 18,893 bp) were selected for the species-level comparison from a previous study [44] of 53 Mycobacterium abscessus genomes (Figure 4). Their nucleotide sequences were aligned using CLUSTALW and the phylogenetic tree was drawn. Finally, phylogenetic tree was built using the rpoB single-gene nucleotide sequence ( Figure 5) because the rpoB sequence has been widely used as a taxonomic-classification marker [45,46]. The overall tree structure was similar, with the representative strains forming their own clusters (Figures 4 and  5). In addition, a comparative BLASTP analysis based on the three representative strains revealed group-specific genes among the representative strains (see Information S1).

Discussion
In this study, we evaluated the current taxonomic status of M. abscessus and its subspecies using a similarity analysis based on the single gene, multilocus gene, several hundreds of orthologous gene, and whole-genome levels. This species has recently been regrouped as follows: M. abscessus subspecies bolletii by uniting M. bolletii and M. massiliense, and M. abscessus subspecies abscessus [17]. The classification of these bacteria has been based on polymorphisms in various target genes [47]; however, the introduction of whole-genome sequencing has increased the availability of genetic information to facilitate the identification of organisms.
The ANI value between a given pair of genomes has been recognized as a simple and effective method to reflect the degree of evolutionary distance between the compared genomes [31,32]. Comparative studies have reported that a value of 94-96% identity represents a DNA-DNA hybridization boundary of 70%, the gold standard for categorizing a bacterial species under the current prokaryotic species concept [31,32,48,49]. The ANI values among the three genomes   representing the three groups of the M. abscessus complex were all above the cut-off value for species demarcation, clearly indicating that the three strains belong to the same species (see Information S2). Within this species boundary, the genomic distances of the three strains from one another were equivalent ( Figure 1). Moreover, a greater difference between strain CIP 108297 and M. abscessus subsp. abscessus ATCC 19977 T (3.5% of difference) was observed than between the two type strains of the two established subspecies (3.3%). This relatively high genomic distance strongly suggests the independence of the 'M. massiliense' group from the 'M.
bolletii' group at the subspecies level. This degree of genetic difference between the 'M. massiliense' and 'M. bolletii' strains is also supported by an ANI-based genome tree and a phylogenetic tree using orthologous gene alignment. The three representative strains form three distinct phylogenetic clades within the radiation of M. abscessus, indicating the existence of three independent subspecies within the species (Figures 2 and 3). Of note, many strains in public databases that are named M. massiliense and M. abscessus are mixed in the same cluster of the genome tree, obviously reflecting the need to clarify the M. abscessus subsp. abscessus clade and revise the current taxonomic status ( Figure 2).
Previous studies using single or multilocus sequencing approaches (MLSAs) have yielded inconsistent results [5,15,44]. Indeed, discrepancies between an rpoB-based gene tree and MLSA sequence identification results have been reported, with results indicating low divergence values among these groups [44]. In the present study, phylogenetic trees based on single or several genes appeared to be less powerful for the discrimination of the boundary strains (Figures 4 and 5). M. abscessus M139 belonged to the 'M. massiliense' group in the rpoB single-gene tree ( Figure 5) yet was not clustered with any representative group in the MLSA sequence tree ( Figure  4). It has been proposed that horizontal gene transfer between the M. abscessus, M. massiliense, and M. bolletii groups causes inconsistency in subspecies clustering [44], and our ANI-based genome tree definitely included M. abscessus M139 in the 'M. massiliense' group ( Figure 2). In addition, the singlegene and MLSA sequence trees in this study showed three distinct, well-clustered groups, but the currently available M. abscessus whole-genome sequences are biased toward specific sampling sites. Indeed, if the sequences from various clinical strains are added, the single-gene or MLSA sequence tree would show an ambiguous shape, as reported previously. To identify other classification criteria, 25 representative groupspecific genes were selected as molecular identification candidates by a comparative genome analysis (see Information S1). For more clear identification, all homologs which can be found in other Mycobacterium genus genomes were excluded from the candidates. Recently, Shallom et al. proposed by an array based genomic approach that subspecies-level separation is a more reasonable taxonomic identification of the M. abscessus group [50]. These results are consistent with our current study, indicating that the subspecies-level identification of the M. abscessus group should include three taxa: M. abscessus subsp. abscessus, M. abscessus subsp. massiliense, and M. abscessus subsp. bolletii.
Based on the low genomic relatedness (96.5% ANI against the other two representative strains) and the distinct phylogenetic clade discovered in this study, it is evident that 'M. massiliense' is not affiliated within M. abscessus subsp. bolletii. Thus, we propose a novel subspecies within M. abscessus to accommodate the previously known 'M. massiliense', namely M. abscessus subsp. massiliense subsp. nov. Strain CIP 108297 is the type strain of the newly suggested subspecies.
Taken together, although the genetic characteristics responsible for the different phenotypes among the representative groups were not identified, the genomic content that clearly divides these strains into three clusters and their differential drug susceptibility patterns indicate the need to reevaluate the taxonomic status of M. abscessus and classify representative groups into three subspecies. Increasing wholegenome data would facilitate the differentiation of genetic factors for the molecular differentiation of boundary strains with insufficient diversity.