Variation among human, veterinary and environmental Mycobacterium chelonae-abscessus complex isolates observed using core genome phylogenomic analysis, targeted gene comparison, and anti-microbial susceptibility patterns

Mycobacterium chelonae is a member of the Mycobacterium chelonae-abscessus complex and a cause of opportunistic disease in fish, reptiles, birds, and mammals including humans. Isolates in the complex are often difficult to identify and have differing antimicrobial susceptibilities. Thirty-one previously identified rapidly-growing, non-tuberculous Mycobacterium sp. isolates cultured from biofilms, fish, reptiles, mammals, including humans, and three ATCC reference strains were evaluated with nine M. chelonae-abscessus complex whole genome sequences from GenBank by phylogenomic analysis, targeted gene comparisons, and in-vitro antimicrobial susceptibility patterns to assess strain variation among isolates from different sources. Results revealed minimal genetic variation among the M. chelonae strains. However, the core genomic alignment and SNP pattern of the complete 16S rRNA sequence clearly separated the turtle type strain ATCC 35752T from the clinical isolates and human reference strain “M. chelonae chemovar niacinogenes” ATCC 19237, providing evidence of two distinct subspecies. Concatenation of the partial rpoB (752 bp) and complete hsp65 (1,626 bp) sequence produced the same species/subspecies delineations as the core phylogeny. Partial rpoB and hsp65 sequences identified all the clinical isolates to the appropriate species level when respective cut-offs of 98% and 98.4% identity to the M. chelonae type strain ATCC 35752T were employed. The human strain, ATCC19237, was the most representative strain for the evaluated human, veterinary, and environmental strains. Additionally, two isolates were identified as Mycobacterium saopaulense, its first identification in a non-fish or non-human host.

. Furthermore, 170 human sequences contributed by the MNRL were included in evaluation of the sequences for potential sequevars by evaluation of single nucleotide polymorphisms (SNPs) in the 752 bp sequence. The M. chelonae ATCC 35752 T reference strain was designated as sequevar 1 and subsequent sequevars were identified by SNPs in relation to it. These sequences were then translated for evaluation of amino acid discrepancies at loci of nucleotide difference.
RAxML (version 7.2.8) was used to estimate phylogenies and produce phylogenetic comparison matrices [41]. Phylogenetic trees were obtained from DNA sequences by GTR Gamma rapid bootstrapping and search for best scoring Maximum Likelihood model with 1000 bootstrap replications. In addition, concatenated sequences, partial hsp65 (441 bp) and rpoB, as well as the concatenated complete hsp65 (1,626 bp) and rpoB (752 bp) were evaluated as described above and compared to the core genomic phylogeny for evaluation of potential for diagnostic use.
Erm (41). All isolates were evaluated for presence of erm (41) by generating a custom BLAST database for each individual assembly followed by BLASTn using the 673 bp erm (41) GenBank M. abscessus subsp. abscessus ATCC 19977 T NC 010397 as a query sequence [45].
hsp65 and PCR-restriction fragment length polymorphism analysis of hsp65 (hsp65 PRA). Extraction of the partial hsp65 (441 bp) from the annotated genome assemblies was performed in-silico. Primers Tb11 and Tb12 [18] were used to identify and extract a 441 bp region of interest including flanking sequence. Primer sequences were included in the analysis as minor variation in primer binding areas of sequences did occur.
In-silico restriction length polymorphism analysis of the partial hsp65 sequence was performed targeting restriction sites for enzymes BstEII and HaeIII. A virtual gel was used to evaluate fragments larger than 35 bp. Using an algorithm similar to Taylor et al. [46] Minimum inhibitory concentrations (MIC) and colony morphology. Antimicrobial susceptibility testing was performed for 30 isolates harvested from Middlebrook 7H11 plates using a Sensititre RAPMYCO panel (Thermofisher Thermo Scientific, Oakwood Village, OH), following Clinical and Laboratory Standards Institute recommendations [48]. Clarithromycin was evaluated on days 3 and 14 of incubation. Sensititre RAPMYCO uses a standard-ordered broth microdilution panel for susceptibility testing and previously established breakpoints for rapidly growing mycobacteria (RGM) [49,50]. In addition, colony morphologies were recorded.

Core genomic analysis
Phylogenetic comparison of isolates using core genes observed in all genomes separated and identified species within the M. chelonae-abscessus complex, as well as two outliers, seahorse1 and pipefish. The outliers were 99.4% identical to each other, but the closest reference strain, M. chelonae ATCC 35752 T , shared only 75.1% identity. BLASTn searches of the NCBI database placed the two closest to NZ_CP011269.1 Mycobacterium fortuitum strain CT6 and CP009914.1 Mycobacterium sp. VKM Ac-1817D, with only 88% identity and were removed from further analysis. The core genomes of the remaining 41 strains produced a 3,204,105 bp in length sequence with 3,141 coding sequences (CDS). Of the CDS, 2,367 were confirmed by RAST as genes, 683 were hypothetical protein CDS, and the remaining 91 were probable CDS. Within the core CDS, 16S rRNA, rpoB, hsp65 partial, hsp65 whole, gyrA, EF-Tu, Mn-sodA, and recA were present. Twenty-nine strains grouped closely with M. chelonae ATCC 35752 T using the core genomic comparison. However, four isolates were determined to be members of the M. chelonaeabscessus complex, but not M. chelonae. These isolates included seakrait, cow, turtle, and H9 (Fig 1).
Twenty-five of the 31 clinical isolates clustered with the sequenced M. chelonae ATCC 19237 with 98.4-99.6% identity (Fig 1). A mixture of human, fish, reptile, and biofilm isolates Mycobacterium chelonae-abscessus. sequences relative to nine GenBank genome sequences using a core genome from all 41 sequences. Phylogeny was produced using the best scoring Maximum Likelihood model with 1000 bootstrap replications. Dotted box delineates M. chelonae clinical isolates clustered with "M. chelonae chemovar niacinogenes" ATCC 19237 and breakdown into clustered in this large group, all with greater than 98.1% identity to each other. The current type strain M. chelonae ATCC 35752 T branched separately, with no greater than 96.5% identity to the 25 M. chelonae isolates. Minimal genetic variation was present within the isolates, although four distinct subclusters were present.

Targeted gene analysis
Gene targets evaluated by multisequence alignment produced an identity matrix for comparison of sequences. Alignments of 16S rRNA, gyrA, gyrB, EF-Tu, recA, and Mn-sodA produced erroneous clustering or separation of the isolates and/or reference strains evidenced by inaccurate phylogenetic placement of the human isolates (EF-Tu, Mn-sodA, gyrA, gyrB) or lack of species separation (16S rRNA, recA) when compared to the core genomic results. Evaluation based on these alignments was not pursued further. However, the sequences for the clinical isolates and ATCC 19237 had at least three single nucleotide polymorphisms in the complete 16S rRNA sequence that distinctly separated them from the type strain ATCC 35752 T (S1 Fig and S1 Table). Furthermore, inclusion of 13 M. chelonae and 9 M. sp. isolates from Germany and Belgium revealed higher similarity to"M. chelonae chemovar niacinogenes" ATCC 19237 and M. salmoniphilum ATCC 13758 T , respectively (S2 Fig). ITS. A 257 bp ITS sequence was extracted for the M. chelonae-abscessus isolates. However, different ITS extraction product lengths were observed for isolate H9, M salmoniphilum ATCC 13758 (256 bp), M immunogenum CCUG 47286 (267 bp), and pipefish and seahorse1 (245 bp). Multi-sequence alignment of the clinical isolates and reference strains revealed adequate grouping into species-specific branches, but the high percent identity (99.1%) between H9 and the cow and turtle strains did not provide an accurate separation of the identities of the three isolates. For this study, isolates with greater than 98.8% (254/257bp) identity at the ITS locus to M. chelonae ATCC 35752 T were considered M. chelonae (S3 Fig). hsp65. Targeted extraction of the 441bp partial hsp65 gene sequence reproduced the main M. chelonae ATCC 35752 T clusters generated by core genome analysis (S4 Fig). Isolates with greater than 98.4% identity (434-441/441 bp) to M. chelonae ATCC 35752 T were considered M. chelonae. Although minimal sequence diversity is present at this locus (0-7 bp difference), two large sub-clusters, each containing strains 99.8-100% identical to each other are present. One sub-cluster contained exclusively human isolates (H7, H10, H11, H15, H18, H19, H20) and the other a mixture of M. chelonae ATCC 19237, human (H8, H12, H13, H14), fish (cichlid, trumpetfish, seadragon1, seadragon2, seahorse2, seahorse3, seahorse4, seahorse5), and biofilm (biofilm1, bioflm2, biofilm3) isolates. The partial hsp65 sequence of human isolate H9 was 98.4% identical (434/441 bp) to M. franklinii DSM45524. The turtle and cow isolates also branched separately from the M. chelonae cluster and were 99.5% identical (439/441bp) to M. saopaulense EPM 10906. Inclusion of M. chelonae and M. sp. isolates from Nogueira et al. [19] showed a similar distribution where human M. chelonae isolates clustered together with 100% similarity to a mixture of environmental isolates, veterinary isolates, and "M. chelonae chemovar niacinogenes" ATCC 19237.
The complete 1,626 bp hsp65 multisequence alignment was more discriminating than the partial sequence and produced some clusters mirroring the core genome phylogeny (S5 Fig). All isolates with greater than 95.3% identity (1,550/1,626 bp) to M. chelonae ATCC 35752 T at the complete hsp65 were considered M. chelonae. As with the core genome and partial hsp65 phylogenies, the same group of human isolates branched together (H7, H10, H11, H15, H18, H19, H20) and shared 99.9-100% (1,625-1,626/1,626 bp) identity, but all M. chelonae isolates were greater than 99.1% identical to each other, showing minimal genetic variation in the group at this locus.

MIC susceptibility and colony morphology
Twenty-seven non-genetically identical clinical isolates and three ATCC strains were evaluated using the Sensititre RAPMYCO panel (S3 Table). Subtle phenotypic differences in colony morphologies were observed when isolates were viewed simultaneously. The majority (22/30) were nonpigmented, smooth, glossy, and raised. The cow and turtle isolates produced similar colonies, but turned the 7H11 media brown after 7 days. The pipefish and seahorse1 outliers grew as nonpigmented, granular, glossy, raised, colonies, different from all others. Isolates H12, H13, H17, seahorse5 and python1 produced nonpigmented, rough, crusty, raised colonies.

Discussion
Disease caused by members of the M. chelonae-abscessus complex in healthy and immunocompromised humans is increasing [14,[51][52][53]. M. chelonae infections are common in aquatic species and cause significant losses in certain groups of fish, particularly syngnathids (seahorses, seadragons and pipefish) [15,54,55]. Since M. chelonae-abscessus complex organisms are a human and veterinary health concern, characterization and appropriate identification methods are key to understanding the delicate balance of NTM interactions among humans, veterinary species, and the environment for disease control. Whole genome sequencing and core genome analysis was used to characterize NTM from fish, reptiles, mammals, and aquatic biofilms to investigate their genetic variation. High sequence homology was observed across M. chelonae isolates. Genetically similar strains infected a range of hosts and existed within environmental samples. A correlation between the environmental presence of M. chelonae and human disease has been established [56]. Similar strain characteristics and the low genetic variability of M. chelonae isolates from fish and biofilms suggests an environmental source of infection, a theory supported by a study of diseased pompano Trachinotus carolinus [12].
Certain human isolates tended to cluster using the different gene targeted sequencing methods, while others were more genetically similar to the aquatic animal or biofilm isolates. The consistent clustering of isolates H7, H10, H11, H15, H18-H20, suggests an epidemiologic link, although they share no known geographic or environmental associations. Human isolates H12, H13, H14, and H16 were genetically similar to fish and biofilm isolates, and to human "M. chelonae chemovar niacnogenes" ATCC 19237. It is reasonable to speculate that they may have originated from aquatic sources [57][58][59].
Core genomic comparison accurately identified closely related species in the M. chelonaeabscessus complex, as well as two divergent outliers (pipefish and seahorse1) cultured from syngnathid fish. Additional targeted gene sequencing, dDDH, and PRA analysis (S2 Table and [61]. M. chelonae ATCC 35752 T also had a slightly different antimicrobial sensitivity profile than ATCC 19237 and the other M. chelonae isolates (S3 Table). Likewise, dDDH showed a difference in relatedness between the clinical isolates and M. chelonae ATCC 35752 T . The genomic and antimicrobial data support recognition of two M. chelonae subspecies and indicate that use of M. chelonae ATCC 35752 T as a type strain may not be optimal for phylogenetic studies of M. chelonae isolates.
Core genome comparison revealed that earlier identification methods lacked fidelity for identification of M. chelonae isolates. Power of the core comparisons was high, because over half of the bacterial genome consisting of 4,898,027 bp and 4,489 CDS for M. chelonae ATCC 35752 T [62], was used for analysis. In the core alignment, 65.4% of the genome and 70% of the conserved coding regions were analyzed, including common housekeeping genes that are employed independently for species identification, such as EF-Tu, SecA, gyrA, Mn-SodA, 16S rRNA, rpoB, and hsp65. As a result, two human mycobacterial sequences in GenBank previously identified as M. chelonae 1518 and M. abscessus subsp. abscessus MC 1518 were found to be incorrect. The core alignment and presence of erm (41)  Similar to other published studies, WGS provided the greatest discrimination of M. chelonae-abscessus complex isolates, but is not yet practical in diagnostic settings where multilocus sequence analysis offers a practical alternative [10,19,63]. Comparison of commonly targeted genes to the core genome indicated that concatenated complete hsp65 and partial rpoB sequences were diagnostically useful. Isolates with identities greater than 98.4% to turtle reference strain M. chelonae ATCC 35752 T were considered M. chelonae. While promising for species identification, there is no published data to support the proposed threshold and a larger sample size is needed to validate the method. Using the concatenated complete hsp65 and partial rpoB sequences, the turtle type strain M. chelonae ATCC 35752 T and human reference strain M. chelonae ATCC 19237 both had greater than 99.1% identity to the main M. chelonae group of isolates, making differentiation between the potential subspecies difficult.
As previously reported, 16S rRNA analysis did not adequately differentiate species in the M. chelonae-abscessus complex [22] (S1 and S2 Figs and S1 Table). However, similar to that stated by Ballard et al. [64], SNPs patterns of the tested isolates designated M. chelonae were the same as ATCC 19237, not the turtle type strain ATCC 35752 T (three nucleotides different), further supporting the two as subspecies of M. chelonae. The genes gyrA, gyrB, EF-Tu, RecA, and Mn-Sod did not reliably identify species or produced inaccurate phylogenetic positioning, while the ITS, partial and complete hsp65, and rpoB loci were the most discriminating and identified isolates similarly to the core genomic analysis (S3, S4, S5 and S6 Figs). Partial hsp65, complete hsp65, and rpoB sequences identified the cow and turtle isolates as M. saopaulense, while rpoB and partial hsp65 delineated H9 as M. franklinii. However, contradictory to the core genome analysis, hsp65 (partial and complete), and the rpoB phylogenies, the ITS sequences of M. salmoniphilum ATCC 13758 T and H9 (M. franklinii) were 98.1% identical, which may not differentiate the species.
Regardless of phylogenetic differences produced by hsp65 (partial and complete), partial rpoB, and the core genome, these methods can identify M. chelonae and closely related species when specified breakpoints are employed [19]. With other bacterial genera this is widely done for the16S rRNA locus where a 98.7% identity is applied as a cut-off level [65]. Breakpoints of 98.4% for partial hsp65 (441 bp), 95.4% for complete hsp65, and 97.9% for rpoB or greater will identify M. chelonae when compared to the turtle type strain M. chelonae ATCC 35752 T . Furthermore, inclusion of M. chelonae isolates from Germany and Belgium to the partial hsp65 and rpoB analyses provides additional support for these breakpoints and the representative nature of ATCC 19237 to the current clinical isolates being evaluated worldwide, potentially making it a better candidate for comparison and identification purposes. Although a breakpoint was found for hsp65, additional partial and complete sequences are needed to confirm their validity.
Examination of a 170 sequence dataset provided by the Mycobacteria/Nocardia Research Laboratory confirmed the 97.9% rpoB breakpoint differentiates M. chelonae from other closely related species, but does not agree with Adékambi et al., which found intraspecies homology was 98.3-100% for the partial rpoB [24,61,66]. This discrepancy may be the result of comparison to M. fortuitum rather than M. chelonae strains in the earlier study. Further evaluation of SNPs from the rpoB sequences separated isolates into sequevars. Translation of the sequences confirmed that gene function was likely not affected, as amino acid sequences were unchanged in all but one sequence. Identifying rpoB sequevars may be useful for epidemiologic tracking of outbreaks, but no such connection could be made from the data set.
Replacement of PRA by targeted gene sequencing is supported by findings in this study. Comparisons of the partial hsp65 PRA algorithm of Telenti et al. [18] and revised by Taylor et al. [46] and Chimera et al. [67] using in-silico digested fragments confirms the inability of PRA to differentiate species closely related to M. chelonae, likely a result of the greater discriminating power of "in-silico" analysis (1 bp) versus human interpretation of agarose gels (up to 10 bp). The fragments produced were 9-15 bp different than those derived using previously reported algorithms. For example, the PRA pattern for M. chelonae is 320/130 bp for BstEII and 200/60/55 bp for HaeIII, compared to the "in-silico" restriction pattern of 310/131 bp and 197/60/58/54 bp, respectively [46]. PRA analysis should not be used to identify mycobacteria in the M. chelonae-abscessus complex without revision of the algorithm to accommodate in-silico fragment sizes and fragments less than 60 bp in length, which were not assessed in the earlier studies that used traditional methods. Susceptibility patterns, including significant antimicrobial resistance, have been reported for Mycobacterium chelonae-abscessus isolates and a multitude of acquired resistance mechanisms exist [31,[68][69][70]. One such example is the MspA gene, which, when expressed, has shown differential resistance of M. chelonae 9917 and M. chelonae ATCC 35752 T to rifampin (rifampicin), vancomycin, ciprofloxacin, clarithromycin, erythromycin, linezolid, and tetracycline. Investigation into specific resistance genes was not pursued for this study; however, the observed variable resistance to amikacin, ciprofloxacin, moxifloxacin, trimethoprim/sulfamethoxide, imipenem, cefoxitin, and linezolid among genetically similar isolates suggests differential expression of regulatory genes. The evaluated clinical isolates exhibited multidrug resistance, but biofilm isolates had the broadest resistance patterns [30,49,69]. Regardless of their origin, 96% of M. chelonae strains were susceptible to clarithromycin. Isolate H10 was resistant to clarithromycin and a gene mutation associated with resistance is suspected.
The erm (41)  Colony morphology and phenotypic traits can aid conventional and molecular diagnostics [72,73], but as demonstrated here, rarely provide sufficient evidence for definitive identification. Most isolates produced similar raised nonpigmented colonies that were smooth to dry and flaky, and virtually impossible to distinguish without side by side observation. Exceptions were the novel pipefish and seahorse1 isolates, which produced granular rough colonies, and M. saopaulense, which turned agar brown after several days of incubation [2]. This morphologic variance supported identification of the turtle and cow isolates as M. saopaulense, not M. chelonae as originally determined.
This whole genome evaluation of environmental, non-mammalian, and mammalian M. chelonae-abscessus isolates provides insight into the diversity of isolates within the complex and similarity of M. chelonae isolates. Identification of isolate similarity throughout different sources supports the necessity to understand the intricate relationship and interactions of the bacteria with humans, animals, and the environment. Especially because the high sequence homology among isolates from different geographic locations and host origin suggest an epidemiologic link. Core genome, dDDH, and 16S rRNA sequences indicate that M. chelonae is not a homogeneous species and that the current turtle type strain ATCC 35752 T and human ATCC 19237 represent two M. chelonae subspecies. Core genome comparison was the most discriminatory method for species identification, but concatenation of the complete hsp65 and partial rpoB genes produced similar results and could be used for identification purposes.