
Comprehensive analysis of genomic variation, pan-genome and biosynthetic potential of Corynebacterium glutamicum strains

  • Md. Shahedur Rahman,

    Roles Conceptualization, Data curation, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing – review & editing

    ms.rahman@just.edu.bd

    Affiliations Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh, Department of Genetic Engineering and Biotechnology, Bioinformatics and Microbial Biotechnology Laboratory, Jashore University of Science and Technology, Jashore, Bangladesh

  • Md. Ebrahim Khalil Shimul,

    Roles Formal analysis, Investigation, Software, Writing – original draft

    Affiliations Department of Genetic Engineering and Biotechnology, Jashore University of Science and Technology, Jashore, Bangladesh, Department of Genetic Engineering and Biotechnology, Bioinformatics and Microbial Biotechnology Laboratory, Jashore University of Science and Technology, Jashore, Bangladesh

  • Md. Anowar Khasru Parvez

    Roles Conceptualization, Methodology, Writing – review & editing

    Affiliation Department of Microbiology, Jahangirnagar University, Savar, Dhaka, Bangladesh

Abstract

Corynebacterium glutamicum is a non-pathogenic species of the Corynebacteriaceae family. It has been broadly used in industrial biotechnology for the production of valuable products. Though it is widely accepted at the industrial level, knowledge about the genomic diversity of the strains is limited. Here, we investigated the comparative genomic features of the strains and their pan-genomic characteristics. We also examined phylogenetic relationships among the strains based on average nucleotide identity (ANI). We found diversity between strains at the genomic and pan-genomic levels. Less than one-third of the C. glutamicum pan-genome consists of core genes and soft-core genes, whereas a large number of strain-specific genes cover about half of the total pan-genome. In addition, the C. glutamicum pan-genome is open and expanding, which indicates the possible addition of new gene families to the pan-genome. We also investigated the distribution of biosynthetic gene clusters (BGCs) among the strains and found slight variations in BGCs at the strain level. Several BGCs with the potential to express novel bioactive secondary metabolites were identified. Therefore, by utilizing the characteristic advantages of C. glutamicum, different strains can be potential candidates for natural drug discovery.

1. Introduction

Corynebacterium glutamicum is a gram-positive, non-sporulating, non-pathogenic, and generally recognized as safe (GRAS) organism. It remains very robust against oscillations in oxygen and substrate supply during large-scale fermentations [1,2]. For decades, it has been one of the most widely used microorganisms in industrial fermentation for producing amino acids such as lysine and glutamate [3,4]. C. glutamicum has undergone substantial modification to provide a wide range of beneficial products including chemicals, proteins, polymers, natural products, and biofuels [5–8]. Many studies of C. glutamicum have been published in the past decade [9], yet the genetic variations among the strains remain unexplored.

Whole genomes of closely related and geographically co-occurring microbial strains show enormous variation within species, resulting from allelic and gene content changes [10–13]. However, it is challenging to distinguish between two lineages that are thought to be the same species yet have significantly different gene contents using conventional taxonomic approaches [14–16]. Hence, a better understanding of the genomic characteristics of different C. glutamicum strains is required.

Genes for the production, control, and resistance of secondary metabolites are often grouped to create biosynthetic gene clusters (BGCs) in microbial genomes [17]. Utilization of bioinformatics tools for the analysis of microbial genome sequences reported that a single genome may include 20–80 distinct BGCs [18]. On the other hand, a microorganism may possess certain BGCs but it may not express them in laboratory conditions [19,20]. Research in this area will support wet lab methods development for natural product (NPs) producing strains that have greater potential to produce new compounds [18]. In 2017, Yang and Yang conducted a comparative analysis of C. glutamicum genomes, providing insights into the genetic diversity and evolutionary relationships within this significant industrial bacterium [21]. The research also pinpointed crucial mutations associated with amino acid production in various genetically engineered strains. However, certain limitations and challenges persist. Specifically, the pan-genome analysis was conducted with a relatively limited number of strains, potentially not encompassing the entire spectrum of species diversity. Furthermore, the identification of BGCs remains incomplete, highlighting areas for further investigation. So, it should be helpful to use functional genomic approaches to identify those unidentified BGCs at the genomic level. Therefore, the BGCs distribution and evolutionary connections among the C. glutamicum strains need to be explored. The primary aim of this study is to analyse pan-genomic variations within different strains and explore the distribution patterns of BGCs.

2. Materials and methods

2.1 Whole genome comparison

Genomic datasets of C. glutamicum strains were collected from the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/datasets, accessed on 30th May 2022). Initially, 65 complete genome sequences of C. glutamicum strains were retrieved in addition to the reference genome (in the NCBI database, C. glutamicum SCgG2 serves as the primary reference genome), all in FASTA format. The complete whole genome sequences of C. glutamicum were selected using the NCBI genome filter tool, with the assembly level set to "complete". The choice of genomes was guided by contemporary research emphasizing the pivotal role that high-quality genomes play in pan-genome and genome mining analyses [22]. Consequently, this study excluded draft and scaffold level assemblies to ensure the integrity and reliability of the genomic data under examination. The use of complete genomes enhances the reliability and comprehensiveness of the study’s findings, contributing to a more accurate understanding of the genetic diversity, functional capabilities, and evolution of C. glutamicum. Whole genome comparisons were then executed using OrthoANI v0.5.0 with default parameters, which uses an enhanced pairwise average nucleotide identity (ANI) algorithm [23]. After the comparison, we selected 30 complete genomes; the other 35 were discarded because they showed a 100% similarity match to a retained genome. The program was also used to clarify species boundaries and to assess diversity at the genetic level among whole genomes (Table 1). In this way, redundancy was avoided and the genetic diversity of C. glutamicum was represented.
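The redundancy filter described above can be sketched as follows. This is a minimal illustration, not the authors' script: the greedy strategy, the function name `drop_redundant`, and the toy strain labels and ANI values are all assumptions made for the example.

```python
def drop_redundant(names, ani, threshold=100.0):
    """Greedily keep a genome only if its ANI to every
    already-kept genome is below the threshold (percent)."""
    kept = []
    for i in range(len(names)):
        if all(ani[i][j] < threshold for j in kept):
            kept.append(i)
    return [names[i] for i in kept]

# Hypothetical labels and pairwise ANI values (percent), for illustration only.
names = ["A", "B", "C", "D"]
ani = [
    [100.0, 100.0, 98.1, 97.4],
    [100.0, 100.0, 98.1, 97.4],
    [98.1, 98.1, 100.0, 97.9],
    [97.4, 97.4, 97.9, 100.0],
]

selected = drop_redundant(names, ani)
print(selected)  # genome B duplicates A and is dropped
```

Which member of an identical group is retained depends on input order in this sketch; the text does not state how the authors chose among identical genomes.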

Table 1. List of C. glutamicum strains used in this study with their metadata.

https://doi.org/10.1371/journal.pone.0299588.t001

2.2 Genome annotation

The process of locating and designating all the pertinent features on a genomic sequence is known as genome annotation [40]. Selected whole genome sequences were re-annotated using Prokka v1.14.6 with default parameters [41]. Prokka uses BLAST+ to identify the best matches between candidate genes and annotated proteins from various databases [41]. Prokka and FragGeneScan v1.31 were both used with default parameters to identify the number of genes in each genome [42]. FragGeneScan uses a novel gene prediction technique that improves prediction of protein-coding regions in short reads by combining codon usage and sequencing error models in a Hidden Markov Model (HMM) [42].

2.3 Pan-genome analysis

Pan-genomic analysis was conducted utilizing Roary v3.11.2 (with default parameters), a robust computational tool specifically designed for such analyses. Roary classifies genes into distinct categories, including ’core genes’, ’cloud genes’, ’shell genes’, and ’soft-core genes’, employing a rigorous computational framework [43]. Bacterial Pan-genome Analysis tool (BPGA v1.3) [44] was employed for the systematic classification of orthologous genes into core, accessory, and unique genomes. Subsequently, strains containing a relatively higher number of unique genes were subjected to annotation using the blast algorithm against the Clusters of Orthologous Genes (COG) database [45]. To gain in-depth insights into the functional aspects of these genes, further analyses were conducted utilizing the blast algorithm against both the COG and Kyoto Encyclopedia of Genes and Genomes (KEGG) database [46]. The estimation of the pan-genome and core genome was performed using the USEARCH v11.0.667 [47] program available in BPGA, employing a 50% sequence identity cut-off. The resulting data were then subjected to nonlinear fitting based on the model extrapolation of the pan-genome and core genome, ensuring a robust and comprehensive analysis of the bacterial genomic elements under investigation [44,48].

2.4 Phylogeny

FastTree v2.1.11 (with default parameters) was used to generate the phylogenetic tree. It uses the maximum-likelihood method with generalized time-reversible (GTR) models of nucleotide evolution [49]. The online platform iTOL was used to visualize the phylogenetic tree [50].

2.5 Identification of BGCs

We used three platforms to predict BGCs, each of which predicts microbial secondary metabolite encoding regions using sophisticated computational models [51]. These were antiSMASH 6 (https://antismash.secondarymetabolites.org/, accessed on 9th June 2022) [52], PRISM 4 (http://prism.adapsyn.com, accessed on 28th June 2022) [53] and BAGEL4 (http://bagel4.molgenrug.nl, accessed on 29th June 2022) [54]. BGC boundaries in this study were detected using antiSMASH 6, a computational tool that employs several techniques. Firstly, antiSMASH determines BGC boundaries based on the physical distance to core domains within the analyzed sequences [55]. It utilizes ClusterCompare output, conducting a search of all gene products against a database comprising highly conserved enzyme Hidden Markov Model (HMM) profiles indicative of specific BGC types [56]. The tool applies pre-defined cluster rules to identify individual protoclusters encoded in the genomic region. To standardize gene locations, antiSMASH employs a reference genome as a common coordinate system, allowing for the normalization of gene positions. Additionally, antiSMASH maps genomes of other strains containing the same or similar BGCs to the reference genome through alignment tools. This enables the identification and comparison of genomic regions corresponding to the BGCs across different strains in relation to the reference genome [52]. PRISM 4 predicts BGCs by analysing open reading frames from various databases [53]. BAGEL4 identifies ribosomally synthesized and post-translationally modified peptides (RiPPs) and bacteriocins. It discovers gene clusters by using a peptide database and/or HMM motifs that are present in relevant contextual genes, augmented with literature references and links to UniProt and NCBI [54].

2.6 Genomic analysis and single nucleotide polymorphism identification

Genome comparisons among C. glutamicum strains were conducted using BLAST Ring Image Generator (BRIG-0.95-dist) with default settings. BRIG facilitates the assessment of genotypic distinctions within closely related prokaryotic organisms [57]. We also used the Mauve genome alignment system to analyze C. glutamicum strains [58]. Throughout evolution, microbial genomes can experience substantial mutations, including rearrangements and lateral transfers, leading to notable differences in gene order and content among closely related organisms. Mauve was employed to identify these events, enabling comprehensive comparisons of multiple microbial genomes, even in the presence of high recombination rates. The Mauve system was run with default settings, using the default seed weight, full alignment, and iterative refinement.

In our study, we utilized single nucleotide polymorphism (SNP) analysis as a methodology to discern genetic variations within the strains of C. glutamicum. The identification of variants among these strains was conducted through the implementation of Snippy v4.6.0 [59], with the reference sequence being C. glutamicum SCgG2. Notably, the prediction of Core SNPs was an additional aspect addressed in our analysis, employing the same Snippy tool for this specific task. This comprehensive approach allowed for a detailed exploration of genetic diversity and core variations within the C. glutamicum strains under investigation.

2.7 Identification of horizontal gene transfer

The prediction of horizontally transferred genes was carried out using HGTector v2.0b3 (with default settings) [60]. The analysis focused on identifying horizontal gene transfer (HGT) events within C. glutamicum AJ1511 and C. glutamicum AR1 genomes. A search was conducted utilizing the default remote database with stringent criteria, requiring a minimum identity and coverage of greater than 50%. The analysis was executed with default parameters to ensure comprehensive and accurate detection of potential HGT events in the studied strains.
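The identity and coverage criteria mentioned above amount to a simple threshold filter on homology search hits. The sketch below shows only that filtering step; it is not HGTector's algorithm, and the record layout, field names, and hit values are hypothetical.

```python
def passes_cutoffs(hit, min_identity=50.0, min_coverage=50.0):
    """Keep a hit only if both identity and coverage exceed the cut-offs (percent)."""
    return hit["identity"] > min_identity and hit["coverage"] > min_coverage

# Hypothetical search hits for one query protein, for illustration only.
hits = [
    {"subject": "hit_1", "identity": 82.0, "coverage": 95.0},
    {"subject": "hit_2", "identity": 48.5, "coverage": 90.0},  # identity too low
    {"subject": "hit_3", "identity": 67.0, "coverage": 41.0},  # coverage too low
    {"subject": "hit_4", "identity": 50.0, "coverage": 80.0},  # not strictly greater than 50
]

retained = [h["subject"] for h in hits if passes_cutoffs(h)]
print(retained)
```

The comparison is strict here because the text says "greater than 50%"; whether the tool treats the boundary as inclusive is not stated.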

2.8 Pathogenic and non-pathogenic properties and plasmid typing

The prediction of pathogenicity for the chosen strains was carried out using the PathogenFinder web tool [61]. This tool employs a predictive model that considers both the probability score and the resemblance to known pathogenic species in order to assess the likelihood of pathogenicity.

The plasmid sequences were obtained from the NCBI database. To ascertain the classification of plasmids, Plasmid Multi-Locus Sequence Typing (Plasmid MLST) was employed. Plasmid MLST is a molecular typing method that analyzes specific genetic markers across plasmids, providing insights into their type and lineage. This approach aids in categorizing plasmids based on their sequence diversity and assists in understanding the genetic variation and relationships among different plasmid strains.

3. Results

Metadata for the strains used in this study are listed in Table 1. Among the strains, 11 were isolated from soil, and the others were isolated from air, mucus, or rotten onion, or were lab strains. Twenty strains were isolated from Asian countries, 2 strains from Germany, and 1 strain each from Portugal and the United States of America. The origins of the remaining strains are unknown.

3.1 Whole genome comparison

The degree of relatedness among the studied strains was assessed by calculating ANI. ANI also indicates whether genomes belong to the same species, with a cut-off value of ≥ 95% for the same species. Our studied genomes showed ANI values higher than 97%, confirming that all the strains belong to the same species (S1 Table). A heat-map generated from the ANI scores is shown in Fig 1. The heat-map contains five sub-groups, which we refer to as five clades. The clades were extracted from the pairwise ANIs by using a hierarchical clustering algorithm with a distance cut-off of 0.5, meaning that clusters separated by a distance of 0.5 or less were merged into the same clade. The clades do not seem to have a strong correlation with the source and geographic location of the genomes. For example, clade 1 contains strains from soil and mucus sources, and from China and the USA. Clade 2 contains strains from soil sources, and from Germany, China, South Korea, and Portugal. Clade 3 contains strains from soil, air and lab sources, and from China and Japan. Clade 4 contains strains from soil and rotten onion sources, and from China and South Korea. Clade 5 contains strains from soil sources, and from South Korea. Strains belonging to clade 1 (R, SCgG1, SCgG2) exhibit a large genome size, multiple copies of NAPAA biosynthetic gene clusters (BGCs), and betalactone BGCs. These characteristics may contribute to their distinctiveness as outliers within the broader spectrum of analyzed genomes.

Fig 1.

(A) ANI based whole genome comparison of C. glutamicum strains. The linkage method was average linkage, which calculates the average distance between all pairs of points in two clusters. The distance metric was Euclidean distance, which measures the straight-line distance between two points in a multidimensional space. The distance threshold was 0.5, which means that clusters with a distance less than or equal to 0.5 were merged together. This resulted in five clades, as shown by the horizontal dashed line in the plot. (B) ANI comparisons conducted among strains isolated from soil environments and strains isolated from both soil and non-soil environments within C. glutamicum.

https://doi.org/10.1371/journal.pone.0299588.g001
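The clustering settings given in the Fig 1 caption (average linkage, Euclidean distance, distance threshold 0.5) can be illustrated with a small pure-Python sketch. It assumes that each genome is represented by its row of pairwise ANI values in percent; the scale actually used by the authors is not stated, and the toy matrix below is hypothetical.

```python
import math

def euclidean(u, v):
    """Straight-line distance between two ANI profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def average_linkage(rows, threshold):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest average pairwise distance until that distance exceeds threshold."""
    clusters = [[i] for i in range(len(rows))]

    def dist(c1, c2):
        total = sum(euclidean(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = [(dist(clusters[a], clusters[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        d, a, b = min(pairs)
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical 4-genome ANI matrix (percent), for illustration only.
ani_rows = [
    [100.0, 99.8, 97.5, 97.4],
    [99.8, 100.0, 97.5, 97.4],
    [97.5, 97.5, 100.0, 99.7],
    [97.4, 97.4, 99.7, 100.0],
]

clades = average_linkage(ani_rows, threshold=0.5)
print(clades)
```

In practice a library routine such as SciPy's hierarchical clustering would normally be used; the hand-written loop is kept here only so the merge rule is visible.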

Fig 1B represents the ANI comparisons among various C. glutamicum strains isolated from different sources, including soil and non-soil environments. The ANI values revealed significant insights into the genetic relationships among these strains, shedding light on the impact of isolation sources on their genetic similarity. When we performed a more detailed ANI analysis, we observed that the strains isolated from soil environments, such as C. glutamicum XV, C. glutamicum ZL, and C. glutamicum YI, exhibited ANI values close to 98%, indicating a high genetic similarity. This suggests a common genetic background among these soil-isolated strains. On the other hand, when comparing soil-isolated strains with those from non-soil sources, the ANI values were notably lower, hovering around 97%. This discrepancy underscores the genetic divergence between strains from soil and non-soil origins. Such divergence could potentially be attributed to environmental factors and selective pressures specific to these habitats, leading to genetic adaptations unique to each niche.

3.2 Comparative genomic features of C. glutamicum strains

The average genome size of C. glutamicum strains was 3.24 Mbp (ranging from 2.84 Mbp to 3.36 Mbp) (Fig 2A and S2 Table). Predicted coding sequence (CDS) counts ranged from 2610 to 3200, with a mean of 3007 CDS per genome (Fig 2A and S2 Table). The average GC content was 54.15% among the genomes, and the number of tRNA genes ranged from 57 to 65. Eighteen rRNA genes were predicted in 28 strains, whereas strains TCCC11822 and YI each had 15 rRNA genes (Fig 2B and S2 Table).

Fig 2. Overview of genomic features.

(A) Genome size and CDS. (B) Genomic features (tRNA, rRNA and GC content (%)). (C) Gene number.

https://doi.org/10.1371/journal.pone.0299588.g002

The gene count, as determined by Prokka, displayed a range of 2688 to 3281 genes, with a mean value of 3078 genes per genome. In contrast, gene predictions through FragGeneScan exhibited a range of 2778 to 3369 genes, yielding a mean of 3197 genes per genome. It is noteworthy that Prokka’s predictions resulted in a comparatively lower gene count than those obtained via FragGeneScan (Fig 2C and S2 Table).

3.3 Pan-genome analysis

Roary analysis predicted a total of 6854 protein-coding gene sequences. Core genes accounted for 29% (99% ≤ strains ≤ 100%), soft-core genes for 2.86% (95% ≤ strains < 99%), shell genes for 20.27% (15% ≤ strains < 95%), and cloud genes for 47.78% (0% ≤ strains < 15%) (Fig 3A).
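The four categories are defined purely by the fraction of strains that carry a gene, so the rule can be written out directly. The sketch below is an illustration under that definition, not Roary's implementation; the gene names and strain counts in the example are hypothetical.

```python
from collections import Counter

def classify_gene(n_present, n_strains):
    """Assign a pan-genome category from the fraction of strains carrying the gene."""
    frac = n_present / n_strains
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft core"
    if frac >= 0.15:
        return "shell"
    return "cloud"

# Hypothetical genes mapped to the number of strains (out of 30) that carry them.
presence = {"geneA": 30, "geneB": 29, "geneC": 28, "geneD": 5, "geneE": 4, "geneF": 1}
labels = {gene: classify_gene(n, 30) for gene, n in presence.items()}
summary = Counter(labels.values())
print(labels)
print(summary)
```

With 30 genomes, these cut-offs mean that a core gene must be present in all 30 strains and a soft-core gene in exactly 29, which is one reason the soft-core fraction is small.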

Fig 3. Pan-genome analysis of C. glutamicum.

(A) The number of core genes, soft core genes, shell genes and cloud genes in the pan genome. (B) Gene frequency versus genomes number. (C) The pan genome profile trends obtained using BPGA v1.3. (D) Genomic G+C content (%) and accessory gene counts in various C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.g003

The high number of cloud genes, which vary considerably among strains, reflects the ’open’ nature of the C. glutamicum pan-genome (Fig 3B). The pan-genome of C. glutamicum was analysed using an empirical power law regression function based on the Allometric1 model (f(x) = 3059.17 x^{0.136303}). The fitted exponent (0.136303) falls between 0 and 1, which indicates that the pan-genome grows more slowly than that of other bacteria (possibly due to slower genetic diversification) but will nonetheless keep growing as genomes are added (Fig 3C). In the context of Heaps’ law, an ’open’ pan-genome suggests the presence of a substantial and indeterminate number of additional genes, with its size potentially increasing boundlessly as more strains are included in the analysis [62–64]. C. glutamicum strains TQ2223, ATCC 13032, and HA exhibit a relatively low GC content coupled with a notable abundance of accessory genes (883, 925, and 924, respectively). Among the strains, C. glutamicum SCgG2 displays the lowest GC content and possesses the highest number of accessory genes (Fig 3D). Fig 4A and 4B illustrate the distribution of COG and KEGG categories for core, accessory, and unique genes. Fig 4C displays the phylogenetic relationships among C. glutamicum strains based on core genes.
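The power-law model can be explored numerically. The sketch below evaluates the reported function and recovers its parameters from synthetic points with a log-log linear least-squares fit. That fit is a simplification swapped in for illustration: BPGA performs a nonlinear fit, and the helper names here are assumptions.

```python
import math

K, GAMMA = 3059.17, 0.136303  # fitted values reported in the text

def pan_genome_size(n_genomes, k=K, gamma=GAMMA):
    """Power-law estimate f(x) = k * x**gamma of the pan-genome size."""
    return k * n_genomes ** gamma

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = log(k) + gamma * log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    gamma = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    k = math.exp(my - gamma * mx)
    return k, gamma

# Synthetic points generated from the reported curve for 1 to 30 genomes.
xs = list(range(1, 31))
ys = [pan_genome_size(x) for x in xs]
k_fit, gamma_fit = fit_power_law(xs, ys)

# An exponent strictly between 0 and 1 is read as an open pan-genome.
is_open = 0 < gamma_fit < 1
print(k_fit, gamma_fit, is_open)
```

Because the exponent is positive, the curve never levels off, although each added genome contributes fewer new gene families than the one before.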

Fig 4.

(A) COG distribution of core, accessory and unique genes. (B) KEGG distribution of core, accessory and unique genes. (C) Phylogenetic analysis of C. glutamicum strains based on core genes.

https://doi.org/10.1371/journal.pone.0299588.g004

The core genome is primarily associated with essential biological functions such as amino acid transport and metabolism, translation, ribosomal structure and biogenesis, transcription, carbohydrate transport and metabolism, inorganic ion transport and metabolism, and post-translational modification, protein turnover, and chaperones. Simultaneously, the number of unique genes within C. glutamicum genomes varied significantly, indicating individual differences and a relatively high level of genomic diversity. This variability suggests their potential adaptation to diverse and extreme environments. Furthermore, KEGG pathway analysis revealed that these unique genes are involved in various biological processes related to metabolism, environmental information processing, and cellular processes.

3.4 Diversity and abundance of potential BGCs

AntiSMASH prediction identified six different classes of BGCs among the whole genomes. The identified BGCs include terpene, non-alpha poly-amino acids such as ε-polylysine (NAPAA), betalactone, type 1 polyketide synthase (T1PKS), other unspecified ribosomally synthesized and post-translationally modified peptide product clusters (RiPP-like), and lanthipeptide class IV. Terpene synthesis BGCs were the most abundant BGCs in the genomes. NAPAA and T1PKS were the second most abundant BGCs, and collectively these 3 BGC classes (terpene, NAPAA, and T1PKS) comprised over 87% of the BGCs among the 30 strains of C. glutamicum (Fig 5 and S3 Table).

Fig 5. BGCs among C. glutamicum strains.

(A) Distribution of different classes of BGCs among C. glutamicum strains. (B) BGCs frequency per genome. (C) Different classes of BGCs occurrence in the genomes. (D) BGCs of C. glutamicum AR1.

https://doi.org/10.1371/journal.pone.0299588.g005

The strains harbour 2 to 6 BGCs, and most of them (21 strains) harbour 4 BGCs. The highest number, 6 BGCs, was found in strain B253 and the lowest, 2 BGCs, in strain C1. In addition, 6 strains (BE, 14067, YI, R, SCgG1, and SCgG2) contain 5 BGCs in their genomes (Fig 5B and 5C). Betalactone and lanthipeptide class IV BGCs were the least common BGCs: betalactone BGCs were predicted in strains R, B253, SCgG1, and SCgG2, while lanthipeptide class IV BGCs were found only in strain B253 (Fig 5C and S4 Table). Twenty strains were also found to harbour two copies of the same BGC class. Terpene class BGCs were duplicated in up to 13 strains, exemplified by strain AR1, which possesses two distinct terpene class BGCs (identified as CGLAR1_11505 and CGLAR1_03580). Both clusters encode phytoene synthase, yet their gene products differ in size, with lengths of 287 and 304 amino acids, respectively (Fig 5D). In contrast, NAPAA class BGCs were duplicated in 7 strains, as illustrated in Fig 5C and detailed in S4 Table. The antiSMASH analysis of C. glutamicum strains identified terpene BGCs across different clades, demonstrating considerable diversity in encoded compounds. In Clade 1, strains such as CGMCC1.15647, USDA-ARS-USMARC-56828, R, SCgG2, and SCgG1 encoded phytoene/squalene synthase. CGMCC1.15647 exhibited multiple copies of phytoene/squalene synthase genes in different genome regions, indicating intra-strain variability. Clade 2 strains, including ATCC_13032, HA, TQ2223, MB001, CR101, and BCA, showed varying similarity scores for phytoene synthase, highlighting potential enzyme differences. Clade 3 strains, except C1, encoded phytoene synthase, with variations seen in strains JH41 and B414. ATCC_21573 encoded phytoene/squalene synthase. These findings suggest nuanced terpene production within clades, with variability in gene length and location across strains, underscoring the diversity of C. glutamicum terpene biosynthesis (Table 2).

Table 2. Predicted terpene BGCs in different clades of C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.t002

The analysis conducted using BAGEL4 showcased the antimicrobial capabilities within C. glutamicum strains. Clade 1 and clade 2 strains were devoid of identifiable specific bacteriocin BGCs. Clade 3 and clade 4 strains exhibited a shared putative bacteriocin BGC named "Lactococcin_972," indicating potential similar antimicrobial characteristics. Conversely, Clade 5, akin to clade 1 and 2, did not demonstrate distinct bacteriocin BGCs (Table 3).

Table 3. Predicted bacteriocin BGC in C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.t003

The number of BGCs shows a moderate positive correlation with genome size and with total gene count (R² = 0.349 and R² = 0.358, respectively) (Fig 6). The diversity of BGCs among the strains, together with their phylogenetic relationships, is shown in five clades (Fig 7).
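For a simple linear relationship, the coefficient of determination is the square of the Pearson correlation coefficient, so the reported values of 0.349 and 0.358 correspond to correlation coefficients of roughly 0.59 and 0.60. The sketch below shows the calculation; the genome sizes and BGC counts are hypothetical values chosen for illustration, not the study data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)

# Hypothetical genome sizes (Mbp) and BGC counts, for illustration only.
genome_size = [2.84, 3.10, 3.24, 3.30, 3.36]
bgc_count = [2, 4, 4, 5, 6]

r2 = r_squared(genome_size, bgc_count)
print(round(r2, 3))
```

An R² near 0.35 means that genome size or gene count accounts for about a third of the variation in BGC number, leaving most of it unexplained.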

Fig 6. BGCs distribution in C. glutamicum strains.

(A) Correlation of BGCs and genome size. (B) Correlation of BGCs and gene number.

https://doi.org/10.1371/journal.pone.0299588.g006

Fig 7. Major classes of BGCs in the genomes of C. glutamicum strains with phylogenetic distribution.

These BGC classes are categorized into five clades, each delineated based on their specific biosynthetic gene content.

https://doi.org/10.1371/journal.pone.0299588.g007

Additionally, strains B253, R, SCgG1, and SCgG2 contain hybrid BGCs composed of NAPAA and betalactone, but the locations of these NAPAA-betalactone hybrid BGCs differ among the genomes. The locations are 256,574–294,301 bp in strain B253, 334,064–369,207 bp in strain R, 319,462–354,607 bp in strain SCgG1, and 319,463–354,608 bp in strain SCgG2 (Fig 8).

Fig 8. Hybrid BGC structure in C. glutamicum strains.

The hybrid BGCs in the four strains share the same NAPAA-betalactone structure. The NAPAA-betalactone clusters are located at 256,574–294,301 bp in strain B253, 334,064–369,207 bp in strain R, 319,462–354,607 bp in strain SCgG1, and 319,463–354,608 bp in strain SCgG2.

https://doi.org/10.1371/journal.pone.0299588.g008

PRISM 4 identified 4 major classes of BGCs which were polyketide, nonribosomal peptide, dehydratase, class II/III confident bacteriocin. Polyketide and nonribosomal peptide BGCs were present in all strains, while dehydratase were found in 21 strains and class II/III confident bacteriocin were found in 12 strains of C. glutamicum (S5 Table).

In addition, genome mining by BAGEL4 revealed bacteriocin coding clusters in 12 strains (S5 Table). The BGCs identified for each strain by the different online platforms are listed in Table 4.

Table 4. Different hits of BGCs from different genome mining tools using C. glutamicum genomes.

https://doi.org/10.1371/journal.pone.0299588.t004

3.5 Genomic and SNP analysis

In this study, we employed BRIG-0.95 for comprehensive genome comparisons among various strains of C. glutamicum. The reference genome, C. glutamicum SCgG2, was utilized as a baseline for these comparisons. Notably, a substantial portion of genes present in SCgG2 were found to be shared by the other strains, indicating a core genomic similarity among these strains.

However, a detailed examination of the genomic alignments revealed significant disparities between SCgG2 and other strains, as denoted by white gaps in Fig 9. These gaps signify regions where genes were absent in certain strains, indicating potential genetic variations. Such discrepancies could be attributed to the integration of mobile genetic elements, horizontal gene transfer events, or recombination phenomena. These mechanisms are known to drive genetic diversification in bacterial populations, leading to the acquisition or loss of specific genes over evolutionary time. The identification of these genomic variances underscores the dynamic nature of C. glutamicum genomes and highlights the genomic plasticity within this bacterial species.

Fig 9. BRIG Diagram illustrating homologous chromosome segments of C. glutamicum strains using strain SCgG2 as the reference genome.

https://doi.org/10.1371/journal.pone.0299588.g009

Fig 10 illustrates the output from pairwise whole-genome Mauve alignments, confirming the presence of significant structural variations among the genomes of the analysed strains. In each comparison, matching coloured blocks and connecting lines delineate homologous genome sections between the compared pairs. Notably, strains TCCC11822, TQ2223, BCA, CR101, HA, and ATCC 21573 exhibited the most significant variations, indicating diverse genomic structures within these strains. These visual cues provide insights into the shared genomic regions and structural differences between the analysed strains.

Fig 10. The pairwise whole-genome Mauve alignment analysis revealed substantial structural variations within the circular chromosomes of C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.g010

SNP analysis of the C. glutamicum strains provided insights into their genetic diversity. Table 5 presents the genetic variants identified in each strain. Notably, C. glutamicum USDA-ARS-USMARC-56828 exhibited the highest number of total variants (41270), characterized by substantial counts of complex variants (7908) and SNPs (32960). This strain displayed a significant divergence compared to the others. Conversely, strains like C. glutamicum SCgG1 showed minimal variation, with only 28 total variants. Several strains, such as C. glutamicum R, displayed a relatively low total variant count (20433) and a notable prevalence of deletions (141) and insertions (130). These findings underscore the genetic diversity within C. glutamicum strains, with certain strains exhibiting distinctive patterns of variation, potentially influencing their biological characteristics. The presence of unique SNPs in each strain suggests specific genomic changes, potentially influencing their functional attributes and ecological roles.
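Per-strain totals of this kind can be obtained by tallying the variant type of each call. The sketch below counts types in a small tab-separated table; the column layout (a `TYPE` column with values such as `snp`, `ins`, `del`, and `complex`) is assumed to resemble Snippy's tabular output, and the rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical variant table; column names and rows are assumptions for this example.
SAMPLE = (
    "CHROM\tPOS\tTYPE\tREF\tALT\n"
    "chr\t1021\tsnp\tA\tG\n"
    "chr\t2388\tdel\tCT\tC\n"
    "chr\t5120\tsnp\tT\tC\n"
    "chr\t7743\tcomplex\tGAT\tCAC\n"
    "chr\t9001\tins\tA\tAT\n"
)

def count_variant_types(text):
    """Tally variants by the value in the TYPE column of a tab-separated table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return Counter(row["TYPE"] for row in reader)

counts = count_variant_types(SAMPLE)
total = sum(counts.values())
print(dict(counts), total)
```

For a real file, the same function could be applied to the text read from disk, one strain at a time, to build a table like Table 5.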

The phylogenetic tree based on core SNP analysis (Fig 11) delineates the evolutionary relationships among the C. glutamicum strains. The tree is rooted with the reference strain (SCgG2). Distinct clusters and branches denote genetic proximity. For instance, strains AJ1511, WM001, and TCCC11822 form a cluster, suggesting shared ancestry. Similarly, ZL-6 and ATCC 21799 are closely related. The tree also shows a bifurcation between B253 and its cluster, which includes BE, ATCC 14067, and YI, reflecting their divergence. Further branching shows the relationships among the remaining strains. Rooting the tree with the reference strain places the variation of the other strains in a common context. Overall, the tree provides a visual representation of the genetic distances and relationships among the C. glutamicum strains.

Fig 11. Visualizing the phylogeny of C. glutamicum strains based on core SNP genes.

https://doi.org/10.1371/journal.pone.0299588.g011

3.6 Horizontal gene transfer

Using the HGTector tool, we detected a substantial number of putative HGT events in the genomes of C. glutamicum strains. In strain AJ1511, 684 distinct HGT events were identified among 3014 predicted proteins. These events were predominantly attributed to Actinomycetes (71%) and, to a lesser extent, Micrococcales (21%). Similarly, in strain AR1, 237 of the 2759 proteins analysed were predicted to have undergone HGT, mostly from Actinomycetes (73%), with a smaller fraction from Micrococcales (23%), as illustrated in Fig 12. Given the prevalence of HGT events in AJ1511 and AR1, it is likely that other C. glutamicum strains would also reveal a mosaic of genetic origins. The genomic plasticity observed in these two strains is consistent with adaptive strategies in C. glutamicum populations and emphasizes the role of HGT in shaping their genetic repertoire.
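
To put the two counts on a common scale, the share of predicted proteins flagged as horizontally transferred can be computed directly from the numbers reported above. The short Python sketch below does only this arithmetic; the function name `hgt_fraction` is introduced here for illustration and is not part of HGTector.

```python
# Counts reported in the text: (putative HGT-derived genes, predicted proteins).
reported = {
    "AJ1511": (684, 3014),
    "AR1": (237, 2759),
}


def hgt_fraction(hgt_genes: int, total_proteins: int) -> float:
    """Return the share of predicted proteins flagged as putatively transferred."""
    if total_proteins <= 0:
        raise ValueError("total_proteins must be positive")
    return hgt_genes / total_proteins


for strain, (hgt, total) in reported.items():
    print(f"{strain}: {hgt}/{total} = {hgt_fraction(hgt, total):.1%}")
```

By this measure roughly 23% of the AJ1511 proteins and roughly 9% of the AR1 proteins were flagged, so the two strains differ markedly in the predicted contribution of HGT.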

Fig 12. HGT events in C. glutamicum AJ1511 and AR1 Strains.

(A) Scatter plot illustrating horizontally transferred genes in AJ1511. (B) Distribution of donor organisms and the corresponding number of genes transferred in AJ1511. (C) Scatter plot showcasing horizontally transferred genes in AR1. (D) Distribution of donor organisms and the corresponding number of genes transferred in AR1. In the scatter plots, coloured dots represent horizontally transferred genes and colourless dots represent native genes.

https://doi.org/10.1371/journal.pone.0299588.g012

3.7 Pathogenicity, virulence properties and plasmid analysis

The investigation revealed that none of the C. glutamicum strains exhibited characteristics indicative of human pathogenicity. These findings are presented in detail in Table 6. This underscores the non-pathogenic nature of the examined C. glutamicum strains with respect to human health. It is noteworthy that non-pathogenic bacteria lack the genetic elements associated with virulence, which is consistent with their inability to cause infections or diseases in humans.

Table 6. Pathogenicity prediction results for various C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.t006

The plasmid analysis across different strains of C. glutamicum revealed diverse characteristics. Strains CP, XV, B253, USDA-ARS-USMARC-56828, AR1, ATCC_21831, and ATCC_13869 were found to harbor IncA/C type plasmids, with varying lengths and GC content (Table 7). Notably, strains R and CGMCC1.15647 exhibited distinct plasmid types, namely IncI1 and IncHI1, respectively, and displayed substantial variations in plasmid sizes. The gene content of these plasmids varied among strains, encompassing differences in coding sequences (CDSs), pseudogenes, CRISPR arrays, rRNAs, tRNAs, ncRNAs, and frameshifted genes. Among the strains analyzed, 10 were reported to carry single plasmids, while C. glutamicum CGMCC1.15647 was unique in carrying two plasmids. The prevalent IncA/C type plasmid, found in the majority of plasmid-bearing strains, is known for its role in modulating changes to bacterial host chromosomes. In contrast, C. glutamicum R carries an IncI1 type plasmid, which encodes sex pili in bacteria. The IncHI1 type plasmid is associated with antibiotic resistance. This analysis underscores the diversity and functional significance of plasmids in C. glutamicum strains.

Table 7. Plasmid characteristics in different C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.t007

4. Discussion

Whole genome comparison by ANI calculation revealed a high degree of relatedness between C. glutamicum strains. ANI scores higher than 97% verify that the studied genomes belong to the same species and are closely related. ANI comparison between Corynebacterium cystitidis strains showed a 95.1% score when the strains were isolated from different hosts but a >99% score when they were isolated from the same host [65]. The isolation sources of our strains are also consistent with a >97% score, since most of our strains are from soil.
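
The species-level check described above amounts to confirming that every pairwise ANI value exceeds a chosen cut-off. A minimal Python sketch of that check follows; the strain labels and ANI values are hypothetical placeholders, not values from S1 Table, and `all_pairs_above` is a helper name introduced here for illustration.

```python
from itertools import combinations


def all_pairs_above(ani, strains, threshold):
    """Return True if every unordered strain pair has ANI >= threshold."""
    for a, b in combinations(strains, 2):
        # Accept either key order for a pair.
        value = ani.get((a, b), ani.get((b, a)))
        if value is None:
            raise KeyError(f"missing ANI value for {a} vs {b}")
        if value < threshold:
            return False
    return True


# Hypothetical pairwise ANI values in percent (illustration only).
example_ani = {
    ("strainA", "strainB"): 99.2,
    ("strainA", "strainC"): 97.8,
    ("strainB", "strainC"): 98.1,
}
strains = ["strainA", "strainB", "strainC"]

print(all_pairs_above(example_ani, strains, 97.0))
print(all_pairs_above(example_ani, strains, 98.0))
```

With these placeholder values the first call passes the 97% cut-off and the second fails at 98%, which shows how sensitive the conclusion is to the threshold chosen.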

The average genome size (3.24 Mbp) of the strains was slightly higher than that of the non-pathogenic C. casei LMG S-19264 (3.11 Mbp) and C. efficiens YS-314 (3.15 Mbp) [66]. Moreover, the average number of genes (3197) was also higher than in C. casei LMG S-19264 (2872) and C. efficiens YS-314 (3064) [66]. On the other hand, the average GC content was lower among C. glutamicum strains (54.15%) than in the non-pathogenic C. variabile DSM 44702 (76.1%) and C. efficiens YS-314 (69.93%) [67]. The number of tRNA genes varied from 57 to 65 among the C. glutamicum strains, whereas C. variabile DSM 44702 and C. efficiens YS-314 contain 59 and 56 tRNA genes, respectively [67]. Additionally, C. glutamicum strains possess more rRNA genes (15–18 rRNA genes) than the non-pathogenic Brevibacterium aurantiacum strains and Brevibacterium linens ATCC 19391 (12 rRNA genes) [68]. Besides, the average number of CDSs among the strains was 3007, higher than in C. efficiens YS-314 (2950 CDSs) [69].

Pan-genome studies of Corynebacterium at the genus level have shown a very low number of core genes [66,67]. An analysis of 51 strains of various pathogenic and non-pathogenic species of the Corynebacterium genus showed 8.69% core genes [66]. Similarly, a study of eleven Corynebacterium species showed 6.68% core genes [67]. In contrast to the genus level, we found 29.1% core genes at the sub-species level among C. glutamicum strains, which is somewhat higher than the proportion of C. pseudotuberculosis core genes (26.1%) at the sub-species level [70]. The number of cloud genes (strain-specific genes) was considerably large and covered 47.78% of the pan-genome, similar to C. pseudotuberculosis cloud genes (42.34%) [70]. The low percentage of core genes in C. glutamicum likely results from a combination of factors such as horizontal gene transfer, adaptation to diverse environments, evolutionary divergence, and specialization. From an evolutionary perspective, this genetic diversity contributes to the species’ ability to adapt, survive, and thrive in different ecological niches, and it demonstrates the diversity among the strains. Large accessory genomes and a high number of strain-specific genes are frequently linked to horizontal gene transfer (HGT) in microorganisms [71]. Besides, we found that the GC content of C. glutamicum strains is low compared with other non-pathogenic species of the Corynebacterium genus. Our study also suggests an inverse relation between the abundance of accessory genes and the genomic GC content: as the GC percentage increases, the number of accessory genes decreases. This finding supports the idea of a possible relation of low GC content with horizontal gene transfer and codon reassignment in C. glutamicum [72–75].
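
The core, soft-core, shell and cloud percentages discussed above come from binning each gene family by the fraction of strains that carry it. The Python sketch below illustrates that binning using the category boundaries of Roary's summary statistics (core at 99% or more of strains, soft-core from 95% to below 99%, shell from 15% to below 95%, cloud below 15%). Whether the study used exactly these settings is an assumption, and the gene names and counts are hypothetical. Note that under these bins a "cloud" gene in a 30-strain set may occur in up to four strains, so it is not strictly strain-specific.

```python
from collections import Counter


def classify_gene(n_present: int, n_strains: int) -> str:
    """Assign a pan-genome category from the number of strains carrying a gene."""
    if not 0 < n_present <= n_strains:
        raise ValueError("n_present must be between 1 and n_strains")
    frac = n_present / n_strains
    if frac >= 0.99:
        return "core"
    if frac >= 0.95:
        return "soft-core"
    if frac >= 0.15:
        return "shell"
    return "cloud"


def summarise(presence_counts: dict, n_strains: int) -> dict:
    """Return the percentage of gene families falling in each category."""
    tally = Counter(classify_gene(n, n_strains) for n in presence_counts.values())
    total = sum(tally.values())
    return {category: 100 * count / total for category, count in tally.items()}


# Hypothetical gene families: name -> number of strains (out of 30) carrying it.
toy_counts = {"geneA": 30, "geneB": 29, "geneC": 12, "geneD": 1, "geneE": 4}
print(summarise(toy_counts, 30))
```

In practice the per-gene strain counts would be read from a gene presence/absence table rather than typed in by hand.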

Our study shows the open nature of the C. glutamicum pan-genome, which indicates that new gene families will continue to be added to the pan-genome as further genomes are sequenced. The open pan-genome of Corynebacterium at the genus level was also reported by the pan-genomic analysis of 40 strains of eleven different Corynebacterium species [76]. Thus, the pan-genome of C. glutamicum indicates the diversity of the gene pool and the likelihood of an increasing gene number.
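
A common way to judge whether a pan-genome is open is to fit the pan-genome size against the number of genomes with a power law, f(n) = a·n^b, and to read an exponent between 0 and 1 as open. The Python sketch below fits that curve by ordinary least squares on log-transformed values. This is a simplification and not necessarily the fitting procedure used by BPGA, and the input curve is synthetic, generated from assumed parameters rather than from the study's data.

```python
import math


def fit_power_law(n_genomes, pan_sizes):
    """Fit log(y) = log(a) + b*log(x) by least squares and return (a, b)."""
    xs = [math.log(n) for n in n_genomes]
    ys = [math.log(y) for y in pan_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic pan-genome curve for 30 genomes (assumed a = 3000, b = 0.3).
n = list(range(1, 31))
sizes = [3000 * k ** 0.3 for k in n]

a, b = fit_power_law(n, sizes)
print(f"a = {a:.0f}, b = {b:.2f}, open = {0 < b < 1}")
```

With real data the points would be the mean pan-genome sizes over random genome orderings, and the fitted exponent would carry sampling uncertainty that this sketch ignores.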

Another objective of our study was to uncover the diversity and distribution of BGCs among the strains. Although the metabolic products of these BGCs remain undocumented, bioinformatic predictions indicate that several of them might encode products with unique structures [77–79]. Thus, we used computational BGC prediction as a screening step for new bioactive compounds, to be followed up in wet-laboratory experiments.

NAPAA of Nonribosomal peptide synthetases (NRPs) gene clusters and T1PKS of Polyketide synthases (PKSs) gene clusters were found in all the studied strains. Additionally, Terpene BGCs were found in 96.67% of the strains (29 of 30). T1PKS, Terpene, NAPAA and other NRPs were also the most common BGCs in Gordonia hongkongensis EUFUS-Z298 [80], Burkholderia spp. [18], the activated sludge microbiome [81], and Ktedonobacteria [82]. NAPAA, particularly ε-poly-lysine, demonstrate notable antimicrobial efficacy and are widely used in the food and pharmaceutical sectors. T1PKS, in turn, can biosynthesize polyketides with antibiotic and antitumor properties. Terpenoids exhibit robust and specific biological activities, notably against diseases such as cancer and malaria. The consistency of a limited number of BGCs among closely related bacterial populations was previously reported [83], which indicates that BGC ‘fixation’ can occur under strong positive selection, with the encoded products aiding survival in specific environments [17]. The novel BGCs identified in the analysed strains include betalactone and lanthipeptide class IV BGCs. Betalactone BGCs were predicted in strains R, B253, SCgG1, and SCgG2, while lanthipeptide class IV BGCs were found only in strain B253. Betalactones manifest noteworthy bioactivity against bacteria, fungi, and cancer cell lines. Lanthipeptides, a subclass of ribosomally synthesized and post-translationally modified peptides (RiPPs), generally display feeble antibacterial activities, with lanthipeptide class IV standing out as a noteworthy example. A study of Bacillus cereus strains identified different lanthipeptide classes and concluded that several lanthipeptide classes can evolve independently and that most lanthipeptide BGCs can originate from intra-species horizontal gene transfer [84].

Additionally, PKS and NRPs BGCs, which were the most common in our studied genomes, are considered representatives of two major classes of antibiotics [80]. Kalimantacin antibiotics with a strong antistaphylococcal effect, from Alcaligenes species YL-02632S [85,86], and the antibiotic batumin from Pseudomonas batumici have been produced utilizing these BGCs [87]. C. glutamicum is suitable for T1PKS and NRPs synthesis by heterologous expression since it possesses an endogenous 4’-phosphopantetheinyl transferase (PPTase), PptAcg [88]. Roseoflavin, a broad-spectrum antibiotic, has already been produced using C. glutamicum via the heterologous expression of its BGCs [89]. We also found bacteriocin gene clusters in 12 strains of C. glutamicum. Bacteriocins have been seen as a feasible alternative to traditional antibiotics because of their distinct antibacterial mechanisms. Besides, they can be used as innovative carrier molecules [90] and as plant growth-promoting, antiviral, and anti-cancer agents [91].

Whole genome comparison based on ANI scores also revealed the phylogenetic relationship among the strains. We divided all 30 strains into five clades: clade 1 with five strains, clade 2 with seven strains, clade 3 with eight strains, clade 4 with four strains, and clade 5 with six strains. We observed diversity of the BGCs among clade 1, clade 2, and clade 4, whereas members of clade 3 and clade 5 contain the same number of BGCs, although these two clades harbour different BGCs. We observed similar BGCs among the soil-isolated strains CICC10064, B414, and TQ2223. Similarly, the soil-isolated strains XV, ZL-6, YI, TCCC11822, ATCC 13869, and WM001 have similar BGCs, although strain YI has gained an extra NAPAA class. The soil-isolated strains SCgG1 and SCgG2 have similar BGC classes, including betalactone. On the other hand, strain C1, which is an engineered derivative of ATCC 13032, has lost two Terpene BGCs.

Additionally, we identified NAPAA-betalactone hybrid BGCs in strains B253, R, SCgG1 and SCgG2. Hybrid BGCs encode multiple scaffold-synthesizing enzymes [92,93]. Hybrid BGCs are common in some bacteria (98% occurrence in Streptomyces) [94], yet their exact roles are not completely known [95,96]. Notably, the specific locations of these hybrid BGCs within the genomes of these strains vary, as illustrated in Fig 8. This disparity implies that these hybrid BGCs might have been acquired or rearranged through horizontal gene transfer or recombination events, thereby contributing to genomic diversity across the strains. Consequently, our identification of hybrid BGCs is based on their gene content and functional characteristics rather than their precise physical placement within the genomes.

We found that the number of BGCs is positively correlated with the genome size and the gene number of the strains. Strains SCgG1, SCgG2, BE, YI, ATCC 14067, and R, which have larger genomes and high gene numbers, each harbour 5 BGCs. However, strain CGMCC1.15647, which has the highest gene number and the largest genome, contains only 4 BGCs. Thus, our correlation and regression analysis shows that as the genome size and the gene number increase, the number of BGCs is more likely to increase. Generally, strains with larger genomes tend to exhibit a higher number of BGCs, a phenomenon attributed to the potential accumulation of accessory genes and genomic islands carrying BGCs [97].
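
The positive association described above can be quantified with a Pearson correlation coefficient between genome size and BGC count. The Python sketch below shows the calculation on hypothetical values. The numbers are placeholders and not the data in S5 Table, and the exact statistical procedure used in the study is not specified in this section.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical genome sizes (Mbp) and BGC counts for five strains.
genome_size_mbp = [3.10, 3.20, 3.28, 3.31, 3.36]
bgc_count = [3, 4, 4, 5, 5]

r = pearson_r(genome_size_mbp, bgc_count)
print(f"r = {r:.2f}")
```

A single strain that departs from the trend, such as the largest genome carrying fewer BGCs, lowers r without reversing its sign, which matches the "more likely to increase" wording above.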

The potential presence of sequencing errors within publicly available databases remains a notable concern. To address it, only complete genome sequences of C. glutamicum strains were considered, which enhances the reliability and comprehensiveness of the study’s findings. Prokka and FragGeneScan, widely used and validated tools for prokaryotic genomes, were employed for genome annotation and gene prediction. For genome comparison and species delineation, we used OrthoANI, a pairwise average nucleotide identity (ANI) algorithm that is more robust and accurate than traditional methods. The pan-genome analysis employed Roary, BPGA, and USEARCH, with sequence identity cut-offs for gene classification and estimation. Nevertheless, our investigation has revealed discernible diversity across various genomic features among the strains, along with variations in the abundance of biosynthetic gene clusters (BGCs) within their genomes.

Virulence genes are pivotal elements that contribute to the pathogenicity of microorganisms, enabling them to induce diseases. In contrast, BGCs typically encode enzymes and proteins involved in synthesizing specific secondary metabolites, such as those of the T1PKS, Terpene, NAPAA, betalactone, and lanthipeptide classes. The connection between BGCs and virulence is diverse, as certain secondary metabolites produced by BGCs can influence the virulence of microorganisms.

However, in our investigation, no identifiable secondary metabolites produced by BGCs were associated with virulence. Remarkably, all examined strains were found to be non-pathogenic. This suggests that there might be an absence of virulence genes located within the BGCs of these strains. The collective non-pathogenic nature of the strains reinforces the notion that the BGCs under scrutiny may not harbor genes contributing to virulence, further emphasizing the safety profile of these microorganisms in the context of human health.

While our study successfully identified numerous distinct polymorphic sites among the strains under investigation, it is crucial to acknowledge a limitation. The specific interaction or overlap between these polymorphic sites and BGCs in C. glutamicum has not been thoroughly explored within the scope of our research. This unexplored aspect represents a noteworthy limitation, suggesting a promising avenue for future investigation.

In all, strains of C. glutamicum can be good candidates for engineering to produce various novel compounds through BGC expression. The strains may also have the potential to produce antibiotics, plant growth-promoting agents, antiviral agents, and anti-cancer agents.

5. Conclusions

The objectives of this study were to elucidate the genetic variation, pan-genomic characteristics, and distribution of BGCs among 30 strains of C. glutamicum. We observed genetic variation and diversity in the distribution of BGCs. The pan-genomic study of C. glutamicum strains revealed diversity at the sub-species level. We found a large number of strain-specific genes and an open C. glutamicum pan-genome. This study has yielded valuable insights into previously unexplored biosynthetic gene clusters (BGCs) that play a role in the production of betalactones, lanthipeptides, and NAPAA-betalactone hybrids. Thus, we conclude that various strains of C. glutamicum should be a focus for the discovery of natural drugs at the industrial level.

Supporting information

S1 Table. Average Nucleotide Identity (ANI) values for C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.s001

(XLSX)

S2 Table. Genomic characteristics of various C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.s002

(XLSX)

S3 Table. Biosynthetic Gene Clusters (BGCs) and corresponding hit counts.

https://doi.org/10.1371/journal.pone.0299588.s003

(XLSX)

S4 Table. BGCs distribution across C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.s004

(XLSX)

S5 Table. BGCs and genomic characteristics comparison of various C. glutamicum strains.

https://doi.org/10.1371/journal.pone.0299588.s005

(XLSX)

References

  1. 1. Buchholz J., et al., CO2/HCO3− perturbations of simulated large scale gradients in a scale-down device cause fast transcriptional responses in Corynebacterium glutamicum. Applied microbiology and biotechnology, 2014. 98(20): p. 8563–8572.
  2. 2. Käß F., et al., Assessment of robustness against dissolved oxygen/substrate oscillations for C. glutamicum DM1933 in two-compartment bioreactor. Bioprocess and biosystems engineering, 2014. 37(6): p. 1151–1162. pmid:24218302
  3. 3. Krämer R., Secretion of amino acids by bacteria: physiology and mechanism. FEMS Microbiology Reviews, 1994. 13(1): p. 75–93.
  4. 4. Hermann T., Industrial production of amino acids by coryneform bacteria. Journal of biotechnology, 2003. 104(1–3): p. 155–172. pmid:12948636
  5. 5. Becker J., Rohles C.M., and Wittmann C., Metabolically engineered Corynebacterium glutamicum for bio-based production of chemicals, fuels, materials, and healthcare products. Metabolic engineering, 2018. 50: p. 122–141. pmid:30031852
  6. 6. Shanmugam S., et al., High-efficient production of biobutanol by a novel Clostridium sp. strain WST with uncontrolled pH strategy. Bioresource technology, 2018. 256: p. 543–547. pmid:29486913
  7. 7. Shanmugam S., et al., Enhanced bioconversion of hemicellulosic biomass by microbial consortium for biobutanol production with bioaugmentation strategy. Bioresource technology, 2019. 279: p. 149–155. pmid:30716607
  8. 8. Wendisch V.F., Mindt M., and Pérez-García F., Biotechnological production of mono-and diamines using bacteria: recent progress, applications, and perspectives. Applied microbiology and biotechnology, 2018. 102(8): p. 3583–3594. pmid:29520601
  9. 9. Lee J.-Y., et al., The actinobacterium Corynebacterium glutamicum, an industrial workhorse. 2016.
  10. 10. Croucher N.J., et al., Diversification of bacterial genome content through distinct mechanisms over different timescales. Nature communications, 2014. 5(1): p. 1–12. pmid:25407023
  11. 11. Zhu A., et al., Inter-individual differences in the gene content of human gut bacterial species. Genome biology, 2015. 16(1): p. 1–13.
  12. 12. Levade I., et al., Vibrio cholerae genomic diversity within and between patients. Microbial genomics, 2017. 3(12). pmid:29306353
  13. 13. Chang Q., et al., Genomic epidemiology of meticillin-resistant Staphylococcus aureus ST22 widespread in communities of the Gaza Strip, 2009. Eurosurveillance, 2018. 23(34): p. 1700592. pmid:30153881
  14. 14. Jaspers E. and Overmann J.r., Ecological significance of microdiversity: identical 16S rRNA gene sequences can be found in bacteria with highly divergent genomes and ecophysiologies. Applied and environmental microbiology, 2004. 70(8): p. 4831–4839. pmid:15294821
  15. 15. Segerman B., The genetic integrity of bacterial species: the core genome and the accessory genome, two different stories. Frontiers in cellular and infection microbiology, 2012. 2: p. 116. pmid:22973561
  16. 16. Land M., et al., Insights from 20 years of bacterial genome sequencing. Functional & integrative genomics, 2015. 15(2): p. 141–161. pmid:25722247
  17. 17. Jensen P.R., Natural products and the gene cluster revolution. Trends in microbiology, 2016. 24(12): p. 968–977. pmid:27491886
  18. 18. Alam K., et al., In silico genome mining of potential novel biosynthetic gene clusters for drug discovery from Burkholderia bacteria. Computers in Biology and Medicine, 2022. 140: p. 105046. pmid:34864585
  19. 19. Xu F., et al., A genetics-free method for high-throughput discovery of cryptic microbial metabolites. Nature chemical biology, 2019. 15(2): p. 161–168. pmid:30617293
  20. 20. Baltz R.H., Gifted microbes for genome mining and natural product discovery. Journal of Industrial Microbiology and Biotechnology, 2017. 44(4–5): p. 573–588. pmid:27520548
  21. 21. Yang J. and Yang S., Comparative analysis of Corynebacterium glutamicum genomes: a new perspective for the industrial production of amino acids. BMC Genomics, 2017. 18(1): p. 940. pmid:28198668
  22. 22. Dey S., et al., Unravelling the Evolutionary Dynamics of High-Risk Klebsiella pneumoniae ST147 Clones: Insights from Comparative Pangenome Analysis. Genes, 2023. 14(5): p. 1037. pmid:37239397
  23. 23. Yoon S.-H., et al., A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek, 2017. 110(10): p. 1281–1286. pmid:28204908
  24. 24. Gui Y., et al., Complete genome sequence of Corynebacterium glutamicum CP, a Chinese l-leucine producing strain. Journal of Biotechnology, 2016. 220: p. 64–65. pmid:26784991
  25. 25. Ma Y., et al., Comparative Genomic and Genetic Functional Analysis of Industrial L-Leucine–and L-Valine–Producing Corynebacterium glutamicum Strains. 2018.
  26. 26. Nishio Y., et al., Analysis of strain-specific genes in glutamic acid-producing Corynebacterium glutamicum ssp. lactofermentum AJ 1511. The Journal of General and Applied Microbiology, 2017. 63(3): p. 157–164. pmid:28392541
  27. 27. Kawaguchi H., Sazuka T., and Kondo A., Complete and draft genome sequences of amino acid-producing Corynebacterium glutamicum strains ATCC 21799 and ATCC 31831 and their genomic islands. Microbiology Resource Announcements, 2020. 9(32): p. e00430–20. pmid:32763926
  28. 28. Wu Y., et al., Complete genome sequence of Corynebacterium glutamicum B253, a Chinese lysine-producing strain. Journal of Biotechnology, 2015. 207: p. 10–11. pmid:25953304
  29. 29. Yukawa H., et al., Comparative analysis of the Corynebacterium glutamicum group and complete genome sequence of strain R. Microbiology, 2007. 153(4): p. 1042–1058. pmid:17379713
  30. 30. Meng L., et al., Enhancement of heterologous protein production in corynebacterium glutamicum via atmospheric and room temperature plasma mutagenesis and high-throughput screening. Journal of Biotechnology, 2021. 339: p. 22–31. pmid:34311028
  31. 31. Baumgart M., et al., Construction of a prophage-free variant of Corynebacterium glutamicum ATCC 13032 for use as a platform strain for basic research and industrial biotechnology. Applied and environmental microbiology, 2013. 79(19): p. 6006–6015. pmid:23892752
  32. 32. Lee J.-Y., et al., Adaptive evolution of Corynebacterium glutamicum resistant to oxidative stress and its global gene expression profiling. Biotechnology letters, 2013. 35(5): p. 709–717. pmid:23288296
  33. 33. Marques F., Luzhetskyy A., and Mendes M.V., Engineering Corynebacterium glutamicum with a comprehensive genomic library and phage-based vectors. Metabolic Engineering, 2020. 62: p. 221–234. pmid:32827704
  34. 34. Baumgart M., et al., Corynebacterium glutamicum chassis C1*: building and testing a novel platform host for synthetic biology and industrial biotechnology. ACS synthetic biology, 2018. 7(1): p. 132–144. pmid:28803482
  35. 35. Linder M., et al., Construction of an IS-Free Corynebacterium glutamicum ATCC 13 032 Chassis Strain and Random Mutagenesis Using the Endogenous ISCg1 Transposase. Frontiers in bioengineering and biotechnology, 2021. 9.
  36. 36. Park J., et al., Accelerated growth of Corynebacterium glutamicum by up-regulating stress-responsive genes based on transcriptome analysis of a fast-doubling evolved strain. 2020.
  37. 37. Park S.H., et al., Metabolic engineering of Corynebacterium glutamicum for L-arginine production. Nature communications, 2014. 5(1): p. 1–9. pmid:25091334
  38. 38. Yang J. and Yang S., Comparative analysis of Corynebacterium glutamicum genomes: a new perspective for the industrial production of amino acids. BMC genomics, 2017. 18(1): p. 1–13.
  39. 39. Ma W., et al., Poly (3-hydroxybutyrate-co-3-hydroxyvalerate) co-produced with l-isoleucine in Corynebacterium glutamicum WM001. Microbial cell factories, 2018. 17(1): p. 1–12.
  40. 40. Richardson E.J. and Watson M., The automatic annotation of bacterial genomes. Briefings in bioinformatics, 2013. 14(1): p. 1–12. pmid:22408191
  41. 41. Seemann T., Prokka: rapid prokaryotic genome annotation. Bioinformatics, 2014. 30(14): p. 2068–2069. pmid:24642063
  42. 42. Rho M., Tang H., and Ye Y., FragGeneScan: predicting genes in short and error-prone reads. Nucleic Acids Research, 2010. 38(20): p. e191–e191. pmid:20805240
  43. 43. Page A.J., et al., Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics, 2015. 31(22): p. 3691–3693. pmid:26198102
  44. 44. Chaudhari N.M., Gupta V.K., and Dutta C., BPGA- an ultra-fast pan-genome analysis pipeline. Scientific Reports, 2016. 6(1): p. 24373. pmid:27071527
  45. 45. Tatusov R.L., et al., The COG database: an updated version includes eukaryotes. BMC bioinformatics, 2003. 4(1): p. 41. pmid:12969510
  46. 46. Kanehisa M., et al., KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research, 2016. 45(D1): p. D353–D361. pmid:27899662
  47. 47. Edgar R., Usearch, 2010, Lawrence Berkeley National Lab.(LBNL), Berkeley, CA (United States).
  48. 48. Liu R., et al., Comparative genomics reveals intraspecific divergence of Acidithiobacillus ferrooxidans: insights from evolutionary adaptation. Microbial Genomics, 2023. 9(6): p. 001038. pmid:37285209
  49. 49. Price M.N., Dehal P.S., and Arkin A.P., FastTree 2–approximately maximum-likelihood trees for large alignments. PloS one, 2010. 5(3): p. e9490. pmid:20224823
  50. 50. Letunic I. and Bork P., Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Research, 2021. 49(W1): p. W293–W296. pmid:33885785
  51. 51. Machado H., et al., Genome mining reveals unlocked bioactive potential of marine Gram-negative bacteria. BMC genomics, 2015. 16(1): p. 1–12.
  52. 52. Blin K., et al., antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Research, 2021. 49(W1): p. W29–W35. pmid:33978755
  53. 53. Skinnider M.A., et al., Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nature communications, 2020. 11(1): p. 1–9.
  54. 54. van Heel A.J., et al., BAGEL4: a user-friendly web server to thoroughly mine RiPPs and bacteriocins. Nucleic acids research, 2018. 46(W1): p. W278–W281. pmid:29788290
  55. 55. Salamzade R., et al., Evolutionary investigations of the biosynthetic diversity in the skin microbiome using lsaBGC. Microbial Genomics, 2023. 9(4). pmid:37115189
  56. 56. Eddy S.R., Profile hidden Markov models. Bioinformatics (Oxford, England), 1998. 14(9): p. 755–763. pmid:9918945
  57. 57. Alikhan N.-F., et al., BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics, 2011. 12(1): p. 402. pmid:21824423
  58. 58. Darling A.C., et al., Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome research, 2004. 14(7): p. 1394–1403. pmid:15231754
  59. 59. Seemann T., Snippy: rapid haploid variant calling and core SNP phylogeny. GitHub. Available at: github. com/tseemann/snippy, 2015.
  60. 60. Zhu Q., Kosoy M., and Dittmar K., HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers. BMC Genomics, 2014. 15(1): p. 717. pmid:25159222
  61. 61. Cosentino S., et al., PathogenFinder-distinguishing friend from foe using bacterial whole genome sequence data. PLOS ONE, 2013. 8(10): p. e77302. pmid:24204795
  62. 62. Tettelin H., et al., Comparative genomics: the bacterial pan-genome. Current Opinion in Microbiology, 2008. 11(5): p. 472–477. pmid:19086349
  63. 63. Hyun J.C., Monk J.M., and Palsson B.O., Comparative pangenomics: analysis of 12 microbial pathogen pangenomes reveals conserved global structures of genetic and functional diversity. BMC Genomics, 2022. 23(1): p. 7. pmid:34983386
  64. 64. Rajput A., et al., Pangenome analysis reveals the genetic basis for taxonomic classification of the Lactobacillaceae family. Food microbiology, 2023. 115: p. 104334. pmid:37567624
  65. 65. Elbir H., Almathen F., and Almuhasen F.M., Genomic differences among strains of Corynebacterium cystitidis isolated from uterus of camels. The Journal of Infection in Developing Countries, 2022. 16(01): p. 134–146. pmid:35192531
  66. 66. Pal S., et al., Comparative evolutionary genomics of Corynebacterium with special reference to codon and amino acid usage diversities. Genetica, 2018. 146(1): p. 13–27. pmid:28921302
  67. 67. Ali A., et al., Microbial comparative genomics: an overview of tools and insights into the genus Corynebacterium. J Bacteriol Parasitol, 2013. 4(167): p. 2.
  68. Levesque S., et al., Mobilome of Brevibacterium aurantiacum sheds light on its genetic diversity and its adaptation to smear-ripened cheeses. Frontiers in microbiology, 2019. 10: p. 1270. pmid:31244798
  69. Brune I., et al., The individual and common repertoire of DNA-binding transcriptional regulators of Corynebacterium glutamicum, Corynebacterium efficiens, Corynebacterium diphtheriae and Corynebacterium jeikeium deduced from the complete genome sequences. BMC genomics, 2005. 6(1): p. 1–10. pmid:15938759
  70. Araújo C.L., et al., In silico functional prediction of hypothetical proteins from the core genome of Corynebacterium pseudotuberculosis biovar ovis. PeerJ, 2020. 8: p. e9643. pmid:32913672
  71. Pohl S., et al., The extensive set of accessory Pseudomonas aeruginosa genomic components. FEMS microbiology letters, 2014. 356(2): p. 235–241. pmid:24766399
  72. Santos M.A., et al., Driving change: the evolution of alternative genetic codes. Trends in Genetics, 2004. 20(2): p. 95–102. pmid:14746991
  73. Zhang R. and Zhang C.-T., A systematic method to identify genomic islands and its applications in analyzing the genomes of Corynebacterium glutamicum and Vibrio vulnificus CMCP6 chromosome I. Bioinformatics, 2004. 20(5): p. 612–622. pmid:15033867
  74. Osawa S. and Jukes T.H., Evolution of the genetic code as affected by anticodon content. Trends in Genetics, 1988. 4(7): p. 191–198. pmid:3070867
  75. Osawa S., et al., Recent evidence for evolution of the genetic code. Microbiological reviews, 1992. 56(1): p. 229–264. pmid:1579111
  76. Nasim F., Dey A., and Qureshi I.A., Comparative genome analysis of Corynebacterium species: The underestimated pathogens with high virulence potential. Infection, Genetics and Evolution, 2021. 93: p. 104928. pmid:34022437
  77. Bentley S.D., et al., Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature, 2002. 417(6885): p. 141–147. pmid:12000953
  78. Challis G.L. and Ravel J., Coelichelin, a new peptide siderophore encoded by the Streptomyces coelicolor genome: structure prediction from the sequence of its non-ribosomal peptide synthetase. FEMS microbiology letters, 2000. 187(2): p. 111–114. pmid:10856642
  79. Pawlik K., et al., A cryptic type I polyketide synthase (cpk) gene cluster in Streptomyces coelicolor A3(2). Archives of microbiology, 2007. 187(2): p. 87–99. pmid:17009021
  80. Mattheus W., et al., Isolation and purification of a new kalimantacin/batumin-related polyketide antibiotic and elucidation of its biosynthesis gene cluster. Chemistry & biology, 2010. 17(2): p. 149–159. pmid:20189105
  81. Liu L., et al., Charting the complexity of the activated sludge microbiome through a hybrid sequencing strategy. Microbiome, 2021. 9(1): p. 1–15.
  82. Zheng Y., et al., Genome features and secondary metabolites biosynthetic potential of the class Ktedonobacteria. Frontiers in microbiology, 2019. 10: p. 893. pmid:31080444
  83. Jensen P.R., et al., Species-specific secondary metabolite production in marine actinomycetes of the genus Salinispora. Applied and environmental microbiology, 2007. 73(4): p. 1146–1152. pmid:17158611
  84. Xin B., et al., The Bacillus cereus group is an excellent reservoir of novel lanthipeptides. Applied and environmental microbiology, 2015. 81(5): p. 1765–1774. pmid:25548056
  85. Kamigiri K., et al., Kalimantacins A, B and C, novel antibiotics from Alcaligenes sp. YL-02632S I. Taxonomy, fermentation, isolation and biological properties. The Journal of Antibiotics, 1996. 49(2): p. 136–139.
  86. Tokunaga T., et al., Kalimantacin A, B, and C, novel antibiotics produced by Alcaligenes sp. YL-02632S II. Physico-chemical properties and structure elucidation. The Journal of Antibiotics, 1996. 49(2): p. 140–144.
  87. Smirnov V., et al., Isolation of highly active strain producing the antistaphylococcal antibiotic batumin. Prikladnaia Biokhimiia i Mikrobiologiia, 2000. 36(1): p. 55–58.
  88. Kallscheuer N., et al., Microbial synthesis of the type I polyketide 6-methylsalicylate with Corynebacterium glutamicum. Applied microbiology and biotechnology, 2019. 103(23): p. 9619–9631. pmid:31686146
  89. Mora-Lugo R., Stegmüller J., and Mack M., Metabolic engineering of roseoflavin-overproducing microorganisms. Microbial cell factories, 2019. 18(1): p. 1–13.
  90. Chikindas M.L., et al., Functions and emerging applications of bacteriocins. Current opinion in biotechnology, 2018. 49: p. 23–28. pmid:28787641
  91. Drider D., et al., Bacteriocins: not only antibacterial agents. Probiotics and antimicrobial proteins, 2016. 8(4): p. 177–182. pmid:27481236
  92. Zotchev S.B., Genomics-based insights into the evolution of secondary metabolite biosynthesis in actinomycete bacteria, in Evolutionary biology: genome evolution, speciation, coevolution and origin of life. 2014, Springer. p. 35–45.
  93. Cimermancic P., et al., Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell, 2014. 158(2): p. 412–421. pmid:25036635
  94. Belknap K.C., et al., Genome mining of biosynthetic and chemotherapeutic gene clusters in Streptomyces bacteria. Scientific reports, 2020. 10(1): p. 1–9.
  95. Gallagher K.A. and Jensen P.R., Genomic insights into the evolution of hybrid isoprenoid biosynthetic gene clusters in the MAR4 marine streptomycete clade. BMC genomics, 2015. 16(1): p. 1–13.
  96. Khaldi N., et al., Evidence for horizontal transfer of a secondary metabolite gene cluster between fungi. Genome biology, 2008. 9(1): p. 1–10. pmid:18218086
  97. Hollensteiner J., et al., Pan-genome analysis of six complete Paracoccus type strain genomes from hybrid next generation sequencing. bioRxiv, 2023: p. 2023.06.19.545646.