In silico genomic insights into aspects of food safety and defense mechanisms of a potentially probiotic Lactobacillus pentosus MP-10 isolated from brines of naturally fermented Aloreña green table olives

Lactobacillus pentosus MP-10, isolated from brines of naturally fermented Aloreña green table olives, exhibited high probiotic potential. The genome sequence of L. pentosus MP-10 is currently considered the largest genome among lactobacilli, highlighting the microorganism’s ecological flexibility and adaptability. Here, we analyzed the complete genome sequence for the presence of acquired antibiotic resistance and virulence determinants to understand the strain's defense mechanisms and explore its putative safety in food. The annotated genome sequence revealed evidence of diverse mobile genetic elements, such as prophages, transposases and transposons, involved in its adaptation to brine-associated niches. In silico analysis of the L. pentosus MP-10 genome sequence identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) system acting as an immune system against foreign genetic elements. It consisted of six arrays (4–12 repeats) and eleven predicted cas genes [CRISPR1 and CRISPR2 consisted of 3 (Type II-C) and 8 (Type I) genes, respectively] with high similarity to L. pentosus KCA1. Bioinformatic analyses revealed that L. pentosus MP-10 lacked acquired antibiotic resistance genes and that most resistance genes were related to efflux mechanisms; no virulence determinants were found in the genome. This suggests that L. pentosus MP-10 could be considered safe and highly adaptable, which could facilitate its application as a starter culture and probiotic in food preparations.


Introduction
Lactobacilli are ubiquitous in the environment and food production (reviewed in [1]), and they are also part of intestinal, vaginal and oral microbiota [2]. As members of the lactic acid
Comparison of ORF sequences among L. pentosus MP-10, L. pentosus KCA1, and L. pentosus IG1 (aligned with the MAUVE algorithm) showed that the synteny of genes was similar, although inversions and rearrangements occurred among all L. pentosus strains (Fig 2A). Inversion and rearrangement are the main evolutionary phenomena observed among L. pentosus strains and provide a complete picture of genetic differences among the strains colonizing different ecological niches. The phylogenetic distance between L. pentosus MP-10 and L.
The circles from outside to inside are the annotated CDS elements in the forward orientation, the annotated CDS elements in the reverse orientation, several COG functions, the structural RNA, the GC content and the GC skew. (B) The circles from outside to inside of each plasmid are the annotated CDS elements in the forward orientation, the annotated CDS elements in the reverse orientation, several COG functions, the GC content and the GC skew.

Defense mechanisms of Lactobacillus pentosus MP-10
Among the defense mechanisms revealed in the L. pentosus MP-10 genome sequence by in silico analysis, 12 genes were found to be involved in defense responses to viruses and bacteria. Further, we identified two CRISPR systems, CRISPR1 and CRISPR2 [17], which represent an acquired and adaptive immune system providing protection against mobile genetic elements (i.e., viruses, transposable elements and conjugative plasmids) [22,23]. In general, a CRISPR mechanism in bacteria depends on a leader sequence, a CRISPR array and CRISPR-associated protein genes (cas genes), and expression of the CRISPR array can be constitutive or inducible [24,25]. Analysis with the CRISPRs finder program showed that the L. pentosus MP-10 genome harbored nine potential CRISPR arrays (CR) located between 159,766 and 3,085,353 bp and distributed across the whole genome (Fig 3A): six were confirmed CRISPRs, and three were questionable CRISPRs (Fig 3A, Table 1). This may reflect chromosomal plasticity as a means of increasing fitness or changing ecological lifestyles.
Each CRISPR array comprised short spacer sequences, which were fragments of foreign DNA derived from either phages or plasmids and incorporated into the host genome between degenerate repeats (DR consensus). The number of confirmed CRISPR arrays was similar in both L. pentosus strains (MP-10 and KCA1); however, the number of repeats and spacers, the CRISPR length, and the DR consensus sequence differed, although two identical repeats were found in both strains (Table 1). Comparison of the CRISPR arrays of L. pentosus MP-10 with those of phylogenetically related lactobacilli, such as L. plantarum, L. paraplantarum and L. brevis (available in the CRISPRs database), showed that one DR consensus (5′-GTCTTGAATAGTAGTCATATCAAACAGGTTTAGAAC-3′) or its reverse complement was shared by all L. pentosus and L. plantarum strains except L. pentosus IG1 (Table 1). This DR consensus could therefore be considered a conserved repeat signature in the L. plantarum group.
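The statement that a DR consensus "or its reverse complement" is shared between strains can be checked mechanically. The following is a minimal Python sketch, not the procedure used in this study; the function names are illustrative, and only the consensus sequence itself is taken from the text.

```python
# Illustrative helper names; only DR_CONSENSUS comes from the text above.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

DR_CONSENSUS = "GTCTTGAATAGTAGTCATATCAAACAGGTTTAGAAC"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def same_repeat(dr_a: str, dr_b: str) -> bool:
    """True if two repeats are identical on either strand (exact match only)."""
    a, b = dr_a.upper(), dr_b.upper()
    return a == b or a == reverse_complement(b)


if __name__ == "__main__":
    other_strand = reverse_complement(DR_CONSENSUS)
    print(other_strand)
    print(same_repeat(DR_CONSENSUS, other_strand))
```

Exact matching is a simplification: because the repeats are degenerate, a real comparison would normally tolerate a small number of mismatches.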
Within the six confirmed CRISPR arrays, the number of spacers ranged from four (CR5) to eleven (CR6), and spacer lengths ranged from 29 to 51 bp (40 bp on average) (Table 2). Protospacers were searched with the CRISPR Target program to localize the DNA targets acquired by horizontal gene transfer, and the results revealed protospacers related to plasmids and phages. These protospacers were located within genes encoding structural viral proteins (such as tail-fiber protein) or bacterial enzymes, such as thioredoxin reductase, short-chain dehydrogenase, excinuclease ABC subunit A, FMN-dependent oxidoreductase and nitrilotriacetate monooxygenase family protein, among others (Table 2). Furthermore, protospacers were also identified within genes of unknown function and in intergenic regions (Table 2).
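To make the relationship between repeats and spacers concrete, the sketch below splits an array on its repeat and reports the number and mean length of the spacers. It is an illustration under stated assumptions: the toy repeat and array are invented, the function names are hypothetical, and exact repeat matching is assumed, whereas real arrays contain degenerate repeats.

```python
from statistics import mean


def extract_spacers(array_seq: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Sequence before the first repeat and after the last repeat is treated
    as flanking DNA and discarded.
    """
    parts = array_seq.upper().split(repeat.upper())
    return [p for p in parts[1:-1] if p]


def spacer_summary(spacers: list[str]) -> tuple[int, float]:
    """Return (number of spacers, mean spacer length in bp)."""
    return len(spacers), mean(len(s) for s in spacers)


if __name__ == "__main__":
    toy_repeat = "GTTTAG"  # invented, far shorter than a real DR
    toy_array = "AA" + toy_repeat + "ACGTACGT" + toy_repeat + "TTGCA" + toy_repeat + "CC"
    spacers = extract_spacers(toy_array, toy_repeat)
    print(spacers)
    print(spacer_summary(spacers))
```

Applied to a real array, the same summary would yield the kind of counts and length ranges reported in Table 2.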
Given that spacers are usually added at one side of the CRISPR system, the chronological record of the viruses and plasmids (protospacers) that invaded L. pentosus MP-10 or its ancestors could be inferred by searching for the spacers with BLAST (Basic Local Alignment Search Tool). For example, in CR1 we suggest that the first invasion was by Haematospirillum jordaniae H5569 Plasmid unnamed 2, then by other short sequences, followed by Borrelia miyamotoi FR64b Plasmid_07 and Clostridium taeniosporum 1/k Plasmid pCt3 (Table 2). On the other hand, multiple targets were observed for all confirmed CRISPR spacers of L. pentosus MP-10 except for CR7 (Table 2). This suggests that L. pentosus MP-10 could target many diverse viruses and plasmids. As such, the strain could possess an efficient defense mechanism against different pathogens, not only in food systems but also in the intestinal tract, thus reinforcing its probiotic capacity.
Regarding the CRISPR-associated proteins involved in sequence-specific recognition and cleavage of target DNA complementary to the spacer, the classification suggested by Makarova et al. [26] differentiates three major types of CRISPR-Cas systems (Types I, II and III). In the present study, the signature genes of both the Type I (cas3) and Type II (cas9) systems were detected in the L. pentosus MP-10 genome (S1 Table, Fig 3B). CRISPR1 and CRISPR2 consisted of three Type II-C and eight Type I genes, respectively (Fig 3B), and they were closely associated with the palindromic repeat/spacer units (Fig 3A). The CRISPR1 operon consisted of only three genes (cas1, cas2 and cas9), which were similar to those of Streptococcus thermophilus (S1 Table) and adjacent to the CR1 array (Fig 3A). A comparison of L. pentosus MP-10 and L. pentosus KCA1 revealed that CRISPR1 of L. pentosus KCA1 contained one additional gene encoding a protein involved in adaptation (the csn2 gene) [27]; thus, while CRISPR1 of L. pentosus KCA1 belonged to Type II-A, CRISPR1 of L. pentosus MP-10 belonged to Type II-C, which lacks this fourth gene (Fig 3B). The CRISPR2 operon of L. pentosus MP-10 consisted of eight genes: the genes encoding the CRISPR-associated endonucleases Cas1 and Cas2 (ygbT and ygbF), the CRISPR system Cascade subunit CasC (casC) and the CRISPR system Cascade subunit Cas5 (gene ID XX999_01592 in L. pentosus MP-10), all similar to those of Escherichia coli; the gene encoding the Cas3 nuclease/helicase (cas3), similar to that of Streptococcus thermophilus; the gene encoding the CRISPR-associated endoribonuclease Cse3, similar to that of Thermus thermophilus; and two genes unique to L. pentosus MP-10 (gene ID XX999_01589, or cse1_Lpe, and gene ID XX999_01590, or cse2_Lpe) (S1 Table). Among the eight genes of CRISPR2, six were shared by both L. pentosus strains (MP-10 and KCA1): cas1, cas2, cas3, casC, cas5 and cse3 (Fig 3B). However, the two genes unique to L. pentosus MP-10 (cse1_Lpe and cse2_Lpe) corresponded to a CRISPR-associated protein (KCA1_RS06550) and cse2/casB (KCA1_RS06555) in L. pentosus KCA1. Alignment of these genes revealed that the cse1_Lpe gene from L. pentosus MP-10 showed high similarity to the CRISPR-associated protein gene from L. pentosus DSM 20314 and L. pentosus FL0421 (99.8% identity) and also to that from L. pentosus KCA1 (94.2%). However, it showed only 71.6% identity with the cse1 gene sequence from L. pentosus IG1, which formed a separate lineage from the cluster representing the other four lactobacilli (Fig 4A). Likewise, the cse2_Lpe gene from L. pentosus MP-10 was identical to the cse2 gene from L. pentosus DSM 20314 and L. pentosus FL0421 (100% identity) and highly similar to the cse2/casB gene from L. pentosus KCA1 (90.2% identity), whereas L. pentosus IG1 formed a different lineage (67.3% identity) from the main cluster of the other lactobacilli (Fig 4B). It is noteworthy that the CRISPR genes found in L. pentosus MP-10 were more similar to those of L. pentosus DSM 20314 (isolated from corn silage), L. pentosus FL0421 (isolated from temperate deciduous-forest biome soil) and L. pentosus KCA1 (isolated from the vagina) than to those of L. pentosus IG1, which was isolated from fermented olives. These data provide new insight into the evolution of bacterial resistance against mobile elements in Lactobacillus spp. and highlight the interconnection between different ecosystems. Thus, L. pentosus MP-10 possesses multiple CRISPR elements of various types, which are of great relevance for the application of this bacterium not only as a promising probiotic but also as a starter culture at industrial scale.
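Percent identity values such as those quoted above are computed from a pairwise alignment. The sketch below shows one common definition (identical columns divided by all compared columns, with gaps counted as mismatches). This is an assumption for illustration: the text does not state which definition or alignment software produced the reported values, and the toy sequences are invented.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Columns where both rows are gaps are ignored; a gap opposite a residue
    counts as a mismatch. Other definitions use a different denominator.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Invented toy alignment: 4 identical columns out of 6 compared.
    print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Because the denominator differs between tools, identity values are only comparable when computed the same way for every pair of sequences.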
Screening for prophage DNA within the L. pentosus MP-10 genome with the bioinformatic tool PHAST revealed five temperate phage regions. Two regions were intact (regions 2 and 5, score > 90), two were questionable (regions 1 and 4, score 70-90), and one was incomplete (region 3, score < 70) (Fig 3A, Table 4). The complete prophage regions of the L. pentosus MP-10 chromosome were identified as Lactobacillus phage Sha1 (region 2; GC content, 40.35%; region length, 39.2 kb) [29] and Oenococcus phage phi 9805 (region 5) [30], and one questionable region was identified as Listeria phage B025 (region 4; GC content, 42.96%; region length, 20.9 kb) [31]. The incomplete prophage region was identified as Lactobacillus phage Sha1 (region 3; GC content, 42.61%; region length, 26.7 kb) [29]. The occurrence of prophage DNA within bacterial genomes is common; over 40 Lactobacillus prophages have been reported [32], and their presence highlights the genetic diversity and fitness of the Lactobacillus genome. In our case, the presence of prophages may confer a selective advantage on the cell, promoting its survival and its resistance to other infecting phages. S2 Table shows the proteins encoded by the five prophage regions predicted by the PHAST tool in the L. pentosus MP-10 genome. The complete prophages, corresponding to regions 2 and 5, encoded 49 and 57 proteins, respectively (Table 4), and were homologous to Lactobacillus phage Sha1, isolated from the traditional Korean fermented food "kimchi" [29], and Oenococcus phage phi 9805, isolated from red wine [30]. These data suggest that different species colonizing different ecosystems may share the same prophages and their architecture owing to the interconnection between habitats via lateral genetic exchange [33]. Integrases were detected in the prophage regions of the L. pentosus MP-10 genome: one integrase in each complete prophage (regions 2 and 5), two integrases in the incomplete prophage (region 3), and a single integrase in a questionable prophage (region 1) (S2 Table). Phage attachment sites (attL and attR), which are potentially involved in the integration of prophage regions into the host chromosome, were also found in regions 1, 2, 3 and 5. Screening of the rest of the L. pentosus MP-10 genome (outside the prophage regions) for phage integrases as markers of mobile DNA elements, such as prophages, revealed fifteen integrase core domain proteins that were not adjacent to any prophage-like region; we therefore deduce that they were not involved in prophage mobility (data not shown). The lysis genes (endolysin and holin) detected in the prophage regions may be used by L. pentosus MP-10 in its own ecological niche or could be exploited in the food industry to eliminate undesirable bacteria during fermentation, particularly in cheese making to accelerate ripening. Nevertheless, the application of L. pentosus MP-10 in different fermentations requires in-depth study.
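The completeness categories used above follow directly from the score thresholds quoted in the text (intact, score > 90; questionable, score 70-90; incomplete, score < 70). A minimal Python sketch of that rule follows; the function name and the example scores are hypothetical and are not values from Table 4.

```python
def classify_prophage(score: float) -> str:
    """Map a PHAST completeness score to the category used in the text."""
    if score > 90:
        return "intact"
    if score >= 70:
        return "questionable"
    return "incomplete"


if __name__ == "__main__":
    # Hypothetical scores chosen only to exercise each category.
    for example_score in (110, 90, 75, 40):
        print(example_score, classify_prophage(example_score))
```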

In silico analysis of safety properties of L. pentosus MP-10
To generate further insights into the food-safety aspects of L. pentosus MP-10, we surveyed its genome for genes related to antibiotic resistance and virulence factors. Antibiotic resistance. First, a BLAST search was conducted for each annotated element of the L. pentosus MP-10 genome sequence against the antibiotic resistance gene database (CARD). The search predicted the presence of several genes involved in antibiotic resistance, although their identity to known resistance genes was low (< 90%); thus, we could not conclude that the genes in the L. pentosus MP-10 genome were homologous to the described genes (data not shown). To predict the complete resistome of the L. pentosus MP-10 genome, including resistance genes and mutations conferring antibiotic resistance, we used the Resistance Gene Identifier (RGI) tool available in the recently updated CARD database [34], which relies on the database's curated AMR (antimicrobial resistance) detection models. Here, we detected strict hits, which are defined as falling within the similarity cut-offs of the individual AMR detection models and represent likely homologs of AMR genes according to Jia et al. [34]. The RGI revealed that the L. pentosus MP-10 chromosome contained specific resistance genes for different antibiotics: aminocoumarin (alaS, an alanyl-tRNA synthetase gene, 1 hit), fluoroquinolone (mfd gene, 1 hit) and mupirocin (ileS, or isoleucyl-tRNA synthetase gene, 2 hits), as well as genes coding for efflux pump proteins conferring resistance to multiple antibiotics (Fig 6, S3 Table). Among them, we found the LmrB and LmrD multidrug efflux pumps, which confer resistance to lincosamides in Bacillus subtilis and Streptomyces lincolnensis, and in Lactococcus lactis, respectively [35][36]; the efflux-pump regulator ArlR, which binds to the norA promoter to activate its expression [37]; and the multidrug efflux pump EmeA from Enterococcus faecalis, which confers resistance to several antimicrobial agents (S3 Table).
Previous phenotypic analysis of antibiotic susceptibility of L. pentosus MP-10 [38] revealed that this strain showed resistance to cefuroxime, ciprofloxacin, teicoplanin, trimethoprim, trimethoprim/sulfamethoxazole and vancomycin. However, L. pentosus MP-10 was sensitive to clindamycin [38], thus lmrB and lmrD genes coding for multidrug efflux pumps were not involved in clindamycin resistance.
In addition, a loose algorithm, which works outside the detection model cut-offs to detect new, emergent threats and more distant homologs of AMR genes [34], was also used; S4 Table shows the results. Considering the previous results of the phenotypic antibiotic resistance screening [38], we suggest that resistance to cefuroxime, ciprofloxacin, teicoplanin, trimethoprim, trimethoprim/sulfamethoxazole and vancomycin may be mediated by as yet unidentified genes responsible for intrinsic resistance; however, further studies are required to confirm this hypothesis.
In summary, in silico analysis of antibiotic resistance in L. pentosus MP-10 showed the absence of acquired antibiotic resistance genes, and the resistome was mostly represented by efflux-pump resistance genes responsible for the intrinsic resistance exhibited by this strain.
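Homology screens of this kind usually reduce to filtering alignment hits by percent identity and by the fraction of the reference gene that is covered, as with the 90% identity cut-off mentioned above and the 60% minimum length applied with ResFinder. The sketch below illustrates such a filter on BLAST tabular output. It is a sketch under stated assumptions, not the pipeline used here: it assumes the hits were exported with the standard twelve tabular columns plus subject length (for example with `-outfmt "6 std slen"`), it approximates coverage as alignment length divided by subject length, and the two example hits are invented.

```python
import csv
import io

# Standard BLAST tabular columns followed by subject length (assumed layout).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore", "slen",
]


def filter_hits(tabular_text: str, min_identity: float = 90.0,
                min_coverage: float = 60.0) -> list[tuple[str, str, float, float]]:
    """Keep hits meeting both the identity and the coverage thresholds.

    Coverage is approximated as alignment length / subject length * 100.
    """
    reader = csv.DictReader(io.StringIO(tabular_text), fieldnames=FIELDS,
                            delimiter="\t")
    kept = []
    for row in reader:
        identity = float(row["pident"])
        coverage = 100.0 * int(row["length"]) / int(row["slen"])
        if identity >= min_identity and coverage >= min_coverage:
            kept.append((row["qseqid"], row["sseqid"], identity,
                         round(coverage, 1)))
    return kept


# Invented hits: one passes both thresholds, one fails on identity and coverage.
EXAMPLE_HITS = "\n".join("\t".join(fields) for fields in [
    ["orf_0001", "geneA", "95.2", "900", "43", "0", "1", "900",
     "1", "900", "0.0", "1500", "1000"],
    ["orf_0002", "geneB", "62.5", "400", "150", "2", "1", "400",
     "10", "409", "1e-40", "160", "1200"],
])

if __name__ == "__main__":
    print(filter_hits(EXAMPLE_HITS))
```

Hits that fall below the thresholds are not necessarily unrelated to resistance; they correspond to the more distant homologs that a looser search is intended to surface.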
Virulence. Regarding virulence, the BLAST searches against a virulence gene database (MvirDB) revealed the presence of 14 genes coding for P1, P2a and P2b prophage proteins, an alanine racemase and a DNA-binding ferritin-like protein similar to those of L. plantarum WCFS1 (>90% identity; Table 5). The L. pentosus MP-10 chromosome contained mostly P2b prophage elements, which were located in the predicted questionable prophage region (Region 1, Fig 3A; PHAGE_Strept_315.2_NC_004585(3), Table 4) and included DNA packaging genes (encoding small and large terminase and portal protein), head-tail genes (head-to-tail joining), and helicase and DNA replication genes (Table 5). These results were in accordance with those reported in S2 Table for Region 1. Furthermore, several proteins of unknown function from the P2b prophage of L. plantarum WCFS1 (proteins 10 and 21) were also detected (Table 5). Notably, van Hemert et al. [39] showed that prophage P2b protein 21 was involved in modulating the production of the cytokines interleukin 10 (IL-10) and IL-12 by peripheral blood mononuclear cells (PBMC), which might be responsible for the stimulation of anti- or pro-inflammatory immune responses in the gut. Comparing the P2b prophage regions of L. pentosus MP-10 and L. plantarum WCFS1, we observed strong synteny between the prophage regions of the two distinct Lactobacillus species, although the comparison was restricted to proteins with >90% identity (Table 5). In this case, nine homologous proteins were shared, although the two strains occupy different ecological niches: L. plantarum WCFS1 was isolated from human saliva and L. pentosus MP-10 from olives [16,40]. Similar results were reported by Zhang et al. [41] for other lactobacilli.

Concluding notes
The newly annotated genome sequence of L. pentosus MP-10 is currently considered the largest genome among lactobacilli; its additional genes may reflect the microorganism's ecological flexibility and adaptability. In silico analysis of the genome identified a CRISPR (clustered regularly interspaced short palindromic repeats)/cas (CRISPR-associated protein genes) system involved in bacterial resistance against mobile elements, which consisted of six arrays (4-12 repeats) and eleven predicted cas genes (CRISPR1 and CRISPR2 consisted of three Type II-C and eight Type I-E genes, respectively) with high similarity to L. pentosus KCA1. Bioinformatic analysis of L. pentosus MP-10 did not reveal any acquired antibiotic resistance genes, and most inherent resistance genes were antibiotic efflux genes. No virulence factors were found. Thus, we suggest that L. pentosus MP-10 could be considered safe for food processing, and its high adaptation potential could facilitate its application as a probiotic and starter culture in industrial processes.

Materials and methods
Genome sequence of L. pentosus MP-10
The complete genome sequence of L. pentosus MP-10 was obtained by using PacBio RS II technology [17] and deposited at the EMBL Nucleotide Sequence Database (accession numbers FLYG01000001 to FLYG01000006). The assembled genome sequences were annotated at Lifesequencing S.L. (Valencia, Spain) using the Prokka annotation pipeline, version 1.11 [42]. This involved predicting tRNA, rRNA, and mRNA genes and signal peptides in the sequences using Aragorn, RNAmmer, Prodigal, and SignalP, respectively [43][44][45]. To evaluate the alignment and the synteny of genes among the L. pentosus MP-10, L. pentosus KCA1 and L. pentosus IG1 genome data sets, a comparison was performed with the Mauve algorithm in Lasergene's MegAlign Pro software (Lasergene 14).
Genomic analysis of mobile genetic elements and safety aspects of Lactobacillus pentosus MP-10
The annotated genome sequence of L. pentosus MP-10 was screened for the presence of CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci and mobile genetic elements (i.e., conjugative plasmids, transposases, transposons, IS elements and prophages). Furthermore, we used the CRISPR finder tool (available in the CRISPRs web server; http://crispr.i2bc.paris-saclay.fr/Server/) to identify CRISPRs and extract the repeated and unique sequences in the L. pentosus MP-10 genome. CRISPR RNA targets were localized with the CRISPR Target program (http://bioanalysis.otago.ac.nz/CRISPRTarget/crispr_analysis.html). For prophage region search and annotation, we screened the chromosomal DNA of L. pentosus MP-10 with a phage-finding tool (PHAST, PHAge Search Tool), which is considered as accurate as, or slightly more accurate than, most available phage-finding tools, with a sensitivity of 85.4% and a positive predictive value of 94.2% [46].
The predicted CDSs were annotated by using BLAST (Basic Local Alignment Search Tool) against the CARD (Comprehensive Antibiotic Resistance Database) and MvirDB (a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications) databases for antibiotic resistance and virulence factor screening, respectively (latest versions downloaded in January 2017), with the associated GO (Gene Ontology) terms obtained from the Swiss-Prot database. Furthermore, the Resistance Gene Identifier (RGI) software (part of the CARD tools) was used to predict the L. pentosus MP-10 resistome from protein or nucleotide data based on homology and SNP (Single Nucleotide Polymorphism) models, using CARD's curated AMR (antimicrobial resistance) detection models. Moreover, the ResFinder (acquired antimicrobial Resistance gene Finder) software version 2.1 (https://cge.cbs.dtu.dk//services/ResFinder/) was used to screen for acquired antibiotic resistance genes [47] with a selected %ID threshold of 90.00% and a selected minimum length of 60% (last accessed in January 2017).
Table. AMR detected in Lactobacillus pentosus MP-10 genome by using hits with weak "loose" similarity in RGI software. (DOC)