De novo genome assembly of Bacillus altitudinis 19RS3 and Bacillus altitudinis T5S-T4, two plant growth-promoting bacteria isolated from Ilex paraguariensis St. Hil. (yerba mate)

Plant growth-promoting bacteria (PGPB) are a heterogeneous group of bacteria that can exert beneficial effects on plant growth directly or indirectly by different mechanisms. PGPB-based inoculant formulations have been used to replace chemical fertilizers and pesticides. In our previous studies, two endophytic endospore-forming bacteria identified as Bacillus altitudinis were isolated from roots of Ilex paraguariensis St. Hil. seedlings and selected for their plant growth-promoting (PGP) properties shown in vitro and in vivo. The purposes of this work were to assemble the genomes of B. altitudinis 19RS3 and T5S-T4 using different assemblers available for Windows and Linux, and to select the best assembly for each strain. Both genomes were also automatically annotated to detect PGP genes and to compare sequences with other reported genomes. Library construction and draft genome sequencing were performed by Macrogen services. Raw reads were filtered using Trimmomatic. Genomes were assembled using the SPAdes, ABySS, Velvet, and SOAPdenovo2 assemblers for Linux, and the Geneious and CLC Genomics Workbench assemblers for Windows. Assemblies were evaluated with QUAST. The parameters evaluated were the number of contigs ≥ 500 bp and ≥ 1000 bp, the length of the longest contig, and the N50 value. PROKKA, RAST, and KAAS were used for genome annotation. The best assembly for both genomes was obtained using Velvet. The B. altitudinis 19RS3 genome was assembled into 15 contigs with an N50 value of 1,943,801 bp. The B. altitudinis T5S-T4 genome was assembled into 24 contigs with an N50 of 344,151 bp. Both genomes comprise several genes related to PGP mechanisms, such as those for nitrogen fixation, iron metabolism, phosphate metabolism, and auxin biosynthesis. The results obtained offer the basis for a better understanding of B. altitudinis 19RS3 and T5S-T4 and make them promising candidates for bioinoculant development.

Introduction
and compaction of soil, and nutrient loss combined with little or no soil fertilization [40]. This motivates the research and development of a biofertilizer from native bacteria isolated from yerba mate to recover crop performance.
In our previous study, two Gram-positive endophytic endospore-forming bacteria, coded as 19RS3 and T5S-T4, were isolated from roots of I. paraguariensis St. Hil. seedlings. These bacteria were selected for their in vitro PGP properties. Both strains were identified morphologically and molecularly as Bacillus altitudinis. Also, B. altitudinis 19RS3 and B. altitudinis T5S-T4 showed in vivo growth promotion in yerba mate seedlings in greenhouse conditions with promising results [41].
The study of the genome of microorganisms used for biofertilizer production is important to bioinoculant technology because it helps to identify genes that contribute to the beneficial activity and increases knowledge of the molecular mechanisms related to plant growth potential. In the last decade, the development of new bioinformatics tools and next-generation sequencing technologies has allowed researchers to gain deeper insights into the molecular and genetic mechanisms of plant growth-promoting (PGP) activities, such as the study of the Pho regulon involved in inorganic phosphate (Pi) solubilization, the detection of the nif gene cluster associated with nitrogen fixation, the study of metabolic pathways related to siderophore production, and the discovery of the antibiotic and volatile compound production mechanisms implicated in biocontrol properties. These advances were accompanied by an exponential increase in the number of assembler algorithms available to obtain complete prokaryotic genomes [42]. Currently, there are two widely used classes of algorithms: overlap-layout-consensus (OLC) and de Bruijn graph (DBG) [43]. The DBG algorithm is based on the k-mer approach [44]: the short reads are divided into smaller fragments of size k (k-mers), and each k-mer overlaps the next one by k-1 nucleotides. Since Illumina sequencing technology entered the global market, several short-read assembly programs based on DBG have been developed, such as Velvet [45], ABySS [46], SPAdes [47], and SOAPdenovo2 [48]. Despite this, the selection of assembly tools, the determination of the parameters to be used, and the evaluation of the assemblies are still a challenge [49].
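The k-mer decomposition behind DBG assemblers can be illustrated with a minimal sketch. It is not the implementation used by Velvet, ABySS, SPAdes, or SOAPdenovo2; the function names and the toy read are assumptions made only for illustration, and real assemblers additionally handle reverse complements, sequencing errors, and coverage.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """Return every overlapping k-mer of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_edges(reads: list[str], k: int) -> dict[str, list[str]]:
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        for kmer in kmers(read, k):
            # Each k-mer is an edge from its prefix node to its suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


# Hypothetical toy read, far shorter than a real Illumina read.
toy_read = "ATGGCGT"
toy_kmers = kmers(toy_read, 3)
toy_graph = de_bruijn_edges([toy_read], 3)
print(toy_kmers)
print(toy_graph)
```

In this toy case, consecutive 3-mers share two nucleotides, which is the k-1 overlap described above.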
In this context, to advance knowledge of PGP mechanisms, the genomes of B. altitudinis 19RS3 and B. altitudinis T5S-T4 were sequenced. The purposes of this work were to assemble both genomes, to compare the results obtained using different de novo assemblers available for Windows and Linux operating systems, and to select the best assembly for each B. altitudinis strain. Finally, both genomes were automatically annotated to detect genes involved in PGP capabilities and compare these sequences with other Bacillus sp. genomes reported.

Bacteria
B. altitudinis 19RS3 and B. altitudinis T5S-T4 were isolated from roots of I. paraguariensis St. Hil. seedlings [41]. Both strains were identified by 16S rRNA gene sequence analysis (accession numbers MH883312 and MH883235, respectively) and characterized as Gram-positive endospore-forming rod-shaped bacteria. B. altitudinis 19RS3 and B. altitudinis T5S-T4 were deposited in the bacterial collection of the Instituto de Biotecnología Misiones "Dra. María Ebe Reca" under accession numbers LBM250 and LBM251, respectively. Bacteria were preserved in 50% glycerol stocks at -80˚C until use in this study.

DNA extraction
The strains were cultivated in nutrient broth (Britania Lab. SA) for 24 h at 30˚C. DNA was extracted using a modified version of Sambrook's protocol [50,51]. The DNA was resuspended in 20 μL of sterile distilled DNase-free water (BioPack®). The extracted DNA was qualitatively evaluated by agarose gel (1% w/v) electrophoresis stained with a solution of GelRed® (Sigma-Aldrich, Germany). DNA quantification was performed by UV spectrophotometry.

Library preparation and genome sequencing
Genomic TruSeq Nano DNA library (350) construction and draft genome paired-end sequencing were performed by Macrogen Co. (Seoul, Korea) services using Illumina HiSeq technology.

Genome assembly and evaluation
The quality of the FASTQ files was verified with FastQC [52] and reads were trimmed to ensure high quality (Phred score > 30) using Trimmomatic version 0.39 [53].
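As a rough illustration of quality trimming, the sketch below decodes a Phred+33 quality string and cuts a read at the first window whose mean quality falls below 30. It is a simplified stand-in and not Trimmomatic's implementation; the window size of 4, the function names, and the toy read are assumptions made for illustration.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        min_mean: float = 30.0) -> tuple[str, str]:
    """Cut the read at the first window whose mean quality is below min_mean."""
    scores = phred_scores(quality)
    if len(scores) < window:
        # Reads shorter than the window are judged as a single window.
        ok = bool(scores) and sum(scores) / len(scores) >= min_mean
        return (seq, quality) if ok else ("", "")
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], quality[:start]
    return seq, quality


# Hypothetical read: eight bases with Q40 ('I') followed by two with Q2 ('#').
trimmed = sliding_window_trim("ACGTACGTAC", "IIIIIIII##")
print(trimmed)
```

The threshold of 30 mirrors the Phred cut-off stated above; whether a window with a mean of exactly 30 is kept or discarded is a choice made in this sketch.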
The genomes were assembled using different de novo assemblers available for Linux and Windows operating systems (Table 1).
The k-mer values were selected according to the user manual of each assembler. In general terms, the values were odd to avoid palindromes and were strictly smaller than the read length.
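The reason for choosing odd k values can be checked with a short sketch: a k-mer of odd length can never equal its own reverse complement, because its middle base would have to be its own complement. The function names below are illustrative, and the read length of 101 bp in the usage lines is a hypothetical value, since the read length is not stated here.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_palindromic_kmers(k: int) -> int:
    """Count k-mers that are identical to their own reverse complement."""
    return sum(
        1
        for bases in product("ACGT", repeat=k)
        if "".join(bases) == reverse_complement("".join(bases))
    )


def is_valid_k(k: int, read_length: int) -> bool:
    """Apply the two rules stated above: k is odd and strictly below read length."""
    return k % 2 == 1 and k < read_length


palindromes = {k: count_palindromic_kmers(k) for k in (3, 4, 5)}
print(palindromes)

# Hypothetical read length of 101 bp, used only to exercise the check.
print([k for k in (63, 64, 79, 97, 101) if is_valid_k(k, 101)])
```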
The assemblies obtained in Linux were evaluated using QUAST (Quality Assessment Tool for Genome Assemblies) [56][57][58]. The assemblies generated in Windows were assessed using the statistics tables reported by each program.
The parameters evaluated were the number of contigs ≥ 500 bp, the number of contigs ≥ 1000 bp, the length of the longest contig, and the value of N50.
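The four parameters can be computed from a list of contig lengths, as in the following sketch. It assumes the usual definition of N50 (the length of the contig at which the cumulative sum of lengths, taken from the longest contig downwards, reaches half of the total) and restricts the calculation to contigs of at least 500 bp. The function names and the toy lengths are hypothetical and are not values from this study.

```python
def n50(lengths: list[int]) -> int:
    """Length of the contig at which the cumulative sum reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def assembly_summary(lengths: list[int], min_len: int = 500) -> dict[str, int]:
    """Summarize an assembly using only contigs of at least min_len bp."""
    kept = [length for length in lengths if length >= min_len]
    return {
        "contigs_ge_500": len(kept),
        "contigs_ge_1000": sum(1 for length in kept if length >= 1000),
        "longest_contig": max(kept, default=0),
        "n50": n50(kept),
    }


# Hypothetical contig lengths in bp.
toy_lengths = [400, 600, 1200, 3000, 5000]
summary = assembly_summary(toy_lengths)
print(summary)
```

In the toy example the longest contig alone covers more than half of the retained bases, so the N50 equals the longest contig length.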

Genome annotation
Gene prediction and annotation were performed using The Rapid Prokaryotic Genome Annotation (Prokka) [59]. Putative genes involved in plant growth-promoting mechanisms were determined using the Rapid Annotations using Subsystems Technology (RAST) [60] annotation server and KEGG Automatic Annotation Server (KAAS) [61].
In the assembly of the B. altitudinis 19RS3 genome, ABySS and Velvet generated the contigs with the highest N50 values. These two assemblies produced N50 values more than five times higher than those of the worst assemblies. Velvet also generated the fewest contigs and performed considerably better than the other assemblers. Geneious generated the worst assembly, with the lowest N50 value and the highest number of contigs.
For the assembly of the B. altitudinis T5S-T4 genome, ABySS had the highest N50 value, followed by Velvet. The latter also generated the fewest contigs. The CLC Genomics Workbench assembly, despite its large N50 contig size, had more contigs than any other assembly.
The best assembly for both genomes was obtained using the Velvet software. The B. altitudinis 19RS3 genome was assembled into 15 contigs (≥ 500 bp) with an N50 value of 1,943,801 bp and the longest contig length of 1,943,801 bp. The B. altitudinis T5S-T4 genome was assembled into 24 contigs (≥ 500 bp) with an N50 of 344,151 bp and the longest contig length of 805,135 bp. The B. altitudinis 19RS3 and B. altitudinis T5S-T4 assembled contigs were deposited in GenBank under accession numbers JACAAH01 and JACAAI01, respectively.
The genomes of B. altitudinis 19RS3 (Fig 1) and B. altitudinis T5S-T4 (Fig 2) presented similar size, noncoding sequences, ribosomal RNA sequences, and transfer RNAs (Table 4). In the chromosome sequence of B. altitudinis 19RS3, a total of 3861 CDSs and 80 RNA genes were predicted (Table 4). Among these CDSs, 2762 (68.43%) genes were classified into 469 functional subsystems. Similarly, in the chromosome sequence of B. altitudinis T5S-T4, a total of 3801 CDSs and 67 RNA genes were predicted (Table 4). Among these CDSs, 2750 (69.54%) genes were classified into 472 functional subsystems. The number of genes in each category is highly similar between B. altitudinis 19RS3 and B. altitudinis T5S-T4, but the former has more genes in several metabolism-related functions in cellular processes such as cell wall formation and capsule formation, regulation and cellular signaling, and genes related to phages, prophages, transposable elements, and plasmids (Table 5). Most of the genes were associated with the metabolism of carbohydrates and amino acid derivatives.
Genes related to PGPR mechanisms were identified from the gene annotation and from the metabolic pathway prediction of RAST. The production of enzymes involved in the metabolism of indole acetic acid (IAA) via the tryptophan pathway, coded by the gene cluster trp(ABD), was predicted, suggesting that B. altitudinis 19RS3 and B. altitudinis T5S-T4 have the potential to biosynthesize auxin. The gene cluster encoding bacillibactin production, dhb(ACEBF), was also found in both genomes, showing the potential for siderophore production.
The pst(SCAB) genes, coding for a Pi-specific transporter, were found in the genomes of B. altitudinis 19RS3 and B. altitudinis T5S-T4, suggesting the capacity of both strains for inorganic phosphate solubilization. Finally, the genes nif(U) and nif(S), which are involved in the nitrogenase enzymatic activity responsible for biological nitrogen fixation, were present in both strains. However, the complete gene cluster, which is essential for nitrogenase activity, was not found.
Volatile compounds such as 2,3-butanediol and acetoin might be produced by B. altitudinis 19RS3 and B. altitudinis T5S-T4, given that they have the potential to produce the enzymes α-acetolactate synthetase, α-acetolactate decarboxylase, and acetoin utilization protein. Coding regions for surfactin production were also found, and the complete gene cluster srf(ABCD) was annotated in each genome. Genes responsible for flagellar motility, chemotaxis, and biofilm synthesis, which allow B. altitudinis 19RS3 and T5S-T4 to move toward root exudates and facilitate adhesion to plant surfaces, were found. Also, some genes related to stress response, such as those implicated in osmotic stress, oxidative stress, cold and heat shock, and detoxification, in addition to genes related to sporulation, were present in both genomes, indicating a possible protection mechanism against extreme environmental conditions.
The genome comparison revealed specific gene clusters involved in PGP capabilities (Table 6). All the genomes presented genes associated with the production of volatile compounds such as 2,3-butanediol and acetoin. Only B. altitudinis 19RS3, B. altitudinis T5S-T4, and B. velezensis FZB42 showed the presence of genes associated with surfactin production. This commercial strain also presented genes for phytase and iturin production. Interestingly, other genes coding for bacillibactin, IAA production, the Pi-specific transporter, and the PHO regulon were discovered only in our studied strains.
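A presence/absence comparison of this kind can be expressed with sets, as in the sketch below. The dictionary only transcribes the statements made in the text for traits whose full distribution is given; it is not a reproduction of Table 6. The strain labels W3 and GQYP101 refer to the two B. altitudinis genomes used for comparison, phytase and iturin are omitted because the text reports them only for FZB42 without stating their status in the other genomes, and all variable names are illustrative.

```python
ALL_STRAINS = {"19RS3", "T5S-T4", "W3", "GQYP101", "FZB42"}
STUDIED = {"19RS3", "T5S-T4"}

# Strains in which each trait was reported, transcribed from the text.
presence = {
    "acetoin/2,3-butanediol": set(ALL_STRAINS),
    "surfactin": {"19RS3", "T5S-T4", "FZB42"},
    "bacillibactin": set(STUDIED),
    "IAA production": set(STUDIED),
    "Pi-specific transporter": set(STUDIED),
    "PHO regulon": set(STUDIED),
}

# Traits found in every genome, and traits found only in the two studied strains.
shared_by_all = {trait for trait, strains in presence.items() if strains == ALL_STRAINS}
only_in_studied = {trait for trait, strains in presence.items() if strains == STUDIED}

print(sorted(shared_by_all))
print(sorted(only_in_studied))
```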

Discussion
In the present study, the genomes of two PGP strains isolated from I. paraguariensis St. Hil., B. altitudinis 19RS3 and B. altitudinis T5S-T4, were sequenced and assembled. We compared the assembled genome quality statistics generated by different de novo assemblers available for Windows and Linux operating systems. Although no assembler was the best in all metrics simultaneously, the Velvet assembler generated the fewest contigs and one of the highest N50 values. We also annotated both genomes, detected the genes associated with PGP properties, and determined the presence of these sequences in two B. altitudinis genomes reported in NCBI. The characteristics of the prokaryotic genomic structure were considered when selecting the sequencing platform and the library construction. Some authors [62] indicate it may be useful to try different strategies for de novo assembly of a newly sequenced organism. They propose to evaluate the strategies for the construction of contigs and analyze their effect on the assembly when choosing the best parameters. They also emphasize that knowing the characteristics of the genomic structure of an organism, the sequencing platform, and the construction of the library can be especially useful when choosing assembly tools.
The raw reads obtained for both genomes were processed to eliminate adapters and possible contaminants that can affect the quality of the results and create a problem when comparing the efficiency of the assemblers. Some authors [28] recommend a trimming step to ensure the high quality of the data. We agree and highlight the importance of filtering and trimming to generate better results: in a previous study, we evaluated the effect of using raw and filtered reads as input files in the assembly of the B. altitudinis 19RS3 genome and obtained a better assembly with the filtered reads [63]. When considering the number of contigs, the longest contig length, and the N50 value in the assemblies of both genomes, Velvet and ABySS generated the best results. As in our study, other authors [64] evaluated de novo assemblers using reads of prokaryotic genomes. In their work, Velvet showed a greater number of contigs and a lower N50 value, while ABySS generated a lower number of contigs in the paired data sets and showed a higher N50 value. The authors attributed this variation to factors such as the quality of the data and the k-mer size. Regarding the latter, we decided to use different k-mer values considering their effect on genome assemblies. For the SPAdes assembler, we used a k-mer value of 63 greater than the average size of the reads and sought to gradually increase the values, obtaining more precise assemblies with k-mer values of 79 and 97. Large k-mers often result in larger contigs, but excessively large k-mers can cause a fragmented graph with a higher number of contigs. SPAdes, ABySS, and SOAPdenovo2 generated their best assemblies with the highest k-mer value; however, they also produced the most fragmented assemblies. Several authors [65,66] showed that SPAdes stands out as one of the best assemblers for Illumina data, due to its quality and high precision. Although in our study assemblies with fewer contigs were obtained using other software, SPAdes produced very good results for the assembly of both genomes.
As shown by the assemblies obtained in this work, the value that presented the greatest variation was the number of contigs. We agree with some authors [49] that the wide variety of available assemblers use different heuristic approaches to meet the challenges of genome assembly, and this results in significant differences in the number of contigs they generate. For this reason, we consider it necessary to perform a thorough and complete evaluation of the assembled genome quality statistics generated by different assemblers before selecting the best assembly. In the present study, we predicted genes and enzymes associated with PGP mechanisms in the B. altitudinis 19RS3 and B. altitudinis T5S-T4 genomes. We detected genes related to the conversion pathway of tryptophan to indole, which is consistent with the indole production observed in the in vitro assays [41]. The presence of the bacillibactin gene cluster showed the potential for siderophore production, while the detection of Pi transporters and the Pho regulon indicated a possibility for inorganic phosphate solubilization. The presence of nif(U) and nif(S) was also determined in both genomes, suggesting the possibility that the strains fix environmental nitrogen. The properties mentioned above are consistent with the in vitro and in vivo PGP activities determined experimentally in previous studies for both strains [41].
The results obtained for the assembly of the B. altitudinis 19RS3 and B. altitudinis T5S-T4 genomes are similar to those reported for other Bacillus PGP strains such as B. flexus KLBMP 4941, B. pumilus GM3FR [68], B. mycoides GM6LP [69], B. vallismortis NBIF-001 [70], B. velezensis 2A-2B [71], and B. velezensis UCMB5140 [14]. In particular, the genome of B. altitudinis FD48 [72] comprises several genes related to plant growth promotion mechanisms, such as those for the biogenesis of organic acids involved in inorganic phosphorus solubilization, iron and siderophore uptake systems, and nitrogen metabolism. However, the annotation of this genome is not available to perform a deeper genome comparison. The PGP genes reported for B. subtilis EA-CB0575 [28], related to IAA, siderophore, acetoin, 2,3-butanediol, and LPs production, nitrogen fixation, and phosphate solubilization, are similar to those detected in the B. altitudinis 19RS3 and B. altitudinis T5S-T4 genomes. The comparison performed with the B. altitudinis W3, B. altitudinis GQYP101, and B. velezensis FZB42 genomes indicated that our strains present some unique genes able to promote I. paraguariensis growth. The five genomes present genes associated with the production of volatile compounds such as 2,3-butanediol and acetoin, but the other PGP gene clusters were only detected in our studied strains. Also, we determined the presence of loci coding for surfactins in the genomes of B. altitudinis 19RS3, B. altitudinis T5S-T4, and B. velezensis FZB42. The commercial strain FZB42 also presents genes for phytase and iturin production. Each Bacillus PGP strain differs subtly in its plant growth-promoting and biocontrol activities. The future design of an effective bioinoculant should be based on combinations of PGP strains that supplement each other.

Conclusion
The best assembly for B. altitudinis 19RS3 and B. altitudinis T5S-T4 was obtained using the Velvet software. A great number of genes associated with PGP mechanisms were annotated and analyzed. Genes involved in auxin biosynthesis, siderophore production, phosphate metabolism, and nitrogen fixation were found. Other genes associated with acetoin and 2,3-butanediol production, motility, chemotaxis, adhesion, sporulation, and defense functions were also found. The gene detection performed in the present study supports the PGP properties observed in previous assays.
The results obtained offer the basis for a better understanding of the biology of B. altitudinis 19RS3 and T5S-T4 and make these strains promising candidates for the development of novel strategies in their biotechnological application as bioinoculants. The information presented here will allow in-depth functional and comparative genome analyses to provide a better understanding of beneficial plant-bacteria associations.
Supporting information S1