Parallel engineering of environmental bacteria and performance over years under jungle-simulated conditions

Engineered bacteria could perform many functions in the environment, for example, remediating pollutants, delivering nutrients to crops or acting as in-field biosensors. Model organisms can be unreliable in the field, but selecting an isolate from the thousands that naturally live there and genetically manipulating it to carry the desired function is a slow and uninformed process. Here, we demonstrate the parallel engineering of isolates from environmental samples by using the broad-host-range XPORT conjugation system (Bacillus subtilis mini-ICEBs1) to transfer a genetic payload to many isolates in parallel. Bacillus and Lysinibacillus species were obtained from seven soil and water samples from different locations in Israel. XPORT successfully transferred a genetic function (reporter expression) into 25 of these isolates. The engineered isolates were then screened to identify the best-performing chassis based on expression level, doubling time, functional stability in soil, and environmentally relevant traits of the closest annotated reference species, such as sporulation ability and temperature tolerance. From this library, we selected Bacillus frigoritolerans A3E1, re-introduced it to soil, and measured its function and genetic stability in a contained environment that replicates jungle conditions. After 21 months of storage, the engineered bacteria were viable, could perform their function, and did not accumulate disruptive mutations.

Notably, the results were acquired after overnight outgrowth in rich medium (LB), so the quantification does not represent actual abundance in the original soil sample.

frigoritolerans A3E1 WT (left) and GFP+ (right) with progenitor assemblies before soil incubation. The MUMmer plots show the shotgun genome assembly of each undomesticated bacterium on the y-axis against its original progenitor assembly on the x-axis. Each horizontal and vertical line represents one contig of the novel assemblies. Below each plot are the total length of the assembly and the reciprocal percentages of coverage and identity.

16S rRNA gene sequencing analysis of soil samples
The method for DNA extraction from soil was adapted from previously published studies 6,7. For each soil sample, 3 g of soil was washed three times with 3 mL TE buffer (50 mM Tris-HCl (Sigma 5941), 50 mM EDTA (Affymetrix|USB, Cleveland, OH, #15697), pH 8.0). Samples were frozen in liquid nitrogen and ground to a fine powder with a mortar and pestle. Next, 1 g of each sample was transferred to a sterile 15 mL tube (VWR #89039), and 2 mL of TE buffer and 2 mL of phenol-chloroform-isoamyl alcohol (25:24:1; Sigma #P3803) were added. The mixture was gently vortexed for 1 min and then centrifuged at 2700 × g for 10 min at 4°C (Eppendorf 5910R). Isopropanol precipitation was performed, followed by DNA with 220 bp paired-end reads.
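Centrifugation steps given as relative centrifugal force (here 2700 × g) have to be converted to rotor speed for whichever centrifuge is at hand. The sketch below uses the standard relation RCF = 1.118e-5 × r × rpm², with r the rotor radius in cm. The 18 cm radius in the example is a hypothetical value, not the radius of the rotor used in this study; substitute the value from your rotor's documentation.

```python
import math


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) for a rotor speed and radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces the requested RCF (x g)."""
    if rcf_g <= 0 or radius_cm <= 0:
        raise ValueError("rcf_g and radius_cm must be positive")
    return math.sqrt(rcf_g / (1.118e-5 * radius_cm))


if __name__ == "__main__":
    # Hypothetical 18 cm rotor radius; replace with the radius of your own rotor.
    speed = rpm_from_rcf(2700, 18.0)
    print(f"2700 x g at r = 18 cm needs about {speed:.0f} rpm")
```

Because RCF scales with the radius, the same rpm setting gives a different force on rotors of different sizes, which is why the protocol states the force rather than the speed.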
The following quality-control steps were performed for data analysis. Illumina QC files were used to evaluate whether low-quality bases at the 5' end of the reads needed to be removed; as a result, the first 15 nucleotides of each read were trimmed with the FASTX-Toolkit short-read trimming package (v0.0.13). Forward reads (read 1) and reverse reads (read 2) were merged with the paired-end read merger PEAR 8 (v0.9.10), and non-biological sequences were removed at this step. 16S rDNA analysis was performed with QIIME 2 (q2cli version 2021.2.0) 9. The FASTQ files were imported into QIIME 2 with the qiime tools import method. Next, quality filtering and chimera removal were performed with the DADA2 plugin 10, and all sequences were trimmed to 150 bp. The non-redundant sequences and count tables were exported and converted to plain-text files with the qiime tools export and biom convert methods. Greengenes 97% reference sequences and taxonomy data were imported with the qiime tools import method. Closed-reference clustering was performed with the qiime vsearch cluster-features-closed-reference method. We removed the Enterobacteriaceae family (which includes E. coli, a common laboratory contaminant) from the dataset because high and consistent levels of contamination were detected across samples. Bar plots were generated in R with the ggplot2 package (v3.3.3) 11.
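The first QC step (removing 15 nt from the 5' end) and the final contaminant filter can be sketched in plain Python, assuming reads are held as 4-line FASTQ records and the clustered feature table has been collapsed to taxonomy strings with counts. This illustrates what the steps do; it is not the FASTX-Toolkit or QIIME 2 code used in the study, and the function names and the Greengenes-style "f__Enterobacteriaceae" label match are assumptions.

```python
from typing import Dict, Iterator, List, Tuple

# (header, sequence, quality) for one FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield records from 4-line FASTQ text; a trailing partial record is ignored."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        yield header, seq, qual


def trim_5prime(record: FastqRecord, n: int = 15) -> FastqRecord:
    """Remove the first n bases and their quality characters from a read."""
    header, seq, qual = record
    return header, seq[n:], qual[n:]


def relative_abundance(
    counts: Dict[str, int], exclude: str = "f__Enterobacteriaceae"
) -> Dict[str, float]:
    """Drop taxa whose lineage contains `exclude`, then renormalize the rest to sum to 1."""
    kept = {taxon: c for taxon, c in counts.items() if exclude not in taxon}
    total = sum(kept.values())
    if total == 0:
        return {taxon: 0.0 for taxon in kept}
    return {taxon: c / total for taxon, c in kept.items()}


if __name__ == "__main__":
    fastq = ["@read1\n", "N" * 15 + "ACGTACGT\n", "+\n", "#" * 15 + "IIIIIIII\n"]
    for rec in parse_fastq(fastq):
        print(trim_5prime(rec))
```

Renormalizing after the removal keeps each sample's remaining relative abundances summing to 1, which is what a stacked bar plot of taxa expects.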

16S rDNA phylogeny of undomesticated soil bacteria
Each conjugated isolate was plated on LB agar supplemented with kanamycin and incubated overnight at 37°C.
The next day, single colonies were picked and lysed in 35 μL InstaGene Matrix (Bio-Rad) according to the manufacturer's protocol, and 1 μL of the lysis supernatant was used as the PCR template. The 16S rDNA genes were PCR-amplified using ReadyMade Primers (IDT, Coralville, IA; catalog #51-01-19-06 and #51-01-19-07). The amplicons were sent for Sanger sequencing. The resulting sequences were trimmed and analyzed using EZBioCloud 12. All sequences are available in Supplementary Table 3. The phylogenetic tree was generated using Phylogeny.fr 13 with default parameters.
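The text does not state how the Sanger reads were trimmed. One common approach is modified Mott trimming: each base scores an error-probability limit minus its Phred-derived error probability, and the read is cut to the maximum-scoring contiguous segment. The sketch below assumes per-base Phred scores are already available; the 0.05 limit is an illustrative default, not a value from this study.

```python
from typing import List, Tuple


def mott_trim(quals: List[int], error_limit: float = 0.05) -> Tuple[int, int]:
    """Return (start, end) of the segment to keep, as a half-open slice.

    Each base scores error_limit - 10**(-Q/10); the kept segment is the
    maximum-sum contiguous run of scores (Kadane's algorithm). Returns
    (0, 0) when no segment has a positive score.
    """
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += error_limit - 10 ** (-q / 10)
        if run_sum <= 0:
            # The run so far cannot improve any later segment; restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


if __name__ == "__main__":
    seq = "NNACGTAN"
    quals = [8, 12, 35, 38, 40, 37, 15, 6]
    start, end = mott_trim(quals)
    print(seq[start:end])
```

Because the score is a running sum, a single mediocre base inside a high-quality stretch is kept, whereas a plain per-base threshold would split the read there.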

Whole-genome sequencing
All undomesticated soil isolates were streaked from glycerol stocks onto LB agar plates supplemented with kanamycin (except for the WT strains of A1E2 and A3E1) and incubated overnight at 37°C. One colony of each isolate was inoculated into 3 mL of LB and incubated overnight at 37°C and 250 rpm in RefSeq genomes accessed through the command "kraken2-build --standard". Draft scaffolds were assembled with the SPAdes de novo assembler (v3.14.1) for isolates and the metaSPAdes de novo assembler (v3.14) for metagenomic samples (i.e., the 20-month-old soil samples), using default parameters in both cases 15,16. Each assembly was polished with the Pilon package using "--fix all" 17.
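Draft assemblies such as the SPAdes and metaSPAdes outputs are commonly summarized by total length and N50. The sketch below computes both from FASTA lines. N50 is not reported in the text, and these helpers are illustrative rather than part of the described pipeline.

```python
from typing import Iterable, List, Optional


def contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of each sequence in FASTA text (multi-line records allowed)."""
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Largest L such that contigs of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


if __name__ == "__main__":
    lengths = contig_lengths([">c1", "ACGTACGT", "ACGT", ">c2", "ACGTAC", ">c3", "AC"])
    print(f"total={sum(lengths)} bp, contigs={len(lengths)}, N50={n50(lengths)} bp")
```

A low N50 with many short contigs is typical of metagenomic assemblies and is worth keeping in mind when reading their synteny plots.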
Finally, synteny was computed using the MUMmer 3.0 package 18. Synteny plots comparing the assembly of each sample with its closest RefSeq reference were generated using the command "nucmer -p nucmer," followed by "mummerplot." Genomic sequencing of the soil samples after the long-term storage experiment were done by frigoritolerans A3E1 done before long-term storage using the breseq 19 tool. We listed mutations that affected an annotated gene (Supplementary Fig. 8). To avoid listing sequencing artifacts, we listed only mutations whose covering raw reads had at least 10X depth and pure coverage (>90%).
In addition, to discount mutations resulting from sequencing errors in the assembly of the original B. frigoritolerans A3E1 genome, all mutations were checked against the closest reference, B. frigoritolerans DSM-8801 (GCF_001636405.1), and any mutation identical to it was removed from the list.
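The mutation filters described above (an affected annotated gene, at least 10X depth with >90% pure coverage, and no match to DSM-8801) can be expressed as a single pass over mutation calls. The Mutation record below is a hypothetical, already-parsed representation rather than breseq's output format, and it assumes alleles shared with DSM-8801 have already been mapped to A3E1 coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Mutation:
    position: int        # 1-based position in the A3E1 assembly (hypothetical layout)
    alt: str             # alternative allele observed after storage
    depth: int           # reads covering the position
    frequency: float     # fraction of covering reads carrying `alt` (0 to 1)
    gene: Optional[str]  # annotated gene affected, or None if intergenic


def filter_mutations(
    calls: Iterable[Mutation],
    reference_alleles: Set[Tuple[int, str]],
    min_depth: int = 10,
    min_frequency: float = 0.90,
) -> List[Mutation]:
    """Keep well-supported mutations in annotated genes that DSM-8801 does not share."""
    kept: List[Mutation] = []
    for m in calls:
        if m.gene is None:
            continue  # only mutations affecting an annotated gene are listed
        if m.depth < min_depth or m.frequency <= min_frequency:
            continue  # possible sequencing artifact: below 10X or not >90% pure
        if (m.position, m.alt) in reference_alleles:
            continue  # same allele as the closest reference: likely an A3E1 assembly error
        kept.append(m)
    return kept
```

Applying the reference check last keeps the logic readable, but the three conditions are independent, so their order does not change the result.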

Fluorescent microscopy of soil samples
Soil samples (1-5 mg) containing bacteria were taken from dry and rehydrated pots and vortexed in 50 μL PBS for 1 min. Small soil particles resuspended in 10 µL PBS were immobilized on glass slides for microscopy using agarose patches (Lonza, Basel, Switzerland, SeaPlaque agarose, 50004) prepared as described elsewhere 20. Fluorescence imaging was performed on an Observer Z1 microscope (Zeiss) at 20× and 40× magnification with the EGFP fluorescence setting (509 nm) and a 30 ms exposure time.
Supporting tables S1